Atomizer Pattern

from Riehle et al. (PLoP '96)

Synopsis

Read arbitrarily complex object structures from, and write them to, a variety of data-structure-based backends. Store and retrieve objects efficiently across different backends, such as flat files, relational databases, and RPC buffers.

Context

You want to copy, print, or store an arbitrarily complex object structure.

Forces

Solution

Objects load and store themselves through an Atomizer object with a stylized interface. The Atomizer converts between the internal format and the backend. The object decides whether each thing it holds is a containment (an embedded object), an acquaintance (a reference to an external object), or a temporary; the Atomizer handles the rest.

An object is represented as a class ID and a sequence of fields. Each field contains a value and an optional name. Omitting names saves storage but costs flexibility. A value is one of:

  1. Primitives, like integers, floats, and strings.
  2. Embedded objects, i.e. nested sequences of fields.
  3. Object references, for pointing to external objects. They can be global names or embedded Proxy objects.
  4. Object tokens, for referring to previously stored objects.
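
As an illustration only, the representation above could be modelled with a few Python dataclasses. All names here (Record, Field, ObjectRef, ObjectToken) and the token numbering are assumptions made for this sketch, not part of the pattern description.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ObjectRef:
    """Points to an external object, here by a (hypothetical) global name."""
    global_name: str


@dataclass
class ObjectToken:
    """Refers back to an object stored earlier in the same structure."""
    index: int


@dataclass
class Field:
    value: "Value"
    name: Optional[str] = None  # optional: unnamed fields are read by position


@dataclass
class Record:
    """An embedded object: a class ID plus a sequence of fields."""
    class_id: str
    fields: List[Field] = field(default_factory=list)


# A value is a primitive, an embedded object, a reference, or a token.
Value = Union[int, float, str, Record, ObjectRef, ObjectToken]

point = Record("Point", [Field(3, "x"), Field(4, "y")])
segment = Record("Segment", [
    Field(point, "start"),
    # Assumed numbering: token 1 denotes the second object written (the Point).
    Field(ObjectToken(1), "end"),
    Field(ObjectRef("styles/default"), "style"),
])
```

Here the segment's start and end are the same Point, so the second occurrence is a token rather than a second copy.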

The Atomizer supports methods for writing primitives, embedded objects, and object references. The object calls these to serialize itself. To write an embedded object, the Atomizer tells the embedded object to serialize itself using the Atomizer, i.e. it is a recursive call. Eventually, every object in the structure will be represented using only primitives. The object can choose to omit temporaries, or the Atomizer can have a method for writing a temporary to be stored at the Atomizer's whim.
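
The writing protocol might look roughly like the following sketch. The class WriteAtomizer, its method names, the store() hook, and the nested-tuple backend format are all assumptions chosen for illustration; a real Atomizer would target a file, database, or buffer.

```python
class WriteAtomizer:
    """Hypothetical writer whose backend is a nested list of (name, value) pairs."""

    def __init__(self):
        # The innermost list collects the fields of the object being written.
        self._stack = [[]]

    def write_primitive(self, value, name=None):
        self._stack[-1].append((name, value))

    def write_reference(self, global_name, name=None):
        self._stack[-1].append((name, ("ref", global_name)))

    def write_object(self, obj, name=None):
        self._stack.append([])
        obj.store(self)  # recursive call: the embedded object serializes itself
        fields = self._stack.pop()
        self._stack[-1].append((name, ("obj", type(obj).__name__, fields)))

    def result(self):
        return self._stack[0]


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
        self._cached_norm = None  # a temporary: the object chooses not to store it

    def store(self, atomizer):
        atomizer.write_primitive(self.x, "x")
        atomizer.write_primitive(self.y, "y")


class Segment:
    def __init__(self, start, end, style_name):
        self.start, self.end, self.style_name = start, end, style_name

    def store(self, atomizer):
        atomizer.write_object(self.start, "start")          # containment
        atomizer.write_object(self.end, "end")              # containment
        atomizer.write_reference(self.style_name, "style")  # acquaintance


atomizer = WriteAtomizer()
atomizer.write_object(Segment(Point(0, 0), Point(3, 4), "styles/default"))
stored = atomizer.result()
```

Note that Segment never inspects how Point is stored; it only hands the Point to the Atomizer, which is what keeps the objects independent of the backend.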

To resurrect a stored object, the class is instantiated from its ID, then the object initializes itself using the Atomizer. The Atomizer only supports reading primitives and reading objects. The rest is handled automatically. Fields can be read in order or by name, depending on how the object was stored.
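
A minimal sketch of the reading side, under the same assumptions as a nested (name, value) format: ReadAtomizer, the load() hook, and the class registry are hypothetical names. It shows a class being instantiated from its ID and fields being read both by position and by name. External references are left out for brevity.

```python
class ReadAtomizer:
    """Hypothetical reader over nested (name, value) pairs."""

    def __init__(self, fields, registry):
        self._registry = registry  # maps class ID -> class
        self._frames = [{"fields": fields, "pos": 0}]

    def _next(self, name):
        frame = self._frames[-1]
        if name is None:  # positional read: take the next field in order
            value = frame["fields"][frame["pos"]][1]
            frame["pos"] += 1
            return value
        for field_name, value in frame["fields"]:  # read by name
            if field_name == name:
                return value
        raise KeyError(name)

    def read_primitive(self, name=None):
        return self._next(name)

    def read_object(self, name=None):
        kind, class_id, fields = self._next(name)
        if kind != "obj":
            raise ValueError("expected an embedded object")
        cls = self._registry[class_id]
        obj = cls.__new__(cls)  # instantiate from the class ID, skipping __init__
        self._frames.append({"fields": fields, "pos": 0})
        obj.load(self)          # the object initializes itself from the Atomizer
        self._frames.pop()
        return obj


class Point:
    def load(self, atomizer):
        self.x = atomizer.read_primitive("x")  # by name
        self.y = atomizer.read_primitive("y")


class Segment:
    def load(self, atomizer):
        self.start = atomizer.read_object()    # in order
        self.end = atomizer.read_object()


stored = [
    (None, ("obj", "Segment", [
        ("start", ("obj", "Point", [("x", 0), ("y", 0)])),
        ("end", ("obj", "Point", [("x", 3), ("y", 4)])),
    ]))
]
reader = ReadAtomizer(stored, {"Point": Point, "Segment": Segment})
segment = reader.read_object()
```

Reading by name only works because this stored form kept the field names; had they been omitted to save space, only the positional reads would be available.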

To handle both sharing and circular references, the Atomizer keeps a table of objects which have already been written. If an object is to be written a second time, a token is written instead, referring back to the first occurrence. On reading, the Atomizer automatically substitutes the previously-read object for the token.
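
The token mechanism can be sketched as follows. This is an illustration under assumptions, not the pattern's prescribed implementation: Writer, Reader, the flat stream of tagged entries, and numbering tokens by order of first occurrence are all choices made for this example.

```python
class Writer:
    def __init__(self):
        self.out = []    # flat stream of (kind, payload) entries
        self._seen = {}  # id(object) -> token number for objects already written

    def write_primitive(self, value):
        self.out.append(("prim", value))

    def write_object(self, obj):
        if id(obj) in self._seen:  # second occurrence: write a token instead
            self.out.append(("token", self._seen[id(obj)]))
            return
        # Register before recursing so that circular references terminate.
        self._seen[id(obj)] = len(self._seen)
        self.out.append(("obj", type(obj).__name__))
        obj.store(self)


class Reader:
    def __init__(self, stream, registry):
        self._entries = iter(stream)
        self._registry = registry
        self._read = []  # objects in the order they were first read

    def read_primitive(self):
        kind, value = next(self._entries)
        if kind != "prim":
            raise ValueError("expected a primitive")
        return value

    def read_object(self):
        kind, payload = next(self._entries)
        if kind == "token":  # substitute the previously read object
            return self._read[payload]
        cls = self._registry[payload]
        obj = cls.__new__(cls)
        self._read.append(obj)  # register before loading so cycles resolve
        obj.load(self)
        return obj


class Node:
    def __init__(self, label):
        self.label = label
        self.next = None

    def store(self, atomizer):
        atomizer.write_primitive(self.label)
        atomizer.write_object(self.next)

    def load(self, atomizer):
        self.label = atomizer.read_primitive()
        self.next = atomizer.read_object()


# Two nodes that point at each other: a -> b -> a.
a, b = Node("a"), Node("b")
a.next, b.next = b, a

writer = Writer()
writer.write_object(a)
copy = Reader(writer.out, {"Node": Node}).read_object()
```

Both sides register an object before descending into its fields; otherwise the cycle would recurse forever on writing, and the token would have nothing to resolve to on reading.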

If the eventual reader has access to the class definition, then the object's methods can be omitted. Otherwise, the methods must be sent as fields, using code as the value. See Implementation for more on this issue.

Consequences

Implementation

For more information:

Known Uses


Thomas Minka
Last modified: Fri Sep 02 17:10:03 GMT 2005