Atomizer Pattern
from Riehle et al. (PLoP'96)
Synopsis
Read arbitrarily complex object structures from, and write them to,
backends built on differing data structures. Efficiently store and retrieve
objects using different backends, such as flat files, relational databases,
and RPC buffers.
Context
You want to copy, print, or store an arbitrarily complex object structure.
Forces
- In general, contained objects should be contained in the output,
acquaintances should be acquaintances of the output, and temporaries, such
as caches, should not appear in the output.
- Objects may be shared in the structure, i.e. pointed to multiple times.
Sharing should be preserved in the output.
- The structure may contain loops, e.g. self-reference, which should also be
preserved.
- The same code should work for multiple backends: memory copies, the screen,
files, databases, RPC buffers, etc.
Solution
Objects load or store themselves using an Atomizer object with a stylized
interface. The Atomizer converts between the internal format and the
backend. The object decides containment vs. acquaintance vs. temporary and
the Atomizer handles the rest.
An object is represented as a class ID and a sequence of fields. Each
field contains a value and an optional name. Omitting names entails less
storage but also less flexibility. Values are either
- Primitives, like integers, floats, and strings.
- Embedded objects, i.e. nested sequences of fields.
- Object references, for pointing to external objects.
They can be global names or embedded Proxy objects.
- Object tokens, for referring to previously stored objects.
The Atomizer supports methods for writing primitives, embedded objects, and
object references. The object calls these to serialize itself. To write
an embedded object, the Atomizer tells the embedded object to serialize
itself using the Atomizer, i.e. it is a recursive call. Eventually, every
object in the structure will be represented using only primitives. The
object can choose to omit temporaries, or the Atomizer can have a method
for writing a temporary to be stored at the Atomizer's whim.
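The writing side described above can be sketched as follows. This is only
an illustration under stated assumptions: the names ListWriter,
write_primitive, write_reference, write_object, and store are hypothetical,
as is the tagged-tuple layout of the output. The sketch does not yet detect
shared objects.

```python
class ListWriter:
    """Hypothetical Atomizer that records fields as (name, value) tuples."""

    def __init__(self):
        self.fields = []

    def write_primitive(self, value, name=None):
        self.fields.append((name, ("prim", value)))

    def write_reference(self, global_name, name=None):
        # Acquaintance: only a name is stored, never the object's contents.
        self.fields.append((name, ("ref", global_name)))

    def write_object(self, obj, name=None):
        # Containment: the embedded object serializes itself (recursive call).
        nested = ListWriter()
        obj.store(nested)
        self.fields.append((name, ("obj", type(obj).__name__, nested.fields)))


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def store(self, atomizer):
        atomizer.write_primitive(self.x, "x")
        atomizer.write_primitive(self.y, "y")


class Circle:
    def __init__(self, center, radius, owner_name):
        self.center = center          # contained object
        self.radius = radius          # primitive
        self.owner_name = owner_name  # acquaintance, known by a global name
        self._area_cache = None       # temporary, deliberately not written

    def store(self, atomizer):
        # The object decides containment vs. acquaintance vs. temporary.
        atomizer.write_object(self.center, "center")
        atomizer.write_primitive(self.radius, "radius")
        atomizer.write_reference(self.owner_name, "owner")


writer = ListWriter()
writer.write_object(Circle(Point(1, 2), 5, "registry/alice"))
record = writer.fields[0][1]  # ("obj", "Circle", [...named fields...])
```

The resulting record contains only primitives, names, and nested field
sequences, and the cache field never reaches the output.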
To resurrect a stored object, the class is instantiated from its ID, then
the object initializes itself using the Atomizer. For reading, the Atomizer
needs only two kinds of methods: reading primitives and reading objects.
Everything else, such as resolving tokens, is handled automatically. Fields
can be read in order or by name, depending on how the object was stored.
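A minimal sketch of resurrection, assuming a stored layout of (name, value)
fields where a value is either ("prim", v) or ("obj", class_id, fields).
The layout, the class registry, and the names ListReader, read_primitive,
read_object, and load are all assumptions made for illustration.

```python
class ListReader:
    """Hypothetical Atomizer that reads (name, value) fields from a list."""

    def __init__(self, fields, classes):
        self.fields = fields
        self.classes = classes   # class ID -> class
        self.position = 0

    def _take(self, name):
        if name is None:                        # read in order
            value = self.fields[self.position][1]
            self.position += 1
            return value
        for field_name, value in self.fields:   # read by name
            if field_name == name:
                return value
        raise KeyError(name)

    def read_primitive(self, name=None):
        value = self._take(name)
        if value[0] != "prim":
            raise TypeError("expected a primitive field")
        return value[1]

    def read_object(self, name=None):
        value = self._take(name)
        if value[0] != "obj":
            raise TypeError("expected an embedded object")
        _, class_id, nested = value
        cls = self.classes[class_id]
        obj = cls.__new__(cls)   # instantiate from the class ID, skip __init__
        obj.load(ListReader(nested, self.classes))
        return obj


class Point:
    def load(self, atomizer):
        # Stored without names, so fields are read in order.
        self.x = atomizer.read_primitive()
        self.y = atomizer.read_primitive()


class Segment:
    def load(self, atomizer):
        # Stored with names, so the read order does not matter.
        self.end = atomizer.read_object("end")
        self.start = atomizer.read_object("start")


CLASSES = {"Point": Point, "Segment": Segment}
stored = ("obj", "Segment", [
    ("start", ("obj", "Point", [(None, ("prim", 0)), (None, ("prim", 0))])),
    ("end", ("obj", "Point", [(None, ("prim", 3)), (None, ("prim", 4))])),
])
segment = ListReader([(None, stored)], CLASSES).read_object()
```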
To handle both sharing and circular references, the Atomizer keeps a table
of objects which have already been written. If an object is to be written
a second time, a token is written instead, referring back to the first
occurrence. On reading, the Atomizer automatically substitutes the
previously-read object for the token.
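The token table can be sketched as below, assuming a flat output stream and
absolute tokens that number objects in the order they were first written.
The names Writer, Reader, Node, store, and load are hypothetical. The key
detail is that an object is registered before its fields are processed, so
a loop back to it finds the entry instead of recursing forever.

```python
class Writer:
    def __init__(self):
        self.out = []
        self.seen = {}   # id(object) -> token; objects must stay alive

    def write_primitive(self, value):
        self.out.append(("prim", value))

    def write_object(self, obj):
        if id(obj) in self.seen:
            # Already written: emit a token pointing at the first occurrence.
            self.out.append(("token", self.seen[id(obj)]))
            return
        self.seen[id(obj)] = len(self.seen)   # register before recursing
        self.out.append(("obj", type(obj).__name__))
        obj.store(self)


class Reader:
    def __init__(self, items, classes):
        self.items = iter(items)
        self.classes = classes
        self.loaded = []   # token -> object already read

    def read_primitive(self):
        _, value = next(self.items)
        return value

    def read_object(self):
        tag, value = next(self.items)
        if tag == "token":
            return self.loaded[value]   # substitute the earlier object
        cls = self.classes[value]
        obj = cls.__new__(cls)
        self.loaded.append(obj)         # register before loading its fields
        obj.load(self)
        return obj


class Node:
    def __init__(self, name):
        self.name = name
        self.links = []

    def store(self, atomizer):
        atomizer.write_primitive(self.name)
        atomizer.write_primitive(len(self.links))
        for node in self.links:
            atomizer.write_object(node)

    def load(self, atomizer):
        self.name = atomizer.read_primitive()
        count = atomizer.read_primitive()
        self.links = [atomizer.read_object() for _ in range(count)]


a, b = Node("a"), Node("b")
a.links = [b, b, a]   # b is shared, a refers to itself
b.links = [a]         # a -> b -> a is a loop

writer = Writer()
writer.write_object(a)
copy = Reader(writer.out, {"Node": Node}).read_object()
```

In the copy, the two links to b are again one object and the loops point
back into the copy rather than into the original structure.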
If the eventual reader has access to the class definition, then the
object's methods can be omitted. Otherwise, the methods must be sent as
fields, using code as the value. See Implementation for more on this
issue.
Consequences
- Since the Atomizer's interface is abstract, it is easy to add a new backend
(representation format).
- The Atomizer automatically handles sharing and circular references.
However, it does not automatically determine which fields
denote containment.
- Storing acquaintances requires a persistent naming mechanism, such as that
provided by CORBA.
- For copying, the Atomizer can simply store to a memory buffer, from which
another object reads. In this case, it might be advantageous to include
temporaries, since the communication is cheap.
- The Atomizer can be used for reflection, e.g. a field inspector or object
structure browser. Fields should be named in this case. Temporaries
should probably be preserved.
- System resources, e.g. file handles, held by the stored object may need to
be re-acquired when the object is resurrected.
- The Atomizer pattern only makes a physical copy of the object. A
shared memory mechanism must be added for actually migrating the object.
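To illustrate the first and fifth consequences, here is a sketch in which
one store method drives two backends: a text printer for the screen and a
dictionary of named fields that a field inspector could browse. The class
names TextWriter, DictWriter, Account, and Address are hypothetical, and
sharing is ignored to keep the sketch short.

```python
class TextWriter:
    """Hypothetical backend that renders fields as one line of text."""

    def __init__(self):
        self.parts = []

    def write_primitive(self, value, name):
        self.parts.append(f"{name}={value!r}")

    def write_object(self, obj, name):
        nested = TextWriter()
        obj.store(nested)
        self.parts.append(f"{name}={type(obj).__name__}({nested.text()})")

    def text(self):
        return ", ".join(self.parts)


class DictWriter:
    """Hypothetical backend that keeps named fields for inspection."""

    def __init__(self):
        self.fields = {}

    def write_primitive(self, value, name):
        self.fields[name] = value

    def write_object(self, obj, name):
        nested = DictWriter()
        obj.store(nested)
        self.fields[name] = {"class": type(obj).__name__,
                             "fields": nested.fields}


class Address:
    def __init__(self, city):
        self.city = city

    def store(self, atomizer):
        atomizer.write_primitive(self.city, "city")


class Account:
    def __init__(self, owner, balance, address):
        self.owner, self.balance, self.address = owner, balance, address

    def store(self, atomizer):
        # Written once, against the abstract interface only.
        atomizer.write_primitive(self.owner, "owner")
        atomizer.write_primitive(self.balance, "balance")
        atomizer.write_object(self.address, "address")


account = Account("ann", 10, Address("Oslo"))

screen = TextWriter()
account.store(screen)

inspector = DictWriter()
account.store(inspector)
```

Adding a further backend means writing one more class with the same two
methods; Account and Address stay unchanged.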
Implementation
- Object tokens can simply be integers, counting back over the number of
fields written since the first occurrence (relative indexing), or they can
count from the beginning of the record (absolute indexing).
- Using named fields allows partial representation of the object, e.g.
a change notification or an incremental backup. The receiver can read just
the named fields.
- The Atomizer can utilize hints about the serialized object's destination.
For example, if the object is being copied to somewhere in the same process,
local names (pointers) can be used instead of global names.
- Storing methods is difficult in C++. You can use machine code, but this
isn't portable. You can use a library name, but this requires a shared
filesystem. You can use source code, but this requires compilation on
resurrection. The best solution is probably to send intermediate-level
code, e.g. Python or Java bytecode.
- The Atomizer's methods for reading and writing can be combined.
For example, the object could simply call DoInteger on the
Atomizer with a pointer to the integer variable to be read or written.
The Atomizer can read from the variable or write to the variable, depending
on what type of Atomizer it is.
- Another way to implement reading is for all objects to support the
Atomizer's writing interface. Thus the stored data is written to the
memory object the same way the memory object writes to storage. This
integrates the Atomizer pattern with the Builder pattern from Design
Patterns. Note that copying can now be done without an intermediate
buffer at all. This technique is particularly effective with named fields.
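The combined read/write method can be sketched as follows. Python has no
pointer to an integer variable, so this sketch assumes the object passes
itself plus an attribute name instead. The names StoringAtomizer,
LoadingAtomizer, do_integer, and transfer are hypothetical.

```python
class StoringAtomizer:
    """Reads from the variable and appends its value to a buffer."""

    def __init__(self):
        self.buffer = []

    def do_integer(self, obj, attr):
        self.buffer.append(int(getattr(obj, attr)))


class LoadingAtomizer:
    """Takes the next value from a buffer and writes it to the variable."""

    def __init__(self, buffer):
        self._values = iter(buffer)

    def do_integer(self, obj, attr):
        setattr(obj, attr, next(self._values))


class Size:
    def __init__(self, width=0, height=0):
        self.width = width
        self.height = height

    def transfer(self, atomizer):
        # One method serves both directions; the kind of Atomizer decides
        # whether each field is stored or loaded.
        atomizer.do_integer(self, "width")
        atomizer.do_integer(self, "height")


storing = StoringAtomizer()
Size(640, 480).transfer(storing)

copy = Size()
copy.transfer(LoadingAtomizer(storing.buffer))
```

Because storing and loading share one method, the field order cannot drift
apart between the two directions.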
Known Uses
- ET++ uses the Atomizer pattern to copy, store, and transmit
object data to other applications. Shared libraries are
used for the object methods.
- The CORBA externalization service uses an Atomizer for object persistence.
There is no current support for storing methods, though the Java language
has been proposed for it.
The CORBA relationship service is used to automatically determine containment
vs. acquaintance.
- JavaBeans uses Java's reflective abilities to provide automatic
serialization for object data and methods. Data fields can be tagged as
temporary via the transient keyword in the class declaration.
However, there is no way to denote acquaintance except by writing a custom
serialization routine.
- XDR, used in Sun RPC, uses the Atomizer pattern to serialize arguments to
remote procedures. It uses the combined-method trick.
Thomas Minka
Last modified: Fri Sep 02 17:10:03 GMT 2005