Streams Pattern

(This material is not based on the material in PLoP'94.)

Also Known As

Pipes and Filters, Dataflow

Synopsis

Process a stream of information by feeding it through a network of independent and reusable processing units.

Context

You want to apply a series of operations to one or more streams of data. For example, filtering an image, analyzing an oscilloscope signal, summarizing a movie, or formatting simple text files.

Forces

Solution

Give data sources and data operators the same interface. Use the Decorator pattern to construct a dataflow network where operators decorate the sources of their data.

Each data stream is indexed by time, which may be multidimensional. Computation is initiated by calling the Get(time_range) method on a node of the network. If the node is a data source, this instructs it to return the block of data corresponding to the given time range. If the node is an operator, this instructs it to fetch data from its source and return the processed output corresponding to the given time range.

This is the "pull" solution. See below for the "push" solution.

Consequences

Implementation

Some of the optimizations used in the Interpreter pattern are automatic with Streams. For example, dependencies are explicitly reflected in the structure of the network. All nodes can run in parallel (though perhaps on different time slices). Dead values are never computed by lazy nodes since they are never asked for. Effect caching and partial evaluation can still be done, however. For example, when an identical block of data is input to a node, the output will also be identical. Unfortunately, such matches seem rare in applications.

Alternative solution

A different solution to the Streams problem is sometimes used, where what comes out of the network is simply determined by what goes in. This "push" solution is useful for interactive visualization. It is also used in dataflow processors.

Instead of using Decorators, each node points to the nodes which follow it in the network. Operators fetch data from their input queue, process it, and send the result off to the input queues of the nodes which need it. This approach relies on eager computation and therefore is not appropriate for some applications, such as accessing a database. For general-purpose programming, though, it is preferred since it allows loops and procedure calls, i.e. using the same sub-network in multiple places.

Known Uses


Thomas Minka
Last modified: Fri Sep 02 17:23:11 GMT 2005