previous up next index
Previous: Resource Request Routing Up: The MagicEight Virtual Machine Next: Instructions

Streams

 

Streams are a natural means of describing data which changes along any dimension. In its simplest manifestation, a stream may represent a variable which is modified over time. Each entry along a dimension represents a change in the value of the variable. A multidimensional stream is essentially a multidimensional array, possibly sparsely populated or unbounded in size along one or more dimensions.

Streams are supported as a means of specifying the data interconnection between functional tasks. They explicitly identify the data parallelisms available in the application of a task to a large set of data, and also represent (possibly elastic) delay elements for temporary storage of intermediate results.

The terms used in this paper to describe a stream (such as offset, extent, step and order) are defined in an earlier paper on Streams [WB95] A particular stream (or subsection thereof) is described using the following parameters :

dimensions
Each stream is defined for some number (possibly zero) of dimensions. There is no change in the stream along undefined dimensions.

offset
Establishes the absolute location in the stream of the lesser bound of this stream fragment along each defined dimension.

extent
This is the maximum range of the stream fragment along each defined dimension. This may be infinite.

If the stream is being written or read to a linearly addressed or accessed storage medium, then additional parameters are used. Collapsing a multidimensional array down to a sequence of values conceptually involves a set of nested, bounded, integrally-incremented array index counters. We describe the behavior of these virtual counters with two more integer parameters per dimension.

step
The step parameters give the counter increment in each dimension. There are several ``special'' values that a step parameter may have for a particular dimension. One is zero, which indicates that the same values are to be replicated along that dimension. The value of the extent along any dimension of the stream (denoted by tex2html_wrap_inline257 ) may also be specified as a step parameter, allowing the multidimensional array to be stored in a ``packed'' format.

order
This is the order of counter nesting (i.e., do we scan horizontally and then vertically, or vice versa.)

As there is no limit on the number of dimensions which may be defined in a program, it is safe to say that every stream is embedded in a higher dimensional space. When a task has one or more streams as arguments, it is applied to a particular fragment, or subsection, of the stream. If the task only operates over a lower dimensional or smaller space than the fragment, it will be applied multiple times over subspaces of the fragment. This allows synchronization and scheduling costs to be amortized over multiple task invocations.

Pipelining

There are a number of common operations (such as convolution, or motion estimation) that operate in a fragment which is changed only incrementally between task applications. Using the above access pattern definition, this neighborhood is indicated independently for each dimension, and is defined as the sample extent times the absolute value of the sample step -- if the step is non-zero -- and one otherwise. The size of the ``sample overlap'' along each dimension is equal to the sample neighborhood minus the absolute magnitude of the block step.

If the sample overlap is non-zero, the resource manager will attempt to pipeline the stream operation, thereby amortizing the cost of transferring the overlap over as many blocks as feasible. Pipelining is possible when the function can be executed on a processor that has sufficient local storage to store stream values that it will need again. In the one-dimensional case, there must be sample_extent stream values of memory. In the N-dimensional case, there must be enough memory to store the multiple of the block extent times the sample extent for each of the N-1 lower dimensions (as indicated by the block order). When it is desirable to pipeline the processing of data extents larger than the local storage provided by the processing units in a system, the stream may be automatically partitioned into multiple smaller streams in order to avoid overflowing the processor local memory, in a manner identical to that used for exploiting data parallelism.

Synchronization

Synchronization, or locking, is effectively performed on each individual value of each stream. For efficiency's sake, however, the locking is almost always implemented on larger blocks of data. The size of a synchronization data block is determined by the resource manager, using both the stream access pattern, amount of pipelining, and complexity of the input task and output task(s) of a stream.

A good example of the above would be the estimation of image motion between two streams of video. Pipelining is very effective in reducing data fetches for the ``reference'' stream. The resource manager would therefore attempt to fragment the stream for scheduling and synchronization maintaining maximum extent in either the vertical or horizontal dimension (depending on the number of available processors).

Stream Creation & Destruction

The user application is responsible for declaring a stream -- in the initial program environment or using the CREATE-STREAM memory primitive -- before posting any tasks using it. What is really created at this stage is the stream directory structure. This structure, which is incompletely replicated across all the processors accessing a stream, contains an entry for each stream fragment in local memory, and information on which processor to query for more data in a particular dimensional extreme (i.e., for stream fragments with an X dimension position lower than C, ask node D.) The resource manager is responsible for allocating the actual data objects for a stream. Sufficient memory for the stream is allocated at stream creation, but this allocation may be increased during execution if necessary. Deallocation of the stream buffers is handled automatically as for any process data object.


previous up next index
Previous: Resource Request Routing Up: The MagicEight Virtual Machine Next: Instructions

magiceight-web@media.mit.edu