previous up next index
Previous: MagicEight Program Elements Up: My Thesis Next: Eduction

Stream Mechanism

The notion of a ``stream'' of data has pervaded computing since at least the 1960s gif. The search for programming languages that effectively express the parallelism inherent in an algorithm has also driven the development of a stream mechanism [12][2][19]. The stream definition typically used is an ordered sequence of scalar values. Although streams of compound objects have sometimes been supported (in particular, a stream of streams), these streams are in general one-dimensional.

Probably the most successful example -- the UNIX operating system -- uses one-dimensional character streams at both the user and programmer interfaces. The ANSI C standard programming model derived from it defines only a stream interface for data I/O. The reason given for this decision is that, for performance reasons, most programs that use a raw I/O interface implement internally the equivalent of a stream interface (with its implied buffering)[25].

Previous stream mechanisms have commonly been utilized by a system implementation in one of two distinct manners: in many cases they are used only for purposes of algorithm specification[12][3][13]. Once the algorithm is translated for execution, the data is treated as an ordered sequence of individual data objects. In this dataflow model, ``tokens'' consisting of single instructions along with a reference to data values to which they should be applied are generated for each array element in the stream. Another approach, supporting a stream primitive data object, allows efficiencies in data storage not possible in the pure dataflow model [10][24][15][5]. The former approach provides the maximal parallelism, but incurs a high scheduling overhead and is usually proposed for execution only on fine-grained dataflow architectures. The latter approach is amenable to a coarser granularity of processing, which allows the amortization of the scheduling overhead over a number of stream data elements.

Streams have also been proposed before as a means of overcoming memory latency [14]. If a memory access pattern is predetermined, ie. not dependent on the data being processed, merely on the algorithm, then the processor performance may be made independent of the memory access latency by proper data pre-fetching. Hardware prefetching in general only supports simple (linear) access patterns, whereas software prefetching incurs additional instruction stream bandwidth [9]. In either case, the explicit identification of a function's multidimensional data access pattern provided by the stream mechanism should allow more accurate data prefetching.

In MagicEight I propose the novel mechanism of using multidimensional streams as a primitive data type supported by the execution model and the operating system. It differs from previous examples in that it defines the intensional mapping between the multidimensional array and the one-dimensional stream sequence (the access pattern) to be used at the source and destination of each stream [29]. The properties of those mappings are then used to determine implementation requirements such as minimum buffer sizes and available data parallelism.

A stream is read or written by traversing through a multidimensional array of vectors in memory in a regular manner. Figure 1 illustrates a two-dimensional array of three-dimensional vectors (which could represent a color image), and three stream generated from it using a variety of simple access patterns.

 

Fig. 1: A vector array in memory and streams generated from it

Instead of devoting a large amount of silicon area to cache memory, and attempting to predict the memory access pattern at runtime, MagicEight proposes decoupling memory accesses from data processing using the stream mechanism. Most image processing data access patterns are pre-determined (data independent), allowing the separation of data address generation and fetch from processing. This provides a processing throughput that is independent of the latency of remote memory accesses. The amount of hardware support for streams may vary from machine to machine within a system. The ``simplest'' support for streams is an explicit (software) cache prefetch/clear mechanism on a general purpose processor. More complex support includes separate address generators capable of transferring data to/from specialized processors independently [8][26].


previous up next index
Previous: MagicEight Program Elements Up: My Thesis Next: Eduction

wad@media.mit.edu