There are a number of common functions that operate on a data locality which changes only incrementally between invocations. If the function implementation has the data machine store locally those input values that will be reused, and requires only new data (not previously used) to be input, the function implementation is pipelined. Pipelining is very attractive when using specialized stream processors, where local state is relatively cheap and memory bandwidth is expensive; but it may also be used to support software optimizations such as loop unrolling.
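As a minimal sketch of this idea (hypothetical code, not the Q runtime), consider a windowed sum whose data locality shifts by one element per invocation. A naive implementation would re-read the whole window each time; a pipelined implementation keeps the overlapping values in local state and consumes only the newly arrived element:

```python
from collections import deque

class PipelinedWindowSum:
    """Hypothetical pipelined implementation of a sliding-window sum."""

    def __init__(self, extent):
        self.extent = extent                  # window size (locality extent)
        self.window = deque(maxlen=extent)    # locally stored, reused inputs
        self.total = 0

    def step(self, new_value):
        # Only the new datum is transferred in; the overlap is reused
        # from local storage rather than re-read from memory.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]      # retire the element leaving the window
        self.window.append(new_value)
        self.total += new_value
        return self.total
```

Each `step` call transfers one element instead of `extent` elements, which is the saving that pipelining amortizes.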
The size of the overlap along each dimension between function invocations is given by the absolute magnitude of the extent minus the absolute magnitude of the step. If the overlap is non-zero, the control machine will attempt to pipeline the task, thereby amortizing the cost of transferring the overlap across as many applications as possible. Pipelining is possible when the function has an implementation that supports it, and when it can be scheduled onto a data machine with sufficient local storage for the input and intermediate values that it will reuse. When partitioning a demand, the runtime system should take into account the amount of local memory available on a data machine, and partition the demand so as to allow the use of pipelined function implementations where possible.
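The overlap rule and the local-storage constraint above can be sketched as follows (hypothetical function names and a simplified memory model; the actual runtime's partitioning logic is not shown in the source):

```python
def overlaps(extents, steps):
    """Per-dimension overlap between successive invocations: |extent| - |step|."""
    return [abs(e) - abs(s) for e, s in zip(extents, steps)]

def should_pipeline(extents, steps, elem_size, local_mem_bytes):
    """Decide whether pipelining is worthwhile and feasible (simplified model)."""
    ov = overlaps(extents, steps)
    if all(o == 0 for o in ov):
        return False                    # no overlap, nothing to amortize
    # Assume the reused state is the full input footprint of one invocation;
    # it must fit in the data machine's local storage.
    footprint = elem_size
    for e in extents:
        footprint *= abs(e)
    return footprint <= local_mem_bytes
```

For example, an 8x8 extent advanced by a step of 4 along the first dimension has an overlap of 4 rows, so the task is a pipelining candidate whenever the 8x8 input tile fits locally.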
Both software and hardware pipelining are supported by Q. If a function implementation is marked as supporting pipelining, it is executed once to generate a stream fragment, rather than once for each output access pattern in the fragment. The stream fragment may span only dimensions that the function has explicitly declared (bound dimensions). It is the responsibility of the function implementation to apply itself as many times as necessary to generate all the locations specified in its output stream fragment(s).
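The contract described above can be illustrated with a sketch (hypothetical interface; `pipelined_prefix_sum` and its argument shapes are assumptions, not the Q API). The implementation is invoked once for the whole fragment and iterates the bound dimension itself, carrying intermediate state across output locations instead of being re-invoked per access pattern:

```python
def pipelined_prefix_sum(fragment_inputs, out):
    """Pipelined implementation: produces every location of its output
    stream fragment in a single invocation, reusing local state."""
    acc = 0                         # intermediate value kept in local storage
    for i, x in enumerate(fragment_inputs):
        acc += x                    # reuse the running state, not recompute it
        out[i] = acc                # generate each location along the bound dimension
```

A non-pipelined equivalent would be called once per output element and would have to recompute (or re-read) the prefix each time; the pipelined form does the iteration internally.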