
Introduction

While the ability of personal computers to acquire, process, and present video and sound is now established, the computational requirements of many media applications exceed what a single general-purpose processor can provide. My thesis is that streams are a mechanism for enabling efficient dynamic parallelization of the computational tasks typically found in media processing. I am also proposing a programming model for media processing built on this mechanism. The model is a variant of hybrid dataflow that uses multidimensional streams both as a basic data type and as the mechanism for synchronization and for obtaining parallelism. It supports machine architectures containing a heterogeneous mix of processors.
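To make the idea concrete, the sketch below shows in C one possible shape for such a stream primitive and the tasks that operate on it. The type and function names (stream_t, task_fn) and the field layout are assumptions made for this illustration, not the MagicEight interface.

    /* Illustrative sketch only: a multidimensional stream descriptor and a
     * dataflow task type.  The names and fields are assumptions for this
     * example, not the MagicEight definitions. */
    #include <stddef.h>

    typedef struct {
        size_t ndim;        /* number of dimensions (e.g. 3 for x, y, t)      */
        size_t extent[4];   /* extent of each dimension                       */
        size_t elem_size;   /* size of one scalar or vector element, in bytes */
        void  *data;        /* currently resident portion of the stream      */
    } stream_t;

    /* A task consumes partitions of its input streams and produces partitions
     * of its output streams; the runtime decides when and where it fires,
     * which is what makes the parallelism transparent to the programmer. */
    typedef void (*task_fn)(stream_t *in,  size_t n_in,
                            stream_t *out, size_t n_out);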

In order to provide higher compression, greater flexibility, and more semantic description of scene content, video is increasingly moving toward representations in which the data are segmented not into arbitrary fixed and regular patterns, but rather into objects or regions determined by scene-understanding algorithms [18][23][30][7][6]. These structured (or object-based) representations are effectively sets of objects and ``scripts'' describing how to render output images from the objects. The media being presented is generated at the receiver, not merely decoded, allowing the presentation to adapt to receiver capabilities, viewing situation, and user preferences.
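As a rough example of what such a representation might contain, a scene could be described as a set of objects plus a script that the receiver executes to render each frame. The structure below is a hypothetical sketch; none of its names or fields come from the representations cited above.

    /* Hypothetical sketch of a structured (object-based) scene description. */
    #include <stddef.h>

    typedef struct {
        int    id;
        void  *appearance;   /* coded texture / appearance data         */
        void  *shape;        /* segmentation mask or geometric model    */
        float  motion[6];    /* e.g. affine motion parameters per frame */
    } media_object;

    typedef struct {
        media_object *objects;    /* the segmented objects                  */
        size_t        n_objects;
        const char   *script;     /* instructions telling the receiver how  */
                                  /* to composite the output image          */
    } scene_description;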

Custom processors operating in parallel and using hardwired communications networks are capable of meeting the computational demands of media processing for a given algorithm/application, yet the flexibility needed to support different algorithms (and thus object-based media) is difficult to provide with these architectures. Single ``general-purpose'' processors, now often equipped with specialized instructions/datapaths for manipulating small data elements in parallel (e.g. using a 32-bit ALU to process 8-bit R,G,B pixel values simultaneously [28][20]), provide adequate flexibility and show promise of meeting the needs of the current generation of media applications. Yet algorithms and applications are being developed that require tens to thousands of times more computation and memory bandwidth than current applications. Programmable parallel architectures will remain attractive for all but the cheapest or most limited media applications, since these requirements greatly exceed the capabilities of a single processor.
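For instance, the packed-pixel averaging routine below (a standard ``SIMD within a register'' trick, written here in C as a sketch rather than taken from the cited instruction sets) processes four 8-bit components with ordinary 32-bit operations by keeping carries from crossing byte boundaries.

    #include <stdint.h>

    /* Average four packed 8-bit values (e.g. the R,G,B,A components of two
     * pixels) using plain 32-bit arithmetic.  The masking keeps each byte's
     * carry from spilling into its neighbor, so one short sequence of 32-bit
     * operations does the work of four narrow ones. */
    static inline uint32_t avg_packed_u8(uint32_t a, uint32_t b)
    {
        /* per-byte floor((a + b) / 2), carry-free */
        return (a & b) + (((a ^ b) >> 1) & 0x7F7F7F7FU);
    }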

Irrespective of the coding method, computational needs for video are likely to increase greatly in coming years. Digital video, unlike digital audio, is far from operating at human perceptual limits. As display technologies and communications bandwidth permit, higher-definition systems will add to the computational demands. Alternative output technologies, for example the holographic video displays developed at the MIT Media Laboratory [27][32], push these demands still further.

The programming model described in this thesis is an attempt to support computationally demanding media tasks in an environment in which the programmer can take advantage of parallelism and specify real-time performance without needing to know the details of the hardware architecture(s) used to execute the tasks. Thus differently scaled or architected systems should be able to execute the same application software, and specialized processors may be utilized without the explicit cooperation of the application developer. I am proposing systems that utilize whatever resources are currently available in the local area: the system attempts to dynamically execute an application on any processing nodes found idle nearby. Consider, for example, a VRML viewer on a personal computer borrowing cycles from the rendering engine of a video game in the next room, or several PCs working together to achieve real-time media encoding.

This model inherits from dataflow the attributes of programmer-transparent parallelism and resource sharing. It also allows the efficiency of the static pipeline model: processing units may be directly coupled to minimize (off-chip) memory accesses. It extends current hybrid dataflow architectures by supporting data primitives that are multidimensional streams of scalars or vectors. These are partitioned at runtime to provide an appropriate scheduling granularity, minimizing the overhead associated with each partition.
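A minimal sketch of such runtime partitioning, assuming a two-dimensional stream divided into rectangular tiles (the names and the choice of a simple tiling are assumptions made for this example), shows how granularity trades scheduling overhead against available parallelism.

    #include <stddef.h>

    /* Illustrative only: split a width x height stream extent into tiles of
     * at most tile_w x tile_h elements.  Larger tiles mean fewer scheduling
     * events (less per-partition overhead); smaller tiles expose more
     * parallelism.  A runtime might choose the tile size based on how many
     * idle processors it has discovered. */
    typedef struct { size_t x, y, w, h; } tile_t;

    size_t partition_2d(size_t width, size_t height,
                        size_t tile_w, size_t tile_h,
                        tile_t *tiles, size_t max_tiles)
    {
        size_t n = 0;
        for (size_t y = 0; y < height; y += tile_h)
            for (size_t x = 0; x < width; x += tile_w) {
                if (n == max_tiles)
                    return n;
                tiles[n].x = x;
                tiles[n].y = y;
                tiles[n].w = (x + tile_w <= width)  ? tile_w : width  - x;
                tiles[n].h = (y + tile_h <= height) ? tile_h : height - y;
                n++;
            }
        return n;
    }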

In order to evaluate this thesis, I am proposing to develop a software system for testing this programming model. This implementation, MagicEight, will utilize traditional workstations interconnected by a local area network and serve several purposes. It will allow research into the resource requirements of such a system (the amount of processing overhead induced by the stream mechanism, stream buffer sizes, etc.). It will also allow the development and testing of applications without the specialized hardware (or multiple processors) needed for real-time performance. While the software being written is intended to ultimately support specialized processors and native implementations (where MagicEight is the operating system), these are outside the scope of the proposed dissertation research.

In order to test the system appropriately, I am planning to implement a video segmentation application based on the multimodal segmentation research done within our group [11]. This application uses a large number of low-level vision operations as well as medium-level operations that track the objects and assign pixels to the corresponding objects. The algorithm currently runs sixty to one thousand times slower than real time on our fastest Alpha workstation, even on small (320x240) images, providing ample complexity for testing the MagicEight model.
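Purely to illustrate how such an application might map onto streams, the pipeline could be written as a chain of stream-to-stream stages that the runtime is free to partition and schedule. The stage names below are hypothetical and are not taken from the cited segmentation work.

    /* Hypothetical stage prototypes; each stage consumes and produces
     * multidimensional streams, so the runtime may split the work across
     * whatever processors are currently idle. */
    typedef struct stream stream;

    stream *gaussian_filter(stream *in);                     /* low-level vision */
    stream *optical_flow(stream *in);                        /* low-level vision */
    stream *track_objects(stream *appearance, stream *flow); /* medium-level     */
    stream *classify_pixels(stream *tracked);                /* pixel -> object  */

    stream *segment_sequence(stream *frames)   /* frames: an (x, y, t) stream */
    {
        stream *smoothed = gaussian_filter(frames);
        stream *flow     = optical_flow(smoothed);
        stream *tracked  = track_objects(smoothed, flow);
        return classify_pixels(tracked);
    }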

This thesis proposes the stream mechanism and the MagicEight programming model in order to solve a computational problem posed by media processing (in particular, video). They were designed to take advantage of the characteristics of media processing (the large amount of data and the data independence of typical media access patterns), and they rely on those characteristics for efficient execution. While it is hoped that the programming model performs well enough for effective general-purpose computation, it is predicted that the data parallelism provided by media processing (and similar applications) will be necessary for a significant performance gain.

Section 2 describes the problems addressed by this thesis in more detail, along with previous work on them. A more detailed description of the thesis may be found in Section 3, along with a description of the example implementation (MagicEight). Finally, Section 4 provides a rough estimate of the resources required and a proposed timeline.

