Previous: Graphics Architectures Up: Video & Graphics Processors: 1997 Next: Commentary

Structured Video Architectures

Microsoft has proposed a reference architecture for graphics and multimedia, named Talisman [29] [30], which is intended to provide a high level of performance with a minimum of memory and hardware. Talisman is based on four concepts:

The Talisman architecture is justified by its authors as supporting incremental, high-quality rendering at minimal system cost. What isn't touted is its ability to effortlessly integrate rendered and real images. The composited image layers and multi-pass rendering are typical of a structured video system: Talisman implements the processing pipeline for compressed structured video proposed by Bove et al. [32].

Escalante

Part of the Talisman architecture proposal is a reference implementation, Escalante (code-named Touchstone)[30] [29] [33], aimed at the high end of the consumer PC market. This PCI-based implementation consists of four major functional blocks (each in a separate package). One or two Rambus DRAMs (8-16 Mbits) are used, incorporated into a single functional block, to meet all system memory needs.

The Media Processor

The first functional block in the system is a programmable "Media DSP", responsible for video codecs, audio processing, and graphics front-end processing (geometry transformations and lighting). Any of the Video Signal Processors introduced above is capable of filling this position. The reference design suggests either the Samsung MSP or the Philips TriMedia processor as a suitable example.

The Polygon Object Processor

The second functional block in the system is a Polygon Object Processor (POP), responsible for rendering the transformed polygons passed to it by the Media Processor. The polygon rendering pipeline supports texture mapping (with anisotropic texture filtering), anti-aliasing, and z-buffered hidden surface removal. It renders one 32x32 pixel virtual buffer at a time; each rendered buffer is then compressed and stored in system memory for later fetching by the Image Layer Compositor. The POP is fabricated in a 0.35 µm four-level-metal CMOS process and packaged in a 304-pin QFP. A summary of the area consumed by the different processing blocks is provided in Table 7.
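The chunked rendering described above can be illustrated with a minimal sketch. It assumes chunks tile an image layer from its top-left corner on a 32-pixel grid, with partial chunks at the right and bottom edges; the helper name `chunk_origins` and the example layer size are hypothetical and not taken from the Escalante design.

```python
CHUNK = 32  # chunk edge in pixels, matching the 32x32 virtual buffer


def chunk_origins(width, height, chunk=CHUNK):
    """Return the (x, y) origin of every chunk needed to cover a layer.

    Hypothetical helper: assumes chunks are aligned to the layer's
    top-left corner and that edge chunks may be only partly covered.
    """
    cols = -(-width // chunk)   # ceiling division
    rows = -(-height // chunk)
    return [(cx * chunk, cy * chunk)
            for cy in range(rows)
            for cx in range(cols)]


# A 100x70 pixel layer needs 4 columns by 3 rows of chunks.
origins = chunk_origins(100, 70)
print(len(origins))
```

Each origin identifies one virtual buffer that would be rendered, compressed, and written to memory before the next one is started, so the renderer's working storage stays at one chunk regardless of layer size.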

In order to reduce memory costs, Talisman consolidates the different large memory buffers in the system into a single external memory subsystem. This subsystem, which uses dual Rambus channels to provide a peak memory bandwidth in excess of 10 Gb/s, is integrated into the POP. It is thus conveniently located between the two other blocks in Talisman that require external memory: the Media Processor and the Image Layer Compositor.

 

Unit                   RAM (bits)   Area (M λ²)
Rambus I/F             -            169
Memory I/F             12K          58
Clip & Scan Convert    57K          764
Decompression          16K          195
Texture Addr.          -            290
Texture Cache          71K          356
Compositor             137K         654
Compression            32K          477
Testability            -            215
Routing                -            318
I/O Pads               -            708
Total                  325K         4,200
Table 7: Escalante Polygon Object Processor Block Areas

 

The Image Layer Compositor

The Image Layer Compositor (ILC) is responsible for fetching image layers from memory, decompressing them, bilinearly filtering them (if necessary), and then outputting them in depth order to the compositor to generate a strip of the video output. It is implemented in a 0.35 µm four-level-metal 3.3V CMOS process and packaged in a 304-pin QFP. The maximum filtering and compositing throughput is 320 Mpixels/sec.
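As an illustration of the bilinear filtering step, the following is a minimal software sketch, not the ILC's actual datapath. It assumes scalar samples stored as a row-major list of lists, pixel centers at integer coordinates, and clamping at the image edges; the function name `bilinear_sample` is hypothetical.

```python
def bilinear_sample(image, x, y):
    """Sample a row-major 2D list of scalars at fractional (x, y).

    Assumptions (not from the source): pixel centers sit at integer
    coordinates and lookups outside the image clamp to the edge.
    """
    h = len(image)
    w = len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, w - 1)
    y1 = min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighboring rows, then along y.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


layer = [[0, 10],
         [20, 30]]
print(bilinear_sample(layer, 0.5, 0.5))  # 15.0, the mean of the four samples
```

A hardware implementation would apply the same weighting to each color and alpha component, typically in fixed-point arithmetic.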

 

Unit                   RAM (bits)   Area (M λ²)
Layer Prefetch         4K           220
Decompressor           25K          685
Image "Cache"          71K          600
Filtering              -            134
Composite Ctl.         -            85
Testability            -            215
Routing                -            152
I/O Pads               -            555
Total                  100K         2,640
Table 8: Escalante Image Layer Compositor Block Areas

 

To handle the latency involved in fetching and decompressing an image layer, the display list is traversed twice. The first traversal fetches objects from memory and decompresses them into a 64 Kbit image "cache" (really just a temporary storage buffer, since nothing is ever reused). The second traversal bilinearly interpolates the "cached" decompressed image data and writes the result to an alpha-blending compositor (located on another chip!). As shown in Table 8, the decompressor and staging RAM occupy the majority of the chip. The image filtering and composite control (including the second display list traversal) occupy only 8% of the silicon.
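The area shares quoted above can be checked against Table 8 with a short calculation. This is only a sketch: the grouping of blocks into "staging RAM" is an assumption, since the text does not say whether the Layer Prefetch buffer is counted.

```python
# Block areas from Table 8, in millions of lambda squared.
ILC_AREA = {
    "Layer Prefetch": 220,
    "Decompressor": 685,
    "Image Cache": 600,
    "Filtering": 134,
    "Composite Ctl.": 85,
    "Testability": 215,
    "Routing": 152,
    "I/O Pads": 555,
}
TOTAL = 2640  # total as quoted in Table 8 (the listed blocks sum to 2,646)


def share(*blocks):
    """Fraction of the quoted total area occupied by the named blocks."""
    return sum(ILC_AREA[b] for b in blocks) / TOTAL


print(f"{share('Filtering', 'Composite Ctl.'):.1%}")                    # 8.3%
print(f"{share('Decompressor', 'Image Cache'):.1%}")                    # 48.7%
print(f"{share('Decompressor', 'Image Cache', 'Layer Prefetch'):.1%}")  # 57.0%
```

Filtering plus composite control comes to about 8%, matching the text. The decompressor and image "cache" alone account for just under half of the quoted total; they exceed half only if the Layer Prefetch buffer is also counted as staging RAM.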

Compositing DAC

The final stage of the ILC, the alpha blending, is actually located on a separate die, probably due to area constraints. A video DAC is incorporated on the same die to reduce the package count. The ILC passes four 32b pixels (RGB plus alpha) at a time, in reverse depth order, to the compositing DAC. The alpha-blended compositing is performed into a double-buffered 32 scan-line buffer, using a single 8b 32 scan-line alpha buffer. Thus the compositing DAC contains some simple arithmetic logic and 1,792 bits of memory per scan-line pixel. For the Escalante target resolution (1344 pixels horizontally), the compositor requires 2.5 Mbits of buffer memory.
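The per-pixel memory figure can be reproduced with a short calculation. This sketch assumes the double-buffered color buffers hold 24 bits per pixel (8b each for R, G, and B); the text does not state this, but it is the value that yields 1,792 bits.

```python
LINES = 32        # scan lines held in each buffer
COLOR_BITS = 24   # assumed: 8b each for R, G, B (not stated in the text)
ALPHA_BITS = 8    # single alpha buffer, 8b per pixel
WIDTH = 1344      # Escalante target horizontal resolution

# Two color buffers (double buffering) plus one alpha buffer,
# counted per horizontal pixel position.
bits_per_column = 2 * LINES * COLOR_BITS + LINES * ALPHA_BITS
total_bits = bits_per_column * WIDTH

print(bits_per_column)   # 1792
print(total_bits)        # 2408448
print(round(total_bits / 1e6, 2), round(total_bits / 2**20, 2))  # 2.41 2.3
```

Under these assumptions the buffers total about 2.4 Mbits (2.3 Mbits if 1 Mbit is taken as 2^20 bits), slightly below the 2.5 Mbits quoted; the quoted figure may include rounding or storage not itemized here.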

Other Talisman Implementations

Other implementations of Talisman are expected. At the extreme low end, the entire architecture may be implemented as software running on a conventional microprocessor, taking advantage of virtual buffers to improve data cache performance. The specialized Image Layer Composition hardware may be replaced by a conventional frame buffer, into which the alpha-blended, composited strips of video data are stored.

While Video Signal Processors are explicitly included in the Escalante reference design (as the Media Processor), a complete implementation could consist solely of a VSP and memory. In fact, Talisman could be implemented on any system described in this survey. Even the Infinite Reality provides a way to feed the output of the renderer back to its input, which is critical to a layered approach. Newer VSPs that include co-processors capable of implementing parts of the Talisman architecture are especially suitable. The Chromatic Research Mpact2, for example, has an Image Co-processor analogous to the Image Layer Compositor.



wad@media.mit.edu