

Video Finger: Software Description

The software development environment used for Video Finger was the Macintosh Programmer's Workshop (MPW). The majority of the software was written in MPW C; some hardware-dependent or speed-critical parts of the drawing routines were written in 68020 assembly language. A public domain library of routines, TransSkel [DuBois89], which abstracts the interface to the Macintosh operating system, was used to simplify program development. The User Interface Toolbox of the Macintosh, along with the Operating System Utilities, provided support for windowing, mouse actions, menus, and user dialogs. This allowed the program to support these features easily while maintaining a user interface consistent with other Macintosh applications.

Basic Software Overview

The Video Finger software system is diagrammed in Fig. 5.1. The Task Dispatcher executes any tasks that are being performed, then calls each object to redraw itself. One of the tasks being performed is the User Task, which polls the state of the computer environment via the local area network, modifying the generated movie to match. Each type of object has a software routine (an object handler) associated with it that is called to manipulate objects of that type. A common nucleus of drawing and CLUT management routines is provided for use by all the objects. Parameters of the Video Finger display, such as which background image is being used, or the update rate of the display, are changeable from user menus.

  
Figure 5.1: Video Finger Software Overview

Object Handlers

Every instance of a given type of object executes the same object handler routine. The handler routine is called with a pointer to the data structure defining the object, a command, and a pointer to optional arguments. All object handlers recognize a small set of basic commands: open, close, and draw. The open command opens an object, loading in necessary view and task information from local storage if needed. The close command closes the object, freeing the associated memory, and the draw command calls the proper drawing routines to draw the current object view into the frame buffer.

Optional commands supported by most object handlers are: position, view, and request-task. The position command allows objects to be positioned either absolutely or relatively. The view command requests a new view of the object, specified using some characteristic of the view, at a given scale. The request-task command queries an object to obtain the description of a particular task for that object.
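
As an illustration, such a handler can be written as a single dispatch routine. The following C sketch shows one possible shape; the command names, types, and stubbed actions are assumptions for illustration, not the original Video Finger source.

    /* A minimal sketch of an object handler's command dispatch.
       All names and types here are hypothetical. */
    typedef enum {
        cmdOpen, cmdClose, cmdDraw,             /* basic commands    */
        cmdPosition, cmdView, cmdRequestTask    /* optional commands */
    } ObjCommand;

    typedef struct Object Object;   /* per-instance object data */

    typedef int (*ObjHandler)(Object *obj, ObjCommand cmd, void *args);

    /* Example handler shared by every instance of one object type. */
    int FigureHandler(Object *obj, ObjCommand cmd, void *args)
    {
        (void)obj; (void)args;
        switch (cmd) {
        case cmdOpen:
            /* load view and task information from local storage */
            return 0;
        case cmdClose:
            /* free all memory associated with the object */
            return 0;
        case cmdDraw:
            /* draw the current view into the frame buffer */
            return 0;
        case cmdPosition:
            /* args: absolute or relative position request */
            return 0;
        case cmdView:
            /* args: view characteristic and desired scale */
            return 0;
        case cmdRequestTask:
            /* args: task identifier; returns a BTL task description */
            return 0;
        }
        return -1;      /* unrecognized command */
    }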

The number of object types is quite small. Only one type is used to represent all the human figures in Video Finger. The other types include background objects, prop objects, and test objects.

Task Dispatcher

The task dispatcher is called whenever there are no system events that require handling. (These events are generated by user input or by operating system actions, such as window refreshes.) The dispatcher first traverses a linked list of the tasks currently being executed, executing each one in turn. Most of these tasks are interpreted by the Basic Task Language (BTL) interpreter. Next, it traverses an ordered linked list of the objects in the scene, calling each object with the draw command. The list is ordered such that objects farther back in the scene are drawn first and overlaid by later (i.e., closer) objects. After all the objects have finished drawing, the task dispatcher sets a semaphore signaling that drawing has finished on the new frame.
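
A sketch of this dispatch loop in C follows; the list structures, routine names, and semaphore variable are illustrative assumptions.

    /* Sketch of the dispatcher loop; all names are hypothetical. */
    typedef struct Task {
        struct Task *next;          /* linked list of active tasks */
        /* ... BTL interpreter state ... */
    } Task;

    typedef struct SceneObject {
        struct SceneObject *next;   /* ordered back-to-front by depth */
        /* ... object instance data ... */
    } SceneObject;

    static Task        *taskList;
    static SceneObject *objectList;
    volatile int frameDone;         /* semaphore checked by the VBL routine */

    static void RunTask(Task *t)           { (void)t; /* run BTL until btlFrame  */ }
    static void DrawObject(SceneObject *o) { (void)o; /* handler: draw command   */ }

    void TaskDispatcher(void)
    {
        Task        *t;
        SceneObject *o;

        for (t = taskList; t != NULL; t = t->next)
            RunTask(t);             /* advance each task one frame */

        for (o = objectList; o != NULL; o = o->next)
            DrawObject(o);          /* farthest objects first, closest last */

        frameDone = 1;              /* signal: drawing finished on new frame */
    }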

Drawing and Color Routines

Common drawing routines provide flicker-free generation of the Video Finger window. The window is maintained by the Macintosh Window Manager, which interfaces with the drawing routines. The common drawing routines transparently provide double buffering, as well as decoding of the transparency run-length encoded images as they are drawn. Since the IranScan video card architecture includes a second frame buffer that may be repositioned without affecting the first, double buffering is done in the actual frame memory.

The color map is partitioned and maintained by the color manager. Whenever an object is opened, its object handler requests the needed number of slots from the color manager, which allocates and reserves the requested slots and returns the index of the first slot allocated. The object is then responsible for calling the color manager to load the allocated slots when required. If more than one CLUT is used by the object's views, the object handler is responsible for calling the color manager to update the color map as needed. The small size of the typical object CLUT allows sixteen objects to share the IranScan color map. Since the amount of local memory available usually limits the number of objects displayed to fewer than sixteen, the color manager does not currently degrade gracefully upon running out of free slots.
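
A first-fit allocator along the following lines would exhibit the behavior described; the routine name and the assumption of a 256-entry hardware color map are illustrative.

    /* Sketch of CLUT slot allocation, assuming a 256-entry color map. */
    #define CLUT_SIZE 256
    static unsigned char slotUsed[CLUT_SIZE];

    /* Reserve `count` contiguous slots; return the index of the first,
       or -1 when no free run remains (no graceful degradation). */
    int AllocSlots(int count)
    {
        int start, i;
        for (start = 0; start + count <= CLUT_SIZE; start++) {
            for (i = 0; i < count; i++)
                if (slotUsed[start + i])
                    break;
            if (i == count) {                   /* found a free run */
                for (i = 0; i < count; i++)
                    slotUsed[start + i] = 1;
                return start;
            }
        }
        return -1;
    }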

Routines in the VBL Queue for a video card are called by the operating system upon receiving a hardware interrupt from the card signaling a vertical blanking period. One such routine is installed in the VBL Queue for the IranScan card to refresh the display buffers. When called, this routine checks whether the semaphore indicating the availability of a new frame has been set by the Task Dispatcher. If it has, the routine repositions the display base address of the ScanRam to display the drawing buffer and hide the previous display buffer. The number of video frames between calls to the VBL routine may be controlled from one of the user menus, where it functions as a Fast Play/Slow Play control.
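
In outline, the refresh routine reduces to a semaphore test and a buffer swap, as in this sketch; the display-base routine and buffer bookkeeping are hypothetical.

    /* Sketch of the VBL-queue refresh routine. The display-base access
       is hypothetical; the semaphore is the one set by the dispatcher. */
    extern volatile int frameDone;
    static long bufferBase[2];      /* addresses of the two frame buffers  */
    static int  visible;            /* which buffer is currently displayed */

    static void SetDisplayBase(long addr) { (void)addr; /* card register */ }

    void VBLRefresh(void)
    {
        if (frameDone) {
            visible = 1 - visible;               /* swap buffers           */
            SetDisplayBase(bufferBase[visible]); /* show the finished frame */
            frameDone = 0;                       /* drawing may begin again */
        }
    }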

The internal representation of the object views is significant for several reasons: both the drawing time and the memory required for a view depend upon the representation used, as does the time required to manipulate the view.

The statistics of the image view data (around 50% transparent pixels) imply that run-length encoding the data would provide a significant gain. Due to the large number of colors allowed each object, however, the non-transparent parts of the image were not amenable to run-length encoding. The final compromise was to run-length encode only the transparent portions of the image, allowing the drawing code to draw any length of transparency in a very small, constant time. The non-transparent portions are simply packed into pixels, adjusted for the actual location of the view's CLUT, and transferred to frame memory. The data was actually stored in a non-encoded format on disk and encoded upon being read into local memory, which allowed easy testing of different encoding techniques.
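
The exact encoding is not detailed here, so the following C sketch assumes a simple format of alternating (skip, copy) byte counts; it illustrates how a transparent run of any length is consumed in constant time. The CLUT-base adjustment of the pixel values is omitted for clarity.

    #include <string.h>

    /* Draw one scan line of a transparency run-length encoded view.
       Assumed encoding: pairs of (skip, copy) counts, with `copy`
       literal pixel bytes following each pair. */
    void DrawRLELine(const unsigned char *src, unsigned char *dst, int width)
    {
        int x = 0;
        while (x < width) {
            int skip = *src++;           /* transparent run: constant time */
            int copy = *src++;           /* length of opaque run           */
            x += skip;                   /* leave destination untouched    */
            memcpy(dst + x, src, copy);  /* blit the opaque pixels         */
            src += copy;
            x += copy;
        }
    }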

Basic Task Language Interpreter

The basic unit of object motion, the ``task'', is defined using a very simple interpreted language, the Basic Task Language (BTL). The task description is obtained by calling the appropriate object handler and is then interpreted using the BTL Interpreter. A command (btlFrame) is provided to mark discrete frame boundaries. Each frame, the BTL Interpreter sequentially executes the commands in the task description until a frame marker is reached.

The Basic Task Language consists of a stream of commands with optional arguments. To facilitate interpretation, commands are of fixed size (4 bytes), as are the optional arguments. A list of the BTL commands, with a brief description of each, is provided in Table 5.1.

The BTL commands are the minimum required to describe an object and its motion in the 2 1/2 D world of Video Finger. The btlDepth command adjusts not only the depth but also the perceived scale of the object. The btlSignal command provides a simple way of synchronizing the end of one task with the start of another. When a task is executed, a pointer to a signal handling routine is provided to the BTL Interpreter. If the task contains a btlSignal command, the signal handler is called with the command's argument. In addition, the end of a task is automatically signaled to the signal handling routine by the interpreter.

btlFrame      Signals the end of the commands for the current frame.
btlView       Indicates which object view should be displayed. Requires one
              argument: the resource number of the object view.
btlPosition   Indicates a relative movement of the object. Requires two
              arguments: the horizontal and vertical offsets.
btlDepth      Indicates a relative depth movement. Requires one argument:
              the depth offset.
btlSignal     Provides a mechanism for communicating between tasks. When a
              task is executed, a signal handler is specified; this handler
              is passed the value of the command's single argument.

Table 5.1: Basic Task Language Commands
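
Given the fixed-size commands and arguments, the per-frame interpretation loop can be quite small. The following C sketch assumes 32-bit opcodes and arguments packed in a single stream, hypothetical helper routines, and an assumed end-of-task signal value; it is an illustration, not the original interpreter.

    #include <stdint.h>

    enum { btlFrame, btlView, btlPosition, btlDepth, btlSignal };

    typedef struct Object Object;
    typedef void (*SignalProc)(int32_t value);

    static void SetView(Object *o, int32_t v)             { (void)o; (void)v; }
    static void MoveBy(Object *o, int32_t dx, int32_t dy) { (void)o; (void)dx; (void)dy; }
    static void DepthBy(Object *o, int32_t dz)            { (void)o; (void)dz; }

    /* Execute one frame's worth of commands; return a pointer to the
       next frame's commands, or NULL at the end of the task (which is
       also signalled, here with an assumed sentinel value of -1). */
    const int32_t *RunFrame(const int32_t *pc, const int32_t *end,
                            Object *obj, SignalProc signal)
    {
        while (pc < end) {
            switch (*pc++) {
            case btlFrame:    return pc;                  /* frame boundary */
            case btlView:     SetView(obj, *pc++);             break;
            case btlPosition: MoveBy(obj, pc[0], pc[1]); pc += 2; break;
            case btlDepth:    DepthBy(obj, *pc++);             break;
            case btlSignal:   signal(*pc++);                   break;
            }
        }
        signal(-1);     /* end of task is signalled automatically */
        return NULL;
    }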

View Scaling

One of the restrictions imposed by using object views as the object representation is that each view has a particular scale. One of the more useful object manipulations that can be implemented is therefore the scaling of these object views.

Each view in the object descriptions has a scale specified in an accompanying header. When the object view is being drawn, the object handler compares the scale with which the object is currently being drawn with the scale of the view to be drawn. If they are different, an interpolated view is generated and used for display.

A major obstacle to interpolating the image views is their internal representation: the views are stored as transparency run-length encoded, color quantized images. The view image data being interpolated is first reconstructed to RGB values using the image's CLUT. After interpolation, each resulting RGB value is mapped back into the image's color table using an inverse color lookup table.

The RGB value being mapped is hashed by taking the most significant 5 bits from each channel and using them as an index into a 32 KByte inverse color table. The number of bits per channel used in the hashing is important in obtaining acceptable image quality: anything smaller than 5 bits/channel appears very quantized, yet larger inverse tables (6 or 7 bits/channel) consume too much memory (256 KBytes or 2 MBytes, respectively).
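
The hashing step itself reduces to a few shifts and a table lookup, as in this sketch:

    /* Sketch of the 15-bit inverse color table lookup: the top 5 bits
       of each 8-bit channel form an index into a 32768-entry table of
       CLUT indices (32 KBytes at one byte per entry). */
    static unsigned char invTable[1 << 15];

    unsigned char MapRGB(unsigned char r, unsigned char g, unsigned char b)
    {
        unsigned index = ((unsigned)(r >> 3) << 10)
                       | ((unsigned)(g >> 3) << 5)
                       |  (unsigned)(b >> 3);
        return invTable[index];
    }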

The inverse color table is generated using a full-search algorithm that finds, among the colors in the object view CLUT, the most accurate rendition of a particular region of color space. The inverse tables are currently stored as part of the object description to save time when initializing the object; alternatively, they could be generated from the object's color lookup table.
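
A full-search generation routine might look like the following sketch, which reuses the invTable array above and assumes squared Euclidean RGB distance as the error measure.

    /* Build the inverse table: for each cell of the quantized RGB cube,
       full-search the CLUT for the color nearest the cell center. */
    void BuildInvTable(const unsigned char clut[][3], int nColors)
    {
        int r, g, b, i;
        for (r = 0; r < 32; r++)
        for (g = 0; g < 32; g++)
        for (b = 0; b < 32; b++) {
            int  cr = (r << 3) | 4;     /* center of the 8-wide bin */
            int  cg = (g << 3) | 4;
            int  cb = (b << 3) | 4;
            long best = 0x7FFFFFFFL;
            int  bestIdx = 0;
            for (i = 0; i < nColors; i++) {
                long dr = cr - clut[i][0];
                long dg = cg - clut[i][1];
                long db = cb - clut[i][2];
                long d  = dr*dr + dg*dg + db*db;
                if (d < best) { best = d; bestIdx = i; }
            }
            invTable[(r << 10) | (g << 5) | b] = (unsigned char)bestIdx;
        }
    }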

The interpolation algorithm used allows a rapid scaling of the image by an arbitrary quantity. A depiction of the interpolation algorithm is found in Fig. 5.2. A set of intermediate values is first calculated, using a linear interpolation between two adjacent scan lines of the original image:

M_i = (1 - A) O_{j,i} + A O_{j+1,i}

In these equations, M is the array of intermediate values, O is the array of original pixels, and N is the output array of interpolated pixels; j and j+1 index the two adjacent original scan lines.

Figure 5.2: Pixel Interpolation

The final output is then calculated by linearly interpolating horizontally between the two closest intermediate values:

N_k = (1 - C) M_i + C M_{i+1}

A and C are computed incrementally, by accumulating the vertical and horizontal spacing, respectively, of the new sampling grid N in the original image O. Only the two intermediate values required for calculating the current pixel are stored. All calculations are done in fixed point math, using 16 bits of integer and 16 bits of fraction.
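
The following C sketch shows the 16.16 fixed-point inner loop for one output scan line of one color component. It is a simplified illustration: it recomputes the two intermediate values for every output pixel, and it assumes the input rows are padded by one pixel at the right edge.

    #include <stdint.h>

    #define FIX_ONE  (1L << 16)             /* 16.16 fixed point */
    #define FRAC(x)  ((x) & (FIX_ONE - 1))

    /* Interpolate one output line from two adjacent input rows.
       A is the vertical fraction for this line; xStep is the
       horizontal input step per output pixel (1/scale). */
    void ScaleLine(const uint8_t *row0, const uint8_t *row1,
                   uint8_t *out, int outWidth, int32_t A, int32_t xStep)
    {
        int32_t x = 0;
        int i;
        for (i = 0; i < outWidth; i++, x += xStep) {
            int     xi = (int)(x >> 16);
            int32_t C  = FRAC(x);
            /* vertical interpolation: the two intermediate values M */
            int32_t m0 = row0[xi]     * (FIX_ONE - A) + row1[xi]     * A;
            int32_t m1 = row0[xi + 1] * (FIX_ONE - A) + row1[xi + 1] * A;
            /* horizontal interpolation between the intermediates */
            out[i] = (uint8_t)(((m0 >> 16) * (FIX_ONE - C)
                              + (m1 >> 16) * C) >> 16);
        }
    }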

A special interpolation routine decodes the run-length encoding of the image's transparent portions while interpolating the view; the output it produces is itself transparency run-length encoded. The interpolation is performed independently on all three color components of the image: R, G, and B. A typical interpolation requires 12 multiplies and 14 additions per interpolated pixel. If the scaling factor is less than 0.5, only 9 multiplies and 11 additions per interpolated pixel are required. Unfortunately, even with this simple algorithm, the time required to interpolate a normal-sized object view on a Macintosh IIx is around 0.8 sec.

View Caching

Since the interpolation algorithm is processor intensive, the interpolated views are cached. Every time an object handler decides that interpolation is necessary, it first checks whether the desired view (at the desired scale) is present in the View Cache. If not, it interpolates the view and then stores it in the cache.

The View Cache is organized as a fully associative cache. It is typically 1 MByte in size, divided into 8 KByte pages; both values were determined empirically. The cache replacement algorithm is Least Recently Used (LRU), implemented by maintaining, for each cache entry, a count of cache fetches since its last use. When more pages are needed in the cache, the entry with the largest count is removed.
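
The bookkeeping can be sketched as follows, simplified to one entry per cached view rather than per 8 KByte page; the field names are assumptions.

    /* Sketch of LRU-by-fetch-count bookkeeping for the View Cache. */
    #define CACHE_ENTRIES 128           /* 1 MByte / 8 KByte pages */

    typedef struct {
        long viewKey;       /* view number and scale, packed */
        long sinceUse;      /* cache fetches since last use  */
        int  valid;
    } CacheEntry;

    static CacheEntry cache[CACHE_ENTRIES];

    /* Return the index holding viewKey, or -1 on a miss; every fetch
       ages all other entries, implementing LRU by count. */
    int CacheLookup(long viewKey)
    {
        int i, hit = -1;
        for (i = 0; i < CACHE_ENTRIES; i++) {
            if (!cache[i].valid)
                continue;
            cache[i].sinceUse++;
            if (cache[i].viewKey == viewKey)
                hit = i;
        }
        if (hit >= 0)
            cache[hit].sinceUse = 0;
        return hit;
    }

    /* Replacement victim: a free entry if any, else the largest count. */
    int CacheVictim(void)
    {
        int i, victim = 0;
        for (i = 0; i < CACHE_ENTRIES; i++) {
            if (!cache[i].valid)
                return i;
            if (cache[i].sinceUse > cache[victim].sinceUse)
                victim = i;
        }
        return victim;
    }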

Without the view cache, the time required for interpolation limits the usefulness of view scaling. To fully exploit the view cache, however, a pre-fetching routine should be added. This routine would anticipate the use of an object view that required scaling and use idle processor cycles to interpolate the view and store it in the View Cache before it is needed. Such a routine could exploit the ``task'' structure of motion representation in Video Finger to anticipate future need.

Network Interface

The network interface software allows Video Finger to communicate with other machines sharing a common Local Area Network. The protocol chosen was the Transmission Control Protocol/Internet Protocol (TCP/IP), due to its prevalent use in the Media Laboratory's computing environment. In addition, the UNIX finger utility is already a supported service of the TCP interface software on many machines. The actual software package used was Kinetics TCPort for MPW C v3.0, and the physical network interface was thin-line Ethernet.

Upon initialization, Video Finger starts the User Task, which is responsible for maintaining the User State Tables. These tables record the remote state of every user being monitored. The State Table entry for each user contains a login flag and a state variable whose valid values represent the current actions of the user, such as editing, reading news, compiling, or idle. The User Task polls the finger server on remote machines for the information necessary to maintain the User State Tables. The rate of this polling is limited by the response time of the remote host; most hosts require fifteen to thirty seconds to generate a reply.
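
A State Table entry might be declared as in the following sketch; the field names and the particular set of state values are illustrative.

    /* Sketch of a User State Table entry; all names are hypothetical. */
    typedef enum {
        stateIdle, stateEditing, stateReadingNews, stateCompiling
    } UserState;

    typedef struct {
        char      name[32];     /* login name being monitored     */
        char      host[64];     /* remote machine running fingerd */
        int       loggedIn;     /* login flag                     */
        UserState state;        /* parsed from the finger reply   */
        long      lastPoll;     /* time of the last finger query  */
    } UserEntry;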


