Person Tracking



This is a small collection of screen shots of my feature tracking software written for the ALIVE project. ALIVE (an acronym for Artificial Life Interactive Video Environment) is an effort to allow realistic simulation of virtual environments without the burdensome goggles and datagloves customarily associated with VR systems. It uses a passive camera tracking system mounted above a projection screen to isolate the image of the user from the background room (a process known as background subtraction) and locate the user's head, hands, and feet for interaction with the environment through natural gestures. Autonomous agents inhabit the virtual environment and make behavioral decisions based on inputs they receive from other agents, the environment itself, and the user. The image of the user, composited with the virtual environment, is flipped horizontally and projected onto the screen, creating a "magic mirror" effect.
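The core of background subtraction is simple: compare each incoming frame against a stored image of the empty room and keep the pixels that differ by more than a threshold. The actual segmentation used here (and later in pfinder) is statistical and far more robust; the minimal per-pixel sketch below, with hypothetical names, only illustrates the basic idea:

```python
import numpy as np

def background_subtract(frame, background, threshold=30):
    """Return a binary mask of the pixels that differ from the
    stored background image by more than `threshold` (per-pixel
    absolute difference on grayscale images). Illustrative only;
    the real system models the background statistically."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

# Toy usage: a 4x4 "room" with a bright 2x2 "person" in the middle.
background = np.full((4, 4), 10, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200          # the "person" pixels
mask = background_subtract(frame, background)
```

The mask that comes out of this step is the silhouette bitmap that the rest of the pipeline works from.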

Trevor Darrell's previous system was implemented using a special-purpose vision box by Cognex, which did background subtraction and hand tracking by direct manipulation of the bitmap. Composition of the virtual environment and the real-world room was achieved by chroma keying, which necessarily kept the user in front of the computer graphics. Because of the nature of the ALIVE system, complex heuristics had to be hand-written for the vision system to understand the different and unusual positions that people might assume when interacting with the agents. Examples include bending over to the side and squatting down; in both cases the hands are not where the system would generally expect.

This system, built by Michael P. Johnson, Thad Starner, and others, uses a digitizer on a Silicon Graphics Indigo² to reimplement the background subtraction algorithm in software. This allows not only portability but also flexibility in the dependent algorithms. My software converts the bitmap resulting from the background subtraction into a polygon which can be rendered into the 3-D virtual environment, allowing proper occlusion of objects by the person and vice versa. As a side effect of this conversion, extremities are essentially found automatically, reducing hand tracking to a classification problem. This system is not only more general than the previous one, but also drastically reduces the reliance on heuristics.
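One way to see why extremities "fall out" of the silhouette conversion: boundary points that lie farthest from the body's centroid are, in most poses, exactly the head, hands, and feet. The sketch below (hypothetical names and parameters, not the original SGI code) finds such candidate points from a binary silhouette mask:

```python
import numpy as np

def extremity_candidates(mask, k=5, min_separation=2.0):
    """Find up to k silhouette boundary points that lie farthest
    from the silhouette centroid -- a crude stand-in for the
    head/hand/foot candidates. All names here are illustrative."""
    ys, xs = np.nonzero(mask)
    centroid = np.array([ys.mean(), xs.mean()])
    # Boundary pixels: foreground pixels with at least one
    # 4-connected background neighbor (padding avoids edge checks).
    padded = np.pad(mask, 1)
    boundary = []
    for y, x in zip(ys, xs):
        neighbors = [padded[y, x + 1], padded[y + 2, x + 1],
                     padded[y + 1, x], padded[y + 1, x + 2]]
        if min(neighbors) == 0:
            boundary.append((y, x))
    # Rank boundary points by distance from the centroid, greedily
    # keeping points that are far from the ones already chosen.
    boundary.sort(key=lambda p: -np.hypot(p[0] - centroid[0],
                                          p[1] - centroid[1]))
    chosen = []
    for p in boundary:
        if all(np.hypot(p[0] - q[0], p[1] - q[1]) >= min_separation
               for q in chosen):
            chosen.append(p)
        if len(chosen) == k:
            break
    return chosen
```

On a plus-shaped silhouette, for example, the four arm tips are returned as the candidate extremities; what remains is classifying which candidate is which body part.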

The above images illustrate this technique: the white balls at the top left and lower right of the person's silhouette indicate the bounding box, and the colored balls mark the head, hands, and feet. Note that the same-colored ball is always attached to the same extremity. The brightness of each ball indicates the confidence measure at that point. With this measure, many positions that would be untrackable using heuristics can be tracked successfully.
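Keeping the same-colored ball on the same extremity across frames is a matching problem: new candidate points must be assigned to the labels from the previous frame. A greedy nearest-neighbor assignment, with a confidence that decays as an extremity moves farther from its previous position, sketches the idea; the actual confidence measure is not described above, so treat `max_jump` and the linear falloff as illustrative assumptions:

```python
import math

def assign_labels(prev_positions, candidates, max_jump=50.0):
    """Greedily match new extremity candidates to the labeled
    positions from the previous frame, nearest pairs first.
    Returns {label: (position, confidence)}, where confidence
    falls off linearly with the distance moved (an assumed,
    illustrative measure, not the original system's)."""
    pairs = sorted(
        ((math.dist(prev_positions[label], c), label, c)
         for label in prev_positions for c in candidates),
        key=lambda t: t[0])
    assigned, used = {}, set()
    for d, label, c in pairs:
        if label in assigned or c in used:
            continue
        assigned[label] = (c, max(0.0, 1.0 - d / max_jump))
        used.add(c)
    return assigned
```

An extremity that jumps implausibly far between frames receives a low confidence, which is exactly the information the dimmer balls in the screenshots convey.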

And no, I will not tell you how many attempts it took to get each of these pictures. :-) (Very few, I promise.)

Note that this project was subsumed several years ago into pfinder, the primary architects of which were Christopher Wren and Ali Azarbayejani.


Kenneth Russell - kbrussel@media.mit.edu