3D Modeling and Tracking of Human Lip Motions

Sumit Basu, Nuria Oliver, and Alex Pentland

The lips are a critical factor in spoken communication and expression. Accurately tracking and synthesizing their motions from arbitrary head poses is essential for analyzing, understanding, and animating human faces. Our approach is to build and train 3D models of lip motion to compensate for the information we cannot always observe while tracking. We use physical models as a prior and combine them with statistical models, showing how the two can be smoothly and naturally integrated into a synthesis method and a MAP estimation framework for tracking. Because the resulting description has only a small number of parameters, it is well suited to coding as well.
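To make the MAP formulation concrete, here is a minimal sketch of the estimation step under a simplifying linear-Gaussian assumption. This is an illustration only, not our actual system: the names (H, Sigma, R) and all dimensions are hypothetical placeholders. With a Gaussian prior on the model parameters and a linearized observation model, the MAP estimate reduces to regularized least squares:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical sizes: k lip-model parameters, m image measurements.
    k, m = 5, 40

    # Statistical prior on the parameters x, assumed Gaussian N(mu, Sigma)
    # (e.g., learned from training data on lip motions).
    mu = np.zeros(k)
    Sigma = np.diag(np.linspace(1.0, 0.1, k))   # placeholder covariance

    # Linearized observation model: y ~ N(H x, R).
    H = rng.standard_normal((m, k))             # placeholder Jacobian
    R = 0.01 * np.eye(m)                        # measurement noise
    y = H @ mu + 0.1 * rng.standard_normal(m)   # synthetic observation

    # MAP estimate: argmax_x p(y | x) p(x).  For Gaussians this is
    #   x* = (H' R^-1 H + Sigma^-1)^-1 (H' R^-1 y + Sigma^-1 mu)
    Ri = np.linalg.inv(R)
    Si = np.linalg.inv(Sigma)
    x_map = np.linalg.solve(H.T @ Ri @ H + Si, H.T @ Ri @ y + Si @ mu)
    print("MAP parameter estimate:", x_map)

In the real tracker, the prior plays the role of the trained physical/statistical lip model and the observation term comes from image measurements; the same balance between prior knowledge and observed data is what lets the system fill in motion that cannot be directly seen.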

This page contains three audio-visual sequences demonstrating the performance of our system. A new paper describing the details of our latest techniques is in press; for now, you can view the tracking by clicking on the QuickTime movies below. WARNING: the audio and video tracks are properly synced, but your QuickTime player may disregard this (the SGI Indy, for example, does not bother to keep sync). To really see the motions in detail, I suggest viewing the sequences first at normal speed and then at half speed or slower to catch the subtleties. For further information on this latest work, please contact me directly at sbasu@media.mit.edu. (5/21/98)

[QuickTime movies: three frontal tracking sequences]
