Aaron Bobick
Associate Professor of Computational Vision
MIT Media Laboratory
Contact Information:
20 Ames St.
Cambridge, MA 02139
Tel : 617-253-8307
Fax: 617-253-8874
Internet: bobick@media.mit.edu

Research interests

For the last several years my work has focused on the machine perception of action. Lately I've taken to thinking of the different motion understanding problems as a taxonomy consisting of movement, activity, and action. There is a page (or pages) with my research interests divided up into areas, or you can get there through the topics below:

  • Movement is the most primitive form of motion that can be interpreted semantically. Movements are typically atomic, with no distinct parts, so recognizing them requires only a description of the appearance of the motion. Time can typically be handled by linear scaling alone. Sitting down or swinging a bat are good examples.

  • Activity is a sequence of movements or of static configurations. Activity recognition typically requires a statistical model of the sequences that captures not only configurational variation (how the constituent states or movements may vary in appearance) but also temporal variation. Time is manipulated by some form of dynamic warping. The gesture literature exploiting HMMs is a good example. Still, only the object or objects in motion need to be considered, not the surrounding context.

  • Actions are the high-level entities that people typically use to describe what is happening. Two examples from my own work are "The chef is mixing the ingredients" and "The NE Patriots just ran a p51-curl play." (You really have to go to the research interests page to see this stuff and pointers to the demos!) Representing and recognizing actions requires qualitative descriptions of time, such as intervals. Context becomes fundamental, though how much one has to reason about context is unclear. Currently we are focusing on actions that can be recognized by what they look like, as opposed to needing to reason about such unsavory things as intentionality and the like. Not that I don't find that incredibly interesting. I just don't know how to do it well enough to be useful. Yet.
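As a concrete (if toy) illustration of the dynamic warping used at the activity level, here is a minimal dynamic time warping distance between two 1-D feature sequences. The function name and sequences are illustrative only; this is a hand-rolled sketch, not the HMM formulation from the gesture literature.

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Activities unfold at varying speeds, so an observed sequence is
    aligned to a template by warping time rather than by linearly
    scaling it (toy sketch, not the HMM machinery used in practice).
    """
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # dwell on b[j-1]
                                 cost[i][j - 1],      # dwell on a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

Note how the recurrence lets either sequence "dwell" on a sample, which is exactly the temporal variation a fixed linear rescaling cannot absorb.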
  • I'd also like to point here to two major applications we've constructed during the past two years.

  • The first is the KidsRoom, perhaps the world's first interactive, narrative play space for children. It was really wonderful, and I wish more remained of it than the web site (which is huge and comprehensive - 120MB including music, movies, story, technology, everything). There is also a description of it on my research page, which includes papers.

  • The second is a play, It/I. The play was written, directed, scripted, implemented, etc., by my student Claudio Pinhanez. The play had two characters, one ("I") played by an actor, the other ("It") controlled by a computer. The computer could display images and graphics on two large projection screens, control sound effects and music, and control the lights. There were two parts that were cool. The first is that we needed to hack up some clever stereo-based background subtraction to be able to operate in the varying light conditions of a stage. But of much greater consequence is that the entire performance ran automatically, with the computer watching the actor and responding appropriately. The novel technology employed is that of interval scripts, a scripting method based on the work Claudio and I did on representing temporal constraints for action recognition.
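For flavor, plain background subtraction can be sketched in a few lines. This is a toy single-camera version with illustrative names; the It/I system needed a stereo-based variant precisely because changing stage lights break simple intensity differencing like this.

```python
def foreground_mask(frame, background, threshold=25):
    """Mark a pixel as foreground when its intensity differs from the
    stored background image by more than `threshold`.

    Toy single-camera sketch: stage lighting shifts the whole
    background's intensity at once, which is why It/I used a
    stereo-based variant rather than this differencing scheme.
    """
    return [[abs(p - b) > threshold
             for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]
```

The frames here are just nested lists of grayscale intensities; the resulting Boolean mask marks where the actor (or anything else new) occludes the known background.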
    Publications

    There are several ways to get my publications. You can:

    Or, finally, there is the boring way for those who need it: all the publications from a Word->HTML version of the official MIT CV as of Jan 1999. For those who want a prettier rendition, get the PostScript version.


    Students and Affiliates

    Current Media Lab Graduate Students:

    Jim Davis
    Lee Campbell
    Stephen Intille
    Yuri Ivanov
    Claudio Pinhanez
    Martin Szummer
    Andy Wilson

    My personal Web page is here.


    Vision and Modeling is the research group in which my lab is located. Check out the Vismod Home page for links to all TRs, demos, and other goodies.