Semiautomatic scene modeling from three views with partially known structure
My doctoral work thus far has been to extract view-invariant information about a static scene from one or more views, given preselected sets of parallel and coplanar edges. View-invariant scene information includes the 3-D geometry of objects in the scene, the surface texture/reflectance of those objects, the scene lighting conditions, and a description of the cameras used to create those views.
Given a sequence of partially overlapping photos of a static scene, we determine 3-D scene structure and camera parameters. Unlike pure-rotation approaches to scene mapping, e.g. Apple's QuickTime VR, which require known focal length and lens distortion and constrain camera motion to planar rotation about the camera's nodal point (or center of projection), my approach places no restriction on camera placement and recovers a continuous 3-D description rather than merely a 2-D one.
Since each image is individually calibrated from visible line structures, each photo can have arbitrary camera position, orientation, focal length and lens distortion. For example, here we process three views of a room from an exhibit shown in the Fall of '94 at the List Visual Arts Center in the Wiesner Building at MIT.
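To give a flavor of how grouped parallel edges can drive per-image calibration, here is a minimal sketch of one standard formulation (not necessarily the exact one used in this work): each set of parallel edges meets in a vanishing point, and two vanishing points whose 3-D directions are orthogonal fix the focal length once the principal point is assumed known. The function names and the NumPy implementation are illustrative only.

import numpy as np

def vanishing_point(segments):
    """Least-squares intersection of a set of nominally parallel 2-D edges.

    segments: list of ((x1, y1), (x2, y2)) image-plane endpoints.
    Returns the vanishing point as an inhomogeneous (x, y) pair.
    """
    # Each segment gives a homogeneous line l = p1 x p2; stack the constraints l . v = 0.
    lines = []
    for (x1, y1), (x2, y2) in segments:
        p1 = np.array([x1, y1, 1.0])
        p2 = np.array([x2, y2, 1.0])
        lines.append(np.cross(p1, p2))
    A = np.vstack(lines)
    # The right singular vector for the smallest singular value is the common intersection.
    _, _, vt = np.linalg.svd(A)
    v = vt[-1]
    return v[:2] / v[2]          # assumes the vanishing point is finite

def focal_from_orthogonal_vps(v1, v2, principal_point):
    """Focal length (square pixels, known principal point p) from two vanishing
    points of orthogonal 3-D directions: f^2 = -(v1 - p) . (v2 - p)."""
    p = np.asarray(principal_point, dtype=float)
    d = -np.dot(np.asarray(v1) - p, np.asarray(v2) - p)
    if d <= 0:
        raise ValueError("vanishing points inconsistent with this camera model")
    return np.sqrt(d)

With three mutually orthogonal vanishing points the principal point can be recovered as well (it is the orthocenter of the vanishing-point triangle), and lens distortion can be estimated by requiring the grouped edges to be straight.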
Original view #1, and the same view overlaid with detected edges which have been manually grouped into parallel and coplanar sets.
Original view #2, and the same view overlaid with detected edges which have been manually grouped into parallel and coplanar sets.
Original view #3, and the same view overlaid with detected edges which have been manually grouped into parallel and coplanar sets.
This set of CAD-like geometric constraints among 2-D points and lines is analyzed to produce a textured 3-D model of the scene. Here is a synthetic view of the result and a low-resolution 362K MPEG fly-through movie. The textured 3-D model is stored in Open Inventor/VRML format. If you have an OIV viewer (like ivview, Holodeck or 3DView), here's a low-res textured version of the scene's .iv file and here's a bigger .iv file.
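As an illustration of the reconstruction step (a generic linear method, not the full constraint solver): once each view's camera is calibrated, a point matched across two views can be triangulated by intersecting the two viewing rays in a least-squares sense. The sketch below assumes 3x4 projection matrices P1 and P2 are available for the two views.

import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3-D point from two calibrated views.

    P1, P2 : 3x4 camera projection matrices.
    x1, x2 : (u, v) image coordinates of the same point in each view.
    Returns the 3-D point as a length-3 array.
    """
    def rows(P, uv):
        u, v = uv
        # u * (row 3 of P) - (row 1 of P) = 0,  v * (row 3 of P) - (row 2 of P) = 0
        return [u * P[2] - P[0], v * P[2] - P[1]]

    A = np.array(rows(P1, x1) + rows(P2, x2))
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]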
Structured Video
This technique has also been used to determine the 3-D positions of actors from video taken in a chromakey studio. Camera pose relative to the studio floor is found using calibration cubes. Keeping the cameras fixed, the cubes are removed and the actors are filmed in front of the chromakeyed background. An alpha-mask silhouette is found, and the head and feet are detected in each view. The image coordinates of the feet are projected onto the known floor plane to determine the actor's position in each frame.
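A minimal sketch of that last projection step, assuming the studio floor is the plane z = 0 in studio coordinates and that each camera is described by a 3x4 projection matrix recovered from the calibration cubes (the names below are hypothetical): the viewing ray through the detected foot pixel is intersected with the floor.

import numpy as np

def foot_position_on_floor(P, uv):
    """Intersect the viewing ray through pixel uv with the floor plane z = 0.

    P  : 3x4 camera projection matrix (studio coordinates -> pixels).
    uv : (u, v) image coordinates of the detected foot.
    Returns the (x, y) floor position of the actor.
    """
    M, p4 = P[:, :3], P[:, 3]
    # Camera center C satisfies P [C; 1] = 0, so C = -M^{-1} p4.
    C = -np.linalg.solve(M, p4)
    # Ray direction: back-project the pixel (defined up to scale).
    d = np.linalg.solve(M, np.array([uv[0], uv[1], 1.0]))
    # Intersect C + t*d with the plane z = 0.
    t = -C[2] / d[2]
    X = C + t * d
    return X[0], X[1]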
The alpha masks and computed 3-D positions of the actors are then used to composite the actors' 2-D cutouts into the textured museum scene. TVOT's parallel processing platform, Cheops, is used to render actor textures into views of the textured 3-D scene at a rate of about 12 frames per second, which is the basis of our research on interactive 3-D digital video. Here's a 536K MPEG movie that shows the final result.
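The 2-D compositing itself is standard "over" blending with the chromakey matte; here is a minimal NumPy sketch of that operation (the Cheops implementation is a hardware pipeline, so this is only illustrative). The actor cutout, already scaled and placed at its projected screen position, is blended over the rendered background using the alpha mask.

import numpy as np

def composite(background, actor, alpha):
    """Standard 'over' compositing of an actor cutout onto a rendered view.

    background, actor : HxWx3 float arrays (same size; the actor image is
                        already placed and scaled at its projected position).
    alpha             : HxW float matte in [0, 1] from the chromakey stage.
    """
    a = alpha[..., np.newaxis]
    return a * actor + (1.0 - a) * background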
sbeck@media.mit.edu
Copyright © 1994 by MIT Media Lab. All rights reserved.