3-D Models from Uncalibrated 2-D Views

Input
We take a set of 7 overlapping photos of a room with a hand-held,
uncalibrated camera whose lens and focal length are unknown. Three of
the views are shown here. Notice that camera focal length, position,
and rotation differ significantly across the views.
Analysis
Point and line features are specified by hand and then grouped across
views into parallel and coplanar sets. Corresponding views of scene
elements yield a least-squares estimate of each view's camera
parameters: focal length, and rotation and position relative to the
world. For this particular scene, I took the image center as the
principal point and considered lens distortion to be negligible.
However, with more edge information these parameters can also be
estimated, as shown in previous results.
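The per-view estimation step can be sketched as a nonlinear least-squares fit, assuming hand-marked 2-D/3-D point correspondences and a simple pinhole model with the principal point at the image center. The parameterization here (focal length plus a Rodrigues rotation vector and a translation) and the use of SciPy are illustrative assumptions, not the original code.

```python
# Sketch of per-view camera calibration by nonlinear least squares.
# Parameterization and solver are assumptions; not the original system.
import numpy as np
from scipy.optimize import least_squares

def rodrigues(rvec):
    """Rotation matrix from an axis-angle (Rodrigues) vector."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def project(params, X):
    """Pinhole projection: params = [f, rvec(3), t(3)], principal point at origin."""
    f, rvec, t = params[0], params[1:4], params[4:7]
    Xc = X @ rodrigues(rvec).T + t          # world -> camera coordinates
    return f * Xc[:, :2] / Xc[:, 2:3]       # perspective divide

def calibrate_view(X_world, x_image, params0):
    """Fit focal length, rotation, and position for one view from correspondences."""
    resid = lambda p: (project(p, X_world) - x_image).ravel()
    return least_squares(resid, params0).x
```

Each view is solved independently here; the actual system also exploits the parallel and coplanar feature groupings described above.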
Here is an orthographic wireframe view of the resulting cameras and
scene element geometry. The estimated 3-D image plane and center of
projection for each camera are visible as grey rectangles and yellow
vectors.
Synthesis
Least-squares estimates for the geometry and extent of polygons in the
scene are then found by projecting features from the now-calibrated
views into the scene. Textures are sampled and averaged according to
their visibility in the original views. The result is a VRML file that
describes the cameras and surfaces that were visible in the original
views. Any view (including one from the original set of images) can
then be synthesized by texture rendering. A few frames of the result
are shown above, or you can see a 120K MPEG movie navigating through
the synthetic 3-D textured scene.
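The visibility-weighted texture averaging might look like the following minimal sketch. The array layout and function name are assumptions for illustration, not the original implementation.

```python
# Sketch of texture recovery by averaging samples from multiple calibrated
# views; array layout and weighting scheme are assumptions.
import numpy as np

def blend_textures(samples, visible):
    """Average per-texel color over the views in which the texel is visible.

    samples : (n_views, H, W, 3) colors sampled by projecting the surface
              into each view
    visible : (n_views, H, W) boolean visibility mask per view
    """
    w = visible[..., None].astype(float)            # (n_views, H, W, 1)
    total = (samples * w).sum(axis=0)               # sum of visible samples
    count = w.sum(axis=0)                           # number of visible views
    return np.where(count > 0, total / np.maximum(count, 1e-9), 0.0)
```

Texels never seen in any view are left black; the real system only instantiates surfaces that were visible somewhere.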
Notice the artifacts: patches mark the extent of the various images in
which a surface is visible (perhaps fixable by correcting for
vignetting, or avoidable entirely by using a radial camera to get a
cylindrical panoramic view from a fixed point); textures don't line up
when non-planar structures are forced to project onto a plane
(displacement from the plane might be corrected if dense texture
correspondences can be found); and specular and translucent surfaces
leave view-dependent artifacts (a median filter might remove them).
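The median-filter idea could be applied per texel across views, since a specular highlight usually appears in only a minority of the views of a surface. This sketch and its names are assumptions about how such a filter would slot into the pipeline.

```python
# Sketch of suppressing view-dependent highlights with a per-texel median
# over views, replacing the straight average; an assumed implementation.
import numpy as np

def median_texture(samples, visible):
    """Per-texel median over the views in which the texel is visible.

    samples : (n_views, H, W, 3), visible : (n_views, H, W) boolean.
    """
    masked = np.where(visible[..., None], samples, np.nan)  # hide occluded views
    med = np.nanmedian(masked, axis=0)                      # median ignores NaNs
    return np.nan_to_num(med)                               # never-seen texels -> 0
```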
(1/96 Update): many of these problems have since been
solved. The patchwork effect is fixed by treating the original images
as fading towards transparency at their edges. Texture mismatches on
planar surfaces have also been eliminated (or at least minimized) by
an iterative downhill simplex search that refines surface and camera
parameters to minimize visible point and line feature misalignment.
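The downhill simplex refinement can be sketched with SciPy's Nelder-Mead optimizer. The cost function here is a toy sum-of-squares feature misalignment, and `predict` stands in for the real camera-and-surface projection model; both are assumptions for illustration.

```python
# Sketch of the 1/96 refinement: a downhill-simplex (Nelder-Mead) search
# over pooled camera/surface parameters, minimizing feature misalignment.
# The cost and the predict() callback are stand-ins, not the original code.
import numpy as np
from scipy.optimize import minimize

def misalignment(params, observations, predict):
    """Sum of squared 2-D distances between observed and predicted features."""
    return float(np.sum((predict(params) - observations) ** 2))

def refine(params0, observations, predict):
    """Refine a parameter vector by downhill simplex search."""
    res = minimize(misalignment, params0, args=(observations, predict),
                   method='Nelder-Mead',
                   options={'xatol': 1e-8, 'fatol': 1e-10})
    return res.x
```

Simplex search needs no derivatives, which suits a cost that mixes heterogeneous camera and surface parameters.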
User Interface
Here's a snapshot of the user interface. The entire package consists
of three programs: sceneBuild, sceneAnalyze, and sceneView. Each
program uses Tcl as its user-level scripting language and renders 2-D
and 3-D graphics with custom Xlib software. Simple menus and dialogs
are implemented as separate Tk applications that communicate using
Tk's send protocol.
(1/96 Update): As the first user of
sceneBuild, sceneAnalyze and sceneView besides myself, Bill Butera is
using his painful experience for good and writing a tutorial
document. Thanks Bill!
Eugene Lin is
writing a newer version of sceneBuild called ModelMaker using
OpenInventor and Motif widgets. See his very illustrative online
documentation.
sbeck@media.mit.edu
Copyright © 1995 by MIT Media Lab. All rights reserved.