3-D Models from Uncalibrated 2-D Views

Input

We take a set of 7 overlapping photos of a room with a hand-held, uncalibrated camera whose lens and focal length are unknown. Here are 3 of the views. Notice that the camera's focal length, position and rotation differ significantly from view to view.

Analysis

Point and line features are specified by hand, and then grouped across views into parallel and coplanar sets. Corresponding views of scene elements are used to get a least-squares estimate of each view's camera parameters: focal length, world-relative rotation and position. For this particular scene, I took the image center as the principal point and considered lens distortion to be negligible. However, with more edge information these parameters can also be estimated, as shown in previous results.
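As a rough illustration of how parallel features constrain focal length, here is a minimal sketch of the simplest case only: two sets of parallel scene lines that are assumed to be mutually perpendicular, with the principal point fixed at the image center and square pixels assumed. It is not the least-squares estimator used by the package; the function names and the synthetic test camera are assumptions made for the example.

```python
import math

def line_through(p, q):
    """Homogeneous image line through two 2-D points (cross product)."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)

def intersect(l1, l2):
    """Intersection of two homogeneous lines; this is the vanishing point
    when the lines are images of parallel scene edges."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-12:
        raise ValueError("image lines are parallel; vanishing point at infinity")
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)

def focal_from_vanishing_points(v1, v2, principal_point):
    """For perpendicular scene directions, (v1 - c) . (v2 - c) = -f^2."""
    cx, cy = principal_point
    d = (v1[0] - cx) * (v2[0] - cx) + (v1[1] - cy) * (v2[1] - cy)
    if d >= 0:
        raise ValueError("vanishing points are inconsistent with perpendicular directions")
    return math.sqrt(-d)

def project(point, f, c):
    """Hypothetical pinhole camera at the origin looking down +z (test helper)."""
    x, y, z = point
    return (c[0] + f * x / z, c[1] + f * y / z)

def demo_focal():
    """Synthetic check: project two pairs of parallel edges, recover f."""
    f_true, c = 800.0, (320.0, 240.0)
    starts = [(0.0, 0.0, 5.0), (0.0, 1.0, 5.0)]
    vps = []
    for d in [(1.0, 0.0, 1.0), (-1.0, 0.0, 1.0)]:  # perpendicular directions
        lines = []
        for s in starts:
            e = (s[0] + d[0], s[1] + d[1], s[2] + d[2])
            lines.append(line_through(project(s, f_true, c), project(e, f_true, c)))
        vps.append(intersect(lines[0], lines[1]))
    return focal_from_vanishing_points(vps[0], vps[1], c)
```

With many hand-marked edges, each vanishing point would be overdetermined, and that is where a least-squares fit would take the place of the two-line intersection used here.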

Here is an orthographic wireframe view of the resulting cameras and scene element geometry. The estimated 3-D image plane and center of projection for each camera are visible as grey rectangles and yellow vectors.

Synthesis

Least-squares estimates for the geometry and extent of polygons in the scene are then found by projecting features from the now-calibrated views into the scene. Textures are sampled and averaged according to original visibility. The result is a VRML file that describes the cameras and the surfaces that were visible in the original views. Any view (including one from the original set of images) can be synthesized by texture rendering. A few frames of the result are shown above, or you can watch a 120K MPEG movie navigating through the synthetic 3-D textured scene.
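To make the idea of projecting features from calibrated views into the scene concrete, here is a minimal sketch that finds the 3-D point closest, in the least-squares sense, to a set of viewing rays. It assumes each ray (camera center plus direction through the marked feature) has already been computed from the calibrated cameras; `triangulate` and `solve3` are names chosen for this example, not routines from the package.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [list(A[i]) + [b[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("rays are (nearly) parallel; point is not determined")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= factor * M[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = M[i][3] - sum(M[i][k] * x[k] for k in range(i + 1, 3))
        x[i] = s / M[i][i]
    return tuple(x)

def triangulate(rays):
    """Point minimizing the sum of squared distances to all rays.

    rays: list of (origin, direction) pairs of 3-tuples.
    Normal equations: sum(I - d d^T) X = sum((I - d d^T) o).
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for origin, direction in rays:
        d = normalize(direction)
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p
                b[i] += p * origin[j]
    return solve3(A, b)
```

For example, three rays that all pass through (1, 2, 3) from different camera centers should return a point very close to (1, 2, 3); with noisy hand-marked features the rays no longer meet exactly and the result is a compromise between them.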

Notice the artifacts. Patches mark the extent of the various images in which a surface is visible; this might be fixed by correcting for vignetting, or avoided entirely by using a radial camera to capture a cylindrical panoramic view from a fixed point. Textures don't line up when non-planar structures are forced to project onto a plane; it may be possible to correct for displacement from the plane if dense texture correspondences can be found. Specularity and translucency also leave visible errors; a median filter might get rid of them.
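The median-filter idea can be illustrated with a tiny sketch, assuming the filter is applied per texel across the views in which that texel is visible (an assumption; the text does not say how the filter would be applied). The function names are hypothetical.

```python
from statistics import mean, median

def averaged_texel(samples):
    """Plain average of one texel's intensity over all views that see it."""
    return mean(samples)

def median_texel(samples):
    """Median over views: a highlight seen in a minority of views is ignored."""
    return median(samples)
```

With samples such as 90, 92 and 250, where the last view caught a specular highlight, the average is pulled toward the highlight while the median stays at the value the other views agree on.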

(1/96 Update): Many of these problems have since been solved. The patchwork effect is fixed by treating the original images as fading towards transparency at the edges. Texture mismatches on planar surfaces have also been solved (or at least minimized) by an iterative downhill simplex search, which refines surface and camera parameters by minimizing the misalignment of visible point and line features.
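Fading images towards transparency at their edges can be sketched as a weighted blend. The linear falloff below, and the names `feather_weight` and `blend`, are assumptions for illustration; the actual falloff used by the package is not specified here.

```python
def feather_weight(x, y, width, height):
    """Opacity of pixel (x, y): 0 on the image border, 1 at the center.

    Assumes width and height are both greater than 1.
    """
    dx = min(x, width - 1 - x) / ((width - 1) / 2)
    dy = min(y, height - 1 - y) / ((height - 1) / 2)
    return min(dx, dy)

def blend(samples):
    """Weighted average of (intensity, weight) pairs from overlapping images.

    Returns None when no image contributes any weight.
    """
    total = sum(w for _, w in samples)
    if total == 0:
        return None
    return sum(c * w for c, w in samples) / total
```

Because each image's contribution drops smoothly to zero at its border, a surface no longer changes brightness abruptly where one source image ends and another continues.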

User Interface

Here's a snapshot of the user interface. The entire package consists of three programs: sceneBuild, sceneAnalyze and sceneView. Each program uses Tcl as its user-level scripting language and draws 2D and 3D graphics through Xlib using custom software. Simple menus and dialogs are implemented as separate Tk applications that communicate using Tk's send protocol.

(1/96 Update): As the first user of sceneBuild, sceneAnalyze and sceneView besides myself, Bill Butera is using his painful experience for good and writing a tutorial document. Thanks Bill!

Eugene Lin is writing a newer version of sceneBuild called ModelMaker using OpenInventor and Motif widgets. See his very illustrative online documentation.


sbeck@media.mit.edu
Copyright © 1995 by MIT Media Lab. All rights reserved.