Well, did it work ? In a nutshell: NO.
To reduce the scope of the project, I didn't spend any time working on the code for detecting and matching feature points in the image sequence. I used synthetic data, generated by creating a set of points in three space, then projecting onto an image plane for each frame in a test sequence. See the sources for details. This allowed me to easily vary the parameters of the motion being estimated.
The first version I got running, which only modeled translational motion (no rotational motion) of the Iterated Extended Kalman Filter (IEKF) (and to a lesser extent, the Extended Kalman Filter, or EKF) did track small camera motion, for medium length test sequences (500 frames), given a good initial estimate. This (to some extent) validated the functionality of my matrix math and generic extended kalman filter code.
I played around with this translation only version for a while (I was fascinated that it finally converged, after the last Jacobian calculation error was fixed !), noticing it's sensitivity to an accurate initial estimate - both the starting values and their covariances. The error in the estimate could be easily steered from one state variable to another by changing the system (actually motion model) noise covariances.
When I added support for rotation, it stopped working as well. But I'm not really sure...
I added a batch method for generating (strictly speaking - improving) the initial estimate, selecting a Levenberg-Marquardt (LM) nonlinear minimization algorithm. This did a pretty good job, quickly converging on a relatively correct estimate of the structure. The problem here becomes one of when to stop it...
If you continue to iterate, the LM approach starts overfitting, which can produce fantastic results if some of the feature points are poorly located ! This could be detected using measurements not used in the minimization (out of sample data), but the nature of this problem made that difficult. I could withhold some of the measurements, but the global location of their corresponding feature point would not be known, making difficult an estimate of the error. In addition, that would only indicate overfitting evident in the camera parameters (not the structure estimate.) I settled for iterating until the squared error was a small fixed value (around 1.0e-7, which typically requires three iterations to achieve, not 0.01 as NR suggests).
An analysis of the computational complexity of the two algorithms is
beyond the amount of time left me, but suffice to say that they both
require matrix inversions. Given n feature points and m
motion parameters, in the case of the IEKF a 2n by 2n matrix
is inverted (and several matrices are multiplied and added) per iteration.
In the LM minimization, a 3n + m by 3n + m matrix is inverted
per iteration (regardless of the number of frames in the batch !)
Future Directions
Jump back to my NMM index page
wad@media.mit.edu