We describe a method for estimation of 3-D geometry from 2-D blob features. Blob features are clusters of similar pixels in the image plane and can arise from similarity of color, texture, motion and other signal-based metrics. The motivation for considering such features comes from recent successes in real-time extraction and tracking of such blob features in complex cluttered scenes in which traditional feature finders fail---scenes containing moving people, for example.
We use nonlinear modeling and a combination of iterative and recursive estimation methods to recover 3-D geometry from blob correspondences across multiple images. The 3-D geometry includes the 3-D shapes, translations, and orientations of blobs and the relative orientation of the cameras.
Using this technique, we have developed a real-time wide-baseline stereo person-tracking system that self-calibrates by watching a moving person and can subsequently track people's heads and hands with RMS errors of 1--2 cm in translation and 2 degrees in rotation. The blob formulation is efficient and reliable, running at 20--30 Hz on a pair of SGI Indy R4400 workstations with no special hardware.
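
To make the notion of a 2-D blob feature concrete, the following minimal sketch summarizes a cluster of pixels by its centroid, its 2x2 spatial covariance, and the orientation of its major axis. This is only an illustration under stated assumptions: the function name `blob_moments`, the moment-based representation, and the synthetic test blob are hypothetical choices made for this example, not a description of the system's actual implementation.

```python
import math

def blob_moments(pixels):
    """Summarize a blob given as a list of (x, y) pixel coordinates.

    Returns (centroid, covariance, orientation), where orientation is the
    angle in radians of the major axis relative to the image x-axis.
    Hypothetical helper for illustration only.
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("blob must contain at least one pixel")

    # First-order moments: the blob centroid.
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n

    # Second-order central moments: the spatial covariance of the blob.
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n

    # Major-axis orientation from the covariance terms.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), ((sxx, sxy), (sxy, syy)), theta

# Synthetic example: a horizontal bar 10 pixels wide and 2 pixels tall.
bar = [(x, y) for x in range(10) for y in range(2)]
centroid, cov, theta = blob_moments(bar)
```

For the synthetic bar, the centroid lies at the middle of the bar and the major axis is aligned with the image x-axis. Corresponding blob summaries from two or more cameras are the kind of 2-D measurement from which 3-D geometry could then be estimated.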