
Experiment 1: Egomotion, Models from Video

In this experiment, a texture-mapped model of a building (the MIT Media Laboratory) is extracted from a 20-second video clip recorded while walking around the building's exterior. Figure 8(a) shows two frames of the original digitized video with the tracked feature points overlaid.

Twenty-one features on the building were tracked and used as measurement input to the EKF described earlier. The resulting estimates of camera geometry, camera motion, and pointwise structure are shown in Figure 10. The EKF is iterated once to remove the initial transient.
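The caption of Figure 8 notes that the features are tracked using normalized correlation. Below is a minimal sketch of that matching step, written for this discussion rather than taken from the authors' tracker: the function name, window sizes, and brute-force search are illustrative assumptions, and each feature is assumed to lie far enough from the frame border that every window fits inside the image.

    import numpy as np

    def ncc_track(prev, curr, pt, patch=7, search=12):
        """Track one feature from `prev` to `curr` by normalized correlation.

        prev, curr : consecutive grayscale frames as 2-D arrays
        pt         : (row, col) of the feature in `prev`
        patch      : half-width of the template window
        search     : half-width of the search region in `curr`
        Returns the (row, col) in `curr` with the highest correlation score.
        """
        r, c = pt
        # Template: the zero-mean patch around the feature in the old frame.
        tmpl = prev[r - patch:r + patch + 1, c - patch:c + patch + 1].astype(float)
        tmpl -= tmpl.mean()
        tnorm = np.linalg.norm(tmpl) + 1e-9

        best_score, best_pt = -np.inf, pt
        for dr in range(-search, search + 1):
            for dc in range(-search, search + 1):
                rr, cc = r + dr, c + dc
                win = curr[rr - patch:rr + patch + 1,
                           cc - patch:cc + patch + 1].astype(float)
                if win.shape != tmpl.shape:
                    continue  # candidate window falls off the frame
                win -= win.mean()
                score = float(tmpl.ravel() @ win.ravel()) / (
                    tnorm * (np.linalg.norm(win) + 1e-9))
                if score > best_score:
                    best_score, best_pt = score, (rr, cc)
        return best_pt

The 21 (row, col) tracks produced this way would be stacked into the measurement vector fed to the EKF at each frame.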

Figure 8: Experiment 1: Recovering Models from Video. (a) Features are tracked from video using normalized correlation. (b) 3-D polygons are obtained by segmenting a 2-D image and back-projecting the vertices onto a 3-D plane, which is computed from the recovered 3-D points corresponding to image features inside the 2-D polygon. (c) Texture maps are obtained by projecting the video onto the 3-D polygons: the estimated motion and camera parameters are used to warp and combine the video from 25 separate frames to create the texture map for each polygon. (d) Alternate views of the recovered model.

Figure 9: Experiment 2: Head Tracking. (a) 2-D feature tracking; (b) vision and Polhemus estimates of head position. Much of the observed error is known to be due to Polhemus error. The RMS differences are 0.11 units and 2.35 degrees.

The recovered 3-D points were used to estimate the planar surfaces of the walls. The polygon vertices were selected by hand in an image and back-projected onto these planes to form 3-D polygons, depicted in wireframe in Figure 8(b). These polygons, together with the recovered motion and focal length, were used to warp and combine video from 25 separate frames to synthesize a texture map for each wall, using a procedure developed by Galyean [6]. In Figure 8(c,d), the texture-mapped model is rendered along the original trajectory and from several novel viewing positions.
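The plane-fitting and back-projection step can be sketched compactly. The sketch below is an illustrative reconstruction under assumed conventions (camera at the origin looking down +z, principal point at the image center, focal length in pixel units), not the authors' code; Galyean's texture-combination procedure [6] is reduced here to a simple average of each texel's intensity over the frames in which it is visible.

    import numpy as np

    def fit_plane(points_3d):
        """Total-least-squares plane through the recovered 3-D feature
        points; returns (n, d) with unit normal n such that n.X + d = 0."""
        centroid = points_3d.mean(axis=0)
        # The normal is the direction of least variance of the point cloud.
        _, _, vt = np.linalg.svd(points_3d - centroid)
        n = vt[-1]
        return n, -float(n @ centroid)

    def back_project(xy, f, n, d):
        """Intersect the viewing ray of image vertex (x, y) with the plane.
        The ray through pixel (x, y) is t * [x, y, f]; solving
        n.(t*ray) + d = 0 for t gives the 3-D vertex."""
        ray = np.array([xy[0], xy[1], f], dtype=float)
        return (-d / float(n @ ray)) * ray

    def project(X, R, t, f):
        """Pinhole projection of a world point into a frame with recovered
        rotation R, translation t, and focal length f."""
        Xc = R @ X + t
        return f * Xc[:2] / Xc[2]

    def texel_color(X, frames, poses, f):
        """Average the video intensity at 3-D texel position X over the
        frames in which it projects inside the image (nearest-neighbor
        sampling; a stand-in for the warping and blending of [6])."""
        samples = []
        for img, (R, t) in zip(frames, poses):
            x, y = project(X, R, t, f)
            # Shift from principal-point-centered to array coordinates.
            r = int(round(y)) + img.shape[0] // 2
            c = int(round(x)) + img.shape[1] // 2
            if 0 <= r < img.shape[0] and 0 <= c < img.shape[1]:
                samples.append(img[r, c])
        return np.mean(samples, axis=0) if samples else 0.0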

Figure 10: Experiment 1: Models from Video. Structure, motion, and focal length recovered from 2-D features.

