
Stable Representation for Recursive Estimation

The objective of SfM is to recover 3D structure, motion and camera geometry. These form the ``internal state vector'' $\bf x$ of the system under observation, and they are recovered from measurements (observations) of the system. For a thorough justification of this internal state representation, consult Azarbayejani and Pentland [2]. One internal state parameter is the camera geometry: instead of estimating the focal length $f$ directly, we estimate $\beta = \frac{1}{f}$. The structure of each point on the 3D object is represented with a single parameter rather than an XYZ spatial location. The mapping from the 3-parameter Cartesian form to this single parameter is given in Equation 6, where $\alpha$ is the new representation of structure and $(u,v)$ are the coordinates of the point in the image plane when tracking is initialized.



 
\begin{displaymath}
\left[ \begin{array}{c} X \\ Y \\ Z \end{array} \right]
=
\left[ \begin{array}{c} (1+\alpha\beta)\,u \\ (1+\alpha\beta)\,v \\ \alpha \end{array} \right]
\end{displaymath} (6)
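As a concrete illustration of this parameterization, the following sketch (in Python/NumPy; the function name point_from_alpha is chosen here purely for illustration) recovers a feature's 3D Cartesian coordinates from its initial image coordinates $(u,v)$, its structure parameter $\alpha$ and the inverse focal length $\beta$, exactly as in Equation 6.

    import numpy as np

    def point_from_alpha(u, v, alpha, beta):
        # u, v  : image coordinates of the feature when tracking is initialized
        # alpha : the single scalar structure (depth) parameter for this feature
        # beta  : inverse focal length, beta = 1/f
        scale = 1.0 + alpha * beta
        return np.array([scale * u, scale * v, alpha])  # (X, Y, Z) of Equation 6

Each tracked feature thus contributes only the single unknown $\alpha$; its $(u,v)$ coordinates are constants recorded when tracking is initialized.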


In addition, we define translation as $(t_X, t_Y, t_Z\beta)$. Rotation is defined in terms of $(\omega_X, \omega_Y, \omega_Z)$, the incremental Euler angles of the interframe rotation. This representation of rotation overcomes the normality constraint of the quaternion representation by linearizing on the tangent hyper-plane to the unit hyper-sphere formed by the quaternions.
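The linearized rotation update can be sketched as follows. This is an assumed small-angle quaternion update (increment $\approx (1, \omega_X/2, \omega_Y/2, \omega_Z/2)$), not code from [2], but it shows how the incremental Euler angles are folded back into a unit quaternion, i.e. projected back onto the unit hyper-sphere, after each step.

    import numpy as np

    def quat_multiply(q, r):
        # Hamilton product of quaternions q and r, stored as (w, x, y, z).
        w0, x0, y0, z0 = q
        w1, x1, y1, z1 = r
        return np.array([
            w0*w1 - x0*x1 - y0*y1 - z0*z1,
            w0*x1 + x0*w1 + y0*z1 - z0*y1,
            w0*y1 - x0*z1 + y0*w1 + z0*x1,
            w0*z1 + x0*y1 - y0*x1 + z0*w1,
        ])

    def update_rotation(q, omega):
        # Compose the current unit quaternion q with the small-angle quaternion
        # built from the incremental Euler angles omega = (wX, wY, wZ), then
        # renormalize to return to the unit hyper-sphere.
        dq = np.array([1.0, 0.5 * omega[0], 0.5 * omega[1], 0.5 * omega[2]])
        q_new = quat_multiply(dq, q)
        return q_new / np.linalg.norm(q_new)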

The final representation of the internal state vector has a total of 7+N parameters where N is the number of feature points being tracked (each of which requires one scalar depth value to determine 3D structure):



\begin{displaymath}
\mathbf{x} = (t_X, t_Y, t_Z\beta, \omega_X, \omega_Y, \omega_Z, \beta,
\alpha_1, \alpha_2, \ldots, \alpha_N)
\end{displaymath} (7)


At each time step, we also have a measurement or observation vector $\mathbf{y}$ of size 2N with the following form:



\begin{displaymath}
\mathbf{y} = (X_1, Y_1, X_2, Y_2, \ldots, X_N, Y_N)
\end{displaymath} (8)


where $(X_i, Y_i)$ is the current image position of the $i$-th tracked feature point. Unlike other formulations, which are underdetermined at every time step, the above parameterization of the SfM problem is well-posed when $2N \geq 7+N$, i.e. when $N \geq 7$. Thus, if 7 or more feature points are tracked simultaneously in 2D, a unique, well-constrained solution can be found for the internal state and a recursive filter can be employed.
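The dimension count is easy to verify. The sketch below uses hypothetical helpers assemble_state and assemble_measurement to build the $7+N$ state vector of Equation 7 and the $2N$ measurement vector of Equation 8, and checks the well-posedness condition $2N \geq 7+N$ for the minimal case $N = 7$.

    import numpy as np

    def assemble_state(t, omega, beta, alphas):
        # State of Equation 7: (tX, tY, tZ*beta, wX, wY, wZ, beta, alpha_1..alpha_N).
        tX, tY, tZ = t
        return np.concatenate(([tX, tY, tZ * beta], omega, [beta], alphas))

    def assemble_measurement(points_2d):
        # Measurement of Equation 8: (X_1, Y_1, ..., X_N, Y_N).
        return np.asarray(points_2d, dtype=float).reshape(-1)

    N = 7                                          # minimum number of tracked features
    x = assemble_state(t=[0.0, 0.0, 0.0],
                       omega=[0.0, 0.0, 0.0],
                       beta=0.5,
                       alphas=np.zeros(N))          # 7 + N = 14 parameters
    y = assemble_measurement(np.zeros((N, 2)))      # 2N = 14 observations
    assert len(y) >= len(x)                         # well-posed when 2N >= 7 + N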

Due to the non-linearities in the mapping from the state vector to the measurements, an extended Kalman filter is used as the estimator. The dynamics of the internal state are trivially chosen to be identity with additive Gaussian noise at each time step.
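A minimal sketch of one filter step is given below, assuming a user-supplied measurement function h(x) that maps the $7+N$ state to the $2N$ predicted image coordinates (for example, by reconstructing points via Equation 6, applying the interframe motion and projecting through the camera model of [2]). The Jacobian is taken numerically here for simplicity; an analytic Jacobian could be substituted.

    import numpy as np

    def numerical_jacobian(h, x, eps=1e-6):
        # Finite-difference Jacobian of the measurement function h at x.
        y0 = h(x)
        J = np.zeros((len(y0), len(x)))
        for j in range(len(x)):
            dx = np.zeros_like(x)
            dx[j] = eps
            J[:, j] = (h(x + dx) - y0) / eps
        return J

    def ekf_step(x, P, y, h, Q, R):
        # One extended Kalman filter step with identity dynamics.
        # Predict: the state is unchanged; uncertainty grows by the process noise Q.
        x_pred = x
        P_pred = P + Q

        # Update: linearize h about the predicted state and correct with the
        # observed 2N-vector y.
        H = numerical_jacobian(h, x_pred)
        S = H @ P_pred @ H.T + R                    # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
        x_new = x_pred + K @ (y - h(x_pred))
        P_new = (np.eye(len(x)) - K @ H) @ P_pred
        return x_new, P_new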


Tony Jebara
1999-12-07