
Initialization and Parametrization of the Kalman Filter State Vector

Since the particular objects being tracked by the system are faces, we can initialize the system with a 3D model of the structure of a head to speed up convergence to the true structure and motion. In addition, during tracking and estimation, the structural estimate in the SfM solution is expected to remain within a constrained set of 3D configurations. Since only faces are being tracked, we do not wish to allow the structural estimate of the SfM computation to drift toward some other shape. Thus, we propose filtering the 3D structure estimated by the EKF to reject unreasonable estimates. This is done by constructing an eigenspace filter from a set of previously scanned 3D head structures.

Recall the set of Cyberware-scanned heads used to generate the average 3D human head for face detection. These 3D models have all been aligned into frontal view. When automatic face detection determines the loci of the eyes, nose and mouth, it aligns the 3D average head model to these locations. Thus, the system automatically has an estimate of the depth map of the face, and the depth values at the positions of the feature points to be tracked are sampled. In addition, the system has an estimate of the 3D pose of the face $(T_X, T_Y, T_Z, \theta_X, \theta_Y, \theta_Z)$. The SfM state vector can thus be initialized (camera geometry is arbitrarily set to $\beta=0.5$) using much of the information from the face detection stage, which gives us $\mathbf{x}_{t=0}=(T_X, T_Y, T_Z\beta, \theta_X, \theta_Y, \theta_Z, \alpha_1, \alpha_2, \ldots, \alpha_N)$.
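
As a concrete illustration, the following minimal NumPy sketch assembles this initial state vector from hypothetical detector outputs (the names pose and alphas are assumptions for illustration, not from the paper):

\begin{verbatim}
import numpy as np

# Hypothetical inputs from the face-detection stage (names are illustrative):
#   pose   = (T_X, T_Y, T_Z, theta_X, theta_Y, theta_Z) from aligning the average head
#   alphas = depth values sampled from the aligned average-head depth map at the N features
def initial_state(pose, alphas, beta=0.5):
    """Assemble x_{t=0} = (T_X, T_Y, T_Z*beta, theta_X, theta_Y, theta_Z, alpha_1..alpha_N)."""
    T_X, T_Y, T_Z, th_X, th_Y, th_Z = pose
    return np.concatenate(([T_X, T_Y, T_Z * beta, th_X, th_Y, th_Z], np.asarray(alphas)))
\end{verbatim}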

During tracking, the structural estimate can also be filtered to prevent any non-face-like structural estimates. Recall that the average 3D head model was aligned to the locations of the eyes, nose and mouth. Subsequently, the 3D model of the average head generates a depth map from which the initial values for $(\alpha_1, \alpha_2, \ldots, \alpha_N)$ are found. This is also done for each of the other Cyberware heads so that multiple vectors $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_N)$ are generated. We perform a Karhunen-Loève decomposition on 12 such $\boldsymbol{\alpha}$ vectors from our 12 Cyberware 3D head models and obtain a parametrized representation of the structure.
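
A minimal sketch of this decomposition, assuming the 12 $\boldsymbol{\alpha}$ vectors are stacked as rows of a matrix (the function name, and the choice not to subtract a mean so that projection and reconstruction match Equations 13 and 14, are assumptions made here):

\begin{verbatim}
import numpy as np

def eigen_alpha_structures(alpha_matrix, k=4):
    """alpha_matrix: (12, N) array, one alpha vector per Cyberware head.
    Returns the k leading eigenvectors (the eigen-alpha-structures) as rows.
    Whether the mean is removed first is not stated in the text; this sketch
    decomposes the raw vectors so projection/reconstruction follow Eqs. 13-14."""
    _, _, vt = np.linalg.svd(alpha_matrix, full_matrices=False)
    return vt[:k]
\end{verbatim}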

The eigenspace is computed each time the system is initialized since the parametrization of structure $(\alpha_1, \alpha_2, \ldots, \alpha_N)$ depends on the initial feature positions in the image plane. However, due to the small size of the training set, this computation is trivial.

A linear subspace is formed from the first 4 eigenvectors of this eigenspace (the eigen-$\alpha$-structures). At each time step, we project the Kalman filter's current estimate of structure into this eigenspace. Thus, the $N$ degrees of freedom in the structural estimate are constrained by the 4 degrees of freedom in our linear subspace of facial structure. Equation 13 maps the current structure vector into an eigenspace parametrization by projection onto the eigenvectors $\mathbf{e}_i$. Equation 14 reconstructs the filtered structure vector $\hat{\boldsymbol{\alpha}}$. Thus, constraints are introduced into the loop by filtering the recovered SfM information with an eigenspace.



\begin{displaymath}
c_i = \boldsymbol{\alpha} \cdot \mathbf{e}_i
\end{displaymath} (13)




\begin{displaymath}
\hat{\boldsymbol{\alpha}} = \sum_{i=1}^{4} c_i \, \mathbf{e}_i
\end{displaymath} (14)
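
Putting Equations 13 and 14 together, the per-frame filtering step might look like the following sketch (reusing the eigenvectors from the decomposition above; this is an illustrative reading of the equations, not the paper's own code):

\begin{verbatim}
import numpy as np

def filter_structure(alpha, eigvecs):
    """Eq. 13: c_i = alpha . e_i for each eigenvector (rows of eigvecs).
    Eq. 14: alpha_hat = sum_i c_i * e_i, the filtered structure estimate."""
    c = eigvecs @ np.asarray(alpha)   # projection coefficients c_1..c_4
    return eigvecs.T @ c              # reconstructed, face-constrained structure
\end{verbatim}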


