
Testing and Performance

The full detection and tracking loop was tested on live video streams. Detection typically found a face within one or two loops and handled in-plane rotations of $\pm$ 20 degrees as well as out-of-plane rotations of roughly $\pm$ 20 degrees. This flexibility comes from the rather lax constraints on feature detection and from the heuristics in the search. The false alarms that these lax constraints produce are eliminated by 3D normalization followed by a strict eigenspace DFFS test. Because detection can handle non-frontal views, subjects do not need to look directly at the camera for tracking to commence. Detection has been tested successfully against a wide variety of backgrounds, under many views and with numerous subjects. The system was also used to detect facial features in the Achermann face database (courtesy of the University of Bern in Switzerland) and achieved a success rate of over 90%, even though the skin classification stage could not be used because the images are gray-scale. The database contains 30 individuals in 10 different views each (300 images in total), of which 8 views involve significant out-of-plane rotation.

**Figure:** Real-Time Closed-loop tracking of a sample video sequence.
[Figure panels: ... Frame=354, Frame=827, Frame=1175, Frame=1527]

Real-time tracking was tested on the live video sequence shown in Figure 11. Roughly 2000 frames (over 1 minute of real-time tracking) were tracked without feature loss. The filtered tracking windows are shown projected onto the face. The normalized mug-shot (after 3D warping and illumination correction) is shown at the bottom of Figure 11.

As can be seen, the subject undergoes large in-plane and out-of-plane rotations about all axes as well as partial occlusion (in frame 827). Out-of-plane rotations of over $\pm$ 45 degrees are tolerated without feature loss. Even though almost half of the correlation-based trackers may be occluded under large out-of-plane rotations, the global EKF filtering maintains tracking using the features that remain visible. Unless the motion is very jerky or the out-of-plane rotation is extreme, the system maintains tracking and does not exhibit instability. The system has been tested on multiple subjects from live video streams, and tracking performance is consistent.
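The idea of continuing to filter with only the visible features can be sketched as a measurement update that skips occluded measurements. This is a minimal illustration and not the system's actual filter: it assumes a linear measurement model (in an EKF, `H` would be the Jacobian of the measurement function at the predicted state), and the names `masked_update` and `visible` are hypothetical.

```python
import numpy as np

def masked_update(x, P, z, H, R, visible):
    """Kalman measurement update that uses only the measurements flagged visible.

    x: (n,) predicted state, P: (n, n) predicted covariance,
    z: (m,) measurements, H: (m, n) measurement matrix (assumed linear here),
    R: (m, m) measurement noise covariance, visible: (m,) boolean mask.
    """
    idx = np.flatnonzero(visible)
    if idx.size == 0:
        # No feature is visible: keep the prediction unchanged.
        return x, P
    Hv = H[idx]                      # rows for visible measurements only
    Rv = R[np.ix_(idx, idx)]         # matching noise sub-block
    innovation = z[idx] - Hv @ x
    S = Hv @ P @ Hv.T + Rv           # innovation covariance
    K = P @ Hv.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x + K @ innovation
    P_new = (np.eye(len(x)) - K @ Hv) @ P
    return x_new, P_new
```

With this form, an occluded feature contributes nothing to the correction, so its state components are carried by the prediction alone and their uncertainty is not reduced on that frame.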

Figure 12(a) displays the typical residual correlation error of a tracking window. Although this signal is noisy, the EKF filters it and yields the stable estimate of depth structure shown in Figure 12(b). The EKF converges quickly to the true underlying 3D geometry despite noisy feature tracking. We also measured the SSD residual between the initial mug-shot (at frame 0) and the current normalized face. Figure 12(d) displays the DFFS value over the sequence, which serves as the cue to stop tracking when it grows too large. In this sequence the threshold was set to a generous value of 0.5, and face detection was never re-invoked because tracking did not fail. Had the DFFS value exceeded 0.5, tracking would have stopped and detection would have searched for a new face.
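The two residuals and the stopping rule described above can be sketched as follows. This is an illustrative sketch under stated assumptions: DFFS is taken to be the squared reconstruction residual outside an orthonormal eigenspace basis, images are flattened to vectors, and the function names are hypothetical. The scaling that makes 0.5 a meaningful threshold in the actual system is not given here, so the default only mirrors the value quoted in the text.

```python
import numpy as np

def dffs(x, mean, basis):
    """Distance from feature space: squared residual left after projecting
    the mean-subtracted vector onto the eigenspace.

    basis: (d, k) matrix whose columns are assumed orthonormal eigenvectors.
    """
    centered = x - mean
    coeffs = basis.T @ centered           # coordinates inside the eigenspace
    residual = centered - basis @ coeffs  # component outside the eigenspace
    return float(residual @ residual)

def ssd(a, b):
    """Sum of squared differences between two flattened images."""
    diff = a - b
    return float(diff @ diff)

def next_mode(dffs_value, threshold=0.5):
    """Return 'detect' when DFFS exceeds the threshold, otherwise 'track'."""
    return "detect" if dffs_value > threshold else "track"
```

In a loop, `next_mode` would be evaluated once per frame on the normalized face, so tracking continues while the face stays close to the eigenspace and falls back to detection otherwise.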

**Figure:** EKF Estimates and Residual Errors.
[Figure panels: ... ...D for Normalized Face, DFFS for Normalized Face]


Tony Jebara
1999-12-07