Kronos: Model-Based Head Tracking

What It Does

Kronos tracks the rigid 3D motion of a head from a single 2D camera view. It recovers the head's 3D translations and rotations accurately (see the Ground Truth Sequence section below) and remains stable over hundreds of frames.

How It Does It

Kronos first automatically fits a 3D ellipsoid to the head in the first frame of the sequence using the feature positions produced by the modular eigenspaces work of Baback Moghaddam and Alex P. Pentland.

To compute the motion of the model from the current frame to the next, the optical flow between the two frames is computed first. The six rigid parameters of the ellipsoid (three rotations, three translations) are then iterated about their current values. The "model flow" is defined as the flow that results from moving the model from its current parameters to a given set of iterated parameters. A robust error norm is used to compare this model flow with the actual optical flow, and the set of parameters with the (locally) smallest error is chosen as the model parameters for the next frame. This process is then repeated for each subsequent pair of frames.
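The per-frame update described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Kronos implementation: the perspective projection with a fixed focal length, the ellipsoid axis lengths, the Geman-McClure form of the robust norm, the greedy neighbor search, and all function names (rotation_matrix, ellipsoid_points, model_flow, robust_error, track_step) are hypothetical choices made for the example. The observed optical flow is assumed to be already sampled at the image positions of the model points, and visibility of the points is ignored.

```python
import numpy as np

def rotation_matrix(rx, ry, rz):
    """Rotation built from three angles in radians, composed as Rz @ Ry @ Rx."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def ellipsoid_points(axes=(8.0, 11.0, 9.0), n=12):
    """Sample n*n points on the camera-facing half of an ellipsoid (model frame).

    The axis lengths are arbitrary illustrative values.
    """
    a, b, c = axes
    u = np.linspace(0.2, np.pi - 0.2, n)               # polar angle
    v = np.linspace(np.pi + 0.2, 2 * np.pi - 0.2, n)   # azimuth giving z < 0
    uu, vv = np.meshgrid(u, v)
    x = a * np.sin(uu) * np.cos(vv)
    y = b * np.cos(uu)
    z = c * np.sin(uu) * np.sin(vv)
    return np.stack([x.ravel(), y.ravel(), z.ravel()], axis=1)

def project(params, pts, focal=500.0):
    """Perspective projection of model points under (rx, ry, rz, tx, ty, tz)."""
    cam = pts @ rotation_matrix(*params[:3]).T + params[3:]
    return focal * cam[:, :2] / cam[:, 2:3]

def model_flow(current, candidate, pts):
    """Image displacement of each point when the model moves current -> candidate."""
    return project(candidate, pts) - project(current, pts)

def robust_error(flow_a, flow_b, sigma=1.0):
    """Geman-McClure norm of the flow residuals (one possible robust norm)."""
    r2 = np.sum((flow_a - flow_b) ** 2, axis=1)
    return float(np.sum(r2 / (sigma ** 2 + r2)))

def track_step(current, observed_flow, pts, steps, max_iters=20):
    """Greedy local search: move to the best +/- step neighbor while it improves."""
    best = current.copy()
    best_err = robust_error(model_flow(current, best, pts), observed_flow)
    for _ in range(max_iters):
        candidates = []
        for i in range(6):
            for sign in (1.0, -1.0):
                cand = best.copy()
                cand[i] += sign * steps[i]
                candidates.append(cand)
        errs = [robust_error(model_flow(current, c, pts), observed_flow)
                for c in candidates]
        k = int(np.argmin(errs))
        if errs[k] >= best_err:      # no neighbor improves: local minimum reached
            break
        best, best_err = candidates[k], errs[k]
    return best, best_err

# Synthetic usage: the "observed" flow is generated from a known small translation.
pts = ellipsoid_points()
current = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 100.0])   # rx, ry, rz, tx, ty, tz
steps = np.array([0.01, 0.01, 0.01, 0.5, 0.5, 0.5])
true_next = current.copy()
true_next[3] += 0.5
observed = model_flow(current, true_next, pts)
estimate, error = track_step(current, observed, pts, steps)
```

In a real tracker the observed flow would come from an optical flow computation on the two frames, and the returned estimate would become the current parameters for the next frame pair.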

The images at the top of this page show several key frames from a hundred-frame sequence. The first row shows the original input frames. The second row shows the ellipsoid model, at its current estimated parameters, superimposed on the original frames.

Who Developed It

This research was done by Sumit Basu, Irfan A. Essa, and Alex P. Pentland.


This material is based upon work supported in part by a National Science Foundation Graduate Fellowship. We also gratefully acknowledge our corporate sponsors, especially British Telecom, which has worked closely with us on parts of this project in terms of both research and funding.

Where To Get More Information

Download Vismod Technical Report #362, Motion Regularization for Model-based Head-Tracking, from our tech-reports page.

Ground Truth Sequence

To demonstrate the accuracy of our system, we generated a sequence for which the rigid parameters of the head were known exactly for each frame. Using computer animation, a texture-mapped head was moved around on a real background. This head was then tracked using the ellipsoidal model. The head was also tracked using a 2D planar model to demonstrate the advantages of the full 3D model. As you can see in the sequences, while the point-to-point correspondence (i.e., a point on the mesh to a point on the face) is good for both models, the 3D parameters are much more accurate for the ellipsoidal model (see the plots). Note the angles (alpha, beta, and gamma) in particular.
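One way such a comparison against ground truth could be quantified is a per-parameter RMS error over the sequence. The sketch below is an assumption for illustration only: the parameter ordering (alpha, beta, gamma, then three translations), the use of radians, and the function names angle_diff and rms_errors are hypothetical and not taken from the Kronos work.

```python
import numpy as np

def angle_diff(a, b):
    """Smallest signed difference a - b between angles in radians."""
    return (a - b + np.pi) % (2 * np.pi) - np.pi

def rms_errors(estimated, truth, n_angles=3):
    """Per-parameter RMS error over a sequence.

    Both inputs have shape (frames, 6); the first n_angles columns are
    assumed to be angles and are compared modulo 2*pi.
    """
    estimated = np.asarray(estimated, dtype=float)
    truth = np.asarray(truth, dtype=float)
    diff = estimated - truth
    diff[:, :n_angles] = angle_diff(estimated[:, :n_angles], truth[:, :n_angles])
    return np.sqrt(np.mean(diff ** 2, axis=0))
```

Calling rms_errors once with the ellipsoidal tracker's output and once with the planar tracker's output, against the same known parameters, would give two six-element vectors that can be compared column by column.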

Click on the following to see the corresponding QuickTime MOV movies:

the original synthetic sequence

tracking with the ellipsoidal model

tracking with the planar model

Click here to see plots of the rigid parameter values for the original sequence, the ellipsoidal model, and the planar model.

Real Sequences

Click on the following images to see QuickTime MOV movies of the corresponding sequences (with the ellipsoidal model superimposed on each frame).

30 FPS, HandyCam, 320x240 images

30 FPS, HandyCam, 320x240 images

30 FPS, HandyCam, 320x240 images

15 FPS, unknown camera type, 280x210 images

5 FPS, IndyCam, 90x90 images
