Kronos: Model-Based Head Tracking
What It Does
Kronos is a system that tracks the rigid motion of heads in 3D from a
single 2D camera view. It accurately recovers the 3D translations and
rotations of the head (see the Ground Truth Sequence section below)
and is stable over hundreds of frames.
How It Does It
Kronos first automatically fits a 3D ellipsoid to the head in the
first frame of the sequence using the feature positions produced by
the modular eigenspaces work of Baback
Moghaddam and Alex P. Pentland.
To compute the motion of the model from the current to the next frame,
the optical flow between the frames is first computed. The six rigid
parameters (three rotations, three translations) of the ellipsoid are
then iterated about their current position. The "model flow" is
defined as the flow resulting from moving the model from its current
parameters to a given set of iterated parameters. A robust error norm
is used to compare this model flow with the actual optical flow. The
set of parameters with the (locally) smallest error is chosen as the
model parameters for the next frame. This process is then continued
for the next set of frames.
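The per-frame update described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Kronos implementation: the pinhole projection and its focal length, the Rz*Ry*Rx rotation order, the Geman-McClure function standing in for the unspecified robust error norm, the coordinate-wise search, and every function name are hypothetical choices made for the example. Visibility of the ellipsoid surface is ignored for brevity.

```python
import math

def rotation(alpha, beta, gamma):
    """Rz(alpha) * Ry(beta) * Rx(gamma); the composition order is an assumption."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]

def ellipsoid_points(rx, ry, rz, n=8):
    """Sample points on an axis-aligned ellipsoid centred at the origin."""
    pts = []
    for i in range(1, n):
        theta = math.pi * i / n
        for j in range(n):
            phi = 2.0 * math.pi * j / n
            pts.append((rx * math.sin(theta) * math.cos(phi),
                        ry * math.sin(theta) * math.sin(phi),
                        rz * math.cos(theta)))
    return pts

def project(p, params, focal=500.0):
    """Rigidly move model point p, then apply an assumed pinhole projection."""
    alpha, beta, gamma, tx, ty, tz = params
    R = rotation(alpha, beta, gamma)
    x, y, z = (sum(R[i][k] * p[k] for k in range(3)) for i in range(3))
    return focal * (x + tx) / (z + tz), focal * (y + ty) / (z + tz)

def model_flow(points, current, candidate):
    """Image motion of each model point when the parameters change."""
    flow = []
    for p in points:
        u0, v0 = project(p, current)
        u1, v1 = project(p, candidate)
        flow.append((u1 - u0, v1 - v0))
    return flow

def robust_error(model, observed, sigma=1.0):
    """Geman-McClure norm (assumed): large residuals saturate instead of dominating."""
    total = 0.0
    for (mu, mv), (ou, ov) in zip(model, observed):
        r2 = (mu - ou) ** 2 + (mv - ov) ** 2
        total += r2 / (r2 + sigma ** 2)
    return total

def track_step(points, current, observed, steps, sweeps=20):
    """Perturb each of the six parameters about its current value and keep
    any change that lowers the error; halve the steps when nothing helps."""
    best = list(current)
    best_err = robust_error(model_flow(points, current, best), observed)
    for _ in range(sweeps):
        improved = False
        for i in range(6):
            for delta in (steps[i], -steps[i]):
                trial = list(best)
                trial[i] += delta
                err = robust_error(model_flow(points, current, trial), observed)
                if err < best_err:
                    best, best_err, improved = trial, err, True
        if not improved:
            steps = [s / 2.0 for s in steps]
    return best, best_err
```

Here observed would hold the measured optical flow sampled at the projected model points, and the parameters returned by track_step would become the current parameters for the next frame. Like the search described above, this finds only a locally smallest error.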
The images at the top of this page show several key frames from a
hundred-frame sequence. The first row of images shows the original
input frames. The second row shows the ellipsoid model with the
current estimated parameters superimposed on the original frames.
Who Developed It
This research was done by Sumit
Basu, Irfan A. Essa, and Alex P. Pentland.
Acknowledgements
This material is based upon work supported in part by a National
Science Foundation Graduate Fellowship. We also gratefully
acknowledge our corporate sponsors, especially British Telecom, which
has worked closely with us on parts of this project in terms of both
research and funding.
Where To Get More Information
Download Vismod Technical Report #362, Motion Regularization for
Model-based Head-Tracking, from our
tech-reports page.
Ground Truth Sequence
To demonstrate the accuracy of our system, we generated a sequence for
which the rigid parameters of the head were known exactly for each
frame. Using computer animation, a texture-mapped head was moved
around on a real background. This head was then tracked using the
ellipsoidal model. The head was also tracked using a 2D planar model
to demonstrate the advantages of the full 3D model. As you can see
in the sequences, while the point to point correspondence (i.e., a
point on the mesh to a point on the face) is good for both models, the
3D parameters are much more accurate for the ellipsoidal model (see
the plots). Note the angles (alpha, beta, and gamma) in particular.
Click on the following to see the corresponding QuickTime MOV movies:
- the original synthetic sequence
- tracking with the ellipsoidal model
- tracking with the planar model
Click here to see plots of each of the
following rigid parameter values for the original sequence, the
ellipsoidal model, and the planar model:
- alpha (rotation about the object's z axis)
- beta (rotation about the object's y axis)
- gamma (rotation about the object's x axis)
- x (translation along the global x axis)
- y (translation along the global y axis)
- z (translation along the global z axis)
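A comparison like the one in these plots can also be summarized numerically. The sketch below computes a root-mean-square error per rigid parameter between a ground-truth track and an estimated track. It is a hypothetical helper, not part of Kronos: the per-frame dictionary layout, the function names, and the use of radians for the angles are all assumptions made for the example.

```python
import math

PARAMS = ("alpha", "beta", "gamma", "x", "y", "z")
ANGLES = {"alpha", "beta", "gamma"}

def angle_diff(a, b):
    """Signed difference a - b in radians, wrapped to [-pi, pi)."""
    return (a - b + math.pi) % (2.0 * math.pi) - math.pi

def rms_errors(truth, estimate):
    """RMS error per parameter; each track is a list of per-frame dicts
    keyed by the names in PARAMS (an assumed layout)."""
    if not truth or len(truth) != len(estimate):
        raise ValueError("tracks must be non-empty and of equal length")
    errors = {}
    for name in PARAMS:
        total = 0.0
        for t, e in zip(truth, estimate):
            if name in ANGLES:
                d = angle_diff(e[name], t[name])  # avoid 2*pi wrap-around artifacts
            else:
                d = e[name] - t[name]
            total += d * d
        errors[name] = math.sqrt(total / len(truth))
    return errors
```

Calling rms_errors once with the ellipsoidal track and once with the planar track would give six numbers per model that can be compared side by side.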
Real Sequences
Click on the following images to see QuickTime MOV movies of the
corresponding sequences (with the ellipsoidal model superimposed on
each frame).
- Sumit: 30 FPS, HandyCam, 320x240 images
- Irfan: 30 FPS, HandyCam, 320x240 images
- Chris: 30 FPS, HandyCam, 320x240 images
- Yaser: 15 FPS, unknown camera type, 280x210 images
- Sumit: 5 FPS, IndyCam, 90x90 images