Having determined the locations of facial features in the image, it is now possible to define a number of windows on the face which will be used for template matching via SSD correlation [5]. Using a simple mapping, a set of windows is overlaid upon the face automatically from the data gathered in the face detection stage. A typical initialization result is shown in Figure 9. Eight tracking windows are initialized automatically on the nose, the mouth tips and the eyes, as shown. These windowed correlation trackers acquire templates from the image and minimize the SSD of the underlying image patch from one frame to the next. The image patches first undergo contrast and brightness compensation. Registration of the image patch from one frame to the next is accomplished by minimizing the normalized correlation error over translation, scaling and rotation parameters. A linear approximation of the behaviour of the image patch under small translation, scaling and rotation perturbations can be used to recover the motion of the image patch. Only simple linear computations are required (i.e. no explicit searching), rendering the computation quite efficient.
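The compensation and matching step can be sketched as follows; this is a minimal NumPy illustration, assuming zero-mean, unit-variance normalization as the contrast and brightness compensation (the original system's exact normalization may differ):

```python
import numpy as np

def compensate(patch):
    """Brightness/contrast compensation: normalize a window to zero mean
    and unit variance before comparing it between frames (assumed scheme)."""
    p = patch.astype(float)
    s = p.std()
    return (p - p.mean()) / (s if s > 0 else 1.0)

def ssd(template, patch):
    """Sum of squared differences between two compensated windows."""
    diff = compensate(template) - compensate(patch)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
template = rng.random((8, 8))
relit = 2.0 * template + 0.5   # same content under different gain and offset
other = rng.random((8, 8))     # unrelated content
```

With this normalization, `ssd(template, relit)` is essentially zero while `ssd(template, other)` is large, which is what lets the tracker match a window across lighting changes.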
Given an image $I(x, 0)$ at time 0, we wish to find the motion parameters $\mu$ that minimize the error $e(\mu)$ defined in Equation 5:

$$ e(\mu) = \sum_{x} \left[\, I(f(x,\mu),\, t) - I(x,\, 0) \,\right]^2 \qquad (5) $$

where $f(x,\mu)$ is a motion parametrized by vector $\mu$ which allows translation, rotation and scaling. In other words, $f(x,\mu) = s\,R(\theta)\,x + d$, with $\mu = (d_x, d_y, \theta, s)$.
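Under a similarity-transform reading of $f(x,\mu)$ (the explicit form and parameter ordering here are assumptions for illustration, not taken from the original system), the warp can be written as:

```python
import numpy as np

def f(x, mu):
    """2D similarity warp f(x, mu) = s * R(theta) @ x + d, with the
    assumed parameter ordering mu = (dx, dy, theta, s)."""
    dx, dy, theta, s = mu
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return s * (R @ np.asarray(x, dtype=float)) + np.array([dx, dy])

p = f([1.0, 0.0], (0.0, 0.0, np.pi / 2, 1.0))   # pure 90-degree rotation
```

The identity parameters $(0, 0, 0, 1)$ leave a point unchanged, and a quarter-turn maps $(1, 0)$ to $(0, 1)$, so the four parameters span exactly the translations, rotations and scalings the text describes.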
Solving for $\mu$ in an optimal sense is performed by computing the pseudo-inverse of a matrix composed of the motion templates. Such a solution for $\mu$ is only valid for small displacements, and smoothing is used to extend the applicable range of the solution.
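The linearized solve can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the motion templates are taken as finite-difference derivatives of the template patch with respect to each parameter of $\mu$, in the order (dx, dy, theta, s); the templates and conventions of the original implementation may differ.

```python
import numpy as np

def motion_templates(template):
    """Columns of M: the linearized change of the patch per unit of each
    motion parameter (dx, dy, theta, s), from finite-difference gradients."""
    t = template.astype(float)
    gy, gx = np.gradient(t)                      # d/drow (y), d/dcol (x)
    h, w = t.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    xs -= (w - 1) / 2.0                          # coordinates about window center
    ys -= (h - 1) / 2.0
    t_dx = gx                                    # x translation
    t_dy = gy                                    # y translation
    t_rot = -ys * gx + xs * gy                   # small in-plane rotation
    t_scale = xs * gx + ys * gy                  # small scale change
    return np.stack([m.ravel() for m in (t_dx, t_dy, t_rot, t_scale)], axis=1)

def solve_motion(template, patch):
    """One linear estimate (no search): mu = pinv(M) @ (patch - template).
    Also returns the leftover SSD, a cue for the estimate's reliability."""
    M = motion_templates(template)
    d = (patch.astype(float) - template.astype(float)).ravel()
    mu = np.linalg.pinv(M) @ d
    r = d - M @ mu
    return mu, float(r @ r)

# Hypothetical check: a Gaussian blob shifted right by 0.3 px is explained
# by resampling the template 0.3 px to the left, so mu recovers dx of -0.3.
ys, xs = np.mgrid[0:21, 0:21].astype(float)
blob = lambda cx: np.exp(-((xs - cx) ** 2 + (ys - 10.0) ** 2) / (2 * 3.0 ** 2))
mu, res = solve_motion(blob(10.0), blob(10.3))
```

Because the model is linear only near the identity, the recovered $\mu$ degrades as the true displacement grows, which is exactly why the text restricts the solution to small displacements.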
The minimum value of $e(\mu)$ is also recovered by the process, which gives us a cue for the reliability of the resulting optimal $\mu$.
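For instance, the recovered minimum of $e(\mu)$ can gate each tracker's update. The following is a hypothetical sketch (the threshold value, per-pixel normalization and bookkeeping are illustrative assumptions, not taken from the original system):

```python
# Gate a tracker update by the residual SSD: when even the best 2D fit
# leaves a large error, the window is likely corrupted (occlusion,
# out-of-plane rotation, noise) and its estimate should be distrusted.
LOST_THRESHOLD = 0.5   # assumed units: residual SSD per pixel

def update_track(state, mu, residual_ssd, n_pixels):
    per_pixel = residual_ssd / n_pixels
    if per_pixel > LOST_THRESHOLD:
        state["lost"] = True          # hold position; await external help
    else:
        state["mu"] = mu              # accept the optimal motion estimate
    return state
```

A low residual accepts the estimate; a high residual flags the tracker as lost rather than letting a bad $\mu$ corrupt the trajectory.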
Unfortunately, minimizing $e(\mu)$ over rotations, scaling and translations cannot account for other 3D or complex changes in the image region. Such changes might be induced by 3D out-of-plane rotations, occlusions or noise and could easily mislead the estimate of $\mu$.
Thus, the correlation window typically loses track of the feature being tracked if it undergoes excessive change beyond the span of the 2D motion model. In addition, due to the local nature of the tracking algorithm, it would be extremely unlikely for feature tracking to recover from this failure without external assistance. Even if multiple features are being tracked, without a strong coupling between them feature tracking will eventually fail. As unpredictable effects such as 3D structure, occlusion and noise interfere with the 2D tracking, each of the feature trackers will stray off in turn and yield invalid spatial trajectories.
What is desired is a global framework that overcomes some of the difficulties inherent in simple 2D tracking by coupling the individual trackers to a global 3D structure. The outputs of the trackers are integrated appropriately to achieve a global explanation of the scene which can be fed back to constrain their individual behaviour and avoid feature loss.