The processing pipeline consists of a two-stage object detection and
alignment stage, a contrast normalization stage, and a feature
extraction stage whose output is used for both recognition and
coding. The figures above illustrate the operation of the detection
and alignment stage on a natural test image containing a human face.
The first step in this process is illustrated in "Estimated Head
Position and Scale", where the maximum-likelihood (ML) estimate of the
position and scale of the face is indicated by the cross-hairs and
bounding box. Once the face region has been identified, the estimated
scale and position are used to normalize for translation and scale,
yielding a standard ``head-in-the-box'' format image.
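As a concrete illustration of this normalization step, the sketch
below crops a window around the detected face and rescales it to a
fixed-size image. It is only a minimal sketch, not the system's
implementation: it assumes the detector returns a face center and an
approximate face width in pixels, the output size and margin are
arbitrary choices, and OpenCV is used only for resizing.

\begin{verbatim}
import numpy as np
import cv2  # OpenCV, used here only for resizing


def normalize_translation_and_scale(image, center_xy, face_width,
                                    out_size=128, margin=1.4):
    """Crop a square window around the estimated face center and rescale
    it to a fixed side length, giving a "head-in-the-box" style image.

    center_xy  : (x, y) estimated face center in pixels
    face_width : estimated face width in pixels (from the detected scale)
    out_size   : side length of the normalized output image (assumed value)
    margin     : amount of context kept around the face (assumed value)
    """
    x, y = int(round(center_xy[0])), int(round(center_xy[1]))
    half = max(1, int(round(0.5 * margin * face_width)))
    h, w = image.shape[:2]

    # Clamp the crop window to the image borders.
    x0, x1 = max(0, x - half), min(w, x + half)
    y0, y1 = max(0, y - half), min(h, y + half)
    crop = image[y0:y1, x0:x1]

    # Resizing to a fixed side length removes translation and scale
    # variation, so the second feature detection stage can operate at a
    # single scale.
    return cv2.resize(crop, (out_size, out_size),
                      interpolation=cv2.INTER_AREA)
\end{verbatim}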
A second feature detection stage then operates at this fixed scale to
estimate the positions of four facial features: the left and right
eyes, the tip of the nose, and the center of the mouth. Once the
facial features have been detected, the
face image is warped to align the geometry and shape of the face with
that of a canonical model. Then the facial region is extracted (by
applying a fixed mask) and subsequently normalized for contrast.
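To make these last steps concrete, the following sketch fits a
similarity transform from the four detected feature points to assumed
canonical positions, warps the image, applies a fixed binary mask, and
normalizes the contrast of the masked region to zero mean and unit
variance. Again this is only an illustration, not the system's
implementation: the canonical coordinates and the 128-pixel resolution
are placeholder values, and OpenCV supplies the transform estimation
and warping.

\begin{verbatim}
import numpy as np
import cv2

# Assumed canonical feature positions (x, y) in the 128x128 aligned
# frame; these coordinates are illustrative, not values from the system.
CANONICAL_POINTS = np.float32([[44, 52],    # left eye
                               [84, 52],    # right eye
                               [64, 76],    # nose tip
                               [64, 100]])  # mouth center


def align_and_normalize(face_img, feature_points, mask):
    """Warp a grayscale face so its detected features land on the
    canonical positions, keep the masked facial region, and normalize
    its contrast."""
    src = np.float32(feature_points)

    # Least-squares similarity fit (rotation, uniform scale, translation)
    # from the detected features to the canonical model geometry.
    M, _ = cv2.estimateAffinePartial2D(src, CANONICAL_POINTS)
    aligned = cv2.warpAffine(face_img, M, (128, 128))

    # Extract the facial region with a fixed binary mask (boolean 128x128).
    region = aligned[mask].astype(np.float32)

    # Contrast normalization: zero mean, unit variance over masked pixels.
    region = (region - region.mean()) / (region.std() + 1e-8)
    return aligned, region
\end{verbatim}

A similarity fit is used in this sketch because it preserves facial
proportions while removing in-plane rotation, scale, and translation;
a richer warp could be substituted where the shape of the face must
also be brought into correspondence with the canonical model.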