The processing pipeline consists of a two-stage object detection and
alignment stage, a contrast normalization stage, and a feature
extraction stage whose output is used for both recognition and
coding. The figures above illustrate the operation of the detection
and alignment stage on a natural test image containing a human face.
The first step in this process is illustrated in "Estimated Head
Position and Scale", where the maximum-likelihood (ML) estimate of the
position and scale of the face is indicated by the cross-hairs and
bounding box. Once the face region has been identified, the estimated
scale and position are used to normalize for translation and scale,
yielding a standard ``head-in-the-box'' format image.
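As a concrete illustration of this normalization step, the sketch
below crops a window around the detected face and rescales it to a
fixed-size image. It is only a minimal sketch, not the system's
implementation: it assumes the detector returns a face center and an
approximate face width in pixels, the output size and margin are
arbitrary choices, and OpenCV is used only for resizing.

\begin{verbatim}
import numpy as np
import cv2  # OpenCV, used here only for resizing


def normalize_translation_and_scale(image, center_xy, face_width,
                                    out_size=128, margin=1.4):
    """Crop a square window around the estimated face center and rescale
    it to a fixed side length, giving a "head-in-the-box" style image.

    center_xy  : (x, y) estimated face center in pixels
    face_width : estimated face width in pixels (from the detected scale)
    out_size   : side length of the normalized output image (assumed value)
    margin     : amount of context kept around the face (assumed value)
    """
    x, y = int(round(center_xy[0])), int(round(center_xy[1]))
    half = max(1, int(round(0.5 * margin * face_width)))
    h, w = image.shape[:2]

    # Clamp the crop window to the image borders.
    x0, x1 = max(0, x - half), min(w, x + half)
    y0, y1 = max(0, y - half), min(h, y + half)
    crop = image[y0:y1, x0:x1]

    # Resizing to a fixed side length removes translation and scale
    # variation, so the second feature detection stage can operate at a
    # single scale.
    return cv2.resize(crop, (out_size, out_size),
                      interpolation=cv2.INTER_AREA)
\end{verbatim}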
A second feature detection stage then operates at this fixed scale to
estimate the positions of four facial features: the left and right
eyes, the tip of the nose, and the center of the mouth. Once the
facial features have been detected, the
face image is warped to align the geometry and shape of the face with
that of a canonical model. Then the facial region is extracted (by
applying a fixed mask) and subsequently normalized for contrast.
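To make these last steps concrete, the following sketch fits a
similarity transform from the four detected feature points to assumed
canonical positions, warps the image, applies a fixed binary mask, and
normalizes the contrast of the masked region to zero mean and unit
variance. Again this is only an illustration, not the system's
implementation: the canonical coordinates and the 128-pixel resolution
are placeholder values, and OpenCV supplies the transform estimation
and warping.

\begin{verbatim}
import numpy as np
import cv2

# Assumed canonical feature positions (x, y) in the 128x128 aligned
# frame; these coordinates are illustrative, not values from the system.
CANONICAL_POINTS = np.float32([[44, 52],    # left eye
                               [84, 52],    # right eye
                               [64, 76],    # nose tip
                               [64, 100]])  # mouth center


def align_and_normalize(face_img, feature_points, mask):
    """Warp a grayscale face so its detected features land on the
    canonical positions, keep the masked facial region, and normalize
    its contrast."""
    src = np.float32(feature_points)

    # Least-squares similarity fit (rotation, uniform scale, translation)
    # from the detected features to the canonical model geometry.
    M, _ = cv2.estimateAffinePartial2D(src, CANONICAL_POINTS)
    aligned = cv2.warpAffine(face_img, M, (128, 128))

    # Extract the facial region with a fixed binary mask (boolean 128x128).
    region = aligned[mask].astype(np.float32)

    # Contrast normalization: zero mean, unit variance over masked pixels.
    region = (region - region.mean()) / (region.std() + 1e-8)
    return aligned, region
\end{verbatim}

A similarity fit is used in this sketch because it preserves facial
proportions while removing in-plane rotation, scale, and translation;
a richer warp could be substituted where the shape of the face must
also be brought into correspondence with the canonical model.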