The aim of this research is to extract and process facial features
from natural conversation sequences for the purpose of modeling
emotional or cognitive states that generate different expressions. It
is crucial that the system be capable of dealing with the
unconstrained nature of real-life data. We divide our task into four
parts: (i) data collection, (ii) head tracking and initial
normalization, (iii) robust feature extraction, and (iv) temporal
modeling of multi-level expressions. The data collection process is
designed to
allow the natural flow of interactions. The system starts by
performing initial normalization and alignment on the recorded data
using a 3D model-based head tracker. However, the normalization and
alignment are at best approximate and always suffer from errors in
rotation, translation, and scale. We have found no head tracker that
can provide sub-pixel accurate tracking for extended periods on
medium-resolution video of natural, completely unconstrained head
motion. Thus, it is important to select features that are robust
against scale changes and failures of precise alignment of the input
image, and which are stable and consistent over time. We were inspired
by the performance of the local receptive field histograms for object
recognition originally developed by Schiele and Crowley
[12]. We extend the local-histogram approach to capture
fine-scale changes in facial features and to make it suitable for
building temporal models with Hidden Markov Models (HMMs).
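The appeal of local histograms for this setting is that a histogram computed over a small receptive field discards the exact pixel positions of its responses, so small residual misalignments from the head tracker do not change the feature much. A minimal sketch of the idea, assuming a hypothetical gradient-magnitude response and illustrative patch and bin sizes (not the authors' exact features):

```python
import numpy as np

def local_receptive_field_histograms(image, patch_size=8, n_bins=16):
    """Tile the image into patches and return one normalized histogram
    of local filter responses per patch.

    Illustrative sketch: gradient magnitude stands in for the local
    receptive field response; patch_size and n_bins are assumptions.
    """
    # Simple local response: per-pixel gradient magnitude.
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)

    h, w = mag.shape
    features = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = mag[y:y + patch_size, x:x + patch_size]
            # Histogram over the patch: positions within the patch are
            # discarded, giving tolerance to small misalignments.
            hist, _ = np.histogram(patch, bins=n_bins,
                                   range=(0.0, mag.max() + 1e-8))
            features.append(hist / hist.sum())  # normalize per patch
    return np.array(features)
```

The resulting per-patch histograms can be concatenated into one feature vector per frame; a sequence of such vectors is the kind of observation stream a Hidden Markov Model can then be trained on.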
Most work in automatic understanding of facial expressions has focused
on classification of the universal expressions defined by Ekman
[7]. These expressions are sadness, anger, fear, disgust,
surprise, happiness and contempt. Thus, the algorithms were tailored
towards building models to recognize the universal expressions from
static images or video sequences
[4,8,14]. Recently, work has been done on
recognizing individual action units that measure muscle action,
proposed by Ekman as the basis for the Facial Action Coding System
(FACS) [1,5,6]. All of the experiments and models
built for facial actions or expressions require precise image
registration and in some cases temporal alignment [6]. The
image sequences used for these experiments depict discrete, clean
examples of specific action units or expressions that are almost
impossible to find in natural, unconstrained interactions.
Tanzeem Choudhury
2000-01-21