
Introduction

The aim of this research is to extract and process facial features from natural conversation sequences in order to model the emotional or cognitive states that generate different expressions. It is crucial that the system be capable of dealing with the unconstrained nature of real-life data. We divide our task into four parts: (i) data collection, (ii) head tracking and initial normalization, (iii) robust feature extraction, and (iv) temporal modeling of multi-level expressions. The data collection process is designed to let interactions flow naturally. The system starts by performing initial normalization and alignment on the recorded data using a 3D model-based head tracker. However, this normalization and alignment is at best approximate and always suffers from errors in rotation, translation and scale; we have found no head tracker that can provide sub-pixel-accurate tracking for extended periods on medium-resolution video of natural, completely unconstrained head motion. It is therefore important to select features that are robust against scale changes and imprecise alignment of the input image, and that are stable and consistent over time. We were inspired by the performance of the local receptive field histograms for object recognition originally developed by Schiele and Crowley [12]. We extend the local histogram approach to capture fine-scale changes in facial features and to be suitable for building temporal models with Hidden Markov Models.
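The robustness property relied on above can be illustrated with a minimal sketch of the local-histogram idea of Schiele and Crowley: histogram a receptive-field response (here gradient orientation is assumed as the response, purely for illustration; the actual features used in this work are different) over a grid of local patches, so that small misalignments within a patch leave the descriptor nearly unchanged. The function names and parameters below are hypothetical, not from the original system.

```python
import numpy as np

def local_receptive_field_histograms(image, patch=8, bins=8):
    """Histogram a local filter response (gradient orientation, as an
    illustrative stand-in for a receptive-field response) over a grid of
    non-overlapping patches.  Because each histogram discards the spatial
    layout inside its patch, the descriptor is largely insensitive to
    small translations -- the robustness to imprecise alignment that the
    text calls for."""
    gy, gx = np.gradient(image.astype(float))
    orientation = np.arctan2(gy, gx)          # response values in [-pi, pi]
    h, w = image.shape
    hists = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            block = orientation[r:r + patch, c:c + patch]
            hist, _ = np.histogram(block, bins=bins, range=(-np.pi, np.pi))
            hists.append(hist / hist.sum())   # normalize each patch histogram
    return np.array(hists)                    # shape: (num_patches, bins)

def histogram_intersection(p, q):
    """Mean per-patch histogram intersection: 1.0 means identical."""
    return np.minimum(p, q).sum(axis=1).mean()
```

As a usage example, comparing an image against itself gives an intersection score of 1.0, while a slightly shifted copy still scores high, whereas a pixel-wise difference would change substantially under the same shift.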

Most work on automatic understanding of facial expressions has focused on classification of the universal expressions defined by Ekman [7]: sadness, anger, fear, disgust, surprise, happiness and contempt. Algorithms have accordingly been tailored towards recognizing these universal expressions from static images or video sequences [4,8,14]. More recently, some work has addressed recognition of the individual action units that measure muscle activity, proposed by Ekman as the basis for the Facial Action Coding System (FACS) [1,5,6]. All of the experiments and models built for facial actions or expressions require precise image registration and, in some cases, temporal alignment [6]. The image sequences used in these experiments depict very discrete and clean examples of specific action units or expressions, which are almost impossible to find in natural, unconstrained interactions.


Tanzeem Choudhury
2000-01-21