Applications such as video databases, wireless virtual reality interfaces, smart rooms, very-low-bandwidth video compression, and security monitoring all have in common the need to track and interpret human action. The ability to find and follow people's head, hands, and body is therefore an important visual problem.
To address this need we have developed a real-time system called Pfinder
(``person finder'') that substantially solves the problem for arbitrarily
complex but single-person, fixed-camera situations.
The system runs at interactive rates on general-purpose hardware and has
performed reliably when tested on thousands of people in several
installations around the world.
Pfinder has been used as a real-time interface device for information spaces [19], performance spaces [22], video games [18], and a distributed virtual reality populated by artificial life [6]. It has also been used as a pre-processor for gesture recognition systems, including one that can recognize a forty-word subset of American Sign Language with near-perfect accuracy [20].
Pfinder adopts a Maximum A Posteriori Probability (MAP) approach
to detection and tracking of the human body using simple 2-D models. It incorporates a priori knowledge about people
primarily to bootstrap itself and to recover from errors. The central
tracking and description algorithms, however, can equally well be
applied to tracking vehicles or animals, and in fact we have done
informal experiments in these areas. Pfinder is a descendant of the
vision routines originally developed for the ALIVE system
[7], which performed person tracking but had no explicit
model of the person and required a controlled background. Pfinder is a
more general, and more accurate, method for person segmentation, tracking,
and interpretation.
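The MAP formulation above can be illustrated with a small sketch: each body part and the background are modeled as Gaussian "blob" classes over a joint color-and-position feature space, and each pixel is assigned to the class maximizing log prior plus log likelihood. The feature layout, class parameters, and function names below are illustrative assumptions, not Pfinder's actual implementation.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian at feature vector x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.inv(cov) @ d
                   + logdet + len(x) * np.log(2 * np.pi))

def map_classify(x, classes):
    """MAP rule: pick the class maximizing log prior + log likelihood."""
    scores = [np.log(c["prior"]) + log_gaussian(x, c["mean"], c["cov"])
              for c in classes]
    return int(np.argmax(scores))

# Two illustrative classes in a joint (Y, U, V, px, py) feature space:
# a compact, brightly colored "hand" blob and a diffuse scene background.
classes = [
    {"prior": 0.1,                                     # hand blob
     "mean": np.array([0.8, 0.5, 0.6, 100.0, 120.0]),
     "cov": np.diag([0.01, 0.01, 0.01, 50.0, 50.0])},
    {"prior": 0.9,                                     # background
     "mean": np.array([0.2, 0.5, 0.5, 160.0, 120.0]),
     "cov": np.diag([0.02, 0.02, 0.02, 5000.0, 5000.0])},
]

# A pixel whose color and position lie near the hand blob's mean.
pixel = np.array([0.78, 0.5, 0.6, 102.0, 118.0])
print(map_classify(pixel, classes))  # → 0 (assigned to the hand blob)
```

In a full tracker the class means and covariances would be updated frame to frame as the blobs move, which is what turns this per-pixel classification into tracking.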