Toco the Toucan
MIT Media Laboratory
SIGGRAPH '97

A Synthetic Character with Speech and Vision

Toco the Toucan is a synthetic creature created at the MIT Media Laboratory. Toco combines speech recognition, computer vision, machine learning, and behavior-based animation to create an autonomous character who interacts with people using natural speech and gesture.

Speech Recognition and Learning
Toco employs speaker-independent phoneme recognition, based on hidden Markov models and artificial neural networks, to recognize spoken utterances. Using machine learning techniques, Toco can acquire new words on the fly and use them in subsequent interactions.
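To make the decoding step concrete, here is a minimal sketch of Viterbi decoding over a small phoneme HMM. The phoneme inventory, transition matrix, and per-frame likelihoods (standing in for neural-network outputs) are all invented for illustration; this is not Toco's actual recognizer.

```python
# A minimal Viterbi decoder over a hypothetical 3-phoneme HMM.
import numpy as np

PHONEMES = ["t", "ow", "k"]          # illustrative phoneme inventory

# Transition matrix A[i, j] = P(phoneme j at t+1 | phoneme i at t)
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.6, 0.4],
              [0.0, 0.0, 1.0]])
pi = np.array([1.0, 0.0, 0.0])       # utterances start in "t"

def viterbi(frame_likelihoods):
    """Return the most likely phoneme sequence for a likelihood matrix
    of shape (num_frames, num_phonemes), working in log probabilities."""
    T, N = frame_likelihoods.shape
    logA = np.log(A + 1e-12)
    delta = np.log(pi + 1e-12) + np.log(frame_likelihoods[0] + 1e-12)
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA          # (from, to) path scores
        back[t] = scores.argmax(axis=0)         # best predecessor per state
        delta = scores.max(axis=0) + np.log(frame_likelihoods[t] + 1e-12)
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):               # trace best path backward
        path.append(int(back[t][path[-1]]))
    return [PHONEMES[i] for i in reversed(path)]

# Fake per-frame likelihoods that decode to "t ow ow k"
obs = np.array([[0.8, 0.1, 0.1],
                [0.1, 0.8, 0.1],
                [0.1, 0.8, 0.1],
                [0.1, 0.1, 0.8]])
print(viterbi(obs))   # -> ['t', 'ow', 'ow', 'k']
```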

Computer Vision
Toco uses statistical models of human skin color to track a person's hands in real time, allowing him to understand simple hand gestures such as pointing. The current vision system uses a two-camera configuration and triangulation to recover 3-D depth information.
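The two techniques in this section can be sketched in a few lines: a Gaussian skin-color model in normalized chromaticity space, and depth recovery from stereo disparity (Z = fB/d). The color statistics and camera parameters below are hypothetical, not values from Toco's system.

```python
# Skin-color classification plus stereo triangulation, under assumed parameters.
import numpy as np

# Hypothetical skin-color statistics (mean and covariance in r-g chromaticity
# space), as would be estimated from labeled training pixels.
SKIN_MEAN = np.array([0.45, 0.31])
SKIN_COV = np.array([[0.0020, 0.0005],
                     [0.0005, 0.0010]])
SKIN_COV_INV = np.linalg.inv(SKIN_COV)

def is_skin(rgb, threshold=9.0):
    """Classify an (R, G, B) pixel as skin if its squared Mahalanobis
    distance to the skin-color Gaussian falls below a fixed threshold."""
    r, g, b = (float(c) for c in rgb)
    s = r + g + b + 1e-6
    x = np.array([r / s, g / s]) - SKIN_MEAN   # chromaticity residual
    d2 = x @ SKIN_COV_INV @ x                  # squared Mahalanobis distance
    return d2 < threshold

def depth_from_disparity(x_left, x_right, focal_px=500.0, baseline_m=0.1):
    """Recover depth Z = f * B / d from the horizontal disparity of a point
    seen in both cameras (rectified stereo assumed)."""
    d = x_left - x_right
    return focal_px * baseline_m / d

print(is_skin((200, 140, 110)))            # a likely skin tone -> True
print(depth_from_disparity(320.0, 295.0))  # 25 px disparity -> 2.0 m
```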

Behavior-Based Animation
Toco's internal control mechanisms are structured as a loose hierarchy of simple behaviors which interact with perceptual events and internal state variables to produce unpredictable and life-like behavior.
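As a rough illustration of this style of control, the sketch below arbitrates among a few simple behaviors by letting each score its own relevance from percepts and internal state. The behavior names and state variables are hypothetical, and a bit of noise in the idle behavior stands in for the unpredictability described above.

```python
# Behavior-based control sketch: the most relevant behavior wins each tick.
import random

class Behavior:
    def __init__(self, name, relevance, act):
        self.name, self.relevance, self.act = name, relevance, act

def tick(behaviors, percepts, state):
    """Pick and run the most relevant behavior for this animation frame."""
    winner = max(behaviors, key=lambda b: b.relevance(percepts, state))
    winner.act(state)
    return winner.name

state = {"boredom": 0.0}
behaviors = [
    Behavior("attend_to_speech",
             lambda p, s: 1.0 if p.get("heard_speech") else 0.0,
             lambda s: s.update(boredom=0.0)),
    Behavior("look_at_hand",
             lambda p, s: 0.8 if p.get("hand_visible") else 0.0,
             lambda s: s.update(boredom=0.0)),
    # Idle behavior gains relevance as boredom accumulates; a little noise
    # keeps the resulting motion from being fully predictable.
    Behavior("preen",
             lambda p, s: s["boredom"] + random.uniform(0.0, 0.2),
             lambda s: s.update(boredom=0.0)),
]

for percepts in [{"heard_speech": True}, {}, {}, {"hand_visible": True}]:
    state["boredom"] += 0.3        # internal drive grows each tick
    print(tick(behaviors, percepts, state))
```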

Deb Roy
dkroy@media.mit.edu
Project Lead, Speech Recognition

Tony Jebara
jebara@media.mit.edu
Computer Vision

Michal Hlavac
hlavac@media.mit.edu
Creature Architecture

Bill Tomlinson
badger@media.mit.edu
Graphics, Animation

Christopher Wren
wren@media.mit.edu
Computer Vision Systems

Prof. Alex Pentland
sandy@media.mit.edu
Faculty Advisor