Smart Spaces
"Smart spaces" are ordinary environments equipped with
visual and audio sensing systems that can perceive and
react to people without requiring them to wear any special
equipment. The "smart desk" research project consists of a
self-calibrating visual system for 3D motion capture of a
person, a face-tracking and facial-expression analysis
system, and an audio-based expression and speech
analysis system. Using these sensing systems, prototype
applications have been developed for literal and
non-literal physical and expressive animation of virtual
characters that the user can interact with in real time.
"Smart Spaces" was a booth in
1996
Siggraph
Digital Bayou
.
The Bayou was a haven away from the vendor floor intended to
showcase interesting research in graphics and interaction.
The Smart Spaces demonstration was built on top of a diverse set of
technologies, and each application combines some subset of these tools to
solve a unique challenge. Brief illustrative sketches of several of these
components follow the list:
- Vision: Head and hands are tracked as flesh-colored blobs in real time
  using color image processing. The three-dimensional position of the head
  and hands is estimated from blob correspondences among multiple cameras;
  position, orientation, and approximate size are extracted. Blob features
  allow self-calibration of the cameras by tracking the head and hands over
  a short time period, so painstaking manual calibration is unnecessary.
- Audition: Segmentation extracts speech events from background noise, and
  prosody analysis determines the pitch, volume, and timing of speech.
  Pitch, volume, and timing are used to synthesize "wah wah" utterances by
  manipulating a sampled bugle note, giving the system the ability to mimic
  speech, and to drive other aspects of expressive behavior.
- Facial Expression: The face is tracked by a computer-controlled
  pan/tilt/zoom camera using a statistical description of skin color, and
  the mouth is detected using a learned statistical model. Face orientation
  and mouth shape are extracted, and these facial parameters control the
  expression and head orientation of an animated character.
- Gesture Recognition: The position of the hands over time is statistically
  modeled; the models are computed automatically from a set of example
  motions from several people. The input stream is recognized by computing
  a quantitative score of how similar the input is to the stored models,
  and the user then gets feedback showing how the input motion differs from
  a given model.
- Dynamic Simulation: An animated character is controlled by a dynamic
  model that reacts to several potential fields: the position of the head
  and hands, gravity, and behavioral priors. Behavioral priors let the
  animation go beyond purely physical simulation, for example producing
  correct elbow placement and, in the future, capturing more general habits
  of the user that constrain motion.
- Performance Animation: The mapping from head and hands to an animated
  character is set interactively by showing the system example
  correspondences; interpolation is then used to drive the animation. The
  mapping can also be chosen on the fly from a set of existing mappings by
  recognizing features of the head and hand motion that are consistent with
  a particular mapping.
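The flesh-colored blob tracking above (and the skin-color face tracking
under Facial Expression) can be illustrated with a simple statistical color
model: each pixel is scored by its Mahalanobis distance to a Gaussian model
of skin color in chromaticity space, and a blob is summarized by the mean
and covariance of the pixels it claims. The Python/NumPy sketch below is
only an illustration of that idea; the mean, covariance, and threshold are
placeholder values, not the calibrated model used in the actual system.

    import numpy as np

    # Hypothetical skin-color model in normalized (r, g) chromaticity space.
    # These numbers are placeholders, not the system's calibrated values.
    SKIN_MEAN = np.array([0.45, 0.31])
    SKIN_COV = np.array([[0.0020, 0.0005],
                         [0.0005, 0.0010]])
    SKIN_COV_INV = np.linalg.inv(SKIN_COV)
    THRESHOLD = 9.0   # squared Mahalanobis distance cutoff (about 3 sigma)

    def skin_mask(image):
        """Classify pixels as skin by Mahalanobis distance in chromaticity space.

        image: H x W x 3 float array of RGB values in [0, 1].
        """
        rgb_sum = image.sum(axis=2) + 1e-6
        chroma = np.stack([image[..., 0] / rgb_sum,        # normalized red
                           image[..., 1] / rgb_sum], -1)   # normalized green
        diff = chroma - SKIN_MEAN
        d2 = np.einsum('...i,ij,...j->...', diff, SKIN_COV_INV, diff)
        return d2 < THRESHOLD

    def blob_statistics(mask):
        """Summarize a binary mask as a blob: centroid and spatial covariance.

        A real tracker would first split the mask into connected components
        (head, left hand, right hand); this sketch treats it as one blob.
        """
        ys, xs = np.nonzero(mask)
        if len(xs) < 2:
            return None
        pts = np.stack([xs, ys], axis=1).astype(float)
        centroid = pts.mean(axis=0)
        cov = np.cov(pts, rowvar=False)   # orientation and approximate size
        return centroid, cov

Blob centroids like these, observed in two or more cameras over a short
period, supply the point correspondences that make the self-calibration
described above possible.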
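The prosody analysis under Audition reduces speech to pitch, volume, and
timing. A minimal sketch of one way to do that, assuming a monophonic 1-D
signal, short-time RMS energy for volume, an autocorrelation peak for
pitch, and an energy gate to segment speech from background noise; the
frame size and thresholds are illustrative assumptions, not the system's
actual parameters.

    import numpy as np

    FRAME = 1024          # samples per analysis frame (assumed)
    ENERGY_GATE = 0.02    # RMS level separating speech from background (assumed)

    def frame_pitch(frame, sample_rate, fmin=80.0, fmax=400.0):
        """Estimate pitch of one frame from the autocorrelation peak within
        the expected voice range; returns 0.0 for unvoiced frames."""
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
        lo = int(sample_rate / fmax)
        hi = min(int(sample_rate / fmin), len(ac) - 1)
        if hi <= lo or ac[0] <= 0:
            return 0.0
        lag = lo + np.argmax(ac[lo:hi])
        return sample_rate / lag if ac[lag] > 0.3 * ac[0] else 0.0

    def prosody(signal, sample_rate):
        """Return (time, volume, pitch) per frame, gating out quiet
        background frames as non-speech."""
        signal = np.asarray(signal, dtype=float)
        results = []
        for start in range(0, len(signal) - FRAME, FRAME):
            frame = signal[start:start + FRAME]
            volume = float(np.sqrt(np.mean(frame ** 2)))
            if volume < ENERGY_GATE:
                continue                       # background noise: skip
            pitch = frame_pitch(frame, sample_rate)
            results.append((start / sample_rate, volume, pitch))
        return results

A stream of (time, volume, pitch) triples like this is enough to reshape a
sampled bugle note into the "wah wah" mimicry described above, or to drive
other expressive parameters.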
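The gesture recognizer compares a hand trajectory against models learned
from example motions and reports a quantitative score, plus feedback on
where the input deviates. A minimal sketch, assuming trajectories are
resampled to a fixed length and modeled with a per-frame mean and standard
deviation over the examples; the actual system's models and features are
not specified here.

    import numpy as np

    def resample(trajectory, length=50):
        """Resample a T x D trajectory of hand positions to a fixed length."""
        trajectory = np.asarray(trajectory, dtype=float)
        t_old = np.linspace(0.0, 1.0, len(trajectory))
        t_new = np.linspace(0.0, 1.0, length)
        return np.stack([np.interp(t_new, t_old, trajectory[:, d])
                         for d in range(trajectory.shape[1])], axis=1)

    def learn_model(examples, length=50):
        """Per-frame mean and spread over several people's example motions."""
        stack = np.stack([resample(e, length) for e in examples])  # N x L x D
        return stack.mean(axis=0), stack.std(axis=0) + 1e-3

    def score(model, trajectory):
        """Quantitative similarity score (mean squared z-score, lower is
        better) and per-frame deviations usable as feedback to the user."""
        mean, std = model
        x = resample(trajectory, len(mean))
        z = (x - mean) / std
        per_frame = (z ** 2).sum(axis=1)   # where the input differs from the model
        return float(per_frame.mean()), per_frame

The per-frame deviation is the kind of feedback mentioned above: it shows
the user which part of the motion differed from the stored model, as in the
T'ai Chi Teacher application below.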
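The dynamic simulation treats the character as a physical system pulled by
several potential fields: attraction toward the sensed head and hand
positions, gravity, and behavioral priors such as a preferred joint
placement. A minimal sketch for a single character point (say, an elbow)
modeled as a damped particle; the gains, time step, and rest pose are all
illustrative assumptions.

    import numpy as np

    DT = 1.0 / 30.0        # simulation step matching a 30 Hz sensing rate (assumed)
    GRAVITY = np.array([0.0, -9.8, 0.0])

    def step(pos, vel, targets, rest_pose,
             k_target=40.0, k_prior=5.0, damping=4.0):
        """Advance one particle of the character by one time step.

        targets:   sensed 3D points (head, hands) that attract the particle
        rest_pose: behavioral prior, e.g. a natural elbow position
        """
        pos = np.asarray(pos, dtype=float)
        vel = np.asarray(vel, dtype=float)
        force = GRAVITY.copy()
        for t in targets:
            force += k_target * (np.asarray(t) - pos)   # pull toward sensed blobs
        force += k_prior * (np.asarray(rest_pose) - pos)  # behavioral prior
        force -= damping * vel                           # keep the motion stable
        vel = vel + DT * force
        pos = pos + DT * vel
        return pos, vel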
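The performance-animation mapping is defined by example correspondences
between the performer's pose and the character's pose, with interpolation
in between. A minimal sketch using radial-basis-function interpolation over
the examples; the kernel, its width, and the structure of the pose vectors
are assumptions, not the system's actual representation.

    import numpy as np

    class ExampleMapping:
        """Map head-and-hands features to character parameters by interpolating
        between user-provided example correspondences."""

        def __init__(self, body_examples, character_examples, width=0.5):
            self.X = np.asarray(body_examples, dtype=float)       # N x d_in
            Y = np.asarray(character_examples, dtype=float)       # N x d_out
            self.width = width
            K = self._kernel(self.X)                               # N x N
            self.weights = np.linalg.solve(K + 1e-6 * np.eye(len(K)), Y)

        def _kernel(self, X):
            # Gaussian radial-basis kernel between poses X and the examples.
            d2 = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2.0 * self.width ** 2))

        def __call__(self, body_pose):
            """Interpolated character parameters for a new body pose."""
            k = self._kernel(np.asarray(body_pose, dtype=float)[None, :])
            return (k @ self.weights)[0]

Selecting among several stored mappings on the fly could then be done by
scoring the incoming head-and-hand motion against features associated with
each mapping, much as in the gesture-recognition sketch above.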
A large number of applications were showcased at the Digital Bayou to
demonstrate the flexibility of the underlying technologies. The applications covered a wide range of
domains, including animation by example, gesture understanding, education,
entertainment, and information retrieval:
- Whacka Game: Whack the wuggles and pop the bubbles! The puppet character
  follows your movements and keeps score. Uses vision technology. Video
  available.
- Waldorf: Waldorf mimics your movement, voice, and facial expression.
  Eerie, huh? Uses vision, facial expression, audition, and dynamic
  simulation technologies.
- Luxo Lamp: Animate the Luxo lamp. Inspired by how people gesture when
  describing actions. Uses vision and performance animation technologies.
- T'ai Chi Teacher: Watch the master as he performs t'ai chi, try the moves
  yourself, then watch as the master rates your performance. Uses vision
  and gesture recognition technologies.
- Seagull: First show the bird how you flap your wings, then take control
  and soar over the landscape. Uses vision and performance animation
  technologies.
- Netspace: Navigate a three-dimensional web-space with body movements and
  voice commands. Uses vision and speech recognition technologies.
- Text Actor: Choreograph a dynamic typographic actor with your voice. Uses
  audition and speech recognition technologies.
Christopher R. Wren,
wren@media.mit.edu
Last modified: Mon Dec 30 14:50:42 EST 1996