SmartDesk Home Page
Introduction
SmartDesk is a project of the Perceptual Computing group at the MIT
Media Lab and encompasses experimentation on a range of computer-based
perceptual input and output systems in a personal work environment.
Hardware
The hardware consists of visual input (wide-baseline stereo,
pan-tilt-zoom camera), visual output (large screen graphics
display), audio input (phased-array microphone), and
audio output (stereo loudspeakers). Other sensor modalities
are also possible.
The wide-baseline stereo is used for visually tracking the macroscopic
movements of the user. The foveating (pan-tilt-zoom) camera is used
to obtain high-resolution images of an area of interest, based on
various attention-focusing algorithms. The phased-array microphone
picks up audio from a direction of interest, usually the user's head,
and the loudspeakers generate transaural spatial audio.
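At its core, steering a microphone array toward a direction of interest is delay-and-sum beamforming: delay each channel so that sound arriving from the chosen direction lines up across the array, then average. The sketch below shows this standard technique in Python; the array geometry, sample rate, and all names are illustrative assumptions, not details of the SmartDesk hardware.

```python
import math

# Assumed constants; the SmartDesk page does not give the actual array spec.
SPEED_OF_SOUND = 343.0   # m/s
SAMPLE_RATE = 16000      # Hz

def steering_delays(mic_positions, direction_deg):
    """Per-microphone delays (seconds) that align a plane wave arriving
    from direction_deg (0 = broadside, 90 = along +x) across the array.
    mic_positions is a list of (x, y) coordinates in meters."""
    theta = math.radians(direction_deg)
    ux, uy = math.sin(theta), math.cos(theta)  # unit vector toward the source
    # A mic whose position projects farther along the source direction
    # hears the wavefront earlier, so it must be delayed more.
    raw = [(x * ux + y * uy) / SPEED_OF_SOUND for x, y in mic_positions]
    earliest = min(raw)
    return [d - earliest for d in raw]  # smallest delay becomes zero

def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay (rounded to whole samples
    for simplicity) and average, so sound from the steered direction
    adds coherently while off-axis sound partially cancels."""
    shifts = [round(d * SAMPLE_RATE) for d in delays]
    start = max(shifts)
    n = min(len(ch) for ch in channels)
    return [
        sum(ch[i - s] for ch, s in zip(channels, shifts)) / len(channels)
        for i in range(start, n)
    ]
```

A real system would use fractional-sample (interpolating) delays and adapt the steering direction from the vision-based head tracker; the rounding here keeps the sketch short.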
Applications
- Planet: visually-guided interaction and transaural rendering
- Puppet: visually-guided animation
- Visually-guided face recognition
*****
Another name for SmartDesk is the "Cyberdesk", a term coined by
our research partners at BT (British Telecom).
*****
More online
Movie
Self calibration
3-D person tracking
Face recognition
Facial expression analysis
Gesture, action, and event analysis
American Sign Language
Vision-steered audio input
Transaural audio rendering
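Transaural rendering, listed above, delivers binaural (ear-specific) signals over loudspeakers by cancelling the crosstalk from each speaker to the opposite ear, an idea usually attributed to Atal and Schroeder. A minimal single-frequency sketch: invert the 2x2 matrix of acoustic transfer functions from the two speakers to the two ears. The transfer-function values and names below are illustrative assumptions, not the filters used in SmartDesk.

```python
def invert_2x2(h_ll, h_rl, h_lr, h_rr):
    """Invert the complex 2x2 matrix [[h_ll, h_rl], [h_lr, h_rr]] that
    maps speaker signals to ear signals at one frequency.
    h_xy = path from speaker y to ear x (ll/rr ipsilateral, rl/lr crosstalk)."""
    det = h_ll * h_rr - h_rl * h_lr
    return (h_rr / det, -h_rl / det, -h_lr / det, h_ll / det)

def speaker_signals(e_left, e_right, paths):
    """Speaker feeds that reproduce the desired binaural pair
    (e_left, e_right) at the listener's ears, given the four paths."""
    i_ll, i_rl, i_lr, i_rr = invert_2x2(*paths)
    s_l = i_ll * e_left + i_rl * e_right
    s_r = i_lr * e_left + i_rr * e_right
    return s_l, s_r
```

A full renderer applies this inversion per frequency bin using head-related transfer functions, and must track the listener's head position, which is where the vision system comes in.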
References and related research papers
Research reports and publications relevant to the underlying
technologies of SmartDesk applications.
Self calibration
3-D person tracking
- A. Azarbayejani and A. Pentland, Real-time self-calibrated stereo
  person tracking using 3-D shape estimation from blob features,
  TR#363, January 1996 (submitted to ICPR'96).
- S. Basu, I. Essa, and A. Pentland, Motion Regularization for
  Model-based Head Tracking.
- C. R. Wren, A. Azarbayejani, T. Darrell, and A. Pentland (1995),
  Pfinder: Real-Time Tracking of the Human Body. Appears in: SPIE
  Photonics East 1995, Vol. 2615, pp. 89-98.
Face recognition
Facial expression analysis
- I. Essa, S. Basu, T. Darrell, and A. Pentland (1996), Modeling,
  Tracking and Interactive Animation of Faces and Heads using Input
  from Video. Appears in: Proceedings of Computer Animation '96
  Conference, Geneva, Switzerland, June 1996. IEEE Computer Society
  Press.
- I. Essa and A. Pentland (1995), Coding, Analysis, Interpretation,
  and Recognition of Facial Expressions.
Gesture, action, and event analysis
American Sign Language
Vision-steered audio input
Transaural audio rendering
Ali Azarbayejani, ali@media.mit.edu
Last modified: Tue Sep 22 11:29:05 EDT 1998