Speech and Audio Processing

Supervisor Deb Roy

Brian Clarkson

Adaptive Interfaces

Language Acquisition from Natural Audio and Visual Input

We are developing a computational model of word learning that learns from natural audio and visual input. Given images from a video camera and associated spoken utterances recorded from a microphone, the system learns salient words and their meanings. The automatically acquired vocabulary can then be used to understand and generate spoken language. Although simple in its current form, this effort is a first step towards a more complete, fully grounded model of language acquisition. The current system can be applied to human-computer interfaces that use spoken input. A significant problem in designing effective speech interfaces is the difficulty of anticipating a person's word choice and the intent behind it. Our system addresses this problem by learning each user's vocabulary together with its visual grounding. We are investigating several practical applications, including adaptive human-machine interfaces for information browsing, assistive technologies, education, and entertainment.

Adaptive Multimodal Interfaces

Multimodal interfaces combine speech recognition, computer vision, and machine learning to bridge the gap between people and machines. This brief paper describes the current state of the work and its future directions.

Audio Wearables

SoundWear

SoundWear is an AUI (Audio User Interface) developed for Windows 95/NT. It provides an unobtrusive yet powerful computing environment for wearable computing. Most wearable computing systems depend on a GUI designed for desktop applications rather than for wearable use. SoundWear avoids this problem by placing its entire interface in the tactile and auditory domains. Its only input and output channels are a Twiddler chording keyboard, audio I/O, and speech recognition.

Speech Recognition

Under Construction

Related Projects Past and Present

Present

Nomadic Radio

Nomadic Radio is an attempt at a personalized and dynamic audio-only information environment. It uses the rich metaphor of radio to structure simultaneous, spatialized audio streams as broadcasts of timely information.

Past

NewsComm

The NewsComm system delivers personally selected audio information to mobile users through a hand-held audio playback device. The system provides a one-to-one connection from individual users to information providers so that users can access information on demand with breadth and depth unattainable through traditional media. Filtering mechanisms help the user efficiently find information of interest.


Last Updated July 27, 1997
Maintained by Travell Perkins
tperkins@media.mit.edu