Speech and Audio Processing
We are developing a computational model of word learning which learns
from natural audio and visual input. Given images from a video camera
and associated spoken utterances recorded from a microphone, the
system learns salient words and their meanings. This automatically
acquired vocabulary can then be used to understand and generate spoken
language. Although simple in its current form, this effort is a first
step towards a more complete, fully-grounded model of language
acquisition. The current system can be applied to human-computer
interfaces which use spoken input. A significant problem in designing
effective speech interfaces is the difficulty of anticipating a
person's word choice and associated intent. Our system addresses this
problem by learning the vocabulary of each user together with its
visual grounding. We are investigating several practical applications
including adaptive human-machine interfaces for information browsing,
assistive technologies, education, and entertainment.
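The learning step described above — associating recurring spoken word forms with co-occurring visual categories — can be sketched as a simple co-occurrence model. This is an illustrative sketch only, not the system's actual algorithm: the word lists, visual labels, and pointwise-mutual-information scoring below are assumptions, and the real system works from raw audio and camera images rather than transcribed tokens.

```python
from collections import Counter
from math import log

# Paired observations: (words in an utterance, visual category in view).
# Hypothetical toy data for illustration.
observations = [
    (["look", "a", "ball"], "ball"),
    (["the", "red", "ball"], "ball"),
    (["see", "the", "dog"], "dog"),
    (["a", "big", "dog"], "dog"),
    (["the", "ball", "rolls"], "ball"),
]

word_counts = Counter()
label_counts = Counter()
pair_counts = Counter()
n = len(observations)

for words, label in observations:
    label_counts[label] += 1
    for w in set(words):
        word_counts[w] += 1
        pair_counts[(w, label)] += 1

def best_meaning(word):
    """Pick the visual label with the highest pointwise mutual information."""
    scores = {}
    for label in label_counts:
        joint = pair_counts[(word, label)]
        if joint:
            scores[label] = log(
                (joint / n) / ((word_counts[word] / n) * (label_counts[label] / n))
            )
    return max(scores, key=scores.get) if scores else None

print(best_meaning("ball"))  # "ball" co-occurs only with ball scenes
print(best_meaning("the"))   # function words spread across categories
```

Words that consistently co-occur with one visual category ("ball", "dog") get a strong grounding, while function words like "the" co-occur with everything and earn no clear meaning — which is the intuition behind learning only the salient vocabulary.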
Multimodal interfaces combine speech recognition, computer vision, and machine learning to bridge the gap between man and machine. This brief paper describes where we are today and what the future holds.
SoundWear is an AUI (Audio User Interface) developed for Windows 95/NT. It provides an unobtrusive yet powerful computing environment for wearable computing. Most wearable computing systems depend on a GUI designed for desktop applications rather than for wearable use. SoundWear avoids this problem by keeping its whole interface in the tactile and auditory domain: the only I/O it uses is input from a Twiddler keyboard, audio I/O, and speech recognition.
Related Projects Past and Present
Nomadic Radio is a step towards a personalized, dynamic audio-only information environment. It uses the rich metaphor of radio to structure simultaneous, spatialized audio streams as radio broadcasts of timely information.
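Spatializing simultaneous streams — placing each broadcast at a distinct position around the listener so they remain distinguishable — can be sketched with equal-power stereo panning. This is a minimal illustration, not Nomadic Radio's actual rendering pipeline; the stream names and azimuths below are hypothetical.

```python
from math import cos, sin, pi

def pan_gains(azimuth_deg):
    """Equal-power stereo panning: map an azimuth in [-90, 90] degrees
    (left to right) to (left_gain, right_gain), keeping total power constant."""
    theta = (azimuth_deg + 90.0) / 180.0 * (pi / 2.0)  # 0 .. pi/2
    return cos(theta), sin(theta)

# Place three simultaneous "broadcast" streams around the listener
# (hypothetical stream names for illustration).
streams = {"news": -60.0, "voicemail": 0.0, "calendar": 60.0}
for name, az in streams.items():
    left, right = pan_gains(az)
    print(f"{name}: L={left:.2f} R={right:.2f}")
```

Because the gains satisfy L² + R² = 1 at every azimuth, moving a stream around the listener changes its apparent direction without changing its loudness — one simple way to keep concurrent audio streams perceptually separate.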
The NewsComm system delivers personally selected audio information to mobile users through a hand-held audio playback device. The system provides a one-to-one connection from individual users to information providers so that users can access information on demand with breadth and depth unattainable through traditional media. Filtering mechanisms help the user efficiently find information of interest.
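The filtering mechanism mentioned above can be sketched as a weighted keyword profile that ranks incoming audio items by predicted interest. This is an illustrative assumption, not NewsComm's actual filter: the profile weights, item titles, and keyword features below are invented for the example.

```python
def interest_score(profile, item_keywords):
    """Score an item by overlap between the user's interest profile
    (keyword -> weight) and the item's keywords."""
    return sum(profile.get(k, 0.0) for k in item_keywords)

# Hypothetical user profile and incoming items.
profile = {"technology": 2.0, "weather": 0.5, "sports": 1.0}
items = [
    ("morning tech brief", ["technology", "business"]),
    ("game recap", ["sports"]),
    ("storm update", ["weather", "local"]),
]

# Rank items so the most interesting audio plays first.
ranked = sorted(items, key=lambda it: interest_score(profile, it[1]), reverse=True)
print([title for title, _ in ranked])
```

Ranking rather than hard filtering lets the user skim the top of the list for depth on favored topics while still being able to scan everything else — the breadth-and-depth access the paragraph describes.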
Last Updated July 27, 1997
Maintained by Travell Perkins