The Facilitator Room

Despite advances in interface technologies, the computer is still seen either as an elaborate tool requiring detailed instructions (the workstation view) or as a black-box replacement for a human (the automation view). In this project, we are exploring a different role for machines -- namely, that of a "facilitator" for our everyday activities. Instead of requiring dozens of mouse clicks, such a machine would do what we want, when we want it, perhaps without even being asked. As a facilitator, the computer would be an active participant in our activities, not just an observer or servant. For such a system to succeed, it must understand our ways of doing things and fit itself naturally into them. This in turn requires the system to be perceptually aware of our activities at many levels. At the lowest level, it must recognize how many people are in the room, who is speaking when, and what sorts of motions they are making. Beyond this, it must estimate the "interaction context" in the room: are people watching a movie, having an informal lunch, or in the midst of a raging debate?

This requires estimating a range of speech and expression features and then performing higher-level pattern recognition on how these parameters evolve over time. Such information is critical if the system is to react in useful ways. Furthermore, we wish to gather this information in an unencumbering way: we don't want to weigh down participants with headset microphones, special clothing, and the like merely to simplify the pattern recognition. We believe that to provide natural interfaces, our systems must adapt to the way people behave, look, and sound, and not vice versa. Finally, to be a participant, the system must also be able to act on the room. It can do this by playing music, bringing up relevant information, giving private cues to participants, and so on.

The "facilitator room" is a conference room that we have outfitted with the sensors and actuators needed to pursue these goals. It currently contains five cameras with computer-controlled orientation and zoom, a phased array of microphones, and several LCD projectors used for both information display and lighting. We are drawing on our collective backgrounds in computer vision, auditory scene analysis, machine learning, and human-computer interaction to develop a variety of facilitator capabilities. One of the first demos built on this infrastructure is the "Magic Board," by Francois Berard and Yann Laurillau: an interactive, combined physical/electronic whiteboard that uses computer vision techniques to allow natural control of its image manipulation features. Another is Egon Pasztor's "Rhythm Games," which uses similar techniques to provide a natural interface to a physical/virtual musical automaton that bounces back and forth among lines drawn on a physical whiteboard. A third is the "Conversation Aid" by Tony Jebara and Yuri Ivanov, which recognizes the general topic of discussion and suggests new items for discussion when the conversation lulls. The last is the "Mediator" by Sumit Basu, which tracks who is speaking when and issues a warning when a participant is dominating the conversation or being left out. In addition to these projects, many of the core technologies listed below are available for individual demonstration and more detailed description.

Core technologies:

Face detection/tracking: Ali Rahimi
Expression estimation: Tanzeem Choudhury
Body tracking: Egon Pasztor, Chris Wren
User Interface: Francois Berard, Yann Laurillau
Audio Localization: Sumit Basu
Speech Processing: Sumit Basu
Machine Learning/Pattern Recognition: Tony Jebara, Brian Clarkson, Yuri Ivanov, Sumit Basu

Demos:

Dynaman: Egon Pasztor, Chris Wren
Magic Board: Francois Berard, Yann Laurillau
Rhythm Games: Egon Pasztor
Conversation Aid: Tony Jebara, Yuri Ivanov
Audio Localization: Sumit Basu