System Overview

The system is shown in Figure 2. It consists of a number of microphones (either head-mounted or clip-on wireless microphones), a video camera and a large projection screen. The users sit at a meeting table and engage in a conversation. While they speak, the microphones feed a commercial speech recognizer, which detects words. The frequencies of the most recently spoken words are used to estimate the general topic area of the conversation, which establishes the situational context. In addition, the video camera detects frontal faces using computer vision. When either of the users looks at the screen, the computer receives a reinforcement signal. The computer mediator gives feedback through the projection screen, which poses relevant questions and can provide other audio-visual augmentations to the current conversation.

To train the topic spotter, we obtain text documents that are representative of each topic class we wish to track (e.g. 'medicine', 'politics', 'religion'). The system analyzes the word statistics of these documents so that it can later detect the active subject from the words the users generate. For example, if the users are talking about medicine, key words such as 'doctor', 'health' and 'cancer' will be detected and help isolate the topic.
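The topic-spotting step described above can be illustrated with a small sketch. This is not the system's actual implementation: it assumes a simple multinomial (naive Bayes style) word model with add-one smoothing, whitespace tokenization, and a fixed-length window of recent words. The function names (`train_topic_models`, `score_topics`, `spot_topic`), the toy corpora and the window length are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, deque


def train_topic_models(corpora, alpha=1.0):
    """Estimate smoothed per-topic word log-probabilities.

    corpora: dict mapping a topic name to a list of document strings.
    alpha:   additive smoothing constant (an assumption of this sketch).
    """
    counts = {
        topic: Counter(w for doc in docs for w in doc.lower().split())
        for topic, docs in corpora.items()
    }
    vocab = set().union(*counts.values())
    models = {}
    for topic, c in counts.items():
        # One extra smoothing slot stands in for words never seen in training.
        total = sum(c.values()) + alpha * (len(vocab) + 1)
        models[topic] = {
            "logp": {w: math.log((c[w] + alpha) / total) for w in vocab},
            "unseen": math.log(alpha / total),
        }
    return models


def score_topics(models, recent_words):
    """Return the log-likelihood of the recent words under each topic model."""
    return {
        topic: sum(m["logp"].get(w, m["unseen"]) for w in recent_words)
        for topic, m in models.items()
    }


def spot_topic(models, recent_words):
    """Pick the topic whose word statistics best explain the recent words."""
    scores = score_topics(models, recent_words)
    return max(scores, key=scores.get)


# Toy training documents (hypothetical stand-ins for the real text corpora).
corpora = {
    "medicine": ["the doctor said the patient health improved after cancer treatment"],
    "politics": ["the senator won the election after the vote in congress"],
}
models = train_topic_models(corpora)

# Sliding window over the last few recognized words (the length is an assumption).
window = deque(maxlen=5)
for word in "my doctor is worried about my health".split():
    window.append(word.lower())

print(spot_topic(models, window))
```

The bounded `deque` plays the role of the "past few words": as the recognizer emits new words, old ones fall out of the window, so the estimated topic can follow the conversation as it drifts.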

Tony Jebara
2000-02-24