
Discussion

For more flexibility, we are now developing our own speech recognition engine. Using a recurrent neural network, we compute phoneme probabilities (over a 40-phoneme alphabet) in real time. This lower-level representation of the audio could better handle phonetically similar words, proper nouns, and foreign languages.
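As a rough illustration, the sketch below shows how a per-frame phoneme-probability estimator of this kind might be structured in PyTorch. The feature dimension, hidden size, and frame count are illustrative assumptions, not the parameters of our engine.

import torch
import torch.nn as nn

NUM_PHONEMES = 40   # size of the phoneme alphabet (from the text)
FEATURE_DIM = 13    # e.g., MFCC features per frame (assumed)

class PhonemeRNN(nn.Module):
    def __init__(self, hidden_size=128):
        super().__init__()
        self.rnn = nn.GRU(FEATURE_DIM, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, NUM_PHONEMES)

    def forward(self, frames):
        # frames: (batch, time, FEATURE_DIM) acoustic feature vectors
        hidden, _ = self.rnn(frames)
        # Per-frame probability distribution over the 40 phonemes.
        return torch.softmax(self.out(hidden), dim=-1)

model = PhonemeRNN()
audio_features = torch.randn(1, 100, FEATURE_DIM)  # ~1 s of frames (assumed rate)
phoneme_probs = model(audio_features)              # shape: (1, 100, 40)

Because the network emits a distribution rather than a single word hypothesis, downstream components can reason about acoustically confusable inputs instead of committing to one transcription.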

We are also considering other output modalities. For instance, if the conversation indicates that users are frustrated (e.g., many harsh words), we could shift the lighting toward bluish or greenish tones to create a more relaxed ambiance.
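A minimal sketch of how such a trigger might be wired up, assuming a hypothetical set_room_color() lighting interface; the harsh-word list and threshold are illustrative placeholders:

HARSH_WORDS = {"no", "wrong", "stupid", "never"}   # illustrative list
FRUSTRATION_THRESHOLD = 5                          # assumed tuning parameter

def update_lighting(recent_words, set_room_color):
    # Count harsh words in the recent transcript window.
    harsh_count = sum(1 for w in recent_words if w.lower() in HARSH_WORDS)
    if harsh_count >= FRUSTRATION_THRESHOLD:
        set_room_color("#4a7a9c")   # calming bluish tone
    else:
        set_room_color("#ffffff")   # neutral lighting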

The system we described required us to manually specify the outputs and design the questions it would ask to imitate a human mediator or facilitator. Ultimately, however, a mediation system should be trained from real-world data: the machine observes a human mediator responding to sample meetings and participants, and forms a predictive model of the mediator's responses to key words spoken during the training sessions. The outputs (i.e., the mediator's responses) could then be automatically associated with their trigger stimuli and later synthesized by the machine (e.g., via audio playback).
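A minimal sketch of this learn-then-replay idea, assuming a hypothetical play_audio() interface; the simple co-occurrence counts stand in for whatever predictive model is actually trained:

from collections import defaultdict, Counter

class MediatorModel:
    def __init__(self):
        # key word -> counts of mediator responses that followed it
        self.associations = defaultdict(Counter)

    def observe(self, key_word, mediator_response):
        # Record one (stimulus, response) pair from a training meeting.
        self.associations[key_word][mediator_response] += 1

    def respond(self, key_word, play_audio):
        # Synthesize the most probable learned response, if any.
        responses = self.associations.get(key_word)
        if responses:
            best_response, _ = responses.most_common(1)[0]
            play_audio(best_response)   # e.g., playback of a recorded clip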

We demonstrated a real-time conversational context tracking system. The topic tracking performs reliably on natural spoken audio. This situational awareness has many applications, and we have shown its use as a meeting augmentation tool that prompts speakers with relevant questions. For further results, see:

http://www.media.mit.edu/vismod/demos/conversation.html
