Speech Technology

Despite many important advances in speech recognition, the technology is still brittle, which has limited its adoption in the HCI community. A system that responds accurately to individual key words, one at a time, is therefore unlikely to be reliable. To circumvent this problem, we look at the frequencies of the most recent words (about 200 words) to determine the topic. Thus, even when the speech recognizer is inaccurate (about a $50\%$ recognition rate), the aggregate performance of topic-spotting (as opposed to word-spotting) over a set of 200 words could be well above $90\%$. The detected topic, in turn, tells the system what feedback to generate.
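To make the windowing idea concrete, the following is a minimal sketch, not the system's actual implementation. It assumes a simple keyword-count score per topic; the topic names, keyword sets, and the `TopicSpotter` class are all hypothetical and chosen only for illustration. Only the 200-word window comes from the description above.

```python
from collections import Counter, deque

# Hypothetical topic vocabularies, for illustration only.
TOPIC_KEYWORDS = {
    "travel": {"flight", "hotel", "ticket", "airport"},
    "budget": {"cost", "money", "price", "budget"},
}


class TopicSpotter:
    """Guess the topic from word frequencies in a sliding window."""

    def __init__(self, topic_keywords, window_size=200):
        self.topic_keywords = topic_keywords
        # A bounded deque drops the oldest word once the window is full.
        self.window = deque(maxlen=window_size)

    def add_word(self, word):
        """Append one recognized (possibly misrecognized) word."""
        self.window.append(word.lower())

    def current_topic(self):
        """Return the topic with the most keyword hits, or None if no hits."""
        counts = Counter(self.window)
        scores = {
            topic: sum(counts[w] for w in words)
            for topic, words in self.topic_keywords.items()
        }
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None
```

Because the decision pools evidence from every word in the window, a single misrecognized word shifts the scores only slightly, which is the intuition behind preferring topic-spotting to word-spotting.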

In addition, the system is used in a meeting scenario where people are actively generating many words. This is a far better situation for a recognizer than one in which the computer interacts with a single user and has to recognize one sentence at a time in a turn-taking exchange. Such query-response systems are far too brittle except in constrained applications.

We now discuss the details of the topic detection system.

Tony Jebara