Despite many important advances in speech recognition, the technology
is still brittle, which has limited its adoption in the HCI
community. It is therefore unlikely that a system can respond
accurately to key words on a one-by-one basis. To circumvent
this problem, we look at the frequencies of the most recent words
(i.e. the past 200 words) to determine the topic. Thus, despite the poor
accuracy on the speech recognizer (i.e. about
recognition
rate), the aggregate performance over a set of 200 words for
topic-spotting (as opposed to word-spotting) could be well above
.
The detected topic, in turn, tells the system what
feedback to generate.
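The windowed word-frequency idea described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the topic names, the key word sets, the `TopicSpotter` class, and the rule of picking the topic with the highest key word count are all assumptions made for the example. Only the 200-word window comes from the text.

```python
from collections import Counter, deque

# Hypothetical topic vocabularies (assumed for illustration only).
TOPIC_KEYWORDS = {
    "travel": {"flight", "hotel", "trip", "airport"},
    "food": {"dinner", "recipe", "restaurant", "cook"},
}


class TopicSpotter:
    """Keeps the most recent recognized words and scores topics by key word counts."""

    def __init__(self, topic_keywords, window_size=200):
        self.topic_keywords = topic_keywords
        # A bounded deque discards the oldest word once the window is full.
        self.window = deque(maxlen=window_size)

    def add_word(self, word):
        """Append one recognizer output word (possibly misrecognized)."""
        self.window.append(word.lower())

    def topic_scores(self):
        """Count, per topic, how many words in the window are key words."""
        counts = Counter(self.window)
        return {
            topic: sum(counts[w] for w in keywords)
            for topic, keywords in self.topic_keywords.items()
        }

    def current_topic(self):
        """Return the highest-scoring topic, or None if no key word was seen."""
        scores = self.topic_scores()
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None
```

Because the decision rests on aggregate counts over the window, a few misrecognized words shift the scores only slightly, whereas a word-spotting system would act on each error individually.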
In addition, the system is used in a meeting scenario where people are actively generating many words. This is a far better situation for a recognizer than one in which the computer interacts with a single user and has to recognize one sentence at a time in a turn-taking exchange. Such query-response systems are far too brittle except in constrained applications.
We now discuss the details of the topic detection system.