
Topic Classification


  
Figure 2: Plot of class probabilities.

After training data is collected and class models are built, the system begins receiving audio input from the speakers. A matching algorithm sequentially updates a conversation history ${\bf x}$, a 30,000-dimensional vector that counts the frequency of recently spoken words, weighted by their recency through a slow decay. At each step, after receiving a new word $word_k$, the history ${\bf x}$ is updated by decaying it and adding a count of one for the new word:

\begin{displaymath}
x_i^t = \alpha \, x_i^{t-1} + \delta(k, i) \qquad (2)
\end{displaymath}

where $\alpha$ is the decay parameter and $\delta(k, i)$ equals 1 if the new word $word_k$ is the word counted by $x_i$ (i.e., $i = k$) and 0 otherwise. Given the conversation history at time $t$, its class-conditional probability is computed as follows:

\begin{displaymath}
P({\bf x} \vert c) = \prod_i P(word_i \vert c)^{x_i} \qquad (3)
\end{displaymath}
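To make the update and scoring steps concrete, the following is a minimal sketch in Python (not the authors' implementation; the variable names, the decay value, and the use of log-space arithmetic are assumptions) that applies equation (2) to maintain the decayed history and equation (3), in log form, to score it against a single class model.

\begin{verbatim}
import numpy as np

# Hypothetical constants: the paper mentions a ~30,000-word vocabulary
# and a slowly decaying history; the exact decay value is not given.
VOCAB_SIZE = 30000
ALPHA = 0.995

def update_history(x, k, alpha=ALPHA):
    """Equation (2): decay every count, then add one for the new word k."""
    x = alpha * x
    x[k] += 1.0
    return x

def class_log_likelihood(x, log_word_probs):
    """Equation (3) in log space:
    log P(x|c) = sum_i x_i * log P(word_i|c),
    where log_word_probs[i] = log P(word_i|c) from the class model."""
    return float(np.dot(x, log_word_probs))
\end{verbatim}

Working with log-probabilities is an implementation convenience here: with a 30,000-word vocabulary the product in equation (3) would underflow if evaluated directly.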

This probability is converted into the posterior for topic c using Bayes' rule. The prior probabilities P(c) are scalars (one per topic class) estimated by cross-validation:

\begin{displaymath}
P(c \vert {\bf x}) = \frac{P({\bf x} \vert c) \, P(c)}{\sum\limits_{k=1}^{C} P({\bf x} \vert k) \, P(k)} \qquad (4)
\end{displaymath}

Fig. 2 shows the class probabilities for an ongoing conversation. After these probabilities are computed for each class, the most likely topic $c$ is selected and the corresponding feedback is given to the users as described below.
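Continuing the sketch above (again hypothetical; the log-space normalization is added only for numerical stability and is not specified in the paper), equation (4) can be evaluated for all C classes and the most likely topic obtained as the arg max of the posterior.

\begin{verbatim}
def topic_posteriors(x, class_log_word_probs, log_priors):
    """Equation (4): P(c|x) proportional to P(x|c) P(c),
    normalized over the C topic classes (computed in log space)."""
    log_joint = np.array([class_log_likelihood(x, lwp) + lp
                          for lwp, lp in zip(class_log_word_probs, log_priors)])
    log_joint -= log_joint.max()          # stabilize before exponentiating
    post = np.exp(log_joint)
    return post / post.sum()

# Illustrative usage with random stand-in class models:
# rng = np.random.default_rng(0)
# models = [np.log(rng.dirichlet(np.ones(VOCAB_SIZE))) for _ in range(4)]
# priors = np.log(np.full(4, 0.25))
# x = update_history(np.zeros(VOCAB_SIZE), k=42)
# best_topic = int(np.argmax(topic_posteriors(x, models, priors)))
\end{verbatim}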


Tony Jebara
2000-08-17