
Bounding Gate Means

Taking derivatives of $Q$ and setting them to zero is not as straightforward for the gate means, even though they are decoupled. What is desired is a simple update rule (i.e. computing an empirical mean). We therefore bound the $Q$ function further for the M-step. The $Q$ function is a sum of sub-elements $Q_{im}$, and we lower-bound it by a sum of quadratic functions of the means (Equation 11).



 
$\displaystyle Q(\Theta^t,\Theta^{(t-1)}) ~ = ~ \sum_{i=1}^N \sum_{m=1}^M Q(\Theta^t,\Theta^{(t-1)})_{im} ~ \geq ~ \sum_{i=1}^N \sum_{m=1}^M k_{im} - w_{im} \Vert \mu_x^m-{\bf c}_{im} \Vert^2$     (11)


Each quadratic bound has a location parameter ${\bf c}_{im}$ (a centroid), a scale parameter $w_{im}$ (narrowness), and a peak value $k_{im}$. The sum of quadratic bounds makes contact with the $Q$ function at the old values of the model $\Theta^{t-1}$, where the gate mean was $\mu_x^{m*}$ and the covariance was $\Sigma_{xx}^{m*}$. To simplify the derivation, one may assume that the previous mean was zero and the previous covariance was the identity, provided the data are appropriately whitened with respect to the given gate.
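The whitening assumption can be illustrated with a short sketch. It assumes that "whitening with respect to a gate" means subtracting the previous gate mean and applying the inverse Cholesky factor of the previous gate covariance. The source does not spell out the transform, and the function names `whiten` and `unwhiten_mean` are hypothetical.

```python
import numpy as np

def whiten(X, mu_old, sigma_old):
    """Map data so the previous gate has zero mean and identity covariance.

    X: (N, D) data, mu_old: (D,) previous gate mean,
    sigma_old: (D, D) previous gate covariance (symmetric positive definite).
    Assumed transform: z = L^{-1} (x - mu_old), with sigma_old = L L^T.
    """
    L = np.linalg.cholesky(sigma_old)
    # Solve L z = (x - mu_old) for every data point instead of inverting L.
    Z = np.linalg.solve(L, (X - mu_old).T).T
    return Z, L

def unwhiten_mean(mu_white, mu_old, L):
    """Undo the whitening for a mean estimated in the whitened space."""
    return L @ mu_white + mu_old
```

Under this assumed transform, a mean of zero in the whitened space maps back to the previous gate mean, which is the point of contact used in the derivation.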

The parameters of each quadratic bound are found by requiring that it touches the corresponding $Q_{im}$ function at $\Theta^{t-1}$ and that the two have equal derivatives at the point of contact (i.e. tangential contact). Solving these constraints yields the quadratic parameters for each gate $m$ and data point $i$ in Equation 12 ($k_{im}$ is omitted for brevity).



 
$\displaystyle \begin{array}{lll}
{\bf c}_{im} & = & \frac{1}{2w_{im}} ( {\hat h...
...}_i}^T{\mu_x^m}
}
{{\mu_x^m}^T {\mu_x^m}} + \frac{{\hat h}_{im}}{2}
\end{array}$     (12)
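The tangential-contact constraints can be written out generically as a worked step. This is a sketch that does not assume any particular form for $Q_{im}$; the shorthand $\mu^{*}$ for the previous gate mean $\mu_x^{m*}$ and the gradient notation $\nabla_{\mu} Q_{im}$ are introduced here for illustration only.

```latex
\begin{align*}
k_{im} - w_{im}\,\Vert \mu^{*} - {\bf c}_{im} \Vert^2 &= Q_{im}(\mu^{*})
  && \text{(equal values at contact)} \\
-2\,w_{im}\,(\mu^{*} - {\bf c}_{im}) &= \nabla_{\mu} Q_{im}(\mu^{*})
  && \text{(equal gradients at contact)} \\
\Rightarrow \quad {\bf c}_{im} &= \mu^{*} + \frac{1}{2 w_{im}}\,\nabla_{\mu} Q_{im}(\mu^{*}),
  \qquad
  k_{im} = Q_{im}(\mu^{*}) + w_{im}\,\Vert \mu^{*} - {\bf c}_{im} \Vert^2
\end{align*}
```

With whitened data ($\mu^{*} = 0$), the centroid reduces to the gradient of $Q_{im}$ at the origin scaled by $\frac{1}{2w_{im}}$, which agrees with the prefactor of ${\bf c}_{im}$ in Equation 12. The two constraints fix ${\bf c}_{im}$ and $k_{im}$ but leave $w_{im}$ free, so it must be chosen separately so that the bound holds everywhere.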


The tightest quadratic bound occurs when $w_{im}$ is minimal (without violating the inequality). The expression for $w_{im}$ reduces to finding the minimal value, $w_{im}^*$, as in Equation 13 (here $\rho^2 = {\bf x}_i^T {\bf x}_i$). The $f$ function is computed numerically only once and stored as a lookup table (see Figure 2(a)). We can thus immediately compute the optimal $w_{im}^*$ and the remaining parameters of the quadratic bound, obtaining bounds as in Figure 2(b), where a $Q_{im}$ is lower bounded.



 
$\displaystyle w_{im}^* = r_i \alpha_m
\stackrel{\max}{c} \{ e^{-\frac{1}{2} \rh...
...m}}{2}
= r_i \alpha_m
e^{-\frac{1}{2} \rho^2}
f(\rho) + \frac{{\hat h}_{im}}{2}$     (13)
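A lookup table for $f$ might be built as in the sketch below. The inner expression of Equation 13 is truncated in this copy, so the form of $f$ used here is an assumption rather than the author's stated definition: $f(\rho) = \max_c \, (e^{c\rho - c^2/2} - c\rho - 1)/c^2$. That form follows if, in whitened coordinates, $Q_{im}$ depends on the mean through $-\frac{{\hat h}_{im}}{2}\Vert {\bf x}_i - \mu \Vert^2 - r_i \alpha_m e^{-\frac{1}{2}\Vert {\bf x}_i - \mu \Vert^2}$. The grid ranges and the names `g`, `build_f_table`, `f_lookup`, and `optimal_width` are likewise hypothetical.

```python
import numpy as np

def g(c, rho):
    # ASSUMED inner ratio of Equation 13 (truncated in the source):
    # (exp(c*rho - c^2/2) - c*rho - 1) / c^2, for c != 0.
    return (np.expm1(c * rho - 0.5 * c**2) - c * rho) / c**2

def build_f_table(rho_grid, c_grid):
    # Evaluate g on a (rho, c) grid and take the maximum over c for each rho.
    vals = g(c_grid[None, :], rho_grid[:, None])
    return vals.max(axis=1)

# Computed once and reused for every data point and gate.
rho_grid = np.linspace(0.0, 5.0, 501)
c_grid = np.linspace(-10.0, 10.0, 4000)  # even count, so c = 0 is never sampled
f_table = build_f_table(rho_grid, c_grid)

def f_lookup(rho):
    # Linear interpolation in the precomputed table.
    return np.interp(rho, rho_grid, f_table)

def optimal_width(rho, r_i, alpha_m, h_im):
    # Last form of Equation 13: w* = r_i alpha_m exp(-rho^2 / 2) f(rho) + h_im / 2
    return r_i * alpha_m * np.exp(-0.5 * rho**2) * f_lookup(rho) + 0.5 * h_im
```

Because the maximum is taken over a finite grid, the table slightly underestimates the true supremum. A finer grid or a small safety margin on $w_{im}^*$ may be needed if the bound has to hold strictly.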



  
Figure: Bound Width Computation and Example Bounds
\begin{figure}\center
\begin{tabular}[b]{cccc}
\epsfxsize=1in
\epsfbox{mulut....
...
(c) $g$\space Function &
(d) Bound on $\Sigma_{xx}$ \end{tabular}
\end{figure}

The gate means $\mu_x^m$ are found by maximizing the sum of the $M \times N$ parabolas which bound $Q$. Setting the derivative with respect to each $\mu_x^m$ to zero gives a weighted mean of the centroids: $\mu_x^m = ( \sum_i w_{im}^* {\bf c}_{im}) ~ (\sum_i w_{im}^*)^{-1}$. This mean is subsequently unwhitened to undo the earlier data transformations.
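The update above is a weighted average of the centroids, which can be sketched for all gates at once. The array layout (`w` of shape N by M, `c` of shape N by M by D) and the function name `update_gate_means` are assumptions made for illustration. The sketch also assumes every gate has a positive total weight.

```python
import numpy as np

def update_gate_means(w, c):
    """Maximize sum_i sum_m k_im - w_im * ||mu_m - c_im||^2 over the means.

    w: (N, M) optimal widths w_im^*, c: (N, M, D) centroids c_im.
    Returns an (M, D) array holding the updated mean of each gate.
    """
    # Numerator: sum over data points i of w_im * c_im, for each gate m.
    num = np.einsum('im,imd->md', w, c)
    # Denominator: sum over data points i of w_im, for each gate m.
    den = w.sum(axis=0)
    return num / den[:, None]

# Two data points, one gate, two dimensions.
w = np.array([[1.0], [3.0]])
c = np.array([[[0.0, 0.0]], [[4.0, 8.0]]])
mu = update_gate_means(w, c)
```

If the centroids were computed in whitened coordinates, the resulting means are also in whitened coordinates and still need to be mapped back.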


Tony Jebara
2000-03-20