Bounding Gate Means

Taking derivatives of Q and setting them to zero is not as straightforward for the gate means, even though they are decoupled. What is desired is a simple update rule (i.e. computing an empirical mean). We therefore bound the Q function further for the M-step. The Q function is a summation of sub-elements Q_im, and we lower-bound it by a summation of quadratic functions of the means (Equation 11).

$\displaystyle Q(\Theta^t,\Theta^{(t-1)}) ~ = ~ \sum_{i=1}^N \sum_{m=1}^M Q(\Theta^t,\Theta^{(t-1)})_{im} ~ \geq ~ \sum_{i=1}^N \sum_{m=1}^M k_{im} - w_{im} \Vert \mu_x^m-{\bf c}_{im} \Vert^2$

(11)

Each quadratic bound has a location parameter ${\bf c}_{im}$ (a centroid), a scale parameter w_im (narrowness), and a peak value k_im. The sum of quadratic bounds makes contact with the Q function at the old values of the model $\Theta^{(t-1)}$, where the gate mean was originally $\mu_x^{m*}$ and the covariance was $\Sigma_{xx}^{m*}$. To simplify the derivation, one may assume that the previous mean was zero and the covariance was the identity, provided the data are appropriately whitened with respect to the given gate.
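The whitening assumption can be illustrated with a minimal sketch. It assumes a Cholesky-based whitening transform, which is one common choice and not necessarily the one used here; the function names `whiten` and `unwhiten_mean` and the example numbers are hypothetical.

```python
import numpy as np

def whiten(X, mean, cov):
    """Map rows of X to coordinates in which the gate has zero mean
    and identity covariance (Cholesky-based; an assumed choice)."""
    # Cholesky factor L satisfies cov = L @ L.T
    L = np.linalg.cholesky(cov)
    # Solve L z = (x - mean) for every row x; result has shape (N, D)
    return np.linalg.solve(L, (X - mean).T).T

def unwhiten_mean(mu_white, mean, cov):
    """Map a mean expressed in whitened coordinates back to the original space."""
    L = np.linalg.cholesky(cov)
    return mean + L @ mu_white

# Hypothetical previous gate parameters and a few data points
rng = np.random.default_rng(0)
old_mean = np.array([1.0, -2.0])
old_cov = np.array([[2.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal(old_mean, old_cov, size=5)

Z = whiten(X, old_mean, old_cov)  # data as seen by the whitened gate
```

In these coordinates the squared Mahalanobis distance of a point to the old gate equals the plain squared norm of its whitened image, which is what makes quantities such as ${\bf x}_i^T {\bf x}_i$ sufficient in the whitened derivation.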

The parameters of each quadratic bound are solved by requiring that it contacts the corresponding Q_im function at $\Theta^{(t-1)}$ and that the two have equal derivatives at the point of contact (i.e. tangential contact). Solving these constraints yields the quadratic parameters for each gate m and data point i in Equation 12 (k_im is omitted for brevity).
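The two contact conditions can be sketched for a generic function, given only its value and gradient at the contact point. This is an illustration of tangential contact for a parabola of the form $k - w \Vert \theta - {\bf c} \Vert^2$, not a reproduction of Equation 12; the function name `tangent_quadratic` and the numbers are hypothetical, and the width `w` is taken as given.

```python
import numpy as np

def tangent_quadratic(q0, grad0, theta0, w):
    """Return (k, c) so that k - w * ||theta - c||^2 has value q0
    and gradient grad0 at theta0 (tangential contact)."""
    grad0 = np.asarray(grad0, dtype=float)
    theta0 = np.asarray(theta0, dtype=float)
    # Gradient match: -2 w (theta0 - c) = grad0  =>  c = theta0 + grad0 / (2 w)
    c = theta0 + grad0 / (2.0 * w)
    # Value match: k - w ||theta0 - c||^2 = q0
    k = q0 + w * np.sum((theta0 - c) ** 2)
    return k, c

# Hypothetical contact data at a whitened previous mean of zero
theta0 = np.zeros(2)
q0 = -1.5
grad0 = np.array([0.4, -0.2])
w = 0.8

k, c = tangent_quadratic(q0, grad0, theta0, w)
```

Contact alone fixes only `k` and `c` for a chosen `w`; whether the parabola stays below the function everywhere depends on `w` being large enough, which is the role of the inequality discussed next.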

$\displaystyle \begin{array}{lll} {\bf c}_{im} & = & \frac{1}{2w_{im}} ( {\hat h... ...}_i}^T{\mu_x^m} } {{\mu_x^m}^T {\mu_x^m}} + \frac{{\hat h}_{im}}{2} \end{array}$

(12)

The tightest quadratic bound is obtained with the smallest w_im that does not violate the inequality. Finding this minimal value, w_im^*, reduces to Equation 13 (here $\rho^2 = {\bf x}_i^T {\bf x}_i$ ). The function f is computed numerically only once and stored as a lookup table (see Figure 2(a)). We can thus immediately compute the optimal w_im^* and the remaining parameters of the quadratic bound, obtaining bounds as in Figure 2(b), where a Q_im is lower bounded.
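The idea of computing a one-dimensional function once by numerical maximization and then reading it from a table can be sketched as follows. The objective `stand_in_objective` is a hypothetical placeholder chosen only so the result is easy to check; it is not the expression maximized in Equation 13. The grid ranges and resolutions are likewise assumptions.

```python
import numpy as np

def stand_in_objective(c, rho):
    # Hypothetical placeholder, NOT the expression from Equation 13.
    # Its maximum over c equals rho, attained at c = rho.
    return rho - (c - rho) ** 2

def build_table(rho_grid, c_grid):
    """Tabulate f(rho) = max over c of the objective, by grid search over c."""
    # Broadcasting evaluates the objective for every (rho, c) pair: shape (R, C)
    values = stand_in_objective(c_grid[None, :], rho_grid[:, None])
    return values.max(axis=1)

# Computed once and stored
rho_grid = np.linspace(0.0, 5.0, 51)
c_grid = np.linspace(0.0, 5.0, 501)
f_table = build_table(rho_grid, c_grid)

def f_lookup(rho):
    """Linear interpolation into the precomputed table."""
    return np.interp(rho, rho_grid, f_table)
```

With the table in place, evaluating `f_lookup` for each data point costs only an interpolation, so no per-point optimization is needed during training.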

$\displaystyle w_{im}^* = r_i \alpha_m \stackrel{\max}{c} \{ e^{-\frac{1}{2} \rh... ...m}}{2} = r_i \alpha_m e^{-\frac{1}{2} \rho^2} f(\rho) + \frac{{\hat h}_{im}}{2}$

(13)

**Figure:** Bound Width Computation and Example Bounds. Panels include (c) $g$ Function and (d) Bound on $\Sigma_{xx}$.

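The gate means $\mu_x^m$ are solved by maximizing the sum of the $M \times N$ parabolas which bound Q. Because each parabola depends on only one gate mean, the maximization decouples across gates, and setting the gradient with respect to $\mu_x^m$ to zero gives a weighted average of the centroids: $\mu_x^m = ( \sum_i w_{im}^* {\bf c}_{im}) ~ (\sum_i w_{im}^*)^{-1}$ . This mean is subsequently unwhitened to undo the earlier data transformations.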
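The update above is a weighted average of centroids per gate, which a minimal NumPy sketch makes concrete. The array layout (`w` of shape N by M, `c` of shape N by M by D), the function name `update_gate_means`, and the example values are assumptions for illustration.

```python
import numpy as np

def update_gate_means(w, c):
    """w: (N, M) widths w_im^*; c: (N, M, D) centroids c_im.
    Returns an (M, D) array of gate means, each still expressed in the
    whitened coordinates of its own gate."""
    # Numerator: sum over i of w_im * c_im, for each gate m
    num = np.einsum("im,imd->md", w, c)
    # Denominator: sum over i of w_im, for each gate m
    den = w.sum(axis=0)
    return num / den[:, None]

# Hypothetical example with N = 2 points, M = 2 gates, D = 2 dimensions
w = np.array([[1.0, 1.0],
              [3.0, 1.0]])
c = np.array([[[0.0, 0.0], [2.0, 2.0]],
              [[4.0, 8.0], [4.0, 0.0]]])

mu = update_gate_means(w, c)
```

Narrow parabolas (large w_im^*) pull the new mean more strongly toward their centroids, so data points whose bounds are tight dominate the update.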

Tony Jebara
2000-03-20