
Results

The CEM algorithm updates the conditioned mixture of Gaussians by computing the bound parameters h_im and r_im in the CE steps and interlacing these with updates on the experts, mixing proportions, gate means, and gate covariances. For the mixture of Gaussians, each CEM update has a computation time comparable to that of an EM update (even in high dimensions). However, it is the conditional likelihood (not the joint) that is monotonically increased.
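The quantity CEM monotonically increases, the conditional log-likelihood l^c = sum_i log p(y_i|x_i), can be sketched for a mixture of Gaussians as follows. This is a minimal illustration, not the paper's implementation: it assumes scalar x and y with diagonal per-component covariances, and the function names are ours.

```python
import numpy as np

def gauss(z, mu, var):
    """Univariate normal density N(z; mu, var)."""
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def conditional_log_likelihood(x, y, alpha, mux, muy, vx, vy):
    """l^c = sum_i log p(y_i | x_i) = sum_i [log p(x_i, y_i) - log p(x_i)]
    for a mixture of Gaussians (illustrative diagonal-covariance case)."""
    # joint density p(x, y) under the mixture
    pxy = sum(a * gauss(x, mx, sx) * gauss(y, my, sy)
              for a, mx, my, sx, sy in zip(alpha, mux, muy, vx, vy))
    # marginal density p(x) under the same mixture
    px = sum(a * gauss(x, mx, sx) for a, mx, sx in zip(alpha, mux, vx))
    return float(np.sum(np.log(pxy) - np.log(px)))
```

EM's M-step maximizes a bound on the joint likelihood sum_i log p(x_i, y_i), so the difference above (and hence l^c) can decrease between EM iterations; CEM's bound is on l^c itself.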


  
Figure 3: Conditional Density Estimation for CEM and EM.
[Figure omitted: panels (a) data, (b) CEM p(y|x), (c) CEM l^c, (d) EM joint clustering, (e) EM p(y|x), (f) EM l^c.]

Consider the 4-cluster (x,y) data in Figure 3(a). The data are modeled with a conditional density p(y|x) using only 2 Gaussian models. Estimating the density with CEM yields the p(y|x) shown in Figure 3(b); CEM exhibits monotonic conditional likelihood growth (Figure 3(c)) and obtains a more conditionally likely model. In the EM case, a joint p(x,y) clusters the data as in Figure 3(d), and conditioning it yields the p(y|x) in Figure 3(e). Figure 3(f) depicts EM's non-monotonic evolution of the conditional log-likelihood. EM produces a superior joint likelihood but an inferior conditional likelihood. Note how the CEM algorithm used its limited resources to capture the multimodal nature of the distribution in y and ignored the spurious bimodal clustering in the x feature space. These properties are critical for a good conditional density p(y|x).
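Conditioning a joint two-dimensional Gaussian mixture p(x,y) to obtain p(y|x), as done for EM in Figure 3(e), follows the standard gating computation: each component contributes a Gaussian conditional (via the usual conditioning formulas) weighted by a gate h_m(x) proportional to alpha_m N(x; mu_m^x, Sigma_m^xx). A hedged sketch with hypothetical names:

```python
import numpy as np

def gauss(z, mu, var):
    """Univariate normal density N(z; mu, var)."""
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def conditional_density(x, y, alpha, mu, Sigma):
    """p(y|x) from a joint GMM over (x, y) with 2x2 covariances Sigma[m]."""
    # gates: responsibility of each component for x, normalized over components
    gates = np.array([a * gauss(x, m[0], S[0, 0])
                      for a, m, S in zip(alpha, mu, Sigma)])
    gates = gates / gates.sum()
    p = 0.0
    for g, m, S in zip(gates, mu, Sigma):
        cond_mu = m[1] + S[0, 1] / S[0, 0] * (x - m[0])   # regression mean
        cond_var = S[1, 1] - S[0, 1] ** 2 / S[0, 0]       # Schur complement
        p += g * gauss(y, cond_mu, cond_var)
    return float(p)
```

The gates depend on x alone, which is why a joint fit that wastes components separating clusters in x (as in Figure 3(d)) can yield a poor p(y|x).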


 
Table 1: Test results. Class-label regression accuracy (CCN0 = cascade-correlation, 0 hidden units; CCN5 = cascade-correlation, 5 hidden units; LD = linear discriminant).

Algorithm   CCN0     CCN5     C4.5    LD     EM2      CEM2
Abalone     24.86%   26.25%   21.5%   0.0%   22.32%   26.63%

For comparison, standard databases from the UCI repository were used. Mixture models were trained with EM and CEM, maximizing joint and conditional likelihood respectively. Regression results are shown in Table 1. CEM exhibited monotonic conditional log-likelihood growth and outperformed the other methods, including EM with the same 2-Gaussian model (EM2 versus CEM2).
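One plausible way to obtain class-label regression accuracies like those in Table 1 (an assumption on our part; the paper does not spell out the decision rule) is to output, for each input x, the candidate label y that maximizes the learned p(y|x):

```python
import numpy as np

def predict_labels(xs, labels, cond_density):
    """For each input x, return the candidate label y with the highest
    conditional density p(y|x) (MAP output of the regressor)."""
    return np.array([labels[int(np.argmax([cond_density(x, y) for y in labels]))]
                     for x in xs])

def accuracy(y_true, y_pred):
    """Fraction of exactly matched class labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Here `cond_density` is any callable p(y|x), such as a conditioned mixture of Gaussians.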


Tony Jebara
2000-03-20