
Results

The CEM algorithm updates the conditioned mixture of Gaussians by computing the bound parameters h_im and r_im in the CE steps and interlacing these with updates on the experts, mixing proportions, gate means, and gate covariances. For the mixture of Gaussians, each CEM update has a computation time comparable to that of an EM update (even in high dimensions). However, it is the conditional likelihood (not the joint) that is monotonically increased.
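The quantity CEM monotonically increases, the conditional log-likelihood l^c = sum_i log p(y_i|x_i), can be sketched for a mixture of Gaussians as follows. This is a minimal illustration, not the paper's implementation: it assumes scalar x and y with diagonal per-component covariances, and the function names are ours.

```python
import numpy as np

def gauss(z, mu, var):
    """Univariate normal density N(z; mu, var)."""
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def conditional_log_likelihood(x, y, alpha, mux, muy, vx, vy):
    """l^c = sum_i log p(y_i | x_i) = sum_i [log p(x_i, y_i) - log p(x_i)]
    for a mixture of Gaussians (illustrative diagonal-covariance case)."""
    # joint density p(x, y) under the mixture
    pxy = sum(a * gauss(x, mx, sx) * gauss(y, my, sy)
              for a, mx, my, sx, sy in zip(alpha, mux, muy, vx, vy))
    # marginal density p(x) under the same mixture
    px = sum(a * gauss(x, mx, sx) for a, mx, sx in zip(alpha, mux, vx))
    return float(np.sum(np.log(pxy) - np.log(px)))
```

EM's M-step maximizes a bound on the joint likelihood sum_i log p(x_i, y_i), so the difference above (and hence l^c) can decrease between EM iterations; CEM's bound is on l^c itself.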


  
Figure 3: Conditional Density Estimation for CEM and EM.
[Figure omitted: panels (a) data, (b) CEM p(y|x), (c) CEM l^c, (d) EM joint clustering, (e) EM p(y|x), (f) EM l^c.]

Consider the 4-cluster (x,y) data in Figure 3(a). The data are modeled with a conditional density p(y|x) using only 2 Gaussian models. Estimating the density with CEM yields the p(y|x) shown in Figure 3(b); CEM exhibits monotonic conditional likelihood growth (Figure 3(c)) and obtains a more conditionally likely model. In the EM case, a joint p(x,y) clusters the data as in Figure 3(d), and conditioning it yields the p(y|x) in Figure 3(e). Figure 3(f) depicts EM's non-monotonic evolution of the conditional log-likelihood. EM produces a superior joint likelihood but an inferior conditional likelihood. Note how the CEM algorithm used its limited resources to capture the multimodal nature of the distribution in y and ignored the spurious bimodal clustering in the x feature space. These properties are critical for a good conditional density p(y|x).
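Conditioning a joint two-dimensional Gaussian mixture p(x,y) to obtain p(y|x), as done for EM in Figure 3(e), follows the standard gating computation: each component contributes a Gaussian conditional (via the usual conditioning formulas) weighted by a gate h_m(x) proportional to alpha_m N(x; mu_m^x, Sigma_m^xx). A hedged sketch with hypothetical names:

```python
import numpy as np

def gauss(z, mu, var):
    """Univariate normal density N(z; mu, var)."""
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def conditional_density(x, y, alpha, mu, Sigma):
    """p(y|x) from a joint GMM over (x, y) with 2x2 covariances Sigma[m]."""
    # gates: responsibility of each component for x, normalized over components
    gates = np.array([a * gauss(x, m[0], S[0, 0])
                      for a, m, S in zip(alpha, mu, Sigma)])
    gates = gates / gates.sum()
    p = 0.0
    for g, m, S in zip(gates, mu, Sigma):
        cond_mu = m[1] + S[0, 1] / S[0, 0] * (x - m[0])   # regression mean
        cond_var = S[1, 1] - S[0, 1] ** 2 / S[0, 0]       # Schur complement
        p += g * gauss(y, cond_mu, cond_var)
    return float(p)
```

The gates depend on x alone, which is why a joint fit that wastes components separating clusters in x (as in Figure 3(d)) can yield a poor p(y|x).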


 
Table 1: Test results. Class-label regression accuracy (CCN0 = cascade-correlation, 0 hidden units; CCN5 = cascade-correlation, 5 hidden units; LD = linear discriminant).

Algorithm   CCN0     CCN5     C4.5    LD     EM2      CEM2
Abalone     24.86%   26.25%   21.5%   0.0%   22.32%   26.63%

For comparison, standard databases from the UCI repository were used. Mixture models were trained with EM and CEM, maximizing joint and conditional likelihood respectively. Regression results are shown in Table 1. CEM exhibited monotonic conditional log-likelihood growth and outperformed the other methods, including EM with the same 2-Gaussian model (EM2 versus CEM2).
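One plausible way to obtain class-label regression accuracies like those in Table 1 (an assumption on our part; the paper does not spell out the decision rule) is to output, for each input x, the candidate label y that maximizes the learned p(y|x):

```python
import numpy as np

def predict_labels(xs, labels, cond_density):
    """For each input x, return the candidate label y with the highest
    conditional density p(y|x) (MAP output of the regressor)."""
    return np.array([labels[int(np.argmax([cond_density(x, y) for y in labels]))]
                     for x in xs])

def accuracy(y_true, y_pred):
    """Fraction of exactly matched class labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Here `cond_density` is any callable p(y|x), such as a conditioned mixture of Gaussians.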


Tony Jebara
2000-03-20