## TR#533: A family of algorithms for approximate Bayesian inference

Thomas P. Minka

MIT PhD thesis

One of the major obstacles to using Bayesian methods for pattern
recognition has been their computational expense. This thesis presents an
approximation technique that can perform Bayesian inference faster and more
accurately than previously possible. This method, "Expectation
Propagation," unifies and generalizes two previous techniques:
assumed-density filtering, an extension of the Kalman filter, and loopy
belief propagation, an extension of belief propagation in Bayesian
networks. The unification shows how both of these algorithms can be viewed
as approximating the true posterior distribution with a simpler
distribution, which is close in the sense of KL-divergence. Expectation
Propagation exploits the best of both algorithms: the generality of
assumed-density filtering and the accuracy of loopy belief propagation.

Loopy belief propagation, because it propagates exact belief states, is
useful only for limited types of belief networks, such as purely discrete
networks. Expectation Propagation approximates the belief states with
expectations, such as means and variances, giving it much wider scope.
Expectation Propagation also extends belief propagation in the opposite
direction, propagating richer belief states that incorporate
correlations between variables.
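
As a minimal sketch of what summarizing a belief state by expectations can look like, assume a one-dimensional belief state that is a mixture of Gaussians. Replacing it with the single Gaussian that has the same mean and variance gives the Gaussian closest to the mixture in KL-divergence (measured from the mixture to the Gaussian). The function name `moment_match` and the example numbers are illustrative assumptions, not taken from the thesis.

```python
def moment_match(weights, means, variances):
    """Collapse a 1-D Gaussian mixture to one Gaussian with the same mean and variance.

    Illustrative helper (hypothetical name): returns (mean, variance).
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalize the mixture weights
    # First moment of the mixture.
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Second moment: each component contributes variance + mean^2.
    second = sum(wi * (vi + mi * mi) for wi, mi, vi in zip(w, means, variances))
    return mean, second - mean * mean


# Two equally weighted unit-variance components centered at -1 and +1.
mean, var = moment_match([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
print(mean, var)
```

For this symmetric example the matched Gaussian has mean 0 and variance 2: the spread between the two components is absorbed into the variance of the single approximating Gaussian.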

This framework is demonstrated in a variety of statistical models using
synthetic and real-world data. On Gaussian mixture problems, Expectation
Propagation is found, for the same amount of computation, to be
convincingly better than rival approximation techniques: Monte Carlo,
Laplace's method, and variational Bayes. For pattern recognition,
Expectation Propagation provides an algorithm for training Bayes Point
Machine classifiers that is faster and more accurate than any previously
known. The resulting classifiers outperform Support Vector Machines on
several standard datasets, with comparable training
time. Expectation Propagation can also be used to choose an appropriate
feature set for classification, via Bayesian model selection.
