Classification is the process of sorting feature vectors into categories, or classes. Feature vectors are points in a feature space, which is defined by some collection of measurements, either raw or pre-processed. The process of transforming feature vectors into classification tags is a richly studied topic known as pattern recognition. A brief description of the main ideas used by Pfinder follows; a very thorough discussion of this material can be found in Therrien [21].
Video chroma-key segmentation is an instructive place to start. Chroma-keying is the process of identifying pixels in an image sequence that are of a particular color, usually for the purpose of compositing two video signals. The classes for chroma-keying are foreground and background. The features are raw video pixel values. The only thing the keyer models is the color of the background pixels. The crudest (and least effective) keyer would simply compare each pixel in the frame with the target color and label it with the result of the comparison: equality indicates a background pixel, inequality indicates a foreground pixel.
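As a concrete sketch (not Pfinder's actual code), such a crude keyer might look like the following in Python/NumPy, assuming an H x W x 3 frame array and a target_color triple; both names are illustrative:

    import numpy as np

    def crude_keyer(frame, target_color):
        """Label a pixel background only if it exactly equals the target color.

        frame: (H, W, 3) array of pixel values; target_color: length-3 sequence.
        Returns a boolean (H, W) mask that is True for foreground pixels.
        """
        target = np.asarray(target_color)
        background = np.all(frame == target, axis=-1)  # exact equality test per pixel
        return ~background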
Since there is likely to be noise in the video signal, the crude approach is doomed to failure. The keyer must instead assume a neighborhood in color space around the target color that is classified as background along with the target color itself. Classification then involves computing some distance to the target color and comparing that distance to a tunable threshold: less than the threshold indicates background, greater indicates foreground.
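A minimal sketch of such a thresholded keyer, here using plain Euclidean distance in color space and a hypothetical tunable threshold parameter:

    import numpy as np

    def threshold_keyer(frame, target_color, threshold):
        """Label a pixel background if its color lies within `threshold` of the target color.

        Uses Euclidean distance in the raw color space; the threshold must be tuned by hand.
        """
        diff = frame.astype(float) - np.asarray(target_color, dtype=float)
        dist = np.linalg.norm(diff, axis=-1)   # distance to the target color, per pixel
        return dist > threshold                # greater than threshold => foreground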
The distribution of the noise is unlikely to be isotropic in the feature space. A more general keyer might model the distribution of noise and compare distances in a normalized space, instead of the somewhat arbitrary feature space. This is the case illustrated in Figure A.1. The mean is the target background color. The concentric ellipses represent equidistant contours in the normalized space. The threshold is a tunable parameter that moves the decision boundary closer or farther away from the mean. All points inside the decision boundary are labeled as belonging to the class. In the case of the keyer, pixels in this region of feature space are labeled as background pixels.
Figure A.1: Single-sided classification in the 2-D case. The concentric ellipses represent lines of equal probability away from the mean.
If the noise model is Gaussian, with mean $\boldsymbol{\mu}$ and covariance $\mathbf{K}$, then the normalized distance measure is called the Mahalanobis distance. Given a measurement $\mathbf{y}$, this distance can be computed with the following equation:
$$ d = (\mathbf{y} - \boldsymbol{\mu})^{\mathrm{T}} \mathbf{K}^{-1} (\mathbf{y} - \boldsymbol{\mu}) $$
The mean, $\boldsymbol{\mu}$, of the Gaussian is the target color in the chroma-key example, and the covariance, $\mathbf{K}$, describes the first-order distribution of the noise.
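A minimal sketch of single-sided classification with this distance, assuming the background model (mu, K) has been estimated elsewhere (for example, from frames of the empty scene); the function name and arguments are illustrative:

    import numpy as np

    def mahalanobis_keyer(frame, mu, K, threshold):
        """Single-sided classification: a pixel is background if the Mahalanobis
        distance d = (y - mu)^T K^{-1} (y - mu) falls inside the decision boundary.

        frame: (H, W, 3) pixels; mu: length-3 mean color; K: 3x3 noise covariance.
        """
        K_inv = np.linalg.inv(K)
        diff = frame.reshape(-1, 3).astype(float) - mu          # y - mu for every pixel
        d = np.einsum('ij,jk,ik->i', diff, K_inv, diff)         # quadratic form per pixel
        return (d > threshold).reshape(frame.shape[:2])         # outside boundary => foreground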
The main problem with this approach is the threshold. Given a detailed model of the noise, and a desired level of classification performance, it is possible to analytically pick values for the threshold. In practice such detailed models are rare, and are not necessarily stationary. At best, the threshold must be chosen through trial-and-error. At worst, it must be retuned often. Our experience with single-sided classification techniques for person/room segmentation showed that, due to lighting and shadowing, it was necessary to retune the threshold on a frame-to-frame basis. This is an unacceptable situation.
Fortunately, there is a well-behaved, analytic solution to the threshold problem: use more than one class. If the task is to separate foreground from background, then model both classes. To classify a measurement $\mathbf{y}$, calculate the distance to each class and choose the class with the smaller distance:
$$ d_c(\mathbf{y}) = (\mathbf{y} - \boldsymbol{\mu}_c)^{\mathrm{T}} \mathbf{K}_c^{-1} (\mathbf{y} - \boldsymbol{\mu}_c), \qquad c \in \{\mathit{fg}, \mathit{bg}\} $$
The pixel is labeled foreground when $d_{\mathit{fg}}(\mathbf{y}) < d_{\mathit{bg}}(\mathbf{y})$, and background otherwise.
The decision boundary that results from this process is the line of
equi-probability between the two classes. The two-class situation is
illustrated in Figure A.2.
Figure A.2: Double-sided classification in the 2-D case. The concentric ellipses represent lines of equal probability away from the means. The decision boundary lies where the equal probability lines cross.
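A sketch of the two-class decision, assuming Gaussian color models (mu_fg, K_fg) and (mu_bg, K_bg) have been fitted elsewhere for the foreground and background; note that no threshold appears:

    import numpy as np

    def mahalanobis(pixels, mu, K):
        """Mahalanobis distance of each pixel (rows of an (N, 3) array) from a Gaussian (mu, K)."""
        diff = pixels - np.asarray(mu, dtype=float)
        return np.einsum('ij,jk,ik->i', diff, np.linalg.inv(K), diff)

    def two_class_keyer(frame, mu_fg, K_fg, mu_bg, K_bg):
        """Label a pixel foreground when it is closer, in Mahalanobis distance,
        to the foreground model than to the background model."""
        pixels = frame.reshape(-1, 3).astype(float)
        d_fg = mahalanobis(pixels, mu_fg, K_fg)
        d_bg = mahalanobis(pixels, mu_bg, K_bg)
        return (d_fg < d_bg).reshape(frame.shape[:2])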
Extension to more than two classes is straightforward. The distance to each class is computed, and the class with the smallest distance labels the pixel:
$$ \hat{c} = \arg\min_{c} \; (\mathbf{y} - \boldsymbol{\mu}_c)^{\mathrm{T}} \mathbf{K}_c^{-1} (\mathbf{y} - \boldsymbol{\mu}_c) $$
This is the situation inside Pfinder as described in Chapter 2.
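A sketch of the general multi-class rule, with a hypothetical list of per-class Gaussian models (mu, K); each pixel receives the label of the class at the smallest Mahalanobis distance:

    import numpy as np

    def classify_pixels(frame, class_models):
        """Assign each pixel to the class with the smallest Mahalanobis distance.

        class_models: list of (mu, K) pairs, one per class; returns an (H, W)
        array of class indices into that list.
        """
        pixels = frame.reshape(-1, 3).astype(float)
        distances = []
        for mu, K in class_models:
            diff = pixels - np.asarray(mu, dtype=float)
            distances.append(np.einsum('ij,jk,ik->i', diff, np.linalg.inv(K), diff))
        labels = np.argmin(np.stack(distances, axis=0), axis=0)   # smallest distance wins
        return labels.reshape(frame.shape[:2])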
Our experience shows that two-class classification results in better segmentation than single-sided classification. This is the case even when the foreground isn't well modeled by a single Gaussian distribution in color space, for instance because the foreground is a person wearing blue jeans and a white shirt. Single-sided classification is essentially a two-class decision between a Gaussian and a uniform distribution. Even if the foreground class has a large variance, it still contains more information than the uniform distribution, and this leads to better decisions.