The choice of color-space representation can have important consequences for a classification system. As always, a well-chosen representation makes certain operations easier. When tracking a person in a room, it is often necessary to eliminate shadows cast under white, or nearly white, lights, so a color space in which this is easy is worth choosing.
There are many color spaces to choose from, each with its own special
strengths. However, video digitization hardware tends to provide only a
limited selection of formats, and since applying a transform to every pixel
in software is very expensive, the only practical choice is usually between
RGB and YUV.
The relationship between these two color spaces is the linear transform
described by Equation B.1.
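For reference, a commonly used form of this transform is the analog YUV matrix built on the ITU-R BT.601 luma weights. These coefficients are an assumption; the values in Equation B.1 may differ slightly.

```latex
\begin{bmatrix} Y \\ U \\ V \end{bmatrix}
=
\begin{bmatrix}
 0.299 &  0.587 &  0.114 \\
-0.147 & -0.289 &  0.436 \\
 0.615 & -0.515 & -0.100
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
```

Because the U and V rows each sum to zero, any gray or white pixel (R = G = B) maps to U = V = 0. Applied in software, the transform costs nine multiplications and six additions per pixel; at an assumed 640×480 resolution and 30 frames per second that is 9,216,000 pixels, or roughly 83 million multiplications, every second.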
The important differences between RGB and YUV, however, are best illustrated
by Figure B.1. Transforming an RGB color cube into YUV space makes it easy
to see that the luma, or brightness, component Y is orthogonal to the
chroma, or color, components U and V. As described in Section 2.3.2,
normalizing for shadows involves projecting pixel values onto a
luma-invariant plane in color space. In YUV space, normalizing for a white
illuminant is therefore accomplished simply by discarding the Y component.
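The projection can be sketched in a few lines of Python. This is a minimal sketch, not the thesis's code: it assumes BT.601-style coefficients with R, G and B in [0, 1], models the loss of white light as an equal drop in R, G and B along the gray axis, and the names `rgb_to_yuv` and `chroma` are hypothetical.

```python
# Minimal sketch, assuming BT.601-style analog YUV coefficients and RGB in [0, 1];
# the thesis's Equation B.1 may use slightly different values.
RGB_TO_YUV = (
    (0.299, 0.587, 0.114),    # Y: luma
    (-0.147, -0.289, 0.436),  # U: blue-difference chroma
    (0.615, -0.515, -0.100),  # V: red-difference chroma
)


def rgb_to_yuv(rgb):
    """Apply the 3x3 linear RGB-to-YUV transform to one (R, G, B) pixel."""
    return tuple(sum(c * x for c, x in zip(row, rgb)) for row in RGB_TO_YUV)


def chroma(rgb):
    """Normalize for white light by discarding Y, keeping only (U, V)."""
    _, u, v = rgb_to_yuv(rgb)
    return (u, v)


# A lit pixel, and the same surface with some white light removed, modeled
# (as an assumption) as an equal drop in R, G and B along the gray axis.
lit = (0.6, 0.3, 0.2)
shadowed = tuple(x - 0.2 for x in lit)

print(rgb_to_yuv(lit)[0], rgb_to_yuv(shadowed)[0])  # luma drops by about 0.2
print(chroma(lit), chroma(shadowed))  # equal up to floating-point rounding
```

Because the U and V rows each sum to zero, a shift along the gray axis moves only Y, so the two chroma pairs agree.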
Figure B.1: The left image shows an RGB color cube in RGB color space; the
vertical axis is green (G). The right image shows the same cube
transformed into YUV space, where the vertical axis is Y (luma).
Decisions based only on color (chroma) can be projected into the U-V
plane simply by discarding the Y coordinate.
Flesh tracking is another operation that the YUV color space makes easier. If classification is done in the luma-invariant subspace, then a class trained on even an unrepresentative sample population will reliably track flesh across a wide range of skin tones. This convenient outcome follows from the fact that skin pigmentation is essentially the same color in everyone: varying concentrations of pigment change mainly the luminance, not the chroma.
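A luma-invariant flesh class can be sketched as a single 2-D Gaussian fitted in the U-V plane, with pixels accepted by squared Mahalanobis distance. This is an illustration under stated assumptions, not the thesis's classifier: the BT.601-style U and V rows, the class name `ChromaClass`, the ridge value, the threshold of 9 (three standard deviations), and the training pixels are all hypothetical, and a darker skin tone is modeled as the same chroma with less gray.

```python
# Minimal sketch of a luma-invariant color class (hypothetical names/values).
# Assumes BT.601-style U and V rows; Equation B.1 may use other coefficients.

RGB_TO_UV = (
    (-0.147, -0.289, 0.436),  # U row
    (0.615, -0.515, -0.100),  # V row
)


def to_uv(rgb):
    """Project an (R, G, B) pixel onto the U-V plane, discarding Y."""
    return tuple(sum(c * x for c, x in zip(row, rgb)) for row in RGB_TO_UV)


class ChromaClass:
    """A single 2-D Gaussian (mean and covariance) over training chroma."""

    def __init__(self, pixels, ridge=1e-6):
        uv = [to_uv(p) for p in pixels]
        n = len(uv)
        self.mu_u = sum(u for u, _ in uv) / n
        self.mu_v = sum(v for _, v in uv) / n
        du = [u - self.mu_u for u, _ in uv]
        dv = [v - self.mu_v for _, v in uv]
        # Unbiased sample covariance; the small ridge keeps it invertible.
        self.s_uu = sum(a * a for a in du) / (n - 1) + ridge
        self.s_vv = sum(b * b for b in dv) / (n - 1) + ridge
        self.s_uv = sum(a * b for a, b in zip(du, dv)) / (n - 1)

    def distance2(self, rgb):
        """Squared Mahalanobis distance of a pixel's chroma from the class."""
        u, v = to_uv(rgb)
        du, dv = u - self.mu_u, v - self.mu_v
        det = self.s_uu * self.s_vv - self.s_uv ** 2
        # Closed-form inverse of the 2x2 covariance matrix.
        return (self.s_vv * du * du - 2 * self.s_uv * du * dv
                + self.s_uu * dv * dv) / det

    def contains(self, rgb, threshold=9.0):
        """Accept a pixel whose squared distance is within the threshold."""
        return self.distance2(rgb) <= threshold


# Synthetic, illustrative training pixels (R, G, B in [0, 1]) from one skin tone.
TRAIN = [(0.80, 0.60, 0.50), (0.82, 0.60, 0.49), (0.78, 0.59, 0.50),
         (0.81, 0.62, 0.52), (0.79, 0.58, 0.47)]

skin = ChromaClass(TRAIN)
darker = (0.45, 0.25, 0.15)  # first training pixel with 0.35 less gray
print(skin.contains(darker), skin.contains((0.2, 0.3, 0.8)))  # True False
```

The darker pixel was never in the training set, yet it is accepted because discarding Y leaves its chroma identical to a training sample, while a saturated blue pixel falls far outside the class.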