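In choosing our features, we were inspired by the object recognition system proposed by Schiele and Crowley [12], in which objects are represented as multidimensional histograms of the vector responses of local operators. Schiele experimentally compared the invariance properties of several receptive field functions, including Gabor filters and local derivative operators. His results showed that Gaussian derivatives provided the most robust and equivariant recognition results. Given the Gaussian distribution $G(x,y) = e^{-\frac{x^2+y^2}{2\sigma^2}}$, the first derivatives in the x and y directions are $G_x(x,y) = -\frac{x}{\sigma^2} G(x,y)$ and $G_y(x,y) = -\frac{y}{\sigma^2} G(x,y)$. The Laplace operator is $Lap(x,y) = G_{xx}(x,y) + G_{yy}(x,y)$, where $G_{xx}(x,y) = \left(\frac{x^2}{\sigma^4} - \frac{1}{\sigma^2}\right) G(x,y)$ and $G_{yy}(x,y) = \left(\frac{y^2}{\sigma^4} - \frac{1}{\sigma^2}\right) G(x,y)$.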
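As an illustration, the Gaussian derivative and Laplacian filters can be sampled on a discrete grid. The sketch below assumes the unnormalized Gaussian $G(x,y) = e^{-(x^2+y^2)/(2\sigma^2)}$ and a kernel truncated at three standard deviations; the function name `gaussian_derivative_kernels`, the truncation radius, and the two sigma values are illustrative assumptions, not values taken from [12].

```python
import math


def gaussian_derivative_kernels(sigma, radius=None):
    """Sample G_x, G_y and Lap = G_xx + G_yy on a (2r+1) x (2r+1) grid.

    Assumes the unnormalized Gaussian G(x, y) = exp(-(x^2 + y^2) / (2 sigma^2)).
    Each kernel is a list of rows, indexed as kernel[row][col] with
    y = row - radius and x = col - radius.
    """
    if radius is None:
        radius = int(math.ceil(3 * sigma))  # assumed truncation at 3 sigma
    s2 = sigma * sigma
    gx, gy, lap = [], [], []
    for y in range(-radius, radius + 1):
        row_gx, row_gy, row_lap = [], [], []
        for x in range(-radius, radius + 1):
            g = math.exp(-(x * x + y * y) / (2.0 * s2))
            row_gx.append(-x / s2 * g)  # G_x = -(x / sigma^2) G
            row_gy.append(-y / s2 * g)  # G_y = -(y / sigma^2) G
            # G_xx + G_yy = ((x^2 + y^2) / sigma^4 - 2 / sigma^2) G
            row_lap.append(((x * x + y * y) / (s2 * s2) - 2.0 / s2) * g)
        gx.append(row_gx)
        gy.append(row_gy)
        lap.append(row_lap)
    return gx, gy, lap


# Two scales, as in the text; these sigma values are placeholders.
kernels = {sigma: gaussian_derivative_kernels(sigma) for sigma in (1.0, 2.0)}
```

Convolving an image with the three kernels at each of the two scales would give six responses per pixel, one for each histogram axis.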
In our experiment, we used only the first derivatives and the Laplacian at two different scales, resulting in a 6-dimensional histogram. The resolution of each histogram axis was either 16 or 32 bins. For more details on creating the histograms, please refer to [12].
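A minimal sketch of how such a histogram might be accumulated from 6-dimensional response vectors is given below. The value range of each axis, the clamping of out-of-range responses, and the sparse `Counter` storage are assumptions made for illustration; the procedure actually used is described in [12].

```python
from collections import Counter


def quantize(value, lo, hi, bins):
    """Map a filter response to a bin index in 0..bins-1, clamping outliers."""
    idx = int((value - lo) / (hi - lo) * bins)
    return min(max(idx, 0), bins - 1)


def build_histogram(vectors, ranges, bins=16):
    """Count how often each 6-D cell is hit by the response vectors.

    vectors: iterable of 6-tuples (Dx, Dy, Lap at two scales).
    ranges:  one (lo, hi) pair per axis (assumed known in advance).
    """
    hist = Counter()
    for vec in vectors:
        cell = tuple(
            quantize(v, lo, hi, bins) for v, (lo, hi) in zip(vec, ranges)
        )
        hist[cell] += 1
    return hist
```

Sparse storage is a design choice here: a dense 6-dimensional array with 16 bins per axis already has 16^6 (about 16.8 million) cells, most of which stay empty.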
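The probability of an object $O_n$ given a local measurement $M_k$ is obtained using Bayes' rule:

$$p(O_n \mid M_k) = \frac{p(M_k \mid O_n)\, p(O_n)}{p(M_k)}$$

where $p(O_n)$ is the prior probability of the object, which is known, and $p(M_k)$ is the prior probability of the filter output, which is measured as $p(M_k) = \sum_i p(M_k \mid O_i)\, p(O_i)$. Thus $p(M_k \mid O_n)$, the probability density of object $O_n$, differs from the multidimensional histogram of the object only by a normalization term. If we have K independent measurements $M_1, M_2, \ldots, M_K$, then the probability of the object $O_n$ is:

$$p(O_n \mid M_1, \ldots, M_K) = \frac{p(O_n) \prod_k p(M_k \mid O_n)}{\sum_i p(O_i) \prod_k p(M_k \mid O_i)}$$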
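The combination of K independent measurements can be sketched as follows, under the assumption that each object's histogram is stored as a mapping from cell index to count, so that the likelihood of a measurement is the normalized count of its cell. The function name `posterior`, the small floor `eps` for empty cells, and the log-domain accumulation are illustrative additions rather than part of the original method.

```python
import math


def posterior(measurements, histograms, priors, eps=1e-6):
    """Return p(O_n | M_1..M_K) for every object, assuming independent M_k.

    measurements: list of histogram cells (tuples), one per measurement.
    histograms:   {object_name: {cell: count}}.
    priors:       {object_name: p(O_n)}.
    """
    log_scores = {}
    for name, hist in histograms.items():
        total = float(sum(hist.values()))
        log_p = math.log(priors[name])
        for cell in measurements:
            # p(M_k | O_n) is the normalized count; eps avoids log(0)
            log_p += math.log(hist.get(cell, 0) / total + eps)
        log_scores[name] = log_p
    # Dividing by the sum over all objects is the denominator of Bayes' rule
    peak = max(log_scores.values())
    unnorm = {n: math.exp(s - peak) for n, s in log_scores.items()}
    z = sum(unnorm.values())
    return {n: u / z for n, u in unnorm.items()}
```

Working with log-probabilities keeps the product over many measurements from underflowing; the result is unchanged because the scores are normalized at the end.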
To ensure independence between measurements, we impose a minimum distance between any two measurements $M_{k_1}$ and $M_{k_2}$. Because the measurement locations can be chosen arbitrarily, measurements are not required at corresponding points, and only a certain number of local receptive field vectors need to be calculated, the method is fast and robust to partial occlusion.
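One simple way to pick measurement locations subject to a minimum separation is rejection sampling, sketched below. The function name `sample_locations`, the uniform random proposal, and the parameter values are hypothetical; the text does not specify how locations are drawn or what distance is used.

```python
import random


def sample_locations(width, height, count, min_dist, seed=0, max_tries=10000):
    """Draw up to `count` pixel locations that are pairwise >= min_dist apart."""
    rng = random.Random(seed)
    chosen = []
    tries = 0
    while len(chosen) < count and tries < max_tries:
        tries += 1
        x, y = rng.randrange(width), rng.randrange(height)
        # Accept the candidate only if it is far enough from every kept point
        if all(
            (x - cx) ** 2 + (y - cy) ** 2 >= min_dist ** 2
            for cx, cy in chosen
        ):
            chosen.append((x, y))
    return chosen
```

The function may return fewer than `count` points if the image is too small for the requested separation, which is why it stops after `max_tries` proposals.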
Because we are trying to distinguish changes in the same object rather than different objects, we incorporate motion cues into our histograms by using difference images, which significantly improves performance. To capture both fast and slow changes, temporal differencing should be done at different rates. However, for short-time-scale or fast expressions, it is enough to difference consecutive frames of images recorded at 30 frames/second. In the original framework of Schiele and Crowley [12], the histograms were compared directly using the $\chi^2$ statistic, histogram intersection, or Mahalanobis distance. In our case, it is more important to see how the histograms change over time during an expression than to compare histograms. Thus, to represent the histograms compactly and reduce the dimensionality for temporal modeling, we take the PCA of the input histograms and use the top 20 eigenvectors, which capture 90% of the variance, to represent the histogram space.
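The two steps in this paragraph, temporal differencing and PCA of the histograms, can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: frames are stacked in an array of shape (T, H, W), the difference is signed rather than absolute, each histogram is flattened to one row vector, and the PCA is computed via a singular value decomposition. The function names are hypothetical.

```python
import numpy as np


def difference_images(frames, lag=1):
    """Signed temporal difference between frames `lag` steps apart.

    lag=1 is consecutive-frame differencing; larger lags capture
    slower changes.
    """
    frames = np.asarray(frames, dtype=float)
    return frames[lag:] - frames[:-lag]


def fit_pca(histograms, n_components=20):
    """Return the mean, the top principal axes, and their variance fraction."""
    X = np.asarray(histograms, dtype=float)  # shape (num_samples, num_cells)
    mean = X.mean(axis=0)
    # Rows of vt are the eigenvectors of the sample covariance matrix
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    k = min(n_components, len(s))
    explained = float(np.sum(s[:k] ** 2) / np.sum(s ** 2))
    return mean, vt[:k], explained


def project(histograms, mean, components):
    """Coordinates of each histogram in the reduced space."""
    return (np.asarray(histograms, dtype=float) - mean) @ components.T
```

The returned `explained` value can be used to check how much variance a given number of eigenvectors retains on a particular data set.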