
Motion Field Histograms

Inaccurate normalization of face images during head tracking makes changes in facial features difficult to recognize. Features that are less sensitive to small position and scale changes are therefore likely to prove more reliable for our task.

In choosing our features, we were inspired by the object recognition system proposed by Schiele and Crowley [12]. Objects are represented as multidimensional histograms of vector responses of local operators. Schiele experimentally compared the invariance properties of several receptive field functions, including Gabor filters and local derivative operators. His results showed that Gaussian derivatives provided the most robust recognition results. Given the Gaussian distribution

\begin{displaymath}G(x,y) = e^{-\frac{x^2+y^2}{2\sigma^2}},\end{displaymath}

the first derivatives in x and y are

\begin{displaymath}D_x(x,y) = -\frac{x}{\sigma^2}G(x,y), \qquad D_y(x,y) = -\frac{y}{\sigma^2}G(x,y).\end{displaymath}

The Laplace operator is Lap(x,y) = G_{xx}(x,y) + G_{yy}(x,y), where

\begin{displaymath}G_{xx}(x,y) = \left(\frac{x^2}{\sigma^4} - \frac{1}{\sigma^2}\right)G(x,y), \qquad
G_{yy}(x,y) = \left(\frac{y^2}{\sigma^4} - \frac{1}{\sigma^2}\right)G(x,y).\end{displaymath}
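The filter kernels above can be discretized directly from these formulas. The following sketch builds the two first-derivative kernels and the Laplacian kernel on a square grid; the truncation radius of $3\sigma$ is a common heuristic of ours, not something specified in the text.

```python
import numpy as np

def gaussian_derivative_filters(sigma, radius=None):
    """Discretize the Gaussian first-derivative and Laplacian operators.

    A sketch of the receptive-field functions described above; the
    kernel truncation radius is an assumption, not from the text.
    """
    if radius is None:
        radius = int(3 * sigma)  # common truncation heuristic (assumption)
    coords = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(coords, coords)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))      # G(x, y)
    dx = -(x / sigma**2) * g                          # D_x(x, y)
    dy = -(y / sigma**2) * g                          # D_y(x, y)
    gxx = (x**2 / sigma**4 - 1 / sigma**2) * g        # G_xx(x, y)
    gyy = (y**2 / sigma**4 - 1 / sigma**2) * g        # G_yy(x, y)
    lap = gxx + gyy                                   # Laplace operator
    return dx, dy, lap
```

Convolving an image with each kernel (e.g. via `scipy.ndimage.convolve`) yields the local response vectors that feed the histograms.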

In our experiment, we used only the first derivatives and the Laplacian at two different scales, resulting in a six-dimensional histogram. The resolution of each histogram axis was either 16 or 32 bins. For more details on creating the histograms, please refer to [12].
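Building such a histogram amounts to quantizing each d-dimensional response vector into a bin and counting. The sketch below illustrates this for a generic d; the response range and normalization are assumptions of ours (the test uses d = 2 to keep the array small, whereas the experiment uses d = 6).

```python
import numpy as np

def multidim_histogram(responses, bins=16, lo=-1.0, hi=1.0):
    """Accumulate local filter-response vectors into a multidimensional
    histogram, in the spirit of Schiele & Crowley's representation.

    `responses` is an (N, d) array of d filter outputs at N image
    locations; the response range [lo, hi] is an assumption.
    """
    responses = np.asarray(responses, dtype=float)
    n, d = responses.shape
    # Map each response component to a bin index along its axis.
    idx = np.clip(((responses - lo) / (hi - lo) * bins).astype(int),
                  0, bins - 1)
    hist = np.zeros((bins,) * d)
    for row in idx:
        hist[tuple(row)] += 1
    return hist / n  # normalize to a probability distribution
```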

The probability of an object On given local measurement Mk is obtained using Bayes' rule:


\begin{displaymath}p(O_n\vert M_k) = \frac{p(M_k\vert O_n)p(O_n)}{p(M_k)}\end{displaymath}

where p(On) is the prior probability of the object, which is known, and p(Mk) is the prior probability of the filter output, measured as $ \sum_{i}{p(M_k\vert O_i)p(O_i)}$. The likelihood p(Mk|On) thus differs from the multidimensional histogram of object On only by a normalization term. If we have K independent measurements M1, M2, ..., MK, then the probability of the object On is:


\begin{displaymath}p(O_n\vert M_1, M_2, \ldots ,M_K) =
\frac{\prod_{k}{p(M_k\vert O_n)p(O_n)}}{\prod_{k}{p(M_k)}}\end{displaymath}

To ensure independence between measurements, we require a minimum distance $d(M_{k_1}, M_{k_2}) \geq 2\sigma$ between any two measurements Mk1 and Mk2. The measurement locations can be chosen arbitrarily; it is not necessary to have measurements at corresponding points. Because only a certain number of local receptive field vectors need to be calculated, the method is fast and robust to partial occlusion.
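Combining the two equations above, the posterior can be accumulated one measurement at a time. The sketch below does this with the object histograms serving as likelihood estimates; the smoothing epsilon (to cope with empty bins) and all names are our assumptions.

```python
import numpy as np

def posterior(measurement_bins, hists, priors):
    """Combine K independent local measurements into object posteriors.

    `hists[n]` is object n's normalized multidimensional histogram,
    used as an estimate of p(M | O_n); `measurement_bins` is a list of
    K bin-index tuples. The smoothing epsilon is an assumption.
    """
    eps = 1e-9  # avoid zero likelihoods from empty histogram bins
    priors = np.asarray(priors, dtype=float)
    post = priors.copy()
    for m in measurement_bins:
        like = np.array([h[m] + eps for h in hists])  # p(M_k | O_n)
        p_m = np.sum(like * priors)   # p(M_k) = sum_i p(M_k|O_i) p(O_i)
        post *= like / p_m
    return post / post.sum()  # renormalize for numerical safety
```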


  
Figure: Example blink sequence (image: blink1.ps)


  
Figure: Difference images of the sequence (image: blink2.ps)

Because we are trying to distinguish changes within the same object rather than between different objects, we incorporate motion cues into our histograms by using difference images, which significantly improves performance. To capture both fast and slow changes, temporal differencing should be done at different rates; for short time scales or fast expressions, however, consecutive-frame differencing of images recorded at 30 frames/second is sufficient. In the original framework of Schiele and Crowley [12], the histograms were compared directly using the $\chi^2$ statistic, histogram intersection, or the Mahalanobis distance. In our case, it is important to track changes in the histograms over time for an expression rather than to compare histograms directly. Thus, to represent the histograms compactly and reduce the dimensionality for temporal modeling, we take the PCA of the input histograms and use the top 20 eigenvectors, which capture 90% of the variance, to represent the histogram space.
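The PCA step can be sketched via the SVD of the mean-centered histogram vectors; `k=20` follows the text's choice of 20 eigenvectors, while everything else (names, the SVD route) is our assumption.

```python
import numpy as np

def histogram_pca(hist_vectors, k=20):
    """Project flattened difference-image histograms onto their top-k
    principal components for compact temporal modeling.

    `hist_vectors` is a (T, D) array: one flattened histogram per
    frame. Returns the low-dimensional trajectory, the components,
    the mean, and the fraction of variance captured.
    """
    X = np.asarray(hist_vectors, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes (eigenvectors of the covariance).
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(k, Vt.shape[0])
    coeffs = Xc @ Vt[:k].T  # trajectory in the reduced histogram space
    var_captured = (S[:k]**2).sum() / (S**2).sum()
    return coeffs, Vt[:k], mean, var_captured
```

The per-frame coefficient vectors then form the time series fed to the temporal model.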


Tanzeem Choudhury
2000-01-21