Mapping Residuals to Spatial Uncertainty

Recall that Kalman filtering uses a noise covariance matrix to describe the expected noise on the input measurements. This matrix is traditionally denoted R and is $n \times n$, where n is the number of measurements in the observation vector $\bf {y}$. The role of R in the computation of the Kalman gain matrix is described by Equation 9. Adaptive Kalman filtering [4] uses a dynamically varying R matrix that changes with the arrival of each new observation vector to model the confidence in the new data. By setting R from the residuals of the 2D correlation-based trackers, we can weight the observations they provide and obtain a more robust overall estimate of the internal state.
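To make the role of R concrete, here is a minimal NumPy sketch. It assumes the standard textbook gain $K = P H^T (H P H^T + R)^{-1}$ (Equation 9 itself is not reproduced here), and `residual_scaled_R` is a hypothetical weighting that simply grows each measurement's variance with its tracker residual; it is not the mapping derived in this section.

```python
import numpy as np


def kalman_gain(P, H, R):
    """Textbook Kalman gain K = P H^T (H P H^T + R)^{-1} (assumed form)."""
    S = H @ P @ H.T + R  # innovation covariance
    # Solve S^T K^T = (P H^T)^T rather than inverting S explicitly.
    return np.linalg.solve(S.T, (P @ H.T).T).T


def residual_scaled_R(residuals, base_var=1.0):
    """Hypothetical adaptive R: each measurement's variance grows with its residual."""
    residuals = np.asarray(residuals, dtype=float)
    return np.diag(base_var * (1.0 + residuals))


# Two measurements of a 2D state: one well-aligned tracker, one poorly aligned.
P = np.eye(2)
H = np.eye(2)
K = kalman_gain(P, H, residual_scaled_R([0.1, 10.0]))
print(np.round(np.diag(K), 3))  # the high-residual measurement gets a much smaller gain
```

A larger entry in R shrinks the corresponding column of K, so the filter trusts that measurement less; this is the mechanism the residual-to-covariance mapping below exploits.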

We now address how to relate the residuals of the correlation-based trackers to the noise covariance of the tracked feature points used in Kalman filtering. We propose fitting a function that models the residual as a function of spatial uncertainty.

Consider first the simple case of SSD tracking with purely translational motion. We observe the residuals between an image patch $I(\bf {x}\rm ,0)$ and the same image patch after a given translation, $I(\bf f \rm ( \bf {x}\rm ,\bf {\mu}\rm ),0)$. The residual is expected to grow as the alignment error increases; Figure 10 plots it over a range of perturbations $(\Delta x,\Delta y)$ in x and y translation. We model this residual as a function of $(\Delta x,\Delta y)$ with a 2D paraboloid centered at (0,0), obtained by sampling various values of $(\Delta x,\Delta y)$ and fitting in a least-squares sense; the fitted paraboloid is also shown in Figure 10. Note that these residual functions, and hence the fitted paraboloids, have different shapes for the different textures to which normalized correlation is applied.
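The sampling and fitting procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the texture is synthetic, shifts are integer pixels in a square window of hypothetical half-width `max_shift`, and the residual is raw SSD (Equation 10 below uses $\sqrt{SSD}$; the same fit applies to either). The paraboloid is constrained to have its vertex at (0,0), so only the three quadratic coefficients are estimated.

```python
import numpy as np


def ssd_residuals(image, top, left, size, max_shift):
    """Sample SSD between a reference patch and translated copies of it.

    Returns (shifts, residuals), with shifts an (M, 2) array of (dx, dy).
    """
    ref = image[top:top + size, left:left + size]
    shifts, residuals = [], []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            moved = image[top + dy:top + dy + size, left + dx:left + dx + size]
            shifts.append((dx, dy))
            residuals.append(np.sum((moved - ref) ** 2))
    return np.array(shifts, dtype=float), np.array(residuals)


def fit_paraboloid_2d(shifts, residuals):
    """Least-squares fit of r ~ a dx^2 + 2 b dx dy + c dy^2 (vertex fixed at (0, 0)).

    Returns the symmetric 2x2 matrix A with r ~ [dx dy] A [dx dy]^T.
    """
    dx, dy = shifts[:, 0], shifts[:, 1]
    design = np.column_stack([dx * dx, 2 * dx * dy, dy * dy])
    (a, b, c), *_ = np.linalg.lstsq(design, residuals, rcond=None)
    return np.array([[a, b], [b, c]])


# Hypothetical synthetic texture standing in for a real image patch.
ys, xs = np.mgrid[0:64, 0:64].astype(float)
texture = np.sin(0.3 * xs) + np.cos(0.2 * ys) + 0.5 * np.sin(0.1 * (xs + ys))
shifts, residuals = ssd_residuals(texture, top=20, left=20, size=16, max_shift=3)
A = fit_paraboloid_2d(shifts, residuals)
print(A)
```

Different textures yield different matrices A, which is the texture dependence noted above.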

**Figure:** Residuals for a Correlation Template and Parabolic Approximation (panel: Eye Residuals)

Extending this concept to 4D (the true dimension of $\bf {\mu}$ in our application), we compute a 4D paraboloid that maps spatial alignment error to correlation residual. This is done for every tracked image patch each time the system is initialized: residuals are computed for a variety of perturbations $\Delta \bf X \rm = (\Delta X_1, \Delta Y_1, \Delta X_2, \Delta Y_2)$ of $\bf {\mu}$, and a 4D paraboloid of the form in Equation 10 is fitted to them.

$\displaystyle \sqrt{SSD} = \Delta \mathbf{X} \left [ \begin{array}{cccc} a_{xx} & a_{xy} & a_{xm} & a_{xn} \\ a_{xy} & a_{yy} & a_{ym} & a_{yn} \\ a_{xm} & a_{ym} & a_{mm} & a_{mn} \\ a_{xn} & a_{yn} & a_{mn} & a_{nn} \end{array}\right ] \Delta \mathbf{X}^T$

(10)
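Fitting Equation 10 is an ordinary linear least-squares problem, since $\sqrt{SSD}$ is linear in the ten distinct entries of the symmetric coefficient matrix. The sketch below assumes the residuals for each sampled perturbation $\Delta \mathbf{X}$ have already been computed (for example by warping the patch as in the 2D case); here they are synthesized from a known matrix, a hypothetical stand-in for real tracker data, so that the fit can be checked.

```python
import itertools

import numpy as np


def fit_quadratic_form(deltas, residuals):
    """Fit residual ~ dX A dX^T with symmetric A and vertex at the origin.

    deltas: (M, n) array of perturbations, e.g. n = 4 for (dX1, dY1, dX2, dY2).
    """
    deltas = np.asarray(deltas, dtype=float)
    n = deltas.shape[1]
    pairs = list(itertools.combinations_with_replacement(range(n), 2))  # i <= j
    # Off-diagonal entries appear twice in dX A dX^T, hence the factor of 2.
    design = np.column_stack(
        [deltas[:, i] * deltas[:, j] * (1.0 if i == j else 2.0) for i, j in pairs]
    )
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(residuals, dtype=float), rcond=None)
    A = np.zeros((n, n))
    for (i, j), value in zip(pairs, coeffs):
        A[i, j] = A[j, i] = value
    return A


rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A_true = M @ M.T + 4.0 * np.eye(4)              # hypothetical positive-definite A
deltas = rng.uniform(-2.0, 2.0, size=(200, 4))  # sampled (dX1, dY1, dX2, dY2)
residuals = np.einsum("ki,ij,kj->k", deltas, A_true, deltas)
A_fit = fit_quadratic_form(deltas, residuals)
print(np.allclose(A_fit, A_true))
```

With real tracker data the residuals are only approximately quadratic, so the fit is a least-squares approximation rather than an exact recovery.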

Having fitted this 4D paraboloid, we can directly find the 4D ellipsoid corresponding to any given residual value. The surface of this ellipsoid is essentially the error window on the current estimate of $(X_1, Y_1, X_2, Y_2)$ from correlation-based tracking. Under the paraboloid noise model, this iso-residual surface is also an iso-probability surface of a Gaussian model of the spatial noise on that estimate: the iso-probability surfaces of a zero-mean Gaussian with covariance C are the ellipsoids $\Delta \mathbf{X}\, C^{-1} \Delta \mathbf{X}^T = k$, which coincide with the iso-residual ellipsoids when $C^{-1}$ is proportional to the paraboloid's coefficient matrix. Thus, the Gaussian error on the current feature points can be estimated by the $4\times4$ covariance matrix in Equation 11.

$\displaystyle C \propto \sqrt{SSD} \left [ \begin{array}{cccc} a_{xx} & a_{xy} & a_{xm} & a_{xn} \\ a_{xy} & a_{yy} & a_{ym} & a_{yn} \\ a_{xm} & a_{ym} & a_{mm} & a_{mn} \\ a_{xn} & a_{yn} & a_{mn} & a_{nn} \end{array}\right ]^{-1}$

(11)
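Equation 11 and the ellipsoid argument can be checked numerically. Writing A for the $4\times4$ coefficient matrix of Equation 10, the sketch below forms $C = \text{scale} \cdot \sqrt{SSD}\, A^{-1}$, where `scale` is a hypothetical placeholder for the unspecified proportionality constant. With `scale = 1`, the one-standard-deviation ellipsoid of C ($\Delta \mathbf{X}\, C^{-1} \Delta \mathbf{X}^T = 1$) is exactly the iso-residual ellipsoid $\Delta \mathbf{X} A \Delta \mathbf{X}^T = \sqrt{SSD}$, which the final comparison confirms. The matrix and residual values are illustrative only.

```python
import numpy as np


def covariance_from_paraboloid(A, residual, scale=1.0):
    """C = scale * residual * A^{-1}, the proportionality of Equation 11.

    `scale` is a hypothetical stand-in for the unspecified constant.
    """
    A = np.asarray(A, dtype=float)
    np.linalg.cholesky(A)  # raises LinAlgError if A is not positive definite
    return scale * residual * np.linalg.inv(A)


def iso_residual_semi_axes(A, residual):
    """Semi-axis lengths of the ellipsoid dX A dX^T = residual (ascending eigenvalue order)."""
    eigvals = np.linalg.eigvalsh(np.asarray(A, dtype=float))
    return np.sqrt(residual / eigvals)


A = np.diag([4.0, 1.0, 9.0, 16.0])  # hypothetical fitted coefficients
residual = 4.0                       # current sqrt(SSD)
C = covariance_from_paraboloid(A, residual)
axes = iso_residual_semi_axes(A, residual)
sigmas = np.sqrt(np.linalg.eigvalsh(C))
print(np.allclose(np.sort(axes), np.sort(sigmas)))  # 1-sigma ellipsoid == iso-residual ellipsoid
```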

For each of the N correlation windows being tracked, a $4\times4$ submatrix of the form of C is computed and placed into the Kalman filter's noise covariance matrix R. The noise covariance $C_i$ of feature i occupies the i-th diagonal block, so R becomes the $4N \times 4N$ block-diagonal matrix $R = \mathrm{diag}(C_1, \ldots, C_N)$ shown in Equation 12.
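Assembling R from the per-window blocks is mechanical. The NumPy sketch below builds $R = \mathrm{diag}(C_1, \ldots, C_N)$ by hand (`scipy.linalg.block_diag` does the same job if SciPy is available); the input covariances are hypothetical placeholders.

```python
import numpy as np


def assemble_block_diagonal_R(covariances):
    """Place N per-feature 4x4 covariances C_i on the diagonal of a 4N x 4N matrix R."""
    blocks = [np.asarray(C, dtype=float) for C in covariances]
    size = sum(b.shape[0] for b in blocks)
    R = np.zeros((size, size))
    offset = 0
    for b in blocks:
        k = b.shape[0]
        R[offset:offset + k, offset:offset + k] = b  # off-diagonal blocks stay zero
        offset += k
    return R


C_list = [np.eye(4), 2.0 * np.eye(4), np.diag([1.0, 2.0, 3.0, 4.0])]
R = assemble_block_diagonal_R(C_list)
print(R.shape)
```

The zero off-diagonal blocks encode the assumption that noise on different correlation windows is uncorrelated.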

At each iteration, the rotation, scaling and residual of a correlation window determine the rotation and scaling of its covariance submatrix $C_i$. R is thus adaptively adjusted to reflect the noise on the spatial positions of the tracked feature points. In addition, because these covariances come from a sensitivity analysis, they are specialized to the noise characteristics of the particular texture being tracked.
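One way to realize this per-iteration update is sketched below, under an assumption the text does not spell out: the coefficient matrix A is fitted once in the window's initial frame, the window's current rotation `theta` and scale `scale` are applied to both tracked points as a 2D similarity transform T, and the current residual rescales the result as in Equation 11, giving $C_i = \sqrt{SSD}\, T A^{-1} T^T$. The function name and parameterization are hypothetical.

```python
import numpy as np


def adapt_window_covariance(A, theta, scale, residual):
    """Hypothetical update of one window's 4x4 covariance C_i.

    A: coefficient matrix of Equation 10, fitted in the window's initial frame
       over (dX1, dY1, dX2, dY2).
    theta, scale: current rotation (radians) and scale of the window.
    residual: current sqrt(SSD) of the window.
    """
    c, s = np.cos(theta), np.sin(theta)
    similarity = scale * np.array([[c, -s], [s, c]])  # acts on one (X, Y) point
    T = np.kron(np.eye(2), similarity)                # same transform on both points
    C_local = residual * np.linalg.inv(np.asarray(A, dtype=float))
    return T @ C_local @ T.T                          # covariance in image coordinates


A = np.diag([1.0, 4.0, 1.0, 4.0])
C_rot = adapt_window_covariance(A, theta=np.pi / 2, scale=2.0, residual=1.0)
print(np.round(C_rot, 6))  # x and y variances swap and grow by scale**2
```

Rotating the window rotates the uncertainty ellipsoid with it, and scaling the window scales the variances by the square of the scale factor.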

At each iteration, therefore, the feature tracks are weighted according to the current orientation and scale of the correlation trackers, their residual values, and the spatial sensitivity of the textures they were initialized to track. The Kalman filter handles the rest of the estimation, recovering the structure, motion and camera geometry optimally from this weighted set of inputs.