The dynamic skeleton model currently includes the upper body and arms. Figure 7 shows the real-time response to various target postures. The model interpolates those portions of the body state that are not measured directly, such as the upper body and elbow orientation, using the model's intrinsic dynamics and the behavior (control) model. The model also rejects noise that is inconsistent with the dynamic model. Table 2 compares RMS noise in the dynamic model output with noise in the underlying feature tracker. The ``line following'' test measures error from the best-fit line to data produced by constraining the user's hand to move along a linear trajectory. The ``rotational jitter'' test measures error relative to a smoothed version of data obtained from smooth motions of the user's hand through a rotation.
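As an illustration of the ``line following'' measure, the following minimal sketch computes the RMS distance of 3-D points from their best-fit line. It assumes a total-least-squares fit (perpendicular distances), which the text does not specify; the function names are hypothetical and are not taken from the system described here.

```python
import math


def largest_eigenvalue_sym3(a):
    """Largest eigenvalue of a symmetric 3x3 matrix (closed-form trigonometric solution)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:
        # Matrix is already diagonal.
        return max(a[0][0], a[1][1], a[2][2])
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    # B = (A - q*I) / p has eigenvalues in [-2, 2].
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    return q + 2.0 * p * math.cos(math.acos(r) / 3.0)


def rms_line_error(points):
    """RMS perpendicular distance from 3-D points to their best-fit line.

    Assumption: the best-fit line is the total-least-squares line, which
    passes through the centroid along the dominant scatter direction.
    """
    n = len(points)
    centroid = [sum(pt[k] for pt in points) / n for k in range(3)]
    centered = [[pt[k] - centroid[k] for k in range(3)] for pt in points]
    # 3x3 scatter matrix of the centered points.
    scatter = [[sum(v[i] * v[j] for v in centered) for j in range(3)]
               for i in range(3)]
    trace = scatter[0][0] + scatter[1][1] + scatter[2][2]
    # Squared residuals sum to the trace minus the variance captured by the line.
    residual = max(trace - largest_eigenvalue_sym3(scatter), 0.0)
    return math.sqrt(residual / n)
```

Applying the same function to the raw tracker output and to the model output would give the kind of paired comparison reported in Table 2, under the stated assumption about the fit.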
Figure 8 illustrates another advantage of feedback from higher-level models to the low-level vision system. Without feedback, the 2-D tracker fails if there is even partial self-occlusion from a single camera's perspective. With feedback, information from the dynamic model can be used to resolve ambiguity during 2-D tracking.
The model predictions also stabilize tracking by providing constraints that help the tracking algorithm reject distractions in the environment. The addition of another person to the scene, as in Figure 9, produces many patches in the image that are similar to the target blobs. Without high-level model knowledge, the 2-D tracker can only reject these distractions based on some assumptions about the temporal stability of blobs. With the addition of high-level feedback, however, the 2-D tracker now has information about the physical constraints of the underlying system. Consequently, it is generally not distracted by competing targets (such as other people).
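One common way to use a model prediction to reject distractors is a validation gate. The sketch below is illustrative only: the function `select_blob`, the diagonal-variance form, and the chi-square threshold are assumptions, not a description of the feedback mechanism used in this system.

```python
def select_blob(predicted, variance, candidates, gate=9.21):
    """Pick the candidate blob center most consistent with the model prediction.

    predicted  -- (x, y) image position predicted by the high-level model
    variance   -- (var_x, var_y) prediction uncertainty (assumed diagonal)
    candidates -- list of (x, y) blob centers found by the 2-D tracker
    gate       -- maximum squared Mahalanobis distance; 9.21 is roughly the
                  99% chi-square quantile for 2 degrees of freedom

    Returns the best candidate inside the gate, or None if every candidate
    is inconsistent with the prediction.
    """
    best, best_d2 = None, gate
    for c in candidates:
        # Squared distance, scaled by the prediction uncertainty per axis.
        d2 = sum((c[k] - predicted[k]) ** 2 / variance[k] for k in range(2))
        if d2 < best_d2:
            best, best_d2 = c, d2
    return best
```

With a prediction at (100, 120) and a variance of 25 pixels squared per axis, a blob at (103, 118) is accepted, while a similar-looking blob belonging to a second person at (300, 110) falls far outside the gate.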
Figure 10 illustrates a simple occlusion example, where a hand can occlude the face from the viewpoint of one or both cameras. Without feedback to guide the low-level vision systems, tracking fails. The top-right graph in Figure 10 shows this case. The ambiguity created by the occlusion caused one camera to mislabel the head and right hand. This correspondence error causes stereo estimation to fail, resulting in an erroneous head position (second cluster of blue data points) and an erroneous path for the right hand (red data points).
The bottom two plots in Figure 10 show correct tracking when feedback is enabled, either with the observer-based system, or with the Kalman filters.
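The labeling ambiguity in the Figure 10 example can be viewed as an assignment problem: given predicted image positions for each body part, choose the blob labeling that agrees best with them. The following sketch is a hypothetical illustration of that idea using exhaustive search over labelings; the function `assign_labels` and the squared-distance cost are assumptions and do not describe the implementation evaluated here.

```python
from itertools import permutations


def assign_labels(predictions, blobs):
    """Assign blobs to body-part labels by minimizing total squared distance.

    predictions -- dict mapping label -> predicted (x, y) image position
    blobs       -- list of observed (x, y) blob centers, at least one per label

    Returns a dict mapping each label to its assigned blob. Exhaustive search
    is adequate for the handful of blobs (head, hands) tracked per camera.
    """
    labels = list(predictions)
    best_cost, best = float("inf"), None
    for perm in permutations(range(len(blobs)), len(labels)):
        cost = 0.0
        for label, i in zip(labels, perm):
            px, py = predictions[label]
            bx, by = blobs[i]
            cost += (bx - px) ** 2 + (by - py) ** 2
        if cost < best_cost:
            best_cost = cost
            best = {label: blobs[i] for label, i in zip(labels, perm)}
    return best
```

When the hand passes in front of the face, the two blobs are close together in the image, but the predicted positions still differ slightly, so the lowest-cost labeling keeps the head and right hand from being swapped.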
Figure 11 illustrates a more complex occlusion example, where the hands are moving in circular paths, repeatedly occluding each other simultaneously in both cameras. Without feedback, again, tracking fails. Even though the head is never occluded, the system fails badly enough to require several re-initializations. Ambiguities encountered in this process lead to the head being mislabeled in several instances. The scattering of blue (head position) data points in the top-right graph in Figure 11 is a result of these errors.
The bottom-left graph in Figure 11 shows that the observer-based feedback system does not fare much better. Compared to the previous case, there are very few head-tracking errors, and the feedback system has at least eliminated the need for constant bottom-up re-initializations. However, the observer cannot keep up with the frequent occlusions; it appears to need more time to stabilize its estimates between occlusions.
The bottom-right plot in Figure 11 shows correct tracking with the Kalman-filter-based feedback system. The track paths are not entirely smooth, although they should be, given that the physical process behind the data was smooth. These aberrations are probably caused by the significant increase in measurement noise associated with occlusions.
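To illustrate how measurement noise enters a Kalman update, the sketch below implements one step of a scalar random-walk Kalman filter. It is a textbook simplification, not the filter used in this system; the function name and the noise values `q` and `r` are assumed purely for illustration.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar random-walk Kalman filter.

    x, p -- prior state estimate and its variance
    z    -- new measurement
    q    -- process noise variance (assumed value)
    r    -- measurement noise variance (assumed value)

    Returns the updated estimate, its variance, and the gain that was used.
    """
    p = p + q                # predict: state carried over, uncertainty grows
    k = p / (p + r)          # gain: how much weight the measurement receives
    x = x + k * (z - x)      # correct the estimate toward the measurement
    p = (1.0 - k) * p        # posterior variance
    return x, p, k


# The same measurement is weighted very differently depending on r.
_, _, gain_clear = kalman_step(0.0, 1.0, 1.0, q=0.01, r=0.1)
_, _, gain_noisy = kalman_step(0.0, 1.0, 1.0, q=0.01, r=10.0)
```

If the filter's assumed `r` stays at its unoccluded value while the actual measurements become noisier during occlusions, the gain remains high and the extra noise passes into the estimate, which is one plausible reading of the non-smooth tracks.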