Current applications use the information that Pfinder produces by making implicit assumptions about how that information relates to the state of the user. Wren and Pentland [23] have begun work on a system that uses dynamic and stochastic components to model the human body explicitly.
The first version of the system assumed a 2-D dynamic model of the user. Figure 6.1 shows several frames from a 5-link, 2-D model as it was interactively driven by a user through Pfinder. Information from Pfinder determines the potential field that is applied to the model. The model filters these influences through the system dynamics to arrive at the lowest-energy solution, which becomes the new pose estimate. The main limitation of this version is its oversimplified body model.
Figure 6.1: The 2-D estimate of the user's upper-body state given Pfinder vision input
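The relaxation step described for the 2-D model can be illustrated with a small sketch. This is not the implementation used by Wren and Pentland; it is a minimal example under stated assumptions: a hypothetical 2-link planar chain (rather than the 5-link model), a single quadratic potential pulling the chain tip toward an observed position such as a tracked hand, and overdamped dynamics in which the joint angles simply slide downhill on the energy surface. All function names and parameters are invented for illustration.

```python
import math

def forward_kinematics(angles, lengths):
    """Return the joint positions of a planar chain rooted at the origin."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for theta, length in zip(angles, lengths):
        heading += theta  # joint angles are relative to the previous link
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points

def energy(angles, lengths, target, stiffness=1.0):
    """Assumed quadratic potential pulling the chain tip toward the target."""
    tip_x, tip_y = forward_kinematics(angles, lengths)[-1]
    return 0.5 * stiffness * ((tip_x - target[0]) ** 2 + (tip_y - target[1]) ** 2)

def gradient(angles, lengths, target, eps=1e-6):
    """Central-difference gradient of the energy with respect to the angles."""
    grad = []
    for i in range(len(angles)):
        up = list(angles)
        down = list(angles)
        up[i] += eps
        down[i] -= eps
        grad.append((energy(up, lengths, target) - energy(down, lengths, target)) / (2 * eps))
    return grad

def relax(angles, lengths, target, step=0.1, iterations=2000):
    """Overdamped dynamics: move the angles downhill until the energy settles."""
    angles = list(angles)
    for _ in range(iterations):
        g = gradient(angles, lengths, target)
        angles = [a - step * gi for a, gi in zip(angles, g)]
    return angles

# Hypothetical observation: the tracker reports the hand at (1, 1).
lengths = [1.0, 1.0]
target = (1.0, 1.0)
pose = relax([0.3, 0.3], lengths, target)
tip = forward_kinematics(pose, lengths)[-1]
print("pose estimate (radians):", pose)
print("chain tip:", tip)
```

In this sketch the potential field is rebuilt from each new observation, and the relaxed joint angles play the role of the new pose estimate. A fuller model would carry velocities and inertia through the dynamics instead of taking plain downhill steps.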
The next step was to extend the modeling system to 3-D. Figure 6.2 shows a 5-link, 3-D model being driven by a user through STIVE (a wide-baseline stereo system built on Pfinder technology; see Section 5.2). The model reacts to a combination of several potential fields: 3-D head and hand positions from STIVE, gravity, and a behavioral prior that affects elbow placement.
Figure 6.2: The 3-D estimate of the user's upper-body state given STIVE vision input (STIVE is a Pfinder-based, wide-baseline stereo system; see Section 5.2)
This last potential field is particularly interesting: it is a crude attempt at coding behavioral priors in the form of a potential field. With only the STIVE input and gravity, the model places the elbows closer to the body than the user actually holds them. Solving for the minimum energy in that potential field yields the wrong answer because the user is constrained by more than just physics; the user also has many habits that constrain their motion. Future work in this direction involves codifying those habits statistically so that they can be used to better estimate body pose from 3-D vision data. Eventually, this knowledge may be useful in estimating 3-D body pose from 2-D vision data.
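The effect of the behavioral prior on elbow placement can be illustrated with a toy example. The sketch below is not the model described above; it assumes a single arm with the shoulder fixed at the origin and the hand fixed at an observed 3-D position, so the elbow can only move on a circle around the shoulder-hand axis (parameterized here by a hypothetical "swivel" angle, with zero meaning the lowest possible elbow). Gravity is modeled as a weight times elbow height, and the prior as a quadratic penalty around a preferred swivel angle. The coordinate convention (y up, z forward), the numeric values, and all names are assumptions chosen for illustration.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def elbow_position(hand, upper, fore, swivel):
    """Elbow location for a shoulder at the origin and a fixed hand position."""
    d = math.sqrt(sum(c * c for c in hand))
    axis = [c / d for c in hand]
    # Centre offset along the shoulder-hand axis and radius of the elbow circle.
    a = (upper ** 2 - fore ** 2 + d ** 2) / (2 * d)
    r = math.sqrt(max(upper ** 2 - a ** 2, 0.0))
    # u is the most downward direction perpendicular to the axis (y is up).
    down = [0.0, -1.0, 0.0]
    dot = sum(p * q for p, q in zip(down, axis))
    u = [p - dot * q for p, q in zip(down, axis)]
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]
    v = cross(axis, u)
    return [a * axis[i] + r * (math.cos(swivel) * u[i] + math.sin(swivel) * v[i])
            for i in range(3)]

def total_energy(swivel, hand, upper, fore, weight, prior_stiffness, preferred):
    """Gravity term plus an assumed quadratic behavioral prior on the swivel."""
    elbow = elbow_position(hand, upper, fore, swivel)
    gravity = weight * elbow[1]
    prior = 0.5 * prior_stiffness * (swivel - preferred) ** 2
    return gravity + prior

def best_swivel(hand, upper, fore, weight, prior_stiffness, preferred):
    """Grid search over swivel angles in 0.1 degree steps for the lowest energy."""
    candidates = [math.radians(k / 10.0) for k in range(-1800, 1801)]
    return min(candidates, key=lambda s: total_energy(
        s, hand, upper, fore, weight, prior_stiffness, preferred))

# Hypothetical observation: hand slightly below and in front of the shoulder.
HAND = [0.0, -0.2, 0.5]
UPPER, FORE = 0.3, 0.3

swivel_physics = best_swivel(HAND, UPPER, FORE, weight=1.0,
                             prior_stiffness=0.0, preferred=0.6)
swivel_prior = best_swivel(HAND, UPPER, FORE, weight=1.0,
                           prior_stiffness=0.5, preferred=0.6)
elbow_physics = elbow_position(HAND, UPPER, FORE, swivel_physics)
elbow_prior = elbow_position(HAND, UPPER, FORE, swivel_prior)
print("elbow with gravity only:", elbow_physics)
print("elbow with behavioral prior:", elbow_prior)
```

With gravity alone, the minimum-energy elbow hangs directly below the shoulder-hand axis. Adding the prior term moves the minimum sideways, away from that lowest point. A statistically learned prior would replace the hand-picked preferred angle and stiffness used here.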