Action Recognition
These pages briefly describe the current state
of our action recognition system. The system first computes low-level evidence directly
from the trajectories.
[Figure: "Chalkboard" view of trajectories.]
[Figure: Curvature, where white is highest, yellow is moderate, and blue is low.]
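As a rough illustration of single-trajectory evidence such as curvature, the sketch below estimates unsigned curvature from sampled positions using finite differences. It assumes a trajectory is a list of (x, y) points sampled at uniform time steps; the function name `curvature` and this formulation are assumptions made for the example, not a description of how the system itself computes the feature.

```python
import math

def curvature(points):
    """Estimate unsigned curvature at each interior sample of a 2D trajectory.

    points: list of (x, y) positions at uniform time steps (an assumption).
    Returns one value per interior point, i.e. len(points) - 2 values.
    """
    result = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        # Central differences approximate the first and second derivatives.
        dx, dy = (x2 - x0) / 2.0, (y2 - y0) / 2.0
        ddx, ddy = x2 - 2.0 * x1 + x0, y2 - 2.0 * y1 + y0
        speed_sq = dx * dx + dy * dy
        if speed_sq == 0.0:
            # Stationary sample: curvature is undefined, so report 0.
            result.append(0.0)
        else:
            result.append(abs(dx * ddy - dy * ddx) / speed_sq ** 1.5)
    return result

# A straight path has zero curvature; a circle of radius 2 has curvature near 0.5.
straight = [(t, 2 * t) for t in range(5)]
arc = [(2 * math.cos(0.01 * i), 2 * math.sin(0.01 * i)) for i in range(50)]
```

Values like these could then be mapped to colors for display, as in the curvature figure.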
There are hundreds of evidence
features, because in addition to properties computed over a single trajectory, the system
can compute evidence based on the relationship between two trajectories or groups of
trajectories. Some are shown below.
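To make the idea of evidence computed between two trajectories concrete, here is a minimal sketch of two hypothetical pairwise features: per-frame distance and closing speed. The feature names and the assumption that both trajectories are time-aligned lists of (x, y) points are illustrative only; the actual feature set is not listed in this text.

```python
import math

def pairwise_distance(traj_a, traj_b):
    """Per-frame Euclidean distance between two time-aligned 2D trajectories."""
    return [math.hypot(ax - bx, ay - by)
            for (ax, ay), (bx, by) in zip(traj_a, traj_b)]

def closing_speed(traj_a, traj_b):
    """Per-frame decrease in distance; positive while the objects approach."""
    d = pairwise_distance(traj_a, traj_b)
    return [prev - cur for prev, cur in zip(d, d[1:])]

# Two objects moving toward each other along the x axis.
a = [(0, 0), (1, 0), (2, 0)]
b = [(10, 0), (8, 0), (6, 0)]
```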
These evidence detectors are used as primitives
in Bayesian belief networks, also known as belief nets. The belief nets are used to
combine uncertain information. Each network is currently hand-constructed. The issues
raised by using this particular representation (as opposed to the alternatives) are
being investigated.
The detector above checks whether the object the
network is called with is blocking for a quarterback pass. The networks are generally small
and can therefore be evaluated with exact methods.
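Because a small network has few variables, exact inference can be done by simply enumerating the joint distribution. The sketch below shows this for a toy network with one binary action node and two binary evidence nodes. The node names, structure, and probabilities are invented for illustration and do not come from the hand-constructed networks described here.

```python
from itertools import product

# Hypothetical network: one binary action node with two binary evidence children.
PRIOR = {True: 0.2, False: 0.8}              # P(action)
CPT = {                                      # P(evidence = True | action)
    "near_quarterback": {True: 0.9, False: 0.3},
    "facing_defender": {True: 0.8, False: 0.4},
}

def joint(action, evidence):
    """Joint probability of the action value and a full evidence assignment."""
    p = PRIOR[action]
    for name, value in evidence.items():
        p_true = CPT[name][action]
        p *= p_true if value else 1.0 - p_true
    return p

def posterior(observed):
    """P(action = True | observed), summing the joint over unobserved nodes."""
    hidden = [name for name in CPT if name not in observed]
    totals = {True: 0.0, False: 0.0}
    for action in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            evidence = dict(observed, **dict(zip(hidden, values)))
            totals[action] += joint(action, evidence)
    return totals[True] / (totals[True] + totals[False])
```

Enumeration costs time exponential in the number of unobserved nodes, which is why it is only practical when networks stay small.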
Piecewise linear functions are used to enter
evidence information into the networks so that continuous valued information is preserved
but modeling conditional probabilities is manageable. Some example discretization curves
are shown here.
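One plausible reading of this step is sketched below: a piecewise linear curve maps a continuous measurement to a degree of membership in each discrete state, so the network only needs conditional probabilities over a few states while the input still varies smoothly. The knot values, the state names "near" and "far", and the function names are assumptions made for the example.

```python
def piecewise_linear(x, knots):
    """Evaluate a piecewise linear curve given as (x, y) knots.

    Knots must have strictly increasing x. Inputs outside the knot range
    are clamped to the first or last y value.
    """
    if x <= knots[0][0]:
        return knots[0][1]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return knots[-1][1]

# Hypothetical curve: degree to which a distance counts as "near".
NEAR_KNOTS = [(1.0, 1.0), (5.0, 0.0)]

def soft_evidence(distance):
    """Split a continuous distance into weights over two discrete states."""
    near = piecewise_linear(distance, NEAR_KNOTS)
    return {"near": near, "far": 1.0 - near}
```

With these knots, a distance of 2.0 yields weights of 0.75 for "near" and 0.25 for "far", rather than a hard assignment to one state.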
The networks can be applied to the data at each
frame to output a certainty factor for each action. For example, the quarterback is
shown below moving back (to the left) just after the snap, with detectors for the offense
firing.
This QuickTime video shows the system output
when just a few detectors are running over all the offensive agents. For the most part,
the output certainty values are reasonable. The "errors" you see
typically have lower certainty values than the correct labels. Some are caused
by networks that need additional tuning, others by noise in the data and
uncertainty in the representations.
Current work is focussed on using the output of
the network-based mid-level action detectors to describe team plays. The research
questions primarily involve the representation of time in multi-agent action descriptions.