Vision and Modeling Group

MIT Media Laboratory


Computers Watching Football


Action Recognition

These pages briefly describe the current state of our action recognition system. The system first computes low-level evidence directly from the trajectories.

traj-chalkboard-1-crop.gif (163860 bytes)

"Chalkboard" view of trajectories.

traj-curv-chalkboard-1-crop.gif (5452 bytes)

Curvature where white is highest, yellow is moderate, and blue is low.
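
As a rough illustration of a single-trajectory feature like the curvature shown above, the sketch below estimates curvature at each frame from sampled (x, y) positions using finite differences. The function name `curvature` and the use of the standard planar curvature formula are assumptions made for illustration; this page does not say how the system actually computes curvature.

```python
import math

def curvature(points):
    """Estimate unsigned curvature at each interior sample of a 2-D trajectory.

    points: list of (x, y) positions sampled at a uniform frame rate.
    Returns one value per interior point, so len(points) - 2 values.
    """
    result = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        # Central difference for velocity, second difference for acceleration.
        dx, dy = (x2 - x0) / 2.0, (y2 - y0) / 2.0
        ddx, ddy = x2 - 2.0 * x1 + x0, y2 - 2.0 * y1 + y0
        speed_sq = dx * dx + dy * dy
        if speed_sq == 0.0:
            # Stationary sample: curvature is undefined, so report 0 here.
            result.append(0.0)
            continue
        # kappa = |x' y'' - y' x''| / (x'^2 + y'^2)^(3/2)
        result.append(abs(dx * ddy - dy * ddx) / speed_sq ** 1.5)
    return result

# Points on a circle of radius 5 should give curvature close to 1/5.
circle = [(5 * math.cos(0.01 * k), 5 * math.sin(0.01 * k)) for k in range(50)]
values = curvature(circle)
```

Real tracking data is noisy, so a practical version would probably smooth the trajectory before differencing.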

There are hundreds of evidence features, because in addition to properties computed over a single trajectory, the system can compute evidence based on the relationship between two trajectories or groups of trajectories. Some are shown below.

lispworks-feature-window-crop.gif (14950 bytes)
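
To make the idea of evidence computed from a pair of trajectories concrete, here is a minimal sketch of two such features: the per-frame separation between two objects and how quickly that separation is shrinking. The function names and the choice of features are hypothetical and are not taken from the system's feature list.

```python
import math

def pairwise_distance(traj_a, traj_b):
    """Per-frame Euclidean distance between two time-aligned trajectories."""
    return [math.hypot(ax - bx, ay - by)
            for (ax, ay), (bx, by) in zip(traj_a, traj_b)]

def closing_speed(traj_a, traj_b):
    """Per-frame decrease in separation; positive when the objects approach."""
    d = pairwise_distance(traj_a, traj_b)
    return [prev - cur for prev, cur in zip(d, d[1:])]

# Two made-up objects moving toward each other along the x axis.
a = [(0, 0), (1, 0), (2, 0)]
b = [(10, 0), (8, 0), (6, 0)]
distances = pairwise_distance(a, b)   # [10.0, 7.0, 4.0]
closing = closing_speed(a, b)         # [3.0, 3.0]
```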

These evidence detectors are used as primitives in Bayesian belief networks (belief nets), which combine uncertain information. Each network is currently hand-constructed. The issues raised by choosing this representation over the alternatives are being investigated.

blockqbpass.gif (8795 bytes)

The detector above checks whether the object the network is called with is blocking for a quarterback pass. The networks are generally small and can therefore be evaluated with exact methods.
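
Because the networks are small, exact evaluation can be as simple as summing the joint distribution over every assignment of the unobserved nodes. The sketch below does this for a made-up three-node network. The node names, structure, and probabilities are all invented for illustration and do not reproduce the blocking network pictured above.

```python
from itertools import product

# Hypothetical network: Blocking -> NearDefender, Blocking -> FacingDefender.
# Each entry is (parents, table of P(node=True | parent values)).
# All numbers are invented for illustration.
NETWORK = {
    "Blocking": ((), {(): 0.3}),
    "NearDefender": (("Blocking",), {(True,): 0.9, (False,): 0.2}),
    "FacingDefender": (("Blocking",), {(True,): 0.8, (False,): 0.3}),
}

def joint(assignment):
    """Probability of one full True/False assignment to every node."""
    p = 1.0
    for node, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

def posterior(query, evidence):
    """P(query=True | evidence), summing the joint over all unobserved nodes."""
    hidden = [n for n in NETWORK if n != query and n not in evidence]
    totals = {True: 0.0, False: 0.0}
    for q_val in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assignment = dict(evidence, **dict(zip(hidden, values)))
            assignment[query] = q_val
            totals[q_val] += joint(assignment)
    return totals[True] / (totals[True] + totals[False])

both = posterior("Blocking", {"NearDefender": True, "FacingDefender": True})
prior = posterior("Blocking", {})
```

Enumeration costs time exponential in the number of unobserved nodes, which is acceptable only because the networks are small.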

Piecewise linear functions are used to enter evidence into the networks, so that continuous-valued information is preserved while modeling the conditional probabilities remains manageable. Some example discretization curves are shown here.

fuzzy-curves.gif (2666 bytes)
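
A minimal sketch of this idea follows: a continuous measurement is mapped through a piecewise linear curve to a degree of support for each state of a two-state evidence node. The breakpoints, the "near" state, and the function names are assumptions chosen for illustration; the curves the system actually uses are the ones in the figure, and how their output is entered into the networks is not specified here.

```python
def piecewise_linear(x, breakpoints):
    """Interpolate linearly through (x, y) breakpoints, clamping outside them.

    breakpoints must be sorted by strictly increasing x.
    """
    if x <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return breakpoints[-1][1]

# Hypothetical curve: full support for "near" within 1 yard, none beyond 5.
NEAR = [(1.0, 1.0), (5.0, 0.0)]

def soft_evidence(distance):
    """Return support for the two states (near, not near) of an evidence node."""
    near = piecewise_linear(distance, NEAR)
    return near, 1.0 - near
```

For example, `soft_evidence(3.0)` returns `(0.5, 0.5)`, so a distance halfway along the ramp supports both states equally instead of being forced into one bin.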

The networks can be applied to the data at each frame to output a certainty factor for each action. For example, the quarterback is shown below moving back (to the left) just after the snap, with detectors for the offense firing.

labeled-players.gif (181397 bytes)
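
The per-frame application described above can be sketched as a loop that runs every detector on every frame and keeps the labels whose certainty clears a threshold. The detector interface (a function from a frame to a certainty in [0, 1]), the threshold, and the names below are assumptions for illustration only.

```python
def label_frames(frames, detectors, threshold=0.5):
    """Return, for each frame, the (action, certainty) pairs at or above
    threshold, strongest first.

    frames: list of per-frame feature dicts.
    detectors: dict mapping action name -> function(frame) -> certainty.
    """
    labeled = []
    for frame in frames:
        scores = [(name, detect(frame)) for name, detect in detectors.items()]
        kept = [(name, c) for name, c in scores if c >= threshold]
        kept.sort(key=lambda item: item[1], reverse=True)
        labeled.append(kept)
    return labeled

# Stand-in detectors that simply read precomputed values from each frame.
detectors = {
    "dropback": lambda frame: frame["back"],
    "block": lambda frame: frame["blk"],
}
frames = [{"back": 0.9, "blk": 0.6}, {"back": 0.2, "blk": 0.7}]
labels = label_frames(frames, detectors)
```

In the real system each detector would be a belief network evaluated on evidence from the trajectories, not a lookup.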

This QuickTime video shows the system output when just a few detectors are running over all the offensive agents. For the most part, the certainty values that are output are reasonable. The "errors" you see typically have lower certainty values than the correct labels. They are caused sometimes by networks that need additional tuning and sometimes by noise in the data and uncertainty in the representations.


Current work is focused on using the output of the network-based mid-level action detectors to describe team plays. The research questions primarily involve the representation of time in multi-agent action descriptions.

Last modified: April 06, 1999