Final Report

NSF/DARPA Workshop
Perception of Action

May 20-22, 1997

 Prepared by:

Aaron Bobick
  MIT Media Laboratory

Sponsored by:

NSF: Robotics and Machine Intelligence Program
Howard Moraff, Program Manager


DARPA Image Understanding Program
Tom Strat, Program Manager


In May 1997, more than 25 computer vision researchers gathered on Cape Cod to consider the technical and philosophical challenges that arise in creating machines that perceive action. This web site is designed to serve not only as a final report describing the discussions and conclusions, but also as a hypertext resource linking to both people and additional reference material.


Workshop goals
Initial questions given to attendees to address
Table of links to responses
Attendees (hyperlinks when available, email otherwise)


As computer vision moves from the processing of static images to the consideration of video sequences, the problem of understanding action is becoming fundamental. Current important applications include security and surveillance, dynamic situation monitoring (e.g. refueling a fleet), action verification (e.g. ensuring all steps in an assembly occur), and accessing video databases. With the explosion of multimedia content being distributed throughout the world, many new applications are certain to arise in the near future.

To date, unfortunately, most image representations and certainly almost all visual representations of objects have been designed for static situations. While we have numerous methods for representing a particular object (e.g. a given industrial part), we have few visual representations for "throwing a baseball," "walking around a room," or "refueling a fleet." Current research on recovering the kinematics of a human body from static frames presupposes that a sequence of static poses or "key frames" will be the basis of, or a prerequisite for, the representation of action. But such suppositions are made without much consideration of the competencies a representation of action must have.

Also, for human perceivers, the perception and description of action is at least as salient as the perception of static objects. Simple psychophysics shows that people can recognize actions from very low-resolution, poor-quality images. Such evidence argues that the perception of action may be best thought of as an independent process, not one built upon prior static analysis of the images. That is, maybe we do not need to solve the complete static vision problem before considering the perception of action.

The goals of this workshop were 1) to present the current state of the art in the representation and recognition of action by machine; 2) to consider whether work in cognitive science on the human representation and perception of action can provide possible computational models; 3) to propose and evaluate possible research directions in terms of importance and likelihood of success; and 4) to consider which application areas are most likely to be suitable for work and would have the greatest impact. Attendees included academic and industrial researchers in computer and human vision, industrial developers of vision technology, and industrial consumers. In addition, government representatives from DoD and the intelligence community participated.



To stimulate discussion, we distributed questions regarding general issues in the perception of action. The responses can be accessed per question, or via the response table:

  1. What is action?
  2. In what ways is perceiving action fundamentally different from perceiving static images?
  3. What can we learn from the human perception of action?
  4. What are the minimal cues for animating action?
  5. How do we represent action?
  6. What is the role of causal or physics-based reasoning in understanding action?
  7. How should time be handled?
  8. What are good research domains and example tasks?
  9. What disciplines do we need to (re)consider?

Participants also provided further position statements, comments on research methodologies, or extended abstracts that gave their vision of research in this area.