TR#369: An Appearance-based Representation of Action

Aaron F. Bobick and James W. Davis

A new view-based approach to the representation of action is presented. The work is motivated by the observation that a human observer can easily and instantly recognize action in extremely low-resolution imagery, even imagery in which individual frames provide no information about the three-dimensional structure of the scene. Our underlying representations are view-based descriptions of the coarse image motion associated with viewing given actions from particular directions. Using these descriptions, we propose an appearance-based action-recognition strategy comprising two stages. First, a motion energy image (MEI) is computed that grossly describes the spatial distribution of motion energy for a given view of a given action; the input MEI is matched against stored models that span the range of views of known actions. Second, any models that plausibly match the input are tested for coarse, categorical agreement between a stored motion model of the action and a parameterization of the input motion. Using a "sitting" action as an example, and using a manually placed stick model, we develop a representation and verification technique that collapses the temporal variations of the motion parameters into a single, low-order vector. Finally, we show the type of patch-based motion model we intend to employ in a data-driven action segmentation and recognition system.
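
As a concrete illustration of the first stage, the following is a minimal sketch of one common way such an MEI can be formed: as the union of thresholded frame-to-frame differences accumulated over a temporal window. The abstract does not specify the computation, so the function name and the parameters `tau` (window length) and `xi` (difference threshold) are illustrative assumptions, not values taken from the report.

```python
import numpy as np

def motion_energy_image(frames: np.ndarray, tau: int = 20, xi: float = 25.0) -> np.ndarray:
    """Return a binary MEI for the last `tau` frame transitions.

    frames : array of shape (T, H, W), grayscale intensities.
    The result is 1 wherever the intensity change between successive
    frames exceeded `xi` anywhere in the window, grossly describing
    the spatial distribution of motion energy.
    """
    window = frames[-(tau + 1):].astype(np.float32)
    # Binary motion images from successive frame differences.
    diffs = np.abs(np.diff(window, axis=0)) > xi
    # Union (logical OR) of motion over the temporal window.
    return np.any(diffs, axis=0).astype(np.uint8)

# Example: a synthetic 64x64 sequence with a small square drifting right.
if __name__ == "__main__":
    T, H, W = 25, 64, 64
    frames = np.zeros((T, H, W), dtype=np.float32)
    for t in range(T):
        frames[t, 30:38, 10 + t:18 + t] = 255.0
    mei = motion_energy_image(frames)
    print("MEI covers", int(mei.sum()), "pixels")  # the region swept by the motion
```

An input MEI built this way could then be compared against stored MEIs for known views of known actions, e.g. with a shape- or overlap-based match score, before the second, verification stage is applied.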