A probabilistic framework for representing and visually recognizing complex multi-agent action is presented. Motivated by work in model-based object recognition and designed for the recognition of action from {\em visual evidence}, the representation has three components: (1) temporal structure descriptions representing the logical and temporal relationships between agent goals, (2) belief networks for probabilistically representing and recognizing individual agent goals from visual evidence, and (3) belief networks automatically generated from the temporal structure descriptions that support the recognition of the complex action. We describe our current work on recognizing American football plays from noisy trajectory data.