We are exploring the use of high-level knowledge about bodies in the visual understanding of gesture. Our hypothesis is that many gestures are metaphorically derived from the motor programs of our everyday interactions with objects and people. For example, many dismissive gestures look as if an imaginary object were being brushed or tossed away. At the discourse level, this implicit mass represents a referent in the conversation; at the scene-formation level, the dismissive gesture obeys many of the kinematic and dynamic constraints that would shape an actual tossing motion. Thus this metaphor provides us with constraints for both discourse annotation and visual processing. In this paper we present preliminary results on interpreting complex gesture sequences in video.