As research in computer vision has shifted from processing only single, static images to the analysis of video sequences, the concept of {\em action recognition} has become important. Fundamental to understanding action is reasoning about time, whether in an implicit or an explicit framework. In this paper I describe several specific examples of incorporating time into representations of action and show how those representations are used to recognize actions. The approaches differ in whether variation over time is treated as a continuous mapping, a state-based trajectory, or a qualitative, semantically labeled sequence. For two of the domains --- whole-body actions and hand gestures --- I describe the approaches in detail, while two others --- constrained semantic domains (e.g.\ watching someone cooking) and the labeling of dynamic events (e.g.\ American football) --- are mentioned only briefly.