A new approach to the recognition of temporal behaviors and activities is presented. The fundamental idea, inspired by work in speech recognition, is to divide the inference problem into two levels. The lower level is performed using standard independent probabilistic temporal event detectors such as hidden Markov models (HMMs) to propose candidate detections of low level temporal features. The outputs of these detectors provide the input stream for a stochastic context-free grammar parsing mechanism. The grammar and parser provide longer range temporal constraints, disambiguate uncertain low level detections, and allow the inclusion of a priori knowledge about the structure of temporal events in a given domain. To achieve such a system we provide techniques for generating a discrete symbol stream from continuous low level detectors, for enforcing temporal exclusion constraints during parsing, and for generating a control method for low level feature application based upon the current parsing state. We demonstrate the approach in several experiments using both visual and other sensing data.