We present a method for recovering the temporal structure and phases of natural gesture. The work is motivated by recent developments in the theory of natural gesture, which have identified several key aspects of gesture important to communication. In particular, gesticulation during conversation can be coarsely characterized as periods of bi-phasic or tri-phasic gesture separated by a rest state. We first present an automatic procedure for hypothesizing plausible rest state configurations of a speaker; the method uses the repetition of subsequences to indicate potential rest states. Second, we develop a state-based parsing algorithm that both selects among candidate rest states and parses an incoming video stream into bi-phasic and multi-phasic gestures. We present results from examples of story-telling speakers.
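The idea that repetition can indicate a rest state can be illustrated with a toy sketch. It is not the procedure from the report. It assumes each video frame has already been quantized to a discrete pose label, and it ranks labels by how often the sequence returns to them and dwells there. The function name `candidate_rest_states` and the parameters `min_dwell` and `min_returns` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import groupby


def candidate_rest_states(labels, min_dwell=3, min_returns=2):
    """Rank pose labels by how many separate times the sequence dwells on them.

    Assumption (not from the report): frames are pre-quantized to discrete
    labels, and a "dwell" is a run of at least `min_dwell` identical labels.
    """
    returns = Counter()
    # groupby collapses consecutive identical labels into runs.
    for label, run in groupby(labels):
        if len(list(run)) >= min_dwell:
            returns[label] += 1
    # Keep labels the speaker comes back to at least `min_returns` times.
    return [(label, n) for label, n in returns.most_common() if n >= min_returns]


if __name__ == "__main__":
    # 'R' stands for a hypothetical resting pose; other letters are gesture poses.
    frames = list("RRRRabcRRRRabRRRRRcccdR")
    print(candidate_rest_states(frames))
```

In this toy sequence only `R` is returned to and held repeatedly, so it is the single candidate. A parser in the spirit of the abstract would then choose among such candidates rather than trusting the ranking alone.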