This image is from our final class project in Machine Understanding of Video (Spring 1995). The boxes bound the region in which Jay moves his head and hands while giving his monologue, and the dots mark his head and hand positions during the performance.
In this group project, we measured four features of Jay Leno's monologue on
"The Tonight Show with Jay Leno".
Our goal was to see what speech and gesture reveal about what someone is saying.
In the audio domain, we measured the distribution of pauses in the monologue and tracked the pitch of the voice.
In the visual domain, we tracked the hand positions and measured their velocities.
We used the ISODATA clustering method to characterize our feature space. This method
represents each set of measurements as a vector in feature space and iteratively finds clusters.
The scatter of the clusters is a measure of the salience of the features.
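The clustering loop can be sketched roughly as follows. This is a minimal ISODATA-style sketch, not our original class code: the function name, parameters, and thresholds (`merge_dist`, `split_std`) are all invented for illustration. The loop alternates k-means-style assign/update steps with ISODATA's distinctive merge and split steps.

```python
import numpy as np

def nearest(points, centers):
    # Index of the nearest center for each feature vector.
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def isodata(points, k=2, merge_dist=0.5, split_std=2.0, iters=10, seed=0):
    """Simplified ISODATA loop over an (n, d) array of feature vectors.

    Hypothetical sketch: thresholds and defaults are invented.
    """
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = nearest(points, centers)
        # Update step: recompute each nonempty cluster's mean.
        centers = np.array([points[labels == c].mean(axis=0)
                            for c in np.unique(labels)])
        labels = nearest(points, centers)
        # Merge step: fuse the closest pair of centers if they are too close.
        if len(centers) > 1:
            cd = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
            cd[np.diag_indices_from(cd)] = np.inf
            i, j = np.unravel_index(cd.argmin(), cd.shape)
            if cd[i, j] < merge_dist:
                merged = (centers[i] + centers[j]) / 2.0
                centers = np.vstack([np.delete(centers, [i, j], axis=0), merged])
                continue
        # Split step: divide any cluster whose spread along one axis is too large.
        for c in range(len(centers)):
            members = points[labels == c]
            if len(members) > 1 and members.std(axis=0).max() > split_std:
                std = members.std(axis=0)
                offset = np.zeros(points.shape[1])
                offset[std.argmax()] = std.max()
                centers = np.vstack([np.delete(centers, c, axis=0),
                                     centers[c] - offset, centers[c] + offset])
                break
    return centers, nearest(points, centers)
```

Because the merge and split steps let the number of clusters grow and shrink, ISODATA does not need the cluster count fixed in advance, which is convenient when you do not know how many distinct speech-and-gesture regimes a performance contains.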
We were able to automatically find portions of Jay's monologue in which he makes
large gestures at long pauses in his speech. In practice, this usually corresponds to
a point of emphasis and occurs at the conclusion of a joke.
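In spirit, the detection rule amounts to flagging moments where a long pause and a large gesture coincide. The sketch below is a hypothetical reconstruction, not our original code; the function name and the threshold values are invented.

```python
def joke_points(pause_len, hand_speed, pause_thresh=1.5, speed_thresh=0.8):
    """Return indices where a long speech pause (seconds) coincides with
    high hand velocity (arbitrary units). Thresholds are illustrative."""
    return [i for i, (p, v) in enumerate(zip(pause_len, hand_speed))
            if p > pause_thresh and v > speed_thresh]

# Example: only the second segment has both a long pause and a big gesture.
hits = joke_points([0.2, 2.0, 0.1, 1.8], [0.1, 1.0, 0.9, 0.2])
# hits == [1]
```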
We think we built the first computer joke detector!