This paper describes research in which evidence from audio and visual-kinesic data is combined to obtain an automatic, unsupervised characterization of discourse in the monologues of comedians Jay Leno and David Letterman. We describe the process of extracting feature vectors from the audio and video data and present the results of partitioning the feature space into statistically significant clusters.