This thesis investigates the role of learning in a video browsing setting. Regions of significant change are preselected automatically from video sequences of a television situation comedy. These regions, which often depict portions of actors, are presented to the user for interactive labeling. The user marks regions as positive or negative examples of the actors, and the computer trains by analyzing those regions with respect to a bank of signal models. Other regions in the video database that are similar to the positive training examples are then found automatically. A feature of this work is the integration of high-level information, as encapsulated by the show's script and closed captions, with low-level signal feature analysis, as derived from similarity measures. Pooling these descriptors constrains the search. The results of a database query are presented to the user during an interactive session. Given sufficient training data and user feedback, the computer learns the pattern of video that corresponds to a particular actor. In this way, a tool is constructed that can intelligently assist a human in indexing, browsing, and searching through video.