
TR#365: An Image Database Browser that Learns from User Interaction

Thomas Minka

MIT Master of Engineering Thesis, January 1996

Digital libraries of images and video are rapidly growing in size and availability. To avoid the expense and limitations of text, there is considerable interest in navigation by perceptual and other automatically extractable attributes. Unfortunately, the relevance of an attribute for a query is not always obvious. Queries that go beyond explicit color, shape, and positional cues must incorporate multiple features in complex ways. This dissertation uses machine learning to automatically select and combine features to satisfy a query, based on positive and negative examples from the user. The learning algorithm does not learn only within a single session; it learns continuously, across sessions. The learner improves its learning ability by dynamically modifying its inductive bias, based on experience over multiple sessions. Experiments demonstrate the ability to assist image classification, segmentation, and annotation (labeling of image regions). The common theme of this work, applied to computer vision, database retrieval, and machine learning, is building in enough flexibility to allow adaptation to changing goals.
