We present a new approach to tracking weakly modeled objects in a semantically rich domain. We refer to our method as closed-world tracking. A closed world is a space-time region of an image sequence in which the complete taxonomy of objects is known and in which every pixel can be explained as belonging to one of those objects. Given this object information (and additional contextual information, if available), context-specific features can be dynamically selected as the basis for tracking. A context-specific feature is one that has been chosen, based on the context, to maximize the chance of successful tracking between frames.
Our work is motivated by the goal of video annotation -- the semi-automatic generation of symbolic descriptions of action taking place in a dynamic scene. Common tracking techniques such as correlation, adaptive templates, and deformable models typically fail in situations where it is difficult to syntactically isolate an object (e.g., by background subtraction), where the shape of an object may change dramatically from frame to frame, and where the behavior of an object is not smooth or predictable.
We test our algorithm in the ``football domain,'' a real video-annotation domain in which annotation is currently done manually. We describe how closed-world analysis and context-specific tracking can be applied to tracking football players, and we present the details of our implementation. We include tracking results based on hundreds of images that demonstrate the wide range of tracking situations the algorithm handles successfully, as well as a few examples of where the algorithm fails.