In this work we address the problem of tracking objects in a complex, dynamic scene. The objects are non-rigid and difficult to model geometrically. Their motion is erratic, and they change shape rapidly between frames of video sampled at 30 frames per second. The objects have low spatial resolution, and the video used for tracking was taken with a panning and zooming camera. Finally, the objects are tracked in sequences up to eight seconds long while moving over a complex background.
We suggest that conventional tracking methods are unlikely to perform well at tracking small objects in complex environments because they do not use contextual information to drive feature selection. We propose using ``closed-world'' analysis to incorporate contextual knowledge into low-level tracking. A closed world is a space-time region of an image in which contextual information, such as the number and types of objects within the region, is assumed to be known. Given that knowledge, the region can be analyzed locally using image processing algorithms, and ``context-specific'' features can be selected for tracking. A context-specific feature is one that has been chosen, based on the context, to maximize the chance of successful tracking between frames.
We test our algorithm in the ``football domain.'' We describe how closed-world analysis and context-specific tracking can be applied to tracking football players and present the details of our implementation. We include tracking results that demonstrate the wide range of tracking situations the algorithm handles successfully, as well as a few examples where it fails. Finally, we suggest some improvements and future extensions.