TR#329: Using Approximate Models as Source of Contextual Information for Vision Processing

Aaron F. Bobick and Claudio S. Pinhanez

Most computer vision algorithms are based on strong assumptions about the objects and the actions depicted in the image. To safely apply those algorithms to real-world image sequences, it is necessary to verify that their assumptions are satisfied in the context of the visual process. We propose the use of approximate world models -- coarse descriptions of objects and actions in the world -- as the appropriate representation for contextual information. The approximate world models are employed to verify the applicability of a vision routine in a given situation. Under these conditions, a task module can reliably use the outputs of contextually-safe vision routines, without having to refer to an accurate reconstruction of the world.
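The gating idea above can be made concrete with a minimal sketch: a coarse world model is consulted to check a vision routine's assumption (here, that the target is unoccluded) before its output is trusted. All names (ApproxBox, occluded, run_vision_routine) and the geometry are illustrative assumptions, not the paper's actual representation.

```python
# Hypothetical sketch: an approximate world model gates a vision routine.
# The coarse model only needs rough positions and extents, not an
# accurate reconstruction of the scene.

from dataclasses import dataclass

@dataclass
class ApproxBox:
    """Coarse description of an object: name, rough position, rough extent."""
    name: str
    x: float      # lateral position (arbitrary units)
    depth: float  # distance from the camera
    size: float   # rough lateral extent

def occluded(target: ApproxBox, world: list[ApproxBox]) -> bool:
    """Rough occlusion test using only the coarse geometry: another object
    overlaps the target laterally and sits closer to the camera."""
    return any(
        abs(o.x - target.x) < (o.size + target.size) / 2 and o.depth < target.depth
        for o in world if o.name != target.name
    )

def run_vision_routine(target_name: str, routine, world: list[ApproxBox]):
    """Apply `routine` only when the approximate model says its
    assumption (target unoccluded) holds in the current context."""
    target = next(b for b in world if b.name == target_name)
    if occluded(target, world):
        return None  # context unsafe: the routine's assumption is violated
    return routine(target)

world = [ApproxBox("chef", 0.0, 3.0, 0.6), ApproxBox("table", 2.0, 1.0, 1.5)]
result = run_vision_routine("chef", lambda b: f"track {b.name}", world)
```

In this configuration the table does not overlap the chef laterally, so the routine runs and returns "track chef"; moving an object in front of the chef would make `run_vision_routine` return None instead, and the task module would know not to trust a tracker output.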

We are using approximate world models in a project to control cameras in a TV studio. In our Intelligent Studio, automatic cameras respond to verbal requests for shots from the TV director. Contextual information is obtained from the script of the TV show and from the imagery provided by wide-angle, low-resolution cameras monitoring the studio. We show examples of the cameras' responses to different requests in the domain of a cooking show.
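The request-handling flow described above can be sketched as follows: the director's verbal request is matched against script-derived context, and a shot is produced only when the script confirms the requested subject is present in the current segment. The dictionaries and function names here are illustrative assumptions, not the actual Intelligent Studio interface.

```python
# Hypothetical sketch of the Intelligent Studio request loop: script-derived
# context decides whether a requested shot can be safely framed.

SCRIPT_CONTEXT = {
    "segment": "mixing the batter",   # current step in the cooking show script
    "subjects": ["chef", "bowl"],     # subjects the script says are on set
}

FRAMINGS = {
    "close-up of the chef": ("chef", "close_up"),
    "shot of the bowl": ("bowl", "medium"),
}

def handle_request(request: str, context: dict):
    """Return a (subject, framing) pair for the camera, but only when the
    script context confirms the subject appears in the current segment."""
    subject, framing = FRAMINGS[request]
    if subject not in context["subjects"]:
        return None  # subject absent from this segment: refuse the shot
    return subject, framing

result = handle_request("close-up of the chef", SCRIPT_CONTEXT)
```

Here the script context plays the same role as the approximate world model: it vets the request before any camera routine is committed to it.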