Object Tracking

To understand what is happening in the KidsRoom, the system must first track the positions of all the people and some objects (e.g. the bed). Some worlds, like the bedroom world, use positional information to know whether people are near certain objects. For example, the blue desk will speak only when a person is near it. The positional information is also used to keep track of whether people are in a group and whether they are moving or not. Finally, position information is used to simplify other vision processes by ensuring that people are in the expected region of the room.

How it Tracks

The tracking algorithm uses background subtraction to generate a "blob image." Background subtraction compares each incoming image of the room, which may contain people, against a reference image taken of the room without people. If lighting is relatively constant, the pixels that differ (the people) can be clustered into 2D blob regions. The algorithm then matches each person known to be in the room to a blob in the incoming image frame. To do this, the algorithm must keep track of how many people are in the room, which is achieved by having everyone enter and exit through a "door" region. In the KidsRoom, the bed position is tracked as if it were another person, since the bed can be rolled around the room. For more detailed information on object tracking, see the Info page for technical references.
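
As a rough illustration of the idea (not the KidsRoom implementation), the sketch below thresholds the per-pixel difference between an empty-room image and an incoming frame, then groups the changed pixels into connected blobs. The function names, the grayscale grid representation, the threshold value, and the use of 4-connectivity are all assumptions made for this example.

```python
from collections import deque

def subtract_background(background, frame, threshold=30):
    """Binary mask: True where the frame differs from the empty-room image.
    The threshold is an arbitrary illustrative value."""
    return [
        [abs(f - b) > threshold for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

def find_blobs(mask):
    """Group foreground pixels into 4-connected blobs.
    Each blob is returned as a list of (row, col) pixel coordinates."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited pixel.
            queue, blob = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs

# Toy 4x6 grayscale "room": uniform background plus two bright regions.
background = [[10] * 6 for _ in range(4)]
frame = [row[:] for row in background]
for r, c in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    frame[r][c] = 200
for r, c in [(2, 4), (3, 4)]:
    frame[r][c] = 120

blobs = find_blobs(subtract_background(background, frame))
```

A fixed threshold like this only works when lighting stays roughly constant, which is the same condition the text places on the approach.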

Blob Features

To match people to blobs, the algorithm uses several low-level "blob" features: mean blob color, blob size, velocity, and distance traveled. These features are simple enough to make real-time computation feasible, yet they have proven adequate for tracking in the KidsRoom domain. The local context of the situation (e.g. whether two people are nearby or far apart) determines how the features are combined when determining the match.
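
One way to picture context-dependent feature combination is a weighted matching cost whose weights change when people are close together. The sketch below is only an assumption about how such a scheme could look: the `Track` and `Blob` types, the weight values, the `near` distance, and the exhaustive search over assignments are all hypothetical, and velocity and distance traveled are left out for brevity. It also assumes there are at least as many blobs as tracked people.

```python
from dataclasses import dataclass
from itertools import permutations
from math import dist

@dataclass
class Track:
    """A person known to be in the room (hypothetical structure)."""
    name: str
    position: tuple  # last known (x, y)
    color: tuple     # mean (r, g, b) of the person's blob
    size: float      # blob size in pixels

@dataclass
class Blob:
    """A blob found in the incoming frame (hypothetical structure)."""
    position: tuple
    color: tuple
    size: float

def match_cost(track, blob, crowded):
    """Weighted feature distance; the weights are illustrative assumptions."""
    d_pos = dist(track.position, blob.position)
    d_color = dist(track.color, blob.color)
    d_size = abs(track.size - blob.size)
    # When people are close together, position says little about identity,
    # so lean on color; otherwise trust position more.
    w_pos, w_color, w_size = (0.2, 1.0, 0.1) if crowded else (1.0, 0.3, 0.1)
    return w_pos * d_pos + w_color * d_color + w_size * d_size

def assign(tracks, blobs, near=50.0):
    """Return {person name: blob index} minimizing the total matching cost."""
    crowded = any(
        dist(a.position, b.position) < near
        for i, a in enumerate(tracks)
        for b in tracks[i + 1:]
    )
    # Exhaustive search is fine for the handful of people in one room.
    best = min(
        permutations(range(len(blobs)), len(tracks)),
        key=lambda perm: sum(
            match_cost(t, blobs[j], crowded) for t, j in zip(tracks, perm)
        ),
    )
    return {t.name: j for t, j in zip(tracks, best)}

# Two people who have just crossed paths: positions swapped, colors unchanged.
tracks = [
    Track("red", (100, 100), (200, 30, 30), 400),
    Track("blue", (110, 100), (30, 30, 200), 400),
]
blobs = [
    Blob((112, 100), (200, 30, 30), 400),
    Blob((98, 100), (30, 30, 200), 400),
]
matches = assign(tracks, blobs)
```

In this toy case color keeps the identities straight even though the nearest blob to each person belongs to the other one. Two people in similar outfits would give nearly equal costs for both assignments, and the match would then be ambiguous.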


This movie shows the system tracking four people wearing colored outfits in the KidsRoom. The background has been removed, and the remaining pixels are the "blob" pixels. The algorithm maintains a count of the number of people in the room using the door region in the lower left of the room. The boxes drawn around the blobs indicate where the system believes each person is located and how large each person appears. When two people move near one another, their blobs merge together. When they split again, the algorithm uses the blob features to keep track of which blob is which person.

View the QuickTime movie.

(Warning: 5.6 megabytes)
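
The door-based people count mentioned above can be sketched as a small bookkeeping class. This is a guess at the general idea rather than the actual KidsRoom logic: the `Occupancy` class, its method names, and the door rectangle coordinates are hypothetical.

```python
def in_door(position, door):
    """True if (x, y) lies inside the door rectangle (x0, y0, x1, y1)."""
    x, y = position
    x0, y0, x1, y1 = door
    return x0 <= x <= x1 and y0 <= y <= y1

class Occupancy:
    """Counts people by watching blobs appear and vanish in the door region."""

    def __init__(self, door):
        self.door = door
        self.count = 0

    def blob_appeared(self, position):
        """A new blob counts as a person only if it shows up at the door."""
        if in_door(position, self.door):
            self.count += 1
            return True
        # Elsewhere, a new blob is more likely a split or noise than a new person.
        return False

    def blob_vanished(self, position):
        """A lost blob counts as an exit only if it was last seen at the door."""
        if in_door(position, self.door) and self.count > 0:
            self.count -= 1
            return True
        # Elsewhere, a lost blob is more likely a merge with another blob.
        return False

room = Occupancy(door=(0, 0, 40, 40))   # hypothetical door rectangle
room.blob_appeared((10, 20))            # first person enters
room.blob_appeared((15, 25))            # second person enters
room.blob_appeared((200, 150))          # ignored: not at the door
room.blob_vanished((30, 10))            # one person leaves
```

Because the count changes only at the door, blobs that merge or split in the middle of the room leave the number of tracked people unchanged.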


The tracking, as demonstrated by the example, is good but not perfect. The system is excellent at maintaining the number and position of people in the space (the information most critical to the KidsRoom), but it will sometimes lose track of which person is which when two people who look alike come near each other and then move apart again. While there are some improvements that might be made to the algorithm, it will never perform flawlessly. The control program should use other contextual information to resolve tracking ambiguities. This is a topic for further research.



The KidsRoom - Perceptual Computing Group - MIT Media Laboratory