Object Tracking
In order to understand what is
happening in the KidsRoom it is first necessary
to track the positions of all the people and some
objects (e.g. the bed). Some worlds, like the
bedroom world, use positional information to know
whether people are near certain objects. For
example, the blue desk will speak only when a
person is near it. The positional information is
also used to keep track of whether people are in
a group and whether they are moving or not.
Finally, position information is used to simplify
other vision processes by ensuring that people
are in the expected region of the room.
How it Tracks
The tracking
algorithm uses background subtraction to
generate a "blob image." Background
subtraction compares a reference image of the
room taken without people against each incoming
image of the room, which may contain people.
If lighting is relatively constant, the
pixels that are different (the people) can be
clustered into 2D blob regions. The algorithm
then matches each person known to be in the room
to a blob in the incoming image frame. The
algorithm must therefore keep track of how many
people are in the room, which is achieved by
having everyone enter and exit through a
"door" region. For more detailed
information on object tracking, see the Info page for technical references. In
the KidsRoom, the bed position is tracked as
if it were another person, since the bed can
be rolled around the room.
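
The background subtraction and clustering steps described above can be sketched roughly as follows. This is a minimal illustration, not the KidsRoom implementation: it assumes greyscale images stored as nested lists, and the names blob_mask and find_blobs, the threshold of 30, the 4-connected grouping, and the minimum blob size are all hypothetical choices made for the example.

```python
from collections import deque

def blob_mask(background, frame, threshold=30):
    """Mark pixels whose grey level differs from the empty-room image.

    The threshold value is an assumption; it only works if lighting
    stays relatively constant between the two images.
    """
    return [[abs(f - b) > threshold for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def find_blobs(mask, min_size=2):
    """Cluster foreground pixels into 4-connected 2D blob regions."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) < min_size:
                continue  # treat very small regions as noise
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            blobs.append({
                "size": len(pixels),
                "box": (min(ys), min(xs), max(ys), max(xs)),  # top, left, bottom, right
                "centroid": (sum(ys) / len(ys), sum(xs) / len(xs)),
            })
    return blobs
```

Calling find_blobs(blob_mask(background, frame)) on each incoming frame yields one record per blob, with the size and bounding box that a tracker could then match against the people (and the bed) known to be in the room.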
Blob Features
In order to match people to
blobs, the algorithm uses several low-level
"blob" features: mean blob color,
blob size, velocity, and distance traveled.
These features are simple enough to compute
in real time, yet they have proven adequate
for tracking in the KidsRoom domain. The
local context of the situation (e.g. whether
two people are nearby or far apart)
determines how the features are combined when
determining the match.
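
One way to picture context-dependent matching is the sketch below. It is an assumption-laden illustration rather than the actual KidsRoom algorithm: the weights, the 50-pixel "nearby" distance, the greedy assignment, and the function names are all hypothetical, and the distance-traveled feature is left out for brevity. The idea shown is only that when people are close together, position says little, so color is weighted more heavily.

```python
import math

def choose_weights(people, near=50.0):
    """Pick feature weights from local context (all values are assumed)."""
    crowded = any(math.dist(a["pos"], b["pos"]) < near
                  for i, a in enumerate(people) for b in people[i + 1:])
    if crowded:
        # People are close together: rely mostly on mean blob color.
        return {"color": 1.0, "size": 0.01, "motion": 0.2}
    # People are far apart: predicted position is the strongest cue.
    return {"color": 0.2, "size": 0.01, "motion": 1.0}

def match_cost(person, blob, weights):
    """Weighted difference between a tracked person and a candidate blob."""
    color = math.dist(person["color"], blob["color"])
    size = abs(person["size"] - blob["size"])
    # Predict where the person should be from last position plus velocity.
    predicted = (person["pos"][0] + person["vel"][0],
                 person["pos"][1] + person["vel"][1])
    motion = math.dist(predicted, blob["pos"])
    return (weights["color"] * color
            + weights["size"] * size
            + weights["motion"] * motion)

def match_people(people, blobs):
    """Greedily assign each person to the cheapest unused blob index."""
    weights = choose_weights(people)
    pairs = sorted((match_cost(p, b, weights), pi, bi)
                   for pi, p in enumerate(people)
                   for bi, b in enumerate(blobs))
    assigned, used = {}, set()
    for _cost, pi, bi in pairs:
        if pi not in assigned and bi not in used:
            assigned[pi] = bi
            used.add(bi)
    return {people[pi]["name"]: bi for pi, bi in assigned.items()}
```

A greedy assignment is used here only because it is short; any assignment method could be substituted without changing the role of the context-dependent weights.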
Example
This movie shows the system
tracking four people wearing colored outfits
in the KidsRoom. The background has been
removed. The remaining pixels are the
"blob" pixels. The algorithm
maintains a count of the number of people in
the room using the door region in the lower
left of the room. The boxes drawn around the
blobs indicate where the system thinks the
people are located and their size. When two
people move near one another, their blobs
merge together. When they split again, the
algorithm uses the blob features to keep
track of which blob is which person.
View
the QuickTime movie.
(Warning: 5.6 megabytes)
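
The door-region bookkeeping mentioned in the example could be sketched along these lines. This is a hypothetical illustration: the DoorCounter class, the rectangle coordinates, and the rule for ignoring blobs that appear or vanish away from the door are assumptions, not details taken from the KidsRoom system.

```python
def in_region(pos, region):
    """Return True if an (x, y) position lies inside a rectangle."""
    x, y = pos
    x0, y0, x1, y1 = region
    return x0 <= x <= x1 and y0 <= y <= y1

class DoorCounter:
    """Keep a count of people by watching a "door" rectangle."""

    def __init__(self, door):
        self.door = door  # (x0, y0, x1, y1), assumed image coordinates
        self.count = 0

    def update(self, appeared, vanished):
        """Update the count for one frame.

        appeared: positions of blobs that match no tracked person.
        vanished: last known positions of people that match no blob.
        Only events inside the door region change the count; a blob
        that vanishes mid-room is assumed to have merged with another.
        """
        self.count += sum(1 for p in appeared if in_region(p, self.door))
        self.count -= sum(1 for p in vanished if in_region(p, self.door))
        return self.count
```

Restricting count changes to the door region means that blobs merging and splitting in the middle of the room leave the number of tracked people unchanged.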
Difficulties
The tracking, as demonstrated
by the example, is good but not perfect. The
system is excellent at maintaining the number
and position of people in the space (the
information most critical to the KidsRoom),
but it will sometimes lose track of which
person is which when two people who look
alike come near each other and then move
apart again. While there are some
improvements that might be made to the
algorithm, it will never perform flawlessly.
The control program should use other
contextual information to resolve tracking
ambiguities. This is a topic for further
research.