Do you ever think
about the conversations you had, the people you met, the experiences you
had yesterday, last week, last year, or 10 years ago? It is almost tragic
how many memories we lose to the passage of time. Of course, it is not
enough to simply record all the raw audio and video of our daily lives,
because an unannotated recording of that size would be practically
impossible to search or use.
The goal of this
project is to build a simple perceptual system that tries to understand
and annotate the events in a person’s life. The system is called
"The Familiar" because it is a creature (actually a stuffed
animal) that should always be with you and shares your experiences so
that you can see and hear, through its eyes and ears, your past
memories. The physical form of the system is not
important, as long as it is unobtrusive and can be with you all day,
every day. The system has therefore taken, and will continue to take,
other forms, such as a wearable computer.
This work serves
two goals: one scientific, the other a practical tool for our daily
lives. The first goal is to tackle the "frame" or context
problem that AI researchers are currently facing. Take speech
recognition, for example. At one level, speech recognition is our effort
to recreate the human ability to communicate via speech. We are
trying to boil the entire speech system down to a microphone and a
computer
when in fact the human speech system is
intimately connected with a whole perceptual system that includes
vision, smell, touch, and more (the inner ear, for instance), not to mention a whole
lifetime of experiences to ground everything. However, recreating the
entire human perceptual system is too ambitious to tackle all at once.
Currently, the
Familiar consists of three sensors: a video camera, a microphone, and an inertial
tracker. On top of these sensors, various modules are being built, such as
face and speech detection, speaker identification, and gesture
classification. The purpose of these modules is to provide simple, robust,
and salient features so that the Familiar can start learning about the
structure of your life. Of course, some of the features can be used
directly to annotate a person’s day (e.g. speaker identification, face
detection), but more interestingly, there is a chance to find more
complicated and long-term patterns (for example, what is just part of
your daily routine and what is a genuinely novel occurrence).
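To make the routine-versus-novel idea concrete, here is a minimal sketch in Python. It is not the Familiar's actual method: it assumes the modules have already reduced each day to a set of event labels (the label names such as "speaker:alice", the function name, and the 0.5 threshold are all hypothetical), and it simply calls a label routine if it appeared on at least half of the past days.

```python
from collections import Counter


def split_routine_and_novel(history, today, routine_fraction=0.5):
    """Split today's event labels into routine and novel ones.

    history: list of sets of event labels, one set per past day (assumed format).
    today: set of event labels observed today.
    routine_fraction: hypothetical threshold; a label seen on at least this
        fraction of past days counts as routine.
    """
    # Count on how many past days each label appeared.
    days_seen = Counter()
    for day in history:
        days_seen.update(set(day))

    n_days = len(history)
    routine, novel = [], []
    for label in sorted(today):
        # With no history at all, every label is treated as novel.
        fraction = days_seen[label] / n_days if n_days else 0.0
        if fraction >= routine_fraction:
            routine.append(label)
        else:
            novel.append(label)
    return routine, novel


# Made-up example data, for illustration only.
history = [
    {"speaker:alice", "face:detected", "gesture:wave"},
    {"speaker:alice", "face:detected"},
    {"speaker:alice", "speaker:bob", "face:detected"},
]
today = {"speaker:alice", "speaker:carol", "face:detected"}
routine, novel = split_routine_and_novel(history, today)
print("routine:", routine)
print("novel:", novel)
```

A counting scheme like this ignores time of day and ordering, so it is only a starting point for the longer-term patterns mentioned above.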
So
another approach is to build a complete perceptual system with much
simpler properties, such as that of a common housefly or maybe an
earthworm. That way, whatever task we set this perceptual system to, it
will have the benefit of a great deal of context from many sensor types
(robustness) and their relationships (did you see it and hear it at the
same time?).
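The "did you see it and hear it at the same time?" question can be sketched as an interval overlap between two detectors. The Python below is an illustration under stated assumptions, not the project's code: it assumes a face detector and a speech detector each emit a sorted list of non-overlapping (start, end) times in seconds, and the function name and sample data are hypothetical.

```python
def overlapping_intervals(seen, heard):
    """Return the time spans during which both detectors were active.

    seen, heard: lists of (start, end) tuples in seconds, each list sorted
    by start time and non-overlapping within itself (assumed format).
    """
    overlaps = []
    i = j = 0
    while i < len(seen) and j < len(heard):
        # The shared span, if any, runs from the later start to the earlier end.
        start = max(seen[i][0], heard[j][0])
        end = min(seen[i][1], heard[j][1])
        if start < end:
            overlaps.append((start, end))
        # Advance whichever interval finishes first.
        if seen[i][1] < heard[j][1]:
            i += 1
        else:
            j += 1
    return overlaps


# Made-up detections, for illustration only.
face_intervals = [(0.0, 4.0), (10.0, 15.0)]
speech_intervals = [(2.0, 6.0), (12.0, 13.0), (14.0, 20.0)]
both = overlapping_intervals(face_intervals, speech_intervals)
print(both)
```

Spans where a face and speech coincide are the kind of cross-sensor relationship that a single microphone or camera on its own cannot provide.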