Another important concept arises that is equally well supported by the framework: inter-modal learning. By inter-modal learning, we are referring to the acquisition of a coupling between gestures or actions in one domain and those in another. For instance, User A could feed the system with audio measurements (pitch, energy,
textural properties) while User B could feed the system with visual
gestures. As the system learns how to predict the time-series of both
measurements, it begins to form a mapping between audio and
video. Thus, instead of learning that a clap should follow when the
user rubs his stomach (as demonstrated above), the system could
trigger clapping when the user sings a nice melody into a pitch
tracker. Evidently, there are many unresolved issues here, but the important point to stress is that User A and User B do not have to provide the same type of measurements. In other words, their respective measurements could contain observations of different types of data.
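
To make the idea concrete, the sketch below treats the coupling as a time-series prediction problem: synchronized recordings of User A's audio features and User B's gesture features are used to fit a model that predicts one stream from the other, and the learned mapping can then turn a new audio observation into a predicted gesture. The feature names, dimensions, and the plain least-squares model are illustrative assumptions made for this example, not part of the framework described here.

import numpy as np

# Illustrative sketch only: two synchronized measurement streams recorded
# during a joint session. User A provides audio features (here: pitch,
# energy, a texture descriptor); User B provides visual gesture features.
# The dimensions and the least-squares model are assumptions for the
# example, not the framework's actual predictor.
rng = np.random.default_rng(0)
T = 500            # number of synchronized time steps
audio_dim = 3      # User A: pitch, energy, texture (hypothetical)
video_dim = 2      # User B: e.g. 2-D position of a tracked gesture

# Synthetic training data in which the gesture loosely follows the audio.
audio = rng.normal(size=(T, audio_dim))
coupling = rng.normal(size=(audio_dim, video_dim))
video = audio @ coupling + 0.1 * rng.normal(size=(T, video_dim))

# Learn the inter-modal mapping by predicting User B's stream from
# User A's stream (ordinary least squares with a bias term).
X = np.hstack([audio, np.ones((T, 1))])
W, *_ = np.linalg.lstsq(X, video, rcond=None)

# At performance time, a new audio observation is mapped into the
# gesture space; crossing a threshold there could, for instance,
# trigger the clap mentioned above.
new_audio = rng.normal(size=(1, audio_dim))
predicted_gesture = np.hstack([new_audio, np.ones((1, 1))]) @ W
print("predicted gesture features:", predicted_gesture)

In a real system the same idea would be carried by whatever time-series predictor the framework already uses; the point illustrated is only that the two input streams need not contain the same kind of data.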