Question #4 - Animating Action
4) What are the minimal cues for animating
action? How much can motion replace
realistic rendering? Does this tell us anything about the primitives of
action perception or representation?
I'm not sure I understand the meaning of this question, but the
design of the primitives of action and perception representation is one
of the most essential problems. "Motion Sketch" [Nakamura and
Asada95] is one of our solutions.
In recognizing action, a significant amount of work has focused on motion, but general "appearance change" is equally important. To animate someone dancing in a long flowing dress, the changes in appearance due to the deforming texture (self-shadowing, occlusion, etc.) are likely to be as important as the motion information.
In work with Shanon Ju and Yaser Yacoob, we looked at the tracking of human limbs using motion techniques. A detailed examination of image sequences of people in a variety of clothing reveals that much of the "change" observed between frames is not due to the semi-rigid displacement of the limbs, but rather to the illumination changes resulting from the non-rigid deformation of the clothing. This presents serious (possibly fundamental) challenges to approaches that base recognition solely on motion.
Another common example occurs in image sequences of people talking. The motion of the lips deforming is a significant cue, but so too is the rapid appearance and disappearance of the teeth, tongue, and mouth cavity. Yacoob, Fleet, and I have referred to these non-motion changes as "iconic change".
This iconic change should not be viewed as a problem
that motion estimation approaches need to work around. Rather, information
about deforming clothing and appearing teeth provides important cues about
the activity that is occurring. Recognition techniques should exploit both
motion and other types of appearance change.
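The distinction between motion and other appearance change can be illustrated with a toy sketch. This is only an illustration under strong assumptions, not the method used in the work mentioned above: it assumes a single global integer translation between two frames, finds it by brute-force search, and treats whatever the translation cannot explain as the non-motion residual. The names best_shift, split_change, and make_toy_frames are hypothetical and chosen only for this example.

```python
import numpy as np

def best_shift(prev, curr, max_shift=3):
    """Brute-force search for the integer (dy, dx) translation of `prev`
    that minimizes the sum of squared differences with `curr`.
    Assumption: one global translation explains all motion."""
    candidates = [(dy, dx)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]

    def cost(s):
        warped = np.roll(prev, s, axis=(0, 1))
        return float(np.sum((warped - curr) ** 2))

    return min(candidates, key=cost)

def split_change(prev, curr, max_shift=3):
    """Return the estimated translation and the residual change that the
    translation does not account for."""
    shift = best_shift(prev, curr, max_shift)
    warped = np.roll(prev, shift, axis=(0, 1))
    residual = curr - warped
    return shift, residual

def make_toy_frames():
    """Toy data: a bright square moves 2 pixels to the right while a small
    patch appears from nowhere (standing in for, say, teeth appearing)."""
    prev = np.zeros((16, 16))
    prev[4:8, 4:8] = 1.0                      # moving "limb"
    curr = np.roll(prev, (0, 2), axis=(0, 1))
    curr[12:14, 10:12] = 0.5                  # newly appearing region
    return prev, curr

prev, curr = make_toy_frames()
shift, residual = split_change(prev, curr)
print(shift)                        # (0, 2)
print(np.count_nonzero(residual))   # 4
```

In this toy case the residual is nonzero only where the new patch appears. Real sequences would need a far richer motion model, and the residual would also contain illumination and deformation effects of the kind described above.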
Perhaps the concepts of physics-based modeling can be employed to develop
detailed representations of actions. The major benefit of physics-based
methods would be that variability in time can be easily incorporated
into the analysis, allowing for spatio-temporal representations. Animation
and mimicking of actions are very important by-products of this approach.
Again, the limitation is the domain and the context.
My notion of spatial and temporal attention control can be exploited in
reverse in the animation domain. Image quality can be controlled so that
it provides only the detail humans need to grasp what is going on in the
film. The problem is that we know very little about generalizing
spatio-temporal attention control in action perception.
Aesthetically, motion capture does not easily produce the kind of wild exaggeration and dramatic motion that are so important in normal animation. However, it easily reproduces tiny gestures, shrugs of the shoulder, subtle body language, and a sense of weight and mass, all of which take great talent to represent with keyframe animation or stop-motion.
Good animation, like good puppeteering or, for that matter, good mime, is rarely about trying to duplicate realistic movement. It's more often about drawing from a real movement and developing its essence into something larger than the reality. It's also about distilling all the movements you could be doing down to the movements that are necessary, and presenting those essentials clearly. Realistic motion feels different to watch than artistically interpreted motion. Reality is a guideline, but usually only a starting point, not the final goal.