We discuss how so-called "intelligence" can emerge as a cognitive process, that is, how an agent can develop its internal representation according to the complexity of its interactions with its environment through its capabilities of sensing and acting. The complexity may be increased by the presence of other active agents, and development depends on how the agent can find a new axis in its internal representation while trying to accomplish a given task in an environment that includes other agents. As an example of such development, we present a case study of vision-based mobile robots whose task is to play a soccer game, performing subtasks such as shooting and passing a ball or avoiding an opponent, along with preliminary experiments with real robots.
For most of the past thirty years the computer vision community has focused
its attention on a world without people, making substantial progress on
problems such as recognition of rigid 3-D objects, estimation of egomotion
through (mostly) rigid scenes and understanding the physical relationships
between images, scenes, sensors and illumination. During the past five years
people have entered the picture, both complicating our lives and bringing
to our attention a new set of fundamental and applied research problems
in perception and cognitive vision.
Present research efforts in the area of machine perception of action
are not much different in their goals from the earlier works aimed at extracting
intentionality from the environment or extracting syntactic and semantic
cues from the scene. What we do have going for ourselves at present is that
computing power can support our needs to undertake robust searches through
somewhat toy domains, or allow us to model limited scenarios so that we
can just look for "stuff" and "things" that we understand
and make rule-based or probabilistic inferences from them. No real attempts
at machine perception of actions (and I specifically mean human
actions) in real domains have been made to date, and perhaps none can be made
for some time to come.
In my earlier work I have taken much more of a model-based approach.
Defining a complete spatio-temporal structure of what to observe and then
placing it within the observer-controller framework, I have succeeded in
extracting very detailed actions. Detailed models have allowed me to experiment
with probabilistic models in the space of the states that the model can
handle. I have also experimented with just using the models for energy minimization
and constraints (and not for interpretations) on the data for exploring
probabilistic data-driven models. Both these approaches, though quite limited
in their scope and domain, provide very detailed interpretations.
Perception of action shares a problem with many other areas of computer
vision; although there are clear research problems to be addressed, the
potential applications are somewhat nebulous. There are two approaches to
this dilemma --- either continue along promising lines of research in the
expectation that the solution of general problems is A Good Thing and will
be ultimately useful, or, in the words of Dilbert's Boss, "Identify
the problem, then solve it." Both approaches have in the past been carried
out simultaneously, and there is no reason to think that things should be
different for research into action. As well as an overall aim for the field,
therefore, it is worth considering directions for each of these shorter-term approaches.
I feel like a bit of an impostor at a conference on the machine recognition
of action and gesture; I'm not a computational vision scientist. I'm a "digital
puppeteer" from Protozoa, a small animation and technology company
in San Francisco. However, I have a background in perception and action
research, as well as puppetry and graphics. (I am an ex-student of Bill
Warren at Brown, and Ken Nakayama at Harvard.) So my perspective on what
I do is fairly analytical and perception-informed. I hope my observations
on what I do will be of interest in thinking about the issues of this workshop.
At Protozoa, we are constantly working with issues of how to represent gesture. Protozoa, an offshoot of Colossal Pictures, is a technology and entertainment company that specializes in "real-time character animation." To animate our characters, we wear a suit of sensors the computer can follow, allowing us to literally act out their movements (a technique also known as "motion capture"). It can be a lot of fun: imagine looking in a mirror and seeing yourself as you make different movements, except that in the "mirror" (the computer screen), you're an orange dog or a crafty monkey. We also do some characters that are far from human in shape; for instance, we've made a worm and a spider. (Few places besides Protozoa do this.) With a very human-shaped character, the technique feels almost like acting; with non-human characters it feels just like puppetry.