2.4 Theoretical frameworks for mappings between gestures and music

"both sound and human movement can be represented at various abstraction levels. A mapping will be faster to learn when movement features are mapped to sound features of the same abstraction level."

There have always been aspects of gesture that are difficult to describe in language; they can only be described precisely in mathematical terms. However, gestures are used systematically in many domains of human communication, each of which has evolved its own methods and meanings. Specific rule-based systems for gesture have been developed in rhetoric, oratory, theatre, dance, and sign language, and numerous theorists have attempted to codify and describe those rules. One of the earliest examples of a codex for gesture came from John Bulwer, a British scholar who wrote a systematic treatise on the art of hand-speech and rhetorical gesturing in 1644. He described the skilled gesture-performer as a "Chiromancer," expert in "chirologia," or hand-speech, and gave exhaustive illustrations of the different hand poses and movements with their associated meanings. More recently, Desmond Morris wrote a book that describes the range of human behavior by exhaustively categorizing different activities, and Eugenio Barba similarly tried to formalize the actions of human actors in theatre across the cultures of the world.

Others have studied the basic expressive instincts underlying everyday gestures. During the last century, the German psychologist Wilhelm Wundt wrote a treatise on the language of gestures, attempting to describe the essence of human gesture by uncovering the universal principles of expressive movement. He embarked on a study of sign languages after researchers in his psychology laboratory began measuring and interpreting human breathing and pulse signals; Wundt believed that gestures and physiology reflected a more natural and emotional expression of internal experience than spoken and written languages. He wrote:

"It is not the degree of education but rather the degree of emotion or the constant affective tendency, the temperament, that is important for the formation of gesture. If, due to this tendency, there exists a leaning toward a more lively pantomime, it not only accompanies speech, but even takes its place should thoughts be difficult to communicate aloud. As such, an aesthetic joy in meaningful gestures naturally arises. The ancients were more familiar with the pleasure of gestures in casual communication than we are today. In fact, conventions actually demanded a superfluity of affective expression, whereas we now tend to suppress it. So the ancients had a more lively feel for the meaning of gestures, not because theirs was a primitive culture, but simply because it differed from ours, and especially because the ability to discern outer signs of inner feeling was more developed."

Wundt’s work has been described as an "action theory of expression," and it contains a number of important insights about the relationships between emotion, language, and gesture.

Finally, there is the new case of mappings made from the data of gestural sensors. Here there are no established rules, since the field is so young and meanings have not yet accreted. For now, any gesture can be accompanied by any sound, and the question becomes how to make that relationship meaningful. During the past ten years a few theorists have attempted to formalize theories or frameworks for the myriad possible relationships between gesture and sound. Barry Schrader described these as action/response mechanisms, which have traditionally been self-evident with acoustic instruments but are now a requisite part of instrument design. Schrader wrote that "the art of ‘playing’ an instrument is that of creating a series of meaningful action/response associations." There are many different ideas for how these associations might be made; some theorists have identified systematic grammars, definitions, and categories, while others discuss in more general terms the issues involved in designing new instruments.

2.4.1 David Efron

In "Gesture, Race, and Culture," a landmark 1941 study of differences in conversational gesture between neighboring ethnic groups in New York, David Efron presented a general theory of gesture and meaning. In his study, designed to test the claims of Nazi scientists that gestural styles were due to racial inheritance, Efron carefully and systematically documented thousands of examples of the uses of gesture in conversation and communication between people in everyday situations. His relevance and importance to the study of conducting come from the enormous amount of quantitative and qualitative data that he collected on gestures in natural settings. Efron’s primary method was to take motion pictures and analyze them afterwards, using a unique notational system. From frequency counts of certain motions he built up a comprehensive theory of how gestures are used to communicate between people.

According to Efron, the three basic uses for gesture are spatio-temporal, interlocutional, and linguistic. Spatio-temporal gestures represent pure movement, free from any conversational or referential context; to me they resemble the abstract forms of conducting. These gestures can be categorized according to five aspects: radius (size of the movement), form (shape of the movement), plane (direction and orientation of the movement), the body part that performs it, and tempo (the degree of abruptness vs. flow). Conversely, linguistic gestures happen during conversation and refer to the content of the speech. Efron divides them into two categories: logical-discursive and objective. Logical-discursive gestures emphasize and inflect the content of the conversations they accompany, either with baton-like indications of time intervals or with ideographic sketches in the air. Objective gestures have meaning independent of the speech they accompany, and are divided into three categories: deictic, physiographic, and symbolic. Deictic gestures indicate a visually present object, usually by pointing. Physiographic gestures demonstrate something that is not present, either iconographically, by depicting the form of an object, or kinetographically, by depicting an action. Symbolic gestures represent an object by depicting a form that has no actual relationship to the thing, relying instead on a shared, culturally specific meaning. While Efron’s categories may seem unnecessarily complicated for the current study of conductors, his theory provides a great deal of clarity to the attempt to categorize and quantify gestures.
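Efron's hierarchy of categories can be summarized as a small tree. The sketch below encodes it as nested Python dictionaries; the key names are shorthand drawn from the description above, not Efron's own notation, and the helper function is purely illustrative.

```python
# A minimal sketch of Efron's gesture taxonomy as a nested dictionary.
# The structure follows the categories described in the text; the key
# names are informal shorthand, not Efron's notation.
EFRON_TAXONOMY = {
    "spatio-temporal": {
        # Pure movement, categorized along five aspects:
        "aspects": ["radius", "form", "plane", "body part", "tempo"],
    },
    "interlocutional": {},  # not elaborated further in the text above
    "linguistic": {
        "logical-discursive": ["baton-like", "ideographic"],
        "objective": {
            "deictic": "indicates a visually present object, usually by pointing",
            "physiographic": ["iconographic", "kinetographic"],
            "symbolic": "culturally specific form with no physical resemblance",
        },
    },
}

def taxonomy_node(*keys):
    """Walk the taxonomy tree and return the node at the given path."""
    node = EFRON_TAXONOMY
    for key in keys:
        node = node[key]
    return node
```

For example, `taxonomy_node("linguistic", "objective")` returns the three subcategories of objective gesture, making the nesting of the theory explicit.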

2.4.2 Joel Ryan

"We can see clearly how music grew and changed with the perfection of the physical means of the instruments and the invention of playing styles. For most musicians this sort of experimentation is seen to be of the historic and golden age sort, with no possibility or need to be resumed. The design of new instruments lies on the fringe: partly inspired, partly crankish eccentricity. So far the art of the interface between physical gesture and abstract function is respected only by aero-space and sports equipment designers."

One of the first to formulate a coherent theory of mappings between gestural interfaces and music was Joel Ryan of STEIM, who was interested in using empirical methods "to recover the physicality of music lost in adapting to the abstractions of technology." He defined a basic controller as a device that provides a one-to-one relationship between a physical movement and a parameter in the musical model; knobs, switches, and simple one-dimensional sensors are examples. He then evaluated controllers on their responsiveness, which he defined as the amount of physical feedback they provide over their useful performance range. The responsiveness of a device had to be good, but more importantly, the shape of the response had to fit the performer’s musical idea. Finally, Ryan defined the control chain for interactive music:

Performer -> sensor -> digitizer -> communication -> recognition -> interpretation -> mapping -> composition

"The parsing of this chain, what might be called the system’s design, is becoming a critical aspect of the making of electronic music compositions." He saw that gesture recognition would expand the possibilities for interacting with musical models in real time. In 1991, Ryan proposed a method for categorizing mappings using a series of Euclidean analogies between points (symbols) and lines and curves (shapes). For example, touch-triggering of complex forms would be point-to-curve, whereas using complex inputs to trigger individual events would be curve-to-point. Matching one continuous degree of freedom on the control side to a MIDI controller value would be line-to-line. He identified numerous linear transforms that could be used to filter sensor data and make it useful for mapping: shifting, inverting, compressing, expanding, limiting, segmenting, quantizing, thresholding, following rates of change (and rates of rates of change, and so on), smoothing, amplifying, delaying, adding hysteresis, integrating, convolving, reducing and expanding rates of data transmission (decimation and interpolation), shaping, and distorting. Ryan’s formalization of "shape to symbol" mappings is perhaps his strongest contribution to the literature; however, he did not discuss the case of mapping between two curves, which is where most of the interesting aspects of musical performance lie.
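A few of the transforms on Ryan's list can be sketched directly as small filters on a stream of sensor samples. The Python below is a minimal illustration under my own assumptions (the parameter values and class names are mine, not Ryan's); the stateful filters keep one sample of history, as the transforms require.

```python
# Illustrative sketches of a few of the linear transforms Ryan lists
# for conditioning raw sensor data before mapping. Parameter values
# are arbitrary examples, not taken from Ryan's work.

def shift(x, offset):
    """Shifting: add a constant offset to the signal."""
    return x + offset

def scale(x, gain):
    """Compressing (gain < 1) or expanding (gain > 1) the signal's range."""
    return x * gain

def limit(x, lo, hi):
    """Limiting: clamp the signal into a useful range."""
    return max(lo, min(hi, x))

def threshold(x, level):
    """Thresholding: gate out values below a given level."""
    return x if x >= level else 0.0

class Smoother:
    """Smoothing via a one-pole low-pass filter (exponential average)."""
    def __init__(self, alpha=0.2):
        self.alpha, self.y = alpha, 0.0
    def __call__(self, x):
        self.y += self.alpha * (x - self.y)
        return self.y

class RateOfChange:
    """Following rates of change: first difference between samples.
    Chaining two of these gives rates of rates of change, and so on."""
    def __init__(self):
        self.prev = None
    def __call__(self, x):
        d = 0.0 if self.prev is None else x - self.prev
        self.prev = x
        return d

class Hysteresis:
    """Adding hysteresis: a two-threshold switch that avoids chattering
    when a noisy signal hovers near a single trigger point."""
    def __init__(self, on=0.6, off=0.4):
        self.on, self.off, self.state = on, off, False
    def __call__(self, x):
        if self.state and x < self.off:
            self.state = False
        elif not self.state and x > self.on:
            self.state = True
        return self.state
```

In use, several of these would be composed in the order of Ryan's control chain: a raw sensor value is smoothed, then limited, then thresholded before it ever reaches the mapping stage.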

2.4.3 Teresa Marrin

In 1996 I attempted to formulate a theoretical framework for musical mappings that would make sense for the Digital Baton. My theory attempted to show how an entire gestural language is constructed from its most basic elements. The idea was that the largest and most obvious features in a gesture developed their qualities from successive layers of atomic and primitive components. My framework began from the level of the atoms of movement, the smallest detectable features. These atoms could be grouped into primitive events, which could then be grouped into larger structures. These structures could be placed relative to each other in sequences, which could then evolve into conducting patterns. Conducting patterns would comprise a subset of musical gesture languages, which themselves would be a subset of all hand-based gestural languages. While this was a nice idea, it didn’t help to further any practical investigation into the gestures themselves. I ultimately abandoned this framework.

Afterwards I tried a simpler model, in which I divided all controls and responses into two categories: continuous and discrete. Discrete gestures I defined as single impulses or static symbols that represent one quantity; an example would be flipping a switch or pressing a key on a keyboard. More elaborate examples of discrete gestures can be found in the class of semaphores and fixed postures, such as in American Sign Language. All of these gestures, no matter how complicated, generate one discrete symbolic mapping – for example, a binary flip, a single note with a single volume and duration, or a single word. Conversely, I defined continuous gestures as gestures that do not have a simple, discrete mapping but might instead be mapped in a more complex way. At the time I saw that it was relatively simple to make one-to-one mappings (using repeatable, score-dependent, deterministic relationships), but that more complex mappings would hold the richest and most elusive information.
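The discrete/continuous distinction can be made concrete with a small sketch. The gesture and sound parameters below are hypothetical examples of my own choosing, not part of the original formalism.

```python
# A sketch of the discrete/continuous distinction. The parameter names
# and the particular response curve are illustrative assumptions.

def discrete_mapping(switch_pressed):
    """A discrete gesture yields one symbolic result: here, a single
    note event with a fixed pitch, volume, and duration."""
    if switch_pressed:
        return {"pitch": 60, "volume": 100, "duration": 0.5}
    return None

def continuous_mapping(sensor_value):
    """A continuous gesture is mapped through a function of the incoming
    signal; here, a one-to-one mapping of a sensor in [0, 1] onto a MIDI
    volume in [0, 127], with a squared (nonlinear) response curve."""
    return int(127 * sensor_value ** 2)
```

The discrete case collapses any amount of gestural detail into one symbol, whereas the continuous case preserves the shape of the input; the squared curve stands in for the "shape of the response" that, as Ryan noted, must fit the performer's musical idea.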

After making use of my discrete/continuous formalism, I extended it into the realm of regular grammars. I developed a multiple-state beat model to try to account for all the possible variations on beat patterns; ultimately this also proved to be too brittle to implement.

I also belong to a growing community of researchers working in the area of gesture research in music. We have an e-mail alias and a web page, both moderated by Marcelo Wanderley, a doctoral student at IRCAM. Information on our collective work can be found at:

http://www.ircam.fr/equipes/analyse-synthese/wanderle/Gestes/Externe/index.html

This page covers numerous research projects in the field of gesture capture, interfaces, and applications to sound synthesis and performance.