Chapter 8: CONCLUSIONS

"In music, the instrument often predates the expression it authorizes."

I conclude by discussing several issues that were encountered in the course of this work. They include general issues raised in the design of sensor-based interfaces and musical instruments, comparisons between traditional and digital forms, and distinctions between musical and physical gesture. Then I demonstrate the implications of this doctoral project for other work, present a framework for future research in this area, and hypothesize on the future of musical performances.

8.1 Design Issues for Sensor Instruments

My experiments during this doctoral project have taught me many lessons about designing sensor-based instruments. Some general categories have emerged, including repeatability, depth, emphasis, constraints, and Cartesian reliance. These can be thought of as criteria for the success of a new instrument; to the extent that the Conductor’s Jacket may be successful, it can be said to have maximal repeatability, depth, and emphasis, and minimal constraints and Cartesian reliance. These are described below, followed by larger discussions of problems with the properties of disembodiment and mimesis.

Repeatability is the property of an instrument that makes it deterministic; a specific action should yield a specific result, and on repetition it should yield the same sound. In order for skilled musicians to perform on sensor-based instruments, those instruments must have the property of repeatability.

Depth is the property that makes an instrument sophisticated enough for someone to become skillful with it; it is the presence of a rich set of interconnected mappings. Depth in an instrument means that the longer one works at mastering the instrument, the more beautiful and pleasing the result becomes. Depth is not achieved by just increasing the number of degrees of freedom; it also involves the careful construction of those degrees of freedom such that the result is meaningful.

"The most important requirements are immediacy and depth of control, coupled with the greatest flexibility possible; immediacy and depth because the user needs feedback on his action, flexibility because the user wants a predictable but complex response."

Emphasis is the property of an instrument that reflects the amount of effort and intensity with which the performer works to create the sound. Most sensor-based instruments fall very short of this property because their sensing mechanisms are not designed to gather this quantity and the synthesis models do not reflect this parameter. As Joel Ryan wrote,

"Physical effort is a characteristic of the playing of all musical instruments. Though traditional instruments have been greatly refined over the centuries, the main motivation has been to increase ranges, accuracy, and subtlety of sound and not to minimize the physical. Effort is closely related to expression in the playing of traditional instruments. It is the element of energy and desire, of attraction and repulsion in the movement of music. But effort is just as important in the formal construction of music as in its expression: effort maps complex territories onto the simple grid of pitch and harmony. And it is upon such territories that much of modern musical invention is founded."

Watanabe and Yachida reflect this idea about emphasis using the word degree: "Methods such as DP matching, HMMs, neural networks, and finite state machines put emphasis on classifying a kind of gesture, and they cannot obtain the degree information of gesture, such as speed, magnitude and so on. The degree information often represents user’s attitude, emotion and so on, which play an important role in communication. Therefore the interactive systems should also recognize not only the kind of gesture but also the degree information of gesture."
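The distinction between the kind of a gesture and its degree can be made concrete with a minimal sketch. It is illustrative only and is not the method of Watanabe and Yachida or the software of the Conductor’s Jacket; the function name `gesture_degree`, the `(t, x, y)` sample format, and the choice of mean speed and peak displacement as the two degree measures are all assumptions made for the example.

```python
import math


def gesture_degree(samples):
    """Return (mean_speed, magnitude) for a list of (t, x, y) samples.

    mean_speed is the total path length divided by the elapsed time.
    magnitude is the largest distance of any sample from the first sample.
    Both are hypothetical stand-ins for the "degree information" of a gesture.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")

    # Sum the straight-line distances between consecutive samples.
    path_length = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(samples, samples[1:]):
        path_length += math.hypot(x1 - x0, y1 - y0)

    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        raise ValueError("timestamps must increase")

    # Peak displacement measured from the starting point of the gesture.
    _, start_x, start_y = samples[0]
    magnitude = max(math.hypot(x - start_x, y - start_y) for _, x, y in samples)

    return path_length / duration, magnitude
```

Two performances of the same stroke, one twice as large as the other but taking the same time, would be classified as the same kind of gesture yet return different degree values, which is the information an emphasis-sensitive mapping would need.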

The Constraints of an instrument are the aspects of its design that force the human performer to gesture or pose in a particular way. Some sensor-based instruments have the problem of being overly constrained; the nature of their sensing mechanisms forces the performer to make contorted, unnatural gestures to achieve particular effects. All instruments by their very nature will impose some constraints on their performers, but in order for the instrument to be successful the constraints have to limit the physical movement in such a way as to not hamper the expressive capabilities of the performer.

Cartesian Reliance is the property of an instrument whereby the different degrees of freedom are set up in a literal, Cartesian coordinate space for the performer. An example of such a device would be a mixing board, where the individual levers each move in one direction and map to one quantity. My strong feeling is that for a musical instrument to be usable by a performer, it should not rely too heavily on the Cartesian paradigm. That is, humans do not naturally perceive the world in Cartesian terms; our intuitive ways of gesturing are often invariant to aspects such as translation, rotation, and scale. In the design of interactive systems for people, "the important thing is to not leave motions in Cartesian terms." Certainly the expressive information in conducting is not contained in the absolute positional information, and this doctoral project has shown that quantities such as muscle tension can sometimes be more useful than acceleration information derived from the position of the moving arm. The common reliance on orthogonal axes doesn’t map well to meaningful, natural musical responses. For example, in most traditional music, pitch and loudness are coupled -- they can’t just be manipulated as independent parameters. Unlike car controls, there is no inherent meaning in increasing a single value independently. While the Theremin got away with it, I think orthogonal relationships are just too simplistic and arbitrary; they make the engineering easier, but force the musician to conform to an unnatural structure.
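One way to see what it means for a gesture description to be invariant to translation, rotation, and scale is the following sketch. It is an illustration under stated assumptions, not part of the Conductor’s Jacket system: the centroid-distance signature and the names `invariant_signature` and `transform` are chosen here only because they are simple to verify.

```python
import math


def invariant_signature(points):
    """Centroid-distance signature of a 2D path given as (x, y) pairs.

    Each entry is a point's distance from the path centroid divided by the
    mean of those distances, so the result does not change when the whole
    path is translated, rotated, or uniformly scaled.
    """
    n = len(points)
    if n == 0:
        raise ValueError("path is empty")

    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    dists = [math.hypot(x - cx, y - cy) for x, y in points]

    mean_dist = sum(dists) / n
    if mean_dist == 0:
        raise ValueError("path has no spatial extent")

    return [d / mean_dist for d in dists]


def transform(points, angle, scale, dx, dy):
    """Rotate a path by angle (radians), scale it, then translate it."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
        for x, y in points
    ]
```

A path and any rotated, resized, or shifted copy of it produce the same signature (up to floating-point error), whereas their raw Cartesian coordinates differ at every sample. A mapping driven by the signature would therefore respond to the shape of the gesture rather than to where in the room it was made.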

8.1.1 The Disembodiment Problem

Disembodiment is a property of many sensor-based instruments that don’t themselves include the source of their own sound. Unlike with traditional instruments, where the physical action is what generates the vibrations, many sensor instruments are indirectly connected through a long chain of processes to the actuation of their sounds. Many sensor instruments have no resonant cavity and depend upon synthesizers and speakers to convey their sounds; the sounds come out of a set of speakers that might be physically far removed from the actions that generate them. A second component of the disembodiment problem originates from the mapping layer that separates the transduction and actuation processes of interactive music systems. Mappings allow for any sound to be mapped to any input arbitrarily, and the extreme freedom and range of possibility makes it hard to construct mappings that look and sound "real" to an audience.

It is still not well understood how to construct mappings such that they intuitively map well to an action; this is because interactive music is still an extremely new art form. Instruments like the Digital Baton were extremely sensitive to the mapping problem; I think that this was because of certain properties of its sensory system -- the 2D space in front of the performer was interpreted as a large grid, divided into an arbitrary number of cells that acted as triggers. This method worked well algorithmically, but frequently confused audiences because they could not see the virtual grid in front of the performer and were not able to connect the actions of the performer with the musical responses that they heard. It might be said that this causes alienation -- that is, the nature of the instrument made it especially difficult to construct mappings that sounded "embodied."
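The grid scheme described above can be sketched as follows. This is a hypothetical reconstruction for illustration, not the Digital Baton’s actual code: the normalized coordinates, the 4-by-3 grid, and the names `cell_of` and `grid_triggers` are all assumptions.

```python
def cell_of(x, y, cols, rows):
    """Map a normalized position (0 <= x, y <= 1) to a (col, row) grid cell."""
    # Clamp so that a position exactly on the far edge stays in the last cell.
    col = min(int(x * cols), cols - 1)
    row = min(int(y * rows), rows - 1)
    return col, row


def grid_triggers(positions, cols=4, rows=3):
    """Return the cells that fire: one event each time a new cell is entered."""
    events = []
    current = None
    for x, y in positions:
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            continue  # ignore samples outside the tracked area
        cell = cell_of(x, y, cols, rows)
        if cell != current:
            events.append(cell)
            current = cell
    return events
```

In a trace such as `[(0.20, 0.10), (0.26, 0.10)]` a movement of only 0.06 crosses a cell boundary and fires a new event, while a much larger movement inside one cell fires nothing. An audience that cannot see the boundaries has no way to predict which motions matter, which is consistent with the confusion described above.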

The problem of disembodiment resembles the situation of conducting, where the performer gestures silently and an external source generates the sound. Therefore, it might be said that conducting is, by definition, disembodied, and provides a useful model for sensor-based systems. For this reason it was decided not to add a local source of auditory and tactile feedback to the jacket. Instead, the Conductor’s Jacket project attempted to address this issue by bringing the sensors closer to the source of the physical gestures, namely in the performer’s physiology.

The disembodiment problem can also cause audiences to become detached, particularly if they can’t identify how the performer is controlling the sounds. David Rothenberg described this property as the fault of tape music when he wrote:

"One answer is to combine live performance with computer performance, but as we have noted, the computer and synthesizer always initiate various levels of distance between performer and sound which create barricades to convincing and engaging performances. There is less to observe as the player plays -- less danger, less physical stress in the playing. Listeners soon learn this, so it is up to performers of electronic and computer-controlled instruments to develop new ways of making their performance more convincing and engaging."

Morton Subotnick continued this idea by suggesting that the primary reason to have live performances is for the identification that an audience has with a soloist on stage. In order for the audience to make that identification, Subotnick stressed, it was very important to create a clear relationship between the soloist’s gestures and the sound:

"The soloist is the carrier of the information, and it isn’t the music. The music is there for him or her, and the audience is witnessing that person. If you don’t do that, then you are missing a great deal -- the reason for that person to be on the stage...it is terribly important that we identify with the soloist on the stage. And you can’t identify if you don’t know what the player is controlling. If you focus on the player, that player has to have something that is appropriate to what they are doing. Otherwise, you have no reason to give that material to them."

Another composer, Atau Tanaka, suggests that the ambiguity of the relationship between gesture and sound is an aspect that can be used creatively to generate dramatic tension:

"The audience must distinguish who is playing what. At some moments it is clear, and there are other moments where it is unclear. We can play with this ambiguity. It is a kind of natural reaction on the part of the audience to try to make a connection between the physical gesture they see and what they hear. However, to do so is difficult, because these sounds are unknown. These are abstract computer-generated sounds, whereas with acoustic ensemble music there is always some prior knowledge of how the individual instruments sound."

I feel that it is a problem if the audience does not understand what is going on in the performance and expresses confusion. Disembodiment is not a property to be cultivated in a musical instrument. But methods for removing the disembodiment problem remain elusive. Chris Van Raalte, who built and performs with the BodySynth, called this the ‘get-it factor’ and modestly admitted that he does not yet know how to achieve it. Bean also wrote about this problem in a recent article about new electronic instruments:

"Making the process of creation and the resulting music compelling enough to bring out of the studio and onto the stage is certainly a challenge. Even more difficult is communicating to an audience the causal relationship between subtle physical movements and sound so that people can comprehend the performance."

8.1.2 Mimesis

Another issue with the design of sensor-based instruments is that of Mimesis, or the degree to which the new instrument resembles or behaves like a traditional instrument. For example, electric guitars and MIDI keyboards have a high degree of Mimesis; they closely resemble their acoustic predecessors. Arguably, this is why they have been much more commercially successful than more recent instruments such as the Miburi and BodySynth. There is a performance tradition from the traditional guitar that carries over nicely to the electric guitar, whereas there is no strong model for what to do with a Miburi that can be adopted from the past. The entire set of expectations for how to use it and how it should sound has to be invented.

Using traditional models helps us make choices about the limits of the technology, but it also constrains how we think about the instrument. John Cage criticized this mimetic adherence to old models:

"Most inventors of electrical musical instruments have attempted to imitate eighteenth- and nineteenth-century instruments, just as early automobile designers copied the carriage. The Novachord and the Solovox are examples of this desire to imitate the past rather than construct the future. When Theremin provided an instrument with genuinely new possibilities, Thereministes did their utmost to make the instrument sound like some old instrument, giving it a sickeningly sweet vibrato, and performing upon it, with difficulty, masterpieces from the past. Although the instrument is capable of a wide variety of sound qualities, obtained by the turning of a dial, Thereministes act as censors, giving the public those sounds they think the public will like. We are shielded from new sound experiences."

I agree with Cage that new instruments have a new set of unique possibilities that can be damped by relying too heavily on re-creating and serving the expectations of the past. I also think that musical conducting devices should not always try to copy conducting in a literal way. A question that I received recently reflected this concern:

"Are you ever concerned that by orienting yourself towards the existing musical expressive vocabulary of conventional performers (i.e. conductors), and designing interfaces to respond to this vocabulary, you're missing the chance to use new musical interfaces to expand the musical vocabulary? That you're trying to mock up a virtual model of what classical musicians do, rather than exploring what's appropriate for the new tools of computer-generated music?"

My response to this question is that the primary reason to do the Conductor’s Jacket was to study existing musical vocabularies in order to better understand how to build new systems. The premise is that if we can understand how people have historically expressed themselves (by inventing and using gesture-systems such as conducting), then we can build technological tools that extend that expressivity. The new electronic systems don’t have to mime conducting, particularly if the performer is not gesturing in front of 50-100 musicians. Conducting is just one example of an expressive gesture-language; the point of the Conductor’s Jacket project has not been to focus on an old tradition for the sake of replicating it, but rather to leverage it toward a new gesture-language which uses some of our intuitive (or possibly innate) methods of expression. So a Mimetic system is not the optimal end result of this doctoral project; instead, perhaps a perfect final system would be one that transforms and updates the idea of conducting.

8.1.3 Traditional vs. Digital Instruments

The biggest difference between traditional instruments and digital instruments is in what Barry Schrader called the action/response mechanism. With traditional instruments, the action/response relationship is very clear; the musician makes a gesture and the sound is affected in some way. The relationship is based on the rules of physics, which we may not understand cognitively but we have assimilated intuitively. As Schrader writes, "the art of ‘playing’ an instrument is that of creating a series of meaningful action/response associations." How these associations become meaningful remains mysterious, but we might assume that their constancy and repetition over time solidifies the shared expectations of the artists and the audience, much in the manner of operant conditioning.

With digital instruments, those relationships are not so clear. In fact, it is often difficult to design a situation such that the relationships are clear. It takes a lot of knowledge of the intuitive expectations of the audience, as well as the understanding and skill of the performer, in order to make those relationships work. Also, the art form is so new that no one quite knows what the rules are, or what to expect. Joel Ryan suggests that the physicality of the performance interface often helps in this process; he describes how the affordances of the object help stimulate the imagination about how it might be used digitally:

"The physicality of the performance interface helps give definition to the modeling process itself. The physical relation to a model stimulates the imagination and enables the elaboration of the model using spatial and physical metaphors. The image with which the artist works to realize his or her idea is no longer a phantom, it can be touched, navigated, and negotiated with. In some cases it may turn out that having physical ‘handles’ in the modeling process is of even more value than in performance."

The ultimate test of whether or not we get these mappings ‘right’ will be when a performer is able to think idiomatically with an instrument "the way a pianist thinks with a keyboard, or a guitarist thinks with a guitar neck."

8.1.4 Distinctions between Musical and Physical Gesture

"the gestures which music embodies are, after all, invisible gestures; one may almost define them as consisting of movement in the abstract, movement which exists in time but not in space, movement, in fact, which gives time its meaning and its significance for us."

John Harbison once commented to me that musical gesture is not necessarily the same as physical gesture; that the two types of gesture may be related in the performance of music, but that relationship may not necessarily be quantifiable. But given that musical gestures can be conveyed from conductor to musicians via physical gestures, it must be possible for musical gesture to be communicated. The relationship between the two quantities is probably not one-to-one, but rather encoded in the substructure and pulse structure of the gesture. Denis Smalley makes a complex argument that musical gesture and physical gesture are linked through the apprehension and expectation of people who naturally associate emotions with certain energy-motion trajectories. In many ways, his ideas resemble Clynes’ Sentic curves. He writes:

"Traditionally, musical gesture involves a human agent who, sometimes using mediatory implements, acts physically on sounding bodies by fingering, plucking, hitting, scraping and blowing. These gesture-types harness energy and motion through time: a controlled, physical motion at a varying, energetic rate results in the excitation of a sounding body and the shaping of a spectromorphology. Everyone has daily experience of gestural activity and is aware of the types of consequences of the energy-motion trajectory. Gestural activity is not only concerned with object play or object use but also enters into human relationships: it is a gesture that wields the ax and it is a gesture that expresses the intimate caress. The energy-field can vary between extremes of force and gentleness, and in the temporal domain can be very sudden of motion, or evolve more slowly. Broadly defined, human gesture is concerned with movement of the body and limbs for a wide variety of practical and expressive reasons; it is bound up with proprioceptive (kinesthetic) perception of body tensions and therefore with effort and resistance. However, the indicative field does not stop with a physical act since tension and resistance also concern emotional and psychological experiences. Thus in music there is a link between the energy-motion trajectory and the psychological apprehension of sounding contexts even when physical gesture is not present."
