Chapter 8: CONCLUSIONS
 
"In music, the instrument often predates the expression
it authorizes."
Finally, I will conclude by discussing several issues that
were encountered in the course of this work. They include general issues
raised in the design of sensor-based interfaces and musical instruments,
comparisons between traditional and digital forms, and distinctions between
musical and physical gesture. Then I demonstrate the implications of this
doctoral project for other work, present a framework for future research
in this area, and hypothesize on the future of musical performances.
8.1 Design Issues for Sensor Instruments
My experiments during this doctoral project have taught
me many lessons about designing sensor-based instruments. Some general
categories have emerged, including repeatability, depth, emphasis, constraints,
and Cartesian reliance. These can be thought of as criteria for the success
of a new instrument; to the extent that the Conductor’s Jacket may
be successful, it can be said to have maximal repeatability, depth, and
emphasis, and minimal constraints and Cartesian reliance. These are described
below, followed by larger discussions of problems with the properties of
disembodiment and mimesis.
- Repeatability is the property of an instrument that
makes it deterministic; a specific action should yield a specific result,
and on repetition it should yield the same sound. In order for skilled musicians
to perform on sensor-based instruments, those instruments must have the property of
repeatability.
- Depth is the property that makes an instrument sophisticated
enough for someone to become skillful with it; it is the presence of a
rich set of interconnected mappings. Depth in an instrument means that
the longer one works at mastering the instrument, the more beautiful and
pleasing the result becomes. Depth is not achieved by just increasing the
number of degrees of freedom; it also involves the careful construction
of those degrees of freedom such that the result is meaningful.
"The most important requirements are immediacy and depth
of control, coupled with the greatest flexibility possible; immediacy and
depth because the user needs feedback on his action, flexibility because
the user wants a predictable but complex response."
- Emphasis is the property of an instrument that reflects
the amount of effort and intensity with which the performer works to create
the sound. Most sensor-based instruments fall very short of this property
because their sensing mechanisms are not designed to gather this quantity
and the synthesis models do not reflect this parameter. As Joel Ryan wrote,
"Physical effort is a characteristic of the playing
of all musical instruments. Though traditional instruments have been greatly
refined over the centuries, the main motivation has been to increase ranges,
accuracy, and subtlety of sound and not to minimize the physical. Effort
is closely related to expression in the playing of traditional instruments.
It is the element of energy and desire, of attraction and repulsion in
the movement of music. But effort is just as important in the formal construction
of music as in its expression: effort maps complex territories onto the
simple grid of pitch and harmony. And it is upon such territories that
much of modern musical invention is founded."
 
Watanabe and Yachida reflect this idea about emphasis using
the word degree:
"Methods such as DP matching, HMMs, neural networks,
and finite state machines put emphasis on classifying a kind of gesture,
and they cannot obtain the degree information of gesture, such as speed,
magnitude and so on. The degree information often represents user’s attitude,
emotion and so on, which play an important role in communication. Therefore
the interactive systems should also recognize not only the kind of gesture
but also the degree information of gesture."
- The Constraints of an instrument are the aspects of
its design that force the human performer to gesture or pose in a particular
way. Some sensor-based instruments have the problem of being overly constrained;
the nature of their sensing mechanisms forces the performer to make contorted,
unnatural gestures to achieve particular effects. All instruments by their
very natures will impose some constraints on their performers, but in order
for the instrument to be successful the constraints have to limit the physical
movement in such a way as to not hamper the expressive capabilities of
the performer.
- Cartesian Reliance is the property of an instrument
whereby the different degrees of freedom are set up in a literal, Cartesian
coordinate space for the performer. An example of such a device would be
a mixing board, where the individual levers each move in one direction
and map to one quantity. My strong feeling is that for a musical instrument
to be usable by a performer, it should not rely too heavily on the Cartesian
paradigm. That is, humans do not naturally perceive the world in Cartesian
terms; our intuitive ways of gesturing are often invariant to aspects such
as translation, rotation, and scale. In the design of interactive systems
for people, "the important thing is to not leave motions in Cartesian terms."
Certainly the expressive information in conducting is not contained in
the absolute positional information, and this doctoral project has shown
that quantities such as muscle tension can sometimes be more useful than
the acceleration information gathered from the position of the moving
arm. The common reliance on orthogonal axes doesn’t map well to meaningful,
natural musical responses. For example, in most traditional music, pitch
and loudness are coupled -- they can’t just be manipulated as independent
parameters. Unlike car controls, there is no inherent meaning in increasing
a single value independently. While the Theremin got away with it, I think
orthogonal relationships are just too simplistic and arbitrary; they make
the engineering easier, but force the musician to conform to an unnatural
structure.
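The invariance mentioned above can be illustrated with a minimal Python sketch. It is not part of the Conductor’s Jacket software; the function names, the 2D point-list input, and the choice of descriptor (relative segment lengths and turning angles) are all assumptions made for illustration. The idea is only that a gesture described this way yields the same values wherever it is performed, in whatever orientation, and at whatever size.

```python
import math


def invariant_descriptor(points):
    """Describe a 2D gesture path by quantities that do not change under
    translation, rotation, or uniform scaling of the whole path.

    Returns (relative_lengths, turning_angles):
      relative_lengths: each segment length divided by the total path length
      turning_angles:   signed angle in radians between consecutive segments
    """
    # Differences between successive points discard absolute position
    # (translation invariance).
    segs = [(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]
    lengths = [math.hypot(dx, dy) for dx, dy in segs]
    total = sum(lengths)
    if total == 0:
        raise ValueError("gesture has no movement")

    # Dividing by the total length discards overall size (scale invariance).
    relative_lengths = [length / total for length in lengths]

    # Angles between successive segments discard overall orientation
    # (rotation invariance).
    turning_angles = []
    for (ax, ay), (bx, by) in zip(segs, segs[1:]):
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        turning_angles.append(math.atan2(cross, dot))
    return relative_lengths, turning_angles


def transform(points, dx, dy, angle, scale):
    """Rotate, scale, then translate a path (hypothetical helper for checking)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for x, y in points]
```

Passing a path and a moved, rotated, and enlarged copy of it through `invariant_descriptor` gives the same result up to floating-point error, whereas the raw Cartesian coordinates of the two paths differ at every point.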
8.1.1 The Disembodiment Problem
Disembodiment is a property of many sensor-based
instruments that don’t themselves include the source of their own sound.
Unlike with traditional instruments, where the physical action is what
generates the vibrations, many sensor instruments are indirectly connected
through a long chain of processes to the actuation of their sounds. Many
sensor instruments have no resonant cavity and depend upon synthesizers
and speakers to convey their sounds; the sounds come out of a set of speakers
that might be physically far removed from the actions that generate them.
A second component of the disembodiment problem originates from the mapping
layer that separates the transduction and actuation processes of interactive
music systems. Mappings allow for any sound to be mapped to any input arbitrarily,
and the extreme freedom and range of possibility makes it hard to construct
mappings that look and sound "real" to an audience.
It is still not well understood how to construct mappings such that
they intuitively map well to an action; this is because interactive music
is still an extremely new art form. Instruments like the Digital Baton
were extremely sensitive to the mapping problem; I think that this was
because of certain properties of its sensory system -- the 2D space in
front of the performer was interpreted as a large grid, divided into an
arbitrary number of cells that acted as triggers. This method worked well
algorithmically, but frequently confused audiences because they could not
see the virtual grid in front of the performer and were not able to connect
the actions of the performer with the musical responses that they heard.
It might be said that this causes alienation -- that is, the nature of
the instrument made it especially difficult to construct mappings that
sounded "embodied."
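A minimal Python sketch of such a grid of trigger cells may make the difficulty concrete. It is not the Digital Baton’s actual code; the normalized coordinates, the grid dimensions, and the rule that a trigger fires on entry into a new cell are assumptions for illustration.

```python
def cell_of(x, y, cols, rows):
    """Map a normalized position (x and y between 0 and 1) to a (col, row) cell."""
    # Clamp out-of-range positions to the edge of the sensing area.
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    # A position exactly on the far edge belongs to the last cell.
    col = min(int(x * cols), cols - 1)
    row = min(int(y * rows), rows - 1)
    return col, row


def triggers(path, cols=4, rows=3):
    """Return the cell entered each time the position crosses into a new cell."""
    events = []
    previous = None
    for x, y in path:
        cell = cell_of(x, y, cols, rows)
        if cell != previous:
            events.append(cell)  # entering a new cell fires its trigger
            previous = cell
    return events
```

With four columns, the horizontal positions 0.24 and 0.26 fall in different cells, so a nearly imperceptible movement can fire a trigger while a larger movement inside a single cell fires none. An audience that cannot see the cell boundaries has little chance of connecting the two, which is consistent with the confusion described above.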
The problem of disembodiment resembles the situation of conducting,
where the performer gestures silently and an external source generates
the sound. Therefore, it might be said that conducting is, by definition,
disembodied, and provides a useful model for sensor-based systems. For
this reason it was decided not to add a local source of auditory and tactile
feedback to the jacket. Instead, the Conductor’s Jacket project attempted
to address this issue by bringing the sensors closer to the source of the
physical gestures, namely in the performer’s physiology.
The disembodiment problem can also cause audiences to become detached,
particularly if they can’t identify how the performer is controlling the
sounds. David Rothenberg described this property as the fault of tape music
when he wrote:
"One answer is to combine live performance with computer
performance, but as we have noted, the computer and synthesizer always
initiate various levels of distance between performer and sound which create
barricades to convincing and engaging performances. There is less to observe
as the player plays -- less danger, less physical stress in the playing.
Listeners soon learn this, so it is up to performers of electronic and
computer-controlled instruments to develop new ways of making their performance
more convincing and engaging."
Morton Subotnick continued this idea by suggesting that the
primary reason to have live performances is for the identification that
an audience has with a soloist on stage. In order for the audience to make
that identification, Subotnick stressed, it was very important to create
a clear relationship between the soloist’s gestures and the sound:
"The soloist is the carrier of the information, and
it isn’t the music. The music is there for him or her, and the audience
is witnessing that person. If you don’t do that, then you are missing a
great deal -- the reason for that person to be on the stage...it is terribly
important that we identify with the soloist on the stage. And you can’t
identify if you don’t know what the player is controlling. If you focus
on the player, that player has to have something that is appropriate to
what they are doing. Otherwise, you have no reason to give that material
to them."
Another composer, Atau Tanaka, suggests that the ambiguity
of the relationship between gesture and sound is an aspect that can be
used creatively to generate dramatic tension:
"The audience must distinguish who is playing what.
At some moments it is clear, and there are other moments where it is unclear.
We can play with this ambiguity. It is a kind of natural reaction on the
part of the audience to try to make a connection between the physical gesture
they see and what they hear. However, to do so is difficult, because these
sounds are unknown. These are abstract computer-generated sounds, whereas
with acoustic ensemble music there is always some prior knowledge of how
the individual instruments sound."
I feel that it is a problem if the audience does not understand
what is going on in the performance and expresses confusion. Disembodiment
is not a property to be cultivated in a musical instrument. But
methods for removing the disembodiment problem remain elusive. Chris
Van Raalte, who built and performs with the BodySynth, called this
the ‘get-it factor’ and modestly admitted that he does not yet know how
to achieve it. Bean also wrote about this problem in a recent article about
new electronic instruments:
"Making the process of creation and the resulting music
compelling enough to bring out of the studio and onto the stage is certainly
a challenge. Even more difficult is communicating to an audience the causal
relationship between subtle physical movements and sound so that people
can comprehend the performance."
8.1.2 Mimesis
Another issue with the design of sensor-based instruments
is that of Mimesis, or the degree to which the new instrument resembles
or behaves like a traditional instrument. For example, electric guitars
and MIDI keyboards have a high degree of Mimesis; they closely resemble
their acoustic predecessors. Arguably, this is why they have been much
more commercially successful than the more recent examples such as the
Miburi and BodySynth. There is a performance tradition from
the traditional guitar that carries over nicely to the electric guitar,
whereas there is no strong model for what to do with a Miburi that can
be adopted from the past. The entire set of expectations for how to use
it and how it should sound has to be invented.
Using traditional models helps us make choices about the
limits of the technology, but also constrains how we think about the instrument.
John Cage criticized this mimetic adherence to old models:
"Most inventors of electrical musical instruments have
attempted to imitate eighteenth- and nineteenth-century instruments, just
as early automobile designers copied the carriage. The Novachord and the
Solovox are examples of this desire to imitate the past rather than construct
the future. When Theremin provided an instrument with genuinely new possibilities,
Thereministes did their utmost to make the instrument sound like some old
instrument, giving it a sickeningly sweet vibrato, and performing upon
it, with difficulty, masterpieces from the past. Although the instrument
is capable of a wide variety of sound qualities, obtained by the turning
of a dial, Thereministes act as censors, giving the public those sounds
they think the public will like. We are shielded from new sound experiences."
I agree with Cage that new instruments have a new set of
unique possibilities that can be damped by relying too heavily on re-creating
and serving the expectations of the past. I also think that the properties
of musical conducting devices should not always try to copy conducting
in a literal way. A question that I received recently reflected this concern:
Are you ever concerned that by orienting yourself towards
the existing musical expressive vocabulary of conventional performers (i.e.
conductors), and designing interfaces to respond to this vocabulary, you're
missing the chance to use new musical interfaces to expand the musical
vocabulary? That you're trying to mock up a virtual model of what classical
musicians do, rather than exploring what's appropriate for the new tools
of computer-generated music?
My response to this question is that the primary reason to
do the Conductor’s Jacket was to study existing musical vocabularies
in order to better understand how to build new systems. The premise is
that if we can understand how people have historically expressed themselves
(by inventing and using gesture-systems such as conducting), then we can
build technological tools that extend that expressivity. The new electronic
systems don’t have to mime conducting, particularly if the performer is
not gesturing in front of 50-100 musicians. Conducting is just one example
of an expressive gesture-language; the point of the Conductor’s Jacket
project has not been to focus on an old tradition for the sake of replicating
it, but rather to build on it toward a new gesture-language which uses
some of our intuitive (or possibly innate) methods of expression. So a
Mimetic system is not the optimal end result of this doctoral project;
instead, perhaps a perfect final system would be one that transforms and
updates the idea of conducting.
8.1.3 Traditional vs. Digital Instruments
The biggest difference between traditional instruments
and digital instruments is in what Barry Schrader called the action/response
mechanism. With traditional instruments, the action/response relationship
is very clear; the musician makes a gesture and the sound is affected in
some way. The relationship is based on the rules of physics, which we may
not understand cognitively but we have assimilated intuitively. As Schrader
writes, "the art of ‘playing’ an instrument is that of creating a series
of meaningful action/response associations." How these associations become
meaningful remains mysterious, but we might assume that their constancy
and repetition over time solidifies the shared expectations of the artists
and the audience, much in the manner of operant conditioning.
With digital instruments, those relationships are not
so clear. In fact, it is often difficult to design a situation such that
the relationships are clear. It takes a lot of knowledge of the
intuitive expectations of the audience, as well as the understanding and
skill of the performer, in order to make those relationships work.
Also, the art form is so new that no one quite knows what the rules are,
or what to expect. Joel Ryan suggests that the physicality of the performance
interface often helps in this process; he describes how the affordances
of the object help stimulate the imagination about how it might be used
digitally:
"The physicality of the performance interface helps
give definition to the modeling process itself. The physical relation to
a model stimulates the imagination and enables the elaboration of the model
using spatial and physical metaphors. The image with which the artist works
to realize his or her idea is no longer a phantom, it can be touched, navigated,
and negotiated with. In some cases it may turn out that having physical
‘handles’ in the modeling process is of even more value than in performance."
The ultimate test of whether or not we get these mappings
‘right’ will be when a performer is able to think idiomatically
with an instrument "the way a pianist thinks with a keyboard, or a guitarist
thinks with a guitar neck."
8.1.4 Distinctions between Musical and Physical Gesture
"the gestures which music embodies are, after all, invisible
gestures; one may almost define them as consisting of movement in the abstract,
movement which exists in time but not in space, movement, in fact, which
gives time its meaning and its significance for us."
John Harbison once commented to me that musical gesture is
not necessarily the same as physical gesture; that the two types of gesture
may be related in the performance of music, but that relationship may not
necessarily be quantifiable. But given that musical gestures can be conveyed
from conductor to musicians via physical gestures, then, it must be possible
that musical gesture can be communicated. The relationship between the
two quantities is probably not one-to-one, but rather encoded in the substructure
and pulse structure of the gesture. Denis Smalley makes a complex argument
that musical gesture and physical gesture are linked through the apprehension
and expectation of people who naturally associate emotions with certain
energy-motion trajectories. In many ways, his ideas resemble Clynes’ Sentic
curves. He writes:
 
"Traditionally, musical gesture involves a human agent
who, sometimes using mediatory implements, acts physically on sounding
bodies by fingering, plucking, hitting, scraping and blowing. These gesture-types
harness energy and motion through time: a controlled, physical motion at
a varying, energetic rate results in the excitation of a sounding body
and the shaping of a spectromorphology. Everyone has daily experience of
gestural activity and is aware of the types of consequences of the energy-motion
trajectory. Gestural activity is not only concerned with object play
or object use but also enters into human relationships: it is a gesture
that wields the ax and it is a gesture that expresses the intimate caress.
The energy-field can vary between extremes of force and gentleness, and
in the temporal domain can be very sudden of motion, or evolve more slowly.
Broadly defined, human gesture is concerned with movement of the body and
limbs for a wide variety of practical and expressive reasons; it is bound
up with proprioceptive (kinesthetic) perception of body tensions and therefore
with effort and resistance. However, the indicative field does not stop
with a physical act since tension and resistance also concern emotional
and psychological experiences. Thus in music there is a link between the
energy-motion trajectory and the psychological apprehension
of sounding contexts even when physical gesture is not present."
 
 