1.3 Motivation
"While the human hand is well-suited for multidimensional
control due to its detailed articulation, most gestural interfaces do not
exploit this capability due to a lack of understanding of the way humans
produce their gestures and what meaning can be inferred from these gestures."
The strongest motivation for me to begin this project was
the enormous difficulty I encountered in previous projects when attempting
to map gestures to sounds. This was particularly true with my Digital
Baton project, which I will discuss in detail in section 1.3.1. Secondly,
a glaring lack of empirical data motivated me to gather some for myself.
A visit to Professor Rosalind Picard in 1996 yielded some new ideas about
how to go about designing a data collection experiment for conductors,
which we eventually implemented in the Conductor’s Jacket project.
As far as I know, there have been no other quantitative studies of conductors
and gesture. Even in other studies of gesture I have not come across the
kind of complex, multidimensional data that were required to describe conducting.
Ultimately, I came to the realization that many music researchers were
going about solving the problems in the wrong way; they were designing
mappings for gestural interaction without really knowing what would map
most closely to the perceptions of the performer and audience. I felt that
the right method would be to study conductors in their real working environments,
changing nothing about the situation and monitoring the phenomena
with sensors. This empirical approach informed the entire process of the
thesis project.
In this section I also discuss my major influences in
Tod Machover’s Hyperinstruments and Brain Opera projects.
It was through my opportunities to participate in the research, performance,
and public education aspects of these projects that I was able to make
many of the observations that I express in this thesis. After describing
aspects of the Brain Opera and the Digital Baton, I go on
to explain why I have chosen conducting as the model to study, and why,
in some ways, it is a bad example. Finally, I discuss the higher-level
aspects of musicianship, the interpretive trajectories that performers
take through musical scores, and the rules and expectations that determine
a musician's skill and expressiveness.
1.3.1 Hyperinstruments, the Brain
Opera, and the Digital Baton
In 1987 at the MIT Media Lab, Professor Tod
Machover and his students began to bring ideas and techniques from interactive
music closer to the classical performing arts traditions with the Hyperinstruments
project. About his research, Machover wrote:
"Enhanced human expressivity is the most important goal
of any technological research in the arts. To achieve this, it is necessary
to augment the sophistication of the particular tools available to the
artist. These tools must transcend the traditional limits of amplifying
human gestuality, and become stimulants and facilitators to the creative
process itself."
Among the more popular and enduring of the resultant family
of hyperinstruments have been the Hyperviolin, the Hypercello,
and the Sensor Chair, all of which were designed for expert and
practiced performers. For its time, the Hypercello was among the
most complex of real-time digital interfaces; it measured and responded
to five different continuous parameters: bow pressure, bow position, bow
placement (distance from bridge), bow wrist orientation, and finger position
on the strings.
In 1994, Tod Machover began developing the Brain Opera,
perhaps the largest cutting-edge, multidisciplinary performance project
ever attempted. A digital performance art piece in three parts that invited
audiences to become active participants in the creative process, it premiered
at Lincoln Center’s Summer Festival in July of 1996 and subsequently embarked
on a world tour. During the following two years it was presented nearly
180 times in major venues on four continents. I’m proud to have been a
member of the development and performance teams, and think that our most
important collective contributions were the new instrument systems we developed.
In all, seven physical devices were built: the Sensor Chair, Digital
Baton, Gesture Wall, Rhythm Tree, Harmonic Driving, Singing/Speaking Trees,
and Melody Easel. Those of us who were fortunate enough to have
the opportunity to tour with the Brain Opera also had a chance to
observe people interacting with these instruments, and got a sense for
how our designs were received and used by the public.
My primary contribution to the Brain Opera was
the Digital Baton, a hand-held gestural interface that was designed
to be wielded like a traditional conducting baton by practiced performers.
It was a ten-ounce molded polyurethane device that incorporated eleven
sensory degrees of freedom: 3 degrees of position, 3 orthogonal degrees
of acceleration, and 5 points of pressure. The many sensors were extremely
robust and durable, particularly the infrared position tracking system
that worked under a variety of stage lighting conditions. First suggested
by Tod Machover, the Digital Baton was designed by me and built
by Professor Joseph Paradiso; it also benefited from the collaborative
input of Maggie Orth, Chris Verplaetse, Pete Rice, and Patrick Pelletier.
Tod Machover wrote two pieces of original music for it and we performed
them in a concert of his music in London’s South Bank Centre in March of
1996. Later, Professor Machover incorporated the Baton into the Brain
Opera performance system, where it was used to trigger and shape multiple
layers of sound in the live, interactive show. Having designed and contributed
to the construction of the instrument, I also wielded it in nearly all
of the live Brain Opera performances.
Figure 1. The Digital Baton, February 1996.
Despite the high hopes I had for the Digital Baton
and the great deal of attention that it received, it ultimately
failed to match my expectations. Perhaps because I had helped
to design the device and its software mappings and then had the opportunity
to perform with it, I became acutely aware of its shortcomings. From my
experience, its biggest problems were:
-
The baton’s size and heaviness were not conducive to
graceful, comfortable gestures; it was 5-10 times the weight of a normal
cork-and-balsa wood conducting baton. A typical 45-minute gestural Brain
Opera performance with the 10-ounce Digital Baton was often
exhausting. This also meant that I couldn’t take it to orchestral conductors
to try it out; it was too heavy for a conductor to use in place of a traditional
baton.
-
Its shape, designed to conform to the inside of my palm,
caused the wrist to grip in a fixed position. While this made it less likely
that I might lose contact with and drop it (particularly when individual
fingers were raised), it was not ideal for the individual, ‘digital’ use
of the fingers.
-
Its accelerational data was problematic, since the accelerometers’
signal strength decreased nonlinearly as they rotated off-axis from gravity.
Theoretically, with enough filtering/processing, beats can be extracted
from that information, but I had trouble recognizing them reliably enough
to use them for music. This was disappointing, since accelerometers seemed
very promising at the outset of the project.
-
I initially thought that the Digital Baton’s musical
software system should capture and map gestures into sound in the way that
an orchestra might interpret the movements of a conductor; this turned
out to be incredibly difficult to implement. It was particularly difficult
to imagine how to map the positional information to anything useful other
than fixed two-dimensional grids. I realized then that I did not have any
insight into how conducting gestures actually communicated information.
-
My simple models did not allow me to extract symbolic
or significant events from continuous signals. The event models I had for
the baton were too simple to be useful; they needed to use higher-order,
nonlinear models.
-
When the audience perceives a significant, expressive
event in the performer’s gestures, they expect to hear an appropriate response.
If it doesn’t occur, it confuses them. This causes a disembodiment
problem.23 In performances with the baton, it often wasn’t
obvious to audiences how the baton was controlling the sound.
-
The Digital Baton also suffered from the over-constrained
gesture problem; brittle recognition algorithms sometimes forced performers
to make exaggerated gestures in order to achieve a desired musical effect.
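The accelerometer problem in the list above can be illustrated with a minimal sketch. It assumes an ideal single-axis accelerometer whose static output is the projection of gravity onto its sensing axis (g cos θ), and it uses a simple moving-average baseline with peak picking as a stand-in for "filtering/processing". The function names, window size, threshold, and sample values are all hypothetical; this is not the signal processing that the Digital Baton actually used.

```python
import math

G = 9.81  # standard gravity in m/s^2


def gravity_reading(tilt_deg):
    """Static output of an idealized single-axis accelerometer (an assumption)
    tilted tilt_deg away from the direction of gravity."""
    return G * math.cos(math.radians(tilt_deg))


def detect_beats(samples, window=5, threshold=2.0):
    """Return indices of local peaks that rise more than `threshold`
    above a centered moving-average baseline of width `window`."""
    half = window // 2
    residual = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        baseline = sum(samples[lo:hi]) / (hi - lo)
        residual.append(samples[i] - baseline)

    beats = []
    for i in range(1, len(residual) - 1):
        is_peak = residual[i] >= residual[i - 1] and residual[i] > residual[i + 1]
        if is_peak and residual[i] > threshold:
            beats.append(i)
    return beats


# The gravity component does not fall off linearly with tilt:
# the first 30 degrees of tilt cost far less signal than the last 30.
readings = {angle: gravity_reading(angle) for angle in (0, 30, 60, 90)}

# Invented test signal: a constant gravity offset with two sharp "beats".
signal = [G] * 40
signal[10] += 8.0
signal[30] += 8.0
beat_indices = detect_beats(signal)
```

On this clean, invented signal the two spikes are easy to find. With a real baton, the gravity offset drifts as the hand rotates and the beats are much less sharply defined, which is consistent with the recognition difficulty described above.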
The majority of the problems I encountered with the Digital
Baton had to do with a lack of expressiveness in the mappings. At the
time I lacked insight and experience in mapping complex real-time information
to complex parametric structures. My first response to these problems was
to attempt to formulate a general theory of mappings, which resulted in
a scheme for categorizing gestures along successive layers of complexity.
This allowed for creating sophisticated, high-level action-descriptions
from a sequence of minute atoms and primitives, in much the same way that
languages are constructed out of phonemes. At the time I also thought that
defining a vocabulary of gestures, carefully constructed out of primitives
that conformed easily to the information stream coming from the sensors,
would be a first step. Ultimately, however, I realized that theorizing
about mappings would not help me solve the fundamental problems of the
Digital Baton. Instead, I decided to address these
issues through an in-depth, quantitative, signal-based approach. The resultant
project, which is detailed in this dissertation, was motivated and designed
precisely with the previous problems in mind. The Digital Baton
may have disappointed me as an instrument, but that failure generated a
better concept with more scope for exploration and answers.
1.3.2 Why continue with conducting as
a model?
"Too much media art is offered up as performance these
days without awareness of the fact that it remains ungrounded in any performance
practice."
Despite the frustrations that I encountered with the Digital
Baton, I still felt that the powerful gestural language of conducting
was an area that might yield interesting results for sensor-based interfaces.
Conducting is a gestural art form, a craft for skilled practitioners. It
resembles dance in many ways, except that it generates, rather than reflects,
the music that accompanies it. Also, without an instrument to define
and constrain the gestures, conductors are free to express themselves exactly
as they wish to, and so there is enormous variety in the gestural styles
of different individuals.
In addition, conducting is a mature form that has developed
over 250 years and has an established, documented technique. The gesture
language of conducting is understood and practiced by many musicians, and
is commonly used as a basis for evaluating the skill and artistry of conductors.
In order to be able to understand the meaning and significance of gestures,
it helps to have a shared foundation of understanding. The technique of
conducting conveniently provides such a foundation in its widely understood,
pre-existing symbol system.
One reason to use older techniques is that they
allow us to have performances by expert, talented musicians instead of
inventors; inevitably, the result is stronger. Secondly, there are many
subtle things that trained musicians do with their gestures that could
be neatly leveraged by sensor systems. As Tod Machover wrote,
"one must consider if it is easier for the person to
use the technique that they know, or perhaps examine another way to control
the musical gesture…the smart thing to do is keep with the technique that
can evolve slowly, no matter how far away the mapping goes."
I agree with Professor Machover that with the established
technique as a model, one can slowly develop and extend it with sensor-based
systems. For example, some future, hybrid form of conducting might keep
the basic vocabulary of conducting gestures, while sensing only the degree
of verticality in the conductor’s posture. Such a system might use his
posture to detect his interest and emotional connection to the musicians,
and use the information to guide a graphical response that might be projected
above the orchestra.
1.3.3 Conducting Technique
While styles can vary greatly across individuals, conductors
do share an established technique. That is, any skilled conductor is capable
of conducting any ensemble; the set of rules and expectations is roughly
consistent across all classical music ensembles. Conducting technique involves
gestures of the whole body: posture in the torso, rotations and hunching
of the shoulders, large arm gestures, delicate hand and finger movements,
and facial expressions. Conductors’ movements sometimes have the fluidity
and naturalness of master Stanislavskian actors, combined with musical
precision and score study. It is a gestalt profession; it involves all
of the faculties simultaneously, and cannot be done halfheartedly. Leonard
Bernstein once answered the question, "How does one conduct?" with the
following:
"Through his arms, face, eyes, fingers, and whatever
vibrations may flow from him. If he uses a baton, the baton itself must
be a living thing, charged with a kind of electricity, which makes it an
instrument of meaning in its tiniest movement. If he does not use a baton,
his hands must do the job with equal clarity. But baton or no baton, his
gestures must be first and always meaningful in terms of the music."
The skill level of a conductor is also easily discernible
by musicians; they evaluate individuals based on their technical ability
to convey information. The conducting pedagogue Elizabeth Green wrote
that skillful conductors have a certain ‘clarity of technique,’ and described
it in this way:
"While no two mature conductors conduct exactly alike,
there exists a basic clarity of technique that is instantly -- and universally
-- recognized. When this clarity shows in the conductor’s gestures, it
signifies that he or she has acquired a secure understanding of the principles
upon which it is founded and reasons for its existence, and that this thorough
knowledge has been accompanied by careful, regular, and dedicated practice."
The presence of a shared set of rules and expectations, most
of which are not cognitively understood or consciously analyzed by their
practitioners, is a rich, largely untapped resource for the study of emotional
and musical communication.
Another reason to stay with the model of conducting is
that conductors themselves are inherently interesting as subjects. They
represent a small minority of the musical population, and yet stand out
for the following reasons:
-
they are considered to be among the most skillful, expert,
and expressive of all musicians
-
they have to amplify their gestures in order to be easily
seen by many people
-
they have free motion of their upper body. The baton functions
merely as an interface and extension of the arm, providing an extra, elongated
limb and an extra joint with which to provide expressive effects
-
their actions influence and facilitate the higher-level functions
of music, such as tempo, dynamics, phrasing, and articulation. Their efforts
are not expended in the playing of notes, but in the shaping of them.
-
conductors are trained to imagine sounds and convey them
ahead of time in gestures.
-
conductors have to manipulate reality; they purposefully
(if not self-consciously) modulate the apparent viscosity of the air around
them in order to communicate expressive effects. Two gestures might have
the same trajectory and same velocity, but different apparent frictions,
which give extremely different impressions.
Conducting itself is also interesting as a method for broadcasting
and communicating information in real-time. It is an optimized language
of signals, and in that sense is almost unique. Its closest analogues are
sign and semaphore languages, and mime. John Eliot Gardiner, the well-known
British conductor, describes it in electrical terms:
"the word ‘conductor’ is very significant because
the idea of a current being actually passed from one sphere to another,
from one element to another is very important and very much part of the
conductor’s skill and craft."
Finally, conducting as a human behavior has almost never
been studied quantitatively, and so I wanted to use empirical methods to
understand it and push it in new directions.
1.3.4 Why conducting might
not be a good model for interactive music systems
Conducting is often associated with an old-fashioned,
paternalistic model of an absolute dictator who has power over a large
group of people. By the beginning of the eighteenth century when orchestras
evolved into more standard forms, this hierarchical model was generally
accepted in Western culture. But this model has come under increasing scrutiny
and disfavor with the emergence and empowerment of the individual in modern
societies. The notion that conductors have a right to be elitist, arrogant,
and dictatorial no longer holds true in today’s democratic world-view.
In fact, it seems that even some of the choices that have
been made in the development of protocols and standards for electronic
music have been informed by anti-conductor sentiments. For example, the
chairman of the group that developed the General MIDI standard had this
to say about what MIDI could offer to replace the things that were lacking
in classical music:
"The old molds to be smashed tell us that music sits
in a museum behind a locked case. You are not allowed to touch it. Only
the appointed curator of the museum -- the conductor -- can show it to
you. Interactively stretching the boundaries of music interpretation is
forbidden. Nonsense! The GM standard lets you make changes to what you
hear as if you were the conductor or bandleader, or work with you to more
easily scratch-pad any musical thought."
Secondly, many interactive music systems use the solo instrument
paradigm; they are designed to be performed by one player, in much the
same way that a traditional instrumentalist might perform on her instrument.
However, the model of conducting assumes that the performer is communicating
with other people; the gesture language has evolved in order to be optimally
visible and discernible by a large ensemble. As the conductor Adrian Boult
suggested, you only need the extra appendage of the baton if the extra
leverage buys you something by allowing you to communicate more efficiently
with others. Therefore it seems unnecessary to make large, exaggerated
gestures or use a baton when much less effort could be used to get the
computer to recognize the signal.
Thirdly, many conductors spend most of their time working
to keep the musicians together and in time, which is basically a mechanical,
not an expressive, job. In that sense their primary function is that of
a musical traffic cop. Finally, traditional conductors don’t themselves
make any sound, so the image of a conductor directly creating music seems
incongruous. It causes confusion in the minds of people who expect the
gestures to be silent. As a result, it is probably not ideal to redefine
the conducting baton as a solo instrument, since the result will cause
cognitive dissonance or disconnect in the audience. An alternative to this
would be to use a sensory baton like a traditional baton but extend its
vocabulary. That is, a conducting model should be used when an ensemble
is present that needs a conductor – the conductor will continue to perform
the traditional conducting functions, without overhauling the technique.
But she would also simultaneously perform an augmented role by, for example,
sending signals to add extra sampled sounds or cue lighting changes in
time to the music.
1.3.5 Interpretive variation as the key
to emotion in music
"Notes, timbre, melody, rhythm, and other musical constructs
cannot function simply as ends in themselves. Embedded in these objects
is a more complex, indirect, powerful signal that we must train ourselves
to detect, and that will one day be the subject of an expanded notion of
music theory."
From the performer’s perspective, the thing that makes live
performances most powerfully expressive, aside from accuracy and musicianship,
is the set of real-time choices they make to create a trajectory through
the range of interpretive variation in the music. Techniques for creating
this variation involve subtle control over aspects such as timing, volume,
timbre, accents, and articulation, which are often implemented on many
levels simultaneously. Musicians intentionally apply these techniques in
the form of time-varying modulations on the structures in the music in
order to express feelings and dramatic ideas. Some of these are pre-rehearsed,
but some of them also change based on the performer’s feelings and whims
during the moment.
This idea, while supported in the recent literature of
computational musicology and musical research, is perhaps controversial.
For one thing, some might argue that there is no inherent meaning in this
variation, since musicians are not able to verbally articulate what it
is that they do. That is, because people perform these variations intuitively
and un-analytically, the variations cannot be quantified or codified. However,
it has been shown that there are rules and expectations for musical functions
like tempo and dynamics, and recent research has uncovered underlying structure
behind these variations. I describe the work of several scientists and
musicologists on this subject in Chapter 2.
Secondly, it might be countered that the dynamic range
of such variation is relatively small, compared with the scale of the piece.
For example, a very widely interpreted symphonic movement by Mahler might
only vary between 8 and 9 minutes in length. The maximum variability in
timing would reflect a ratio of 9:8 (or, inversely, 8:9).36 However, this
is perhaps an inappropriate level at which to be scrutinizing the issue
of timing variation – instead of generalizing across the macrostructure
of an entire movement, one should look for the more significant events
on the local, microstructural level. For example, rubato might be taken
at a particular point in a phrase in order to emphasize those notes, but
then the subsequent notes might accelerate to catch up to the original
tempo. Thus, on the macrostructural level, the overall timing of a highly
rubato phrase and that of a strict-tempo phrase might look the same, but on the
microstructural level they differ tremendously. Robert Rowe gave an example
of this by suggesting the comparison between two performances of a Bach
cello suite -- one with expression, and one absolutely quantized: "They
could be of exactly equal length, but the difference comes with the shaping
of phrases and other structural points. The issue is not 8 minutes or 9
minutes, but 1 second or 2 seconds at the end of a phrase."37
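The distinction between macrostructural and microstructural timing can be made concrete with a small sketch. The eight-note phrase and its durations below are invented purely for illustration, and the function names are hypothetical; only the 8-minute and 9-minute movement lengths come from the example above.

```python
def macro_ratio(performance_ms, reference_ms):
    """Ratio of total durations: the movement-level (macrostructural) view."""
    return sum(performance_ms) / sum(reference_ms)


def local_ratios(performance_ms, reference_ms):
    """Note-by-note duration ratios: the phrase-level (microstructural) view."""
    return [p / r for p, r in zip(performance_ms, reference_ms)]


# Movement-level comparison from the text: 9 minutes against 8 minutes.
movement_ratio = macro_ratio([9 * 60_000], [8 * 60_000])

# Hypothetical eight-note phrase, durations in milliseconds (invented values).
# The rubato version lingers on notes 3 and 4, then speeds up to catch up.
strict = [500] * 8
rubato = [500, 500, 600, 700, 500, 400, 400, 400]

phrase_ratio = macro_ratio(rubato, strict)   # whole-phrase comparison
note_ratios = local_ratios(rubato, strict)   # per-note comparison

print(f"movement-level ratio: {movement_ratio:.3f}")
print(f"phrase-level ratio:   {phrase_ratio:.3f}")
print(f"per-note ratios:      {note_ratios}")
```

In this invented phrase the two versions have identical total length, so a comparison of overall durations sees no difference at all, while individual notes are stretched by as much as 40 percent or shortened by 20 percent. By contrast, the full spread between the 8-minute and 9-minute movements amounts to only 12.5 percent.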
1.3.6 The Significance of Music for Us
"music is significant for us as human beings principally
because it embodies movement of a specifically human type that goes to
the roots of our being and takes shape in the inner gestures which embody
our deepest and most intimate responses. This is of itself not yet art;
it is not yet even language. But it is the material of which musical art
is made, and to which musical art gives significance."38
Having described the significance of interpretive variation
in musical structure, I must also acknowledge that, for myself, the
significance of a great performance does not lie in the microstructural
variation alone. Instead, I think that great performers are marked by their
abilities as storytellers and dramatists. Great musicians have the ability
to capture an audience’s attention and lead them spellbound through the
material.39 Of course, this is not something that could be easily
proven or discussed empirically. It might be that the dramatic aspect of
great performances could be modeled in terms of the microstructural variation,
but it’s far from clear that we could determine this. Another possibility
is that great performers hear the ratios between contrasting sections and
feel pulse differences more sensitively than others, or that the proportions
of the expressive relationships work out in fractal patterns. However,
it would be very difficult to measure this. Therefore, for practical purposes,
I chose not to study it. It’s possible that we may one day be able to explain
why one musician is masterful, and why another is merely earnest, but that
is beyond the scope of the present project.
"Music is that art form that takes a certain technique,
requires a certain logical approach, but at the same time, needs subconscious
magic to be successful. In our art form, there is a balance between logic
and intuition."40
Aside from the issue of quantifying the microstructural variations
and determining the ‘rules’ of musicality, there is another dimension to
music that must be acknowledged: the magical, deeply felt, emotional (some
might call it spiritual) aspect that touches the core of our humanity.
Many dedicated musicians believe that this aspect is not quantifiable.
I tend to agree. I also think that it is the basic reason why we as a species
have musical behaviors. And I think that our current technologies are not
yet, for the most part, able to convey this aspect.41 This is
one of their most damning flaws. However, I also think that if pieces of
wood and metal can be carefully designed and constructed so as to be good
conveyors of this magic, then there is no reason that we can’t do the same
with silicon and electrons. It just might take more time to figure out
how.