5.2 Hypotheses of Expression

5.2.1 Efficiency

"The best conducting technique is that which achieves the maximum musical result with the minimum of effort." The most basic theorem of expression has to do with the efficiency of the gesture. The physical definition of efficiency is the ratio of the energy output to the energy input of a system. In general, the more expert the performer is, the more efficient she is in the mechanics of the performance. She learns over time to expend less effort on activating individual notes or indicating regular tempos. Conversely, when indicating diversions or changes from a normal state, she becomes purposefully less efficient in the motion and tension in her gestures.

That is, in general, conductors tend to economize their gestures under normal, unchanging conditions, and add effort when conditions change. They can do this because of certain common-sense assumptions they share with the human musicians in the orchestra. For example, it is expected that a tempo will remain roughly constant during a passage of music, and so once it is established less effort is needed to indicate it. It was shown in Chapter Four that during long passages at a constant tempo, the EMG signals decreased to the point where the beats were nearly indistinguishable. That is because the musicians are operating under normal, unchanging expectations, and therefore it is appropriate to reduce the size of the gesture and become more efficient. Once they respond to a tempo signal by playing at that tempo, the maintenance of the signal is not crucial, particularly if the musicians are skilled. Other phenomena, such as the ‘flatlining’ effect, also operate according to the efficiency principle.

The efficiency principle doesn’t mean that the performance sounds metronomic, because the orchestra and conductor will also assume a certain amount of inflection in the tempo where they have a shared (if intuitive or unconscious) understanding about what is expected. These culturally acceptable, minimal inflections are part of the performance tradition of western music and therefore I would not call them expressive, but rather, musical. Expression happens in the investment of effort and the divergence from economy, efficiency, and sometimes, clarity.

One example of the efficiency theory is that signals are kinesthetically linked. For example, the movement of the arms is what generates the breath: an upward gesture is (causes) an inhalation, and a downward gesture is (causes) an exhalation. The respiration signal exhibits features that correlate with the biceps EMG, but it also contains other information. That is, the respiration signal gives a lower-frequency view of the gesture, which gives more of the overall shaping of the phrase. The biceps EMG tends to give the beat-level phenomena, which reflect more of a quantum-level view. The two signals complement each other.

5.2.2 Intentionality

A corollary of the efficiency theorem is that expression is necessarily intentional. "Expressive intention," a phrase that is often used by musicologists, is redundant. As I showed in Chapter Four, intentional signals convey far more information than non-intentional signals; that is, the features are clearer and have a more continuous envelope. The EMG signals for page turns and scratching are fuzzy and indistinct, whereas the EMG signals for beats show up in relief. Volitional, intentional signals like Respiration and EMG seem to correlate closely with musical expression, whereas Heart Rate, Temperature, and GSR don’t. It may be that this phenomenon is related to the ‘Duchenne smile,’ where the true expression of joy engages the muscles differently than a fixed, forced smile does. The conducting data from this study supports the hypothesis that when one performs an action with intention (or expression), the muscles are engaged uniquely and differently from unintended (unexpressive) actions.

5.2.3 Polyphony

Another observation that is significant for the issue of meaning in expressive music is the phenomenon of gestural polyphony. Polyphony is defined as "music that simultaneously combines several lines," and much of Western music from the ninth century onwards can be said to be polyphonic. One of the high points in the development of polyphony (before it evolved into counterpoint in the 17th century) was the 4-part mass and motet style used by Guillaume de Machaut in the 14th century. In this style, the lowest voice, called the "tenor" (from the Latin tenere, "to hold"), holds the slow-moving liturgical chant melody. The three upper voices (usually called the "countertenor," the "motetus," and the "triplum") get increasingly higher in pitch, more elaborate in melodic contour, and more rhythmically active. This four-part polyphonic structure, slowest at the bottom and most florid at the top, is analogous to the distribution of motion between the structures of the body of a conductor.

That is, while gesturing, conductors indicate different levels of rhythmic and dynamic structure with muscle activation patterns in the different limbs. In both the Conductor’s Jacket data and more recent informal observations I’ve made with the real-time system, I have found that the movements of the trapezius muscle in the shoulder seem to reflect the fundamental underlying structure, over which the other muscle groups add increasingly intricate expressive material. The level of structure at which these muscle groups respond seems to have a direct relationship with their size and distance from the torso. For example, the trapezius muscle of the shoulder seems to be activated every two or four bars, roughly on the level of the phrase, whereas the biceps muscle is active every beat, the forearm extensor muscle gives the internal structure of the notes within that beat (such as articulation and sustain), and the hand gives occasional, spiky indications of small accents and energy. It has long been known that the activation frequency of a muscle fiber is dependent upon its length (that is, bigger muscle fibers fire at lower frequencies than smaller ones), and that smaller appendages are used for quicker events (e.g., fingers are used to play notes on a piano or violin, whereas arms are used for larger, slower things), but somehow this division of labor across the major areas of the arm was not anticipated to map so directly onto frequency of event. Also, the animation researcher Ken Perlin has shown a similar phenomenon in his own work: in order to make animated movements look more natural and realistic, he adds a certain baseline frequency of noise to the largest joints and then doubles that frequency for each successive joint. [146]
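The frequency-doubling scheme attributed to Perlin above can be illustrated with a short Python sketch. It is only a toy under stated assumptions: the joint names, the 0.5 Hz baseline, and the use of a plain sinusoid in place of a real noise function are all hypothetical choices made for illustration, not details taken from Perlin's system or from the Conductor's Jacket data.

```python
import math

# Hypothetical joint chain, ordered from the largest joint (nearest the
# torso) to the smallest. These names are assumptions for illustration.
JOINTS = ["shoulder", "elbow", "wrist", "fingers"]


def joint_noise_frequencies(base_hz, joints):
    """Give each joint a frequency, doubling at every step outward."""
    return {joint: base_hz * 2 ** i for i, joint in enumerate(joints)}


def joint_offset(t, freq_hz, amplitude=1.0):
    """Stand-in for a noise source: a sinusoid at the joint's frequency.

    A real animation system would use a band-limited noise function here;
    the sinusoid only makes the frequency relationship easy to inspect.
    """
    return amplitude * math.sin(2.0 * math.pi * freq_hz * t)


# Assumed baseline of 0.5 Hz for the largest joint.
freqs = joint_noise_frequencies(0.5, JOINTS)
for joint in JOINTS:
    print(f"{joint:9s} {freqs[joint]:.1f} Hz")
```

With these assumed values the shoulder moves at the slowest rate and each smaller joint at twice the rate of the one before it, which mirrors the slow-to-fast layering of the four-voice model.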

This polyphony phenomenon involving muscle groups and events at different frequencies resembles the division of voices in a chorus. There are numerous contrapuntal lines being activated in the muscles of the arms at all times. The larger muscle groups seem to take care of the lower-frequency events (in much the same manner as the bass voices in a choir sing the lower and slower notes), whereas the smaller muscle groups seem to indicate the higher-frequency events (analogous to the soprano voices taking the higher pitches and faster notes). All gestures given by a conductor are highly correlated and similar, but exhibit important differences. They are giving direction on many different levels simultaneously.

The basic subdivisions, as discussed in point nine in Chapter Four, are the shoulders, the upper arms, the forearms, and the hands. I originally assumed that the lateral deltoid area of the shoulder would reflect the overall shoulder movement, but it turns out that its signal is highly correlated with the biceps contractions. Then, when I tried the trapezius, it turned out to have different activation patterns from the biceps. These patterns had much more to do with the overall activation of the arm and the vertical lifting and lowering of the upper arm, as happens with extreme changes or structural points where extra emphasis is needed. The trapezius muscle is not necessarily engaged when a beat is made, and seems to be active mostly on the phrase-unit level. Unfortunately, at the time of my experiments, I didn’t realize this and therefore didn’t collect any data on the trapezius activity of my conductors. The biceps muscle is crucial for the beat-level activity; it is the main generator of the beat. It seems as if the falling of the arm is due almost entirely to gravity (since the triceps muscle doesn’t engage much in the pre-beat falling of the arm), whereas at the moment the biceps muscle engages, the arm stops accelerating downwards and ultimately rebounds up as the biceps tension increases (this is followed by a small triceps signal, which seems to moderate the biceps activity). The biceps is therefore active on the beat level. The forearm extensor muscle seems to be active much more on the individual note level; it seems to indicate articulations and smaller, note-level phenomena like timbre and sustain. When my subjects wanted quicker, lighter articulations they would use this muscle much differently than if they wanted sustained, legato sounds. The opponens pollicis muscle in the thumb and palm seems to be for quick, discrete phenomena on the note level; not quite for sustained events, but more for accents and discrete, quantized, digital events. Again, as with the trapezius, I was not able to gather conductor data for this muscle, but I ended up incorporating it into the synthesis system with the Gesture Construction.

The division of labor between these different muscle groups is crucial, because they each have their own frequencies wherein they act best, and the smaller fibers contract at higher frequencies. Therefore, the four-part vocal model is a good one, because it essentially has the same ratios across all four elements. This would also corroborate a result I found in a 1995 study of piano interpretations, namely that pianists treated the four different registers of a Bach Prelude with very clearly delineated tempo and dynamics profiles. This happened on a sort of logarithmic scale, as in the early motets: whole notes, quarter notes, eighth-notes, and quarter-note triplets, in a ratio of 12:3:1.5:1.

5.2.4 Signal-to-Noise Ratio of Expertise

Based on the data I collected, it seems as if experienced conductors have a higher signal-to-noise ratio in the content of their gestures than do students. This seems to come from two sources: reduced sources of noise (fewer extraneous gestures with little information content), and an increased clarity of signal. For example, as was shown above, the students tended to have much ‘noisier,’ more active EMG signals in places where the activity didn’t necessarily make sense (i.e., they would scratch themselves, adjust their stands, and give stronger beats when the music wasn’t changing), while having reduced clarity in places where the signal was needed, as in legato passages. This suggests that gaining expertise involves two things: learning to reduce noise (i.e., suppressing the amplitude and frequency of non-informative signals) and learning to amplify and clarify signal (i.e., optimally conveying the most informative signal, and giving the changes more vividly than the repetitions).

This phenomenon is consistent with Manfred Clynes’ note that it is "possible to alter degrees of inhibition, readiness to express, and the selection of particular motor functions of the body for expression." That is, the students’ lack of clarity in their signals might reflect inhibition or incorrect selection of motor functions. It might turn out that certain adaptive filters could be used to determine the relative signal-to-noise ratio for each conductor. A simple analysis of the signal sources would determine if the noise sources had a normal (Gaussian) distribution; if not, then perhaps a Kalman filter might be able to characterize some of the noise and predict it or filter it out.
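As a rough illustration of the kind of analysis suggested above, the following Python sketch estimates a signal-to-noise ratio from two EMG stretches and computes a crude indicator of whether the noise looks Gaussian. It rests on several assumptions that are not part of the study: that the informative and extraneous stretches have already been labelled by hand, that RMS amplitude is an adequate measure of each, and that excess kurtosis (which is zero for a normal distribution) is an acceptable first check of normality. The function names are hypothetical.

```python
import math
from statistics import fmean


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(fmean(v * v for v in samples))


def snr_db(signal_segment, noise_segment):
    """Signal-to-noise ratio in decibels from two pre-labelled segments.

    Assumes the caller has already decided which stretch of EMG carries
    information (e.g. beats) and which is extraneous (e.g. scratching).
    """
    return 20.0 * math.log10(rms(signal_segment) / rms(noise_segment))


def excess_kurtosis(samples):
    """Fourth standardized moment minus 3; zero for a Gaussian.

    Values far from zero hint that the noise is not normally distributed.
    This is a crude screen, not a formal statistical test.
    """
    mean = fmean(samples)
    variance = fmean((v - mean) ** 2 for v in samples)
    fourth = fmean((v - mean) ** 4 for v in samples)
    return fourth / variance ** 2 - 3.0


# Toy data, invented purely to exercise the functions.
beats = [10.0, -10.0, 10.0, -10.0]
fidgeting = [1.0, -1.0, 1.0, -1.0]
print(snr_db(beats, fidgeting))
print(excess_kurtosis(fidgeting))
```

Comparing such a ratio between a student and a professional would only be meaningful if the segments were labelled consistently across conductors, which is itself a judgment call.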

5.2.5 Tri-Phasic Structure of Communicative Gestures

It appears that in order to have meaning, a gesture must start from a particular place, present its content, and then return to the same place. It seems as if it gains its significance from the context established by the initial and ending conditions, and when those conditions are identical, they provide a solid baseline from which to interpret the content. The theorist Francis Quek observed the same phenomenon in 1993 and wrote that if a hand moves from a spot, gestures, and returns to that spot, then the gesture is likely to convey meaning intentionally. He concluded that there are three phases for natural gestures: preparation, gesticulation, and retraction. Perhaps these phases are echoed in other forms of human behavior or communication, such as the processional and recessional marches in a wedding service (it would seem odd if the bride and groom ducked out a side entrance at the end of the ceremony), the outer panels of a triptych framing the central subject, and the beginning and ending of a classical symphonic score being in the same key while the inner content is free to modulate.

Conducting beats also conform to this notion: all beats consist of a preparation (accelerational) phase, an inflection point (where the maximum force is generated in the abrupt change in direction and acceleration/velocity), and a post (decelerational) phase. The only thing that differentiates them is the direction in which they are given. The traditional term tactus refers to the falling and rising motions that make up a beat. There is always a preparation phase where the hand leaves the previous beat and heads toward the new ictus. Even if the beat pattern causes the hand to go to a new place after gesturing, the understanding is that this new place is a neutral preparation place for the transition to the next beat.
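The three-part structure of a beat can be made concrete with a small Python sketch. It is a simplification under explicit assumptions: it works on hypothetical samples of vertical hand height (the study itself recorded EMG and other physiological signals, not position), it treats the first reversal from downward to upward motion as the inflection point, and it ignores noise and smoothing entirely. The function name and data are invented for illustration.

```python
def split_beat(heights):
    """Split one beat's height samples into (preparation, ictus_index, post).

    Returns None if no downward-to-upward reversal is found.
    """
    # Finite-difference velocity between consecutive samples.
    velocity = [b - a for a, b in zip(heights, heights[1:])]
    for i in range(1, len(velocity)):
        # Downward motion followed by upward motion marks the turn-around,
        # which this sketch takes to be the inflection point of the beat.
        if velocity[i - 1] < 0 and velocity[i] >= 0:
            return heights[:i], i, heights[i + 1:]
    return None


# Hypothetical samples: the hand falls, turns around, and rebounds.
samples = [4, 3, 1, 0, 1, 2]
preparation, ictus_index, post = split_beat(samples)
print(preparation, ictus_index, post)
```

In this toy data the hand falls through the preparation samples, reverses at the lowest point, and rises again in the post phase.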

Finally, it seems that it is precisely the trajectory of the preparation and retraction phases in a beat that sets up its qualitative associations. It is the velocity of the first half of the preparation phase that tells the orchestra when to expect the beat, and it is the emphasis (tension) of the entire preparation phase that tells the musicians how loudly to play it. Ultimately, the actual moment of the beat (tactus) is not very significant at all; by the time it comes, it is too late to be of much information to the musicians, but it establishes the quantum of information that allows them to adjust tempo and be able to anticipate the next one.

5.2.6 Bi-Phasic Pulse Structure

The layers of pulse can be found at successively larger time-scales than the tri-phasic beat structure. Pulse in music can be understood as the alternation between feelings of heaviness (emphasis, or tension) and lightness (non-emphasis, or repose). Pulse is therefore a binary system composed of heavy and light modes, analogous to the send and receive modes that allow a feedback loop to adjust and stay in balance. However, unlike other binary systems, pulse runs on many levels simultaneously, such as the beat (in the ictus), the bar (where it defines the meter), the phrase, and the section. For example, upbeats are usually in the light mode, and downbeats are usually heavy. I found evidence in the EMG and Respiration signals of the conductors that indicates patterns of pulse structure.

5.2.7 Evolution of Conducting Gestures

Some of the findings of the Conductor’s Jacket project have caused me to hypothesize about how the language of conducting evolved. It can be assumed that competing styles and systems went through a process resembling natural selection where the fittest survived. Presumably, the fittest would have been those that the musicians found to be the clearest and most information-rich. We do have some historical understanding about early conducting gestures: the earliest account comes from descriptions of the French composer and conductor Lully, who used a large wooden staff and would bang it on the floor to keep time. Ironically, he stabbed his foot with it and died of the resulting gangrene. In addition to the dire afflictions of this method, it can be presumed that it lost favor because the movement of the arm holding the large staff could not have been very visible to the players. Later conductors had a double role as the concertmaster, or principal violinist, who would conduct while playing by using very exaggerated gestures with his bow. When orchestras became very large in the second half of the nineteenth century, this method became too flimsy, and an independent conductor became de rigueur.

5.2.8 Unidirectional Rate Sensitivity

Before writing Sentics, Manfred Clynes formulated a biological/neurophysiological law of Unidirectional Rate Sensitivity, which holds that sensory information is perceived more acutely under changing conditions than under static conditions. That is, our sensory systems are tuned to pay attention to deltas, or changes, and situations which lack change become almost unnoticeable over time. The second part of the law is that increasing and decreasing changes are sensed and controlled by different channels; we perceive things like heating and cooling through two different biological mechanisms, which can operate at different rates. For example, the dilation response in pupils is much slower than the contraction response. The first part of this law is important for music; it reinforces point number five that I made in the last chapter, namely that repetitive signals are minimized until new information appears. If we apply the first part of the law of Unidirectional Rate Sensitivity, we can see why conductors will do this. For example, if they have a section of music where a four-bar phrase is repeated four times and they continue to give signals of the same amplitude, then the musicians will no longer need to look at them and will become bored. However, if they decrease the size of their gestures during that passage, then the musicians will notice a change and stay focused and energetic in order to track the diminishing amplitude of the gestures. This not only keeps the energy level in the music high, but also keeps the attention on the conductor, so that if a change is imminent then he has everyone’s attention.

5.2.9 Musical Flow State

One of the most satisfying musical experiences happens when a musician knows a piece so well that the technique happens as if without effort, and all the cognitive energy can go to the blissful, Dionysian state of free expression. I assume that this state is closely related to the phenomenon described as ‘flow’ by Mihaly Csikszentmihalyi; this refers to the pleasurable state when one can focus utterly on a task without interruption. I think that the ‘flow state’ in music happens when the neurological system reaches a state of ‘facilitation.’ This is the neurological effect that happens when sound is not processed cognitively by the brain but rather is translated directly into electrical signals and connected to the spine. The excitation at the spine causes the motor neurons in the skeletal muscles to depolarize in time to the music, which in turn causes the ‘facilitation’ effect. The music must be in a particular tempo range for this effect to take place. I think that the Dionysian state in music happens when the muscles are well-trained and can handle the challenge of the performance easily, and when the body goes into a neurological flow state. The rapturous expression that very rarely happens (but which all musicians can describe) might be understood in this way.
