7.4 Future Work

7.4.1 Analytical Improvements

The analytical results of the Conductor’s Jacket project point to the strong possibility of a rich new area of study. However, much remains to be done. Given that this limited study of six individuals yielded fourteen significant results in the data, with sixteen more hypotheses, it is reasonable to think that there are many more features to be found in similar future experiments. Another issue is that I did not go back to the conductors to get systematic self-report and commentary from them. While I had initially planned to do this, it seemed from early informal conversations that the subjects did not have much insight into the nature of their signals, and instead seemed to have preconceived notions about ‘successful’ signals and sought to evaluate their performances by these criteria. A future study might nevertheless benefit from questionnaires or debriefing sessions in which the conductors could review their videotape and data and talk about the aspects that they thought were most significant in the rehearsal or concert.

One limitation of the Conductor’s Jacket data analysis was its lack of statistical methods; since there were few subjects and no single normalizing factor across them, I relied primarily on inter-subject comparisons. Future studies should increase the number of subjects and incorporate a normalizing factor. A good factor to use would be a single piece of music conducted by multiple individuals; this, if feasible, would allow for a much more extensive statistical analysis and many more axes of comparison.

However, collecting this kind of data is extremely time-consuming and difficult, and it takes time and ingenuity to find willing subjects. So, even using the same data set that I have collected, many more analyses could be performed. For one thing, I have not yet described the relationship between the biceps and breathing signals; it would be interesting to correlate them so that I could remove the motion artifact from the breathing signal. One way to do this would be to perform a Pearson correlation on the two data streams. Another aspect that I did not investigate was P3’s positional data from the Polhemus system. The first, coarsest measurement that could be taken would be to look at P3’s posture (forward/backward and horizontal/vertical movement of the torso) and compare it with the content of the music. Since people tend to lean in when they are attracted and away when they are disgusted, perhaps this might yield insight into the conductor’s understanding of the affective content of the music.
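As an illustration of the biceps/breathing correlation suggested above, here is a minimal sketch of a Pearson correlation over two equally sampled data streams. The function name and the sample values are hypothetical and are not part of the Conductor’s Jacket software or data. Note that the coefficient only measures how strongly the two streams co-vary; removing the motion artifact from the breathing signal would require a further step that is not shown here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 samples")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return covariance / math.sqrt(var_x * var_y)


# Made-up sample values, for illustration only (not recorded data).
biceps = [0.1, 0.9, 0.2, 0.8, 0.1, 0.7]
breathing = [1.0, 1.6, 1.1, 1.5, 1.0, 1.4]
r = pearson_r(biceps, breathing)
```

A value of r near +1 or -1 would suggest that arm motion is leaking into the respiration channel; a value near 0 would suggest the two streams are largely independent.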

Another measurement that I would like to explore is the range of the respiration signal -- particularly how the upper and lower extrema change over successive breathing cycles, and how their outer envelopes correlate with the musical structure. My sense is that a great deal of useful information could be gleaned from comparing shallow and deep intakes of breath and looking for musical structures in the score that might cause conductors to breathe differently. Such measurements could be readily taken with the data I already have, using a technique developed by Raul Fernandez for looking at the envelope of a blood volume pressure (BVP) signal.
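The upper and lower extrema of the respiration signal could be extracted along the following lines. This is only a simple peak-picking sketch and is not Raul Fernandez’s envelope technique, whose details are not given here; the function names are hypothetical, and a real respiration signal would need smoothing first so that noise does not produce spurious extrema.

```python
def local_extrema(signal):
    """Return (peaks, troughs) as lists of (index, value) pairs.

    Only strict local extrema are reported; flat plateaus are ignored.
    """
    peaks, troughs = [], []
    for i in range(1, len(signal) - 1):
        prev, cur, nxt = signal[i - 1], signal[i], signal[i + 1]
        if cur > prev and cur > nxt:
            peaks.append((i, cur))
        elif cur < prev and cur < nxt:
            troughs.append((i, cur))
    return peaks, troughs


def breath_depths(signal):
    """Depth of each breath: a peak minus the most recent trough before it.

    Peaks with no preceding trough are skipped.
    """
    peaks, troughs = local_extrema(signal)
    depths = []
    for peak_index, peak_value in peaks:
        earlier = [value for index, value in troughs if index < peak_index]
        if earlier:
            depths.append(peak_value - earlier[-1])
    return depths
```

The sequence of peak values forms the upper envelope, the sequence of trough values the lower envelope, and the depths distinguish shallow from deep intakes of breath; each could then be aligned with the score.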

Many other techniques could be used on the existing data from the Conductor’s Jacket. First, I would guess that breathing rates and inhale/exhale slopes contain a great deal of information and would be a good place to start. Second, automatic techniques such as principal component analysis and cluster-weighted modeling could find correlations that are not visually obvious. Third, I did not spend much time with frequency-domain analyses after noting that EMG beat features have wide distributions across all frequencies; this is an area that could be followed up with many studies. For example, it is known that abrupt, vertical rises in a signal contain all frequencies, and yet the typical EMG peak does not usually resemble an impulse response but rather an exponential rise and fall; it would be interesting to understand the source of the wide range of frequencies. Additionally, I suspect that further studies of human physiological data would benefit tremendously from the perspective of nonlinear system dynamics and chaos theory, since many of the phenomena in conducting seem to be sensitive to initial conditions.

Finally, a future study might also make use of the following axes of comparison, which were applied informally in the Conductor’s Jacket study:

Many interesting results might be obtained by systematically comparing features across these sets of contrasting pairs.

7.4.2 Hardware Improvements

One sorely needed hardware improvement for the Conductor’s Jacket system is a reliable, non-direction-dependent, high-bandwidth wireless connection. My preference has been to transmit all sensor lines independently as analog signals, each on its own frequency. This was to reduce the sampling error by using more established ISA-bus cards for the A/D conversion. The other, more common option would be to sample the sensor lines at the jacket itself, multiplex them into one line, and transmit over a single radio channel. Gregory Harman designed and built a prototype radio transmission system for the Conductor’s Jacket system and integrated it with a wearable data bus, which successfully transmitted four data channels at high rates. However, due to issues with power consumption and battery life, as well as problems with noise on additional sensor channels, we decided not to use it for stage performances. This is not so much a research question as a design issue that could be solved with more expertise and time.

In any case, the jacket would also benefit from proper optical isolation. While many large capacitors were used to protect the wearers from becoming a path to ground (and the physiological sensors also had built-in protection), power spikes still remain a small risk to the wearer. This is not an issue for research but rather a technical detail to be quickly implemented. And since the voltages on the jacket are low (+/- 8 volts), it has not yet become a priority.

Finally, the Conductor’s Jacket would benefit tremendously from a positional sensing system. Full 3D positional sensing would allow the system to anticipate the placement of beats, an aspect that I don’t believe has been explored by anyone who has worked on conducting systems to date. (To a certain extent this can be done with EMG sensors, but they usually only generate a signal at the moment of the beat, and do not give a sense of the trajectory of the arm in the phases leading up to each beat. This trajectory is essential in giving the musicians a sense of where the midpoint of the gesture is, so that they can anticipate the time at which the beat will occur.) It could also improve the accuracy of the beat detector by having a second mechanism running in parallel. If properly modeled, trajectories between beats give valuable information about the current tempo, future changes in tempo, and upcoming changes in emphasis. During a discussion with Rosalind Picard and subject P3, P3 told us that the trajectory defines the half-beat, which in turn defines when/where the beat will land. On the importance of the trajectory between beats, Sir Adrian Boult wrote: "The stick must show not only the actual beats but also the movement through the spaces between them."

A positional measurement was not included in the Conductor’s Jacket hardware because it would have added extra complexity and issues that could not have been easily handled. For example, the Polhemus UltraTrak motion capture system that was used to measure P3’s movements was quite difficult to use; it needed to be set up and calibrated for every session. It also required a separate computer, and was only available to us on a special loan from the Polhemus Corporation. Cheaper, handmade solutions would have been possible but would have taken precious time to build; among the more practical of these would have been the infrared LED and photodiode technology that was designed for the Digital Baton by Joseph Paradiso. A simpler solution would just look at the geometric relationships between the different segments of the arms and measure the degree of flexion or bend at the various joints. Future systems might also include measures for eye gaze and facial expressions, although it seems that for the near future such systems will be too cumbersome to be useful.

Another hardware improvement I would like to have made would have been to establish a ground truth so as to remove motion artifacts from the GSR signals. The problem I encountered with the GSR was that it was very susceptible to the movement of the conductor’s torso; therefore, when I did notice phenomena that seemed to correlate with the internal state of the subject, I was not able to prove it. Most of the identifiable features in the subjects’ GSR signals seemed to reflect motion artifact. Solutions to this problem, such as placing the electrodes in the shoe or on the wearer’s palm, were not practical for the experiments that I ran.

7.4.3 Software Improvements

Most important for the Gesture Construction software system will be better algorithms to segment, filter, recognize, and characterize the expressive features that were described in Chapter 4. Successful filters will make the data more usable by exposing its underlying structure. Segmentation will be necessary in order to pick the areas where the data is richest, such as conducting vs. non-conducting, informative vs. non-informative gestures, and beginnings and endings of pieces. Automatic recognition tasks will involve the implementation of feature detection systems using models with properties such as clustered weights, hidden Markov processes, or hierarchical mixtures of experts. Finally, these filters and recognition algorithms must be adapted to run in real time, so that they can control aspects of live interactions.

In addition to the general higher-level filters that must be included, there are several practical improvements that can be implemented quickly. For example, the problem of double triggers is annoying and detracts from the overall effect of the performance. I had the opportunity to ask Max Mathews about this problem in May 1999, and his suggestion was to add a refractory period after each beat. According to him, a window of approximately 100 milliseconds after a beat in which the system does not look for new beats should take care of the problem. He also explained that nerves operate under a similar principle and have a natural recovery period after each firing. Another option is a phase-locked loop, but that is generally not sufficient because it is not flexible enough to account for immediate tempo changes. Another, simpler solution might be found by improving the real-time envelope or smoothing filters that I have written for the right biceps EMG signal, so that its upper values would not be so jagged. In addition, the performance of my beat detection system might have improved if I had first low-pass filtered the signal and then looked at inflection points (peak and trough events with associated strengths).
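The refractory period suggested by Max Mathews could be sketched as follows. This is a minimal illustration, not the Gesture Construction beat detector: it assumes that beats are found as upward threshold crossings of an already smoothed EMG envelope, and the function name, parameters, and default values are hypothetical apart from the roughly 100 millisecond window.

```python
def detect_beats(samples, sample_rate_hz, threshold, refractory_s=0.1):
    """Return sample indices of beats with a refractory period applied.

    A beat is assumed to be an upward crossing of `threshold`. Any crossing
    that falls within `refractory_s` seconds of the last accepted beat is
    ignored, which suppresses double triggers on a jagged signal.
    """
    refractory_samples = int(round(refractory_s * sample_rate_hz))
    beats = []
    last_beat = None
    for i in range(1, len(samples)):
        crossed_upward = samples[i - 1] < threshold <= samples[i]
        if not crossed_upward:
            continue
        if last_beat is not None and i - last_beat < refractory_samples:
            continue  # still inside the refractory period
        beats.append(i)
        last_beat = i
    return beats
```

The trade-off is that the refractory period also sets an upper bound on the detectable beat rate, so it should be kept well below the shortest beat interval the conductor is expected to give.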

Nearly as annoying as double-triggered beats is the problem of missed beats. The system sometimes fails to detect weak beats because their signals fall within the noise band; this is inconvenient for the conductors because they must exaggerate their beats in order for them to be recognized by the system. One solution would be to add a positional measurement system into the hardware and run an inflection point recognition system in parallel with the beat recognition system; this would allow the detection threshold to be lowered in the beat recognition system. Only when an EMG spike and an inflection point co-occur would the final system send a beat.
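The co-occurrence test described above could be sketched like this. Both input lists are assumed to hold event times in seconds produced by two separate detectors (an EMG spike detector with a lowered threshold, and an inflection-point detector on positional data); the function name and the 50 millisecond tolerance are hypothetical choices for illustration.

```python
def coincident_beats(emg_spike_times, inflection_times, window_s=0.05):
    """Return the EMG spike times confirmed by a positional inflection point.

    A spike is confirmed when at least one inflection point lies within
    `window_s` seconds of it; unconfirmed spikes are treated as noise.
    """
    confirmed = []
    for spike_time in emg_spike_times:
        if any(abs(spike_time - t) <= window_s for t in inflection_times):
            confirmed.append(spike_time)
    return confirmed
```

Because a spurious EMG spike is unlikely to line up with a reversal in the arm’s trajectory, the EMG threshold can sit closer to the noise band without producing false beats.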

The Gesture Construction software does not currently allow the performer to crescendo on a sustained note. This is a problem, because such a crescendo is an important technique for sustaining interest and intensity in slow music. For example, in the opening of the Bach Toccata and Fugue movement, the opening mordent is followed by a long note with a fermata over it, indicating that it should be held for some time at the discretion of the player. My quick hack was to pick a synthesizer voice that had a built-in sustain, so that it would ‘feel right.’ When I showed the system to an experienced composer, however, he immediately saw that I was not controlling the effect and felt misled. After seeing this from his perspective, I realized that it could be fixed very simply by mapping the MIDI channel aftertouch command of each sustained note to be controlled by the left arm. It would be more effective this way and would give the left arm something to do during this part. There are several other issues with the constraints of MIDI; it would have been preferable to use other synthesis methods where the timbre could have been more directly affected, such as Max/MSP. The note-based paradigm employed by MIDI is useful for certain abstractions, but unfortunately very limited for controlling synthesis in mid-note.
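A minimal sketch of the aftertouch mapping follows. In MIDI, channel aftertouch (Channel Pressure) is a two-byte message: a status byte of 0xD0 plus the channel number, followed by one pressure value from 0 to 127. The function names and the normalized 0.0 to 1.0 range for the left-arm signal are assumptions for illustration, and whether pressure actually produces a crescendo depends on how the synthesizer voice is programmed.

```python
def channel_aftertouch(channel, pressure):
    """Build a MIDI Channel Pressure message (status 0xD0 | channel)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be in 0..15")
    if not 0 <= pressure <= 127:
        raise ValueError("pressure must be in 0..127")
    return bytes([0xD0 | channel, pressure])


def arm_level_to_pressure(level, lo=0.0, hi=1.0):
    """Map a left-arm signal level linearly onto the MIDI range 0..127.

    Levels outside [lo, hi] are clamped. The range itself is an assumed
    calibration, not a value taken from the jacket.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = min(max(level, lo), hi)
    return int(round(127 * (clamped - lo) / (hi - lo)))


# Example: full left-arm tension on MIDI channel 1 (index 0).
message = channel_aftertouch(0, arm_level_to_pressure(1.0))
```

Sending such a message repeatedly while the note is held would let the left arm shape the dynamics of the sustained note.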

A fourth software issue is that my software currently lacks a notion of pulse; within each beat, the volumes of consecutive sixteenth notes would conform to a fixed profile, scaled by the volume of the first one. This worked reasonably well to ‘humanize’ the sound of the music on the microstructural level, but could have perhaps included more variety. I should also have implemented a more flexible pulse framework on the level of the measure and the phrase. I would also have liked the Gesture Construction software to use other musical qualities such as onset articulations (attack envelopes) and vibrato.
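The fixed profile described above can be pictured with a short sketch. The profile values below are invented for illustration and are not the ones used in the Gesture Construction software; the function name is likewise hypothetical.

```python
def sixteenth_velocities(first_velocity, profile=(1.0, 0.7, 0.85, 0.6)):
    """Scale a fixed per-beat profile by the velocity of the first sixteenth.

    Results are rounded and clamped to the MIDI note-on velocity range 1..127.
    The default profile is a made-up example.
    """
    return [min(127, max(1, int(round(first_velocity * weight))))
            for weight in profile]
```

A more flexible pulse framework could replace the single tuple with profiles chosen per measure or per phrase.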

Finally, I should implement another tempo framework that does not rely so completely on individual beats, as my current system does. Some people found my ‘direct-drive’ beat model too direct; they wanted the relationship between the gesture and the performance to be more loosely coupled. To them, it was distracting to have the ‘orchestra’ stop instantly, knowing that a human orchestra would take a few beats to slow down and stop. So perhaps a final system should have a built-in switch between the existing direct-drive mode and a tempo system with more latency. Another solution would be to use rate encoding to make continuous changes in the tempo.
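One way to picture a more loosely coupled tempo mode is to smooth the tempo implied by successive beats instead of following each beat directly. The sketch below uses exponential smoothing, which is my own choice of illustration and not a method proposed in this work; the function name and the smoothing factor are hypothetical.

```python
def smoothed_tempi(beat_times, alpha=0.3):
    """Return a smoothed tempo estimate (in BPM) after each beat interval.

    `beat_times` are beat onsets in seconds. With alpha = 1.0 the output
    follows every beat exactly (a direct-drive behavior); smaller values
    make the tempo respond more gradually.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    tempi = []
    current = None
    for previous, latest in zip(beat_times, beat_times[1:]):
        interval = latest - previous
        if interval <= 0:
            raise ValueError("beat times must be strictly increasing")
        instantaneous = 60.0 / interval
        if current is None:
            current = instantaneous
        else:
            current += alpha * (instantaneous - current)
        tempi.append(current)
    return tempi
```

Switching between the two modes would then amount to changing a single parameter, with lower values giving the ‘orchestra’ more inertia when the conductor slows down or stops.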

Future work on the Gesture Construction will focus on extending and improving upon the mappings that take gestural signals and convert them to music. These mappings must be intuitive and powerful enough to respond appropriately to the structure, quality, and character in the gestures. They must not only satisfy the performer’s needs for intuition and naturalness, but also make the gestural-musical relationships clear to the audience that is witnessing the performance. In future work I would also like to improve greatly upon the sound quality and controllability of the audio output of the Conductor’s Jacket system; the current reliance on MIDI synthesizers for musical output severely limits its ability to shape the timbres and amplitude envelopes of individual notes. Finally, I would have liked to have improved upon the quality and finishing of the sound; this would not have involved research but rather extensive time spent in a professional music studio.

7.4.4 Artistic and Theoretical Improvements

Theoretically, I think there is still quite a bit of uncertainty about whether or not the Conductor’s Jacket is better suited for a more improvisational or score-based interpretational role. My suspicion is that if it is to be a successful instrument, it will need to do both. In the pieces that have already been implemented for the jacket I have taken more of a perspective of interpretation of pre-existing musical materials. One issue is that the jacket so far does not include a good way to pick discrete notes, since there is no intuitive gesture for note-picking or carving up the continuous gesture-space into discrete regions. Perhaps if it included a positional system then a two-dimensional map could be used to pick out notes and chords, but I think that perhaps that is not the best way to use the affordances of the jacket. Conductors do not use their gestures to define notes, and there is no intuitive gestural vocabulary for such an action.

Secondly, I began this project with the naive anticipation that the physiological sensors in the Conductor’s Jacket would give some sense of the wearer’s internal state. In this I was disappointed; given the complexity of the music and the constant motion of the gestures, this turned out to be an almost intractable problem. The GSR sensors turned out to be quite susceptible to motion artifact when placed on the torso, but very occasionally I would notice a strong feature in the GSR that would correlate with an event that had excited or upset the conductor. For example, during a segment when I was not recording data, one subject became alarmed upon learning that the principal oboist was absent. Although he did not show any outward gestures or exaggerated expressions, his GSR shot up dramatically. I noticed several other events like this during the sessions that I held, but unfortunately they were so rare and unrepeatable that I ultimately did not document them systematically. One way to reduce motion artifact in the GSR signal would have been to put the sensors in two separate places where the signal would be roughly equivalent, such as on the hands and feet of the conductors, but ultimately we ruled that out because it was impractical.

Finally, while I never achieved my lofty artistic goals for the Conductor’s Jacket, I produced a successful proof-of-concept and created an instrument with which I would like to create future performances. The analytical power, intuitive mappings, and responsiveness of the Conductor’s Jacket have endowed it with an enormous range of possibility for expression. I look forward to future projects with the jacket where I can spend as much time perfecting the ‘production values’ as I do on the software. A future possibility for improving the sound of the Conductor’s Jacket would be to team it up with a more musical software system, such as Manfred Clynes’ SuperConductor, for which many years of development have gone into creating affecting interpretations.
 
 
