We are building a functional computer model of sound organization and understanding in human listeners. In particular, we would like to be able to detect and locate events in the acoustic signal that will be perceived as separate objects. Our model aims to duplicate this function of the auditory system by the same means, although the level of correspondence is abstract and necessarily speculative given our current state of knowledge. Nonetheless, we believe that recent theories of auditory organization are sufficiently detailed to make computer modeling possible, and that such models can help develop and refine these theories.
Psychoacoustic experiments have demonstrated the importance of various `cues' in the perception of sound as single or multiple objects [A. S. Bregman, Auditory Scene Analysis, M.I.T. Press, 1990]. We describe an implementation of grouping rules corresponding to the cues of harmonicity, common onset, continuity and proximity. We implement these rules in a very direct fashion based upon our constant-Q sinewave representation of the sound [D. P. W. Ellis, "A perceptual representation of sound", SM thesis, M.I.T., 1992]. However, the rules alone are simple and prone to error; we greatly increase the robustness of the results by adding a second layer of grouping that looks for corroboration between groupings based on the different rules. We believe that such a system of repeated hierarchic grouping is critical for the successful modeling of auditory functions.
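The two-layer scheme described above can be illustrated in miniature. The sketch below is hypothetical (all names, data structures, and thresholds are illustrative, not the actual implementation): primary rules such as harmonicity and common onset each vote on whether two sinewave tracks belong together, and a secondary layer keeps only pairings corroborated by more than one cue.

```python
# Hypothetical sketch of hierarchic cue grouping. Primary rules propose
# pairings of sinewave tracks; a secondary layer keeps only pairings
# corroborated by at least `min_votes` independent cues.
from itertools import combinations

def harmonicity(a, b, tol=0.02):
    """True if the two track frequencies are near an integer ratio."""
    lo, hi = sorted((a["freq"], b["freq"]))
    ratio = hi / lo
    return abs(ratio - round(ratio)) < tol * round(ratio)

def common_onset(a, b, tol=0.03):
    """True if the tracks begin within `tol` seconds of each other."""
    return abs(a["onset"] - b["onset"]) < tol

def corroborated_pairs(tracks, rules, min_votes=2):
    """Secondary grouping: keep pairs supported by several primary cues."""
    pairs = []
    for a, b in combinations(tracks, 2):
        votes = sum(rule(a, b) for rule in rules)
        if votes >= min_votes:
            pairs.append((a["id"], b["id"]))
    return pairs

tracks = [
    {"id": "t1", "freq": 200.0, "onset": 0.10},  # fundamental
    {"id": "t2", "freq": 400.0, "onset": 0.10},  # 2nd harmonic, same onset
    {"id": "t3", "freq": 310.0, "onset": 0.55},  # unrelated later event
]

print(corroborated_pairs(tracks, [harmonicity, common_onset]))
# -> [('t1', 't2')]: only the pair supported by both cues survives
```

Requiring agreement between cues is what makes the secondary layer robust: a single rule misfiring (e.g. an accidental near-integer frequency ratio) is not enough to fuse two unrelated events.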
Our preliminary results from applying the system to real sounds (isolated speech and a musical signal corrupted by impulsive noise) show an encouraging ability to identify and isolate the perceptually significant events. We hope that by extending the set of rules, and refining both the primary rules and the secondary grouping procedures, we will obtain a useful simulation of the organization and segregation performed by the human auditory system.