TR#219: Hierarchic models of hearing for sound separation and reconstruction

D. P. W. Ellis

Appeared in:
Final program of the IEEE Workshop on Applications of Signal Processing to Acoustics and Audio
Mohonk Mountain House, New Paltz, New York, November 1993

In building a machine to detect and segregate individual components in sound mixtures, the best example to copy is the human auditory system. Several models of auditory organization implement various rules of psychoacoustic grouping [A. S. Bregman, Auditory Scene Analysis, M.I.T. Press, 1990]; we propose in addition to model auditory inference as exhibited in the well-known `phonemic-restoration illusion' of [R. M. Warren, "Perceptual restoration of missing speech sounds", Science 167, 1970]. A hierarchy of abstracted features and source hypotheses similar to [S. H. Nawab and V. Lesser, "Integrated signal processing and understanding", in Symbolic and Knowledge-Based Signal Processing, ed. A. V. Oppenheim and S. H. Nawab, Prentice Hall, 1992] allows reconstruction of obliterated detail, which can then be used to recreate an `idealized' sound without corruption. A preliminary example of fitting a harmonic model to a noisy recording of a clarinet gives a very convincing resynthesis with the interference totally removed. However, many issues, including the design of the representation and the control architecture, remain to be addressed in building a more general system.
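As a rough illustration of what fitting a harmonic model to a noisy recording might involve, the sketch below resynthesizes a tone from a least-squares fit of harmonically related sinusoids. The abstract does not specify the fitting method, so this is an assumed stand-in rather than the system described: the function name `fit_harmonic_model`, the known and constant fundamental `f0`, the number of harmonics, and the synthetic clarinet-like test tone (odd harmonics plus white noise) are all hypothetical choices made for the example.

```python
import numpy as np


def fit_harmonic_model(signal, sample_rate, f0, num_harmonics):
    """Least-squares fit of a stationary harmonic model (hypothetical helper).

    Assumes the fundamental f0 (in Hz) is known and constant over the frame.
    Returns the resynthesized signal and the fitted cos/sin coefficients.
    """
    t = np.arange(len(signal)) / sample_rate
    k = np.arange(1, num_harmonics + 1)
    # One cosine and one sine column per harmonic k * f0.
    phase = 2.0 * np.pi * f0 * np.outer(t, k)
    basis = np.hstack([np.cos(phase), np.sin(phase)])
    coeffs, *_ = np.linalg.lstsq(basis, signal, rcond=None)
    # Resynthesis keeps only what the harmonic model can explain.
    return basis @ coeffs, coeffs


# Synthetic stand-in for the recording (assumption: not real clarinet data).
rng = np.random.default_rng(0)
sample_rate = 8000
f0 = 220.0
t = np.arange(2048) / sample_rate
clean = sum(np.sin(2.0 * np.pi * k * f0 * t) / k for k in (1, 3, 5))
noisy = clean + 0.5 * rng.standard_normal(len(t))

resynth, _ = fit_harmonic_model(noisy, sample_rate, f0, num_harmonics=6)

# Mean squared error relative to the clean tone, before and after the fit.
err_before = np.mean((noisy - clean) ** 2)
err_after = np.mean((resynth - clean) ** 2)
```

Because the resynthesis is generated purely from the fitted harmonic parameters, interference that the model cannot represent is discarded rather than filtered. A more general system would also need to estimate the fundamental and track changes over time, which this sketch does not attempt.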