TR#330: Mid-level representations for Computational Auditory Scene Analysis

D. P. W. Ellis and D. F. Rosenthal

To appear in the working papers for
1995 International Joint Conference on Artificial Intelligence
Workshop on Computational Auditory Scene Analysis
Montreal, Canada, August 1995

In this paper we consider representations for use in models of the processing that occurs between the eardrum and our conscious experience of sound. We first list 'good' properties for such mid-level representations, then present a framework within which to discuss some examples. We compare in detail two popular schemes -- sinusoid tracks and correlograms -- and propose a new representation, wefts, which seeks to combine their advantages.