Most image coding systems rely on signal processing concepts such as transforms, VQ, and motion compensation. In order to achieve significantly lower bit rates, it will be necessary to devise encoding schemes that involve mid-level and high-level computer vision. Model-based systems have been described, but these are usually restricted to some special class of images such as head-and-shoulders sequences. We propose to use mid-level vision concepts to achieve a decomposition that can be applied to a wider domain of image material. In particular, we describe a coding scheme based on a set of overlapping layers. The layers, which are ordered in depth and move over one another, are composited in a manner similar to traditional ``cel'' animation. The decomposition (the vision problem) is challenging, but we have attained promising results on simple sequences. Once the decomposition has been achieved, the synthesis is straightforward.