This paper presents a model for the perception of transparently combined moving images. We advocate a framework consisting of a local motion mechanism that can operate in the presence of transparency, and a global mechanism that integrates information across space. We present a new method for the local motion testing mechanism, using ``donut'' velocity-selective mechanisms formed from weighted combinations of spatio-temporal energy units. Unlike traditional methods, this approach does not fail when the sequence contains multiple motions. The global layer selection mechanism attempts to account for the local velocity distributions with a small set of global functions. Using donut mechanisms permits a simplified layer selection optimization, in which inhibition between layers is determined by the product of their predicted velocity distributions. With this scheme, we demonstrate the decomposition of image sequences containing multiple additively combined moving objects into a set of layers, one for each object.
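
As an illustrative reading of the layer inhibition described above (a sketch under assumed notation, not the paper's own formulation), let $P_i(\mathbf{v} \mid \mathbf{x})$ be the velocity distribution that candidate layer $i$ predicts at image location $\mathbf{x}$. The symbols $P_i$ and $w_{ij}$, and the choice to sum the product over locations and velocities without normalization, are assumptions introduced here. One plausible form of the inhibition between layers $i$ and $j$ is then
\begin{equation}
w_{ij} = \sum_{\mathbf{x}} \sum_{\mathbf{v}} P_i(\mathbf{v} \mid \mathbf{x}) \, P_j(\mathbf{v} \mid \mathbf{x}), \qquad i \neq j .
\end{equation}
Under this reading, two candidate layers that predict the same velocities at the same locations compete strongly, whereas layers that predict different velocities at a shared location have $w_{ij}$ near zero and can both be retained, which is the situation that arises under transparency.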