Hand and face gestures are modeled using an appearance-based approach in which each pattern is represented as a vector of similarity scores to a {\sl set} of view models defined in space and time. These view models are learned from examples using unsupervised clustering techniques. A supervised learning paradigm is then used to interpolate the view scores into a task-dependent coordinate system appropriate for recognition and control tasks. We apply this analysis to the problem of context-specific gesture interpolation and recognition, and demonstrate real-time systems that perform these tasks.
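
As a minimal sketch of the representation described above, in illustrative notation that is assumed here rather than taken from the text ($x$, $v_k$, $K$, ${\rm sim}$, $f$, and $y$ are all hypothetical symbols), an observed spatio-temporal pattern $x$ can be summarized by its scores against $K$ learned view models $v_1, \ldots, v_K$:
\begin{equation}
s(x) = \bigl[\, {\rm sim}(x, v_1),\ \ldots,\ {\rm sim}(x, v_K) \,\bigr]^{T} .
\end{equation}
The supervised stage can then be read as fitting, from labeled examples, a mapping
\begin{equation}
y = f\bigl(s(x)\bigr)
\end{equation}
from the score vector to the task-dependent coordinates $y$. The specific similarity measure (normalized correlation would be one plausible choice) and the form of the interpolant $f$ are left open in this sketch and should be treated as assumptions.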