We describe a computer vision system for observing the ``action units'' of a face using video sequences as input. Visual observation (sensing) is achieved by coupling an optimal-estimation optical flow method with a geometric model and a physical (muscle) model of the facial structure. This modeling yields a time-varying spatial patterning of facial shape and a parametric representation of the independent muscle action groups responsible for the observed facial motions. These muscle action patterns may then be used for analysis, interpretation, and synthesis. Thus, by interpreting facial motions within a physics-based optimal estimation framework, we develop a new control model of facial movement. The newly extracted action units (which we name FACS+) are both physics- and geometry-based, and extend the well-known FACS parameters for facial expressions by adding temporal information and non-local spatial patterning of facial motion.