
Interaction and Gesture Recognition

The interactive environment interface is built to be entirely non-invasive. A computer vision system measures the user, which eliminates the need to harness the user with sensors and wires. A large-format display allows an immersive experience without head-mounted displays and opens the environment up to multiple users [Russell et al. 1995].

  
Figure: Backsub window showing a dithered sketch of the input video, figure/ground segmentation, and blob classifications (grey regions within the foreground silhouette).

The vision system is composed of several layers. The lowest layer uses adaptive models to segment the user from the background. This allows the system to track users without the need for chromakey backgrounds or special garments. The models also identify color segments within the user's silhouette (see fig. 4). This allows the system to track important features (hands) even when these features aren't discernible from the figure-ground segmentation. This added information may make it possible to deduce the general 3D structure of the user, allowing better gesture tracking at the next layer.
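
As an illustration of the lowest layer, the following is a minimal sketch of adaptive figure/ground segmentation. The text does not specify which adaptive model is used, so a per-pixel running mean and variance over grayscale frames is assumed here; the class name `RunningBackground` and the parameters `alpha`, `k`, and `min_var` are hypothetical and chosen only for the example.

```python
class RunningBackground:
    """Assumed model: per-pixel running mean/variance over grayscale frames.

    Frames are lists of rows of numeric intensities. This is an illustrative
    stand-in, not the model used by the system described in the text.
    """

    def __init__(self, first_frame, alpha=0.05, k=3.0, min_var=4.0):
        self.alpha = alpha      # adaptation rate (hypothetical parameter)
        self.k = k              # threshold in standard deviations
        self.min_var = min_var  # variance floor to avoid a zero threshold
        self.mean = [[float(v) for v in row] for row in first_frame]
        self.var = [[min_var for _ in row] for row in first_frame]

    def segment(self, frame):
        """Return a binary mask (1 = foreground) and adapt on background pixels."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, v in enumerate(row):
                d = v - self.mean[y][x]
                # Foreground when the squared deviation exceeds (k * sigma)^2.
                fg = d * d > (self.k ** 2) * self.var[y][x]
                if not fg:
                    # Only background pixels update the model, so a user who
                    # stands still is not absorbed into the background at once.
                    self.mean[y][x] += self.alpha * d
                    self.var[y][x] = max(
                        self.min_var,
                        (1 - self.alpha) * self.var[y][x] + self.alpha * d * d,
                    )
                mask_row.append(1 if fg else 0)
            mask.append(mask_row)
        return mask


background = [[10] * 4 for _ in range(4)]
model = RunningBackground(background)
frame = [row[:] for row in background]
frame[1][2] = 200  # a bright pixel standing in for the user
mask = model.segment(frame)
```

Because the model adapts only where the scene matches the background, slow lighting changes are followed while the user remains segmented, which is the property that removes the need for a chromakey backdrop.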

The next layer uses the information from segmentation and blob classification to identify interesting features: bounding box, head, hands, feet, and centroid. These features can be recognized by their characteristic impact on the silhouette (high edge curvature, occlusion) and a priori knowledge about people (heads are usually on top).
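
The simplest of these features can be sketched directly from a binary silhouette mask. The example below computes the bounding box and centroid, and guesses the head position using only the prior that heads are usually on top. It does not attempt the edge-curvature or occlusion analysis mentioned above, and the function name `silhouette_features` is hypothetical.

```python
def silhouette_features(mask):
    """Compute simple features from a binary mask (1 = foreground).

    Image coordinates are assumed: x grows rightward, y grows downward.
    Returns None when the mask contains no foreground pixels.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    # Bounding box as (min_x, min_y, max_x, max_y).
    bbox = (min(xs), min(ys), max(xs), max(ys))
    # Centroid as the mean of the foreground pixel coordinates.
    centroid = (sum(xs) / len(pts), sum(ys) / len(pts))
    # Head guess: the middle of the topmost foreground row.
    top = min(ys)
    top_xs = [x for x, y in pts if y == top]
    head = (sum(top_xs) / len(top_xs), top)
    return {"bbox": bbox, "centroid": centroid, "head": head}


silhouette = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
]
features = silhouette_features(silhouette)
```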

The highest layer then uses these features, combined with knowledge of the human body, to detect significant gestures. Audio processing included at the various levels will allow the system to use knowledge of human dialog to better recognize both audio and visual gestures.
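
A gesture detector at this level might look like the following sketch. The gesture itself (a hand held above the head), the per-frame feature dictionary layout, and the `min_frames` parameter are all assumptions made for illustration; the text does not list the gestures the system recognizes.

```python
def detect_raised_hand(frames, min_frames=3):
    """Return the frame indices at which a raised-hand gesture is confirmed.

    Each frame is a dict with "head": (x, y) and "hands": [(x, y), ...] in
    image coordinates (y grows downward, so "above" means a smaller y).
    The gesture fires once, on the frame where a hand has been above the
    head for min_frames consecutive frames.
    """
    events = []
    run = 0
    for i, f in enumerate(frames):
        head_y = f["head"][1]
        raised = any(h[1] < head_y for h in f["hands"])
        run = run + 1 if raised else 0
        if run == min_frames:
            events.append(i)
    return events


hand_heights = [20, 5, 5, 5, 5, 20, 5]
frames = [{"head": (50, 10), "hands": [(60, y)]} for y in hand_heights]
events = detect_raised_hand(frames)
```

Requiring several consecutive frames is one simple way to keep a single noisy hand classification from triggering a gesture.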

  
Figure: System architecture of the HyperPlex

These gestures become the input to the behavioral systems of the agents in the simulated environment. This abstraction allows the environment to react to the user on a higher, more meaningful and inflected level (see fig. 5). It can also allow us to avoid the distracting lag inherent in many other immersive systems.
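
The hand-off from gestures to agents can be pictured with a small sketch. The `Agent` class, the `broadcast` function, and the gesture and behavior labels are hypothetical; the actual behavioral systems are not described in this section.

```python
class Agent:
    """Hypothetical agent that maps gesture labels to named behaviors."""

    def __init__(self, name, reactions):
        self.name = name
        self.reactions = reactions  # gesture label -> behavior name
        self.history = []           # behaviors triggered so far

    def perceive(self, gesture):
        """Return the behavior triggered by the gesture, or None if ignored."""
        behavior = self.reactions.get(gesture)
        if behavior is not None:
            self.history.append(behavior)
        return behavior


def broadcast(gesture, agents):
    """Deliver one gesture event to every agent and collect the responses."""
    return {agent.name: agent.perceive(gesture) for agent in agents}


dog = Agent("dog", {"wave": "approach", "point": "fetch"})
bird = Agent("bird", {"point": "fly_away"})
responses = broadcast("wave", [dog, bird])
```

Agents receive discrete gesture events, not raw tracking data, so each one reacts only to the events that are meaningful to it.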






Flavia Sparacino
Mon Apr 1 11:15:21 EST 1996