
Discussion and Future Work

We have shown a high-accuracy, computer vision-based method of recognizing sentence-level American Sign Language selected from a 40-word lexicon. The first experiment shows how the system can be used to communicate with a desk-based computer. The second experiment demonstrates how a wearable computer might use this method as part of an ASL-to-English translator. Both experiments indicate that HMMs are a powerful method for sign language recognition, much as they have been for speech and handwriting recognition. In addition, the experiments suggest that the first-person view provides a valid perspective for creating a wearable ASL translator.

While it can be argued that sign evolved to have maximum intelligibility from a frontal view, further thought reveals that sign may also have to be distinguishable by the signers themselves, both for learning and to provide control feedback. To determine which view is superior for recognition, we have begun a new experiment. Native signers will be given a task to complete. The task will be designed to encourage a small vocabulary (e.g., a few hundred words) and to encourage natural sign. Four views of the signers will be recorded simultaneously: a stereo pair from the front, a view from the side, and the wearable computer view. Thus, both 3D and 2D tracking from the various views can be compared directly.
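As a rough illustration of how the front stereo pair could supply 3D hand positions for such a comparison, the sketch below triangulates a single point from two calibrated views using standard linear (DLT) triangulation. The projection matrices and pixel coordinates are placeholders; the paper does not specify any particular calibration or triangulation procedure.

    import numpy as np

    def triangulate(P_left, P_right, x_left, x_right):
        """Linear (DLT) triangulation of one 3D point from a calibrated
        stereo pair.  P_left and P_right are 3x4 projection matrices;
        x_left and x_right are (u, v) pixel coordinates of the same hand
        centroid in the left and right views."""
        u1, v1 = x_left
        u2, v2 = x_right
        # Each observation contributes two linear constraints on the
        # homogeneous 3D point X (A X = 0).
        A = np.vstack([
            u1 * P_left[2] - P_left[0],
            v1 * P_left[2] - P_left[1],
            u2 * P_right[2] - P_right[0],
            v2 * P_right[2] - P_right[1],
        ])
        # The least-squares solution is the right singular vector with
        # the smallest singular value.
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]
        return X[:3] / X[3]  # homogeneous -> Euclidean coordinates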

Head motion and facial gestures also play roles in sign that the wearable system would seem to have trouble addressing. In fact, uncompensated head rotation would significantly impair the current system. However, as shown by the effects in the first experiment, body/head rotation is an issue from either viewpoint. Simple fiducials, such as a belt buckle or lettering on a t-shirt, may be used to compensate tracking or even to provide additional features. Another option for the wearable system is to add inertial sensors to compensate for head motion. In addition, EMG sensors may be placed in the cap's head band along the forehead to analyze eyebrow motion, as has been discussed by Picard [9]. In this way facial gesture information may be recovered. As the system grows in lexicon size, finger and palm tracking information may be added. This may be as simple as counting how many fingers are visible along the contour of the hand and whether the palm is facing up or down. In addition, tri-sign context models and statistical grammars may be added, which may reduce the error rate by up to a factor of eight if speech and handwriting trends hold true for sign [16].
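As an illustration of the finger-counting idea, the sketch below counts extended fingers from the convexity defects of a binary hand mask. It assumes an OpenCV-style segmentation that is not part of the system described here, and it does not attempt to recover palm orientation.

    import cv2
    import numpy as np

    def count_fingers(mask, min_depth=20.0):
        """Rough count of extended fingers from a binary hand mask.
        Deep convexity defects correspond to the gaps between extended
        fingers, so (deep defects + 1) is a crude finger count."""
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return 0
        hand = max(contours, key=cv2.contourArea)   # largest blob = hand
        hull = cv2.convexHull(hand, returnPoints=False)
        defects = cv2.convexityDefects(hand, hull)
        if defects is None:
            return 0
        # Defect depth is stored in fixed point (units of 1/256 pixel).
        deep = sum(1 for d in defects[:, 0] if d[3] / 256.0 > min_depth)
        return min(deep + 1, 5) if deep else 0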

These improvements do not address user independence. Just as in speech, building a system that can understand different subjects, each with their own variations of the language, involves collecting data from many subjects. Until such a system is tried, it is hard to estimate the number of subjects and the amount of data that would comprise a suitable training database. User-independent recognition often places new requirements on the feature set as well. While the modifications mentioned above may initially be sufficient, the development process is highly empirical.

Similarly, we have not yet addressed the problem of finger spelling. Changes to the feature vector to capture finger information are vital, but adjusting the context modeling is also important. With finger spelling, a closer parallel can be made to speech recognition: tri-sign context occurs at the sub-word level, while grammar modeling occurs at the word level. However, this is at odds with modeling context across word-level signs. Can tri-sign context be used across finger spelling and signing? Is it beneficial to switch to a separate mode for finger spelling recognition? Can natural language techniques be applied, and if so, can they also be used to address the spatial positioning issues in ASL? The answers to these questions may be key to creating an unconstrained sign language recognition system.
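To make the distinction between tri-sign context units and a word-level grammar concrete, the sketch below (illustrative only; no such module appears in this work) expands a sign sequence into context-dependent units, analogous to triphones in speech recognition, and scores a word sequence with an add-alpha smoothed bigram grammar. During decoding, such a grammar score would be combined with the per-unit HMM likelihoods, much as acoustic and language models are combined in speech recognition.

    import math
    from collections import defaultdict

    def tri_sign_units(signs):
        """Expand a sign sequence into context-dependent tri-sign units,
        analogous to triphones: each sign is modeled together with its
        left and right neighbors ('sil' pads the sentence boundaries)."""
        padded = ["sil"] + list(signs) + ["sil"]
        return [padded[i - 1] + "-" + padded[i] + "+" + padded[i + 1]
                for i in range(1, len(padded) - 1)]

    def bigram_logprob(signs, counts, vocab_size, alpha=1.0):
        """Log probability of a sign sequence under an add-alpha smoothed
        bigram grammar; `counts` maps (previous, current) sign pairs to
        training counts."""
        unigram = defaultdict(int)
        for (prev, _), c in counts.items():
            unigram[prev] += c
        logp, prev = 0.0, "<s>"
        for s in list(signs) + ["</s>"]:
            num = counts.get((prev, s), 0) + alpha
            den = unigram[prev] + alpha * vocab_size
            logp += math.log(num / den)
            prev = s
        return logp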

