Next: Hidden Markov Modeling Up: Introduction Previous: Related Work

The Task

In this paper, we describe two extensible systems that use a single color camera to track unadorned hands in real time and interpret American Sign Language using hidden Markov models. The tracking stage of the system does not attempt a fine description of hand shape; instead, it concentrates on the evolution of the gesture through time. Studies of human sign readers suggest that surprisingly little hand detail is necessary for humans to interpret sign language [10,14]. In fact, Sperling et al. [14] show that movies of isolated signs, shot from the waist up, retain 85% of their full-resolution intelligibility even when subsampled to 24 by 16 pixels. For this experiment, the tracking process produces only a coarse description of hand shape, orientation, and trajectory. The resulting information is input to an HMM for recognition of the signed words.
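As a rough illustration of what such a coarse per-frame description might look like, the sketch below computes a blob centroid, area, and an orientation angle from image moments of a segmented hand region. The function name and the exact feature choice are assumptions for illustration only; the paper's actual feature set is described in later sections.

```python
import math

def coarse_hand_features(pixels):
    """Compute a coarse (centroid x, centroid y, area, orientation) tuple
    from a list of (x, y) pixel coordinates belonging to the hand blob.

    This is a sketch of a generic image-moment feature extractor, not the
    paper's exact tracking code.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n   # centroid x
    cy = sum(y for _, y in pixels) / n   # centroid y
    # Central second moments give a rough orientation of the blob's
    # principal axis (standard image-moment formula).
    mxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    myy = sum((y - cy) ** 2 for _, y in pixels) / n
    mxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    angle = 0.5 * math.atan2(2.0 * mxy, mxx - myy)
    return (cx, cy, n, angle)
```

One such tuple per video frame, concatenated over time, would form the observation sequence that the HMM stage consumes.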

While the scope of this work is not to create a user-independent, full-lexicon system for recognizing ASL, the system is extensible toward this goal. The ``continuous'' sign language recognition of full sentences demonstrates the feasibility of recognizing complicated series of gestures. In addition, the real-time recognition techniques described here allow easier experimentation, demonstrate the possibility of a future commercial product, and simplify archival of test data.

Table 1: ASL Test Lexicon

part of speech   vocabulary
pronoun          I, you, he, we, you(pl), they
verb             want, like, lose, dontwant, dontlike, love, pack, hit, loan
noun             box, car, book, table, paper, pants, bicycle, bottle, can, wristwatch, umbrella, coat, pencil, shoes, food, magazine, fish, mouse, pill, bowl
adjective        red, brown, black, gray, yellow

For this recognition system, sentences of the form ``personal pronoun, verb, noun, adjective, (the same) personal pronoun'' are to be recognized. This structure allows a large variety of meaningful sentences to be generated using randomly chosen words from each class, as shown in Table 1. Six personal pronouns, nine verbs, twenty nouns, and five adjectives are included, for a total lexicon of forty words. The words were chosen by paging through Humphries et al. [7] and selecting those words that would generate coherent sentences given the grammar constraint. Words were not chosen based on distinctiveness or lack of detail in the finger positioning. Note that finger position plays an important role in several of the signs (pack vs. car, food vs. pill, red vs. mouse, etc.).
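The grammar and the Table 1 lexicon can be sketched as a simple random sentence generator; this is only an illustration of the experimental setup, not the authors' actual data-collection code.

```python
import random

# Lexicon from Table 1.
PRONOUNS = ["I", "you", "he", "we", "you(pl)", "they"]
VERBS = ["want", "like", "lose", "dontwant", "dontlike",
         "love", "pack", "hit", "loan"]
NOUNS = ["box", "car", "book", "table", "paper", "pants",
         "bicycle", "bottle", "can", "wristwatch",
         "umbrella", "coat", "pencil", "shoes", "food",
         "magazine", "fish", "mouse", "pill", "bowl"]
ADJECTIVES = ["red", "brown", "black", "gray", "yellow"]

def random_sentence(rng=random):
    """Generate one test sentence of the form
    pronoun, verb, noun, adjective, (the same) pronoun."""
    pronoun = rng.choice(PRONOUNS)
    return [pronoun,
            rng.choice(VERBS),
            rng.choice(NOUNS),
            rng.choice(ADJECTIVES),
            pronoun]
```

Note that the opening and closing pronoun are constrained to be identical, exactly as the grammar requires.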
Thad Starner
1998-09-17