The resulting word accuracies from the experiment are listed in Table
3.
In this experiment 400 sentences were used for training, and an
independent 100 sentences were used for testing. A new grammar was
added for this experiment. This grammar simply restricts the
recognizer to five-word sentences without regard to part of speech.
Thus, the percentage of correct words expected by chance using this
"5-word" grammar would be 2.5%. Deletions and insertions are
possible with this grammar since a repeated word can be thought of as
a deletion and an insertion instead of two substitutions.
grammar | training set | independent test set
part-of-speech | 99.3% | 97.8%
5-word sentence | 98.2% (98.4%) (D=5, S=36, I=5, N=2500) | 97.8%
unrestricted | 96.4% (97.8%) (D=24, S=32, I=35, N=2500) | 96.8% (98.0%) (D=4, S=6, I=6, N=500)
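
The table entries can be checked against their error counts. The short sketch below assumes the usual convention that D, S, and I are deletions, substitutions, and insertions out of N words, that word accuracy is (N - D - S - I)/N, and that the figure in parentheses is the percentage of words correct, (N - D - S)/N, which ignores insertions. These definitions are not stated in this section, but they reproduce the listed numbers. The function names are illustrative only.

```python
def word_accuracy(n, d, s, i):
    """Fraction of words right after penalising deletions, substitutions, insertions."""
    return (n - d - s - i) / n


def percent_correct(n, d, s):
    """Fraction of words right when insertions are not penalised."""
    return (n - d - s) / n


# (N, D, S, I) copied from the table rows that list error counts.
rows = {
    "5-word, training set": (2500, 5, 36, 5),
    "unrestricted, training set": (2500, 24, 32, 35),
    "unrestricted, independent test set": (500, 4, 6, 6),
}

for name, (n, d, s, i) in rows.items():
    acc = 100 * word_accuracy(n, d, s, i)
    cor = 100 * percent_correct(n, d, s)
    print(f"{name}: {acc:.1f}% ({cor:.1f}%)")
```

Under these assumed definitions, each printed pair matches the corresponding table cell to one decimal place.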
Interestingly, the accuracies for the part-of-speech, 5-word, and unrestricted tests are essentially the same, suggesting that all the signs in the lexicon can be distinguished from each other using this feature set and method. As in the previous experiment, repeated words represent 25% of the errors in the unrestricted grammar test. In fact, if a simple repeated-word filter is applied as a post-process to the recognition output, the unrestricted grammar test accuracy becomes 97.6%, almost exactly that of the most restrictive grammar! A careful look at the details of the part-of-speech and 5-word grammar tests indicates that the restriction to the same beginning and ending pronoun may have hurt the performance of the part-of-speech grammar! Thus, the strong grammars are superfluous for this task. In addition, the very similar results between the fair-test and test-on-training cases indicate that the HMMs' training converged and generalized extremely well for the task.
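
The text does not say how the repeated-word filter works. One minimal reading, sketched below, collapses any run of identical adjacent words in the recognizer output into a single word. The function name and the example glosses are hypothetical, and the sketch assumes that no valid sentence repeats a word back to back.

```python
def drop_repeated_words(words):
    """Collapse runs of identical adjacent words into a single occurrence."""
    filtered = []
    for word in words:
        # Keep a word only if it differs from the last word kept.
        if not filtered or filtered[-1] != word:
            filtered.append(word)
    return filtered


# Hypothetical recognizer output with one spuriously repeated word.
hypothesis = ["i", "want", "want", "blue", "car", "i"]
print(drop_repeated_words(hypothesis))
```

A filter of this kind only removes immediate repetitions, so the same word appearing at the start and end of a sentence is left untouched.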
The main result is the high accuracies themselves, which indicate that harder tasks should be attempted. However, why is the wearable system so much more accurate than the desk system? There are several possible factors. First, the wearable system has fewer occlusion problems, both with the face and between the hands. Second, the wearable data set did not have the problem with body rotation that the first data set experienced. Third, each data set was created and verified by separate subjects, with successively better data recording methods. Controlling for these various factors requires a new experiment, described in the next section.