

Future Work

Ogg That There is very fragile. The FSM inside the Interpretor limits the robustness of the system, as does the rigid grammar. Limiting the user to deictic gestures denies the potential expressiveness of a gestural interface. The reliance on imperative commands issued to specific robots cannot keep up with the pace of the game, and leads to frustration as the situation changes faster than commands can be issued. Ogg That There succeeded in solving many of the integration issues involved in coupling research systems to existing game code, but it is now time to redesign the interface to more accurately match the flexibility of the perceptual technologies, the pace of play, and the need for a game interface to be fluid and fun.

Work is already underway to replace the FSM framework with a production system that can propagate time-varying constraints against recent perceptual events to generate parses; this should alleviate much of the brittleness of the Ogg That There implementation (a sketch of one such production appears below). Unfortunately, the state of the art in production systems falls short of what we would like to have for this project. However, even if we are not able to break away from the need for grammars, it should be straightforward to support a wider array of possible grammars as well as to recover from simple speech recognition failures. Utterances and gestures that fall outside the scope of the implemented grammars will have to be handled outside the system; some ideas for this special handling are explored below.

Ogg That There doesn't make use of the machinery in Chapter 2 except as a means to increase tracking performance. An Interpretor that made more elaborate use of gesture could provide a much richer interface. Building a gestural language system is a popular interface technique, but it requires users to be trained and distances the user from the control task by interposing a symbolic layer.
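To make the production idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the Event format, the two-second window, and the rule itself are placeholders for whatever the speech and gesture recognizers actually emit.

import time
from dataclasses import dataclass, field

@dataclass
class Event:
    """A timestamped perceptual event (speech token or deictic gesture)."""
    kind: str       # e.g. "word" or "deixis"
    value: object   # recognized word, or the pointed-at map coordinate
    t: float = field(default_factory=time.time)

@dataclass
class Production:
    """Fires when every pattern matches a distinct event inside `window`."""
    patterns: list    # predicates over Event
    action: object    # callable building a command frame from matched events
    window: float = 2.0

    def try_fire(self, events):
        now = time.time()
        recent = [e for e in events if now - e.t <= self.window]
        matched = []
        for pattern in self.patterns:
            hit = next((e for e in recent
                        if pattern(e) and all(e is not m for m in matched)),
                       None)
            if hit is None:
                return None      # a constraint is unsatisfied; rule does not fire
            matched.append(hit)
        return self.action(matched)

# Hypothetical rule: a ship name, the word "attack", and a deictic gesture
# arriving in any order within two seconds yield an imperative command.
attack_rule = Production(
    patterns=[
        lambda e: e.kind == "word" and e.value in {"F0", "F1", "F2"},
        lambda e: e.kind == "word" and e.value == "attack",
        lambda e: e.kind == "deixis",
    ],
    action=lambda m: {"robot": m[0].value, "verb": "attack",
                      "target": m[2].value},
)

Each new recognizer event would be appended to a rolling buffer and every production given a chance to fire. Because matching is done over a window of recent events rather than against a single FSM state, a dropped or reordered token need not derail the entire parse.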

Figure 3.4: A situation where the feint path might be provided by the user gesturally even though the robot is not explicitly programmed to execute feints.
[figure: figs/feint]

The innovations-based representations for behavior described in Section 2.2.3, combined with the embodied, physics-based nature of the Netrek robots, present a possibility for non-symbolic communication with the robots. Innovations streams, or parametric models of the innovations, could be provided to the robots as a control strategy to be layered on top of the current task. These controls can be thought of as non-verbal adverbs that would be difficult or impossible to convey verbally. Figure 3.4 illustrates a possible situation where the user may want a carrier, F0, to execute a feint toward Capella before hitting the real target, Indi. Human teammates might type to F0, ``Take Ind, feint at Cap''. If the robot isn't explicitly coded to execute feints (or if the human player doesn't know the word feint), then this symbolic strategy will fail.
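By contrast, an innovations stream needs no shared vocabulary. The following sketch is hypothetical Python: base_control stands in for the robots' existing navigation skill, and the perturbation vectors would be derived from a model of the user's gestured path.

import numpy as np

def base_control(pos, goal, max_speed=1.0):
    """Stand-in for the existing navigation skill: head straight for the goal."""
    d = np.asarray(goal, dtype=float) - np.asarray(pos, dtype=float)
    dist = np.linalg.norm(d)
    if dist < 1e-6:
        return np.zeros(2)
    return d / dist * min(max_speed, dist)

class InnovationsOverlay:
    """Layers a user-sketched innovations stream on top of the base control.

    `innovations` is a sequence of 2-D perturbation vectors, e.g. resampled
    from the user's gestured feint path; each control step consumes one
    perturbation, so the influence fades as the stream runs out."""

    def __init__(self, innovations, gain=1.0):
        self.stream = list(innovations)
        self.gain = gain

    def control(self, pos, goal):
        u = base_control(pos, goal)
        if self.stream:
            u = u + self.gain * np.asarray(self.stream.pop(0), dtype=float)
        return u

# A feint: sideways perturbations that pull toward Capella, then release,
# while the underlying task (attack Indi) never changes.
feint = InnovationsOverlay([(0.0, 0.8), (0.0, 0.6), (0.0, 0.3)])

The feint is never named symbolically: the gesture is resampled into perturbations that temporarily bend the nominal path toward Capella, and when the stream runs dry the ship falls back to its real target.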

Similar strategies, with appropriate intermediate representations, may also be possible for the audio modality. A command ``F1, get in there!'' said in a staccato, high-energy way might bias F1 toward higher speeds and maximal accelerations even if the system were only able to recognize the word ``F1'' (or perhaps not even that, if the viewport is on F1 or the user is gesturing toward F1). This channel may end up being somewhat more symbolic, since the feature space of speech is so different from the control space of the robots: an analysis system might recognize agitated speech and generate a symbol representing agitation that the Interpretor could choose to pass on to one or more robots.
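A sketch of what that analysis might look like, with the caveat that the features and thresholds below are arbitrary stand-ins for a real prosodic front end:

import numpy as np

def agitation_symbol(frame, rate=16000, energy_thresh=0.02, pitch_thresh=180.0):
    """Crude prosody check on one audio frame (a float array of a few
    hundred samples): high energy plus high pitch -> 'agitated' symbol.
    The thresholds are placeholders, not tuned values."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    energy = float(np.mean(frame ** 2))
    # autocorrelation pitch estimate over a plausible F0 range (80-400 Hz)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = rate // 400, rate // 80
    lag = lo + int(np.argmax(ac[lo:hi]))
    pitch = rate / lag
    if energy > energy_thresh and pitch > pitch_thresh:
        return {"symbol": "agitated", "energy": energy, "pitch": pitch}
    return None

def apply_agitation(limits, symbol):
    """A robot receiving the symbol raises its speed/acceleration ceilings."""
    if symbol is not None and symbol["symbol"] == "agitated":
        limits["max_speed"] *= 1.5
        limits["max_accel"] *= 1.5
    return limits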

Figure 3.5: Modification of the navigation motor skill can effect a warping of space for any code that uses the low-level skill.
[figures: figs/newtonian, figs/einsteinian]

While the preceding discussion is hypothetical, one modification to the Robot motor skill code has already effected qualitative changes in the robots' behavior without the need to modify the existing programs that use that skill. This functionality is not used in Ogg That There. It allows the Interpretor to specify a warping of space that affects how the low-level navigation skill expresses itself. Figure 3.5 illustrates an example of how a ship might avoid an area indicated as dangerous on its way to a target. This communication channel is probably more useful for global influences, as opposed to the more local, control-level example of the feint described above.
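One plausible implementation of the warp, sketched in the same hypothetical style as above, is a repulsive term added inside the low-level skill, so that every behavior built on top of navigate() inherits the user's notion of dangerous space:

import numpy as np

def warp_gradient(pos, dangers):
    """Repulsion from user-marked danger regions; each danger is a
    (center, radius, strength) triple."""
    g = np.zeros(2)
    for center, radius, strength in dangers:
        d = np.asarray(pos, dtype=float) - np.asarray(center, dtype=float)
        dist = np.linalg.norm(d)
        if 1e-6 < dist < radius:
            # push outward, growing stronger toward the center of the region
            g += strength * (1.0 / dist - 1.0 / radius) * (d / dist)
    return g

def navigate(pos, goal, dangers, max_speed=1.0):
    """The low-level skill with the warp layered in; every caller of
    navigate() avoids the marked regions without being modified itself."""
    u = np.asarray(goal, dtype=float) - np.asarray(pos, dtype=float)
    dist = np.linalg.norm(u)
    if dist > 1e-6:
        u = u / dist * min(max_speed, dist)
    u = u + warp_gradient(pos, dangers)
    speed = np.linalg.norm(u)
    if speed > max_speed:
        u = u / speed * max_speed
    return u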

Figure 3.6: Modification of the action-selection algorithm to include user-supplied weights. In this example the ship may choose to attack a different target because the most desirable target lies in an area deemed undesirable by the user.
[figures: figs/newtonian2, figs/einsteinian2]

An even more indirect, global, non-symbolic influence involves modifying the way the robot decides what action to take. Ogg That There overrides the action-selection algorithm completely, so either the robot is making its own decisions or it is following imperative commands from the user. There is no way to simply bias the robot's decisions in a certain direction. Figure 3.6 shows a possible scenario where a robot chooses a different target depending on the presence of a bias indicating a difference in the value of the targets as perceived by the user.
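Such a bias might reduce to a multiplicative weight on the robot's own utility estimates, as in this hypothetical sketch (the distance-discounted utility is a stand-in for whatever heuristic the real action-selection code uses):

import numpy as np

def select_target(ship_pos, targets, user_weight):
    """Biased action selection: `targets` is a list of (pos, value) pairs,
    and user_weight(pos) is a multiplier encoding the user's opinion of a
    region (1.0 = no opinion). The robot still chooses; the bias only
    tilts the scores."""
    ship_pos = np.asarray(ship_pos, dtype=float)
    best, best_score = None, float("-inf")
    for pos, value in targets:
        dist = np.linalg.norm(np.asarray(pos, dtype=float) - ship_pos)
        utility = value / (1.0 + dist)        # stand-in desirability heuristic
        score = utility * user_weight(pos)    # user bias layered on top
        if score > best_score:
            best, best_score = pos, score
    return best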

The current action-selection implementation is the original code that came with the robots from the Netrek community. It is a rather brittle (not to mention obfuscated) collection of heuristics. The first step toward this sort of interaction with the robots will be the adoption of a more sophisticated action-selection implementation. An obvious choice is to use the current implementation of Bruce Blumberg's reactive behavior architecture, since it has proven itself flexible and is readily available [3].

While this mode provides the least direct control over individual robots, it is important to note that it is also a mechanism for specifying high-level strategic goals and hazards that can affect many robots at once. Moving away from the imperative command structure toward a method for specifying abstract goals will increase the ability of the interface to keep pace with the game. Deictics will undoubtedly be important for specifying these goals, but natural specification of the polarity and severity to be associated with the demonstrated region will probably rely on stylistic attributes of the deictic and accompanying voice events. That makes this class of communication another interesting challenge.
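A deictic annotation of this kind might be represented as simply as the sketch below, which folds the demonstrated region, its polarity, and its severity into the user_weight() bias of the earlier action-selection sketch (names, again, are hypothetical):

import numpy as np
from dataclasses import dataclass

@dataclass
class RegionGoal:
    """A strategic annotation: where the user pointed, whether the region
    is attractive (+1) or hazardous (-1), and how strongly the user feels
    about it (severity in 0..1, estimated from the style of the deictic
    and the accompanying voice events)."""
    center: tuple
    radius: float
    polarity: int
    severity: float

def weight_field(goals):
    """Fold all annotations into the user_weight() bias consumed by the
    action-selection sketch above."""
    def user_weight(pos):
        w = 1.0
        for g in goals:
            if np.linalg.norm(np.asarray(pos, dtype=float)
                              - np.asarray(g.center, dtype=float)) < g.radius:
                w *= 1.0 + g.polarity * g.severity  # boost or suppress
        return max(w, 0.05)  # never drive a score fully to zero
    return user_weight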

All of these possibilities have a theme in common: they attempt to extract content from parts of the user input that are normally ignored by classic user interface techniques like those illustrated in Ogg That There. An extreme example illustrates the point: imagine that the user foresees imminent disaster. The user does not have time to communicate in a lucid fashion, but given the desperation of the situation, they are likely to try anyway. Classical interfaces would experience speech and gesture recognition failures, and would either give up or, in the most advanced case, ask the user a leading question. This is exactly the wrong response. There are probably only a few bits of information present in the user's desperate squeaking, but they are very important bits: ``look out!'' The only kind of process that is going to recover these bits is one that attends to the nature of the signals: the energy and pitch of the voice, and the style (in an innovations-based, statistical sense) of the gesticulations.

