His interaction with the doll was interesting. He picked up on the method of interaction quickly: when a clip played, he picked up the appropriate doll and oriented it to the system receiver without much instruction. He became immediately curious about the embedded technology, and his attention shifted to how the dolls did what they did. The infrared communication intrigued him, and at one point he disassembled a doll to investigate the internal hardware in search of the iRX board.
Most of the surprise clips confused him, or seemed to him to match other labels in the interface, because additional emotions, mostly happy, were also shown immediately before or just after the expression of surprise. This led us to question whether to make the clips shorter or to focus them on the salient expression displayed.
The clip content became an interesting topic of discussion with the development team and his mother. She was helpful in pointing out the dichotomy between the audio content and the visual content. Her background in linguistics may have informed her observation that the scripts were developed before the animators created the visual content. She understood the challenge of capturing a visual representation of the emotion, while encapsulating the audio content, without including multiple characters, secondary emotions, or adjoining scenes. Because the audio tended to merge characters and scenes, she suggested that the entire scene be included in one clip or that the audio be removed altogether.
The disjointed way different movie clips were shown one after another also concerned her. Because the clips were subsets of a whole movie, a child who had seen the video before might be distracted by the familiarity of the footage and try to recall where the scene appeared in the original film or what should come next. Her comments motivated us to develop the story-based mode mentioned earlier. Additionally, the clips were reviewed for content from both the visual and the audio perspective.
Something happened during their visit that surprised me. While we were setting up the system, cable television was showing on the big screen used to project the application. When adult programming aired, the boy showed no interest in what was playing and preferred to draw on the white board in the room. However, when a children's program or animation came on the screen, his attention immediately focused on it. This illustrated to me how children's programming, with its music and color, can engage children's attention in a way that adult programming, with adult dialogue, might not.
Interaction with the doll appeared to confuse him at first, but after several demonstrations he was in full control. Initially, he oriented the face of the doll towards his own face instead of pointing the head of the doll towards the system. It was surprising that he did not focus on the doll peripherals like the boy from the previous visit had.
What appeared to frustrate this boy the most was that the clips stopped playing; this was when the online guide would prompt the child to match the emotion by picking a doll. He wanted the animation to continue for longer than a few seconds and kept saying, "Now, don't stop the clip," at the beginning of each new clip. This observation supported the engaging and motivating quality of the videos in the screen interface and of the doll interaction, while pointing out that the application interface might also be a frustrating deterrent.
In an effort to engage his sister, we encouraged her to play with her brother and suggested a new game. Each child was given two dolls. Together they were to guess the emotion displayed on the screen and then identify who had the matching doll representing that emotion. If he had the matching doll, she would tell him to orient his doll towards the system to indicate their selection. This engaged both of them for a short time.
Their mother commented on the content of the clips, noticing that many of them contained complex representations of the basic emotional expression. She also commented that Disney's earlier animations displayed purer representations of emotion than its more recent movies, such as The Little Mermaid.
While visiting the lab that day, he interacted with 'Swamped' (Johnson 99). His motor coordination limited the number of chicken gestures he could direct, but he nevertheless seemed happily engaged. At times he appeared over-aroused by the music, color, and animations on the screen, jumping up and down and shrieking without any correlated affect. Later, his parents asked him what he liked best during his visit to the Media Lab, and to our surprise, he said he preferred ASQ; I thought he would have picked 'Swamped,' given how enthusiastically he gestured while playing with the plush toy chicken.
These few pre-pilot tests were conducted while the system was in development. The children's interactions pointed out key aspects of the system. The dolls and animated clip segments engaged the children. As suspected, the orientation of the doll to the system was difficult for the children to understand at first, but after a few demonstrations they understood what to do. Of all the emotional expressions in the clips, surprise was the most difficult for these children to pick.
What was most surprising was how ASQ naturally included a play mode for social interaction. While not part of the initial design, this serendipitous finding added a new multi-person dimension to the way the system interaction could take place. The parents' participation and comments were helpful, and some of their suggestions were incorporated into the application design.
To participate in the pilot study, children needed to come to the center to play with ASQ on at least three days, for sessions of up to one hour. Afternoon time slots were the most convenient for families to schedule, which constrained the number of participating families during the two-week time frame of this pilot. Fortunately, the families that participated represented a broad range of children along the PDD spectrum. Eight children began the study; however, only six continued for the three days and are included in the observational findings.
Two children who had planned to be part of the three-day pilot dropped out after the first day. One child required a behavioral practitioner in the room during the session due to non-compliance; after his first visit, his mother preferred that he spend his time in behavioral therapy instead. The second child accurately selected the correct doll for each clip during his first visit, and his mother, thinking that interacting with the application was not the best use of his time, withdrew him.
Response Measures
Figure 29: Child Testing Profile 1
Figure 30: Child Testing Profile 2
The screen interface for the child received positive appraisal from the staff and the children's parents; many families requested a copy to take home with them. The video animation clips absorbed the attention of all but four of the nineteen children; those four cases are discussed next.
One six-year-old child, treated for behavioral compliance, exhibited more interest in tearing the room apart than in interacting with the system. In contrast, another boy, five years old (Subject 2) and known to be mild mannered, cried and wanted to leave the room on his second- and third-day visits. Another child, seven years old and thought to be technologically savvy by his parents, displayed minimal interest in the video clips; he inspected the dolls, but showed more interest in a toy phone than in the dolls and video clips. The youngest child to play with the system was a 19-month-old boy recently diagnosed with autism. He paid little attention to the dolls, video clips, or other objects in the room. His parents tried to help him touch the dolls and look at the screen; nevertheless, his eye gaze never focused on any of the application elements.
Young children showed interest in the system, yet they were unable to master the interface without their parents' help during the first session. This was expected; in fact, David Lubin thought it would take up to seven days to teach all the children to use the dolls. Surprisingly, the children were able to learn the doll interaction on the first day. Two low-functioning autistic children, between the ages of 2 and 3, engaged with the video clips yet displayed little interest in the doll interface without direct assistance. One boy, age 4, demonstrated an understanding of the interaction, although he struggled to match the appropriate doll. Another boy, age five, appeared to understand the interaction, yet had such a soft touch that he required assistance in touching the doll so that the system could detect his selection.
A three-year-old child whose native language was Spanish appeared very interested in the application despite the language difference. He and his family were visiting from Chile, and together they played with ASQ for one hour. Two visiting neurologists from Argentina sat in on the child's session. Earlier, they had expressed their opinion that the screen interface had too many images (referring to the icon, the word, and the dwarf's face) and was cluttered, and they did not believe a child could be engaged by the application because of the visual noise on the screen. They also thought that the dolls were not a good way for the child to select the correct emotion. After they saw this boy interact with the application, both the physicians and the boy's parents were surprised at his quick adaptation to the doll interface and his ability to recognize the emotions, despite the screen interface. His parents also requested a copy of the application to take home with them.
As suspected, higher functioning and older children, ages 6-9, easily understood the doll interaction, took pleasure in the gaming aspect, and needed few of the helpful screen cues to make their selections. They were able to match the emotional expressions displayed on the screen by selecting the correct doll after only a few sessions. One boy mimicked the emotions displayed on the screen (Subject 3). His mother reported that he was also able to recognize other people's emotional expressions at home.
[Table: Subject 4 trial counts — days, sessions per day, and trials per session (sessions summed per day) for Day 1, Day 2, Day 3, and Totals]
These graphs show a sample set of responses for one child, along with an explanation of the graph illustrations.
The first table presents the number of clips (trials) for each child. The table breaks the data down by day, sessions per day, totals for each day, and the displayed emotions per session. The table data is useful for interpreting the graphs because sessions differed for each child in the number of trials and in the randomly generated emotions presented.
The first Performance graph illustrates the child's success in matching the four emotions based on fluency, accuracy, and rate/Rmax measures. Fluency, the uppermost line in the graph, represents the combined accuracy and rate of response for matching the correct doll across all emotions. Rmax is the maximum achievable rate, corresponding to the minimum time a child could take to select the correct doll. Each measure is optimal when the lower two lines approach 1, or the upper line approaches 2, on the y-axis. Rate, shown in the second graph, reflects the average time a child took to select a doll; to represent that number in seconds, the inverse (1/rate) was plotted for each session.
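To make the arithmetic concrete, the minimal sketch below combines these session measures. The trial data, variable names, and Rmax value are hypothetical illustrations under the definitions above, not the study's actual implementation:

```python
# Minimal sketch of the session measures described above. The trial
# data, field names, and Rmax value are assumed; only the arithmetic
# (fluency = accuracy + rate/Rmax) follows the text.

# Each trial: (was the first attempt correct?, seconds to pick the correct doll)
trials = [(True, 2.0), (False, 5.0), (True, 1.5), (True, 2.5)]

# Assumed maximum achievable rate: the inverse of the minimum time a
# child could take to select the correct doll (here, one second).
R_MAX = 1.0

accuracy = sum(correct for correct, _ in trials) / len(trials)  # fraction in [0, 1]

mean_seconds = sum(secs for _, secs in trials) / len(trials)
rate = 1.0 / mean_seconds          # selections per second; 1/rate gives seconds

fluency = accuracy + rate / R_MAX  # optimal sessions approach 2 on the y-axis

print(f"accuracy={accuracy:.2f}  rate={rate:.2f}/s  fluency={fluency:.2f}")
```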
The recognition graphs show the child’s improvement over the three days, especially for angry and sad. For happy, one of the clips was not responded to on the third day, which is indicated by the flat slope at the second attempt to choose the doll. Overall, however, Subject 4 showed improvement for matching all four emotions. The Emotion Recognition graph combines all three days, and shows that the child recognized happy best, followed by sad, followed by surprise, and then angry.
Subject 2 was diagnosed with the most severe case of autism. He was five years old at the time of the pilot. He had developmental skill and social play deficits, as well as developmental delays with fine motor perceptual problems, which might explain his difficulty with the doll interface. He came to Dan Marino for behavioral and language therapy. According to his parents, their son rarely showed interest in televisions or computers, and they were curious about how he might react to this type of application. The first day he came with his mother. He calmly played with the application, although he needed his mother's help touching the dolls, and he appeared to enjoy the interaction with the dolls and video clips without being bothered that the application ran from a computer. This surprised his mother. He was described as mild tempered and compliant, yet with ASQ he showed distress on his second- and third-day visits. During his second visit he began playing with the application without a problem, but after five minutes he started to cry. His father thought he might be resisting 'work,' referring to behavioral therapy. His third and last day started with heavy sobs and expressive offense at the environment; he lifted his shirt up, which according to his parents communicates his dislike for something. To try to understand what was troubling him, as well as to calm him, the dolls and then the computer were removed from his view, but there was no apparent change in his behavior, and it was never clear what caused his distress. A child behavior specialist suggested that something might have happened after his first visit that created a negative association, perhaps causing nightmares that night, which could explain the way he reacted.
Subject 3 had a lot of personality. He was seven years old at the time of the pilot. He was diagnosed as a high-functioning autistic boy with echolalia -- a tendency to repeat things with exact wording and intonation -- and attention deficit disorder. He had poor social interaction, gaze monitoring, and social affect, and was treated for behavior at the Dan Marino Center. He learned the interaction of the system quickly. He appeared intellectually bright and, from the observations, could select emotions in the video clips with short latency. Clips that had been shown previously lost his attention. Overall, his performance was high, and he rarely needed prompts or assistance to select the correct doll.
Subject 4 was the youngest subject; he was three and a half years old. He had little social affective communication, was non-verbal, and had moderate autism. His play was the most promising, and he was thought to benefit most from ASQ. The graphs illustrate his ability to match sad and happy within four attempts. The emotions of surprise and angry were more difficult for him to match. With angry, he averaged only 60% accuracy even after picking up as many as four dolls. A child would be expected to reach nearly 100% accuracy by the fourth attempt if he tried each doll once before correctly matching the appropriate one.
Subject 5 was diagnosed with Asperger syndrome and treated for language therapy at the Dan Marino Center. He was six years old at the time of the pilot. He learned the interaction with the dolls quickly and was social while he interacted with the system. He appeared to know most of the emotion expressions in the clips. For those that caused him confusion, he was able to guess correctly on the second attempt because of the guide's verbal prompts. More often, he selected a doll before the guide or practitioner prompted him.
Subject 6 was the oldest child. He turned nine on his third-day visit. He was diagnosed with autism at the age of seven. At that time he had social difficulties, deficits in language and social communication, and repetitive behavior. During the ASQ interaction, he was able to understand how to choose the matching dolls. His emotional understanding was high for three of the four emotions; surprise was the exception. He enjoyed the video clips and suggested content he wanted included in the future. He had a sense of humor while he interacted with the clips and the dolls. If a delay or incorrect prompt occurred in the system, he would say, "he's just kidding," referring to the guide.
Another representation of emotion recognition is shown in the next graph. A child's ability to recognize each emotion category is shown by the curve representing that emotion. As with the last set of individual child graphs, a receiver operating characteristic was used to show the children's progression in selecting a particular emotion. These curves convey the accuracy trend of the children's matching an emotion over multiple attempts at picking dolls. As mentioned earlier, the application design requires the child to select the correct doll based on the emotion in the clip before another clip plays; however, the child may attempt several wrong dolls before touching the right one. For example, when a sad clip was shown, Subject 1 picked the sad doll only 25% of the time on his first attempt, but had picked it 50% of the time by his second attempt. By his third try, he had picked the sad doll 87% of the time. When a curve levels off, it indicates that the attempts were not getting any more accurate: the child picked any doll but the correct one. When one curve lies above another on this graph, the higher curve indicates better performance.
For Subject 4, the graph shows that angry and surprise were difficult emotions to match. All children eventually reached 100%, indicating that each child correctly picked the right doll, which was necessary before the system could advance to another clip. Because there are four dolls, a child can make at most three incorrect attempts before matching the emotion by process of elimination. Not selecting the correct doll by the fourth attempt may indicate that the child did not understand that emotion, did not interact with the application appropriately, or was fixated on a prior doll choice and unable to transition to picking a different doll.
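As an illustration of how such a curve is tabulated, here is a minimal sketch under assumed inputs; the function name and data format are hypothetical, and only the logic described above (the fraction of trials matched by the nth attempt) is taken from the text:

```python
# Minimal sketch of the cumulative attempt-accuracy curves described
# above. The input format and names are assumed; only the tabulation
# (fraction of trials matched by the nth attempt) follows the text.

def cumulative_accuracy(attempts_needed, max_attempts=4):
    """attempts_needed: for each trial, the attempt (1-4) on which the
    correct doll was picked. Returns the cumulative fraction per attempt."""
    total = len(attempts_needed)
    return [sum(1 for a in attempts_needed if a <= n) / total
            for n in range(1, max_attempts + 1)]

# Example shaped like Subject 1's sad clips: 25% on the first attempt,
# 50% by the second, ~87% by the third, and 100% by the fourth (with
# four dolls, the fourth attempt is correct by process of elimination).
sad_attempts = [1, 1, 2, 2, 3, 3, 3, 4]
print(cumulative_accuracy(sad_attempts))   # [0.25, 0.5, 0.875, 1.0]

# For comparison, random guessing without repeating a doll would give
# 25%, 50%, 75%, 100% across the four attempts.
```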
The first graph shows the children's recognition accuracy for the happy emotion by the first attempt, the second, and so forth, with most of the children matching it perfectly by the third attempt. Angry was the most easily recognized by all children except Subject 4. Sad and surprise appear to have been more difficult emotions for these children. Note that the vertical scales on the four graphs differ, to help the reader view each child's individual lines.
The higher functioning and older children (Subjects 3, 5, and 6) were able to match all emotions by the second attempt, except for sad. The consistency of correct selections on the second try is attributable to the guide prompt telling them to match the 'emotion.' Verbal children were able to use the guide prompt either by waiting for the first prompt or by incorrectly selecting a doll on their first try. More often, if these children were confident in their recognition of the emotion, they would select a doll before the guide's first prompt. This can be seen in the first graph for happy. Most of the older children accurately picked happy on the first attempt and accurately picked angry by the second attempt.