Presented at Lifelike Computer Characters, Utah, October, 1996

Why Put An Agent In a Body: The Importance of Communicative Feedback in Human-Humanoid Dialogue

Kristinn R. Thórisson* & Justine Cassell

Gesture & Narrative Language Group

M.I.T. Media Laboratory

{kris, justine}@media.mit.edu

http://www.media.mit.edu/~{kris, justine}

Although many different human characteristics have been put forth as the key to making humanoid agents lifelike (e.g., emotional expression, fluid body movement, face and hand gestures, realistic-looking skin), the young field of synthetic computer characters has seen little research comparing these putatively "most important" characteristics of believable computer characters. Research on the effectiveness of natural-language-based humanoid agent systems, and on the role of believability in such systems, has to date been hampered by the lack of real computer systems capable of sustaining spoken dialogue with a human user. To work around this limitation, researchers have turned to methods such as Wizard-of-Oz techniques, static representations of agents, and keyboard-and-mouse interaction [cf. Maulsby et al. 1993, Thórisson 1992, Hauptman 1989]. As a result, their findings may not generalize to future systems that employ the humanoid agent metaphor and are capable of full-duplex multimodal interaction.

We used a fully automated character generation system, capable of real-time, multimodal, face-to-face interaction with a user [Thórisson 1996], to assess users' reactions to two human characteristics commonly discussed in this context: non-verbal feedback related to the interaction and facial emotional emblems [Ekman 1979]. Specifically, we examined users' attitudes to, and the efficiency of their interaction with, three different humanoids: {1} a content-only character (CT), {2} a content + emotional emblems character (EM), and {3} a content + non-verbal communicative support character (CS). Users' experience of the believability and ease of interaction was assessed with a questionnaire. The efficiency of the interaction was measured by how many times users repeated themselves. We hypothesized that EM and CT would be equal on both measures, but that CS feedback would make a significant difference on both.

The character, represented by a face and a hand, appears on a normal-sized monitor beside a big-screen projector, on which a graphical model of the solar system is displayed. Users can ask the character to take them to each planet and ask it questions about the planets. The character's "sensory organs" are a body-tracking suit and a microphone. The characters in each condition are equally knowledgeable about the solar system, and their verbal responses are equally rapid, but they provide different non-verbal feedback. In the CT condition the character gives only verbal feedback relating to the content of the dialogue. The EM character gives the same verbal feedback and also smiles occasionally when it has finished some action and looks puzzled if it doesn't understand what the user says. The CS character provides the same verbal feedback as CT, with the addition of behaviors relating to the process of dialogue: turning to and/or gazing at the big screen or the user at the right times, giving various non-verbal cues to show when it decides to take the turn (when the user has finished making a request), and making hand gestures that support its utterances (beat gestures and pointing at the planets when speaking). It also blinks, and drums its fingers when its hand is at rest. The experiment was a repeated-measures design with 12 subjects.
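The three conditions described above can be read as nested behavior sets: every character shares the content feedback, and EM and CS each add a different layer on top. The following is a minimal sketch of that reading in Python, not the authors' implementation; all identifiers (CONTENT, EMOTIONAL_EMBLEMS, COMMUNICATIVE_SUPPORT, CONDITIONS, added_behaviors) and the behavior labels are hypothetical names chosen for illustration. Assigning blinking and finger drumming to CS is an assumption based on where that sentence falls in the description.

```python
# Hypothetical encoding of the three experimental conditions.
# Behavior labels paraphrase the text; they are not names from the actual system.

# Feedback shared by all three characters.
CONTENT = frozenset({"verbal_content_feedback"})

# Facial emotional emblems added in the EM condition.
EMOTIONAL_EMBLEMS = frozenset({
    "smile_after_action",
    "puzzled_when_not_understood",
})

# Dialogue-process behaviors added in the CS condition.
# Assumption: blinking and finger drumming belong to CS only.
COMMUNICATIVE_SUPPORT = frozenset({
    "turn_or_gaze_to_screen_or_user",
    "turn_taking_cues",
    "beat_gestures",
    "pointing_at_planets",
    "blinking",
    "finger_drumming_at_rest",
})

CONDITIONS = {
    "CT": CONTENT,
    "EM": CONTENT | EMOTIONAL_EMBLEMS,
    "CS": CONTENT | COMMUNICATIVE_SUPPORT,
}


def added_behaviors(condition, baseline="CT"):
    """Return the behaviors a condition adds on top of the baseline, sorted."""
    return sorted(CONDITIONS[condition] - CONDITIONS[baseline])


if __name__ == "__main__":
    for name in ("EM", "CS"):
        print(name, "adds:", ", ".join(added_behaviors(name)))
```

Written this way, the comparison of interest is explicit: EM and CS differ from CT only in the non-verbal layer, so any difference between conditions cannot be attributed to what the character says.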

Two hypotheses were tested: {H1} No significant differences will be found in (a) ease of interaction and believability or (b) efficiency between the CT and EM conditions. That is, we didn't expect emotional emblems to add anything to the interaction. {H2} We expected to find a significant difference in both (a) ease/believability and (b) efficiency between the CS condition and the other two conditions. In other words, we expected behaviors relating to the process of dialogue to prove significantly more important to the users' acceptance of the character/interaction, as well as to the effectiveness of the dialogue, than either content feedback alone or content feedback and emotional facial displays.

Both hypotheses were confirmed (p < .05), supporting our claim that what really matters in face-to-face dialogue is, in addition to "classical information exchange", the supportive behaviors that often have been dismissed as incidental to effective cooperation [Ochsman & Chapanis 1974]. Should designers of interactive computer agents (co-spatial, co-temporal speech-based interaction) ignore supportive behaviors relating to communication, they are likely to end up with less believable, less effective agents.

References

Ekman, P. (1979). About Brows: Emotional and Conversational Signals. In M. von Cranach, K. Foppa, W. Lepenies & D. Ploog (eds.), Human Ethology, 169-243.

Hauptman, A. G. (1989). Speech and Gestures for Graphic Image Manipulation. Proceedings of SIGCHI '89, May, 241-5.

Maes, P. (1994). Agents that Reduce Work and Information Overload. Communications of the ACM, July, 37(7), 31-40, 146.

Maulsby, D., Greenberg, S. & Mander, R. (1993). Prototyping an Intelligent Agent through Wizard of Oz. Proceedings of InterCHI '93, 277-84, April 24-29, Amsterdam.

Ochsman, R. B. & Chapanis, A. (1974). The Effects of 10 Communication Modes on the Behavior of Teams During Co-operative Problem Solving. Int. J. Man-Machine Studies, 6, 579-619.

Thórisson, K. R. (1993). Dialogue Control in Social Interface Agents. InterCHI Adjunct Proceedings '93, 876-881, April 24-29, Amsterdam.

Thórisson, K. R. (1996). Communicative Humanoids: A Computational Model of Psychosocial Dialogue Skills. Doctoral Dissertation, Massachusetts Institute of Technology, September 1996.


*Now at LEGO A/S, Klovermarken 120, 7190 Billund, Denmark. kris@digi.lego.com