
5. Conclusion


5.1 Summary

This thesis has introduced a novel approach to the design and implementation of avatars, drawing from literature in context analysis and discourse theory. The thesis opened by revisiting the notion of cyberspace as a virtual gathering place for geographically separated people. As motivation, it identified chatting, telecommuting, and gaming as some of the major applications for avatar technology. By presenting examples of current systems, it argued that today’s avatars merely serve as presence indicators rather than actually contributing to the experience of having a face-to-face conversation. To explain the important communicative functions of the body, the thesis then reviewed previous research in the social sciences on multi-modal communication. Finally, the thesis described BodyChat, a system that applies those findings to the automation of communicative behaviors in avatars.

This thesis is more than a presentation of a solution to an engineering problem. It touches on a very important problem concerning embodiment in virtual spaces, namely how to map a person onto that person’s virtual representation. In particular, by discussing the various communicative functions of the body, this work brings up the issue of displaying the spontaneous and involuntary visual cues that are essential for initiating and sustaining a face-to-face conversation. Since the person sitting at the desktop neither shows the appropriate visual cues for the virtual setting nor consciously thinks about them, we need a way to generate them. This work suggests looking at the avatar as a personal conversational agent that monitors the user’s intentions and applies knowledge about social behavior to produce appropriate non-verbal cues.
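The idea of the avatar as a personal conversational agent can be illustrated with a minimal sketch: the system maintains a coarse representation of the user’s conversational intentions and maps it, via simple rules about social behavior, to non-verbal cues the avatar should display. This is an illustrative assumption of how such a mapping might look, not BodyChat’s actual code; all names (`Intention`, `nonverbal_cues`, the cue strings) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Intention:
    wants_to_talk: bool          # user is open to starting a conversation
    engaged_with: Optional[str]  # id of current conversational partner, if any

def nonverbal_cues(intention: Intention, approaching: Optional[str]) -> List[str]:
    """Rule-based selection of involuntary-looking cues (glances, smiles,
    head movements) that the user would not think to produce manually."""
    cues: List[str] = []
    if intention.engaged_with:
        # sustain the ongoing face-to-face conversation
        cues += ["face_partner", "nod_on_backchannel"]
    elif approaching and intention.wants_to_talk:
        # signals that invite the initiation of a conversation
        cues += ["glance_at_approacher", "smile", "raise_eyebrows"]
    elif approaching:
        # polite non-engagement with the approaching avatar
        cues += ["brief_glance", "look_away"]
    return cues
```

The key design point is that the user only expresses a high-level intention (such as willingness to talk); the agent, not the user, decides which visual cues realize it.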

 

5.2 Evaluation

BodyChat is a prototype intended to demonstrate a particular approach to avatar design and implementation. It is not meant as a product ready for distribution and general use, and therefore lacks many of the functions featured in comparable products. However, when comparing the communicative behaviors of avatars across different systems, it is clear that BodyChat starts to fill a vacuum. It presents a new approach that takes avatars from being a mere visual gimmick to being an integral part of a conversation. Although no formal user testing has been performed, reaction to BodyChat has been positive and encouraging, reinforcing the belief that the modeling of autonomous communicative behavior is worthwhile.

Regarding the approach in general, a few limitations should be considered. First, although communicative non-verbal behavior adheres to some general principles, it is far from being fully understood. Any computational model will therefore be relatively simplistic, constraining the available behavior to a limited set of displays devoid of many real-world nuances. This raises concerns about the system’s ability to accurately reflect the user’s intentions under unforeseen circumstances or to resolve ambiguity. If the avatar makes a choice that conflicts with what the user had in mind, reliability is severely undermined and the user is left in an uncomfortable, skeptical state. The balance between autonomy and direct user control remains a delicate issue.

Another consideration is that it is hard to personalize the autonomous behavior and give it a flavor that reflects the distinct character and mood of the user. One solution may be a template of personality traits, filled out for each user, that then affects the manner in which behaviors are executed. However, the dynamic nature and context dependency of these traits pose a major challenge. Again the question is how much autonomy should be built into the avatar and to what extent the user’s direct control carries the character.
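One way to picture such a template is as a small set of scalar traits that scale how a given behavior is executed, leaving the choice of behavior itself to the autonomous rules. The sketch below is a speculative illustration of this idea; the trait names and the scaling formula are assumptions made here, not part of BodyChat.

```python
from dataclasses import dataclass

@dataclass
class PersonalityTemplate:
    expressiveness: float  # 0.0 (reserved) .. 1.0 (animated)
    warmth: float          # 0.0 (distant)  .. 1.0 (friendly)

def execute_smile(template: PersonalityTemplate) -> dict:
    """The same behavior ('smile') rendered in a different manner
    depending on the user's personality template."""
    return {
        "behavior": "smile",
        # a warmer user gets a broader smile
        "amplitude": 0.3 + 0.7 * template.warmth,
        # a more expressive user holds the smile longer
        "duration_s": 0.5 + 1.0 * template.expressiveness,
    }
```

The challenge noted above remains visible even in this toy version: the trait values would really need to vary with context and mood, rather than being fixed numbers filled in once per user.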

 

5.3 Future Directions

Expansion in two areas

The issue of avatar control is far from trivial and presents many interesting problems. As described above, the current work introduces an approach rather than a solution. This invites further research, both to see how well the approach can be applied to more complex situations and how it can be expanded through integration with other methods and devices. The following two sections elaborate on two different aspects of expansion. The first deals with the capabilities of the avatar and the second with the monitoring of the user’s intentions.

Avatar behavior

This thesis only begins to build a repertoire of communicative behaviors, starting with the most essential cues for initiating a conversation. It is important to keep adding to the modeling of conversational phenomena, both by drawing from more of the literature and, perhaps more interestingly, by conducting real-world empirical studies with this domain in mind. Behaviors that involve more than two people should be examined, with attention given to orientation and the spatial formation of group members. The humanoid models in BodyChat are simple and not capable of carrying out detailed, co-articulated movements; in particular, the modeling of the arms and hands needs more work, in conjunction with the expansion of gestural behavior.

User input

An issue that did not receive a dedicated discussion in this work, but is nevertheless important to address, is the way in which the user indicates intentions to the system. BodyChat makes the user point, click and type to give clear signals of intent, but other input methods may allow for more subtle ways. For example, if the system employed real-time speech communication between users, parameters such as intonational markers could be extracted from the speech stream. Although using cameras to map the live image of a user directly onto an avatar is not a good approach, as discussed in section 2.3.4, cameras could still gather important cues about the user’s state. This information could then be used to help construct the representation of the user’s intentions. Other ways of collecting input, such as novel tangible interfaces and methods from affective computing, could also be considered.
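The fusion of several such input channels into one representation of the user’s intentions might be sketched as follows. This is purely a speculative illustration of the idea; the channel names, weights, and threshold are assumptions made for this example and do not come from BodyChat.

```python
def fuse_intention(typed_signal: bool,
                   speech_pitch_rise: bool,
                   camera_gaze_at_screen: bool) -> dict:
    """Combine explicit input (pointing, clicking, typing) with passively
    gathered cues (intonation, camera) into a coarse intention estimate."""
    evidence = 0.0
    if typed_signal:             # explicit signals dominate
        evidence += 0.6
    if speech_pitch_rise:        # e.g. an intonational marker in the speech stream
        evidence += 0.25
    if camera_gaze_at_screen:    # an attention cue gathered by a camera
        evidence += 0.15
    return {"wants_interaction": evidence >= 0.5, "confidence": evidence}
```

A weighting like this reflects the point made above: passively gathered cues would supplement, rather than replace, the clear signals the user gives deliberately.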