Critique on Walker et al.

Critique on

Using a Human Face in an Interface

by Walker, Sproull, and Subramani (1994)

This is a classic experimental study in psychology, and it is a very good example of this kind of work: the article is very clearly structured (abstract, introduction, questions, methods, results, discussion, and conclusions). Nevertheless, being a good example of the genre doesn't mean that it is a good paper!

Articles of this kind are usually quite cautious about the questions they address, and even more cautious about what the results of the experiment are. Many studies use the extremely formalized standard structure to hide the fact that they have nothing new to say, or that they are irrelevant or unimportant. We will see if this paper is different.

What is this study about?

The gist of the paper is the question of whether an overly human-like interface is bad. Answer: It is not bad at all. Most people just expect too much (they overgeneralize) and are disappointed afterwards.

Let's look at the parts of the paper:

The introduction (pp. 85-86) is impressive: the authors are aware of what has been done in this field. They have done their homework.

Let's move on to the research questions:

Because the authors are very careful, the first question is whether people are willing to take part in such an experiment (p. 86). They give no answer to this question, because it is not a real question. If you agree to be a subject in an experiment, you would never refuse to cooperate for such an unimportant reason, so this is a superfluous question.

The second question is whether a talking face distracts people, and whether their performance is seriously degraded. The authors spend about one sentence on this issue: No, the performance is the same.

The third question is how people experience the interaction with the face, whether it seems human to them, and whether it evokes a social response.

Now we come to the most problematic part of such papers: how to operationalize a question like this. How can one measure "experience"?

The authors propose an interview survey, because "people are quite familiar with the general social structure and form of interview surveys." (p. 86) That is just not true. Most people are NOT used to interviews. On the contrary, they might behave in a quite unpredictable way. Although Walker et al. mention that there is an extensive literature on how the nature of the agent affects people's responses, they do not mention that there is also a lot of literature on how subjects can be led to answer in exactly the way the experimenter would like them to answer. Even more problematic, some interviewers are not aware of their influence on the subjects.

Let's look at the methods:

The subject population is highly problematic. The subjects are not representative at all. Or in other words: What sort of population should they represent? At least they are NOT part of the psychologically best-explored population on this earth: psychology college students!

Task: What do the authors mean by "Subjects were informed of the purpose of the study" (p. 86)? Do the people know that this study is about the effect of a talking face, or do they just think that it is about user satisfaction with the computer support service? The sort of instructions the subjects get is VERY IMPORTANT for the outcome of such a survey!

Procedure: I would like to know what the screen really looks like: the displayed text, the window to type in the answers, the help window, and finally the face. Why isn't there a screenshot? Another thing: I would like to read more about the "scrolling back" feature. If one scrolls back to an earlier question, is the face replayed too?

Apparatus: Why did they use synthesized speech, and not digitized speech? Although DECtalk might be "acceptably comprehensible" (p. 87), it is far from natural-sounding digitized speech. Unless the authors want to see not only the influence of a talking face, but also that of synthesized speech, there is no reason to use a speech synthesizer. Another, related question: If the authors want to measure the effect of a human face in interface design, why don't they use video recordings of real people?

Measuring the difference between their stern and neutral faces only after the actual study looks quite risky to me. Why didn't they evaluate this in advance?

Results: It looks to me like a cell size of around 15 is quite small for an Analysis of Variance (ANOVA). Then again, in college I was taught a lot of tricks for influencing statistical results just by adjusting the cell size and related factors. The general problem with extensive use of ANOVAs is that one easily gets trapped in the details of the combinations of different significant factors. Explaining main effects and significant conditions is tricky, but finding the underlying reasons for such significant differences is simply not possible with this method.
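To make the cell-size concern concrete, here is a minimal sketch of how the F statistic of a one-way ANOVA depends on the number of subjects per cell. The numbers are invented purely for illustration and are not taken from Walker et al.; `one_way_f` and `replicate` are hypothetical helper names, and the sketch uses a simple one-way design rather than the factorial design of the study.

```python
def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups (cells)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the cell means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual scores around their own cell mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


def replicate(groups, times):
    """Repeat every score `times` times: same means and spread pattern, bigger cells."""
    return [g * times for g in groups]


# Made-up scores for two conditions (assumption, not data from the paper).
base = [[4, 5, 6], [5, 6, 7]]

f_small = one_way_f(base)                # 3 subjects per cell
f_large = one_way_f(replicate(base, 5))  # 15 subjects per cell

print(f"F with  3 per cell: {f_small:.2f}")
print(f"F with 15 per cell: {f_large:.2f}")
```

With identical cell means and an identical pattern of scores, F rises from 1.5 to 10.5 simply because the cells are larger. For 1 and 28 degrees of freedom the second value lies above the conventional 5% critical value of roughly 4.2, while the first is nowhere near significance. This is why the cell size needs to be justified, and why a significant F by itself says nothing about the underlying reasons for a difference.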

The interesting question, why people would spend more time, respond more, and respond more carefully to a face that they did not like (p. 89), is addressed only marginally in this paper. The theories they use to explain this effect are well known and on a very basic level, not to say out of date.

The paragraphs that I definitely like best are the last two (p. 90). If a computer is a social actor, what is its role: master, apprentice, or partner? And the most important one: Should human facial realism be a goal? To that final question, they give an answer, which I think is the most important insight I gained from this paper.

"The goal of HCI work with synthetic faces should not necessarily be to give a computer a human face but rather to determine when a face-like interface is appropriate."

Finally, even the last sentence fits perfectly into the tradition of this sort of paper: "Further research is necessary to identify the components of a satisfactory experience with a human-like computer interface." Wasn't this the question they asked at the very beginning of the paper? ;-)

Send me some comments! Stefan Marti

Last updated Feb 16 1998.