Approach

Autonomous Agents and Automatist Storytelling

Decentralized Systems

[1] Resnick, M. Turtles, Termites, and Traffic Jams. MIT Press. 1994. pp. 59-68
The Automatist Storytelling System is an instance of a decentralized system. In Turtles, Termites, and Traffic Jams, Mitchell Resnick describes the operation of "massively parallel microworlds." In the microworld, collections of simple rule-driven entities, or turtles, operate and interact in a controlled environment. Through their interactions with the environment and other turtles, patterns of behavior emerge that are not explicitly represented in the individual turtles' rules. For instance, Resnick describes one example that simulates the food-gathering behavior of ants by "programming" each ant with four simple rules -- none of which explicitly refers to the presence of other ants. When hundreds of such ants are placed in a virtual environment containing food, the ants eventually appear to "work together" to relocate food to the ant colony's nest. [1]
For several reasons, a decentralized approach is particularly well suited to the task of building responsive storytelling systems.

Incorporating the presence of the viewer into a decentralized system is straightforward. The viewer may exert influence over the emergent functionality of the system the same way any other component of the system does, by altering an aspect of the environment or influencing the operation of other components.

In this way, the viewer is a "full-fledged" member of the system and consistently integrated into the experience. This contrasts with the model of hypermedia, where the consistency of viewer interactivity depends on how consistently the author establishes links. When systems are designed by more than one author, or when the base of content grows, maintaining a consistent experience becomes an increasing burden. A carefully designed decentralized system inherently limits this problem.

Autonomous Agents

[2] Maes, P. "Guest Editorial" in Designing Autonomous Agents. North-Holland, 1990, p. 1
Introducing the topic Designing Autonomous Agents, Pattie Maes describes a shift in Artificial Intelligence research from approaches based on "deliberate thinking" and "explicit knowledge" to ones based on "distributedness and decentralization." She notes how these new approaches avoid the "brittleness" and "inflexibility" of the former by using "dynamic interaction with the environment and intrinsic mechanisms to cope with resource limitations and incomplete knowledge." [2]
In this excerpt from that introduction, Maes describes the idea of emergent functionality in systems comprising Autonomous Agents:
[3] Ibid.
One key idea in these new architectures is that of "emergent functionality." The functionality of an agent is viewed as an emergent property of the intensive interaction of the system with its dynamic environment. The specification of the behavior of the agent alone does not explain the functionality that is displayed when the agent is operating. Instead the functionality to a large degree is founded on the properties of the environment.

An important implication of this view is that one cannot simply tell these agents how to achieve a goal. Instead one has to find an interaction loop involving the system and the environment which will converge [towards] the desired goal.

An agent is viewed as a collection of modules which each have their own specific competence. These modules operate autonomously and are solely responsible for [the computation] necessary to achieve their specific competence.

Communication among modules is reduced to a minimum and happens on an information-low level. There is no global internal model, nor is there a global planning activity with one hierarchical goal structure. [The] global behavior of the agent is not necessarily a linear composition of the behaviors of its modules, but instead more complex behavior may emerge by the interaction of the behaviors generated by the individual modules. [3]

[4] Maes. "Situated Agents Can Have Goals" Designing Autonomous Agents. pp. 49-70
In a later article, Maes describes an approach to programming the mechanical behavior of a robot based on autonomous agents. Decisions about what action the robot should take at any given moment are based on an "action selection" algorithm. In this scheme, the "competency modules" are based on specific actions the robot arm can perform. The applicability or usefulness of each action is a function of the current state of the environment. When an action is selected and performed, its invocation alters the environment, thus influencing the selection of future actions. In this way, a sequence of actions -- a plan -- emerges. [4]
Instead of using strict Boolean logic for the action selection algorithm, Maes' system relies on the idea of a spreading activation network. Modules are invoked when their "predecessors" are sufficiently active; once a module is invoked, its activation spreads to its "successors." In addition, modules might inhibit or repress the activation of conflicting modules. In a spreading activation network, selection decisions are made simply by picking the most active module at a particular point in the plan. In this way, the system remains highly decentralized, as the selection criteria are distributed by the effects of spreading activation.
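
The selection scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Maes' algorithm: the module names, the starting activation values, the `spread` and `inhibit` factors, and the rule that a module's activation resets after it is invoked are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    name: str
    activation: float = 0.0
    successors: list = field(default_factory=list)  # modules this one excites
    conflicts: list = field(default_factory=list)   # modules this one inhibits

def select_and_spread(modules, spread=0.5, inhibit=0.5):
    """Invoke the most active module, then spread and inhibit activation."""
    chosen = max(modules.values(), key=lambda m: m.activation)
    amount = chosen.activation
    for name in chosen.successors:
        modules[name].activation += spread * amount
    for name in chosen.conflicts:
        reduced = modules[name].activation - inhibit * amount
        modules[name].activation = max(0.0, reduced)
    chosen.activation = 0.0  # assumption: an invoked module starts over
    return chosen.name

# Hypothetical competency modules for a robot arm.
modules = {
    "pick-up": Module("pick-up", 1.0, successors=["move"], conflicts=["rest"]),
    "move": Module("move", 0.2, successors=["put-down"]),
    "put-down": Module("put-down", 0.1),
    "rest": Module("rest", 0.6),
}

# No module stores the plan; it emerges from repeated local selection.
plan = [select_and_spread(modules) for _ in range(3)]
```

Nothing in the sketch holds the sequence as a whole; each step only compares activation levels, which is the sense in which the selection criteria are distributed.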

Automatist Storytelling

The operation of an Automatist Storytelling System exhibits many of the properties of an Autonomous Agent-based system. In ConTour and Dexter, materials and keywords act as modules with an "internal representation" consisting of a list of associated modules; materials are associated with a set of keywords and conversely, keywords are associated with materials. Both materials and keywords spread activation, when invoked, to their associated modules. The resulting interaction of the spreading activation forms the basis of how materials are selected and sequenced. Thus, the resulting structure of the story is an "emergent property" of the interaction of individual material presentations.
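
As a rough sketch of this two-way association, the following Python fragment spreads activation from a presented material to its keywords and on to the other materials those keywords describe. It is not the ConTour or Dexter implementation: the clip names, the keywords, the unit amount of activation, and the rule of never repeating a material are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical annotations: each material is described by a set of keywords.
material_keywords = {
    "clip_a": {"harbor", "1950s"},
    "clip_b": {"harbor", "tunnel"},
    "clip_c": {"tunnel", "1990s"},
}

# The converse association: each keyword lists the materials it describes.
keyword_materials = defaultdict(set)
for material, keywords in material_keywords.items():
    for keyword in keywords:
        keyword_materials[keyword].add(material)

def present(material, material_act, keyword_act, amount=1.0):
    """Presenting a material activates its keywords, which in turn
    activate the other materials they describe."""
    for keyword in material_keywords[material]:
        keyword_act[keyword] += amount
        for other in keyword_materials[keyword]:
            if other != material:
                material_act[other] += amount

def next_material(material_act, seen):
    """Select the most active material that has not been presented."""
    candidates = {m: a for m, a in material_act.items() if m not in seen}
    return max(candidates, key=candidates.get) if candidates else None

material_act = defaultdict(float)
keyword_act = defaultdict(float)
sequence = ["clip_a"]
while True:
    present(sequence[-1], material_act, keyword_act)
    choice = next_material(material_act, set(sequence))
    if choice is None:
        break
    sequence.append(choice)
```

The resulting order of clips is not stored anywhere in the annotations; it follows from which keywords the presented materials happen to share.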

Although the approach taken in the Automatist Storytelling System closely conforms to the ideas of Autonomous Agents, it differs significantly from previous applications of this methodology to the area of storytelling.

For instance, in Maes' own subsequent work, agents are applied in the following way:

[5] Maes, P. "Artificial Life meets Entertainment: Lifelike Autonomous Agents" in Communications of the ACM. Nov. 1995. V38 No. 11.
Many forms of entertainment employ characters that act in some environment. This is the case for video games, simulation rides, movies, animation, animatronics, theater, puppetry, certain toys and even party lines. Each of these entertainment forms could potentially benefit from the casting of autonomous semi-intelligent agents as entertaining characters. [5]
Thus, research originally developed in the context of coordinating the actions of a robot arm in an industrial environment is translated quite literally to the idea of planning the actions of virtual characters in a fictive environment. Viewers are considered a part of the environment and thus, as in Laurel's ideal, "inside the story."

Describing the operation of one such system designed around the story of The Three Little Pigs, Maes and Rhodes state that:

[6] Maes, P. and Rhodes, B. "The Stage as a Character: Automatic Creation of Acts of God for Dramatic Effect" Presented at the AAAI '95 Spring Symposium on Interactive Story Systems: Plot and Character
In our model, a story emerges from the interaction between discrete, autonomous characters, controlled either by humans or artificial systems. Each character has its own beliefs [and] motivations... [Characters] choose among their possible actions those that most fit their beliefs and motivations at the time. [6]
In these approaches, the process of story construction is viewed as one of generating a sequence of events, or a plot, based on the potential actions of characters with "motivations" while maintaining a global notion of "believability." Under such a scheme, the challenge of constructing "good stories" is a process of creatively expressing a well-formed chain of events.
[7] Minsky, M. The Society of Mind. Simon and Schuster, New York, 1986.
In the Automatist Storytelling System, the fundamental units of structure are not events to be expressed but expressions themselves in the form of discrete units of content, or materials. Instead of characters interacting in an environment which is literally the "story world," individual expressions interact in an environment which is the process of the storytelling. In other words, in the storytelling system, what is simulated is not the story but the process of its telling. As such, the approach draws from theories of association and memory, such as those put forth by Marvin Minsky in The Society of Mind. [7]

Relative Value Systems

[8] Ishizaki, S. and Lokuge, I. "GeoSpace: An Interactive Visualization System for Exploring Complex Information Spaces" ACM SIGCHI '95 Proceedings, Denver, Colorado
My first exposure to the idea of spreading activation networks was seeing a demonstration of the GeoSpace system by Media Lab students Suguru Ishizaki and Ishantha Lokuge. [8] GeoSpace, described by the authors as an "Interactive Visualization System for Exploring Complex Information Spaces," presents a map of Boston and its surrounding cities. When the user requests information about a particular city, say Cambridge, the map gradually animates to emphasize the streets and other information related to that area. If the user then requests information about another city, say Somerville, the system gradually shifts emphasis to the new area while leaving some residual activation on the Cambridge area. In this way, the graphical presentation of information reflects the focus or attention of the user.
In GeoSpace, the presentation is seamless; instead of abruptly switching contexts, the system provides smooth transitions. In addition, the effects of residual traces of activation convey a powerful notion of "context preservation." After making several queries, the display provides a sense of the history of the experience with particular emphasis on the recent past.

Much of the power of this visualization scheme stems from the notion of a normalized system. When a user requests information about Cambridge, activation is injected into that city and spreads to related graphical elements. The system translates the amount of activation into the amount of visual emphasis given to that element -- its size, brightness, and depth. Every activation value is normalized to the total amount of activation present in the system. As in a monetary system, as activation is added to one element, others implicitly devalue. Thus, by gradually increasing the value of one or more elements, the entire system responds to stay in a kind of visual equilibrium.

Thus, a key property of a closed or normalized system of values is that an individual value is only meaningful with respect to the larger "containing" system. This principle, which might be called a Relative Value System, forms the basis of the ConTour interface and is also used for the "Materials Listing" component of the Dexter interface.
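
The normalization described above can be illustrated with a short Python sketch. It is not GeoSpace's actual computation: the `decay` factor used to model residual activation, the unit injection amount, and the function names are assumptions. Only the idea that emphasis equals an element's share of the total activation is taken from the description.

```python
def inject(activations, element, amount=1.0, decay=0.5):
    """Decay existing activation (assumed), then add activation to one element."""
    updated = {name: value * decay for name, value in activations.items()}
    updated[element] = updated.get(element, 0.0) + amount
    return updated

def emphasis(activations):
    """Express each value as a share of the total activation in the system."""
    total = sum(activations.values())
    if total == 0:
        return {name: 0.0 for name in activations}
    return {name: value / total for name, value in activations.items()}

state = inject({}, "Cambridge")
after_first = emphasis(state)    # Cambridge holds all of the emphasis

state = inject(state, "Somerville")
after_second = emphasis(state)   # Cambridge keeps a smaller, residual share
```

After the second query, Cambridge's share falls to one third even though nothing acted on it directly. Its raw activation matters only in relation to the total, which is the property the term Relative Value System refers to.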

Simple Keyword Representations

Both Dexter and ConTour rely on a relatively simple keyword-based approach to representing their story content.

One if by Clip, Two if by Stream

Schemes for describing video content to a computer have been the subject of research since the dawn of random-access (and now fully digital) video. Approaches may be generally placed in one of two camps: stream-based or clip-based.
[9] Davis, M. "Media Streams: Representing Video for Retrieval and Repurposing" MIT PhD Thesis. 1995.
In a stream-based representation, such as Marc Davis' Media Streams [9], the temporal nature of video is explicitly incorporated into the representation. Descriptions, in the form of keywords or, in Davis' case, a set of several hundred icons, are applied to some duration of the video "stream" in order to describe it.
If one imagines the sum of available video in a stream-based system to be a single stream, then a particular description may be thought of as simply a set of durations on this larger stream.
[10] Evans, R. "LogBoy Meets FilterGirl: A Toolkit for Multivariant Movies" MIT MS Thesis. 1994.
In a clip-based system, video is broken into discrete chunks or clips. Description is then applied to the entire clip. An example of a clip-based system is Ryan Evans' LogBoy. [10] In this system, each clip is described by sets of slot-value pairings such as "Location: Trees" or "Character: Darcy."
The benefit of stream-based systems of representation is the fact that descriptions exist as independent overlapping layers. The start and stop points for each descriptor do not have to match any notion of a clip boundary. In a stream-based system, one could precisely represent a scene with two characters where each character enters and exits the frame independently. For this reason, stream-based systems are particularly well suited to the task of "low-level" editing, where the video content is considered relatively "raw" and a precise level of description is required to make in and out point decisions.
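
A stream-based description can be pictured as a list of independent time intervals laid over one stream. The Python sketch below is a minimal illustration of that idea and does not reproduce the Media Streams representation; the `Annotation` type, the descriptor strings, and the timings are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    descriptor: str
    start: float  # seconds from the start of the stream
    end: float

# Two characters enter and exit the frame independently of each other.
stream = [
    Annotation("first character", 0.0, 40.0),
    Annotation("second character", 25.0, 60.0),
    Annotation("trees", 0.0, 60.0),
]

def descriptors_at(annotations, t):
    """Return every descriptor whose interval covers time t."""
    return sorted(a.descriptor for a in annotations if a.start <= t < a.end)

def overlap(a, b):
    """Return the span where two descriptors hold at once, or None."""
    start, end = max(a.start, b.start), min(a.end, b.end)
    return (start, end) if start < end else None

both_on_screen = overlap(stream[0], stream[1])
visible_at_30 = descriptors_at(stream, 30.0)
```

Because each interval has its own start and stop points, the span where both characters share the frame can be computed directly, without reference to any clip boundary.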

Clip-based annotation, in contrast, provides a further level of control to authors by placing them in the position of structuring the story materials into functional units. Clip-based schemes seem to match the basic filmmaking process of refining "raw footage" by picking out useful or "good shots." Typically, these shots function as a unit; a description scheme that treats them as such is often sufficient. In addition, clip-based annotation may be employed for materials that aren't explicitly temporal such as still images and text documents.

Representation in ConTour and Dexter

In ConTour and Dexter, content is treated as discrete chunks each described by a set of keywords. The relationship between a unit of content and a keyword is neither weighted nor qualified by any notion of "slot" or type. Instead, all weighting occurs only as a function of the presentation; the essential representation remains quite simple.

The granularity of each unit of content is a key issue. Each piece must in some sense be self-contained and coherent on its own. On the other hand, due to the simplicity of the representation, a given chunk should only be about one particular set of things. In other words, each piece should form a kind of story phrase, complete enough to be coherent, yet not covering too large a range of ideas. Ultimately, the pieces must lend themselves to being dynamically edited together with other related pieces.

In the story applications described in this thesis, video clips tend to be between thirty seconds and two minutes in duration.

In ConTour and Dexter, discrete units of content (materials) are described by units of description (keywords).

By connecting a material to a keyword, the author forms a potential link to other materials that are described by that keyword.

Deferred Sequencing and Extensibility

The essential function of keywords in both ConTour and Dexter is to isolate authors from the process of defining explicit relationships or links between units of content. Instead, the author connects materials only to keywords. By connecting a material to a keyword, the author defines a potential kind of connection between the material and others that share that keyword. By connecting each material to a set of keywords, the author enables a material to be related to other materials in more than one way.

Because there are no explicit links, sequencing decisions are made during the viewing experience, based on implicit connections via keywords. Deferring sequencing decisions in this way has two consequences. First, the base of content is truly extensible: every new material is simply described by keywords, rather than hardwired to every other relevant material in the system. In this way, the work of adding content, which would otherwise grow with the number of materials already in the system, stays roughly constant per addition. Second, because sequencing decisions aren't pre-coded, viewers may play a more active role in the construction of the experience. Instead of following pre-determined links bound to a specific purpose or organizational scheme, viewers may influence how they move from one material to the next.
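
The extensibility argument can be made concrete with a small Python sketch. It assumes a library stored as a mapping from material names to keyword sets; the names `library`, `related`, and `add_material`, along with the sample clips and keywords, are hypothetical and not drawn from ConTour or Dexter.

```python
library = {
    "clip_a": {"harbor", "1950s"},
    "clip_b": {"harbor", "tunnel"},
}

def related(library, material):
    """Find potential connections at viewing time: the materials that
    share at least one keyword, with the keywords they share."""
    keywords = library[material]
    return {
        other: keywords & other_keywords
        for other, other_keywords in library.items()
        if other != material and keywords & other_keywords
    }

def add_material(library, name, keywords):
    """Authoring step: describe the new material and touch nothing else."""
    library[name] = set(keywords)

before = related(library, "clip_b")
add_material(library, "clip_c", {"tunnel", "1990s"})
after = related(library, "clip_b")
```

Adding `clip_c` required no edit to `clip_b`, yet `clip_b` gains a potential connection through the shared keyword. The author's effort depends on the number of keywords applied, not on the number of materials already present.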

Hierarchies of Description

One feature of the representation scheme used by ConTour is that keywords themselves may be "described" by other keywords. In Boston: Renewed Vistas, this facility is used to group keywords into four "meta-keyword" categories: person, location, time, and theme. Although this facility isn't explicitly available in Dexter, a similar conceptual organization exists for the Random Walk keywords.

Materials in these stories are described by a set of particular people, places, times, and themes. As far as the system is concerned, however, the categorizations are irrelevant. Materials may be connected to more than one keyword within the same category, which is useful, for instance, when a character in the present recalls an event in the past, or when the material addresses several themes simultaneously. Likewise, materials need not be associated with a particular category's keywords at all, which is useful for "general" clips not necessarily tied to a specific time, place, or character's voice.

By using a keyword hierarchy as opposed to an explicit notion of slot (such as clips having "person slots" and "location slots") the representation is kept as simple as possible. A key point in both ConTour and Dexter is that it's not the representation, but what's done with the representation that's interesting and powerful. Furthermore, isolating elements of description in this way is in keeping with a decentralized approach; in this case, slots would needlessly bind components of the description together and disallow their potential usefulness as independent entities.
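
One way to picture a keyword hierarchy without slots is to store every description, whether of a material or of a keyword, as the same kind of association. The Python sketch below assumes this single-mapping layout; the four category names come from the text, while the clip names, the keywords, and the `group_by_category` helper are hypothetical.

```python
# Every description is the same kind of association: item -> set of keywords.
described_by = {
    # Materials described by keywords.
    "clip_a": {"narrator", "harbor", "1950s", "renewal"},
    "clip_b": {"1950s", "1990s", "renewal"},  # two keywords in one category
    "clip_c": {"renewal"},                     # a "general" clip
    # Keywords described by meta-keywords.
    "narrator": {"person"},
    "harbor": {"location"},
    "1950s": {"time"},
    "1990s": {"time"},
    "renewal": {"theme"},
}

def group_by_category(item):
    """Group an item's keywords under their meta-keywords. The grouping
    is derived on demand; nothing in the data requires it."""
    groups = {}
    for keyword in sorted(described_by.get(item, ())):
        for category in described_by.get(keyword, ()):
            groups.setdefault(category, []).append(keyword)
    return groups

past_and_present = group_by_category("clip_b")
general = group_by_category("clip_c")
```

Since no material has a fixed "time slot," `clip_b` can carry two time keywords and `clip_c` can carry none, and both remain valid descriptions.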