ConTour
[1] An early version of ConTour, called ConText, is described in:

Davenport, G. and Murtaugh, M. "ConText: Towards the Evolving Documentary" Proceedings of ACM Multimedia '95, San Francisco, 1995.

ConTour is a graphical demonstration of a simple Automatist Storytelling System. The system represents a potential "back-end" or "narrative engine" for an end-user storytelling system.

As an application, ConTour is a generalized system for producing continuous "steerable" presentations of keyword-annotated movies and pictures. In this capacity, ConTour functions as a "digital editing assistant" -- interactively suggesting possible sequences of materials. The user steers and shapes the presentation by activating and weighting keywords.

ConTour is the result of several iterations of storytelling systems designed in conjunction with the story Boston: Renewed Vistas. [1] This project serves as a model use of ConTour throughout this discussion.

The Evolving Documentary

The "traditional" process of making a documentary film could be roughly described in the following way: The filmmakers collect a large amount of raw material -- original film footage, archive photographs, text articles. These raw materials are organized in progressively larger chunks: shots, scenes, and sequences. Finally, sequences are edited together to form the final "cut" of the film. Often this form is in some way constrained in its timing or structure depending on the conditions of its presentation (e.g. television or theatrical release). Regardless, the resulting experience, as presented to the viewer, is rigid and uniform; every viewer sees the same presentation, no matter when or how they see it.
Traditional Film / Video Production Model
Described in this way, the filmmaking process may be seen as a kind of funnel, as a large collection of content -- frequently an order of magnitude larger in duration than the final piece -- is gradually refined and reduced to form the program. As editing decisions are made, such as the decision to place a particular shot or scene at the beginning or end of the piece, the program becomes more and more determined; each placement dictates to some degree the shots and scenes to precede or follow. In this way, as the various pieces fall into place, a particular story, with central characters and themes, begins to form.
The Storytelling System Model
[2] The idea of an Evolving Documentary previously appeared in the context of Gilbert Houbart's It was a Knowledge War, an investigation centered on media coverage of the Gulf War.

Houbart, G. "It was a Knowledge War" MIT MS Thesis, 1994.

An experience based around a storytelling system is more hourglass shaped -- open on both the authoring and viewing sides. In this model, the storytelling system insulates the author from explicitly sequencing their content; there is no "final cut" of the film. Instead, editing decisions are deferred -- made later by the storytelling system in the context of a particular viewing experience.

The viewer's experience is no longer rigid or uniform; the construction of the experience may be sensitive to the conditions of its presentation, including the actions and any available knowledge of the viewer. The experience itself is extensible; viewers are free to stay with the story for as little or as long a time as they wish. The experience is also repeatable; viewers could leave having only seen a portion of the available material and return later to see more.

The system is open-ended on the author's side as well. Instead of "sealing off" the story with the release of a particular program or film, the base of content is free to grow as the story grows. Furthermore, as structural decisions are deferred, the story remains to some degree undetermined and thus free to support presentations with a range of main characters and central themes, as opposed to one particular configuration. For these reasons, we call this form an Evolving Documentary.

The Evolving Documentary provides a mechanism for presenting a range of stories that have been traditionally difficult to cover. Specifically, contemporaneous coverage of stories with long and possibly unknown time spans, or of stories with a large number of influences and possible perspectives, is particularly challenging for a conventional form like television news. Complaints about television news being too focused on "the moment" and failing to do "adequate followup" seem rooted in the inherent constraints of the form. Examples of stories particularly well suited for an Evolving Documentary investigation are those about wars, urban change, and politics.[2]

The Evolving Documentary form provides an appealing mechanism for developing an "intelligent story archive," allowing isolated materials collected in the present to eventually link to relevant materials added in the future.

Boston: Renewed Vistas

ConTour is the result of several iterations of storytelling systems designed around the story Boston: Renewed Vistas. This story, directed by Glorianna Davenport as part of the Workshop in Elastic Movie Time at the MIT Media Lab, concerns the multi-billion dollar public works project known as the "Big Dig" currently taking place in downtown Boston. The project, slated to be finished in the year 2004, represents the largest public works project ever undertaken in the United States and is about 90 percent federally funded. The centerpiece of the project is the removal of the existing Central Artery, a massive elevated highway extension built in the 1950s, and the construction of its underground replacement. The plan calls for the underground construction to take place while the existing roadway remains in operation.

The Big Dig story is well suited to the Evolving Documentary form for a number of reasons. The project is extremely complex and may be seen from a variety of perspectives: the history of the Artery, the politics of how the project came about, the economics of the project's funding, and its impact on adjacent neighborhoods like the North End. The time span of the story is quite extensive and ongoing; the original Artery was built in the 1950s, the formulation of a replacement plan began in the 1970s, and its removal will not occur until sometime in the 2000s.

Two video clips annotated by a set of keywords

System Overview

The single input to the ConTour program is a text file describing the database of materials. Each line in the file specifies either a text item (keyword), a still picture, or a video clip. Every item also has a text name and screen position. Pictures and video clips are additionally specified by paths to their respective Macintosh picture or QuickTime files. Finally, all items optionally specify a list of descriptors -- the names of keywords used to describe that item.
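The record structure described above can be sketched as a small data model. This is only an illustration: the text does not give the actual syntax of the database file, so the field names, the `Item` class, the `items_by_keyword` helper, and the sample file path are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Item:
    """One database entry: a keyword, a still picture, or a video clip."""
    kind: str                           # "keyword", "picture", or "video"
    name: str                           # text name shown on screen
    position: Tuple[int, int]           # screen position as (x, y)
    path: Optional[str] = None          # media file path; None for keywords
    descriptors: List[str] = field(default_factory=list)  # describing keyword names


def items_by_keyword(items):
    """Map each keyword name to the names of the items it describes."""
    index = {item.name: [] for item in items if item.kind == "keyword"}
    for item in items:
        for keyword in item.descriptors:
            index.setdefault(keyword, []).append(item.name)
    return index


# Hypothetical entries; the clip name and file path are placeholders.
database = [
    Item("keyword", "North End", (40, 60)),
    Item("keyword", "Nancy Caruso", (40, 90)),
    Item("video", "caruso_barrier", (200, 120), "clips/caruso_barrier.mov",
         ["North End", "Nancy Caruso"]),
]
print(items_by_keyword(database))
```

The keyword-to-item index is the structure that any "downward" spreading of activation would walk, which is why it is built once after loading.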

After reading the database, ConTour creates thumbnail images of each of the given picture and video files and displays each item at its given position. Once in ConTour, the user is able to arrange elements onscreen by dragging them while holding the shift key -- in its present form, ConTour does not automatically position elements. Once the elements are arranged, the user may resave the database file with the updated position information.

Initial screen display of Boston: Renewed Vistas

Activation Values and the Graphical Display

Every keyword and material in ConTour has an associated activation value. When a keyword is clicked on or a material is presented to the viewer, the element's activation value is raised -- the element is "injected" with activation.

Together, the activation values of every keyword and material in ConTour form a closed or relative value system, which serves as the basis for both the automatic material selection algorithm and the system's graphical display.

Activation values are used to determine how elements are drawn on the screen; an element's size, depth (or z-coordinate), and brightness are all derived from its activation value. The system uses activation to represent an individual element's relevance to the current "context" of the story playout. Elements with relatively high activation values are made visually prominent by making them appear brighter and closer than elements with lower activation values.

Each element's size, or relative amount of screen space, corresponds to its relative amount of activation. Specifically, each element's "screen area" is determined by multiplying the ratio of its activation value to the total amount of activation in the system by the total amount of screen area.
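The screen-area rule can be written out directly. The following is a minimal sketch of that one calculation; the function name `screen_areas` and the equal-split fallback for a system with no activation at all are assumptions, not details taken from ConTour.

```python
def screen_areas(activations, total_screen_area):
    """Give each element a share of the screen proportional to its activation."""
    if not activations:
        return {}
    total_activation = sum(activations.values())
    if total_activation == 0:
        # Assumption: with no activation anywhere, split the screen equally.
        equal_share = total_screen_area / len(activations)
        return {name: equal_share for name in activations}
    return {
        name: value / total_activation * total_screen_area
        for name, value in activations.items()
    }


# Hypothetical activation values on a 640 x 480 display.
areas = screen_areas({"North End": 3.0, "Future": 1.0}, 640 * 480)
print(areas)  # {'North End': 230400.0, 'Future': 76800.0}
```

Because every share is a fraction of the same total, raising one element's activation necessarily shrinks the area of all the others.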

Each element's depth and brightness are determined in a similar way. Individual activation values are evaluated in the range from the system's minimum to maximum activation value. The minimum value is mapped to the bottom-most position and made least bright while the maximum value is mapped to the top-most position and made most bright. All other values are linearly mapped to in-between depth and brightness values.
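The linear mapping for depth and brightness can be sketched as a min-max rescaling. In this illustration 0.0 stands for the bottom-most, least bright position and 1.0 for the top-most, brightest; the function name and the choice to treat a system of all-equal activations as fully prominent are assumptions.

```python
def depth_and_brightness(activations):
    """Linearly map each activation onto [0.0, 1.0] between the system min and max."""
    lowest = min(activations.values())
    highest = max(activations.values())
    span = highest - lowest
    if span == 0:
        # Assumption: if every element is equally active, draw all at full prominence.
        return {name: 1.0 for name in activations}
    return {name: (value - lowest) / span for name, value in activations.items()}


# Hypothetical activation values.
levels = depth_and_brightness({"clip_a": 2.0, "clip_b": 4.0, "clip_c": 6.0})
print(levels)  # {'clip_a': 0.0, 'clip_b': 0.5, 'clip_c': 1.0}
```

The same normalized level could drive both the z-order and the brightness of an element, since the text describes both as determined in the same way.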

In sum, the system uses activation to convey the current context or focus of the story presentation; active elements appear prominently in the "foreground" while less active elements fade to the "background."

Two Basic Rules of Operation

The basic operation of ConTour may be summarized by the following two rules:

  1. When a keyword is activated, it spreads its activation "downward" to materials described by that keyword.
  2. When a material is presented to the viewer, activation is spread simultaneously "upward" to each of the keywords used to describe the material, in turn invoking the effects of rule 1.
When the user clicks on a keyword, the keyword's activation value is incremented by a relatively large amount. As a result, this activation spreads to each of the materials described by the keyword. In this way, the user is given a graphical sense of that keyword's "coverage" or use in the database.
"Database Coverage" views of the keywords (clockwise) Homer Russell, Nancy Caruso, Future, and North End.
When more than one keyword is activated, the effect is additive; materials described by both keywords are the largest and float to the top while materials about just one of the two keywords remain slightly smaller and appear farther back. Materials not described by either of the two keywords recede fully into the background. This is an inherent property of the normalized relative value system: as activation is added to certain elements, the relative value of others implicitly decreases.
The "additive effect" of clicking on both Homer Russell and Future

The two largest thumbnails are described by both keywords -- the rest by either one or the other.
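The additive effect can be reproduced with a small sketch of rule 1. It assumes that each material simply receives the full activation of every keyword that describes it and that relative values are obtained by dividing by the total; the text does not state the exact amounts, and the clip names are hypothetical.

```python
def spread_downward(keyword_activation, descriptors):
    """Rule 1: sum, for each material, the activation of its describing keywords."""
    return {
        material: sum(keyword_activation.get(keyword, 0.0) for keyword in keywords)
        for material, keywords in descriptors.items()
    }


def normalize(values):
    """Express each value as a fraction of the total (a relative value system)."""
    total = sum(values.values())
    return {name: (value / total if total else 0.0) for name, value in values.items()}


# Hypothetical clips described by the keywords from the figure.
descriptors = {
    "clip_both": ["Homer Russell", "Future"],
    "clip_homer": ["Homer Russell"],
    "clip_future": ["Future"],
    "clip_other": ["North End"],
}
clicked = {"Homer Russell": 1.0, "Future": 1.0}
relative = normalize(spread_downward(clicked, descriptors))
print(relative)
# {'clip_both': 0.5, 'clip_homer': 0.25, 'clip_future': 0.25, 'clip_other': 0.0}
```

The clip described by both keywords ends up with twice the share of the single-keyword clips, and the unrelated clip drops to zero without being touched directly.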

In ConTour, non-text elements below a certain depth threshold are drawn in shades of gray. Visually, the effect is that background elements go "out of focus," drawing the viewer's attention to foreground elements. This threshold, defined as a percentage of the maximum activation value at a given time, may be defined at the top of the ConTour database file -- generally the threshold is set at 50%.
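As a rough sketch, the gray-out test reduces to comparing a material's activation against a fraction of the current maximum. The function name is hypothetical, and treating a value exactly at the cutoff as still in color is an assumption.

```python
def drawn_in_gray(activation, max_activation, threshold=0.5):
    """Return True if a picture or clip falls below the threshold and is drawn in gray."""
    cutoff = threshold * max_activation
    return activation < cutoff


# With the typical 50% threshold and a current maximum activation of 1.0:
print(drawn_in_gray(0.4, 1.0))  # True
print(drawn_in_gray(0.9, 1.0))  # False
```

Because the cutoff is relative to the current maximum, the set of gray elements shifts whenever the most active element changes.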

Description Feedback

The second rule of operation completes the picture. When a material is selected for playout, activation is simultaneously injected into each of the material's keywords. Indirectly, by the operation of the first rule, activation spreads to other materials described by these keywords. In this way, the presentation of a material has the indirect effect of increasing the selection potential of similarly described materials. Once the next material is selected and presented, the process repeats. This property is termed description feedback.
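One cycle of description feedback can be sketched by combining the two rules. The injection amount of 1.0, the function names, and the clip identifiers are assumptions; the keyword lists for the first two clips follow the Boston: Renewed Vistas examples, and the third clip is hypothetical.

```python
def present(material, descriptors, keyword_activation, injection=1.0):
    """Rule 2: presenting a material injects activation into each of its keywords."""
    for keyword in descriptors[material]:
        keyword_activation[keyword] = keyword_activation.get(keyword, 0.0) + injection


def material_activation(descriptors, keyword_activation):
    """Rule 1: each material collects the activation of its describing keywords."""
    return {
        material: sum(keyword_activation.get(keyword, 0.0) for keyword in keywords)
        for material, keywords in descriptors.items()
    }


descriptors = {
    "caruso_barrier": ["North End", "Central Artery", "Nancy Caruso",
                       "Protection", "Barrier"],
    "salvucci_scar": ["North End", "Central Artery", "Barrier", "Fred Salvucci"],
    "unrelated_clip": ["Economics"],  # hypothetical clip sharing no keywords
}
keyword_activation = {}
present("caruso_barrier", descriptors, keyword_activation)
scores = material_activation(descriptors, keyword_activation)
print(scores)
# {'caruso_barrier': 5.0, 'salvucci_scar': 3.0, 'unrelated_clip': 0.0}
```

After the first clip plays, the second clip shares three of its keywords and so gains selection potential, while the unrelated clip gains none.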

Automatic Playout

By clicking on the "play" button located in the lower right-hand corner of the screen, the user toggles the automatic playout mode. When activated, the system presents materials continuously, one at a time. The system selects materials for presentation by picking the material with the highest activation value. If there is a "tie," the choice is made at random from among the "high-scorers." As soon as the material is finished, or if the viewer clicks on the material to stop it, the next most active material is selected and presented. The following is an example of an automatically generated sequence of clips:
Playout begins either by a random selection from the database (if no keywords are active and no materials have already been presented), or by the user clicking on a particular thumbnail.

In this clip, Nancy Caruso describes the Central Artery as a "Protective Barrier" for the North End. The clip is described by the keywords: North End, Central Artery, Nancy Caruso, Protection, and Barrier.

The system next selects a clip where Fred Salvucci describes how the Artery's removal may act to "erase the scar" its construction created. This material is also described by the keywords: North End, Central Artery, and Barrier, as well as Fred Salvucci.
In the next clip, Homer Russell describes the Artery as a "Chinese Wall," cutting the North End off from the rest of the city, and the city from the waterfront. He also refers to Boston's past experience with a neighborhood called the West End.
In this clip, Nancy Caruso describes what a "protected community" the North End has traditionally been.

Throughout this sequence, note how the presented clip appears as a prominent thumbnail in the previous step -- when the material is very active and about to be selected -- and as an empty rectangle in the following step.

The viewer clicks on economics to steer the presentation.

The system keeps track of what materials have already been shown to the viewer and makes sure not to select the same material twice. Already viewed materials appear as empty rectangles. In this way, the system keeps "moving forward." By clicking on the lower right-hand corner of an empty rectangle, the viewer may manually re-display a material.
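The selection step, with its tie-break and its exclusion of already viewed materials, can be sketched as follows. The function name and the `rng` parameter (included so the random choice can be seeded) are assumptions for illustration.

```python
import random


def select_next(activations, already_shown, rng=random):
    """Pick the most active unseen material, breaking ties at random."""
    candidates = {
        name: value for name, value in activations.items()
        if name not in already_shown
    }
    if not candidates:
        return None  # everything has been shown
    best = max(candidates.values())
    high_scorers = [name for name, value in candidates.items() if value == best]
    return rng.choice(high_scorers)


# Hypothetical activation values; clip_b has already been presented.
activations = {"clip_a": 3.0, "clip_b": 5.0, "clip_c": 4.0}
print(select_next(activations, already_shown={"clip_b"}))  # clip_c
```

Filtering before taking the maximum is what keeps the playout "moving forward": a highly active clip that has already been shown can no longer win.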

ConTour automatically sequences materials based on a kind of "free association" model. As a material is presented, it biases the selection process toward other materials with similar descriptions. When the next material is presented, any new keywords -- those not shared with the previous material -- get "added to the mix." In this way, ConTour moves slowly through the database of materials in as "connected" and coherent a way as possible.

When the system is in automatic playout mode, the viewer may continue to influence the presentation by activating keywords. For instance, in Boston: Renewed Vistas, the viewer might click on the theme "economics" to pull the story in that direction. Due to the system's decentralized selection scheme, a keyword activated by the viewer simply adds to or complements the description feedback process. In this way, viewers may steer the presentation toward particular topics of interest.

Temporality of the Material Presentation Effect

An important subtlety to the effect of material presentation is the way it takes place over time. All materials have a presentation duration. For video clips, it is the inherent duration of the video; for a still picture, the duration is a constant set to something like 5 or 10 seconds.
When the presentation of a material begins, the activation spreading effect occurs gradually over its duration. In this way, the effect of a material presentation is maximal just as the selection of the next material is made. Once a material is finished, 90% of the activation effect gradually dissipates over 1.5 times its original presentation duration. In this way, the majority of a particular clip's effect is restricted to just one or two subsequent editing decisions.

The remaining 10% activation effect from every presentation persists and accumulates over the course of the entire experience (or until the system is reset). This "description sediment" slightly biases the presentation toward keywords the viewer has had some prior exposure to.
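The time course of a single presentation's effect can be sketched as a piecewise function: a rise over the presentation, a decay of 90% over 1.5 times the duration, and a persistent 10% remainder. The text says only that the rise and the decay are gradual, so the linear ramps here are an assumption, as is the function name.

```python
def presentation_effect(elapsed, duration, persistent_fraction=0.1, decay_factor=1.5):
    """Fraction of a material's full activation effect at `elapsed` seconds after start."""
    if elapsed <= 0:
        return 0.0
    if elapsed < duration:
        # Assumption: the effect builds up linearly while the material plays.
        return elapsed / duration
    # After the material ends, 90% of the effect fades over 1.5 x duration.
    decay_progress = min((elapsed - duration) / (decay_factor * duration), 1.0)
    return 1.0 - (1.0 - persistent_fraction) * decay_progress


# A hypothetical 10-second clip, sampled during and after its presentation.
for seconds in (5, 10, 17.5, 25, 60):
    print(seconds, round(presentation_effect(seconds, 10), 3))
```

For the 10-second clip, the effect peaks at the moment the next selection is made, is roughly halfway faded 7.5 seconds later, and settles at the persistent 10% from 25 seconds onward.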


The "memory trace" effect results from the persistent effects of all previously presented materials.

When the presentation is stopped, the system will gradually "settle" to a stable state that, due to the persistent effects, exhibits a kind of memory trace showing the degree to which various keywords have been activated.

In sum, the effect of a material's presentation may be seen as having two components, each occurring at a different structural level of the experience. The initial maximal effect is highly localized to a specific material presentation and acts to maximize the descriptive coherency between individual "shots." We might therefore call this effect a "scene-level" competency. On the other hand, the 10% effects are present for the entire experience -- they constitute a "program-level" effect. Functionally, the 10% effects accumulate and represent the slowly expanding "scope" of the program -- one that represents the viewers' preferences if they've chosen to steer the story toward topics of interest.

Depth vs. Breadth with Spread-Weights

In its basic mode of operation, the description feedback loop is strictly positive; when a keyword is activated, it tends to emphasize its related materials. The result is a depth-first exploration of the database. When the user sees a material about a particular theme or person, they tend to see more about that theme or person.

We alter this situation if we allow keywords to exhibit a kind of gain control or spread-weight. For instance, we might negatively bias a keyword so that when it is activated, it tends to inhibit or suppress its associated materials rather than emphasizing them.

Clicking on Homer Russell when characters are negatively weighted causes related materials to be suppressed.
Graphically, keyword spread-weights are indicated as varying degrees of red (positive), blue (negative), or gray (for zero). When a category keyword's spread-weight is set -- such as "character" or "location" -- the spread-weight for the entire class of keywords is identically set. [3]

When the spread-weight of a class of keywords is set to a negative value, the system tends to present a breadth of content relative to that class. For instance, in Boston: Renewed Vistas, if character keywords are made negative, then the presentation of a clip featuring Nancy Caruso temporarily suppresses other content associated with this character, increasing the likelihood of a different character in the next clip.

The presentation of materials with zero-weighted keywords has no bearing one way or the other on the selection of other materials. In this way, the user can choose at times to disable certain keywords or classes of keywords. In Boston: Renewed Vistas, for instance, we might choose to zero-weight the location class to make locations irrelevant to the selection process.

The full potential of using spread-weights is realized when two or more classes of keywords are set to different values. For instance, if character keywords are made negative and theme keywords positive, the resulting playout will tend to present a range of characters' viewpoints focused on particular themes. Operating this way, the presentation could be said to "develop themes." In contrast, the spread-weights could be reversed to "develop characters" instead by presenting the range of themes associated with a particular person.
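A "develop themes" configuration can be sketched by scaling each keyword's contribution by the spread-weight of its class. The weight values of -1.0, 0.0, and 1.0, the function name, and the clip identifiers are assumptions; the text does not specify the numeric range of a spread-weight.

```python
def weighted_material_activation(descriptors, keyword_activation,
                                 keyword_class, spread_weight):
    """Sum keyword activation into each material, scaled by the class spread-weight."""
    return {
        material: sum(
            spread_weight[keyword_class[keyword]] * keyword_activation.get(keyword, 0.0)
            for keyword in keywords
        )
        for material, keywords in descriptors.items()
    }


keyword_class = {
    "Nancy Caruso": "character", "Fred Salvucci": "character",
    "Barrier": "theme", "North End": "location",
}
# Depth of theme over breadth of character; locations are switched off.
spread_weight = {"character": -1.0, "theme": 1.0, "location": 0.0}

# Hypothetical clips; a Nancy Caruso clip about Barrier has just been presented.
descriptors = {
    "caruso_protected": ["Nancy Caruso", "North End"],
    "caruso_barrier_2": ["Nancy Caruso", "Barrier"],
    "salvucci_scar": ["Fred Salvucci", "Barrier", "North End"],
}
keyword_activation = {"Nancy Caruso": 1.0, "Barrier": 1.0, "North End": 1.0}

scores = weighted_material_activation(
    descriptors, keyword_activation, keyword_class, spread_weight)
print(scores)
# {'caruso_protected': -1.0, 'caruso_barrier_2': 0.0, 'salvucci_scar': 1.0}
```

In this sketch the clip with a different character on the same theme scores highest, which matches the behavior the text calls developing themes; swapping the character and theme weights would favor further Nancy Caruso clips on other themes instead.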

Here, the system is set to present a depth of theme over a breadth of character. Location and time keywords have been rendered inactive by zero weighting.
The addition of keyword spread-weights demonstrates the potential of ConTour's decentralized architecture. Though a relatively simple addition, spread-weights complement the existing selection mechanism to add a powerful new tool to our "story engine" -- a generalized means for controlling the shape of a story presentation.