Design examples in ASPD

Abstraction and Specification in Program Development (ASPD), by Barbara Liskov and John Guttag, has some intriguing comments on design. Their text formatting example highlighted the following design pattern.

Abstract, Process, Specialize

Many computations can be viewed as converting between one representation and another. In reality, though, changing requirements make them more accurately viewed in terms of converting one of N slightly different input formats into any of M slightly different output formats. Implementing each of these N*M converters separately is wasteful and error-prone. Instead, we can implement N converters into a common, abstract format, and then M converters out of this format.

Suppose we want to write a utility for text formatting. The input will be plain text with embedded formatting commands like ".br" for line break, ".rj" for right justification, etc. The output will be formatted text, e.g. in PostScript. Already we can see that there are many variables in both the input and output formats, e.g. how commands are embedded and what commands there are.

We'll start by abstracting the input. No matter what format we use, there will be a notion of characters grouped into words, words grouped into sentences, sentences grouped into paragraphs, etc. This suggests a hierarchical data structure. Formatting commands can be special placeholders.

Then we process the input. For example, justification may require the addition of spaces, or the hypenation of words. All of these changes can be made on the abstract data structure.

The modified data structure is then specialized to a particular output format. Commands remaining in the text, like font changes, will be converted from their abstract form.

For many formatting tasks, we don't have to store the whole input document before we do any output. For example, our data structure may only need to hold a single line of input. If we restrict commands to line breaks and justification, then only one command can appear per line and we don't need to add placeholders. Instead, each input line is converted into either a sequence of words or a command.

This "abstract, process, specialize" pattern is also common in program compilation (GCC being a good example).

Evaluating a Design

Periodically during the design of a large program it is worthwhile to step back and attempt a comprehensive evaluation of the design so far. This process is called a design review. Design reviews should always be conducted by a team composed of people involved in the design and others who are not. Those not involved in the design should be familiar both with the program requirements and with the technology that will be used in implementing the design. They should also be familiar with the design itself.
It is important that both the designers and the outside reviewers understand that the point of a design review is to find problems with the design. While the designers will inevitably find themselves attempting to justify their design decisions to the outside reviewers, they should not view this as their primary goal. In addition, both the reviewers and the designers must keep in mind that the purpose of the review is only to find errors, not to correct them. Errors should be recorded in an error log, and then the review should continue (unless so many errors have been found that continuing is no longer productive).
It is useful for the designers to present not only the design but also the alternatives that were considered and rejected. This will give the outside reviewers a context for evaluating the chosen design. It may also help the reviewers to find flaws in the chosen design. A common problem is failure to apply design criteria uniformly. Explaining that an alternative was rejected because it failed to meet some criterion may well prompt the reviewers to notice that some other part of the design fails to meet that same criterion.

The Relationship of Abstraction to Efficiency

Most of the time, well-chosen abstractions have a salutary effect on efficiency. They can be a big help in designing efficient algorithms. Moreover, by increasing the modularity of programs, they make it far easier to introduce modifications. This, in turn, helps us to eliminate bottlenecks and respond to changes in efficiency requirements. Occasionally, however, the introduction of abstractions can have an adverse impact on efficiency. An example of a serious problem is a procedure that could take advantage of accessing the rep of some type. When such a problem arises, there is nothing to do but consider reworking the design.

This is a poorly understood phenomenon among novice programmers. A reason might be that many computer languages feature "abstract" vs. "efficient" ways of doing the same thing. These alternatives are not only dangerous and conceptually unnecessary, but also confusing, in that people are led to believe that the dichotomy holds in anything but trivial cases.

For example, C++ and Sather both feature statically-dispatched methods as an "efficient", but less flexible, alternative to dynamic dispatch. Not only does static dispatch wreak havoc on the concepts of inheritance and polymorphism, but the efficiency benefit can be easily swamped by the real bottlenecks in the program. Eliminating these bottlenecks often requires many changes and experiments, emphasizing a more flexible program. Furthermore, other languages, e.g. Self and Haskell, have shown that compilation technology can identify where static dispatch is possible, without any programmer intervention.

Thomas Minka

Last modified: Thu Dec 19 16:59:47 EST 1996