Making the Internet Alive

A position paper for the Modeling Dynamic/Emergent Distributed Object Systems Workshop at OOPSLA 98.

By Nelson Minar
nelson@media.mit.edu
October 9, 1998.


Introduction

The Internet is a vastly distributed complex system. It exhibits emergent phenomena at all scales, from the lowest level of bits moving around wires to the highest level of distributed applications. Because of its complexity, the network is quite literally out of control: no human being or organization can dictate or even really understand the most important and interesting aspects of the Internet's operation.

The complexity of the Internet should be embraced, not feared. The same reasons that the Internet is so hard to control are the reasons it is so successful on so many levels. At the plumbing level, the Internet is a cheaper way to route bits around the world: decentralized packet switching turns out to be a very effective, scalable way to build large networks. And at the cultural level, the simplicity of setting up a new Web site has made the Internet a medium that's changing the nature of our world's discourse, its economy, and its cultures.

The key reason the network is successful is that it is a profoundly decentralized system. Not simply distributed, but decentralized, without any central architect, coordinator, or authority. Decentralization is what allows the Internet to scale effectively, what allows new applications to be built, what allows for surprising and new things to be done with the network. The downside is that decentralization also contributes to the out-of-control aspects of the Internet. Without centralized control, it is very difficult to manage things like quality of service guarantees or the reliability of WWW information sites.

It is impossible to completely control the Internet without wrecking the fundamental principles that make it so effective. As distributed systems designers, we must stop trying to create centralized control and predictability for the Internet systems we build. Instead, we must build systems that are self-organizing, self-repairing, robust and autonomous. The Internet must become alive.

Artificial Life: in Simulation and in the Wild

As a matter of course, many intellectual disciplines deal with complex systems they cannot control. Anthropologists, for example, study the messiness of human culture without being able to reduce it to a simple system. Economics, too, is reinventing itself in the mold of complex systems research, tossing out simplifying assumptions like rational actors and perfect knowledge to understand what really happens in real-world economies.

Biology, too, has to deal with the real-world messiness of the strange things that evolution has created. Biological organisms are not neat, simple, cleanly engineered bits of machinery --- living things are full of complicated details, arcane metabolic pathways, redundant and seemingly useless systems. And natural ecosystems are not neatly organized groupings of identical creatures, they are complex systems of subtle and odd interactions. The miracle of all this is that it works: living creatures really do live, ecosystems really do sustain, all without any top-down engineering or central control. We would be lucky to have the Internet function the same way; distributed systems designers have a lot to learn from the natural world.

Artificial life studies the ideas and interdisciplinary problems at the intersection between biology and computer science. Alife research can be roughly divided into two categories: using computers (in particular, simulation) to help understand biology, and using biological ideas to help create and understand computer systems. The methodology behind artificial life simulation can help distributed systems people understand the systems they build, and ideas from biological systems can be borrowed to help build better distributed systems.

The rest of this position paper presents two projects I have been closely involved in: Swarm, a toolkit for simulating natural systems, and Straum, an architecture for distributed systems that is my best attempt to date at building living computer systems. These two projects have greatly informed my understanding of complex systems in general, and distributed computer systems in particular.

One note: I make frequent use of the word "agent" in the rest of this paper. This word often makes people uncomfortable, since it is a buzzword that means many different things. For my purposes, an "agent" is technically nothing more than an object with its own thread of execution. Other aspects of the agent metaphor in common usage - autonomy, self-description, mobility, "intelligence" - will be brought up and discussed as appropriate.
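
To make that minimal definition concrete, here is a small sketch in Java (the language Straum is built in, introduced below) of an agent as nothing more than an object that owns its own thread of execution. The class name and the trivial behavior loop are invented purely for illustration.

    // A minimal "agent" in the sense used here: an object that owns its
    // own thread of execution. Everything else (autonomy, self-description,
    // mobility) is layered on top of this.
    public class MinimalAgent implements Runnable {
        private final String name;
        private final Thread thread;

        public MinimalAgent(String name) {
            this.name = name;
            this.thread = new Thread(this, name);
        }

        public void start() {
            thread.start();
        }

        public void run() {
            // The agent's behavior loop runs concurrently with every
            // other agent in the process.
            while (!Thread.currentThread().isInterrupted()) {
                System.out.println(name + " is doing its own work");
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }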

Swarm, an Agent Based Simulation Toolkit

The Swarm project was started at the Santa Fe Institute by Chris Langton as a way to improve the art of simulation among complex systems researchers. Swarm has two major goals: to produce a software tool for building complex systems models, and to improve the theory of simulation by developing paradigms for simulating complex systems. The results have been largely positive: the software is freely available on the Swarm home page [1] and used by many research groups. Several papers are available that describe the Swarm approach to modeling complex systems [2].

Swarm is a domain-neutral simulation tool - economics, ecosystems, anthropology, physics, and computer systems can all be simulated with the same software toolkit. Swarm embodies the principle that there are properties shared by all complex systems, properties that can be abstracted and instantiated in a generic tool. This principle is expressed in Swarm's key architectural decision: models are built with an agent-based, discrete-event simulation engine.

Swarm's model of the world is agent-based. Each actor in a natural system is simulated directly with a software agent. For example, when simulating the population dynamics of a prairie full of coyotes and rabbits, the simulation is built out of a bunch of individual agents, each one representing a rabbit or a coyote. This represents a more direct approach to modeling than traditional population dynamics simulation, where populations are modeled by coupled differential equations representing the total number of rabbits and coyotes. In Swarm, the individual is simulated rather than the group. Any group phenomena, such as the variance in the total population, are allowed to emerge from the interactions specified by the individual agent rules.
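
To illustrate the modeling style, here is a toy prairie in Java where every rabbit is an individual object and the population curve emerges from individual rules. This is a sketch of the idea only, not the Swarm API, and all rule parameters are invented for the example.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Agent-based style: every rabbit is an object, and the population
    // trajectory emerges from individual rules rather than from coupled
    // differential equations. (Illustrative only; not the Swarm API.)
    class Rabbit {
        static final Random rng = new Random();
        boolean alive = true;

        // One rabbit's rules for a single time step.
        List<Rabbit> step(double predationRisk) {
            List<Rabbit> offspring = new ArrayList<>();
            if (rng.nextDouble() < predationRisk) {
                alive = false;                  // eaten by a coyote
            } else if (rng.nextDouble() < 0.3) {
                offspring.add(new Rabbit());    // reproduce
            }
            return offspring;
        }
    }

    public class Prairie {
        public static void main(String[] args) {
            List<Rabbit> rabbits = new ArrayList<>();
            for (int i = 0; i < 100; i++) rabbits.add(new Rabbit());

            for (int t = 0; t < 20; t++) {
                List<Rabbit> newborns = new ArrayList<>();
                for (Rabbit r : rabbits) newborns.addAll(r.step(0.2));
                rabbits.removeIf(r -> !r.alive);
                rabbits.addAll(newborns);
                // Group-level quantities, like the population size, are
                // read off the collection of individuals.
                System.out.println("t=" + t + " population=" + rabbits.size());
            }
        }
    }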

Swarm's model of time is a partially ordered set of discrete events. Briefly, Swarm models an agent's activity as a series of small behaviors that execute in a partial ordering. For example, a rabbit's life might be summed up as a loop over three behaviors: "eat", "mate", "sleep". A coyote's life can be summed up as "find rabbit", "eat rabbit", and "sleep". The simulation as a whole is then defined as the union of these two schedules applied over all of the agents. Swarm leaves the specifics of ordering operations up to the simulation designer. Some events might not be well ordered, in which case the Swarm kernel takes that to mean the events may be executed concurrently.
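
A toy schedule, sketched below in Java, illustrates the idea: actions are keyed by time, actions at the same time are left unordered, and the kernel is free to run unordered actions in any order or in parallel. This is an illustration of the scheduling model only, not Swarm's actual scheduler.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.SortedMap;
    import java.util.TreeMap;

    // Toy discrete-event schedule: actions are keyed by time, actions at
    // the same time are deliberately unordered, and the kernel may run
    // unordered actions in any order (or concurrently).
    public class ToySchedule {
        private final SortedMap<Integer, List<Runnable>> events = new TreeMap<>();

        public void at(int time, Runnable action) {
            events.computeIfAbsent(time, t -> new ArrayList<>()).add(action);
        }

        public void run() {
            for (Map.Entry<Integer, List<Runnable>> step : events.entrySet()) {
                // Shuffling makes the lack of ordering within a time step explicit.
                Collections.shuffle(step.getValue());
                step.getValue().forEach(Runnable::run);
            }
        }

        public static void main(String[] args) {
            ToySchedule schedule = new ToySchedule();
            // The rabbit's loop and the coyote's loop merged into one schedule.
            schedule.at(0, () -> System.out.println("rabbit: eat"));
            schedule.at(0, () -> System.out.println("coyote: find rabbit"));
            schedule.at(1, () -> System.out.println("rabbit: mate"));
            schedule.at(1, () -> System.out.println("coyote: eat rabbit"));
            schedule.at(2, () -> System.out.println("rabbit: sleep"));
            schedule.at(2, () -> System.out.println("coyote: sleep"));
            schedule.run();
        }
    }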

What can Swarm teach us as distributed systems designers? One possibility is that Swarm can itself be a tool for modeling the distributed computer systems we build. Just as we can model populations with agents, interacting concurrent processes can be simulated to study their emergent behavior. I am not aware of any work to date using Swarm in this domain, but I believe it could be a useful tool for modeling the broad-scale dynamics of a distributed system. A particular question will be the appropriateness of discrete event simulation for modeling concurrent programs - discrete events might be a natural match for synchronization points.

A more provocative lesson from Swarm is the power of using agent-based systems to actually build computer systems. The only difference between a simulation and a real system is the intention of the system designer: is this system intended to be used online, or simply as a way of understanding some other phenomenon? Some artificial life systems, such as NetTierra [5], blur the distinction between simulation and reality in interesting ways.

Straum, an Ecology of Distributed Agents

After working on Swarm, I left the Santa Fe Institute to go to graduate school and apply the ideas I'd learned to building computer systems. My main goal as a Ph.D. student is to make the Internet live, to build artificial life on the network. I've applied this idea at the network plumbing level by using mobile agents to do network routing [6], and at the application level by creating Straum, an environment for mobile agents to interact and share information [3].

Straum, my master's work, is a paradigm for distributed computation. Straum embodies an ecology of distributed agents, a way of building distributed applications out of populations of interacting, autonomous, mobile agents. The core idea is that individual components in the Straum system - the agents themselves - should be relatively simple to build and understand. Complex things - distributed applications - are built out of the interactions of these simple components. By building from the bottom up, the hope is that the system as a whole becomes more robust and scalable.

The main method of my research is to build working software systems, to apply ideas of ecosystems to building distributed systems and test them out on real applications. My thesis describes using Straum to convey people's presence on the Internet and to monitor groups of web servers. My most recent work is with the Hive project, creating a toolkit to integrate networks of small physical objects (in the vein of ubiquitous computing or Things That Think) [4].

An ecology of agents is a simple way to build distributed applications. Each computer in the network has a server on it, a persistent process providing a home for agents to live on. Individual agents live on the servers. Agents are able to access the local information resources on their own server, and to talk to other agents on the network to share information. The analogy is to an island ecosystem. Each server is an island, a relatively isolated place. Each agent is like an organism, living on an individual island, consuming the local resources, and communicating with other organisms. The interactions of the agents produce an ecosystem.

In Straum, applications are built out of interacting agents. For example, a home security system could be built out of a motion detector agent (talking to some motion detector hardware on a specific server), a camera agent, an inference agent to decide if your house really is being broken into, and an alarm agent to notify the police. A web server monitoring application would be composed of agents that watch web traffic, server load, and errors, all reporting to a visualization system so a webmaster can keep up with the status of his or her network.
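
As a sketch of this compositional style, the Java fragment below wires a hypothetical traffic-watching agent to a hypothetical webmaster's console. All interface and class names are invented for illustration and are not part of Straum.

    // Composition sketch: watcher agents publish observations, a console
    // agent collects them. All names here are invented for illustration.
    interface MonitorEventListener {
        void report(String source, String observation);
    }

    class WebTrafficWatcher implements Runnable {
        private final MonitorEventListener console;

        WebTrafficWatcher(MonitorEventListener console) {
            this.console = console;
        }

        public void run() {
            // A real agent would poll a web server here; this stands in.
            console.report("traffic-watcher", "200 requests/minute");
        }
    }

    public class WebmasterConsole implements MonitorEventListener {
        public void report(String source, String observation) {
            System.out.println("[" + source + "] " + observation);
        }

        public static void main(String[] args) {
            WebmasterConsole console = new WebmasterConsole();
            // The "application" is nothing more than agents wired together.
            new Thread(new WebTrafficWatcher(console)).start();
        }
    }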

In practical terms, Straum is a Java program written with the Voyager distributed object library. The main part of Straum is a server process: it runs agents and listens on a well-known port for agent requests. Agents live in the server as independent threads, maintaining access to their own resources and communicating with agents on other servers via distributed object hooks. The system is entirely peer-to-peer and decentralized: agents are free to talk to agents on any server and negotiate their own relationships, whether client/server or more complex.
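
Structurally, a Straum-style server looks roughly like the sketch below: a persistent process that hosts agents as threads and accepts requests on a well-known port. Straum itself builds its communication on Voyager's distributed objects, which are not shown here; the plain sockets, the port number, and all class names below are stand-ins, not the real Straum code.

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.ArrayList;
    import java.util.List;

    // Structural sketch only: a persistent server process that hosts
    // agents as threads and listens on a well-known port for requests.
    public class AgentServer {
        private static final int WELL_KNOWN_PORT = 7777; // invented for the sketch
        private final List<Thread> agents = new ArrayList<>();

        // Each hosted agent runs in its own thread inside the server process.
        public void host(Runnable agent, String name) {
            Thread t = new Thread(agent, name);
            agents.add(t);
            t.start();
        }

        public void listen() throws IOException {
            try (ServerSocket socket = new ServerSocket(WELL_KNOWN_PORT)) {
                while (true) {
                    Socket peer = socket.accept();
                    // In Straum this would be a distributed-object call from
                    // an agent elsewhere; here it is just accepted and closed.
                    new Thread(() -> handle(peer)).start();
                }
            }
        }

        private void handle(Socket peer) {
            try {
                peer.close();
            } catch (IOException ignored) {
            }
        }
    }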

Straum is a deliberately non-transparent model of distributed computation. Individual physical resources, such as access to a computer display or a bit of special-purpose hardware (like a motion detector), are deliberately not distributed. An agent must be local to a device in order to use it. This restriction simplifies the description of the system - it is always apparent who has access to your resources simply by looking at the local agent population. However, agents themselves are free to talk over the network: distributed applications can be built out of the interactions between agents. This means that ultimately local resources could be used remotely, but that distribution is explicitly mediated by software agents. The software agent metaphor should provide the necessary tools by which to organize and locally control resource access.
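
A minimal sketch of this locality rule, with hypothetical types standing in for Straum's own, might look like the following:

    // Locality sketch: a physical resource can only be used by agents
    // living on the same server. Types here are hypothetical placeholders.
    interface StraumServer {}

    interface ResidentAgent {
        StraumServer host();
    }

    public class MotionDetectorResource {
        private final StraumServer home;

        public MotionDetectorResource(StraumServer home) {
            this.home = home;
        }

        public boolean read(ResidentAgent caller) {
            // Only agents local to this server may touch the hardware;
            // remote use must be mediated by a local agent instead.
            if (caller.host() != home) {
                throw new SecurityException("resource is local-only; send an agent here");
            }
            return pollHardware();
        }

        private boolean pollHardware() {
            return false; // stub standing in for the real hardware
        }
    }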

Individual agents in Straum are autonomous - each agent has its own thread of control, its own purpose that it is responsible for. This choice simplifies the design of the system: to the extent that individual tasks can be identified, they can be given to individual agents to carry out. Agents are then responsible for maintaining their own local consistency.

Agents are also self-describing: each agent can be inspected by other agents to determine what it does. Agents are then free to organize and coordinate their own interactions. Descriptions of agents are in terms of two independent ontologies of agent capabilities: syntactic and semantic. The syntactic ontology is simply the Java type system. For example, an agent might publish that it implements the "EventSending" interface. Agents can then query for a list of EventSending agents to determine who they are syntactically capable of communicating with. Agents also describe themselves in terms of a semantic ontology. This ontology is much less specified, reflecting my belief that there is no general solution to semantic ontology problems. Agents simply publish a list of strings, such as "motion-detecting"; it is up to the system designers to come to consensus as to what these strings mean. KQML or XML should be relevant tools to help coordinate the semantic ontology.
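
The sketch below illustrates the two-level scheme: the syntactic ontology is just the Java type system (the interfaces an agent implements), while the semantic ontology is an unconstrained set of strings. The interface names, the directory class, and the query methods are invented for illustration and are not Straum's API.

    import java.util.List;
    import java.util.Set;
    import java.util.stream.Collectors;

    // Self-description sketch: the syntactic ontology is the Java type
    // system; the semantic ontology is an open-ended set of strings.
    interface EventSending {
        // Marker for the syntactic capability "can send events".
    }

    interface DescribedAgent {
        Set<String> semanticTags(); // e.g. "motion-detecting"
    }

    class MotionAgent implements DescribedAgent, EventSending {
        public Set<String> semanticTags() {
            return Set.of("motion-detecting");
        }
    }

    class AgentDirectory {
        private final List<DescribedAgent> agents;

        AgentDirectory(List<DescribedAgent> agents) {
            this.agents = agents;
        }

        // Syntactic query: which agents am I type-compatible with?
        List<DescribedAgent> implementing(Class<?> iface) {
            return agents.stream().filter(iface::isInstance).collect(Collectors.toList());
        }

        // Semantic query: which agents claim this capability?
        List<DescribedAgent> tagged(String tag) {
            return agents.stream()
                         .filter(a -> a.semanticTags().contains(tag))
                         .collect(Collectors.toList());
        }
    }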

Straum agents are mobile: they can move themselves to new computers in order to access needed resources directly. So far, mobility has been less important than hoped for the application scenarios that have been implemented. But I am committed to mobility as an important part of a flexible, open distributed system architecture, and as Straum gets applied to larger problem domains mobility will become a more important part of the system. In particular, mobile agents provide great flexibility - the individual servers can be implemented and frozen fairly early on, but the system as a whole can be dynamically upgraded because the agents themselves are mobile.
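
As a sketch of the interface an agent might see, with the actual state transfer stubbed out (Straum's mobility rests on Voyager facilities not shown here, and every name and hostname below is hypothetical):

    // Mobility sketch: the server side stays frozen, while an agent can
    // relocate itself to the server that owns a resource it needs. The
    // moveTo() call is a placeholder; real state transfer is not shown.
    interface MobileAgent extends Runnable {
        void moveTo(String serverAddress);
    }

    class CameraWatcher implements MobileAgent {
        private String currentServer = "home.example.org"; // hypothetical host

        public void run() {
            // Rather than asking for the camera to be exported over the
            // network, the agent goes to the camera.
            moveTo("camera.example.org"); // hypothetical host
        }

        public void moveTo(String serverAddress) {
            currentServer = serverAddress;
            System.out.println("relocating to " + currentServer);
        }
    }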

The above description of Straum is necessarily brief: full details are in my master's thesis [3]. The core idea is quite simple: applications can be built out of the interactions of localized, autonomously executing software agents. This architecture is simple enough that individual pieces are easy to construct and understand. But it allows for complexity: these simple components can be composed, enabling complex distributed interactions out of the interactions of the simple parts.

Conclusions

The Internet is a complex system. To really use its power, to create true distributed systems, we need to build software that embraces the complexity and emergent phenomena of distributed systems. Traditional engineering tools - centralized control, top-down design, striving for pure understandability - are doomed to failure. Instead, we must explore new paradigms for creating distributed systems.

Complex systems research is a good place to look for ideas on how to understand and manage networked applications. Methods for studying systems, such as agent-based simulation, may be useful for understanding distributed systems. And complex systems research suggests a paradigm for creating Internet programs: assembling populations of relatively simple individual components. As this method of creating systems is explored more and more, the Internet will become increasingly powerful, complex, alive.

References

  1. Swarm home page.
  2. Nelson Minar, Roger Burkhart, Chris Langton, and Manor Askenazi. The Swarm Simulation System: A Toolkit for Building Multi-agent Simulations. Overview paper, 1996.
  3. Nelson Minar. Designing an Ecology of Distributed Agents. MIT Master's Thesis, 1998.
  4. Hive home page.
  5. Tom S. Ray. A Proposal to Create a Network-wide Biodiversity Reserve for Digital Organisms. Technical report, ATR, 1995.
  6. Nelson Minar, Kwindla Hultman Kramer, and Pattie Maes. Cooperating Mobile Agents for Mapping Networks. To appear in the Proceedings of the First Hungarian National Conference on Agent Based Computation, 1998.
