First course paper for Spring 1996 Autonomous Agents course by Pattie Maes.
This paper is online as http://www.media.mit.edu/~ullmer/courses/agents/paper1.html ; if you're seeing this on traditional paper, you're probably missing hyperlinks
The paper assignment is described here; follow-up Agents papers are here.
The paper was submitted to class on March 1, 1996 (tweaks have been made since).

Behavioral Realizations of
Proxy-Distributed Computation

BY BRYGG ULLMER

Introduction

The MIT Media Lab's recent formation of the Things That Think (TTT) consortium has spotlighted research investigating commonplace physical objects which may be vested with interesting computational behavior without losing their native physical attributes and affordances. In some cases this will be achieved by embedding microprocessors, sensors, displays, wired or wireless network connectivity, and other active electronics inside of physical objects. In other cases, however, devices with potentially interesting computational augmentations may remain entirely silicon-free. For the latter case, otherwise passive objects may be detected by the environment, with interesting behaviors proxied on their behalf by external devices through varying styles of technological mediation.

I am very interested in a range of both computationally passive and active objects which might be vested with interesting computational augmentation, from the standpoint of specific applications, of the design and user interface challenges these applications present, and of the underlying computational architectures which make these capabilities possible. In my view, one of the central technologies for realizing these tight couplings of physicality and virtuality involves what I call proxy-distributed -- or more compactly, proxdist -- computation.

Proxdist computation is a particular class of agent-based computing where a proxy-agent is bound as a tightly-coupled online agency to a physical or virtual entity operating in the proximity of the Internet. To give a grounding example, envision a kitchen containing many simple silicon-vested appliances -- microwaves, blenders, lights, and the like. In addition, we can logically expect our kitchen to contain many silicon-free objects such as recipe books, spices, and food which may or may not be tagged; as well as perhaps a personal computer or other objects with a high degree of computational capacity and network bandwidth. It is plausible to imagine using speech to control the microwave -- "Microwave -- power high, cook 30 seconds" -- with the microwave responding in turn, "Microwave activated at power level 10 for 30 seconds." However, it is equally plausible to imagine speech as an interface to any other appliance in the kitchen -- or for that matter, to the "Joy of Cooking" recipe book (online through Viacom's Information SuperLibrary), which should read the operable recipe cross-indexed with the refrigerator's contents and our grandmother's "special ingredient" because our hands and eyes are busy with other tasks.

The kitchen example clearly illustrates a case where it is less profitable to embed speech recognition, synthesis, local compute, and IPng networking support into the microwave, measuring cup, cookbook, and broccoli than to associate certain virtual agencies with physical objects through proxied computation, networking, sensing, and displays hosted within the immediate physical surround, the user's body-area network, or other technological capabilities vested in the environment. Hiroshi Ishii and I are exploring these and other visible, audible, and haptic mediations of physical objects in the Tangible Media Group (TMG), while I am exploring the visual design issues surrounding these virtual and physical spatializations of distributed information, computation, and presence in the Visible Language Workshop (VLW) with Ron MacNeil.

While I have considered the philosophy of proxdist computation for several years (described here among other places), little has been functionally implemented outside of the earliest foundations of Tangible Media's Active Desk and Active Board prototypes. In this paper, I will describe a plan for behavior-driven computation supporting proxy-agents which are functionally bound to physical objects, proxdist facilities which might allow passive objects to dance and virtual objects to cast shadows. Later in the semester, I hope to implement concrete realizations of these techniques.

Foundations

My interest rests on several conceptual foundations. First and foremost is the core protocol architecture of the Internet -- not at the level of the Web, MUDs, and traditional notions of net agents, but at the foundational level of gateways and routers, SNMP, BOOTP, ARP, DNS, and the TCP/IP protocol stack. It can be argued that this foundation of the Internet is a complex and relatively highly evolved behavioral ecology arising from multilayered negotiations between tens, if not hundreds, of millions of autonomous agents. Every individual node on the Net has a complex repertoire of agencies for sending and receiving messages to a nameservice host, peers on its LAN, one or more gateway machines, and even amongst its own internal protocol agencies. These agencies are resilient across variable connectivities -- connectivities of widely variable latency, bandwidth, transport pathway, and reliability. The Internet's DARPA origins as a robust, adaptive network explicitly designed to survive nuclear attack are well known, and many of the growing pains of networking in the 70's, 80's, and the present -- where the software failure of a single router disabled large portions of the net, or where simple protocol viruses have swamped thousands of Internet hosts -- can be appraised in behavioral terms of complex interactions between simple autonomous agents.

The above description is provided for several reasons. In part, I hope to directly apply the functionality of the BOOTP, ARP, DNS, SNMP, and perhaps other Net protocols to physical spaces where virtual agencies are conducted on behalf of physical objects with widely-varying computational and network capabilities. Here in particular, we aspire to emulate SNMP more closely than Cyc in our system's interaction with the world. Equally important, the Internet provides the framework for the resolution of distributed namespaces where "True Names" take on special meaning. In the proximity of the Internet, a net-resolvable name -- at present most often URLs, but more interestingly URIs including URNs, URCs, etc. -- can be seamlessly resolved to near-arbitrary levels of computational capacity, online storage, etc. Thus, in a "smart room" style of environment where the environment proxies computation on behalf of its physical contents, a silicon-free book, slide, or business card bearing a network-resolvable proxy-name recognizable by the environment can be perceived as potentially "hosting" more online compute and storage than a spatially-adjoining high-end workstation.

Be this as it may, it is unclear that some one True Name (corresponding to some online manifestation of identity) can in fact be resolved for most objects in real-world physical environments, whether these objects resemble workstations, microwaves, people, books, or broccoli. Further, such an attempt would be consistent with many elements of traditional AI and robotics efforts, raising expectations of all their shortcomings. In response to this concern, we attempt to create a system where the environment, upon detecting the presence of what is perceived as a new object, hosts a competition of would-be proxies "competing for embodiment" in the physical object. This competition is mediated by the environment, as are the behaviors expressed by the set of coexisting "dominating co-identities." (We'll see in practice how simply this can in fact be realized, or whether this style of induced schizophrenia is indeed functional.)

(behavior-mediator... but also identity-mediator... like my schizophrenic Attila model for Rodney Brooks' insect-bots.)

Before moving more directly into the assignment, I'd also add that my representation of the physical world is likely to be tightly coupled to my representation of online 3D geometries. In particular, I'm working on extending 3wish, my Tcl / itcl-based Inventor scripting language which has been operable for several months, into a proxy-identity meta-language for hosting proxdist computation on behalf of both virtual and physical spaces. More on this later.

Environment and Goals

In the general case, proxdist computation aspires to support both virtual and physical entities. Virtual entities may be locally hosted, remotely linked at variable latency/bandwidth, or offline altogether. Similarly, physical entities may be highly networked- and compute-endowed, simple powered devices containing only digital serial numbers linked by AC-housewiring X10-like protocols, simple sparsely-networked devices like wristwatches and smart cards, as well as large repertoires of completely passive objects which may or may not be tagged.

To keep this paper focused, I will concentrate on proxdist augmentation of physical objects which are either passive and untagged, passive and tagged, or functionally passive but actively tagged. For example, a conventional pen or banana is a passive, untagged object; a conventional paperback book or box of cereal is a passive, tagged object; and a book or couch tagged with visible or IR LEDs for simple vision tracking is functionally passive but actively tagged. Hopefully, forthcoming systems from Physics and Media which, for instance, use passive silicon-free choke/coil patterns which may be configured to yield a uniquifying absorption pattern from an actively-interrogating ambient electromagnetic field, may further blur the line between passively and actively tagged objects.
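The three object classes above can be written down as a small data model. The following is only an illustrative sketch in Python, not code from 3wish; the names Tagging, PhysicalObject, and classify, and the two boolean fields, are assumptions introduced here to make the distinction concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Tagging(Enum):
    PASSIVE_UNTAGGED = "passive, untagged"
    PASSIVE_TAGGED = "passive, tagged"
    ACTIVELY_TAGGED = "functionally passive, actively tagged"


@dataclass(frozen=True)
class PhysicalObject:
    label: str
    has_tag: bool         # hypothetical field: any machine-readable marking
    tag_is_powered: bool  # hypothetical field: e.g. visible or IR LEDs


def classify(obj: PhysicalObject) -> Tagging:
    """Place an object in one of the three classes considered in this paper."""
    if not obj.has_tag:
        return Tagging.PASSIVE_UNTAGGED
    if obj.tag_is_powered:
        return Tagging.ACTIVELY_TAGGED
    return Tagging.PASSIVE_TAGGED


if __name__ == "__main__":
    for thing in (PhysicalObject("banana", False, False),
                  PhysicalObject("paperback", True, False),
                  PhysicalObject("couch with IR LEDs", True, True)):
        print(thing.label, "->", classify(thing).value)
```

Tags of the choke/coil kind mentioned above would not fit cleanly into either boolean, which is one way of seeing how they blur the line.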

The environment our agents operate in (and to a certain extent, virtually embody) consists of logical encapsulations of space, possibly hierarchically or diffusely nested, which contain repertoires of sensors, displays, objects, and physical loci of computation, some of which are networked in manners gatewayed to the Internet. Examples of such spatial encapsulations of object-space include Tangible Media's Active Desk and Active Board prototypes, as well as forthcoming Active Room, Active Wall, Active Body, and other systems.

The goal of the agent(s) in this space is to mediate user interactions with the virtual identities of physical objects which have no internal computational capabilities. This happens in several fashions closely tied to the environment's sensor and display repertoires.

Sensor/Display Repertoires

Our proxy-agents virtually inhabit the bodies of passive physical hosts. Proxy-agents sense and react to environmental conditions through sensor and display repertoires embedded within the environment. Stimuli and displays can exist in the virtual or physical realms, and within the physical realm can interact with stimuli both in and outside of the human range of sense.

In physical space within the human senses, we find the core human senses of vision, hearing, touch, smell, and taste, as well as borderline cases like human response to heat and electric charge. As feedback to the system, humans can in turn respond with touch, voice, action at a distance, or bodily manifestation of attention.

In physical spaces outside the human senses, we find both sensing and display capabilities involving electric and magnetic fields, IR, UV, RF, microwave, and other radiative phenomena; ultrasonic and subsonic sound; electrical, chemical, and micro/nano-scale physical contact; and other possible physically sensed and sourced phenomena, including various signal modulations of these carriers.

In virtual space, we can imagine our objects sensing and responding to any form of information, computation, or presence communicable over the Internet, whether spatially situated or entirely disembodied, highly distributed or strongly localized. While "net agents" are often thought of in abstract "pure information" terms, we can also imagine virtual sensation and response integrated in as exquisitely physical a device as Rodney Brooks' Attila and other insect-bots, where in addition to sensing light, sound, and physical contact, sensors and actuators interacting with temporal, entropic, and quidnunctive fields are equally plausible. While the physical world embodies the domain of sight, sound, and feel, the virtual world remains the realm of cognition, contemplation, and desire, whether grandly planned or reactively behavioral.

Again, for the sake of focus we narrow our exploration of this space. Among the human senses we are concerned with first sight, and secondarily sound and haptics; and among the invisible forms of sensing, we first employ electromagnetic fields, mechanical contact, and IR radiation. Among our sensing repertoires are simple vision techniques, barcoding, magnetic-field trackers, IR devices, and mechanical sensors; and among our displays are visible displays which may provide both electronic shadows as well as semantic lensing for physical objects, and secondarily voice, audio and force-feedback haptic actuators.

Stimuli, Responses, and Feedback

Our proposal consists of passive objects mediated by an active environment to both virtually sense and respond to human interactions. Physical objects are virtually enveloped in our space in a multi-stage process. The first stage resolves that a new object or "distinction" has in fact entered the mediated spatial encapsulation. At this point, a generic object proxy-encapsulation is tentatively bound to the sensed distinction -- not so much a True Name as a deictic reference partly in the style of Agre/Chapman.
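A minimal sketch of this first stage follows, in Python for illustration only. It assumes that a sensed distinction is reported as a position, and that an exact position match is enough to decide whether the distinction is already known; both are simplifications. GenericProxy, Encapsulation, and sense are hypothetical names, not part of 3wish.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class GenericProxy:
    deictic_ref: str  # "the thing I sensed here", not a True Name
    position: tuple
    candidates: list = field(default_factory=list)  # specific proxies, bound later


class Encapsulation:
    """A mediated region of space, such as a desk surface (assumed model)."""

    def __init__(self, name: str):
        self.name = name
        self.bound = {}          # position -> GenericProxy
        self._serial = count(1)  # source of fresh deictic references

    def sense(self, position: tuple) -> GenericProxy:
        """Return the proxy bound at this position, binding a generic one if new."""
        if position not in self.bound:
            ref = f"{self.name}/distinction-{next(self._serial)}"
            self.bound[position] = GenericProxy(ref, position)
        return self.bound[position]


if __name__ == "__main__":
    desk = Encapsulation("active-desk")
    print(desk.sense((1, 2)).deictic_ref)
    print(desk.sense((3, 4)).deictic_ref)
```

The reference is deliberately generated by the environment rather than read from the object, so an untagged object can be bound as readily as a tagged one.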

Associated with this generic proxy are bindings for a series of more specific proxies competing for embodiment. These proxies attempt to achieve several sub-tasks. First is the maintenance of object-constancy within the environment. This is in part aided by the Conservation of Impetus -- the heuristic that "if an entity moves, the motion must either be internally instigated (implying embodied biologicals or silicon) or externally actuated (implying effecting biologicals or silicon)." Given that all physical entities within our proposed environment beside the human user are functionally passive, and that the user at any instant is limited in ability to affect other objects' physical state, this is quite useful.
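The Conservation of Impetus heuristic can be sketched as a reconciliation step between the objects already tracked and a fresh set of detections. This is a hedged illustration in Python under several assumptions that are not in the proposal itself: positions are 2D points, the user's hand position is sensed, and the function name reconcile and the parameters reach and eps are invented for the example.

```python
import math


def reconcile(known, detections, hand, reach=0.5, eps=0.05):
    """Match tracked passive objects against new detections.

    known:      dict mapping object name -> last known (x, y)
    detections: list of (x, y) positions sensed in this frame
    hand:       (x, y) of the user's hand, or None if no user is present
    Returns (updated positions, detections left unexplained).
    """
    updated = {}
    unexplained = list(detections)
    missing = []

    # Objects detected where they were last seen have simply stayed put.
    for name, pos in known.items():
        match = next((d for d in unexplained if math.dist(d, pos) <= eps), None)
        if match is not None:
            updated[name] = match
            unexplained.remove(match)
        else:
            missing.append(name)

    # A passive object can only have moved if the user was within reach of it.
    for name in missing:
        old = known[name]
        movable = hand is not None and math.dist(hand, old) <= reach
        if movable and unexplained:
            nearest = min(unexplained, key=lambda d: math.dist(d, old))
            updated[name] = nearest
            unexplained.remove(nearest)
        else:
            updated[name] = old  # assume it is occluded, not gone

    return updated, unexplained


if __name__ == "__main__":
    known = {"book": (0.0, 0.0), "cup": (2.0, 0.0)}
    seen = [(2.0, 0.0), (0.6, 0.1)]
    print(reconcile(known, seen, hand=(0.2, 0.0)))
    print(reconcile(known, seen, hand=None))
```

With the hand near the book, the unmatched detection is read as the book having been moved; with no user present, the same detection is left unexplained and would be treated as a new distinction.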

Second is the evolving mediation of object identity. Within our proxy-framework, we define a series of hierarchically-nested types defining successively-refined object-classes, all deriving from a common-denominator passive object node partially reminiscent of Open Inventor's SoNode object hierarchy. Each type-proxy is a structure defining the object's repertoire of sensors, displays, and accumulated state. The sensor repertoires form a generalized multimodal standing query of the style "here are the signs/names/behaviors by which my presence/identity/activity can be recognized." At any instant, it is expected (even engineered) that several of these possible type-proxy identities partially match the constraints posed by various sensors. Thus, a primary function at this third level is fostering and mediating between a set of coexisting but competitive "dominating co-identities" for each object.
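One way to picture the competition between type-proxies is as scoring each proxy's standing query against the current sensor readings and keeping every proxy that matches well enough. The Python sketch below is illustrative only: the class TypeProxy, the function co_identities, the fraction-of-signs scoring rule, the threshold, and the example signs are all assumptions, not the planned itcl implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TypeProxy:
    name: str
    parent: Optional["TypeProxy"] = None
    signs: dict = field(default_factory=dict)  # standing query: sensor -> expected reading

    def all_signs(self) -> dict:
        """Signs inherited from ancestor types, refined by this type's own."""
        inherited = self.parent.all_signs() if self.parent else {}
        return {**inherited, **self.signs}

    def score(self, readings: dict) -> float:
        """Fraction of this type's signs confirmed by the sensor readings."""
        signs = self.all_signs()
        if not signs:
            return 0.0
        hits = sum(1 for key, want in signs.items() if readings.get(key) == want)
        return hits / len(signs)


def co_identities(proxies, readings, threshold=0.5):
    """Return (score, name) for every proxy that matches at least partially."""
    scored = [(p.score(readings), p.name) for p in proxies]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


if __name__ == "__main__":
    passive = TypeProxy("passive-object")
    book = TypeProxy("book", passive, {"shape": "slab", "has_barcode": True})
    cookbook = TypeProxy("cookbook", book, {"barcode": "joy-of-cooking"})
    cereal = TypeProxy("cereal-box", passive,
                       {"shape": "slab", "has_barcode": True, "weight": "light"})
    readings = {"shape": "slab", "has_barcode": True}
    print(co_identities([passive, book, cookbook, cereal], readings))
```

Before the barcode is actually read, "book", "cookbook", and "cereal-box" all survive as co-identities; a later reading would raise one score and lower the others rather than forcing an early commitment to a single True Name.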

Third is the attempted sensing of human interaction with the proxied-object, and mediated manifestation of object-behavior and state. It is here that the virtual proxy-agencies for each object attempt to manifest behaviors through the display-agencies of the environment which manifest prior state and respond to a user's ongoing object interactions.

Each of these nested tiers of agency -- presence, constancy, identity, and behavior -- is layered within an evolving reactive suite in a constant feedback loop with the changing state of the environment. Again, the model of the TCP/IP protocol stack is invoked, with BOOTP and DNS (domain-name service) carrying forward fairly closely, ARP (address-resolution protocol) mapping to IMP (identity-mediation protocol), SNMP (simple network management protocol) to SOMP (simple object management protocol), etc.
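The layering of tiers can be sketched as a stack in which each tier runs only when every tier beneath it currently holds, re-evaluated on each pass of the feedback loop. This Python fragment is a rough illustration under assumptions: the handler signatures, the state dictionary, and the function run_stack are hypothetical, and the protocol table only restates the analogies named in the text.

```python
# Tiers of agency, lowest first, as named in the text.
TIERS = ["presence", "constancy", "identity", "behavior"]

# Network protocol -> object-space analogue, per the text. BOOTP and DNS are
# described as carrying forward closely, so they map to themselves here.
PROTOCOL_ANALOGUES = {"BOOTP": "BOOTP", "DNS": "DNS", "ARP": "IMP", "SNMP": "SOMP"}


def run_stack(handlers, state):
    """Run tier handlers bottom-up; stop at the first tier that does not hold.

    handlers: dict mapping tier name -> function(state) -> bool (assumed shape)
    state:    dict of current sensed and accumulated state (assumed shape)
    Returns the list of tiers that currently hold.
    """
    reached = []
    for tier in TIERS:
        handler = handlers.get(tier)
        if handler is None or not handler(state):
            break
        reached.append(tier)
    return reached


if __name__ == "__main__":
    handlers = {
        "presence": lambda s: s.get("detected", False),
        "constancy": lambda s: "track_id" in s,
        "identity": lambda s: bool(s.get("candidates")),
        "behavior": lambda s: s.get("user_touching", False),
    }
    state = {"detected": True, "track_id": 7, "candidates": []}
    print(run_stack(handlers, state))
```

Calling run_stack again after each change in sensed state gives a crude version of the constant feedback loop: an object that loses constancy also loses identity and behavior until the lower tiers are re-established.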

At the present, I'm wrapping a first pass at presence/constancy/identity/behavior mediation into itcl proxy objects within 3wish -- February 28, 1996 saw the integration of barcode name-scanning into the WinNT version of 3wish, and vision, Flock of Birds, Lego Dacta sensors, and Softboard pen/eraser input repertoires are also being integrated into an expressive framework which already supports relatively sophisticated 3D object/behavior display. I'm unsure how quickly the integration of layered dynamically-resolving protocols will work out, but -- so far, so good!

Brygg / ullmer@media.mit.edu