Distributed Aspect

Previous: Resource Management Up: Resource Management Next: User Primitives

Distributed Aspect

While a resource manager is capable of ``stand-alone'' operation, it is designed to operate in conjuction with identical resource managers running on other processing nodes in a distributed system. While all processing nodes in a locale are typically connected into a single system shared by all users (allowing maximum peak computational resources usage), support is provided for easily separating a subset of processing nodes into a separate system.

A means of identity verification, ensuring that machines entering a system (or separating a subset) are indeed trusted members of the system, will eventually be part of the initialization process. This is required to maintain the protection provided by the tag system.

Logical Organization

The distributed system is organized logically as a tree with multiple processing nodes at the top (level 0). At each level in the tree below the top, a node recognizes a primary node in the level above it, an alternate in the level above it to be used in case of primary failure, peer nodes (sharing the same primary node) at the same level, and nodes for which it is the primary. No one processing node in a system is crucial to the operation of the system.

Logical Hierarchy of Processors in a System

A system's logical organization is never fixed. Over time, it is altered in response to the addition or removal of nodes and the gathering of performance metrics. The ``optimum number'' of nodes in a level varies dynamically with the total number of active nodes in the system. Larger systems have a larger number of nodes at each level of the hierarchy, as well as more levels in the hierarchy. The rules used in generating and maintaining the logical organization are described in a separate technical report [Wat97].

Task/Stream Scheduling

When the evaluation of a graph of one or more task tokens and streams is requested, the evaluation is performed locally (if possible). As opportunities for data and control parallelism become apparent, the tasks will be assigned to additional remote processing nodes. This assignment of tasks to processing resources is not done in a single step. Rather, the assignment is done hierachically. When a task is to be applied to a stream, and the stream and task parameters indicate that it may be ``split'' to provide parallelism, the task is subdivided and dispatched to processors in the next level of the hierarchy. Upon evaluation, these processors may further subdivide the task and dispatch it to even lower levels.

This has two effects:

The computational and network I/O load of performing the task/stream splitting and assignment is distributed.
Each processing node only needs to concern itself with maintaining aggregate resource availability information for its peer and slave nodes.

In performing the initial task subdivision and assignment at the top level of the logical organization, the remote computational resources are modeled as aggregates. When a finer subdivision is performed in the next level it utilizes a more local (and hopefully more accurate) model of available resources. This attempt to ameliorate the effects of errors in the model of remote computational resources.

Once the ``scheduling'', or mapping onto currently available processor resources is performed, the dataflow graph is executed relatively independently of the originating node. The ``serial'' portions of the application will be scheduled for execution on the primary node, but may migrate to alternates (which become primaries and name new alternates) if needed (e.g. due to overloading or failure of the originating node.)

In a similar manner, the data storage for streams and other distributed data objects is allocated and searched in a hierarchical manner. When a particular context of a stream is referenced by the parameters of a task, and the stream isn't known to the processor node, it queries other nodes in the system for information about the stream. Cues about which remote nodes might have a particular context of the stream is part of the stream data structure replicated on each node in a system accessing that stream.

Previous: Resource Management Up: Resource Management Next: User Primitives

magiceight-web@media.mit.edu