Dist-sys-97 Notes for October 3

Global Namespaces

The big insight for me this week was that global namespaces are easier to manage if there is some natural hierarchy in them.

Big global namespaces require some sort of caching - there's just too much data out there to serve from one flat table. The hard part is managing the cache infrastructure. DNS works in a classic hierarchic fashion: the NIC is responsible for the root servers, sites like mit.edu are responsible for their own hosts, and so on. It's fairly simple to resolve a name like pinotnoir.media.mit.edu - you walk up the name (media.mit.edu, mit.edu, edu) until you find something in your cache, then walk back down the name, asking each authority in turn, until you reach the one responsible for the full name. Hierarchies are easy to understand.
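
To make the walk-up/walk-down concrete, here is a minimal Python sketch under heavily simplified assumptions: the whole hierarchy is one nested dict (ROOT, with made-up addresses from the 192.0.2.0/24 documentation range), and "asking an authority" is just a dict lookup. Real DNS has referrals, record types, and TTLs, none of which are modeled. The function returns the answer plus how many authorities it had to query, so you can see the cache pay off.

```python
# Toy model of DNS-style hierarchical resolution (hypothetical data, not real DNS).
# Each authority is a dict mapping a child label to a sub-authority (dict)
# or, at the leaves, to an address string.
ROOT = {
    "edu": {
        "mit": {
            "media": {
                "pinotnoir": "192.0.2.10",  # made-up documentation-range addresses
                "www": "192.0.2.20",
            }
        }
    }
}


def resolve(name, root, cache):
    """Resolve a dotted name; return (answer, number of authorities queried)."""
    labels = name.split(".")  # e.g. ["pinotnoir", "media", "mit", "edu"]
    # labels[depth:] is the part of the name already resolved to `node`.
    node, depth = root, len(labels)
    # Walk up: try the full name, then each shorter suffix, in the cache.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in cache:
            node, depth = cache[suffix], i
            break
    # Walk back down: one authority per remaining label, caching as we go.
    queries = 0
    for j in range(depth - 1, -1, -1):
        node = node[labels[j]]
        cache[".".join(labels[j:])] = node
        queries += 1
    return node, queries


cache = {}
print(resolve("pinotnoir.media.mit.edu", ROOT, cache))  # ('192.0.2.10', 4)
print(resolve("www.media.mit.edu", ROOT, cache))        # ('192.0.2.20', 1)
```

The second lookup needs only one query because media.mit.edu is already in the cache.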

The proposed URN spec doesn't admit much hierarchy. The syntax URN:<NID>:<NSS> has a one-level hierarchy embedded in it: the authority who manages the NID ("Namespace Identifier") space. But the URN spec gives the NSS (Namespace Specific String) no structure at all, which makes it very hard to see how to cache or look up generic URNs beyond "uh, ask the NID authority". (Maybe the NIDs could themselves structure their namespaces in a hierarchic fashion?)
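
A small sketch of what the generic syntax gives a resolver, assuming the RFC 2141 form urn:<NID>:<NSS>. The NID table and resolver hostnames are hypothetical. Note that the NSS comes back as one opaque string: any colons inside it mean nothing to a generic client, so all it can do is hand the whole thing to the NID's authority.

```python
import re

# Sketch of the URN syntax as I understand it from RFC 2141:
#   "urn:" <NID> ":" <NSS>
# The NID is 1-32 letters, digits, or hyphens (not starting with a hyphen)
# and is case-insensitive. The NSS is treated as opaque here; the RFC's
# character restrictions on it are not checked.
URN_RE = re.compile(r"urn:([A-Za-z0-9][A-Za-z0-9-]{0,31}):(.+)", re.IGNORECASE)


def parse_urn(urn):
    m = URN_RE.fullmatch(urn)
    if m is None:
        raise ValueError(f"not a URN: {urn!r}")
    return m.group(1).lower(), m.group(2)


# Hypothetical resolver table. The NID is the only structure the generic
# syntax offers, so the client passes the whole NSS to that NID's authority.
NID_AUTHORITIES = {
    "isbn": "isbn-resolver.example",
    "ietf": "ietf-resolver.example",
}


def authority_for(urn):
    nid, nss = parse_urn(urn)
    return NID_AUTHORITIES[nid], nss


print(authority_for("urn:ietf:rfc:2141"))  # ('ietf-resolver.example', 'rfc:2141')
```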

The draft URN requirements document we read seemed to be trying to explain how to make URN resolution work without hierarchic authority, using some decentralized resolution mechanism instead. (Why? The usual reasons: less brittleness, fewer political issues, etc.) I'm sympathetic to this goal, but it wasn't at all clear from the reading how to achieve it. The structure of hints is clearly the key, but none of us could figure out quite how it would work. (Maybe someone should read Edward Slottow's MIT MEng thesis, which sketches a URN system. It's on the Information Mesh publications page.)

Pattie suggested using an ontology to break up a URN-like space. For example, we could use a URN like URN:Papers:Computers:Internet:Nelson:"Computational Media for Mobile Agents". This would be resolved much like a DNS name, but instead of authorities for "mit.edu" and the like, there would be authorities for "Papers", "Papers:Computers", etc.
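
Pattie's scheme can reuse the same walk as DNS. A minimal sketch, assuming a hypothetical AUTHORITIES table and assuming the last component is always a quoted title (neither detail came up in the discussion). Because the most general category comes first in these names, the walk drops components from the right rather than the left.

```python
import re

# Hypothetical authorities for ontology-structured names, keyed by category
# path. None of these are real services.
AUTHORITIES = {
    "Papers": "papers.example",
    "Papers:Computers": "computer-papers.example",
    "Papers:Computers:Internet": "internet-papers.example",
}


def parse_ontology_urn(urn):
    """Split URN:<cat>:<cat>:...:"<title>" into (category path, title)."""
    m = re.fullmatch(r'URN:(.+):"([^"]*)"', urn, re.IGNORECASE)
    if m is None:
        raise ValueError(f"not an ontology URN: {urn!r}")
    return m.group(1).split(":"), m.group(2)


def find_authority(urn):
    """Return the most specific known authority, plus what it must resolve."""
    path, title = parse_ontology_urn(urn)
    # Most general category is on the left, so drop categories from the right.
    for i in range(len(path), 0, -1):
        key = ":".join(path[:i])
        if key in AUTHORITIES:
            return AUTHORITIES[key], path[i:], title
    raise LookupError(f"no authority for {urn!r}")


urn = 'URN:Papers:Computers:Internet:Nelson:"Computational Media for Mobile Agents"'
print(find_authority(urn))
# ('internet-papers.example', ['Nelson'], 'Computational Media for Mobile Agents')
```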

It's a nice proposal, and I think it merits some research. There are all the usual ontology questions: how do you manage the ontology structure, how do you ensure sensible names, how do you handle aliasing (presumably documents will have multiple places in the ontology), etc. You also have the political problems of hierarchies: who runs the "Papers" authority, who makes editorial decisions, etc. But it's nice to mix semantic information about a document into its name - it makes the names memorable and (maybe) provides a natural way to break up the namespace of all documents in the world.

Other issues...

XML, etc

We also talked about XML and why it's so great. Short answer: you can embed semantic tags in documents. Technically XML is a simplified subset of SGML, but it looks to me like most people will experience it as HTML where they can make up their own tags.
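
A minimal illustration of the "make up your own tags" point, using Python's standard xml.etree.ElementTree. The document and its tag names are invented for the example; the point is that a program can pull fields out by tag name instead of scraping visual markup.

```python
import xml.etree.ElementTree as ET

# A made-up document whose tags (<paper>, <title>, <author>, <topic>) are
# invented for this example. XML only asks that the document be well-formed.
DOC = """<paper>
  <title>Computational Media for Mobile Agents</title>
  <author>Nelson</author>
  <topic>Internet</topic>
</paper>"""

root = ET.fromstring(DOC)
# Each child element's tag names what its text means.
fields = {child.tag: child.text for child in root}
print(fields)
# {'title': 'Computational Media for Mobile Agents', 'author': 'Nelson', 'topic': 'Internet'}
```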

One major unanswered question about XML: how will meaning be assigned to tags? With an XSL stylesheet, the "meaning" of a tag is its visual layout on screen. But what about other meanings, like database functionality? How semantics will play out remains to be seen.
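
To make the "what do tags mean?" question concrete, here is a sketch in which the same tags get two different meanings. XSL isn't shown; a small Python render function stands in for a stylesheet, and an in-memory SQLite table stands in for "database functionality". The catalog data is made up, and the second entry is a placeholder.

```python
import html
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical catalog; the second entry is a placeholder.
DOC = """<papers>
  <paper><title>Computational Media for Mobile Agents</title><topic>Internet</topic></paper>
  <paper><title>A Placeholder Paper</title><topic>Databases</topic></paper>
</papers>"""


def render(root):
    """Meaning 1: presentation. Stands in for what a stylesheet would do."""
    rows = []
    for p in root.iter("paper"):
        title = html.escape(p.findtext("title", ""))
        topic = html.escape(p.findtext("topic", ""))
        rows.append(f"<h2>{title}</h2><p><i>{topic}</i></p>")
    return "\n".join(rows)


def load(root):
    """Meaning 2: data. The same tags become columns in a database table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE paper (title TEXT, topic TEXT)")
    db.executemany(
        "INSERT INTO paper VALUES (?, ?)",
        [(p.findtext("title"), p.findtext("topic")) for p in root.iter("paper")],
    )
    return db


root = ET.fromstring(DOC)
print(render(root))
print(load(root).execute("SELECT title FROM paper WHERE topic = 'Internet'").fetchall())
```

Neither interpretation is privileged by the XML itself; each program decides what <topic> means.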

A related question is how many different XML DTDs we expect there to be, and whether every web site designer will want to write their own. I really hope there are only a few standard DTDs that most people will use. It looks likely: DTDs are unpleasant to write, and there are some good efforts to develop standard DTDs (chemistry, math, and of course HTML). It's a real shame that XML doesn't seem to include any way to nest DTDs so you can augment existing ones.

Also, a question I was wondering about: are ontologies necessarily hierarchic? Many of the ontology schemes I've seen are fundamentally hierarchic, but I have a vague recollection that CYC isn't. You can also have overlapping ontologies (in a KQML-like setup), which effectively creates a graph with big trees embedded in it.
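
A tiny sketch of the overlapping-ontology point, using two made-up ontologies that are each a strict tree (child -> parent). It doesn't model KQML or CYC; it only shows that taking the union of the parent links gives "Agents" two parents, so the merged result is a graph rather than a tree even though each piece is a tree.

```python
# Two hypothetical ontologies, each a strict tree given as child -> parent.
LIBRARY = {"Computers": "Papers", "Internet": "Computers", "Agents": "Computers"}
AI_LAB = {"AI": "Research", "Agents": "AI"}


def merge(*ontologies):
    """Union the parent links; a category may now have several parents."""
    parents = {}
    for onto in ontologies:
        for child, parent in onto.items():
            parents.setdefault(child, set()).add(parent)
    return parents


def is_tree(parents):
    """Ignoring cycles, the map is a tree (or forest) iff no node has two parents."""
    return all(len(ps) == 1 for ps in parents.values())


def ancestors(node, parents):
    """Every category reachable by following parent links upward."""
    seen, stack = set(), [node]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


merged = merge(LIBRARY, AI_LAB)
print(is_tree(merge(LIBRARY)), is_tree(merged))  # True False
```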

One last comment, an insight from Rob. The Internet now is largely a web of connected (named) documents. Some of us eventually want to turn the Internet into a mesh of interconnected objects. (I want to make those active objects, with bits of code flying all over the place.) XML is sort of a bridge between these two things - not really objects, but more structured than raw documents.


Nelson Minar <nelson@media.mit.edu>
Last modified: Tue Oct 14 17:47:25 EDT 1997