001113; Armstrong Consulting: Preliminary Meeting with Ken Holman

Armstrong Consulting
1200 Dale Avenue #100
Mountain View, CA 94040

Date: Mon, 13 Nov 2000 14:05:59 -0800

From: Eric Armstrong
<eric.armstrong@eng.sun.com>
Reply-To: unrev-II@egroups.com

To: unrev2
unrev-II@egroups.com

Subject: Preliminary Meeting with Ken Holman

Several of us had a chance to meet with Ken Holman over the weekend. He was brought to the party by John Deneen, and he was quite happy to meet Doug. He very much wants to make whatever contribution he can, which pretty much makes him "one of the team".

Ken is very knowledgeable about XML and related disciplines. He is, or has been, very active at OASIS (Organization for the Advancement of Structured Information Standards). He is looking forward to helping us define an interchange standard and shepherd it through the various committees.

He also has a remarkable flair for design. He picked up a rough sense of what we were about in fairly short order, and began making insightful observations based on his past design experience.

Here are some of the technical points he developed during the meeting...

XML Basics

XPATH is a basic structure-identification mechanism

XPOINTER uses that representation mechanism, and builds on it to add concepts like a structure-range (from struct X to struct Y)

XSL/XSLT also uses XPATH as part of its representation mechanisms

XSLT is a translation mechanism that can generate XML, which can then be parsed.

XSL is the format-presentation layer. It defines a ton of constructs that can be used to specify how material prints, or is displayed.

RELAX is a very nice schema definition mechanism with a theory-based representation that lets you construct DTD *diffs* and DTD *unions*. (Unions let you modularize DTDs, and ensure that a document conforms to the result of combining them.)

SCHEMATRON is an assertion-based validation mechanism. Using that mechanism makes it possible to validate assertions like "mixed content containing text and inline elements occurs only before substructure elements, never between or after".

[For me, this one was worth the price of admission. It totally solves the XML limitation described in my paper on XML Editor Design.]
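
To make the quoted assertion concrete, here is a minimal sketch of the same rule written as a hand-coded check in Python. It is not Schematron itself, and it did not come from the meeting; it only shows the kind of constraint that a DTD cannot express but an assertion can. The `INLINE` vocabulary and the `<node>` element used to exercise it are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: child elements that count as inline markup.
# Every other child element is treated as a substructure element.
INLINE = {"b", "i", "a"}

def mixed_content_violations(root):
    """Return (parent tag, reason) pairs for every place where text or an
    inline element appears after a substructure element."""
    violations = []
    for parent in root.iter():
        seen_structure = False
        for child in parent:
            if child.tag in INLINE:
                if seen_structure:
                    violations.append(
                        (parent.tag, f"inline <{child.tag}> after substructure"))
            else:
                seen_structure = True
            # child.tail holds the text that follows this child inside parent.
            if seen_structure and child.tail and child.tail.strip():
                violations.append((parent.tag, f"text after <{child.tag}>"))
    return violations

# Text and inline markup first, substructure last: conforms to the rule.
GOOD = "<node>Heading with <b>bold</b> text<node>one</node><node>two</node></node>"
# Text and inline markup after a substructure element: breaks the rule.
BAD = "<node>Heading<node>child</node>stray text<b>late</b></node>"

if __name__ == "__main__":
    print(mixed_content_violations(ET.fromstring(GOOD)))
    print(mixed_content_violations(ET.fromstring(BAD)))
```

A real Schematron schema would state the same rule declaratively, as XPath tests attached to a context element, rather than as a loop.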

Design Principles

Most application designs define an application-specific language, and parse that. They tend to consider XSLT as an afterthought. To make use of it, a different representation is parsed, written out as XML, and then reparsed into the app.

But XSLT can quite easily produce SAX or DOM output *directly*. So the kind of design Ken recommends uses XSL and a stylesheet to process any particular XML data. The result becomes SAX events or a DOM in the app, so that part of the app doesn't change. But now you can process any other variant of the XML that encodes the information, simply by creating a new stylesheet, without a big performance hit -- the result is roughly equivalent to having defined that language (or any other variant) as the "reference language" for the application.

Ken declared emphatically that DEFINING THE XML EARLY ON IS INAPPROPRIATE. He's seen the mistake made dozens of times, and counsels his clients against it. His take on the matter is that XML IS AN INTERCHANGE STANDARD and that the core of the application is the services it provides. Therefore, the only sequence that works in the real world is to define those services, and *then* come up with an XML form for the data that needs to be interchanged.
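
The shape of the stylesheet-as-input-adapter design can be sketched in Python. This is an illustration under stated assumptions, not Ken's implementation: the Python standard library has no XSLT processor, so a SAX filter that renames elements stands in for "XSLT plus a stylesheet", and it is far less capable than a real stylesheet. The element names, the `VARIANT_TO_REFERENCE` mapping, and the `OutlineApp` class are all hypothetical.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase

# Hypothetical mapping from a variant vocabulary to the one the app expects.
# It plays the role that the stylesheet plays in the design described above.
VARIANT_TO_REFERENCE = {"item": "node", "heading": "title"}

class RenamingFilter(XMLFilterBase):
    """Rewrites element names in the SAX event stream before the app sees them."""

    def __init__(self, parent, mapping):
        super().__init__(parent)
        self._mapping = mapping

    def startElement(self, name, attrs):
        super().startElement(self._mapping.get(name, name), attrs)

    def endElement(self, name):
        super().endElement(self._mapping.get(name, name))

class OutlineApp(xml.sax.ContentHandler):
    """Stand-in application: it only understands <node> and <title> events."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

def load(xml_text, mapping=None):
    """Feed xml_text to the app, optionally through the renaming filter."""
    app = OutlineApp()
    reader = xml.sax.make_parser()
    if mapping:
        reader = RenamingFilter(reader, mapping)
    reader.setContentHandler(app)
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return app.titles

REFERENCE = "<node><title>Agenda</title><node><title>XML basics</title></node></node>"
VARIANT = "<item><heading>Agenda</heading><item><heading>XML basics</heading></item></item>"

if __name__ == "__main__":
    print(load(REFERENCE))
    print(load(VARIANT, VARIANT_TO_REFERENCE))
```

The application code is identical for both inputs. Supporting another variant means supplying another mapping (in the real design, another stylesheet), and no intermediate XML is written out and reparsed.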

OHS Design

In terms of the OHS, Ken's approach had some remarkable implications for the design. Rather than attempting to define a DTD for a "normal form" OHS document, Ken suggests focusing on the services, and building (or at least designing) those services. So for example, we need granular addressability. And we want it to apply to legacy documents. Ok, then, the system requires mechanisms for adding addresses to a legacy document! The original document continues its existence, unchanged. The OHS contains a pointer to it, along with a collection of addresses that point into it. The "HyperDocument" you view in the "HyperScope" is then the product of those addresses applied to that document.

Note that we have *not* defined a DTD for a HyperDocument. We have defined functionality. Now, when it comes to interchange data, how does that happen? Well, what do you need to send? You need to send a pointer to the original document, at a minimum -- or possibly the document itself if it is inaccessible. And you need to send the additional information (like the addresses) that is necessary to carry out HyperScope functions!

Ken's point here is that the XML definition is dictated by functional needs -- by what you need to transmit to provide the desired services -- and the resulting XML definition is far removed from any sort of "HyperDocument definition" we may construct at the outset.

[Note: From personal experience, I concur wholeheartedly. The original stab I took at XML syntax for such a document looks nothing like the node library I am currently constructing. More instructively, none of the last 4 versions of that library look very much like any of the others.]
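
The services-first sequence described above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `AddressOverlay` class, the use of character offsets as the addressing scheme, the `<overlay>` and `<address>` element names, and the example.org URI. The service (resolving an address against an unmodified legacy document) is defined first, and the interchange form falls out of what a receiver would need.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

@dataclass
class AddressOverlay:
    """Addresses layered over a legacy document that is never modified."""
    document_uri: str                              # pointer to the original
    addresses: dict = field(default_factory=dict)  # label -> (start, end)

    def add_address(self, label, start, end):
        """Record a label for the character span [start, end)."""
        self.addresses[label] = (start, end)

    def resolve(self, label, document_text):
        """Return the span of the original text that the label points at."""
        start, end = self.addresses[label]
        return document_text[start:end]

    def to_interchange_xml(self):
        """Serialize only what a receiver needs: the pointer and the addresses."""
        root = ET.Element("overlay", {"href": self.document_uri})
        for label, (start, end) in sorted(self.addresses.items()):
            ET.SubElement(root, "address",
                          {"label": label, "start": str(start), "end": str(end)})
        return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    legacy_text = "Intro paragraph. Second point."   # stays unchanged
    overlay = AddressOverlay("http://example.org/legacy.txt")
    overlay.add_address("1a", 0, 16)
    overlay.add_address("1b", 17, 30)
    print(overlay.resolve("1b", legacy_text))
    print(overlay.to_interchange_xml())
```

Note that the serialized form contains no document content at all, only the pointer and the addresses. If the original were inaccessible to the receiver, the payload would have to carry the document as well, as noted above.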

Topic Maps

Ken also talked about topic maps for a bit. (Although I have yet to "get" them, Ken was very big on them, and mentioned Jack Park's advocacy several times in this context.)

What I gleaned from our short forays into the subject was:

Topic maps provide a way of defining the semantic content of a structure or, perhaps more accurately, they are a way of specifying the syntax that is used to represent different semantic constructs. (I believe that is accurate, although I didn't quite get how it works. More info: http://www.topicmaps.org)

Ken suspects we want to use topic maps to define the OHS interchange mechanisms. (Again, I don't see how that works, exactly, but I suspect that he and Jack will be able to arrive at a meeting of the minds.)

My one little "aha" on the subject is that if XSLT + a stylesheet can be used as the input to an application, and the input is defined using a topic map, then anyone can use any syntax they want to encode the data -- the syntax will be transformed by XSLT for use by the application anyway, and that translation will be governed by the topic map. (I think that is somewhere within a Silicon-valley commute of being correct, but...)

Sincerely,

Eric Armstrong
<eric.armstrong@eng.sun.com>

Copy to:

extende-dev@lists.sourceforge.net
