Colloquium at Stanford
The Unfinished Revolution


Date: Sat, 12 Feb 2000 18:32:38 -0800

From:   Eric Armstrong


Subject:   [unrev-II] Document as "View" ==> True KR

Driving home from the colloquium the other night, I began thinking about how an integrated document/email system would be used. In particular, it occurred to me that perhaps my thinking had been too limited by my experience with what we traditionally think of as a "document".

Taking the software design process as a familiar example, let's assume that we are using the document/email system proposed earlier. We start with a functional specification, and carry on lots of email interactions about aspects of that document. We go through several iterations, and produce a completed specification that we are happy with. So far, so good. But what happens next?

The next step in the development process is to construct a design document. But now, we're starting from scratch! We can still make use of the information adduced during the functional specification process, but we have to laboriously find all the information nodes that are applicable, and either integrate them into the new document or make links. True, the process is easier than it would have been in the past, but we are still doing a ton of manual labor that should be avoidable.

What we would like, instead, is to capture design notes for each part of the specification, as we think of them. When we look at the "functional specification", we wouldn't see them. But authoring the design document can start with a request to the system that says, in effect: "Give me all the design notes that correspond to this version of the specification".

Implication #1: Document as "Labeled View"
At this point, it occurred to me that maybe a "document" isn't (or shouldn't be) an isolated entity, but rather a "labeled view" on a nexus of information nodes.

In this formulation, the collection of email messages and document segments forms a nexus of notes, thoughts, and ideas. Each node exists as an independent unit. The "hierarchy" is created by organizing them into a document.

Perhaps that occurs by "tagging" the nodes to identify them as belonging to, for example, "Functional Specification version 2.0", or "Design Document, version 0.0". The label on the nodes might be what constitutes a "document".

Or perhaps the label is a mechanism that retrieves information for parts of a document. Once retrieved they might then be organized into the design document. Perhaps they would be used as originally written, or perhaps referenced (linked) in a new formulation. However it works, it would create a natural starting point for the new document that brought to light all of the thinking that was originally directed towards it.
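To make the "labeled view" idea concrete, here is a minimal sketch in Python. It assumes the first formulation above, where a label on the nodes is what constitutes a "document". The `Node` class, its `labels` field, and the `view` function are hypothetical names chosen for illustration, not part of any proposed system.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One independent information node (an email message or document segment)."""
    node_id: str
    text: str
    labels: set[str] = field(default_factory=set)  # hypothetical tag set


def view(nodes, label):
    """Return the nodes carrying `label`, in stored order: a document as a labeled view."""
    return [n for n in nodes if label in n.labels]


nodes = [
    Node("n1", "Users can reply to any paragraph.",
         {"Functional Specification version 2.0"}),
    Node("n2", "Store each paragraph as its own record.",
         {"Design Document, version 0.0"}),
    Node("n3", "Replies are threaded per node.",
         {"Functional Specification version 2.0", "Design Document, version 0.0"}),
]

spec = view(nodes, "Functional Specification version 2.0")
design = view(nodes, "Design Document, version 0.0")
print([n.node_id for n in spec])    # nodes that make up the specification
print([n.node_id for n in design])  # nodes that seed the design document
```

Note that node n3 appears in both views without being copied, which is the point of treating the document as a view rather than a container.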

Implication #2: Database, not Flat Files

Before I get to the serious implications for human thinking, I'm going to pause long enough to make a low level observation: When we start talking about storing every information node as an individual unit so it can be tagged and used in multiple ways, the idea of using a plain text filing system rapidly loses its appeal!

For such a system, a database clearly seems like the right approach. XML then becomes the email/interchange mechanism, by which we communicate ideas to one another. It should also be possible to export the contents of the database as XML, for backup, archiving, filtering, and transfer to new and better databases that come along.
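As a rough sketch of what "database inside, XML outside" might look like, the following uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The two-table schema (`node`, `tag`) and the element names in the export are assumptions made for illustration; a real system would need versioning, links, and authorship as well.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory database; the schema is a hypothetical minimum.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.execute("CREATE TABLE tag (node_id INTEGER REFERENCES node(id), label TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO node (id, body) VALUES (?, ?)",
    [(1, "Nodes are addressable."), (2, "Use a relational store.")],
)
conn.executemany(
    "INSERT INTO tag (node_id, label) VALUES (?, ?)",
    [(1, "spec"), (2, "design"), (2, "spec")],
)


def export_xml(conn):
    """Dump every node and its tags as one XML string, for backup or transfer."""
    root = ET.Element("nodes")
    for node_id, body in conn.execute("SELECT id, body FROM node ORDER BY id"):
        el = ET.SubElement(root, "node", id=str(node_id))
        ET.SubElement(el, "body").text = body
        tags = conn.execute(
            "SELECT label FROM tag WHERE node_id = ? ORDER BY label", (node_id,)
        )
        for (label,) in tags:
            ET.SubElement(el, "tag").text = label
    return ET.tostring(root, encoding="unicode")


xml_text = export_xml(conn)
print(xml_text)
```

Because the export is plain XML, the same text could serve as the interchange format between two people or as the migration path to a different database.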

Ok. That's done. Now for the important stuff...

Implication #3: Augmented Memory
The greatest initial appeal for such a system is that it _dovetails nicely with the way we think_. When we are thinking about how the system functions, design ideas pop up. Code fragments may pop into our head, as well.

A system of this kind makes it possible to capture those ideas as they occur, in such a way that they do not interfere with the flow of the original document -- the functional specification.

Most importantly, if the system operates as it should, none of those ideas are lost! They all come surging back when the designing starts, so that none is overlooked.

At this point, the system has begun to carry out important functions of a knowledge repository, by

  1. ensuring that ideas and information are not lost, and
  2. making them available when they are needed

Implication #4: Augmented Reasoning

The system has also made an important step towards augmented reasoning, however. What the system is allowing us to do is to record and track *implications*.

Thinking of implications brings up the topic of logic. All you need for a "logic" system is 4 operations: implication, and, or, and not. When we record a design note, we are saying that functional requirement "A" implies design consideration "B".

Note that a "document" is a view of the system that instantiates the "and" operation. (For this system, we need feature 1 and feature 2 and feature 3...)

IBIS/GIBIS-style investigations, on the other hand, implement the "or" operation. (For this question, we have 3 alternatives...)

If the system can integrate domain-exploration (IBIS-style conversation) with domain-proposal (documents), it then begins to support the reasoning process.

Again, this mechanism mirrors the way we think! When we look at how the system will function, it is natural to explore design alternatives. We may reject a function because the design is impossible! And that is a fair thing to do. Even more important, however, is recording the thinking that went into the decision so that it can be easily reversed at a later stage, when new technology is discovered.

Negation is the last important part of the logical process. In a design effort, though, simple negation (proposition "A" is not true) is the least important kind. The kind of negation that occurs a lot in design discussion is "compound" negation, like this:

   (F1 & F2)  ==>  !(D1 & D2) ==>  !D1 + !D2  =?=>  !F1 + !F2

Here, a feature (F1) implies a design idea (D1). The statement above says that the required features imply incompatible designs. (Either D1 or D2 can't be.) That means one of the design ideas has to be dropped. In consequence, unless some compatible design can be found, one of the features must be dropped.

This is fairly complex reasoning, but it is typical of the design process. As someone said earlier, getting the specification "right" is the key to a successful project, but note that getting it "right" involves deep reasoning about the implementation feasibility of the system, as well as its user-oriented goals.

For example, previously I had argued that the system's documents should be stored in plain text files, because of the advantages that attend that strategy. However, keeping thousands of information nodes as individual files is clearly onerous, so that design idea is contradicted by these functional requirements.

When we record implications, then, we need the ability to explicitly specify "and" and "or" relationships. We need to be able to say that F1 ==> (D1 & D2 & D3) as well as the ability to capture alternatives: (D1 & D2 & D3) + (D4 & D5).

In other words, my document (and this message) should not exist as a single indivisible whole, but rather as a concatenation ("and") of individual suggestions which can be organized, rearranged, responded to, and have implications drawn, one at a time.

Capturing implications and making it easy to find them assists the human reasoning process. A system that does that will make it easier for us to carry on intelligent design conversations and discover inherent contradictions earlier in the process.
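The "and"/"or" relationships and the compound negation discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `alternatives` mapping (an "or" list of "and" groups per feature), the `incompatible` set of design pairs, and the specific D-numbers are all invented for the example.

```python
from itertools import combinations, product

# Each feature maps to an "or" list of "and" groups,
# e.g. F1 ==> (D1 & D2 & D3) + (D4 & D5).
alternatives = {
    "F1": [{"D1", "D2", "D3"}, {"D4", "D5"}],
    "F2": [{"D6"}],
}

# Recorded contradictions between design ideas, e.g. !(D1 & D6).
incompatible = {frozenset({"D1", "D6"})}


def is_consistent(designs):
    """True if no two of the chosen design ideas are recorded as incompatible."""
    return not any(
        frozenset(pair) in incompatible for pair in combinations(designs, 2)
    )


def feasible_choices(alternatives):
    """Pick one alternative per feature; keep only the combinations that are consistent."""
    features = sorted(alternatives)
    results = []
    for choice in product(*(alternatives[f] for f in features)):
        if is_consistent(set().union(*choice)):
            results.append(dict(zip(features, choice)))
    return results


choices = feasible_choices(alternatives)
print(choices)
```

If `feasible_choices` returns an empty list, no compatible design exists for the full feature set, which corresponds to the case where one of the features has to be dropped.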

[Note: This analysis raises the intriguing question: What is the best way to integrate, display, and use both "and" and "or" lists in a hierarchy?]

Implication #5: Automated Reasoning?
At some point, it may be possible to begin automating the reasoning process -- or at least parts of it. The dream for such a system would be the ability to ask "how can we do this" and get back a collection of design ideas captured from previous designs. (The "design patterns" concept falls into place, here.)

The system might present the patterns, with the pros and cons for each, with the most typical pattern first. That begins to turn the design process into more of a "review and select" operation.

In addition, asking the system to "enumerate all the design ideas that correspond to this version of the functional specification" is in essence asking it to complete the implication statement:

   (F1 & F9 & F12 &...) ==> ??

In a sense, then, the system has made a first step towards automated reasoning. Harder operations like identifying contradictions and synthesizing a unified view will probably remain human operations for a very long time, but they might one day be automatable, as well.
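Completing the implication statement amounts to a lookup over recorded implications. The sketch below assumes implications are stored as simple (feature, design idea) pairs, and it ranks results by how often each idea was recorded, as one possible reading of "most typical first". The `recorded` data and the `complete` function are hypothetical.

```python
from collections import Counter

# Recorded implications as (feature, design idea) pairs; illustrative data only.
recorded = [
    ("F1", "D1"),
    ("F9", "D1"),
    ("F9", "D4"),
    ("F12", "D7"),
    ("F3", "D2"),
]


def complete(features):
    """Return the design ideas implied by any of `features`, most frequently recorded first."""
    counts = Counter(design for feature, design in recorded if feature in features)
    # Sort by descending count, then by name so the order is deterministic.
    return sorted(counts, key=lambda d: (-counts[d], d))


ideas = complete({"F1", "F9", "F12"})
print(ideas)
```

This only retrieves what people have already recorded. Spotting contradictions among the retrieved ideas is left to the reader of the list, in line with the caveat above.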

Implication #6: Augmented Synthesis
All that has been said so far applies to deductive logic. To be complete, the system should also augment the task of inductive thinking, or "synthesis".

In essence, "synthesis" consists of building on other ideas: combining them, abstracting them in some way, or incorporating them into a larger whole. The augmentation for that process lies in the ability to:

  1. Enumerate related nodes (e.g. "design ideas")

  2. Create new nodes that either link to the originals or create new segments that incorporate them

For example, when two classes are implied by design, you may derive a superclass and add that to the design. The system path would look something like this:

  F1 ==> D1 ==> C1  \
                     ==> S1
  F2 ==> D2 ==> C2  /

Here, each feature led to a design that implied a particular class. The synthesis into a superclass (S1) was augmented by the availability of the class definitions and the ability to reuse that information.

At some point, too, F2 might be removed from the list of requirements. Ideally, the system would proactively flag the implications as invalid for that version of the system, making the superclass unnecessary.

At a minimum, though, it would be possible to retroactively answer the question, "Why does S1 exist?", by exploring the links to C1 and C2 and finding that

  1. One of them does not in fact exist, and

  2. It was planned for a while, but depended for its existence on a feature that was later dropped.
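The dependency tracking described here can be sketched as a walk over "derived from" links. The link table below mirrors the F/D/C/S example, and the rule that a derived node is valid only when all of its sources are valid is an assumption made for this sketch; `derived_from`, `is_valid`, and `why_invalid` are hypothetical names.

```python
# Each derived node lists the nodes it was derived from.
derived_from = {
    "D1": ["F1"],
    "D2": ["F2"],
    "C1": ["D1"],
    "C2": ["D2"],
    "S1": ["C1", "C2"],
}


def is_valid(node, active_features):
    """A feature is valid if still required; a derived node needs all of its sources valid."""
    if node not in derived_from:
        return node in active_features
    return all(is_valid(parent, active_features) for parent in derived_from[node])


def why_invalid(node, active_features):
    """Trace a node back to the dropped features it ultimately depends on."""
    if node not in derived_from:
        return [] if node in active_features else [node]
    dropped = []
    for parent in derived_from[node]:
        dropped.extend(why_invalid(parent, active_features))
    return dropped


# F2 has been removed from the requirements; only F1 remains.
active = {"F1"}
print(is_valid("S1", active))     # is the superclass still justified?
print(why_invalid("S1", active))  # which dropped features explain it?
```

A proactive system would run the validity check whenever the feature list changes; the retroactive question "Why does S1 exist?" only needs the trace.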

Implication #7: Standardized Tags

(Back to a more mundane topic to finish up this post.) To be able to pull out all the design notes for a system, they must all be tagged in some standard way. That's an area that needs work.

Perhaps specialized domains like software development could have a predefined set of tags for the major documents and processes that go on during software development.

As helpful as that is for processes we (believe we) understand, though, it's not much use for "wicked" problems, where understanding only emerges as we explore the problem. For such cases, perhaps a generalized "derivative" or "implication" tag is more appropriate.

One might then ask the system to enumerate all the derivative ideas, or implied ideas, in order to construct the next document. It might be, then, that design nodes are a "first order derivative" from specification nodes, while code is a "2nd order derivative". Mathematical people will probably like that. Most everyone else would probably prefer labeled tags.
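For the mathematically inclined, the "order" of a derivative could be computed rather than tagged by hand. The sketch below assumes that order means the smallest number of implication links between a node and any specification node, and it uses a breadth-first walk to find it. The `implies` table and `derivative_orders` function are hypothetical.

```python
from collections import deque

# Implication links: specification -> design -> code (illustrative data only).
implies = {
    "F1": ["D1"],
    "D1": ["C1", "C2"],
    "F2": ["D2"],
    "D2": ["C2"],
}


def derivative_orders(roots):
    """Breadth-first walk giving each node its distance from the nearest root."""
    order = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in implies.get(node, []):
            if child not in order:  # first visit is the shortest path in BFS
                order[child] = order[node] + 1
                queue.append(child)
    return order


orders = derivative_orders(["F1", "F2"])
first_order = sorted(n for n, k in orders.items() if k == 1)
print(first_order)  # candidate nodes for the design document
```

Labeled tags and computed orders are not mutually exclusive: a label such as "design" could simply be a friendlier name for "order 1" within a given domain.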


Eric Armstrong