Date: Fri Dec 22 2000 - 10:38:01 PST


From:   Garold L. Johnson
dynalt@dynalt.com
Reply-To: unrev-II@yahoogroups.com

To:     UnRev II eGroup
unrev-II@yahoogroups.com

Subject:   Refactoring and information annealing


In an earlier email I pointed out:

Studies show that people resist formalisms when capturing information, even when observation of their actual behavior shows that they follow such conventions in practice.

http://www.csdl.tamu.edu/~shipman/

http://www.csdl.tamu.edu/~shipman/formality-paper/harmful.html

http://www.csdl.tamu.edu/~shipman/aiedam/aiedam.html

Given this, we need ways to capture information as it arises, and connect it in various ways later.

Refactoring is the term used in Object Oriented design to refer to the necessity for reorganizing class and other code structures as a project progresses and better abstractions are discovered.

Neil Larson uses the term "information annealing" to refer to a similar process in group hypertext systems, and believes that at least some of it must be done by a knowledge expert in order to be effective.

I like the information annealing concept. Annealing in materials refers to the process of heating a material until it is possible for molecules in the material to move slightly without distorting the material, and then allowing the material to cool slowly. This process removes local strains from the system as the molecules adjust their positions.

The short form of the arguments presented in the references is that we resist formalisms that categorize things before we are comfortable with the categories that are available:

I found it fascinating that people hesitate to follow formalisms such as IBIS even when study of their actual behavior indicates that the formalism is just what they do. I think this is related to the similar problem of word processing where we have to look at issues of structure and appearance before we are really certain of the content.

Extreme Programming uses continual refactoring as a major tool. If changes to the code in an area indicate that the abstraction could be better and cleaner, it is reorganized at once.
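To make that concrete, here is a tiny made-up illustration in Python (the names and the discount rule are invented, not taken from any XP text) of the kind of refactoring meant here - duplicated logic is pulled into a cleaner abstraction as soon as the duplication is noticed:

    # Before: the same discount rule is written out in two places.
    def invoice_total(items):
        total = sum(price * qty for price, qty in items)
        return total * 0.9 if total > 100 else total

    def quote_total(items):
        total = sum(price * qty for price, qty in items)
        return total * 0.9 if total > 100 else total

    # After refactoring: the shared abstraction is extracted at once,
    # so the next change to the rule happens in exactly one place.
    def discounted_total(items, threshold=100, discount=0.9):
        total = sum(price * qty for price, qty in items)
        return total * discount if total > threshold else total

    def invoice_total_refactored(items):
        return discounted_total(items)

    def quote_total_refactored(items):
        return discounted_total(items)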

I used MaxThink, an outline based information tool for DOS, for years and use an outliner today. With MaxThink I found the following to be extremely useful:

Using this approach I could generate a hierarchical document that was very coherent, and do it very quickly. This applies formalisms largely after the fact. Once there was a framework, some things had logical places to go, and I applied the formalisms to some new content as the content was generated.

Once I moved the information into a word processor to get the nice formatting, I was dead. There was no good way to manipulate the content again in the word processor. If the organization got too messy I would export the word processing file back into MaxThink and work on the organization. This of course lost all the formatting, which had to be redone when I brought the text back into the word processor.

This of course still didn't support linking at the time. Cloning of topics and other advanced operations were available but somewhat expensive.

I conclude that it is necessary to be able to refactor information at any stage. It would be nice if the base material were still available as a check on the refactoring, but the refactoring is essential. Human knowledge, particularly in the hard sciences, is subject to this sort of refactoring in a largely uncontrolled way. We no longer use the alchemists' theory of the universe as part of our current knowledge base - it has been refactored out. We do, however, have some degree of access to the historical documents that reflect that refactoring process.

Hierarchical organization is generally superior to linear presentation. The topic context provides all sorts of support for things that don't have to be said. The extensive quoting that we do in emails is indicative of the lack of context that we have in an essentially linear system (threads are linear internally).

One of the problems with hierarchy is that it implicitly assumes that any piece of information has exactly one place to go. This is essentially saying that the universe of discourse can be represented as disjoint sets, so that each element belongs in exactly one parent set. It doesn't take long working with this assumption before its invalidity is apparent.

The first response to this in a hierarchical system is the ability to clone a topic so that it can exist in multiple places in the hierarchy and still be a single topic - edit it anywhere and it changes everywhere.

This now assumes that a single hierarchy will handle the information organization. That notion is soon dispelled. It becomes clear with just a little observation and introspection that humans organize information using multiple hierarchies, at least. The hierarchy used depends on the context of the information and the attributes that are considered relevant in that context. This leads to the observation that hierarchy needs to be separate from the content - it should be possible to supply as many hierarchies, indices, or other similar organizing schemes as desired over the content without forcing assumptions on the content. This is recursive in that hierarchies become content, which can in turn be referenced and reorganized.
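A rough sketch of what that separation might look like (Python, everything here invented purely for illustration): content nodes exist once, while hierarchies are separate structures that only reference them.

    # Content exists once; hierarchies are separate structures of references,
    # so the same node can appear in any number of organizations.
    class Node:
        def __init__(self, text):
            self.text = text            # edit this and every hierarchy sees it

    class Hierarchy:
        def __init__(self, label):
            self.label = label
            self.children = {}          # parent node -> list of child nodes (references)

        def add(self, parent, child):
            self.children.setdefault(parent, []).append(child)

    km = Node("Knowledge Management")
    links = Node("Bi-directional, typed links")

    by_topic = Hierarchy("by topic")        # one view of the material...
    by_priority = Hierarchy("by priority")  # ...and another, over the same nodes
    by_topic.add(km, links)
    by_priority.add(km, links)

    # The "clone" effect falls out for free: both hierarchies reference the
    # same object, so an edit is visible everywhere the node appears.
    links.text = "Bi-directional, typed links with attributes"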

This leads us to bi-directional, typed links, which may carry added attribute information about the link. A truly general linking mechanism (fine-grained links) with the ability to copy (make a new instance for revision) and reference (clone if it appears to be a part of the content rather than an external reference) provides a logical representation in which it is possible to simulate any other relationship mechanism that we might choose. Note that it is often of value to hide some of these mechanisms in larger concepts - an outliner, an IBIS-like system, or a semantic net. The representation provides for ease of use of commonly applied representational formalisms. We may introduce link types to support some of these formalisms more directly in the underlying representation.
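A minimal sketch of such a linking mechanism, again in Python with invented names, just to show that typed, bi-directional, attributed links are a small amount of machinery:

    class Item:
        def __init__(self, text):
            self.text = text
            self.links = []          # every link this item takes part in

    class Link:
        def __init__(self, source, target, link_type, attributes):
            self.source = source
            self.target = target
            self.link_type = link_type      # e.g. "supports", "refutes", "part-of"
            self.attributes = attributes    # data about the link itself (author, date, ...)

    def link(source, target, link_type, **attributes):
        # Register the link on both ends so it can be followed in either direction.
        l = Link(source, target, link_type, attributes)
        source.links.append(l)
        target.links.append(l)
        return l

    claim = Item("Hierarchy should be separate from content")
    evidence = Item("People organize the same material in multiple hierarchies")
    link(evidence, claim, "supports", author="glj", created="2000-12-22")

    # Navigation works from either end of the link.
    supporting = [l.source.text for l in claim.links
                  if l.target is claim and l.link_type == "supports"]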

The power of KM is the observation that the problem of representing knowledge has at least some elements which are independent of the knowledge being represented.

Consider the evolution of software development approaches:

  1. There is disagreement as to whether there can be multiple hierarchies or only one.

  2. Class membership is generally fixed. There are some systems and some techniques to allow an object to change its class dynamically.

  3. The set of behaviors and the data structure that represent an object are fixed.

  4. Some say that the data and behavior are inadequate, that we have to understand and enforce the logical contract that represents the invariant promise of a function's behavior. (A small sketch of such a contract follows this list.)
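Such a contract is usually expressed as pre- and postconditions. A throwaway Python sketch (assert-based, the example is invented) of what enforcing that kind of promise might look like:

    def withdraw(balance, amount):
        # Precondition: the part of the contract the caller must satisfy.
        assert 0 < amount <= balance, "precondition violated"
        new_balance = balance - amount
        # Postcondition: the invariant promise of the function's behavior.
        assert 0 <= new_balance == balance - amount, "postcondition violated"
        return new_balance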

The entire evolution of software development methodologies has been one of improved knowledge representation as we work out ways of translating these higher level representations into executable formalisms. Note that the underlying executable system is not constrained, in principle, by the nature of the formalisms that it implements. At the level of machine instructions we generally have the ability to use spaghetti code at will - there are branches that are essentially arbitrary links, and it is potentially possible to reference any point in the entire memory space of the program. The translators add the constraints, but many of them do not exist at the hardware level. As we progress and it becomes clearer that certain approaches work in all cases, machine instruction sets are being refactored to enforce some of the higher level constructs - access is restricted based on program data, there are special loop constructs, etc.

I submit that we can use the knowledge of the path we followed in the development of software development techniques to do a better job of evolving KM. Note also that software development itself could benefit greatly from even a relatively rudimentary KM solution that didn't get too much in the way. Note that most software tools manage only a part of the information and very little of the knowledge - most of them are simply editors for some representation of the code, and really exhibit no understanding of what it is they are manipulating. They are word processors for diagrams, with no understanding of the structure of the knowledge represented by the diagram.

By refactoring the path we have already followed, we should be able to get a set of requirements for an eventual KM system. Not being constrained by even reasonable amounts of caution, let's try it.

  1. When we represent an object, we have only a collection of representations of a selected set of attributes -- we don't have the object.

  2. We know objects only in terms of their attributes. The set of values for the set of attributes is the state vector for the object.

  3. We recognize changes in an object in terms of changes in the values of its attributes.

  4. Behaviors are functions that can change the attributes of an object.

  5. We restrict the set of behaviors that can be applied to an object to some set that has meaning (produces what we consider meaningful results for that object).

From these observations, some requirements for the representation follow (a small sketch appears after this list):

  1. It must be possible to modify the attributes of an object. Things change as a result of actions taken - paint it blue and the color changes.

  2. It must be possible to add attributes to an object. We aren't going to get all of them at once (there are an infinite number) so we need to be able to add attributes as we become interested in them. Note that this implies the ability to add attributes to the equivalent of classes and to create other classes that may specialize on the new attributes.

  3. Having already found this out, links should be bi-directional so that we can link to sources, etc.
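Here is a compact sketch tying the observations and requirements together (Python, names invented): an object is known only through an attribute dictionary, its state vector is just the current values, behaviors are functions that change attributes, and new attributes can be added whenever they become interesting.

    class Thing:
        # We never have the object itself, only a chosen set of attributes.
        def __init__(self, **attributes):
            self.attributes = dict(attributes)

        def state_vector(self):
            # The current values of the chosen attributes.
            return tuple(sorted(self.attributes.items()))

    def paint(thing, color):
        # A behavior is just a function that changes attribute values.
        thing.attributes["color"] = color

    chair = Thing(color="red", legs=4)
    before = chair.state_vector()
    paint(chair, "blue")                       # paint it blue and the color changes
    assert chair.state_vector() != before      # change is recognized via the attributes

    # Requirement 2: attributes can be added long after creation, as interest arises.
    chair.attributes["material"] = "oak"
    wooden = [t for t in [chair] if t.attributes.get("material") == "oak"]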

All of this combines into a set of requirements that, if realized, results in the lowest level of chaos referred to as spaghetti code in software. We are close to being able to represent arbitrary relationships and essentially arbitrary and extendible operations on information nodes.

Now we consider adding structure to this mess.

  1. Copy -

    a node is copied in place. The contents can be edited. The copy retains an external link back to its source.

  2. Hierarchy -

    specific references organize a set of nodes and node references into a hierarchy. Note that this allows the hierarchy to contain or reference the external information at the choice of the author, and for the nodes in the hierarchy or the entire hierarchy to be treated as a node in its own right. This also supports the use of multiple hierarchies on the same underlying information.

  3. Versioning of nodes.

    This includes versioning and archiving of documents, which establish points in time where a given node can no longer be edited. (A rough sketch of these three operations follows.)
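A rough sketch, with invented names, of how the three operations above might look on top of plain nodes:

    class Node:
        def __init__(self, text):
            self.text = text
            self.origin = None      # external link back to the node this was copied from
            self.frozen = False     # set once a version/archive point is declared

        def edit(self, text):
            if self.frozen:
                raise ValueError("this version is archived and can no longer be edited")
            self.text = text

    def copy_node(node):
        # 1. Copy: a new, editable instance that still points at its source.
        new = Node(node.text)
        new.origin = node
        return new

    def freeze(node):
        # 3. Versioning: establish a point in time after which the node is read-only.
        node.frozen = True

    # 2. Hierarchy: just a structure of references over existing nodes; the
    #    hierarchy itself can be treated as a node and reorganized in turn.
    requirements = Node("Requirements")
    outline = {requirements: [copy_node(requirements), Node("Versioning")]}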

At this point we can begin to add any sort of operations that we can dream up and experiment with. Want an annotation system? Add a button to generate a comment, typed or not, request whatever information is needed for link type information, and have the associated action create the information node and the links. If the link information is small enough, use multiple buttons. Add a section to the node data to control the use of this node in this abstraction.
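As an example of how cheap such an operation becomes once the linking machinery exists, here is a sketch (plain dictionaries, invented names) of what the action behind an "annotate" button might do:

    def annotate(target_node, comment_text, link_type="comment", **link_attrs):
        # Build the comment node, then record a typed link on both ends so the
        # annotation can be reached from the target and vice versa.
        comment = {"text": comment_text, "links": []}
        note_link = {"source": comment, "target": target_node,
                     "type": link_type, "attrs": link_attrs}
        comment["links"].append(note_link)
        target_node["links"].append(note_link)
        return comment

    claim = {"text": "Hierarchy should be separate from content", "links": []}
    annotate(claim, "Does this scale to large groups?", link_type="question", author="glj")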

The basic reference engine is a real bear, but the initial requirements seem to be a relatively small set, and I think achievable. Certainly the designs of existing systems can be run against these to validate or modify the requirements.

Add the scalability requirements from several emails back (individual to group to merged knowledge) and I would be delighted with a system that implemented the requirements in a reasonable fashion.

Thanks,

Sincerely,



Garold (Gary) L. Johnson
DYNAMIC Alternatives
http://www.dynalt.com/