Dynamic Alternatives
P.O. Box 59237
Norwalk, CA 90652
dynalt@dynalt.com


S U M M A R Y


DIARY: January 22, 2004 07:07 AM Thursday; Garold L. Johnson

SDS++ -- Convert SDS Record to HTML

1...Summary/Objective
2...Read SDS records and parse sections.
3...Line numbers
4...Description Line
5...Title Line
6...Contacts
7...SDS Records Reference Section
8...Document Log Entries
9...Other Sections?
10...Control Fields
11...Text of the Record
12...Anchors
13...References
14...Perl Structures
15...Conversion Strategy
16...Subroutines in the Input Loop
17...Operations Performed by Macro 070405.

ACTION ITEMS

1...Consider: Write a Perl script to add the day of the week to SDS records.

CONTACTS 

SUBJECTS
SI Templates, Generalize

0403 -
0403 -    ..
0404 - Summary/Objective
0405 -
040501 - Follow up ref SDS 20 0000. ref SDS 19 0000.
040503 -  ..
040504 - Read SDS records and parse sections.
040505 -
040506 - This is a general record reading and parsing routine.
040508 -  ..
040509 - It would be nice to make this a true object, but I don't know if I
040510 - want to attempt that yet.
040511 -
040512 -
040514 -  ..
040515 - Line numbers
040516 -
040517 - SDS lines are numbered. The line number starts in column 1, has a mark
040518 - / highlight character after it, and then a dash and a space.
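
A minimal sketch of splitting a numbered line into its parts, written in Python for illustration (the same pattern should carry over to a Perl regex). It assumes line numbers are 2, 4, or 6 characters, that the last two characters of a 6-character number may be letters (as in the 0301xx entries), and that a blank in the mark column means no highlight. The names SdsLine and parse_line are made up for this sketch.

```python
import re
from typing import NamedTuple, Optional


class SdsLine(NamedTuple):
    number: str  # 2-, 4-, or 6-character line number
    mark: str    # highlight character, "" when the column is blank
    text: str    # everything after the dash and space


# Line number, one mark column, a dash, then optionally a space and the text.
LINE_RE = re.compile(r"^(\d{2}(?:\d{2}(?:[0-9A-Z]{2})?)?)(.)-(?: (.*))?$")


def parse_line(raw: str) -> Optional[SdsLine]:
    """Return the parts of a numbered line, or None if it is not numbered."""
    m = LINE_RE.match(raw.rstrip("\n"))
    if m is None:
        return None
    number, mark, text = m.groups()
    return SdsLine(number, "" if mark == " " else mark, text or "")
```

The length of the returned number (2, 4, or 6) tells the caller which kind of line it is, and the mark is what the later highlight rules (f, k, m, j, S, s) would key on.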
040519 -
040520 -
040522 -  ..
040523 - Description Line
040524 -
040525 - This line has the date, time, User, and similar information.
040527 -  ..
040528 - It is followed by a blank line.
040529 -
040530 -
040532 -  ..
040533 - Title Line
040534 -
040535 - This line is numbered 01 and has the record title. If there is a
040536 - non-blank character after the line number, the record is confidential
040537 - and not to be published.
040538 -
040539 -
040541 -  ..
040542 - Contacts
040543 -
040544 - The 2-digit 02 line is a single underline (0xc4).
040546 -  ..
040547 - Information about the individual is in 02xx01 lines.
040549 -  ..
040550 - Phone numbers are not converted into the HTML.
040552 -  ..
040553 - Q: This is one form. Are there others? Experiment at home. Try a
040554 - personal contact as well as a company contact.
040556 -  ..
040557 - At label lpprs in Macro 070405 is the comment:
040559 -     ..
040560 -    If this is a personal contact, remove the entire line so there is
040561 -    no reference to people as individuals on the web, including the
040562 -    phone line string.
040564 -  ..
040565 - At label lpprs2 is the comment:
040567 -     ..
040568 -    See if the next line has a - because this means it is a subsidiary
040569 -    personal contact line belonging to the line above, and should be
040570 -    deleted.
040572 -  ..
040573 - However, some older records on the web aren't conformed, such as:
040575 -     ..
040576 -    http://www.welchco.com/sd/08/00101/02/02/01/10/200202.HTM
040577 -    http://www.welchco.com/sd/08/00101/02/03/01/03/171101.HTM
040578 -    http://www.welchco.com/sd/08/00101/02/03/01/06/101428.HTM
040580 -  ..
040581 - Internet addresses in contacts are removed.
040583 -  ..
040584 - This is one area where sequential processing works well, since only
040585 - the portion we want has to be handled, rather than having to remove
040586 - everything we don't want.
040588 -  ..
040589 - Q: What do we do about phone numbers we *want* in the published
040590 - record?
040591 -
040592 -
040594 -  ..
040595 - SDS Records Reference Section
040596 -
040597 - This section has a line for each SDS record that has been linked,
040598 - whether there is a link currently in the record or not.
040600 -     ..
040601 -    0301 -   SDS records
040602 -    0301xx - where xx = [01 .. ZZ]
040604 -  ..
040605 - The record date is in column 10.
040607 -  ..
040608 - The record title is in column 17.
040610 -  ..
040611 - Starting in column 171 is <date> <time> <User ID>, which can be used to
040612 - construct the record location.
040614 -  ..
040615 - This should be converted to a numbered array of URL suffixes so that
040616 - SDS references can be converted to links.
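
A rough Python sketch of collecting the reference lines into a table keyed by the xx index. The column positions come from the notes above (1-based, so column 10 is string index 9). How the locator in column 171 turns into a URL suffix isn't settled here, so the sketch stops at collecting the fields. RefEntry, parse_ref_line, and build_ref_table are names invented for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class RefEntry(NamedTuple):
    date: str     # record date from column 10
    title: str    # record title from column 17
    locator: str  # "<date> <time> <User ID>" from column 171 on


def parse_ref_line(raw: str) -> RefEntry:
    """Slice one 0301xx line by column (columns are 1-based in the notes)."""
    date = raw[9:16].strip()
    title = raw[16:170].strip()
    locator = " ".join(raw[170:].split())
    return RefEntry(date, title, locator)


def build_ref_table(lines: Iterable[str]) -> Dict[str, RefEntry]:
    """Map the two-character index (xx in 0301xx) to its parsed entry."""
    table = {}
    for raw in lines:
        index = raw[4:6]
        # The 0301 header line has " -" here, so it is skipped.
        if raw.startswith("0301") and len(index) == 2 and index.isalnum():
            table[index] = parse_ref_line(raw)
    return table
```

Keeping the index as the raw two-character string sidesteps the question of how [01 .. ZZ] maps onto reference numbers until that is known.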
040617 -
040618 -
040620 -  ..
040621 - Document Log Entries
040622 -
040623 - Documents [Issued / Received] [Prior / Today] are in a standard
040624 - format.
040626 -  ..
040627 - Other Files (OF) have a similar but not identical structure.
040629 -  ..
040630 - Each section is introduced by a 4-digit line number followed by 2
040631 - highlighted title lines with 4-digit numbers.
040633 -  ..
040634 - The document type is in columns 29 - 31 of the first title line.
040636 -  ..
040637 - Entries have 6-digit line numbers with the last 2 being the sequence
040638 - or index number. I assume that this can include [AA .. ZZ] as can SDS
040639 - record entries.
040641 -  ..
040642 - File path elements are space separated in columns 10 - 40.
040644 -  ..
040645 - File paths and therefore URLs can be constructed from all lines.
040647 -  ..
040648 - Description begins in column 44.
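
A small Python sketch of pulling the path and description out of one document log entry, using the columns listed above. Joining the path elements with "/" is an assumption on my part, as are the names DocEntry and parse_doc_entry.

```python
from typing import NamedTuple


class DocEntry(NamedTuple):
    index: str        # last two characters of the 6-character line number
    path: str         # path elements from columns 10 - 40, joined with "/"
    description: str  # text from column 44 on


def parse_doc_entry(raw: str) -> DocEntry:
    """Slice one 6-digit entry line by column (1-based in the notes)."""
    index = raw[4:6]
    elements = raw[9:40].split()      # space-separated path elements
    description = raw[43:].strip()
    return DocEntry(index, "/".join(elements), description)
```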
040649 -
040650 -
040652 -  ..
040653 - Other Sections?
040654 -
040655 - There may be other sorts of sections that I haven't encountered.
040656 -
040657 -
040659 -  ..
040660 - Control Fields
040661 -
040662 - Control fields list Subjects and define a record segment. There can be
040663 - multiple segments in a record.
040665 -  ..
040666 - The last control field is followed by a list of 4-digit line numbers
040667 - with Subjects text and followed by a double underline (0xcd).
040669 -  ..
040670 - These fields don't seem to affect the HTML except to trigger the
040671 - output of the Subjects text. Actually, SDS deletes the actual control
040672 - fields leaving the Subjects text.
040674 -  ..
040675 - Consider: Refuse to convert unless the first control segment contains
040676 - an "Ok to Publish" subject. This may be too much, since I often want to
040677 - get an HTML conversion of a record that I don't want to put on the
040678 - web. A script to process records before uploading is probably the
040679 - better idea.
040680 -
040681 -
040683 -  ..
040684 - Text of the Record
040685 -
040686 - This is the textual portion of the record. It has its own set of rules
040687 - related to StructuredText.
040689 -  ..
040690 - Indentation is significant
040691 -
040692 -    • Headlines - One or more lines followed by a single underline
040693 -      (0xc4). The lines are highlighted with 'j'. Headlines are used to
040694 -      create the TOC. Q: Is only the first line of a multiple line set
040695 -      used in the TOC, or do they all appear (would seem excessive)?
040696 -      The underline is rendered as a blank line.
040698 -       ..
040699 -    • Outlines - bulleted, numbered
040701 -       ..
040702 -    • Highlights with various meanings
040703 -
040704 -      • f - User Action Item
040705 -
040706 -      • k - Other person Action Item - Both types of Action Items are
040707 -        place in a Action Items TOC at the top of the HTML record.  The
040708 -        "Action Items" title is omitted if there are no Action Items.
040710 -         ..
040711 -      • m - Completed Action Item
040713 -         ..
040714 -      • j - Highlight. Headlines when a series is followed by a single
040715 -        underline. Q: Are highlighted lines rendered differently from
040716 -        headlines?
040718 -         ..
040719 -      • S - Single confidential line
040721 -         ..
040722 -      • s - Confidential to and including next 's' line
040724 -         ..
040725 -        Confidential lines not to be converted. Individual lines are
040726 -        tagged with 'S'. Sets of lines have a begin and end tag of 's'.
040727 -        These are checked for balance on every Save. This is not a
040728 -        certain test but at worst it should suppress too many lines.
040730 -  ..
040731 - It probably makes sense for the record read routine to accept an
040732 - argument that will automatically strip confidential lines while
040733 - reading.
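
A minimal Python sketch of a read routine with such an argument. It assumes the mark sits in the column right after the line number, drops single 'S' lines, and drops everything from one 's' line through the next 's' line inclusive. If the 's' tags are unbalanced it keeps suppressing to the end of the record, which errs on the side of hiding too much. The names read_record and strip_confidential are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Line number, then the mark column, then the dash.
MARK_RE = re.compile(r"^\d{2}(?:\d{2}(?:[0-9A-Z]{2})?)?(.)-")


def read_record(lines: Iterable[str], strip_confidential: bool = False) -> Iterator[str]:
    """Yield record lines, optionally dropping confidential ones."""
    in_block = False
    for raw in lines:
        m = MARK_RE.match(raw)
        mark = m.group(1) if m else " "
        if strip_confidential:
            if mark == "s":
                in_block = not in_block  # the 's' tag lines are dropped too
                continue
            if in_block or mark == "S":
                continue
        yield raw
```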
040735 -  ..
040736 - A section consists of three 4-digit line numbers with a description in
040737 - the center. I assume that there can be more than one description line.
040738 - The text is rendered as bold.
040740 -  ..
040741 - The record ends with a 4-digit line number followed by a blank line.
040742 - See if these are rendered in any special way.
040744 -  ..
040745 - When the first non-blank character is a left bracket ('['), the line
040746 - is assumed to be a forward reference and is rendered in bold.  Some
040747 - such paragraphs are terminated with a right bracket (']'), but this is
040748 - not a requirement.
040749 -
040750 -
040752 -  ..
040753 - Anchors
040754 -
040755 - Anchors are 4-character strings following a less than ('<') as the
040756 - first non-blank character on a line or after an outline designator.
040757 - They are rendered as a link using two dots.
040758 -
040759 -    <A NAME="HL53"></A> <A HREF="#HL53">..</A>
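
A short Python sketch of that rendering for the simplest case, where the anchor is the first non-blank thing on the line. It assumes the line-number prefix has already been removed and that the anchor is exactly the 4 non-blank characters after the '<'. The outline-designator case is left out because its exact form isn't pinned down here. render_anchor is a name made up for the sketch.

```python
import re

# Leading blanks, '<', a 4-character anchor, then the rest of the line.
ANCHOR_RE = re.compile(r"^(\s*)<(\S{4})(.*)$")


def render_anchor(text: str) -> str:
    """Replace a leading <XXXX anchor with a named, self-linking '..'."""
    m = ANCHOR_RE.match(text)
    if m is None:
        return text
    indent, name, rest = m.groups()
    return f'{indent}<A NAME="{name}"></A> <A HREF="#{name}">..</A>{rest}'
```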
040760 -
040761 -
040763 -  ..
040764 - References
040765 -
040766 - References convert to links and are of the form:
040768 -     ..
040769 -    'ref' <ref type> <ref number> <anchor>
040771 -  ..
040772 - A reference can be split across lines before the anchor.
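
A minimal Python sketch of finding references in body text. It assumes the line-number prefixes are already stripped and the lines joined with newlines, so that a reference split before its anchor is still matched by whitespace. The list of reference types is taken from the Perl Structures notes in this record; find_refs is a hypothetical name.

```python
import re
from typing import List, Tuple

# 'ref' <ref type> <ref number> <anchor>, with any whitespace
# (including a line break) allowed between the parts.
REF_RE = re.compile(r"\bref\s+(SDS|DRT|DRP|DIT|DIP|OF)\s+(\d+)\s+([0-9A-Z]{4})")


def find_refs(text: str) -> List[Tuple[str, int, str]]:
    """Return (ref type, ref number, anchor) for each reference found."""
    return [(kind, int(num), anchor) for kind, num, anchor in REF_RE.findall(text)]
```

On the Follow up line of this record, find_refs("Follow up ref SDS 20 0000. ref SDS 19 0000.") returns [("SDS", 20, "0000"), ("SDS", 19, "0000")].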
040773 -
040774 -
040776 -  ..
040777 - Perl Structures
040778 -
040779 - Consider a hash based on reference types to resolve all references.
040780 -
040781 -    $rec->{refs}{<ref type>}[<ref number>] = <file suffix / path>
040782 -
040783 -    <ref_type> = SDS | DRT | DRP | DIT | DIP | OF
040785 -  ..
040786 - An array of control segment references.
040787 -
040788 -    $rec->{seg}[<seg number>] = [<SI code, Title>]
040789 -
040790 -    $rec->{seg}{txt}[<line>] = <text>
040791 -
040792 -
040794 -  ..
040795 - Conversion Strategy
040796 -
040797 - There are at least two basic strategies:
040798 -
040799 -      1) Read the record and convert to output in a single pass.  This
040800 -         is the general Perl approach.
040802 -          ..
040803 -      2) Read the entire record and convert it to HTML by making
040804 -         multiple passes over the text. This is the approach that SDS
040805 -         uses.
040807 -  ..
040808 - Perl can accommodate either one, or even a mixture. The choice is
040809 - somewhat a matter of taste, but the question of ease of programming
040810 - also comes into play.
040811 -
040812 -
040814 -  ..
040815 - Subroutines in the Input Loop
040816 -
040817 - At one time I wrote some scripts that made use of subroutines to clean
040818 - up parsing in the input script -- possibly in the recipe conversions.
040820 -  ..
040821 - I recall the general strategy, but it would be nice to locate the
040822 - scripts to get specifics, and this time to document the approach.
040824 -  ..
040825 - The approach is to recognize a section to be processed and then to
040826 - call a subroutine to do it. The result is a nice, clean program.
040828 -  ..
040829 - The issues involve what to do when returning from a subroutine:
040830 -
040831 -    1) Has the next line been read?
040832 -
040833 -    2) Did the read encounter the end of file?
040835 -  ..
040836 - The best approach is to read the next line in all cases so that all
040837 - returns are processed using 'redo' to address the new line. What I
040838 - don't recall is how I handled EOF. I could just make the decision
040839 - again, but where is the fun in that?
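
One way to settle the two questions above, sketched in Python rather than Perl: every section handler always reads one line past its section and hands that line back, with None standing in for end of file. The main loop then dispatches again on the returned line, which is the job 'redo' would do in the Perl version. The section grammar here (a heading line followed by indented lines) and the names read_section and parse are invented for the sketch.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_section(first: str, lines: Iterator[str]) -> Tuple[List[str], Optional[str]]:
    """Collect one section; return it with the next unprocessed line.

    The second value is None when the read hit end of file.
    """
    body = [first]
    for line in lines:
        if not line.startswith(" "):
            return body, line  # read one line too far: hand it back
        body.append(line)
    return body, None          # ran out of input inside the section


def parse(source: Iterable[str]) -> List[List[str]]:
    lines = iter(source)
    line = next(lines, None)
    sections = []
    while line is not None:    # None is the single EOF signal
        if line.strip() == "":
            line = next(lines, None)
            continue
        # The handler always returns the next line, so the loop
        # never has to ask whether one has been read already.
        body, line = read_section(line, lines)
        sections.append(body)
    return sections
```

With this convention both questions have the same answer on every return: yes, the next line has been read, and end of file shows up as None in place of that line.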
040840 -
040841 -
040843 -  ..
040844 - Operations Performed by Macro 070405.
040845 -
040846 - Note that in many cases, the zone is set to 6 - 170 to prevent changes
040847 - from messing up information in column 171+ in reference fields.  I
040848 - can't see anything in the reference fields that would require
040849 - translation since they are never output to the HTML.
040851 -  ..
040852 - Quit if record is marked confidential (title line has non-blank
040853 - highlight character).
040855 -  ..
040856 - Remove lines marked for suppression (single 'S' and lines from one 's'
040857 - highlight to next).
040859 -  ..
040860 - See if the Reference line numbers are 6 places, since if they are not,
040861 - this is an old record that we will not convert to HTML, ref OF 1 NF30.
040863 -  ..
040864 - Transfer default anchor 0001 to above Summary/Objective. Delete it
040865 - from Follow up line. In fact, the Follow up line is not rendered at
040866 - all.
040868 -  ..
040869 - The decision not to include the list of referenced records makes
040870 - sense, but it seems that the records in the Follow up chain are valid
040871 - reference links.
040873 -  ..
040874 - Q: Shouldn't the Follow up links be available in the HTML?
040876 -  ..
040877 - Conditionally convert k follow ups to f for faster processing.  All
040878 - Action Items are handled the same.
040880 -  ..
040881 - Action: Determine what condition bypasses this conversion step
040883 -  ..
040884 - At label hlDC, the comment states:
040886 -     ..
040887 -    This seems to be the first indented heading and it is saved because
040888 -    it is the Summary/Objective and so is used in the Summary, but all
040889 -    other indented headings are deleted.
040891 -  ..
040892 - This isn't clear, since such headings are rendered in bold and *do*
040893 - appear in the HTML.
040895 -  ..
040896 - At label Hdlin is the comment
040897 -
040898 -    "Check to avoid a graphical box situation".
040900 -  ..
040901 - This checks for a corner when an underline is found.
040903 -  ..
040904 - Separate Action Items and Move to Below Headlines.
040906 -  ..
040907 - Older records did not show day of week, so enter a filler, ref OF 1
040908 - 3164. With Perl, we can determine the day of the week.
040910 -  ..
040911 - Consider: Write a Perl script to add the day of the week to SDS
040912 - records.
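
The calculation itself is small. A Python sketch using the standard datetime module is below; a Perl script would do the equivalent. The fixed list of English names avoids depending on the locale, and day_of_week is a name chosen for the sketch. How the date is pulled out of the description line is left open.

```python
import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week(year: int, month: int, day: int) -> str:
    """Return the English day name for a calendar date."""
    # date.weekday() numbers the days from Monday = 0.
    return DAY_NAMES[datetime.date(year, month, day).weekday()]
```

For the date of this record, day_of_week(2004, 1, 22) gives "Thursday", which matches the DIARY line.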
040914 -  ..
040915 - Q: Are there any other format issues that we should consider
040916 - conforming in bulk?
040917 -
040918 -    • Individual names in Contacts. Are there any with phone numbers?
040919 -      Any other confidentiality issues that should be considered?
040920 -
040921 -    • References using the old line number mechanism should be replaced
040922 -      with the new mechanism, including adding anchors to the target
040923 -      record.
040925 -  ..
040926 - Removes email addresses in body text.
040928 -  ..
040929 - Q: How do we get email addresses we want converted into the HTML?
040931 -  ..
040932 - Mark anchors by replacing the leading '<' with a special character.
040933 - Convert '<', '&', and '>' to HTML entities.
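
A small Python sketch of these two steps together. The order matters: the anchor's leading '<' has to be swapped for the marker before the entity conversion, or it would become an entity like any other '<'. The marker character (0x01) and the name escape_line are arbitrary choices for the sketch, and it assumes an anchor is a '<' followed directly by 4 non-blank characters.

```python
import html
import re

ANCHOR_MARK = "\x01"  # stand-in for the "special character"

# A '<' that is the first non-blank character and starts a 4-character anchor.
LEADING_ANCHOR_RE = re.compile(r"^(\s*)<(?=\S{4})")


def escape_line(text: str) -> str:
    """Mark a leading anchor, then convert &, <, and > to HTML entities."""
    marked = LEADING_ANCHOR_RE.sub(lambda m: m.group(1) + ANCHOR_MARK, text)
    return html.escape(marked, quote=False)
```

A later pass can then look for the marker to emit the anchor tags without confusing it with escaped text.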
040935 -  ..
040936 - Q: Is there any way to support some form of embedded markup?
040938 -  ..
040939 - If the anchor is on a line by itself, and the line above is blank,
040940 - then delete the blank line to improve appearance, so that paragraphs
040941 - are not double spaced.
040943 -  ..
040944 - Load letterhead for person sending the SDS record, including the
040945 - organization and email address.
040947 -  ..
040948 - Bold each line number level 4 heading. This is usually a single word,
040949 - like "Discussion" or "Progress," or a line that begins with the time of
040950 - a call entered using F1 F8, such as "1324 called Tom back".
040952 -  ..
040953 - Links to Citations in Body Text "ref" for all citation types.  Note
040954 - that "SDS 0" is the current document and the HREF needs only the
040955 - anchor.
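
A minimal Python sketch of the HREF rule, with "SDS 0" as the special case. The refs table mirrors the $rec->{refs}{<ref type>}[<ref number>] structure from the Perl Structures notes, here as a dict of lists. The function name href_for is hypothetical, and the table contents are whatever file suffixes or paths the reference sections yield.

```python
from typing import Dict, List


def href_for(refs: Dict[str, List[str]], ref_type: str, number: int, anchor: str) -> str:
    """Build the HREF value for one 'ref <type> <number> <anchor>' citation."""
    if ref_type == "SDS" and number == 0:
        return "#" + anchor  # current document: the anchor alone is enough
    return refs[ref_type][number] + "#" + anchor
```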
040957 -  ..
040958 - Handle POIMS and NWO as special cases since the records are segmented.
040960 -  ..
040961 - Make sure there are 20 lines below the last anchor, so accessing the
040962 - anchor with a link will display the line at the top of the browser.
040963 -
040964 -
0410 -