
Word Work

Text tools to manage manuscript preparation

From "VS Workshop",  Access to Wang, July 1989

Anyone who has worked on group writing projects of any size knows the uncertainty and lack of control that having several authors brings. It never seems that you have the latest version of the documents, or even that anyone KNOWS which version is which. As I take on more writing and analysis projects myself, I have come to see some parallels with the programmer's task; as a programmer, I have reached for some of the technical tools that I found helpful before. The results have been very helpful for me, so I share them here.

Tools available from ISWU

Some ISWUAIDS that were intended for source file processing also have application in manuscript preparation. The sections below cover several of them.

Preparation: converting documents to source files

The utilities to be covered here expect 80-character consecutive files (known variously as source, editor, or text files) as their input. All documents must be converted to this format before proceeding. This is easily accomplished through Wang's COPYWP utility (or COPYPLUS for users of WP+). Choose the option to convert a document to a VS file and select TYPE = SOURCE at the output screen. (Naturally, this conversion can be automated with a procedure. See Figure 1, below, for an example of this.)

In many cases it will also be necessary to convert the completed source file to uppercase so that case-sensitive utilities (e.g. CONVERTC, SCANSRC) will correctly find text in all combinations of case. The TRANSRC ISWUAID can be used for this purpose. It accepts the input file name and creates a new file with lowercase characters converted to uppercase.
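For readers who want to try the same preparation steps away from the VS, the two steps can be approximated as follows. This is a minimal Python sketch, not the behavior of COPYWP or TRANSRC: the function names, the word-wrap rule, and the padding of every record to 80 characters are assumptions made for illustration.

```python
import textwrap

RECORD_LENGTH = 80  # record size of the source files described above


def to_source_records(text, width=RECORD_LENGTH):
    """Wrap free-form text into fixed-length records (hypothetical helper)."""
    records = []
    for paragraph in text.split("\n"):
        # An empty paragraph becomes one blank record.
        lines = textwrap.wrap(paragraph, width=width) or [""]
        # Pad every line with blanks so all records have the same length.
        records.extend(line.ljust(width) for line in lines)
    return records


def to_uppercase(records):
    """Return new records with lowercase characters converted to uppercase."""
    return [record.upper() for record in records]
```

Keeping the uppercase copy separate from the converted original mirrors the approach above, where the uppercase version is a new file used only for searching.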

Document comparisons with DIFF

A frequent need in manuscript preparation is a comparison between versions of a document. This process is called redlining - a reference to the red ink used to mark up draft copies. Redlining software has been available for a while, but only the most expensive versions are sophisticated enough to detect all types of changes in a document. In particular, the DIFFWP utility from ISWU does not work well enough to warrant the effort required to use it. A good compromise is to compare source file versions using the DIFF utility.

DIFF compares lines in source files in specified columns and lists differences between them. It has a limited ability to identify lines that have been moved, but will often find items that have been moved short distances. Since the comparison is made at the line level, changes in the text that affect the line wraps will be recorded as changes by DIFF.
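The effect of line wraps on a line-level comparison can be seen in a small experiment. The sketch below uses Python's standard difflib module as a stand-in; it is not the ISWU DIFF utility, it does not detect moved lines, and the function name and the D/I codes are assumptions chosen to resemble the listing in Figure 2. The sample lines are taken from Figures 5 and 6.

```python
import difflib

old = [
    "good I believe in my soul to be evil, and if",
    "I repent of anything, it is very likely to be",
    "my good behavior.  What devil possessed me",
    "that I behaved so well?  You may say the",
]
new = [
    "good I believe in my soul to be bad, and if I",
    "repent of anything, it is very likely to be",
    "my good behavior.  What demon possessed me",
    "that I behaved so well?  You may say the",
]


def line_changes(old_lines, new_lines):
    """Return (code, text) pairs: 'D' for deleted lines, 'I' for inserted."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            changes.extend(("D", line) for line in old_lines[i1:i2])
        if tag in ("insert", "replace"):
            changes.extend(("I", line) for line in new_lines[j1:j2])
    return changes


for code, text in line_changes(old, new):
    print(code, text)
```

The second line of the passage is reported as both deleted and inserted even though its words are unchanged; only the word "I" moved up to the previous line when the text rewrapped.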

Figure 1 shows a sample procedure for comparing two documents. It uses COPYWP to convert the files to source format, runs DIFF to compare them, then displays the results with DISPLAY. Depending on need, COPYPLUS could be substituted for COPYWP and the call to DISPLAY replaced by a print management utility.

Figure 2 shows the results of comparing two text passages, using a section of Walden as the test material. (Apologies to H.D. Thoreau.) The original and revised passages appear in Figures 5 and 6. Lines that do not match are listed with the old lines marked as deleted (D) and the new lines marked as inserted (I). As the listing suggests, this approach works best on documents with a stable format and few revisions.

Text searches with FINDTEXT and SCANSRC

FINDTEXT and SCANSRC represent two approaches to text searches. I covered both in greater detail in a previous column (see "VS Search Utilities", Nov. 1987 ACCESS 87, pp. 20-24). FINDTEXT will search any file type; SCANSRC works only with source files but will accept up to 12 search strings at a time. FINDTEXT offers an option for case-insensitive searches; SCANSRC looks for exact image matches.

A likely question is: if FINDTEXT will search any file type, why not search in the original document format? One reason is that the internal organization of Word Processing files is not necessarily consecutive, and there is a good likelihood that the text you search for will not be contiguous - and, therefore, not found.

Since SCANSRC is sensitive to case, it is usually necessary to convert to upper case before searching. As previously mentioned, TRANSRC can be used for this purpose. Also note that SCANSRC does not work completely correctly on 7-level operating systems - particularly when the wild card file selections are used - so you may have to use other methods.

FINDTEXT searches for only one string, but it can ignore case and, thus, find more occurrences. One way of getting around the single-item limitation is by conducting your searches in background with the GUMSHOE procedure featured in the March column (see "The Cheap Detective", Mar. 1989 Access to Wang, pp. 14-16). Typical searches across 116 text files were conducted in 30 seconds or less when run on a VS 7310 - even when the slower case-insensitive option was selected.
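The difference between the two search styles can be illustrated with a short Python sketch. It is not a reimplementation of either utility: the function names and return values are hypothetical, and only the behaviors described above (several exact-match strings versus one string with optional case folding) are modeled.

```python
MAX_PATTERNS = 12  # the limit on search strings described above for SCANSRC


def scan_exact(lines, patterns):
    """Several search strings at once, exact (case-sensitive) matching."""
    if len(patterns) > MAX_PATTERNS:
        raise ValueError("too many search strings")
    hits = []
    for number, line in enumerate(lines, start=1):
        for pattern in patterns:
            if pattern in line:
                hits.append((number, pattern))
    return hits


def find_ignoring_case(lines, pattern):
    """One search string, matched without regard to case."""
    needle = pattern.upper()
    return [
        number
        for number, line in enumerate(lines, start=1)
        if needle in line.upper()
    ]
```

With the lines "The Purpose of this file" and "purpose: none", an exact search for "Purpose" finds only the first line, while the case-insensitive search finds both. This is why an uppercase copy of the file is usually needed before an exact-match search.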

Text extraction with CONVERTC

With CONVERTC you can take a more sophisticated approach to text searching. Designed as a source file search and conversion utility, CONVERTC can search within columns and even replace text within similar text areas. The search options include searches of known and unknown elements within columns, the non-occurrence of text within areas, and hexadecimal and numeric searches. The actions that can be taken include report listings, replacement of text strings, and on-line modification of the line.

For example, consult the search specification in Figure 3. The task was to print similar sections from many different documents. The items on the left specify a search for a header ("Purpose") within columns 1 through 15; the occurrence of this element triggers a print line. Additional print lines are generated until a blank line (defined as blanks from column 1 to 30) is encountered, at which point printing is turned off. The results are shown in Figure 4.
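The extraction logic just described amounts to a small on/off switch driven by two column tests. The following Python sketch shows that logic under stated assumptions; it does not reproduce CONVERTC's specification syntax or report format, and the function name, its parameters, and the choice to match the header anywhere within the first 15 columns are illustrative guesses.

```python
def extract_sections(lines, header="Purpose", header_cols=15, blank_cols=30):
    """Collect each block that starts at `header` and ends at a blank line."""
    sections = []
    current = None  # None means printing is switched off
    for line in lines:
        if current is None:
            # The header must appear within columns 1 through header_cols.
            if header in line[:header_cols]:
                current = [line.rstrip()]
        elif line[:blank_cols].strip() == "":
            # Blanks in columns 1 through blank_cols: switch printing off.
            sections.append(current)
            current = None
        else:
            current.append(line.rstrip())
    if current is not None:
        sections.append(current)  # the file ended before a blank line
    return sections
```

Restricting the header test to the leading columns keeps a mention of "Purpose" in running text from starting a new section.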

Library comparison with COMPRLIB

If you have been involved with large manuscripts before, you know how important it is to keep track of the versions of the documents in progress. If you maintain copies of each version as source files in libraries, this task can be managed easily with COMPRLIB.

COMPRLIB compares two specified libraries, checking file names, number of records, file type (organization), record sizes, and, for text files, the records themselves. It reports on files that are found in only one of the libraries and those found in both. The files found in both also display letter codes indicating any differences that might exist between the files themselves. COMPRLIB can also link to DIFF to compare source files, and the optional reports provide good documentation for version control.
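The kind of report described above can be sketched in Python by treating a directory as a library. This is an illustration only, not COMPRLIB's actual checks or output: the function names, the report layout, and the letter codes "R" and "C" are hypothetical.

```python
from pathlib import Path


def read_library(directory):
    """Map each file name in a directory to its list of text lines."""
    return {
        path.name: path.read_text().splitlines()
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }


def compare_libraries(first, second):
    """Compare two {name: lines} mappings and summarize the differences."""
    report = {
        "only_in_first": sorted(set(first) - set(second)),
        "only_in_second": sorted(set(second) - set(first)),
        "in_both": {},
    }
    for name in sorted(set(first) & set(second)):
        codes = ""
        if len(first[name]) != len(second[name]):
            codes += "R"  # hypothetical code: record counts differ
        if first[name] != second[name]:
            codes += "C"  # hypothetical code: record contents differ
        report["in_both"][name] = codes
    return report
```

Calling compare_libraries(read_library(old_dir), read_library(new_dir)) on two version directories gives a quick picture of which chapters were added, dropped, or changed; an empty code string means the two copies match.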

Summary

Applying source file tools to manuscript projects can provide better information and some release from the drudgery of manual control. While the source file approach described here is far from elegant, it does work and doesn't cost a fortune.


Figure 1: DOCDIFF - Sample Procedure to Compare Documents


PROCEDURE DOCDIFF - compares documents using DIFF

DECLARE &DOC1, &DOC2 STRING (05)
DECLARE &KEY INTEGER
DECLARE &WKLIB, &PRTLIB STRING (08)
DECLARE &WKVOL, &PRTVOL STRING (06)

EXTRACT &WKLIB   = WORKLIB,
        &WKVOL   = WORKVOL,
        &PRTLIB  = SPOOLIB,
        &PRTVOL  = SPOOLVOL

PROMPT PFKEY = &KEY
CENTER LINE "Please enter the document IDs:";;
CENTER "OLD document =", UPLOW &DOC1;;
CENTER "NEW document =", UPLOW &DOC2;;
CENTER LINE "Press PF(16) to exit";;

IF &KEY = 16 RETURN

RUN COPYWP [ in LIBRARY on VOLUME ]
ENTER   FUNCTION  13
ENTER   INPUT     DOCUMENT = &DOC1
O1: ENTER OUTPUT  FILE     = &DOC1,
                  LIBRARY  = &WKLIB,
                  VOLUME   = &WKVOL,
                  TYPE     = SOURCE
ENTER   FUNCTION  13
ENTER   INPUT     DOCUMENT = &DOC2
O2: ENTER OUTPUT  FILE     = &DOC2,
                  LIBRARY  = &WKLIB,
                  VOLUME   = &WKVOL,
                  TYPE     = SOURCE
ENTER   FUNCTION  16

RUN DIFF [ in LIBRARY on VOLUME ]
ENTER   DOC
ENTER   OPTIONS   LISTOPT  = C
ENTER   FORMAT    FORMAT   = X
O3: ENTER FILES   F1NAME   = (O1.FILE),
                  F1LIB    = (O1.LIBRARY),
                  F1VOL    = (O1.VOLUME),
                  F2NAME   = (O2.FILE),
                  F2LIB    = (O2.LIBRARY),
                  F2VOL    = (O2.VOLUME),
                  LISTNAME = ##DIFF,
                  LISTLIB  = &PRTLIB,
                  LISTVOL  = &PRTVOL
ENTER   EOJ

RUN DISPLAY [ in LIBRARY on VOLUME ]
ENTER   INPUT     FILE     = (O3.LISTNAME),
                  LIBRARY  = (O3.LISTLIB),
                  VOLUME   = (O3.LISTVOL),
                  MODE     = PRINT

RETURN

Figure 2: Sample Text Comparison with DIFF


00001 ====>     1 LINE MOVED TO LINE 13 IN NEW FILE.
00002       D
00003       D   The greater part of what my neighbors call
00004       D   good I believe in my soul to be evil, and if
00005       D   I repent of anything, it is very likely to be
00006       D   my good behavior.  What devil possessed me
      00001 I   "The greater part of what my neighbors call
      00002 I   good I believe in my soul to be bad, and if I
      00003 I   repent of anything, it is very likely to be
      00004 I   my good behavior.  What demon possessed me

00010       D   kind,--I hear an unrelenting voice which
      00008 I   kind,--I hear an irresistable voice which

00012       D   generation quits the interests of another
00013       D   like stranded ships.
      00010 I   generation abandons the enterprises of
      00011 I   another like stranded vessels."

00015       D
00001 00013 M   --H.D. Thoreau
      00014 I

LINES INSERTED:  8 DELETED:  9 MOVED:  1 MATCHED:  5

Figure 3: CONVERTC Search Specification - Locating Text Within Boundaries

Search options          Action
"Purpose "[01/15]       1#
" "[01/15]              #
" "[01/30]              2

Figure 4: CONVERTC listing from Sample Search


ALFANMBR IN DSBTEXT         ON VOL333
LANGUAGE: DATA
CHANGE   NEW SOURCE STATEMENT
 CODE
  1   Purpose           Subroutine that converts
  2                     numeric amounts to alphbetic
  2                     equivalents.

ALLOCWP  IN DSBTEXT         ON VOL333
LANGUAGE: DATA
CHANGE   NEW SOURCE STATEMENT
 CODE

  1   Purpose           Creates a Word Processing
  2                     document with a given ID,
  2                     setting tab stops on the format
  2                     line according to the user's
  2                     wishes.

ARCHIVE  IN DSBTEXT         ON VOL333
LANGUAGE: DATA
CHANGE   NEW SOURCE STATEMENT
 CODE

  1   Purpose           Copies all documents within a
  2                     given range to 8" archive
  2                     diskettes, with an option to
  2                     scratch the files after
  2                     copying.  

Figure 5: Text Used For Comparisons (before changes)


--H.D. Thoreau

The greater part of what my neighbors call
good I believe in my soul to be evil, and if
I repent of anything, it is very likely to be
my good behavior.  What devil possessed me
that I behaved so well?  You may say the
wisest thing you can, old man,--you who have
lived seventy years, not without honor of a
kind,--I hear an unrelenting voice which
invites me away from all that.  One
generation quits the interests of another
like stranded ships.  

Figure 6: Text Used For Comparisons (after changes)


"The greater part of what my neighbors call
good I believe in my soul to be bad, and if I
repent of anything, it is very likely to be
my good behavior.  What demon possessed me
that I behaved so well?  You may say the
wisest thing you can, old man,--you who have
lived seventy years, not without honor of a
kind,--I hear an irresistable voice which
invites me away from all that.  One
generation abandons the enterprises of
another like stranded vessels."

--H.D. Thoreau



Copyright © 1989 Dennis S. Barnes
Reprints of this article are permitted without notification if the source of the information is clearly identified