Word WorkText tools to manage manuscript preparation
From "VS Workshop", Access to Wang, July 1989
Anyone who has worked on group writing projects of any size knows the uncertainty and lack of control that several authors bring. It never seems that you have the latest version of the documents, or even that anyone KNOWS which version is which. As I see more writing and analysis projects myself, I have come to see some parallels with the programmer's task; as a programmer, I have reached for some of the technical tools that I found helpful before. The results have been very helpful for me, so I share them here.
Some ISWUAIDS that were intended for source file processing also have application in manuscript preparation. These include:
COMPRLIB: compares two libraries. Shows files found in both libraries, files found in only one of them, and whether files found in both libraries are the same.
CONVERTC: multi-purpose text conversion and search utility. Accepts a list of search characters (including wild card characters) and can perform many different actions when a match is found.
DIFF: compares source file lines and lists differences between them.
FINDTEXT: search utility. Searches any type of file. Option to ignore case.
SCANSRC: source file search utility. Searches for up to 12 text strings at once.
TRANSRC: converts source files to uppercase letters. Required for reliable use of case-sensitive search utilities such as CONVERTC and SCANSRC.
The utilities to be covered here expect 80-character consecutive files (known variously as source, editor, or text files) as their input. All documents must be converted to this format before proceeding. This is easily accomplished through Wang's COPYWP utility (or COPYPLUS for users of WP+). Choose the option to convert a document to a VS file and select TYPE = SOURCE at the output screen. (Naturally, this conversion can be automated with a procedure. See Figure 1, below, for an example of this.)
In many cases it will also be necessary to convert the completed source file to uppercase so that case-sensitive utilities (e.g. CONVERTC, SCANSRC) will correctly find text in all combinations of case. The TRANSRC ISWUAID can be used for this purpose. It accepts the input file name and creates a new file with lowercase characters converted to uppercase.
A frequent need in manuscript preparation is a comparison between versions of a document. This process is called redlining - a reference to the red ink used to mark up draft copies. Redlining software has been available for a while, but only the most expensive versions are sophisticated enough to detect all types of changes in a document. In particular, the DIFFWP utility from ISWU does not work well enough to warrant the effort required to use it. A good compromise is to compare source file versions using the DIFF utility.
DIFF compares lines in source files in specified columns and lists differences between them. It has a limited ability to identify lines that have been moved, but will often find items that have been moved short distances. Since the comparison is made at the line level, changes in the text that affect the line wraps will be recorded as changes by DIFF.
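The effect of line wraps on a line-level comparison can be illustrated with a short sketch. This is not the DIFF utility itself; it uses Python's standard difflib module as a stand-in, and the function name line_diff, the D/I codes, and the 40-column wrap width are assumptions chosen for illustration.

```python
import difflib
import textwrap


def line_diff(old_lines, new_lines):
    """Return (code, text) pairs: "D" for deleted lines, "I" for inserted."""
    changes = []
    for entry in difflib.ndiff(old_lines, new_lines):
        if entry.startswith("- "):
            changes.append(("D", entry[2:]))
        elif entry.startswith("+ "):
            changes.append(("I", entry[2:]))
    return changes


old_text = ("What devil possessed me that I behaved so well? "
            "You may say the wisest thing you can, old man.")
new_text = old_text.replace("devil", "wicked demon")

# Wrap both versions to the same (arbitrary) 40-column width.
old_lines = textwrap.wrap(old_text, width=40)
new_lines = textwrap.wrap(new_text, width=40)

for code, text in line_diff(old_lines, new_lines):
    print(code, text)
```

In this sample, a small edit in the first line shifts every later wrap, so every line is reported as deleted and reinserted even though most of the words are unchanged.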
Figure 1 shows a sample procedure for comparing two documents. It uses COPYWP to convert the files to source format, runs DIFF to compare them, then displays the results with DISPLAY. Depending on need, COPYPLUS could be substituted for COPYWP and the call to DISPLAY replaced by a print management utility.
Figure 2 shows the results of a comparison of two text passages using a section of Walden as the test material. (Apologies to H.D. Thoreau.) The passages themselves appear in Figures 5 and 6. Lines that do not match are shown with the old lines marked as deleted and the new lines marked as inserted. As you can see, this approach works best on stable text where there are few revisions.
FINDTEXT and SCANSRC represent two approaches to text searches. I covered both in greater detail in a previous column (see "VS Search Utilities", Nov. 1987 ACCESS 87, pp 20-24). FINDTEXT will search any file type; SCANSRC works only with source files but will accept up to 12 search strings at a time. FINDTEXT offers an option for case-insensitive searches; SCANSRC looks for exact image matches.
A likely question is: if FINDTEXT will search any file type, why not search in the original document format? One reason is that the internal organization of Word Processing files is not necessarily consecutive, and there is a good likelihood that the text you search for will not be contiguous - and, therefore, not found.
Since SCANSRC is sensitive to case, it is usually necessary to convert to upper case before searching. As previously mentioned, TRANSRC can be used for this purpose. Also note that SCANSRC does not work completely correctly on 7-level operating systems - particularly when the wild card file selections are used - so you may have to use other methods.
FINDTEXT searches for only one string, but it can ignore case and, thus, find more occurrences. One way of getting around the single-item limitation is by conducting your searches in background with the GUMSHOE procedure featured in the March column (see "The Cheap Detective", Mar. 1989 Access to Wang; pp. 14-16). Typical searches across 116 text files were conducted in 30 seconds or less when run on a VS 7310 - even when the slower case-insensitive option was selected.
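The two search styles can be contrasted in a few lines of Python. This is only a sketch of the idea, not the behavior of FINDTEXT or SCANSRC themselves: the function name scan_lines and the sample text are hypothetical, and the limit of 12 strings simply mirrors the SCANSRC figure quoted above.

```python
MAX_TARGETS = 12  # mirrors the SCANSRC limit mentioned in the article


def scan_lines(lines, targets, ignore_case=False):
    """Return (line_number, target) pairs for every target found in a line."""
    if len(targets) > MAX_TARGETS:
        raise ValueError("at most %d search strings are allowed" % MAX_TARGETS)
    hits = []
    for number, line in enumerate(lines, start=1):
        haystack = line.upper() if ignore_case else line
        for target in targets:
            needle = target.upper() if ignore_case else target
            if needle in haystack:
                hits.append((number, target))
    return hits


sample = [
    "The Purpose of this utility",
    "is to copy documents.",
    "PURPOSE: archive",
]

print(scan_lines(sample, ["Purpose", "documents"]))
print(scan_lines(sample, ["Purpose", "documents"], ignore_case=True))
```

The exact-match pass misses the third line because of its case; the case-insensitive pass finds it, at the cost of converting every line before comparing.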
With CONVERTC you can take a more sophisticated approach to text searching. Designed as a source file search and conversion utility, CONVERTC can search within columns and even replace text within similar text areas. The search options include searches of known and unknown elements within columns, the non-occurrence of text within areas, and hexadecimal and numeric searches. The actions that can be taken include report listings, replacement of text strings, and on-line modification of the line.
For example, consult the search specification in Figure 3. The task was to print similar sections from many different documents. The items on the left specify a search for a header ("Purpose") within columns 1 through 15; the occurrence of this element triggers a print line. Additional print lines are generated until a blank line (defined as blanks from column 1 to 30) is encountered, at which point the printer is turned off. The results are shown in Figure 4.
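The logic of that specification (start printing at a header found within a column range, stop at a line that is blank within another range) can be sketched in Python as follows. This is an illustration of the technique, not CONVERTC's own syntax or implementation; the names field and extract_sections and the sample lines are hypothetical, while the column ranges 1-15 and 1-30 come from the description above.

```python
def field(line, first, last):
    """Return columns first..last (1-based, inclusive), padded with blanks."""
    return line.ljust(last)[first - 1:last]


def extract_sections(lines, header="Purpose",
                     header_cols=(1, 15), blank_cols=(1, 30)):
    """Collect each block that starts at a header line and ends at a blank line."""
    sections = []
    current = None  # None means the "printer" is off
    for line in lines:
        if current is None:
            # Printer off: look for the header within its column range.
            if header in field(line, *header_cols):
                current = [line]
        elif field(line, *blank_cols).strip() == "":
            # Blank within the blank-test columns: turn the printer off.
            sections.append(current)
            current = None
        else:
            current.append(line)
    if current is not None:
        sections.append(current)
    return sections


sample = [
    "Purpose        Subroutine that converts",
    "               numeric amounts.",
    "",
    "Usage          (not printed)",
]

for section in extract_sections(sample):
    print("\n".join(section))
```

Testing columns 1 to 30 for blanks, rather than 1 to 15, is what lets the indented continuation lines through while still stopping at a truly empty line.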
If you have been involved with large manuscripts before, you know that it is imperative to know the versions of the documents in progress. If you maintain copies of each version as source files in libraries, this task can be managed easily with COMPRLIB.
COMPRLIB compares two specified libraries, checking file names, number of records, file type (organization), record sizes, and, for text files, the records themselves. It reports on files that are found in only one of the libraries and those found in both. The files found in both also display letter codes indicating any differences that might exist between the files themselves. COMPRLIB can also link to DIFF to compare source files, and the optional reports provide good documentation for version control.
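The kind of report described above can be sketched in Python by modeling each library as a dictionary that maps a file name to its list of records. This is an assumption made to keep the example self-contained; it is not how COMPRLIB works internally, and the function name compare_libraries, the status strings, and the sample file names are hypothetical.

```python
def compare_libraries(lib_a, lib_b):
    """Compare two libraries given as {file_name: [records]} dictionaries.

    Returns (only_in_a, only_in_b, in_both), where in_both maps each shared
    file name to a short status string.
    """
    only_in_a = sorted(set(lib_a) - set(lib_b))
    only_in_b = sorted(set(lib_b) - set(lib_a))
    in_both = {}
    for name in sorted(set(lib_a) & set(lib_b)):
        records_a, records_b = lib_a[name], lib_b[name]
        if records_a == records_b:
            in_both[name] = "same"
        elif len(records_a) != len(records_b):
            in_both[name] = "record count differs"
        else:
            in_both[name] = "contents differ"
    return only_in_a, only_in_b, in_both


old_library = {"CHAP01": ["a", "b"], "CHAP02": ["x"], "NOTES": ["n"]}
new_library = {"CHAP01": ["a", "b"], "CHAP02": ["x", "y"], "CHAP03": ["z"]}

only_old, only_new, shared = compare_libraries(old_library, new_library)
print("Only in old library:", only_old)
print("Only in new library:", only_new)
for name, status in shared.items():
    print(name, status)
```

Files flagged as different are the natural candidates to pass on to a line-level comparison.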
The application of source file tools to manuscript projects can offer some release from the drudgery of manual control and better information. While the source file approach described here is far from elegant, it does work and doesn't cost a fortune.
Figure 1: DOCDIFF - Sample Procedure to Compare Documents
PROCEDURE DOCDIFF - compares documents using DIFF

DECLARE &DOC1, &DOC2 STRING (05)
DECLARE &KEY INTEGER
DECLARE &WKLIB, &PRTLIB STRING (08)
DECLARE &WKVOL, &PRTVOL STRING (06)

EXTRACT &WKLIB = WORKLIB, &WKVOL = WORKVOL,
        &PRTLIB = SPOOLIB, &PRTVOL = SPOOLVOL

PROMPT PFKEY = &KEY
       CENTER LINE "Please enter the document IDs:";;
       CENTER "OLD document =", UPLOW &DOC1;;
       CENTER "NEW document =", UPLOW &DOC2;;
       CENTER LINE "Press PF(16) to exit";;

IF &KEY = 16 RETURN

RUN COPYWP [ in LIBRARY on VOLUME ]
    ENTER FUNCTION 13
    ENTER INPUT DOCUMENT = &DOC1
O1: ENTER OUTPUT FILE = &DOC1, LIBRARY = &WKLIB,
          VOLUME = &WKVOL, TYPE = SOURCE
    ENTER FUNCTION 13
    ENTER INPUT DOCUMENT = &DOC2
O2: ENTER OUTPUT FILE = &DOC2, LIBRARY = &WKLIB,
          VOLUME = &WKVOL, TYPE = SOURCE
    ENTER FUNCTION 16

RUN DIFF [ in LIBRARY on VOLUME ]
    ENTER DOC
    ENTER OPTIONS LISTOPT = C
    ENTER FORMAT FORMAT = X
O3: ENTER FILES F1NAME = (O1.FILE), F1LIB = (O1.LIBRARY),
          F1VOL = (O1.VOLUME),
          F2NAME = (O2.FILE), F2LIB = (O2.LIBRARY),
          F2VOL = (O2.VOLUME),
          LISTNAME = ##DIFF, LISTLIB = &PRTLIB,
          LISTVOL = &PRTVOL
    ENTER EOJ

RUN DISPLAY [ in LIBRARY on VOLUME ]
    ENTER INPUT FILE = (O3.LISTNAME), LIBRARY = (O3.LISTLIB),
          VOLUME = (O3.LISTVOL), MODE = PRINT

RETURN
Figure 2: Sample Text Comparison with DIFF
00001          ====> 1 LINE MOVED TO LINE 13 IN NEW FILE.
00002        D
00003        D The greater part of what my neighbors call
00004        D good I believe in my soul to be evil, and if
00005        D I repent of anything, it is very likely to be
00006        D my good behavior. What devil possessed me
00001        I "The greater part of what my neighbors call
00002        I good I believe in my soul to be bad, and if I
00003        I repent of anything, it is very likely to be
00004        I my good behavior. What demon possessed me
00010        D kind,--I hear an unrelenting voice which
00008        I kind,--I hear an irresistable voice which
00012        D generation quits the interests of another
00013        D like stranded ships.
00010        I generation abandons the enterprises of
00011        I another like stranded vessels."
00015        D
00001 00013  M --H.D. Thoreau
00014        I

LINES INSERTED: 8   DELETED: 9   MOVED: 1   MATCHED: 5
Figure 3: CONVERTC Search Specification - Locating Text Within Boundaries
Search options          Action
"Purpose "[01/15]       1#
" "[01/15]              #
" "[01/30]              2
Figure 4: CONVERTC listing from Sample Search
ALFANMBR IN DSBTEXT ON VOL333     LANGUAGE: DATA

CHANGE   NEW SOURCE STATEMENT
CODE
  1      Purpose        Subroutine that converts
  2                     numeric amounts to alphbetic
  2                     equivalents.

ALLOCWP IN DSBTEXT ON VOL333      LANGUAGE: DATA

CHANGE   NEW SOURCE STATEMENT
CODE
  1      Purpose        Creates a Word Processing
  2                     document with a given ID,
  2                     setting tab stops on the format
  2                     line according to the user's
  2                     wishes.

ARCHIVE IN DSBTEXT ON VOL333      LANGUAGE: DATA

CHANGE   NEW SOURCE STATEMENT
CODE
  1      Purpose        Copies all documents within a
  2                     given range to 8" archive
  2                     diskettes, with an option to
  2                     scratch the files after
  2                     copying.
Figure 5: Text Used For Comparisons (before changes)
--H.D. Thoreau

The greater part of what my neighbors call
good I believe in my soul to be evil, and if
I repent of anything, it is very likely to be
my good behavior. What devil possessed me
that I behaved so well? You may say the
wisest thing you can, old man,--you who have
lived seventy years, not without honor of a
kind,--I hear an unrelenting voice which
invites me away from all that. One
generation quits the interests of another
like stranded ships.
Figure 6: Text Used For Comparisons (after changes)
"The greater part of what my neighbors call
good I believe in my soul to be bad, and if I
repent of anything, it is very likely to be
my good behavior. What demon possessed me
that I behaved so well? You may say the
wisest thing you can, old man,--you who have
lived seventy years, not without honor of a
kind,--I hear an irresistable voice which
invites me away from all that. One
generation abandons the enterprises of
another like stranded vessels."

--H.D. Thoreau
Copyright © 1989 Dennis S. Barnes
Reprints of this article are permitted without notification
if the source of the information is clearly identified