
To Any Length

Variable-length data records: a COBOL example

From "VS Workshop", Access to Wang, October 1989

Some aspects of VS life are so prevalent that their importance tends to be overlooked. Examples include the standard programming interface (GETPARMs), the ability for programs written in any language to call others in any language, and data buffers provided by the SHARER. All of these capabilities are handled smoothly and without our attention by the operating system.

Data compression is another area that is typically forgotten. Most VS files are compressed, and the compression is usually handled automatically by the Data Management System (DMS) without our knowledge. Even so, the principles of VS data compression must be understood when attempting even simple transfers to other data storage systems.

I recently had occasion to look carefully at the issue of variable-length records after transferring a few text files from the VS to my PC with the intent of importing them into Lotus 1-2-3. As compressed VS files, they were modest in size; on the PC, however, they were much larger. The size difference was caused by a large number of blank characters at the end of each PC record. With the project deadline approaching, I used the PC program RTW ("Remove Trailing Whitespace") to shorten the files and vowed to find a better approach later. (RTW is a classic C program from "The C Programming Language" by Kernighan and Ritchie.)

Further investigation revealed that the VS file transfer utility I used expanded each record to its full length before transferring it; records holding shorter lines were thus "padded" with trailing blanks, which greatly increased the size of the files. In other words, the VS treated all records as equal in length.

Compressed versus variable-length VS records

While compressed files on the VS use variable-length records, the records typically expand to their maximum length when decompressed. DMS automatically squeezes the information within the record area to a manageable size, so there is little reason to reduce disk storage further by other means. Thus, text files on the VS are usually stored with all trailing blanks preserved, though in compressed form.

PC disk storage techniques are considerably less sophisticated. Text files under MS-DOS are stored with carriage return and line feed characters to indicate the end of each record; the only data compression is that each record is shortened to the actual length of its data. (Another PC data compression technique replaces leading spaces with tab characters. This is far less universal and unlikely to be recognized correctly by all PC applications.)

The solution was to use variable-length records on the VS side so that the transferred PC file is shortened to an appropriate length. I wrote a version of RTW (RTWC74) in VS COBOL 74 to convert VS text files to variable-length format. (See Figure 1 for the listing.) This program is similar in concept to the original RTW, but uses the COBOL RECORD VARYING ... DEPENDING ON clause to specify the actual data area of each record.

RTWC74 explained

Variable-length records are specified in VS COBOL through use of the RECORD VARYING ... DEPENDING ON clause in the file description (FD). This feature is available in COBOL 74 versions beginning with release 5.10 and in all versions of COBOL 85. The FD for the output file (TEXTOUT) specifies that the record length depends on the value of TEXT-POINTER. Note that the file is also compressed.
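
For readers working with a compiler other than VS COBOL, here is a sketch of how the same file description might look in the ANSI COBOL 85 form of the RECORD clause. It is a fragment, not a complete program, and it rests on two assumptions: that the COMPRESSED keyword in the Figure 1 FD is a VS extension with no ANSI counterpart (so it is simply left out), and that TEXT-POINTER is given a PICTURE, which a portable COMP declaration needs.

```cobol
      * FILE SECTION entry: ANSI COBOL 85 form of the TEXTOUT FD.
       FD  TEXTOUT
           LABEL RECORDS ARE STANDARD
           RECORD IS VARYING IN SIZE FROM 1 TO 80 CHARACTERS
               DEPENDING ON TEXT-POINTER.
       01  TEXTOUT-RECORD              PIC X(80).

      * WORKING-STORAGE entry: length of the next record to write.
       01  TEXT-POINTER                PIC 9(04) COMP.
```

Before each WRITE, the program moves the desired length into TEXT-POINTER, and only that many characters of TEXTOUT-RECORD are written.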

The remainder of the program is reasonably straightforward. It accepts the name of an 80-character text file (TEXTIN) and prompts for an output file location. The record count of TEXTIN is extracted with the READFDR subroutine (a VSSUB) and used as the initial value of space for TEXTOUT (see the variable TEXTIN-SPACE in the VALUE OF SPACE clause). As each TEXTIN record is read, the 1000-MOVE-LINE paragraph performs 1100-EXAMINE-CHARACTER repeatedly while TEXT-POINTER moves backward along the record, examining its contents; the first non-blank character found sets CHARACTER-FOUND-FLAG and marks the logical end of the record.
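
The backward scan can be tried on its own with the minimal sketch below. It assumes a present-day compiler such as GnuCOBOL and standard fixed-format layout (area A in column 8); the VS file handling is left out, sample lines are moved into TEXTIN-RECORD in place of records read from a file, and the program name RTWSCAN and the RECORD-LENGTH display field are hypothetical. The loop is allowed to run down to column 1, so a blank line, or a line whose only character is in column 1, gets a length of 1.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RTWSCAN.
      * Sketch: find the logical length of an 80-character record by
      * scanning backward from column 80 to the last non-blank.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  TEXTIN-RECORD.
           05  TEXT-CHARACTER          PIC X(01) OCCURS 80 TIMES.
       01  TEXT-POINTER                PIC S9(04) COMP.
       01  CHARACTER-FOUND-FLAG        PIC X(01).
           88  CHARACTER-FOUND         VALUE "Y".
       01  RECORD-LENGTH               PIC 9(02).

       PROCEDURE DIVISION.
       0000-MAIN-PARAGRAPH.
      * Sample lines stand in for records read from TEXTIN.
           MOVE "HELLO, WORLD"         TO TEXTIN-RECORD.
           PERFORM 1000-FIND-LENGTH.
           MOVE "X"                    TO TEXTIN-RECORD.
           PERFORM 1000-FIND-LENGTH.
           MOVE SPACES                 TO TEXTIN-RECORD.
           PERFORM 1000-FIND-LENGTH.
           STOP RUN.

       1000-FIND-LENGTH.
           MOVE SPACES                 TO CHARACTER-FOUND-FLAG.
           PERFORM 1100-EXAMINE-CHARACTER
               VARYING TEXT-POINTER FROM 80 BY -1
               UNTIL TEXT-POINTER < 1 OR CHARACTER-FOUND.
      * The loop exits one column left of the last non-blank (or at
      * 0 for an all-blank line), so add 1 to get the length.
           COMPUTE RECORD-LENGTH = TEXT-POINTER + 1.
           DISPLAY "LENGTH = " RECORD-LENGTH.

       1100-EXAMINE-CHARACTER.
           IF TEXT-CHARACTER (TEXT-POINTER) NOT = SPACE
               MOVE "Y"                TO CHARACTER-FOUND-FLAG.
```

Run as written, it should display LENGTH = 12, then LENGTH = 01 twice.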

The version shown here works only for 80-character consecutive files and assumes that the entire record area is to be examined. I have also written more elaborate versions that read and write a variety of record sizes and review other areas of the record, removing line numbers, modification codes, etc. This version will work with both COBOL 74 and COBOL 85.
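
The more elaborate versions mentioned above are not shown. As a hypothetical illustration of reviewing only part of the record, the sketch below starts the backward scan at a LAST-COLUMN of 72 instead of 80, on the assumption that unwanted material (here an invented code, MOD00001) sits in columns 73 through 80. Because the record length comes from the scan, anything past the last non-blank data column would fall outside the written record. The names RTWCOLS, LAST-COLUMN, and SAMPLE-LINE are made up for this example, and stripping material at the front of a record would also require shifting the data left, which this sketch does not attempt.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RTWCOLS.
      * Hypothetical variant: only columns 1 through LAST-COLUMN are
      * scanned; columns 73-80 fall outside the computed length.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Invented sample: data in 1-12, a made-up code in 73-80.
       01  SAMPLE-LINE.
           05  FILLER  PIC X(72) VALUE "MOVE A TO B.".
           05  FILLER  PIC X(08) VALUE "MOD00001".
       01  TEXTIN-RECORD.
           05  TEXT-CHARACTER          PIC X(01) OCCURS 80 TIMES.
       01  LAST-COLUMN                 PIC S9(04) COMP VALUE 72.
       01  TEXT-POINTER                PIC S9(04) COMP.
       01  CHARACTER-FOUND-FLAG        PIC X(01).
           88  CHARACTER-FOUND         VALUE "Y".
       01  RECORD-LENGTH               PIC 9(02).

       PROCEDURE DIVISION.
       0000-MAIN-PARAGRAPH.
           MOVE SAMPLE-LINE            TO TEXTIN-RECORD.
           MOVE SPACES                 TO CHARACTER-FOUND-FLAG.
      * Start the backward scan at LAST-COLUMN instead of 80.
           PERFORM 1100-EXAMINE-CHARACTER
               VARYING TEXT-POINTER FROM LAST-COLUMN BY -1
               UNTIL TEXT-POINTER < 1 OR CHARACTER-FOUND.
           COMPUTE RECORD-LENGTH = TEXT-POINTER + 1.
           DISPLAY "LENGTH = " RECORD-LENGTH.
           STOP RUN.

       1100-EXAMINE-CHARACTER.
           IF TEXT-CHARACTER (TEXT-POINTER) NOT = SPACE
               MOVE "Y"                TO CHARACTER-FOUND-FLAG.
```

The sample line reports LENGTH = 12: the code in columns 73 through 80 is never examined.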

For those using COBOL 85, reference modification (substrings) will improve the performance of the program. Figure 2 shows the two changes needed to take advantage of this capability. In my tests, this change produced a 56% improvement in run time over the COBOL 74 version.
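
A standalone sketch of the COBOL 85 approach follows, assuming a present-day compiler such as GnuCOBOL. It uses reference modification on a single PIC X(80) field, as in Figure 2, and additionally folds the scan into an in-line PERFORM ... END-PERFORM so no separate EXAMINE-CHARACTER paragraph is needed; that in-line form is an added change, not one of the two shown in Figure 2. The names RTWREF, TEXT-LINE, and RECORD-LENGTH are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RTWREF.
      * Sketch: the backward scan with COBOL 85 reference
      * modification and an in-line PERFORM ... END-PERFORM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  TEXT-LINE                   PIC X(80).
       01  TEXT-POINTER                PIC S9(04) COMP.
       01  CHARACTER-FOUND-FLAG        PIC X(01).
           88  CHARACTER-FOUND         VALUE "Y".
       01  RECORD-LENGTH               PIC 9(02).

       PROCEDURE DIVISION.
       0000-MAIN-PARAGRAPH.
           MOVE "   INDENTED TEXT"     TO TEXT-LINE.
           MOVE "N"                    TO CHARACTER-FOUND-FLAG.
           PERFORM VARYING TEXT-POINTER FROM 80 BY -1
                   UNTIL TEXT-POINTER < 1 OR CHARACTER-FOUND
      * (TEXT-POINTER:1) selects the one character at that column.
               IF TEXT-LINE (TEXT-POINTER:1) NOT = SPACE
                   MOVE "Y"            TO CHARACTER-FOUND-FLAG
               END-IF
           END-PERFORM.
      * Leading blanks are kept; only trailing blanks are dropped.
           COMPUTE RECORD-LENGTH = TEXT-POINTER + 1.
           DISPLAY "LENGTH = " RECORD-LENGTH.
           STOP RUN.
```

The sample line reports LENGTH = 16, since its three leading blanks count as data.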

New COMPARE easier to use

Beginning with OS 7.20, Wang has taken over support of some of the most important utilities formerly classed as ISWUAIDS (or USERAIDS). In most cases, the replacements offer much better features and performance than their predecessors, not to mention support from Wang. Over the next few months I will cover a few of these items and compare their features with the older versions covered in my book, USERAIDS: A Guide to Low-Cost VS Software.

The COMPARE utility distributed by the International Society of Wang Users (ISWU; version 2.00.00) is simply a menu of three other utilities: COMPAREF, which shows two files simultaneously on a split screen; COMPLIB (later versions are known as COMPRLIB), which compares catalog statistics on two libraries and optionally compares records; and SRCDIFF (later versions are known as DIFF), which compares selected columns in source files and lists the differences. The new COMPARE (version 3.01.01) combines all of these programs into one and neatly merges their functions.

The first screen prompts for the files or libraries to be compared and the comparison option. The three comparison options (DISPLAY, SUMMARY, and SRCDIFF) roughly correspond to COMPAREF, COMPLIB, and SRCDIFF; the exception is the library comparison, which must be done with the DISPLAY option but reviews file statistics like COMPLIB. All three options end like SUMMARY, with the EOJ screen displaying the number of records in each file (file option) or the number of files compared (library option).

When libraries are compared, the SRCDIFF or COMPAREF options can be invoked by selecting a file from the list of files found in both libraries and pressing (RETURN). All print files generated by any of the options can be displayed from within COMPARE. One minor annoyance remains from prior versions: the SRCDIFF listings are placed in a different library (named #uidDIFF) from the user's usual print files.

If all this wasn't enough, the utility also comes with a Wang INFO file for on-line documentation. All in all, a very worthwhile addition to your arsenal of comparison utilities. Bravo, Wang!


Figure 1: RTWC74 - Remove Trailing Whitespace (COBOL-74 version)


 IDENTIFICATION DIVISION.

 PROGRAM-ID.         RTWC74.

 AUTHOR.             Dennis S. Barnes.
 DATE-WRITTEN.       05/02/89.

* This program removes trailing whitespace from an 80-character
* text file.

 ENVIRONMENT DIVISION.
 CONFIGURATION SECTION.

 INPUT-OUTPUT SECTION.
 FILE-CONTROL.

     SELECT TEXTIN
         ASSIGN                  TO "TEXTIN" "DISK"
         ORGANIZATION            IS SEQUENTIAL
         ACCESS MODE             IS SEQUENTIAL
         FILE STATUS             IS TEXTIN-FILE-STATUS.

     SELECT TEXTOUT
         ASSIGN                  TO "TEXTOUT" "DISK"
         ORGANIZATION            IS SEQUENTIAL
         ACCESS MODE             IS SEQUENTIAL
         FILE STATUS             IS TEXTOUT-FILE-STATUS.

 DATA DIVISION.
 FILE SECTION.

 FD  TEXTIN
     LABEL RECORDS ARE STANDARD
     VALUE OF SPACE              IS TEXTIN-SPACE
     VALUE OF FILENAME           IS TEXTIN-FILE
              LIBRARY            IS TEXTIN-LIBRARY
              VOLUME             IS TEXTIN-VOLUME.

 01  TEXTIN-RECORD.
     05  TEXT-CHARACTER                  PIC  X(01)
                                         OCCURS 80 TIMES.

 FD  TEXTOUT
     LABEL RECORDS ARE STANDARD
     RECORD VARYING FROM 1 TO 80 COMPRESSED CHARACTERS
         DEPENDING               ON TEXT-POINTER
     VALUE OF SPACE              IS TEXTIN-SPACE.

 01  TEXTOUT-RECORD                      PIC  X(080).

 WORKING-STORAGE SECTION.

 01  CHARACTER-FOUND-FLAG                PIC  X(01).
     88  CHARACTER-FOUND                 VALUE "Y".

 01  TEXTIN-FILE-STATUS                  PIC  X(02).
     88  TEXTIN-END-OF-FILE              VALUE "10".

 01  TEXTIN-SPACE                        PIC S9(07) COMP.

 01  TEXTIN-FILE-LOCATION.
     05  TEXTIN-FILE                     PIC  X(08).
     05  TEXTIN-LIBRARY                  PIC  X(08).
     05  TEXTIN-VOLUME                   PIC  X(06).

 01  TEXTOUT-FILE-STATUS                 PIC  X(02).

 01  TEXT-POINTER                        USAGE BINARY.


 PROCEDURE DIVISION.

 0000-MAIN-PARAGRAPH.
     PERFORM U-INIT.
     PERFORM 1000-MOVE-LINE UNTIL TEXTIN-END-OF-FILE.
     PERFORM U-END-JOB.

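 1000-MOVE-LINE.
* Scan backward from column 80.  The loop stops one column to the
* left of the last non-blank character (or at 0 for a blank line).
     PERFORM 1100-EXAMINE-CHARACTER
         VARYING TEXT-POINTER
         FROM 80 BY -1
             UNTIL TEXT-POINTER < 1
             OR    CHARACTER-FOUND.

* Convert to the logical record length (minimum 1).
     COMPUTE TEXT-POINTER
         =   TEXT-POINTER + 1.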

     PERFORM U-TEXTOUT-WRITE.

     MOVE SPACES                 TO CHARACTER-FOUND-FLAG
                                    TEXTOUT-RECORD
                                    TEXTIN-RECORD.

     PERFORM U-TEXTIN-READ-NEXT.

 1100-EXAMINE-CHARACTER.
     IF  TEXT-CHARACTER (TEXT-POINTER) NOT = SPACES
         MOVE "Y"                TO CHARACTER-FOUND-FLAG.

 U-TEXTIN-READ-NEXT.
* The READ itself sets FILE STATUS "10" at end of file; AT END
* only needs an imperative statement.
     READ TEXTIN
         NEXT RECORD
         AT END
             MOVE "10"           TO TEXTIN-FILE-STATUS.

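 U-TEXTOUT-WRITE.
* Only the first TEXT-POINTER characters are written (see the
* DEPENDING ON clause in the TEXTOUT FD).
     WRITE   TEXTOUT-RECORD      FROM TEXTIN-RECORD.
     MOVE +1                     TO TEXT-POINTER.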

 U-INIT.
     OPEN INPUT      TEXTIN.
     OPEN OUTPUT     TEXTOUT.

     PERFORM U-TEXTIN-READ-NEXT.

 U-END-JOB.
     CLOSE  TEXTIN.
     CLOSE  TEXTOUT.
     STOP RUN.

Figure 2: Changes to RTWC74 for COBOL 85 Constructs

Changes to the TEXTIN File Description (FD):


 01  TEXTIN-RECORD.
     05  TEXT-CHARACTER      PIC  X(80).

Changes to the EXAMINE-CHARACTER paragraph:


 1100-EXAMINE-CHARACTER.
     IF  TEXT-CHARACTER (TEXT-POINTER:1) NOT = SPACES
         MOVE "Y"            TO CHARACTER-FOUND-FLAG.



Copyright © 1989 Dennis S. Barnes
Reprints of this article are permitted without notification if the source of the information is clearly identified