
Living Without the CREATE Utility

Making do in Unix

From "Migration",  Access to Wang, April 1994

One of the first questions I hear from VS programmers when Unix is mentioned is "What about the CREATE utility?" Many of us have come to depend on a few important VS tools, and CREATE is certainly a member of that group. CREATE and the File Management Utilities (REPORT, INQUIRY) have substituted for programs many times for many users. The existence of CREATE and other strong tools distinguishes the VS environment and has contributed to its success with technical users.

Unix contains a rich set of general purpose utilities that are particularly helpful in processing consecutive files - the native file format for Unix systems. Unlike most VS utilities, which each perform a broad range of tasks, these Unix tools are designed to perform single tasks and must be combined with other small tools to form workable programs. Fortunately, Unix also offers strong methods for combining such programs, including Standard I/O and powerful script languages.

All of the tools mentioned below should be available on all Unix systems. I am intentionally brief in describing each, since most have a plethora of options and differ slightly between versions of Unix. Consult the documentation for your system for specific details on each.

Replacements for the CREATE utility

First of all, there are no direct replacements for most functions of CREATE. The examples below attempt to show how some real-world problems can be solved by combining small tools. All of the examples are based on consecutive files; indexed files must be built using vendor-supplied utilities for each indexing scheme.

I assume here that you are generally familiar with the CREATE utility and have access to a Unix prompt. In the examples below the sample files have fields in specific positions (not separated by tabs or other characters) and records end with line feeds; in other words, they are fixed-format consecutive files. (In general, Unix tools expect line feeds as delimiters between records, though some can be set up to use character counts and byte offsets instead.)

As with CREATE, using Unix text tools to create data requires some research and care in specification. Field positions, line and record delimiters, and other technical characteristics of input files must be clearly known for successful results. Complex file merges may require the use of intermediate work files, so it may be necessary to know where space is available on your system.
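One way to do that research is to let the tools report on the file itself. The following is a minimal sketch, not a prescribed procedure: the file name fixed.txt and its three records are hypothetical, and mktemp is assumed to be present (it is widely available, though not on every system).

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical three-record sample; every record is 10 characters wide.
printf 'AAA0000001\nBBB0000002\nCCC0000003\n' > fixed.txt

# Count records by length. A single output line means every record
# has the same width, which is what a fixed-format file should show.
lengths=$(awk '{ print length($0) }' fixed.txt | sort -n | uniq -c)
echo "$lengths"

# Show the raw bytes of the first record, including its delimiter.
# A line feed appears as \n; a DOS-style file would also show \r.
head -n 1 fixed.txt | od -c
```

If the length report shows more than one line, some records are shorter or longer than expected, and column-based selections are likely to misfire on them.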

Creating an empty file

Easy. The touch utility will create a file if none exists by that name. (The other purpose of touch is to update the date and time the file was last modified.) For example, touch myfile.txt creates a new file. Since Unix does not know or care about file parameters (record size, expected record count, field contents, etc.), much of the complexity of defining a Wang output file is removed.

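An alternative approach is to use the echo command and direct the output to a file. Note that echo always writes a line feed, so the resulting file contains one blank record rather than nothing at all: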


echo "" > myfile.txt
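The difference between these approaches is easy to check by asking wc for the size of each file. This is a small sketch under assumed file names (empty1.txt, empty2.txt, and oneline.txt are all hypothetical); the bare redirection on the second line is a third option that relies only on the shell.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

touch empty1.txt          # creates the file only if it does not exist
: > empty2.txt            # redirection alone creates (or truncates) a file
echo "" > oneline.txt     # writes a single line feed, so not truly empty

# wc -c reports the size of each file in bytes.
size_touch=$(wc -c < empty1.txt | tr -d ' ')
size_redirect=$(wc -c < empty2.txt | tr -d ' ')
size_echo=$(wc -c < oneline.txt | tr -d ' ')
echo "touch: $size_touch  redirect: $size_redirect  echo: $size_echo"
```

Keep in mind that touch leaves an existing file's contents alone, while a redirection empties it, so the two are not interchangeable when the file may already hold data.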

Merge files

Also simple. I have written many VS procedures that search for and combine files, but the Unix cat utility makes them look ridiculously complex. Like the MS-DOS TYPE command, cat simply writes the contents of files to Standard Output. By redirecting this output to a file, it is possible to create new files by combining the contents of other files. For example,


cat myfile1.txt myfile2.txt > newfile.txt

will combine the contents of two files into a third. Wild cards work as well, because the shell expands them into a list of file names before cat runs, so


cat myfile*txt > newfile.txt 

will find and combine all files that begin with "myfile" and end with "txt".
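A short sketch of a merge, using hypothetical file contents, shows two details worth knowing: the shell supplies the matching names in sorted order, and >> appends to an existing file instead of replacing it.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical input files.
printf 'record 1\nrecord 2\n' > myfile1.txt
printf 'record 3\n' > myfile2.txt

# The shell expands myfile*txt in sorted order before cat runs,
# so the records arrive in file-name order.
cat myfile*txt > newfile.txt

# Append a later file with >> rather than rebuilding the whole merge.
printf 'record 4\n' > myfile3.txt
cat myfile3.txt >> newfile.txt

merged_count=$(wc -l < newfile.txt | tr -d ' ')
echo "newfile.txt holds $merged_count records"
```

Choose an output name that the wild card cannot match. If the output file were named myfile_all.txt here, the pattern would match it too, and cat would be asked to read the file it is writing.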

Select rows from an input file

A frequent use of CREATE is to form a new file using specific rows from an input file. Typically this approach is used with text files of some sort, such as a printed report, where specific records are to be extracted and displayed or processed further. Record selections in CREATE are based on specific column positions and simple match logic, though multiple CREATE selection blocks can be used to select on more criteria.

While it is possible to use awk and other relatively complex tools to select rows based on their column position, it is often easier to look for specific text instead. For example, if you were selecting detail lines from a report and ignoring the headers you might be able to find some text element that is only found on those lines.
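For comparison, here is roughly what a column-based selection looks like in awk. Treat it as a sketch under an assumed layout: the sample lines below place the transaction code in columns 3 through 5 and the user ID in columns 32 through 34, and the output file names are hypothetical. Count the columns in your own file before borrowing the numbers.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Small sample report; the column layout is an assumption of this sketch.
cat > sample.txt <<'EOF'
4/1/94                              TRANSACTIONS - 1/17-1/20
Page:  1

     TRANCODE  TRANDATE  TRANTIME  USERID    EFFDATE

  CHG    00940117 08:46:1662   ABC  01/31/94
  NEW    00940117 16:30:1999   PHY  01/07/94
  NEW    00940118 09:54:0173   XYZ  01/01/94
  TRM    00940118 10:06:4761   DEF  01/31/94
EOF

# substr($0, 3, 3) is the three characters starting at column 3.
# A pattern with no action prints every line for which it is true.
awk 'substr($0, 3, 3) == "NEW"' sample.txt > newrecs.txt

# Conditions can be combined with && and ||, much as several
# selection criteria would be combined in CREATE.
awk 'substr($0, 3, 3) == "NEW" && substr($0, 32, 3) == "XYZ"' sample.txt > new_xyz.txt

new_count=$(wc -l < newrecs.txt | tr -d ' ')
new_xyz_count=$(wc -l < new_xyz.txt | tr -d ' ')
echo "NEW records: $new_count, NEW records for XYZ: $new_xyz_count"
```

Because the test looks at exact columns, the word NEW appearing elsewhere on a line (in a title, for instance) would not cause a false match, which is the main advantage over a plain text search.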

(The next examples will be based on a sample report shown in Figure 1, below.) If you were looking for transactions with a NEW status, you could use a search utility to isolate those records and discard the rest. The grep tool performs this function. Entering


grep NEW sample.txt 

searches the file sample.txt for the string 'NEW', producing screen output showing only the lines containing that string. Since an output file is normally required, the statement can redirect the output from the screen to a file instead:


grep NEW sample.txt > newfile.txt

Suppose you were interested in all detail lines but not in the headers. You could search for some text that is unique to the detail lines, but there does not appear to be any single character that is unique to them. For example, entering


grep / sample.txt

would return all of the detail lines but also the report title line, since the dates in the title contain that character too. The simple solution is to perform such extractions one parameter at a time, saving the output results from each and using those files as input for the next process. Using the same sample data, enter


grep / sample.txt > work1.txt

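to produce a work file containing the title line and the detail lines, followed by


grep -v TRANSACTIONS work1.txt > results.txt 

to filter out the title line. The -v option tells grep to echo all lines except those that contain the string, thus bypassing the one remaining line that is not a detail record. (The search is case-sensitive, so the string must be typed exactly as it appears in the report.)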

Piping the output from one command to another can reduce or eliminate the need for intermediate work files. Pipes allow the output from one process to become the input to another. In order to work, the programs involved must allow Standard Input and Standard Output to substitute for files or other I/O, but nearly all Unix utilities meet this requirement. Using pipes, this problem could be restated in a single line as


grep / sample.txt | grep -v TRANSACTIONS > results.txt
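The following sketch runs the extraction both ways and compares the results. It makes a few assumptions: the sample report is rebuilt inline with a column layout chosen for this sketch, the second filter looks for the word TRANSACTIONS because that word appears only on the title line of this sample, and results_workfile.txt is a hypothetical name for the two-step output.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample report; the exact column layout is an assumption of this sketch.
cat > sample.txt <<'EOF'
4/1/94                              TRANSACTIONS - 1/17-1/20
Page:  1

     TRANCODE  TRANDATE  TRANTIME  USERID    EFFDATE

  CHG    00940117 08:46:1662   ABC  01/31/94
  CHG    00940117 10:17:5403   ABC  12/31/93
  NEW    00940117 16:30:1999   PHY  01/07/94
  NEW    00940118 09:54:0173   XYZ  01/01/94
  NEW    00940118 09:55:2205   DEF  01/01/94
  NEW    00940118 10:01:0058   XYZ  01/01/94
  NEW    00940118 10:05:5695   DEF  01/01/94
  TRM    00940118 10:06:4761   DEF  01/31/94
  NEW    00940118 10:08:5311   XYZ  01/01/94
EOF

# Two steps with an intermediate work file.
grep / sample.txt > work1.txt
grep -v TRANSACTIONS work1.txt > results_workfile.txt

# The same selection as a pipeline, with no work file.
grep / sample.txt | grep -v TRANSACTIONS > results.txt

# cmp -s prints nothing and succeeds only when the files are identical.
if cmp -s results_workfile.txt results.txt; then same=yes; else same=no; fi
detail_count=$(wc -l < results.txt | tr -d ' ')
echo "identical: $same, detail lines: $detail_count"
```

The work-file version is still useful while developing a selection, since each intermediate file can be inspected before the next filter is added.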

Selecting columns from a report

Another typical use for CREATE is to rearrange field elements to produce a new output form. The cut utility can be used to select specific columns from rows by their specific character position. To select some specific columns from the results in the prior example, try


cut -c3-6,12-23,31-34 results.txt

Translated, this means "cut positions 3 through 6, 12 through 23, and 31 through 34 from file results.txt." This statement will produce this listing:


CHG 940117 08:46 ABC
CHG 940117 10:17 ABC
NEW 940117 16:30 PHY
NEW 940118 09:54 XYZ
NEW 940118 09:55 DEF
NEW 940118 10:01 XYZ
NEW 940118 10:05 DEF
TRM 940118 10:06 DEF
NEW 940118 10:08 XYZ

If you look at the column positions carefully you might notice I have selected some positions that include spaces as well as the field area. Cut does not provide a routine means of adding separation between fields, so I used spaces in the existing column data to satisfy this need. If your output is intended to be used as a data file, such separation may not be necessary.
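When a real separator is wanted, one possible workaround is to let awk assemble the fields. The sketch below assumes a record layout in which the listed column positions apply (transaction code in 3-5, date in 10-17, time in 19-28, user ID in 32-34); the two sample records and the choice of "|" as separator are illustrative only.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Two detail records; the column layout is an assumption of this sketch.
cat > results.txt <<'EOF'
  CHG    00940117 08:46:1662   ABC  01/31/94
  NEW    00940117 16:30:1999   PHY  01/07/94
EOF

# cut copies the listed character positions, in file order,
# with nothing added between the pieces.
cut -c3-6,12-23,31-34 results.txt > cut_out.txt

# awk can pick out the same fields with substr(start, length)
# and join them with any separator.
awk '{ print substr($0, 3, 3) "|" substr($0, 12, 6) "|" substr($0, 19, 5) "|" substr($0, 32, 3) }' \
    results.txt > awk_out.txt

first_cut=$(head -n 1 cut_out.txt)
first_awk=$(head -n 1 awk_out.txt)
echo "$first_cut"
echo "$first_awk"
```

Unlike cut, the awk version can also emit the fields in a different order than they appear in the input, simply by rearranging the substr calls.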

Condensing duplicate data

Sometimes your goal is to summarize a large number of line items by counting their occurrences. In the past, I have performed this kind of analysis by extracting data from reports with CREATE, creating a custom CONTROL file, and using REPORT to sort and summarize. All of these actions can be performed easily using Unix text tools. Consider this statement and the resulting table it produces:


cut -c3-6 results.txt | sort | uniq -c

   2 CHG
   6 NEW
   1 TRM

After cutting the transaction code from the results file created above, the output is sorted and passed to the uniq utility, which returns only the unique lines in a sorted file, suppressing repeated lines. The -c option causes uniq to add a count of occurrences in the first column. Thus, you can see that there were 2 CHG records, 6 NEW records, and 1 TRM record. (Adding more columns to the cut, such as the user ID, would produce a count for each distinct combination instead.)
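Two small variations on this idea are sketched below: ranking the counts with a second sort, and counting by transaction code and user ID together. The record layout (code in columns 3-5, user ID in 32-34) is an assumption of the sketch, so adjust the positions to match your own file.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Detail records; the column layout is an assumption of this sketch.
cat > results.txt <<'EOF'
  CHG    00940117 08:46:1662   ABC  01/31/94
  CHG    00940117 10:17:5403   ABC  12/31/93
  NEW    00940117 16:30:1999   PHY  01/07/94
  NEW    00940118 09:54:0173   XYZ  01/01/94
  NEW    00940118 09:55:2205   DEF  01/01/94
  NEW    00940118 10:01:0058   XYZ  01/01/94
  NEW    00940118 10:05:5695   DEF  01/01/94
  TRM    00940118 10:06:4761   DEF  01/31/94
  NEW    00940118 10:08:5311   XYZ  01/01/94
EOF

# Count by transaction code, then sort numerically in reverse
# so the most frequent code comes first.
ranked=$(cut -c3-5 results.txt | sort | uniq -c | sort -rn)
echo "$ranked"

# Count by transaction code and user ID together. Column 6 is a
# space, so it serves as the separator between the two fields.
by_user=$(cut -c3-6,32-34 results.txt | sort | uniq -c)
echo "$by_user"

# Most frequent code, reduced to "count code" for easy reading.
top=$(echo "$ranked" | head -n 1 | awk '{ print $1, $2 }')
echo "most frequent: $top"
```

The first sort matters: uniq only collapses lines that are adjacent, so unsorted input would produce several partial counts for the same code.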

More examples

There are many more examples of replacements for CREATE and other utilities. Please forward your suggestions and I will try to cover them in the future. In the meantime, I hope these examples illustrate some of the possibilities of standard text tools for solving daily problems.


Figure 1: Sample Report Using Unix Tools


4/1/94                              TRANSACTIONS - 1/17-1/20
Page:  1

     TRANCODE  TRANDATE  TRANTIME  USERID    EFFDATE

  CHG    00940117 08:46:1662   ABC  01/31/94
  CHG    00940117 10:17:5403   ABC  12/31/93
  NEW    00940117 16:30:1999   PHY  01/07/94
  NEW    00940118 09:54:0173   XYZ  01/01/94
  NEW    00940118 09:55:2205   DEF  01/01/94
  NEW    00940118 10:01:0058   XYZ  01/01/94
  NEW    00940118 10:05:5695   DEF  01/01/94
  TRM    00940118 10:06:4761   DEF  01/31/94
  NEW    00940118 10:08:5311   XYZ  01/01/94

Unix Bookshelf

UNIX Applications Programming: Mastering the Shell
Ray Swartz
Sams, Carmel, Indiana; 1990
ISBN 0-672-22715-0

Provides detailed information on using Unix tools and Bourne shell scripts for application programming. Examples include mail merge, handy utility programs, and a complete Accounts Receivable system. Good description of Regular Expressions.




Copyright © 1994 Dennis S. Barnes
Reprints of this article are permitted without notification if the source of the information is clearly identified