Living Without the CREATE Utility
Making do in Unix

From "Migration", Access to Wang, April 1994
One of the first questions I hear from VS programmers when Unix is mentioned is "What about the CREATE utility?" Many of us have come to depend on a few important VS tools, and CREATE is certainly a member of that group. CREATE and the File Management Utilities (REPORT, INQUIRY) have substituted for programs many times for many users. The existence of CREATE and other strong tools distinguishes the VS environment and has contributed to its success with technical users.
Unix contains a rich set of general purpose utilities that are particularly helpful in processing consecutive files - the native file format for Unix systems. Unlike the broad range of tasks performed by most VS utilities, these Unix tools are designed to perform single tasks and must be combined with other small tools to form workable programs. Fortunately, Unix also offers strong methods for combining such programs, including Standard I/O and powerful script languages.
All of the tools mentioned below should be available on all Unix systems. I am intentionally brief in descriptions about each, since most have a plethora of options and minor differences between versions of Unix. Consult the documentation for your system for specific details on each.
First of all, there are no direct replacements for most functions of CREATE. The examples below attempt to show how some real-world problems can be solved by combining small tools. All of the examples are based on consecutive files; indexed files must be built using vendor-supplied utilities for each indexing scheme.
I assume here that you are generally familiar with the CREATE utility and have access to a Unix prompt. In the examples below the sample files have fields in specific positions (not separated by tabs or other characters) and records end with line feeds; in other words, they are fixed-format consecutive files. (In general, Unix tools expect line feeds as delimiters between records, though some can be set up to use character counts and byte offsets instead.)
Like CREATE, using Unix text tools to create data requires some research and care in specification. Field positions, line and record delimiters, and other technical characteristics of input files must be clearly known for successful results. Complex file merges may require the use of intermediate work files, so it may be necessary to know where space is available on your system.
Creating an empty file is easy. The touch utility creates a file if none exists by that name. (The other purpose of touch is to update the date and time the file was last modified.) For example, touch myfile.txt creates a new file. Since Unix does not know or care about file parameters (record size, expected record count, field contents, etc.), much of the complexity of defining a Wang output file is removed.
An alternative approach: use the echo command and direct the output to a file:
echo "" > myfile.txt
Combining files is also simple. I have written many VS procedures that search for and combine files, but the Unix cat command makes them look ridiculously complex. Cat - like the MS-DOS TYPE command - simply dumps the contents of files to Standard Output. By redirecting this output to a file, it is possible to create new files by combining the contents of other files. For example,
cat myfile1.txt myfile2.txt > newfile.txt

will combine the contents of two files into a third. Cat will also accept wild cards, so

cat myfile*txt > newfile.txt

will find and combine all files that begin with "myfile" and end with "txt".
A frequent use of CREATE is to form a new file using specific rows from an input file. Typically this approach is used with text files of some sort, such as a printed report, where specific records are to be extracted and displayed or processed further. Record selections in CREATE are based on specific column positions and simple match logic, though multiple CREATE selection blocks can be used to select on more criteria.
While it is possible to use awk and other relatively complex tools to select rows based on their column position, it is often easier to look for specific text instead. For example, if you were selecting detail lines from a report and ignoring the headers you might be able to find some text element that is only found on those lines.
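For cases where only a column position will do, awk can test characters by position with its substr function, much as a CREATE selection block does. A minimal sketch, assuming a hypothetical layout with a status code in columns 3 through 5 (the sample data here is invented for illustration):

```shell
# Stand-in sample file; assumed layout puts the status code in columns 3-5.
printf '  NEW detail\n  CHG detail\nheader line\n' > sample.txt
# Keep only lines whose characters 3-5 read "NEW".
awk 'substr($0, 3, 3) == "NEW"' sample.txt
```

Adjust the starting position and length to match the actual record layout of your file.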
(The next examples will be based on a sample report shown in Figure 1, below.) If you were looking for transactions with a NEW status, you could use a search utility to isolate those records and discard the rest. The grep tool performs this function. Entering
grep NEW sample.txt

searches the file sample.txt for the string 'NEW', producing screen output showing only the lines containing that phrase. Since an output file is normally required, the statement can redirect the output from the screen to a file instead:
grep NEW sample.txt > newfile.txt

Suppose you were interested in all detail lines but not in the headers. You could search for some text that is unique to the detail lines, but no single character appears to be unique to them. For example, entering
grep / sample.txt

would return all of the detail lines and all of the headers, since that character appears in both. The simple solution is to perform such extractions one parameter at a time, saving the output results from each and using those files as input for the next process. Using the same sample data, enter
grep / sample.txt > work1.txt

to produce a work file containing headers and detail lines, followed by
grep -v Page work1.txt > results.txt

to filter out the header records. The -v option tells grep to echo all lines except those that contain the string, thus bypassing the header lines.
Piping the output from one command to another can reduce or eliminate the need for intermediate work files. Pipes allow the output from one process to become the input to another. In order to work, the programs involved must allow Standard Input and Standard Output to substitute for files or other I/O, but nearly all Unix utilities meet this requirement. Using pipes, this problem could be restated in a single line as
grep / sample.txt | grep -v Page > results.txt
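Pipelines chain further stages just as easily; for instance, a sort can be slipped in before the output file is written. A sketch, using a miniature stand-in for sample.txt (the data below is invented for illustration):

```shell
# Build a tiny stand-in for sample.txt (hypothetical data).
printf '1/4/94 Report  Page: 1\nNEW 01/02/94 XYZ\nCHG 01/01/94 ABC\n' > sample.txt
# Keep lines containing "/", drop the Page header, sort, and save.
grep / sample.txt | grep -v Page | sort > results.txt
cat results.txt
```

Each stage reads the previous stage's output from Standard Input, so no intermediate work files are needed.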
Another typical use for CREATE is to rearrange field elements to produce a new output form. The cut utility can be used to select specific columns from rows by their specific character position. To select some specific columns from the results in the prior example, try
cut -c3-6,12-23,31-34 results.txt

Translated, this means "cut positions 3 through 6, 12 through 23, and 31 through 34 from file results.txt." This statement will produce this listing:
CHG 940117 08:46 ABC
CHG 940117 10:17 ABC
NEW 940117 16:30 PHY
NEW 940118 09:54 XYZ
NEW 940118 09:55 DEF
NEW 940118 10:01 XYZ
NEW 940118 10:05 DEF
TRM 940118 10:06 DEF
NEW 940118 10:08 XYZ

If you look at the column positions carefully you might notice I have selected some positions that include spaces as well as the field area. Cut does not provide a routine means of adding separation between fields, so I used spaces in the existing column data to satisfy this need. If your output is intended to be used as a data file, such separation may not be necessary.
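If explicit separators are needed, one workaround (a sketch, with hypothetical column positions) is to let awk carve out the positions and insert a delimiter of your choosing:

```shell
# Stand-in input line; positions 1-3 and 5-8 are hypothetical.
printf 'NEW 0117 remainder\n' > results.txt
# Pull two position-based fields and join them with a tab.
awk '{ printf "%s\t%s\n", substr($0, 1, 3), substr($0, 5, 4) }' results.txt
```

The same technique extends to any number of fields and any delimiter, such as commas for spreadsheet import.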
Sometimes your goal is to summarize a large amount of line items by counting their occurrences. In the past, I have performed this kind of analysis by extracting data from reports with CREATE, creating a custom CONTROL file, and using REPORT to sort and summarize. All of these actions can be performed easily using Unix text tools. Consider this statement and the resulting table it produces:
cut -c 3-6,32-34 results.txt | sort | uniq -c

   2 CHG
   6 NEW
   1 TRM

After cutting specific fields from the results file created above, the output is sorted and passed to the uniq utility, which returns only the unique lines in a sorted file, suppressing repeated lines. The -c option causes uniq to add a count of occurrences in the first column. Thus, you can see that there were 2 CHG records, 6 NEW records, and 1 TRM record.
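To rank the summary by frequency rather than alphabetically, a reverse numeric sort can be appended to the pipeline. A sketch, with sample data inlined as a stand-in for the cut output:

```shell
# Count repeated lines, then rank by count, most frequent first.
printf 'NEW\nNEW\nCHG\nNEW\nTRM\nCHG\n' | sort | uniq -c | sort -rn
```

The -r option reverses the sort and -n compares the leading counts as numbers, so the busiest transaction code rises to the top.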
There are many more examples of replacements for CREATE and other utilities. Please forward your suggestions and I will try to cover them in the future. In the meanwhile, I hope these examples illustrate some of the possibilities of standard text tools for solving daily problems.
Figure 1: Sample Report Using Unix Tools
4/1/94          TRANSACTIONS - 1/17-1/20          Page: 1

TRANCODE  TRANDATE  TRANTIME    USERID  EFFDATE
CHG       00940117  08:46:1662  ABC     01/31/94
CHG       00940117  10:17:5403  ABC     12/31/93
NEW       00940117  16:30:1999  PHY     01/07/94
NEW       00940118  09:54:0173  XYZ     01/01/94
NEW       00940118  09:55:2205  DEF     01/01/94
NEW       00940118  10:01:0058  XYZ     01/01/94
NEW       00940118  10:05:5695  DEF     01/01/94
TRM       00940118  10:06:4761  DEF     01/31/94
NEW       00940118  10:08:5311  XYZ     01/01/94
UNIX Applications Programming: Mastering the Shell
Ray Swartz
Sams, Carmel, Indiana; 1990
ISBN 0-672-22715-0
Provides detailed information on using Unix tools and Bourne shell scripts for application programming. Examples include mail merge, handy utility programs, and a complete Accounts Receivable system. Good description of Regular Expressions.
Copyright © 1994 Dennis S. Barnes
Reprints of this article are permitted without notification
if the source of the information is clearly identified