Archiving Tools Continued
Putting some ZIP in your system

From "Migration", Access to Wang, July 1994
Last month I compared a few archive tools and showed some of the ways they can be used. This time we can review these tools as alternatives to traditional Unix commands in production use. I will use the ZIP/UNZIP tools in the examples below. Most of the methods described here will work with most of the archive tools described previously (ZIP/UNZIP, LHarc, PKZIP) and with others that were not mentioned, including GNU zip (gzip), ZOO, Arc, and Sit. I prefer ZIP to other archive tools because it has the greatest amount of support on all platforms (particularly MS-DOS) and some unique features all its own.
Copies of these and other tools are available from PC Bulletin Boards, on-line services (CompuServe, America Online, etc.), or the Internet.
Archive tools can help you organize files, retain prior versions for future reference, and reduce the amount of disk space used by these files. The traditional Unix method for creating such archives is by using a combination of the tar and compress commands. tar ("tape archiver") combines one or more files into a single output, while compress reduces the size of a file. Since files created with tar are not compressed, the archive file is the same size as all of the files within it combined, so compress is needed to reduce the file size. In comparison, ZIP and similar tools provide archiving and, optionally, file compression within a single utility.
Archive tools like ZIP are also much easier to use. Suppose you wish to create a compressed archive file and remove the original copies of the files afterwards. Using traditional Unix methods, you might enter the following commands:
    tar cvf ../myarch.tar *    # create myarch.tar in the parent directory
    compress ../myarch.tar     # compress myarch.tar to myarch.tar.Z
    rm *                       # remove the original files (the archive is one level up, so it survives)

To create the same archive file using ZIP, you would merely enter zip -m myarch * (translated, "move all files into myarch.zip"). The resulting ZIP file would be smaller by about 20% and it would take less time to create the archive.
The file naming requirements of the tar and compress tools are inflexible: file names must end with .Z (upper-case Z) to be recognized by compress, and the .tar extension should be added for tar. The result is often a file name that does not fit into the file name requirements of other systems and may even exceed the 14-character limit of some Unix file systems. For example, the archive file long.program.tar.Z would be truncated on our Unix system to long.program.t, the same 14-character name that its uncompressed form long.program.tar is truncated to, resulting in a duplicate file name error whenever the compress utility is used to extract it. In comparison, ZIP prefers to have a .zip extension, but any file name is acceptable; the truncated name long.program.z would work fine.
Finally, it is difficult to assess the contents of files archived using traditional Unix tools: the tarred files must be uncompressed before a listing of the archive's contents can be viewed, resulting in a temporary loss of file space. ZIP files can be interrogated directly using the -l (listing) or -v (verbose listing) options of UNZIP, and these file names can in turn be passed to other programs or placed in files for other purposes. The syntax for this file listing is the same for any platform: enter unzip -l myarch.zip to view the contents of archives created in the Unix, MS-DOS, or any other environment.
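As a rough sketch of passing archive file names to other programs, the function below prints only the member names, one per line. It assumes the Info-ZIP version of UNZIP, where -Z selects the zipinfo mode and -1 prints names with no headers or totals. The function name zipnames is made up for this illustration.

```shell
#!/bin/sh
# zipnames - print the member names of a ZIP archive, one per line.
# Assumption: Info-ZIP unzip, whose -Z1 option lists file names only.
# The function name is hypothetical, chosen for this example.
zipnames() {
    unzip -Z1 "$1"
}

# Possible uses (myarch.zip is the archive name used in the article):
#   zipnames myarch.zip > contents.txt     # place the names in a file
#   zipnames myarch.zip | grep -c .        # count the archived files
```

If your copy of UNZIP lacks the -Z option, the output of unzip -l can be trimmed with other tools instead, at the cost of some extra parsing.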
Naturally, you will lose some portability to other versions of Unix if you use ZIP instead of tar and compress; portability is the reason nearly all Unix files in public archives are maintained in tar and compress format. ZIP files are mostly of interest if your destination is internal or is one of several desktop systems, such as MS-DOS, Apple, or others.
It is often useful to incorporate the creation of an archive file into a production process. In some cases, the archive file itself is the product; for example, the resulting files from a large extract may be bound for an MS-DOS system. Shell scripts provide a convenient way of creating such archive files.
In the script example shown in Figure 1, the first line changes the current working directory to a common work area. The next three lines are typical job-stream commands, each producing a single output file in the working directory. The ZIP program is then invoked using the -m (move) option to create the archive file.
Figure 2 shows a revision of this script that reduces the amount of file space required to run the job by incrementally storing each output file as it is created. The last step also introduces another refinement: automatic notification of the job run and archive contents. In this example, the UNZIP command is used to interrogate myarch.zip and the resulting file list is mailed to user dsb. (The Unix mail system - though crude in its presentation - provides excellent opportunities for automated notification such as this. More on this topic next month.)
The following is a potpourri of small program solutions that archive tools can help provide:
Finding and backing up files across the system: If you have a number of small files scattered across directories on the system, it is often difficult to locate and back them up. For example, .ini files on PCs contain critical information that changes daily, but most are buried in program directories that may not be backed up frequently. I use a file name search tool in combination with the capabilities of ZIP to accept file names through Standard Input to meet this need, like this:
    whereis *.ini | zip iniback -@

This statement uses the common DOS system utility whereis.com to locate files with an extension of .ini, then passes the file names to ZIP through a pipe (Standard Input). The Unix find utility could be used to locate files instead, allowing powerful searches by date, time, user ID, or other factors. The result: a cross-system backup of these small but important files.
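On Unix, the same idea might be sketched with find, as below. The function name inibackup and its arguments are assumptions made for this example; the only ZIP feature it relies on is the -@ option described above.

```shell
#!/bin/sh
# inibackup - archive every *.ini file found under a directory tree.
# The function name and argument order are hypothetical.
#   $1 = archive name (zip adds .zip when no extension is given)
#   $2 = directory to search (defaults to the current directory)
inibackup() {
    # find prints one path per line; zip -@ reads that list from
    # Standard Input. File names containing newlines are not handled.
    find "${2:-.}" -type f -name '*.ini' -print | zip -q "$1" -@
}

# Example: inibackup iniback /usr/local
```

Adding a test such as -mtime -1 to the find command would limit the backup to files changed within the last day.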
Another way to search for file names is through the ZIP "include" specification, -i. The same search could be expanded to include files with the extension of .grp with the following statement:
    zip -r iniback c:\*.* -i *.ini *.grp

Translated, this means "zip recursively all files on the c: drive, but include only those with extensions of .ini or .grp."

Using dates as names: Production archive files need to be identified uniquely, and it is sometimes necessary to retain multiple copies of file sets. Using portions of the date or time in the file name is a convenient way to meet both needs. Many versions of Unix allow use of the date command to provide a formatted output for this purpose. If supported by your system, the following syntax should provide the current date in year, month, day (YYMMDD) format in the shell variable FILENAME:
    FILENAME=`date +%y%m%d`
    . . .
    zip -m $FILENAME *

In this example, the variable FILENAME is assigned the value of the date and the value of that name ($FILENAME) is later used as a ZIP file name. The file name assignment uses the powerful Unix trick of command substitution to place the result of a process (the date command) into a variable. date can also supply other information, including the time, Julian date, time zone, and formatting characters such as new line, tab, etc.
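If more than one archive may be produced on the same day, the hour can be appended to the date. The sketch below is one possible arrangement, not a required one: the function name datedzip is invented here, and the YYMMDDHH layout was chosen only because its eight characters still fit an MS-DOS file name.

```shell
#!/bin/sh
# datedzip - move the named files into an archive named for the
# current date and hour, then print the archive's file name.
# The function name and the YYMMDDHH layout are assumptions.
datedzip() {
    # %y%m%d is the YYMMDD form used in the article; %H adds the hour
    # (00-23) so that runs in different hours get different names.
    FILENAME=`date +%y%m%d%H`
    zip -q -m "$FILENAME" "$@" && echo "$FILENAME.zip"
}

# Example: datedzip file1 file2 file3
```

Two runs within the same hour would add to the same archive rather than create a second one, since ZIP updates an existing archive in place.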
Automating comments: Comments can be entered into the headers of ZIP files by using the -z option (e.g. zip -z myarch) and typing the information to be included, ending with a line that contains only a period. Comments may also be included by combining information from a number of sources into a single text file and using that file as input to the comment process. Figure 3 shows this, using the same archive process as an example. A temporary file is created with the text to be included in the header. The contents of this file are added to the archive as its comment, and the temporary file is removed. The resulting comment will be displayed every time an archive listing is produced.
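To confirm that a comment was stored, it can be read back right after it is written. The following sketch assumes the Info-ZIP tools, where unzip -z displays only the archive comment; the function name setcomment is hypothetical.

```shell
#!/bin/sh
# setcomment - attach the text in a file to an archive as its comment,
# then display the stored comment as a check.
# Assumption: Info-ZIP zip and unzip. The function name is hypothetical.
#   $1 = archive, $2 = text file holding the comment
setcomment() {
    # zip -z reads the comment from Standard Input until end of file
    zip -q -z "$1" < "$2" || return 1
    # unzip -z prints the archive name followed by the comment text
    unzip -z "$1"
}

# Example: setcomment myarch.zip header.txt
```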
As always, I hope you find some application of this information in your own environment. If you have any comments or specific archiving problems to solve, please send them and I will present them in this space.
Figure 1: Example of Script Using ZIP Archive
    #!/bin/sh
    # prodzip - example of production ZIP usage
    cd /work/mydir              # change to the working directory
    wrun process1 > file1       # run program process1 to produce file1
    process2 file1 > file2      # run script file process2 to produce file2
    # extract entries with "phy" from the passwd file
    grep -i phy /etc/passwd > file3
    zip -m myarch *             # move all files into myarch.zip
Figure 2: Revised Script Using ZIP Archive
    #!/bin/sh
    # prodzip2 - example of production ZIP usage
    cd /work/mydir              # change to the working directory
    wrun process1 > file1       # run program process1 to produce file1
    process2 file1 > file2      # run script file process2 to produce file2
    zip -m myarch file1         # move file1 to archive (process2 has finished reading it)
    zip -m myarch file2         # move file2 to archive
    # extract entries with "phy" from the passwd file
    grep -i phy /etc/passwd > file3
    zip -m myarch file3         # move file3 to archive
    unzip -v myarch | mail dsb  # create archive listing; mail to dsb
Figure 3: ZIP Header Comments
    #!/bin/sh
    # prodzip3 - example of production ZIP usage (comment headers)
    cd /work/mydir              # change to the working directory
    wrun process1 > file1       # run program process1 to produce file1
    process2 file1 > file2      # run script file process2 to produce file2
    zip -m myarch file1         # move file1 to archive (process2 has finished reading it)
    zip -m myarch file2         # move file2 to archive
    # extract entries with "phy" from the passwd file
    grep -i phy /etc/passwd > file3
    zip -m myarch file3         # move file3 to archive
    # Create header comment in temporary file
    RUNDATE=`date +%y%m%d:%H%M` # formatted date and time
    # create header.txt file
    echo "Production file run: $RUNDATE" > header.txt
    echo "" >> header.txt
    echo "Please distribute to Rachel Smith in Accounting" >> header.txt
    echo "" >> header.txt
    # Add comment to archive; clean up temporary file
    zip -z myarch < header.txt
    rm header.txt
    # Create archive listing; mail to dsb
    unzip -v myarch | mail dsb
Copyright © 1994 Dennis S. Barnes
Reprints of this article are permitted without notification
if the source of the information is clearly identified