
Cross-Platform File Archiving

Combine, compress, retrieve and store files with common utilities

From "Migration",  Access to Wang, June 1994

File archiving tools perform a number of file storage and retrieval functions, including combination of multiple files into a single entity (concatenation) and reduction of the disk space requirements for these combined files (compression). Traditionally, Unix users have obtained the same results with a two-step process: combining files with tar ("tape archive") and then reducing the size of the result with compress. In the MS-DOS world, PKZip is widely used to consolidate multiple files into compact archives. How can these two worlds communicate?

While PC versions of Unix tools like tar and compress are available, it is not convenient to distribute archives created with these tools to others since it is unlikely that they will have the means to extract the files. A far better solution is to choose file archiving tools that are available for a variety of platforms. Fortunately, several excellent programs are available to fill this need.

Why would you need to consider using a file archiver? Perhaps you wish to save prior copies of transaction files or reports without a large impact to your disk, or reduce the amount of time required to transfer a file by modem. Maybe you need to produce large text files from Unix programs and distribute them to PC and Macintosh users, or distribute a software package that consists of several subdirectories and a myriad of files. Whatever your reasons, file archive tools can help.

Of the many archive tools available for MS-DOS and Unix, we will concentrate on two that are available for both platforms and without cost: ZIP (with its companion UNZIP) and LHa.

There are also many front-end programs available for MS-DOS and Windows to make it easier and safer to use file archive utilities. Most of these tools provide a simple point-and-shoot screen instead of requiring command-line entries, and most support ZIP and LHa formats. Typically, these programs are distributed as shareware and can be licensed for minimal cost; one program - WIZUNZIP - is actually part of the UNZIP distribution and is free of cost. Most of the newer Windows-based tools support drag-and-drop use, so it is possible to archive a file by simply dragging the file name over to an icon. Consult bulletin boards or other PC program sources for availability.

Beginner's guide to archiving

File archive tools combine multiple files and use compression techniques to reduce the storage required for the combined files. They share compression techniques with disk compression tools like Doublespace and Stacker but differ in one important respect: you must take action to store or retrieve files stored in an archive, while disk compression utilities mimic all of the behavior of the usual file system. File archive tools can also be used for backup, though they are usually much slower than good backup tools.

Archiving a set of files with either ZIP or LHa requires entry of simple commands or use of one of the PC front-end tools described above. The generic command-line syntax looks like this:

archive_tool [options] archive_name[.ext] [input_file_name]

Archive_tool is the name of the file archive program (ZIP, LHA, etc.). A number of options may be specified to control how the archiving is performed. (See Figure 2, Figure 3, and Figure 4, below, for a list of these options; Figure 1 compares the capabilities of the tools.) The name of the archive file and, optionally, a file extension are the next group of items, followed by the name of the input file. Most archive tools allow ambiguous file names (wild cards) to be used for the input specification. Thus, zip -u myzip.zip *.* would run the ZIP program, updating archive file myzip.zip with all files in the current directory. Other examples:

c:\> unzip a:bigfile.zip

(Extracts archive file bigfile.zip from drive A: and places the files in the current directory)

c:\> lha m allfiles *.ini

(Copies all files ending with .ini into archive file allfiles, deleting the input files afterwards; i.e., it moves the files into the archive)

ZIP and LHa archive files are usually identifiable by their file extension: .zip (ZIP) or .lzh (LHa). It is not usually necessary to enter the extension when specifying the archive name; in the second example (above), the archive file name would be allfiles.lzh. (More about issues with file naming conventions, below.)
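The same syntax handles a whole directory tree, such as a software package with several subdirectories. The following Unix shell sketch assumes the Info-ZIP zip program is on the search path; pack_tree is a hypothetical helper name used for illustration, not part of ZIP.

```shell
#!/bin/sh
# pack_tree ARCHIVE DIR
# Hypothetical helper: stores DIR and everything beneath it in ARCHIVE.
pack_tree() {
    archive=$1
    dir=$2
    # -r recurses into subdirectories; -q suppresses the per-file messages.
    # zip adds the .zip extension itself when ARCHIVE has none.
    zip -r -q "$archive" "$dir"
}
```

Calling pack_tree dist pkg should leave dist.zip in the current directory, with the files recorded under their pkg/ path names.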

If you want to know the contents of an archive file, you must determine the type of archive and use the appropriate tool to extract a file list. ZIP files can be viewed with the command unzip -l myarch; an LHa file listing can be had by entering lha l myarch. If the file listing is long, it is helpful to send it to a paging utility so you can see each page in turn; e.g., unzip -l myarch | more. Substituting the -v option would produce a longer (verbose) file listing, including modification times and file protections.
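Choosing the right listing command can be left to a small shell function. This is only a sketch: show_archive is a hypothetical name, and it assumes the archive name is given with its .zip or .lzh extension and that unzip and lha are installed.

```shell
#!/bin/sh
# show_archive FILE
# Hypothetical helper: picks the listing command from the file extension.
show_archive() {
    case $1 in
        *.zip) unzip -l "$1" ;;   # ZIP archive: short listing
        *.lzh) lha l "$1" ;;      # LHa archive: list of files
        *)     echo "unknown archive type: $1" >&2
               return 1 ;;
    esac
}
```

For a long archive the output can be paged in the usual way, for example show_archive myarch.zip | more.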

Using archive tools in production jobs

One example of a simple use of ZIP is to add files to an archive under program control. Our shop uses scripts and the background scheduler cron to run routine jobs, and many of those scripts include commands to add report files to an archive. These reports can be retained for future reference at roughly 15% of their original size. If the files to be archived have predictable name prefixes it is easy to automate this storage. Consider this Unix script:


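#!/bin/sh
# runbigjob.sh - runs BIGJOB and copies report files to archive files

# Directories for the print archive and the spool library
PRINTARCH='/usr/sysbak/printarch'
SPOOLIB='/usr/sysbk1/%dsbprt'

# Run the job, then add its print files (names starting with bigj)
# to bigjob.zip in the print archive directory
wrun BIGJOB
zip "$PRINTARCH/bigjob" "$SPOOLIB"/bigj*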

This Unix script fragment stores the paths of the print archive and spool library directories in the shell variables PRINTARCH and SPOOLIB. After running the job, print files beginning with 'bigj' are stored in /usr/sysbak/printarch/bigjob.zip. Similar processes can be designed for a number of other purposes.
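If the reports should be removed from the spool library once they are safely stored, the archive step can be followed by a check. The sketch below is one possible arrangement, not the script our shop runs: archive_reports is a hypothetical name, and it assumes both zip and unzip are installed.

```shell
#!/bin/sh
# archive_reports ARCHIVE FILE...
# Hypothetical helper: adds the files to ARCHIVE, tests the archive,
# and deletes the originals only if both steps succeed.
archive_reports() {
    archive=$1
    shift
    # -j stores the files without their directory names; -q runs quietly.
    zip -j -q "$archive" "$@" || return 1
    # -t tests the compressed data; -qq suppresses the success message.
    unzip -t -qq "$archive" || return 1
    rm -f "$@"
}
```

In a job script this might be called as archive_reports "$PRINTARCH/bigjob" "$SPOOLIB"/bigj*. ZIP's own -m option (Figure 2) moves files into an archive in a single step; writing the steps out simply makes the check visible.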

Issues

As with many tools, selecting file archive utilities requires acknowledgment and understanding of some issues. It's not possible to cover all of these issues in the space of this column, but here are a few important considerations:

Executable PC versions of ZIP, UNZIP, and LHa are routinely available from PC-based bulletin boards and through commercial networks (CompuServe, etc.). Unix versions can be found on the Internet and in the library of the Unix forum on CompuServe. In general, only source files are available for Unix versions and it will be necessary to compile them in your environment. All are written in C and contain reasonable (though terse) installation instructions. If you are unfamiliar with the C language and the make discipline, get a friend to help you compile these tools on your system.

I'll cover advanced archiving techniques in the future. Until then, consider incorporating these excellent tools into your environment.


Figure 1: Comparison of Archive Tool Capabilities

Option                                              LHA   ZIP/UNZIP
Self-extracting archives (MS-DOS versions only)     Y     N
Option to split large archive across disks          N     Y
Macintosh version available                         N     Y
Windows versions available                          N     Y
VAX (VMS) version available                         N     Y
OS/2, NT versions available                         N     Y
Other versions available (Amiga, Atari, etc.)       N     Y
Source code available                               Y     Y
Store comments by archive                           N     Y
Store comments by file                              N     Y
Accept file names through Standard Input            N     Y
Write extracted file to Standard Output             N     Y
Streaming I/O (Standard Input to Standard Output)   N     Y
Convert text file line endings                      Y     Y
Test integrity of archive                           Y     Y
Support for encryption                              N     Y

Figure 2: ZIP Options


Copyright (C) 1990-1993 Mark Adler, Richard B. Wales, Jean-loup Gailly
and Kai Uwe Rommel. Type 'zip -L' for the software License.

Zip 2.0.1 (Sept 18th 1993). Usage:
zip [-options] [-b path] [-t mmddyy] [-n suffixes] [zipfile list] [-xi list]
  The default action is to add or replace zipfile entries from list, which
  can include the special name - to compress standard input.
  If zipfile and list are omitted, zip compresses stdin to stdout.
  -f   freshen: only changed files  -u   update: only changed or new files
  -d   delete entries in zipfile    -m   move into zipfile (delete files)
  -k   simulate PKZIP made zipfile  -g   allow growing existing zipfile
  -r   recurse into directories     -j   junk (don't record) directory names
  -0   store only                   -l   convert LF to CR LF (-ll CR LF to LF)
  -1   compress faster              -9   compress better
  -q   quiet operation              -v   verbose operation
  -c   add one-line comments        -z   add zipfile comment
  -b   use "path" for temp file     -t   only do files after "mmddyy"
  -@   read names from stdin        -o   make zipfile as old as latest entry
  -x   exclude the following names  -i   include only the following names
  -F   fix zipfile (-FF try harder) -D   do not add directory entries
  -T   test zipfile integrity       -L   show software license
  -y   store symbolic links as the link instead of the referenced file
  -h   show this help               -n   don't compress these suffixes
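Several of the options listed above can be combined. As a hedged illustration, the Unix shell sketch below gathers text files for PC users by feeding file names to ZIP on standard input (-@) and converting line endings (-l). The name pack_text_for_pc is hypothetical, and the sketch assumes the find utility and a zip version that supports both options.

```shell
#!/bin/sh
# pack_text_for_pc ARCHIVE DIR SUFFIX
# Hypothetical helper: archives every file under DIR whose name ends in
# .SUFFIX, converting Unix line endings (LF) to the MS-DOS form (CR LF).
pack_text_for_pc() {
    # find prints one file name per line; -@ makes zip read that list.
    find "$2" -type f -name "*.$3" -print | zip -q -l "$1" -@
}
```

For example, pack_text_for_pc pcfiles reports txt would collect the .txt files under the reports directory into pcfiles.zip.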

Figure 3: UNZIP Options


UnZip 5.1 of 7 February 1994, by Info-ZIP.  Portions (c) 1989 by S. H. Smith.
Send bug reports to authors at zip-bugs@wkuvx1.wku.edu; see README for details.

Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
  Default action is to extract files in list, except those in xlist, to exdir;
  file[.zip] may be a wildcard.  -Z => ZipInfo mode ("unzip -Z" for usage).

  -c  extract files to stdout/screen (CRT)   -l  list files (short format)
  -p  extract files to pipe, no messages     -v  list files (verbose format)
  -f  freshen existing files, create none    -t  test compressed archive data
  -u  update files, create if necessary      -z  display archive comment
  -x  exclude files which follow (in xlist)  -d  extract files into exdir

modifiers:                                   -q  quiet mode (-qq => quieter)
  -n  never overwrite existing files         -a  auto-convert any text files
  -o  overwrite files WITHOUT prompting      -aa treat ALL files as text
  -j  junk paths (don't make directories)    -U  don't make names lowercase
                                             -V  retain VMS version numbers
Examples (see unzip.doc for more info):
  unzip data1 -x joe   => extract all files except joe from zipfile data1.zip
  unzip -p foo | more  => send contents of foo.zip via pipe into program more
  unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer

Figure 4: LHa Options


LHa version 2.06                     Copyright (c) 1988-91, Haruyasu Yoshizaki
=== <<< High-Performance File-Compression Program >>> ==========  02/14/91  ==
  usage : LHa [aumfdpexlvst] [/rwxmpcazthonil-[-+012|WDIR]] LZH [DIR\] [FILES]
------------------------------------------------------------------------------
  <command>
     a: Add files           u: Update files        m: Move files
     f: Freshen files       d: Delete files        p: disPlay files
     e: Extract files       x: eXtract files with pathnames
     l: List of files       v: View listing of files with pathnames
     s: make a Self-extracting archive   t: Test the integrity of an archive
  <option>
     r: Recursively collect files        w: assign Work directory
     x: allow eXtended file names        m: no Message for query
     p: distinguish full Path names      c: skip time-stamp Check
     a: allow any Attributes of files    z: Zero compression (only store)
     t: archive's Time-stamp option      h: select Header level (default = 1)
     o: use Old compatible method        n: display No indicator a/o pathname
     i: not Ignore lower case            l: display Long name with indicator
     -: '-' or '@' as the first letter of filenames
+=============================================================================
You may copy or distribute without any donation to me,   Nifty-Serve  SDI00506
although a few restrictions for a commercial use.        ASCII-pcs    pcs02846
(See the User's Manual for detailed descriptions.)       PC-VAN       FEM12376



Copyright © 1994 Dennis S. Barnes
Reprints of this article are permitted without notification if the source of the information is clearly identified