Cross-Platform File Archiving
Combine, compress, retrieve and store files with common utilities
From "Migration", Access to Wang, June 1994
File archiving tools perform a number of file storage and retrieval functions, including combination of multiple files into a single entity (concatenation) and reduction of the disk space requirements for these combined files (compression). Traditionally, the Unix tar ("tape archive") and compress tools have been used in combination to produce the same results, requiring a two-step process: combination of files using tar and reduction of file size with compress. In the MS-DOS world, PKZip is widely used to consolidate multiple files into compact archives. How can these two worlds communicate?
While PC versions of Unix tools like tar and compress are available, it is not convenient to distribute archives created with these tools to others since it is unlikely that they will have the means to extract the files. A far better solution is to choose file archiving tools that are available for a variety of platforms. Fortunately, several excellent programs are available to fill this need.
Why would you need to consider using a file archiver? Perhaps you wish to save prior copies of transaction files or reports without a large impact to your disk, or reduce the amount of time required to transfer a file by modem. Maybe you need to produce large text files from Unix programs and distribute them to PC and Macintosh users, or distribute a software package that consists of several subdirectories and a myriad of files. Whatever your reasons, file archive tools can help.
Of the many archive tools available for MS-DOS and Unix, we will concentrate on two that are available for both platforms and without cost:
ZIP, UNZIP: Often confused with PKZip, a commercial tool from PKWare, the ZIP and UNZIP tools have similar features and can usually be used interchangeably with PKZip. Written by several authors and maintained by Info-ZIP, a software development cooperative. Current versions: 2.0.1 (September 18, 1993) for ZIP, 5.1 (February 1994) for UNZIP.
LHa (LHarc): A long-time favorite of PC users that is now generally available for Unix as well. Written by Haruyasu Yoshizaki. Current version: 2.06 (February 14, 1991).
There are also many front-end programs available for MS-DOS and Windows to make it easier and safer to use file archive utilities. Most of these tools provide a simple point-and-shoot screen instead of requiring command-line entries, and most support ZIP and LHa formats. Typically, these programs are distributed as shareware and can be licensed for minimal cost; one program - WIZUNZIP - is actually part of the UNZIP distribution and is free of cost. Most of the newer Windows-based tools support drag-and-drop use, so it is possible to archive a file by simply dragging the file name over to an icon. Consult bulletin boards or other PC program sources for availability.
File archive tools combine multiple files and use compression techniques to reduce the storage required for the combined files. They share compression techniques with disk compression tools like Doublespace and Stacker but differ in one important respect: you must take action to store or retrieve files stored in an archive, while disk compression utilities mimic all of the behavior of the usual file system. File archive tools can also be used for backup, though they are usually much slower than good backup tools.
Archiving a set of files with either ZIP or LHa requires entry of simple commands or use of one of the PC front-end tools described above. The generic command-line syntax looks like this:
archive_tool [options] archive_name[.ext] [input_file_name]
Archive_tool is the name of the file archive program (ZIP, LHA, etc.). A number of options may be specified to control how the archive operation is performed. (See Figure 1 for a comparison of tool capabilities, and Figure 2, Figure 3, and Figure 4, below, for lists of these options.) The name of the archive file and, optionally, a file extension are the next group of items, followed by the name of the input file. Most archive tools allow ambiguous file names (wild cards) to be used for the input specification. Thus, zip -u myzip.zip *.* would run the ZIP program, updating archive file myzip.zip with all files in the current directory. Other examples:
c:\> unzip a:bigfile.zip
(Extracts archive file bigfile.zip and places the files in the current directory)
c:\> lha m allfiles *.ini
(Copies all files ending with .ini into archive file allfiles, deleting the input files afterwards; i.e., it moves the files into the archive)
ZIP and LHa archive files are usually identifiable by their file extension: .zip (ZIP) or .lzh (LHa). It is not usually necessary to enter the extension when specifying the archive name; in the second example (above), the archive file name would be allfiles.lzh. (More about issues with file naming conventions, below.)
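For readers who script in a language other than the shell, the same pattern can be sketched with Python's standard zipfile module. The example below is only an illustration: the function name archive_matching and its arguments are hypothetical, and it mimics just two behaviors described above, wild-card input and a default .zip extension. It does not reproduce ZIP's -u (update) logic.

```python
import glob
import os
import zipfile


def archive_matching(archive_name, pattern):
    """Add every regular file matching `pattern` to a new ZIP archive.

    As with the command-line tools, a missing extension defaults to .zip.
    Returns the path of the archive that was written.
    (Hypothetical helper for illustration; not part of ZIP or LHa.)
    """
    root, ext = os.path.splitext(archive_name)
    if not ext:
        archive_name = root + ".zip"

    # Resolve the wild card first and leave out the archive itself, so an
    # existing archive that matches the pattern is never added to itself.
    target = os.path.abspath(archive_name)
    matches = sorted(
        p for p in glob.glob(pattern)
        if os.path.isfile(p) and os.path.abspath(p) != target
    )

    with zipfile.ZipFile(archive_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in matches:
            # Store only the file name, not the directory it came from.
            zf.write(path, arcname=os.path.basename(path))
    return archive_name
```

Calling archive_matching("allfiles", "*.ini") would write allfiles.zip in the current directory. Unlike lha m, this sketch leaves the input files in place.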
If you want to know the contents of an archive file you must determine the type of archive and use the appropriate tool to extract a file list. ZIP files can be listed with the command unzip -l myarch; an LHa archive listing can be had by entering lha l myarch. If the file listing is long it is helpful to send it to a paging utility so you can see each page in turn; e.g., unzip -l myarch | more. Substituting the -v option would produce a longer (verbose) file listing, including modification times and file protections.
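The same listing can be produced under program control. The following is a minimal sketch using Python's standard zipfile module; list_archive is a hypothetical helper name, and the is_zipfile check stands in for the step of determining the archive type. It handles ZIP archives only, since Python's standard library has no LHa support.

```python
import zipfile


def list_archive(path):
    """Return (name, original_size, compressed_size) for each archive entry.

    Raises ValueError if `path` does not look like a ZIP archive.
    (Hypothetical helper for illustration.)
    """
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a ZIP archive")

    with zipfile.ZipFile(path) as zf:
        return [
            (info.filename, info.file_size, info.compress_size)
            for info in zf.infolist()
        ]
```

Each tuple carries roughly what the short unzip -l listing shows plus the compressed size, so a caller can print the rows or total them as needed.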
One example of a simple use of ZIP is to add files to an archive under program control. Our shop uses scripts and the background scheduler cron to run routine jobs, and many of those scripts include commands to add report files to an archive. These reports can be retained for future reference at roughly 15% of their original size. If the files to be archived have predictable name prefixes it is easy to automate this storage. Consider this Unix script:
#!/bin/sh
# runbigjob.sh - runs BIGJOB and copies report files to archive files
PRINTARCH='/usr/sysbak/printarch'
SPOOLIB='/usr/sysbk1/%dsbprt'
wrun BIGJOB
zip $PRINTARCH/bigjob $SPOOLIB/bigj*
This Unix script stores the paths of the print archive and spool library directories in the shell variables PRINTARCH and SPOOLIB. After the job runs, print files beginning with 'bigj' are stored in /usr/sysbak/printarch/bigjob.zip. Similar processes can be designed for a number of other purposes.
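A job like this can also be expressed in Python, which makes it easy to check how much space the archive actually saves. The sketch below is an illustration under stated assumptions, not a replacement for the script above: archive_reports and its parameters are hypothetical names, it does not run the job itself, and the savings you see will depend entirely on your data.

```python
import os
import zipfile


def archive_reports(archive_path, spool_dir, prefix):
    """Append spool files whose names start with `prefix` to a ZIP archive.

    Returns (original_bytes, compressed_bytes) for the files added.
    (Hypothetical helper for illustration.)
    """
    original = 0
    compressed = 0

    # Mode "a" creates the archive if it is missing and appends otherwise.
    with zipfile.ZipFile(archive_path, "a", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(spool_dir)):
            path = os.path.join(spool_dir, name)
            if not name.startswith(prefix) or not os.path.isfile(path):
                continue
            zf.write(path, arcname=name)
            # The entry's sizes are filled in once the write completes.
            info = zf.getinfo(name)
            original += info.file_size
            compressed += info.compress_size

    return original, compressed
```

Dividing the second value by the first gives the fraction of the original size that the archived reports occupy, which can be compared with the figure quoted above for your own files.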
As with many tools, selecting file archive utilities requires acknowledgment and understanding of some issues. It's not possible to cover all of these issues in the space of this column, but here are a few important considerations:
Picky syntax differences: There are some big differences in syntax between archive products that can make you crazy. For example, many PC and Unix command-line tools use switch characters to help the utility determine the difference between a command and data. Unix tools traditionally use a dash (-a), while MS-DOS versions frequently use a forward slash (/a); LHa uses neither (a). The behavior of commands can also vary: the -m option in ZIP will move all files into the archive regardless of whether current versions are already present, while LHa's m option will move only those files that have more recent modification dates, saving much time.
Cross-platform extraction: Moving files between unlike systems requires some care in file and directory naming and other platform-specific parameters. Case-sensitive file systems like those of Unix can accept file names such as Big, big, and BIG as separate files, but MS-DOS will see those as duplicates and may require you to come up with a new name for each one. Long Unix file names are usually truncated by MS-DOS tools, and the presence of characters that are invalid in MS-DOS file names (multiple periods, commas, embedded spaces) can complicate extraction.
Text file line ending conversion: Lines in PC text files usually end with a carriage return/line feed character pair. Lines in Unix text files end with a line feed only, and those in Macintosh text files end with a carriage return. Most tools allow conversion of these characters, but be careful: selecting these options with binary data will ruin the output files.
Differences in versions: Version levels often mean radical changes in file format and loss of backward compatibility. The current version of the commercial PKWare products (PKZIP 2.04g) introduced a new compression form that cannot be extracted by earlier versions of their product (typically 1.1), but UNZIP usually handles this correctly. The current version of ZIP added the new compression form of PKZIP 2.04g with a similar loss of compatibility. Be aware of the versions used by your shop and any others you work with.
Performance: ZIP, UNZIP, and LHa are excellent tools, but commercial products often work faster and produce smaller output files. This should be considered if performance is an issue.
Support for wild cards: MS-DOS and Unix wild-card conventions differ in small ways. The expression *.* matches all files in MS-DOS, while * does the same in Unix. Because ZIP began as a Unix product and was later moved to MS-DOS, early versions behaved as Unix programs even in their PC versions.
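Two of the considerations above, file naming and line-ending conversion, lend themselves to a quick check before an archive leaves a Unix system. The Python sketch below is a rough illustration only. The function names are hypothetical, the 8.3 rule is a simplified assumption rather than a full statement of MS-DOS naming rules, and the test for binary data (the presence of a NUL byte) is a crude heuristic, not the method ZIP or UNZIP uses.

```python
import re
from collections import defaultdict

# Simplified 8.3 rule assumed for this sketch: 1-8 characters, then an
# optional period and 1-3 characters, with no spaces, commas, or extra periods.
_DOS_NAME = re.compile(r"[^\s.,]{1,8}(\.[^\s.,]{1,3})?")


def dos_name_problems(names):
    """Map each troublesome file name to a list of reasons."""
    names = list(names)

    # Group names that become identical once case is ignored.
    by_folded = defaultdict(set)
    for name in names:
        by_folded[name.upper()].add(name)

    problems = defaultdict(list)
    for name in sorted(set(names)):
        if not _DOS_NAME.fullmatch(name):
            problems[name].append("not a simple 8.3 name")
        if len(by_folded[name.upper()]) > 1:
            problems[name].append("differs from another name only by case")
    return dict(problems)


def to_dos_line_endings(data):
    """Convert LF or CR line endings in `data` (bytes) to CR LF.

    Refuses data containing a NUL byte, on the assumption that it is binary.
    """
    if b"\0" in data:
        raise ValueError("data looks binary; refusing to convert")
    # Reduce every ending to LF first so existing CR LF pairs are not doubled.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")
```

Running dos_name_problems over the names in a directory (or over an archive's file list) flags entries such as Big and big, or report.final.txt, before a PC user runs into them at extraction time.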
Executable PC versions of ZIP, UNZIP, and LHa are routinely available from PC-based bulletin boards and through commercial networks (CompuServe, etc.). Unix versions can be found on the Internet and in the library of the Unix forum on CompuServe. In general, only source files are available for Unix versions and it will be necessary to compile them in your environment. All are written in C and contain reasonable (though terse) installation instructions. If you are unfamiliar with the C language and the make discipline, get a friend to help you compile these tools on your system.
I'll cover advanced archiving techniques in the future. Until then, consider incorporating these excellent tools into your environment.
Figure 1: Comparison of Archive Tool Capabilities
Option                                              LHA   ZIP/
Self-extracting archives (MS-DOS versions only)     Y     N
Option to split large archive across disks          N     Y
Macintosh version available                         N     Y
Windows versions available                          N     Y
VAX (VMS) version available                         N     Y
OS/2, NT versions available                         N     Y
Other versions available (Amiga, Atari, etc.)       N     Y
Source code available                               Y     Y
Store comments by archive                           N     Y
Store comments by file                              N     Y
Accept file names through Standard Input            N     Y
Write extracted file to Standard Output             N     Y
Streaming I/O (Standard Input to Standard Output)   N     Y
Convert text file line endings                      Y     Y
Test integrity of archive                           Y     Y
Support for encryption                              N     Y
Figure 2: ZIP Options
Copyright (C) 1990-1993 Mark Adler, Richard B. Wales, Jean-loup Gailly
and Kai Uwe Rommel. Type 'zip -L' for the software License.
Zip 2.0.1 (Sept 18th 1993). Usage:
zip [-options] [-b path] [-t mmddyy] [-n suffixes] [zipfile list] [-xi list]
  The default action is to add or replace zipfile entries from list, which
  can include the special name - to compress standard input.
  If zipfile and list are omitted, zip compresses stdin to stdout.
  -f   freshen: only changed files    -u   update: only changed or new files
  -d   delete entries in zipfile      -m   move into zipfile (delete files)
  -k   simulate PKZIP made zipfile    -g   allow growing existing zipfile
  -r   recurse into directories       -j   junk (don't record) directory names
  -0   store only                     -l   convert LF to CR LF (-ll CR LF to LF)
  -1   compress faster                -9   compress better
  -q   quiet operation                -v   verbose operation
  -c   add one-line comments          -z   add zipfile comment
  -b   use "path" for temp file       -t   only do files after "mmddyy"
  -@   read names from stdin          -o   make zipfile as old as latest entry
  -x   exclude the following names    -i   include only the following names
  -F   fix zipfile (-FF try harder)   -D   do not add directory entries
  -T   test zipfile integrity         -L   show software license
  -y   store symbolic links as the link instead of the referenced file
  -h   show this help                 -n   don't compress these suffixes
Figure 3: UNZIP Options
UnZip 5.1 of 7 February 1994, by Info-ZIP. Portions (c) 1989 by S. H. Smith.
Send bug reports to authors at email@example.com; see README for details.
Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
  Default action is to extract files in list, except those in xlist, to exdir;
  file[.zip] may be a wildcard. -Z => ZipInfo mode ("unzip -Z" for usage).
  -c  extract files to stdout/screen (CRT)   -l  list files (short format)
  -p  extract files to pipe, no messages     -v  list files (verbose format)
  -f  freshen existing files, create none    -t  test compressed archive data
  -u  update files, create if necessary      -z  display archive comment
  -x  exclude files which follow (in xlist)  -d  extract files into exdir
modifiers:                                   -q  quiet mode (-qq => quieter)
  -n  never overwrite existing files         -a  auto-convert any text files
  -o  overwrite files WITHOUT prompting      -aa treat ALL files as text
  -j  junk paths (don't make directories)    -U  don't make names lowercase
  -V  retain VMS version numbers
Examples (see unzip.doc for more info):
  unzip data1 -x joe   => extract all files except joe from zipfile data1.zip
  unzip -p foo | more  => send contents of foo.zip via pipe into program more
  unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer
Figure 4: LHa Options
LHa version 2.06                      Copyright (c) 1988-91, Haruyasu Yoshizaki
=== <<< High-Performance File-Compression Program >>> ========== 02/14/91 ==
usage : LHa [aumfdpexlvst] [/rwxmpcazthonil-[-+012|WDIR]] LZH [DIR\] [FILES]
------------------------------------------------------------------------------
 <command>
  a: Add files                         u: Update files
  m: Move files                        f: Freshen files
  d: Delete files                      p: disPlay files
  e: Extract files                     x: eXtract files with pathnames
  l: List of files                     v: View listing of files with pathnames
  s: make a Self-extracting archive    t: Test the integrity of an archive
 <option>
  r: Recursively collect files         w: assign Work directory
  x: allow eXtended file names         m: no Message for query
  p: distinguish full Path names       c: skip time-stamp Check
  a: allow any Attributes of files     z: Zero compression (only store)
  t: archive's Time-stamp option       h: select Header level (default = 1)
  o: use Old compatible method         n: display No indicator a/o pathname
  i: not Ignore lower case             l: display Long name with indicator
  -: '-' or '@' as the first letter of filenames
+=============================================================================
  You may copy or distribute without any donation to me,  Nifty-Serve SDI00506
  although a few restrictions for a commercial use.       ASCII-pcs   pcs02846
  (See the User's Manual for detailed descriptions.)      PC-VAN      FEM12376
Copyright © 1994 Dennis S. Barnes
Reprints of this article are permitted without notification if the source of the information is clearly identified