
File Experiments

Existing disk management software doesn't fulfill simple needs

From "VS Workshop",  Access to Wang, January 1991

Sometimes it's the simple questions that really make you work. Suppose your manager just walked in and announced plans to replace the General Ledger system. The new G/L software will be installed next week, and users will begin entering transactions in parallel starting the following week. She wants to know what the present disk storage requirements are to assess the impact of the additional data and program files, and expects an immediate response from you. After an embarrassed silence, you mutter that you will have an answer later in the day and scurry off to build some reports.

The tools of the systems trade have improved, but we still have far to go. Questions that seem obvious to humans of average intelligence defy definition as an acceptable query to a sophisticated data base. Sure, modern disk management software allows you to select and report on usage from several perspectives, but the likely result is too much detail and too little clarity.

The following discussion is intended to serve two purposes: first, to reveal some of my experiments in applying the concept of sets to file identification; second, to challenge vendors of disk management software to better the current situation.

I got interested in file selections by attribute while reviewing the SELCOPY utility (a USERAID or VSAID; distributed by the United States Society of Wang Users). SELCOPY copies files matching entered criteria to another disk or to tape. Possible selection criteria include the owner ID (a single user or blank for all users), file protection class (A through Z, #, @, $, or blank), the file type (Consecutive, Indexed, Print, Program, Alternate-Indexed, Log, or Word Processing), and date ranges. What if the same selection approach could be applied to disk usage reporting, perhaps including ambiguous (wild-card) file, library, or volume references?
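To make the idea concrete, here is a minimal sketch in Python of attribute-based selection with wild-card volume, library, and file names. It is not how SELCOPY works internally; the record layout, the function name select_files, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FileRecord:
    """One file's attributes. The field names are illustrative assumptions."""
    volume: str
    library: str
    name: str
    owner: str
    file_class: str  # file protection class, e.g. "A" or " " for blank
    file_type: str
    blocks: int


def select_files(
    records: Iterable[FileRecord],
    volume: str = "*",
    library: str = "*",
    name: str = "*",
    owner: Optional[str] = None,
    file_class: Optional[str] = None,
    file_type: Optional[str] = None,
) -> List[FileRecord]:
    """Return records matching every criterion; None means 'any value'."""
    selected = []
    for rec in records:
        # Wild-card matching: "*" matches any run of characters, "?" one character.
        if not fnmatchcase(rec.volume, volume):
            continue
        if not fnmatchcase(rec.library, library):
            continue
        if not fnmatchcase(rec.name, name):
            continue
        # Exact-match criteria are applied only when supplied.
        if owner is not None and rec.owner != owner:
            continue
        if file_class is not None and rec.file_class != file_class:
            continue
        if file_type is not None and rec.file_type != file_type:
            continue
        selected.append(rec)
    return selected


# Hypothetical sample data, not taken from any real system.
SAMPLE = [
    FileRecord("DISK01", "GLDATA", "GLMAST", "ACT", "A", "Indexed", 1200),
    FileRecord("DISK01", "GLDATA", "GLTRANS", "ACT", "A", "Indexed", 800),
    FileRecord("DISK02", "GLPRINT", "TRIALBAL", "ACT", " ", "Print", 150),
    FileRecord("DISK02", "WPDOCS", "0001A", "DSB", " ", "Word Processing", 40),
]

gl_files = select_files(SAMPLE, library="GL*")
gl_indexed = select_files(SAMPLE, library="GL*", file_type="Indexed")
print("Blocks in GL* libraries:", sum(rec.blocks for rec in gl_files))
print("Indexed blocks in GL* libraries:", sum(rec.blocks for rec in gl_indexed))
```

Each criterion narrows the set independently, so a usage report becomes a sum over whatever set the criteria define.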

Disk Management paradise

The ideal disk management software would review status and activity over time, isolating trends and highlighting abnormalities. It would allow queries in natural language (e.g. "What is the space used by REPORT files throughout the system?") and return the answers in reasonable time. It would format information graphically for better understanding but still retain the detail for further interrogations. In short, this software doesn't exist yet.

Consider the following questions about disk usage:

In all of these examples there are elements that elude the machine but appear obvious to us.

How would you define the limits of "accounting data files" or "current companies"?

Document usage might be determined by summing the space used by files of the Word Processing file type. Only documents have a file type of Word Processing, right? Nope; so do WPS object files, work files for WP, and possibly other files as well. And what about those documents on volumes not officially recognized by WP?

Reviews of usage over time imply comparisons with earlier usage records. When and how will these records be created? How often should the sample be taken? What will perform the actual comparison?
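One possible shape for such a comparison, sketched in Python: each sample is a snapshot of blocks used per (volume, library), and two snapshots are compared to find the libraries whose usage moved sharply. The snapshot layout, the function names, the 25 percent threshold, and the figures are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]        # (volume, library)
Snapshot = Dict[Key, int]    # blocks used at sampling time


def usage_change(earlier: Snapshot, later: Snapshot) -> Dict[Key, int]:
    """Change in blocks for every library seen in either snapshot."""
    keys = sorted(set(earlier) | set(later))
    return {key: later.get(key, 0) - earlier.get(key, 0) for key in keys}


def flag_abnormal(
    earlier: Snapshot, later: Snapshot, threshold_pct: float = 25.0
) -> List[Key]:
    """Libraries that are new, or whose usage changed by at least threshold_pct."""
    flagged = []
    for key, delta in usage_change(earlier, later).items():
        base = earlier.get(key, 0)
        if base == 0:
            # No earlier usage on record: any growth counts as new usage.
            if delta > 0:
                flagged.append(key)
        elif abs(delta) / base * 100 >= threshold_pct:
            flagged.append(key)
    return flagged


# Hypothetical monthly samples.
december = {
    ("DISK01", "GLDATA"): 2000,
    ("DISK02", "WPDOCS"): 500,
    ("DISK02", "GLPRINT"): 150,
}
january = {
    ("DISK01", "GLDATA"): 2100,
    ("DISK02", "WPDOCS"): 900,
    ("DISK03", "GLNEW"): 300,
}

for key, delta in usage_change(december, january).items():
    print(key, f"{delta:+,} blocks")
print("Worth a look:", flag_abnormal(december, january))
```

The sketch leaves the harder questions open: who takes the samples, how often, and how long they are kept.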

Only a few disk management tools summarize the information in an understandable way, such as by library. Fewer yet produce charts to explain their results. Ever tried to make a point with your one-minute manager by waving around a 75-page report?

Experiments with FILEINFO

The FILEINFO utility extracts file information by volume or library and creates a file that can be interrogated. It provides a means of testing some of the file identification concepts brought up above. I have covered FILEINFO in several prior columns (see "The FILEDATA procedure" in the July 1987 issue of Access 87 for a full description), so I will not dwell on it here. For this experiment I needed to review information on all files on the system. Since FILEINFO can work with only one volume at a time, I wrote a procedure that extracts each volume's data and creates a work file for each. After all volumes had been processed, the individual volume data files were merged into a single data file using the CREATE utility.

For the first trials, I concentrated on the file type, using the REPORT utility to sum block usage by file type by volume. The results were then keyed into a spreadsheet for further reporting and graphing (see Figure 1, below).
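The summing step itself is simple once the file information is in one place. The following Python sketch shows the same aggregation, blocks by file type by volume, on a handful of made-up records; it stands in for the REPORT run and is not a description of how REPORT works. The record layout and function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (volume, file type, blocks allocated); an assumed layout for the merged extract.
Record = Tuple[str, str, int]


def blocks_by_type_and_volume(records: Iterable[Record]) -> Dict[str, Dict[str, int]]:
    """Sum blocks into totals[file_type][volume]."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for volume, file_type, blocks in records:
        totals[file_type][volume] += blocks
    return {file_type: dict(by_volume) for file_type, by_volume in totals.items()}


def format_report(totals: Dict[str, Dict[str, int]], volumes: List[str]) -> str:
    """Lay the totals out one file type per row, one volume per column."""
    header = f"{'File Type':<16}" + "".join(f"{v:>10}" for v in volumes) + f"{'Total':>10}"
    lines = [header]
    for file_type in sorted(totals):
        row = [totals[file_type].get(v, 0) for v in volumes]
        cells = "".join(f"{n:>10,}" for n in row)
        lines.append(f"{file_type:<16}{cells}{sum(row):>10,}")
    return "\n".join(lines)


# Hypothetical records, not taken from the system described in this column.
RECORDS = [
    ("DISK01", "Indexed", 1200),
    ("DISK01", "Indexed", 800),
    ("DISK01", "Consecutive", 300),
    ("DISK02", "Print", 150),
    ("DISK02", "Indexed", 500),
    ("DISK02", "Word Processing", 40),
]

totals = blocks_by_type_and_volume(RECORDS)
print(format_report(totals, ["DISK01", "DISK02"]))
```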

The results were surprising. Previously, I had no clear idea of the space used by the various file types and was confused by the number of files in some libraries. This report gets to the heart of the matter by showing disk commitment in blocks, the only comparable measure. Further refinements to this spreadsheet added percentages by disk, a file count by type by volume, and graphs of the results. Finally, our storage requirements make some sense.
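The percentage refinements are plain arithmetic on the block totals. As a sketch, the Python below takes the block counts from Figure 1 and derives each file type's share of all allocated blocks and its share of each disk. The function and variable names are mine; the spreadsheet did the equivalent work.

```python
from typing import Dict, List

VOLUMES = ["DISK01", "DISK02", "DISK03", "DISK04"]

# Blocks allocated per file type, in VOLUMES order, as listed in Figure 1.
FIGURE_1: Dict[str, List[int]] = {
    "Alt-Indexed": [25094, 24192, 84596, 31253],
    "Consecutive": [21689, 12316, 6920, 23794],
    "Indexed": [87702, 17660, 47431, 60138],
    "Log": [22838, 465, 258, 490],
    "Object": [23191, 2252, 1600, 49012],
    "Print": [41, 16217, 104, 8944],
    "WP": [0, 85192, 0, 130],
}


def percent_of_total(table: Dict[str, List[int]]) -> Dict[str, float]:
    """Each file type's share of all allocated blocks, to one decimal place."""
    grand_total = sum(sum(row) for row in table.values())
    return {ft: round(100 * sum(row) / grand_total, 1) for ft, row in table.items()}


def percent_by_volume(
    table: Dict[str, List[int]], volumes: List[str]
) -> Dict[str, Dict[str, float]]:
    """Each file type's share of the blocks allocated on each volume."""
    volume_totals = [sum(row[i] for row in table.values()) for i in range(len(volumes))]
    return {
        ft: {v: round(100 * row[i] / volume_totals[i], 1) for i, v in enumerate(volumes)}
        for ft, row in table.items()
    }


overall = percent_of_total(FIGURE_1)
by_disk = percent_by_volume(FIGURE_1, VOLUMES)
for file_type in FIGURE_1:
    print(f"{file_type:<12}{overall[file_type]:>6.1f}%")
print("WP share of DISK02:", by_disk["WP"]["DISK02"])
```

The by-disk view makes concentrations visible that the overall percentages hide: Word Processing files are about 13 percent of all allocated blocks but more than half of DISK02.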

Defining file type sets

While file types are easy to extract and interpret, they do not provide a fine enough breakdown of volume usage. What about the other factors that identify a file, such as the library or file name, record size, or internal contents? Some of these subsets can be defined externally (see Figure 2, below); others would require an understanding of the internal contents of the file. A good example is Procedure files: other than naming conventions, how would you separate Procedure files from COBOL source, data files, or any other 80-character consecutive file?
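A rule table is one way to express subsets that can be defined externally. The Python sketch below assigns a file to the first rule whose file type, library pattern, name pattern, and record size all match. Every rule and library name in it is hypothetical, and it shows the limitation just described: an 80-character consecutive file in a library that follows no convention falls through to a catch-all.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """One subset definition. Patterns use * and ? wild cards."""
    label: str
    file_type: str
    library: str = "*"
    name: str = "*"
    record_size: Optional[int] = None  # None means any record size


# Hypothetical naming conventions; a real site would supply its own.
RULES: List[Rule] = [
    Rule("Procedures", "Consecutive", library="*PROC*", record_size=80),
    Rule("Program source", "Consecutive", library="*SRC*", record_size=80),
    Rule("WP work files", "Word Processing", library="WPWORK*"),
    Rule("Documents", "Word Processing"),
    Rule("Data files", "Consecutive"),
]


def classify(
    file_type: str,
    library: str,
    name: str,
    record_size: Optional[int],
    rules: List[Rule] = RULES,
) -> str:
    """Return the label of the first matching rule, or 'Unclassified'."""
    for rule in rules:
        if rule.file_type != file_type:
            continue
        if not fnmatchcase(library, rule.library):
            continue
        if not fnmatchcase(name, rule.name):
            continue
        if rule.record_size is not None and rule.record_size != record_size:
            continue
        return rule.label
    return "Unclassified"


# Three 80-character consecutive files, told apart only by library name.
print(classify("Consecutive", "GLPROC", "MONTHEND", 80))
print(classify("Consecutive", "GLSRC", "GL0100", 80))
print(classify("Consecutive", "GLDATA", "EXTRACT", 80))
```

Rule order matters because the first match wins, so specific rules must precede general ones. Exceptions to the naming conventions would need rules of their own, and nothing here looks inside a file.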

It should be obvious that neither FILEINFO nor any other disk management tool currently available offers what is needed here. The logical understanding of file usage to this degree must include external characteristics, file naming conventions, and exceptions in its definitions. Refinement of queries to this degree is not practical with crude tools like FILEINFO.

Software developers: are you listening?

Figure 1: Sample Report - Disk Usage by File Type

Numeric values other than percentages indicate disk blocks allocated.

File Type      DISK01    DISK02    DISK03    DISK04     Total  % used
Alt-Indexed    25,094    24,192    84,596    31,253   165,135    25.3
Consecutive    21,689    12,316     6,920    23,794    64,719     9.9
Indexed        87,702    17,660    47,431    60,138   212,931    32.6
Log            22,838       465       258       490    24,051     3.7
Object         23,191     2,252     1,600    49,012    76,055    11.6
Print              41    16,217       104     8,944    25,306     3.9
WP                  0    85,192         0       130    85,322    13.1
Totals        180,555   158,294   140,909   173,761   653,519   100.0

Figure 2: Sample File Type Subsets (Partial List)

File Type Usage Identifiers
Alternate-Indexed Data files Library
Consecutive Data files
Program source files
Wang INFO files
Library; other
Indexed CONTROL files
Data files
REPORT files
Library; size
Library; size
Log Data files
Object Program files
Print General reports
Screen dumps
Name (?)
Word Processing Documents
Font files
Data exchange files
WP work and queue files
WPS object files
Library; name
Library; name
Library; name


Copyright © 1991 Dennis S. Barnes
Reprints of this article are permitted without notification if the source of the information is clearly identified