[Access to Wang masthead]

Understanding Data Exchange Formats

How to keep your information from getting lost in translation

From "VS Workshop",  Access to Wang, March 1993
  [ Prior Article ]     [ Return to the Catalog of articles ]     [ Next Article ]  

One result of the increasing penetration of small computers into business is that many of your customers have discovered data transfer and may expect to be able to exchange information electronically with their vendors. Frequently, this involves exchanges between unlike, seemingly incompatible systems. For example, your customer may be using an inventory system written in FoxPro, which has a file structure based on the dBase file format. How are you gonna get their order information into your VS?

Part of the answer lies in understanding and using some of the many data interchange formats available. Fortunately, such interchange formats abound; unfortunately, they are sometimes subject to wide interpretation - consider the wide variations between "text files" - and are not always easy to meet with conventional VS programming tools. Nonetheless, mastery of common file exchanges can give your organization a competitive advantage by enabling faster and more accurate exchange of information with your clients.

Interchange formats provide a relatively easy translation between complex data structures by describing specific product features in common forms - usually in text. The task of meeting a typical interchange format cannot be described as easy, since it has many unfamiliar aspects in comparison to routine VS programming work, but it is far easier than writing to a proprietary format and far more portable.

Before delving into the specifics of data interchange formats, let's consider some of their characteristics and the constraints they work within.

Requirements for an interchange format

To be successful in bridging unlike systems, any interchange format must be generic. Specific requirements that must be addressed include:

Translated, most data interchange specifications use an uncompressed consecutive organization ("text files"), a limited character set, records and fields that vary in size according to need, and multiple record types. The text file format is necessary to overcome differences between operating environments, since there are few industry standards for record indexing or compression. Ordinary characters - numbers, punctuation, and alphabetic characters - are common ties between all systems and sidestep differences of character interpretation between environments. Variable-length field and record areas allow the record size to remain compact, even in an uncompressed file format - an important consideration in low-speed data transmissions.

The tab-delimited format

An example of such a data interchange format is the humble tab-delimited text file. Tab-delimited files can be imported into and exported from most PC spreadsheets, database products, and word processing systems. In this format, data items are separated by a tab character - an ASCII value of 9 - and records are defined by the combination of a line feed and carriage return characters - like files produced by most text editors. There are usually the same number of "columns" - delimited fields - in each row, similar to the fixed row-and-column arrangement of a spreadsheet or database view. A typical data record might look like this:


ITEMNUMB<T>DESC<T>QUAN<T>UM<T>COST
1234<T>"Striped Widget, Large"<T>5<T>EA<T>5.70
45678Hand cutter1150.00

(In this example, the <T> character indicates a tab character.) The first row contains field labels for the columns, followed by two rows of data. Note that the third line has no value for UM (Unit of Measure); instead, there are two tab characters so that the COST column is placed in the correct column. Another interesting point is the double quotes around the description of the Striped Widget; some spreadsheet and database products quote fields that contain a comma or quotes within the text area, since these characters are used as delimiters and might confuse the separation of the fields. (If you have access to a product that produces tab-delimited output, experiment now to understand how it handles this situation.)

This looks simple, but some care is necessary to ensure compatibility with other environments. Number formats should have leading signs that "float" to the leading number position; leading zeros are often acceptable, too. Text fields that contain commas should be quoted. Dates may have to be converted into the internal spreadsheet format - an integer value representing the number of days since the turn of the century - if comparisons will be made between dates. Fields may be reduced to their non-blank area and tab characters added through the methods described previously in this space. (For a description of integer date conversion, see the LOTUSDAT procedure in Access to Wang, June 1989. Producing tab- delimited files from COBOL was the subject in the September 1992 Access to Wang.)

Don't underestimate the confusion of record delimiters when your transfers include a mix of MS-DOS, UNIX, or Macintosh systems; each has a different convention. Most MS-DOS applications use a carriage return (hex OD) followed by a line feed (hex OA), but some reverse this combination. UNIX systems use only the line feed. Inexplicably, Apple chose to use only a carriage return for their record delimiter. Some data communications packages can convert these into the scheme used by the local system; otherwise, it may be necessary to find or build utilities that can perform this translation. (Mega Edit - a Windows-based text editor distributed as Shareware - reads and writes files using any of these formats and can be used as a conversion tool. Contact Computer Witchcraft, Inc. at (415) 752-2477 for details.)

The comma-delimited format

Less common these days but still widely support, the comma-delimited approach uses a combination of double quotes and commas to separate fields. Numeric fields are unquoted; all other text is surrounded by double quotes. Here is how the example above would look in this format:


"ITEMNUMB","DESC","QUAN","UM","COST"
1234,"Striped Widget, Large",5,"EA",5.7
45678,"Cutter, hand",1,,150

This is the format you should choose if your intended receiver uses an early version of Lotus 1-2-3, since there are few other import choices with this product. All of the leading spreadsheets still offer it for import and export, usually with the .CSV file extension to indicate its internal structure.

Other file exchange formats

Other popular exchange formats include:

Other information

I hope this brief discussion of interchange approaches has given you some ideas of how such informational exchanges can improve your organization's service. Here are some more sources of information on this topic:


  [ Prior Article ]     [ Return to the Catalog of articles ]     [ Next Article ]  


Copyright © 1993 Dennis S. Barnes
Reprints of this article are permitted without notification if the source of the information is clearly identified