
Using Script Languages

Part 5 (Perl, Part 2)

From "VS & Beyond",  Access to Wang, May 1998

This month continues our series on scripting languages with some additional information on the Perl language. As I indicated previously, there is far more to Perl than scripting; indeed, it could be considered a complete application language unto itself. This time we'll look at how Perl can be used to produce text output for a variety of purposes, including simple reports and CGI (Common Gateway Interface) applications for the World Wide Web.

Formatted text

Perl has powerful ways to format and present data, as you'd expect from a language designed for reporting and data presentation. Though delivery of this information is primarily directed to Standard Output (e.g. the screen), there are some good alternatives to this.

Last month's script sample showed how the print statement dumps the contents of a variable to the screen (Standard Output) unless directed elsewhere. In that case, a file name was constructed and teamed with a select statement to intercept this output and direct it into the contents of a text file rather than to Standard Output. In this script, the print statement stood alone - there was nothing indicating what should be printed. This illustrates use of one of Perl's built-in variables - known as $_ (a dollar sign plus an underscore) - that automatically receives some types of input, including the contents of records read in from the input file loop. (Other built-in variables are available; consult Perl manuals for details.)
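As a minimal sketch of these two ideas (not last month's actual script), the fragment below redirects a bare print with select and relies on $_ to supply the text. The sample records and the in-memory output file are assumptions chosen so that the example leaves nothing on disk; a real script would open a named file instead.

```perl
use strict;
use warnings;

# Hypothetical records standing in for lines read from an input file.
my @records = ("first record\n", "second record\n");

# An in-memory "file": everything written to $out ends up in $captured.
my $captured = '';
open(my $out, '>', \$captured) or die "cannot open in-memory file: $!";

my $previous = select($out);   # a bare print now goes to $out
for (@records) {               # each record is placed in $_ in turn
    print;                     # no argument: prints the contents of $_
}
select($previous);             # restore Standard Output as the default
close($out);

print "Captured:\n$captured";
```

The value returned by select is the previously selected filehandle, which is why it is saved and restored once the redirected output is finished.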

Unlike the echo statement in Unix, which can collapse tabs and runs of spaces when its arguments are not quoted, the print statement writes exactly what it is given: it adds no separators between items and no trailing newline, so the result may still differ from what you had in mind. There are several ways to control the presentation:

The first example illustrates how simple formatting can be done using string concatenation. In Perl's syntax, the periods between variable names and literal values tie elements together to form a new string. If you've worked with C or some other script languages, you'll recognize the \t value as a means of specifying a tab character; it's enclosed in double quotes, and the Perl interpreter will replace it with a tab character as the print statement is executed.
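A minimal sketch of concatenation with tabs follows. The variable names and values are hypothetical; only the period operator and the "\t" escape come from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical field values.
my ($name, $city, $state) = ('A. Reader', 'Boston', 'MA');

# Periods join the variables and the quoted tab characters into one string.
my $line = $name . "\t" . $city . "\t" . $state . "\n";
print $line;

# The same string built by interpolation inside double quotes.
my $same = "$name\t$city\t$state\n";
print $same;
```

Note that \t is only translated inside double quotes; within single quotes it remains a literal backslash followed by the letter t.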

The second alternative - the sprintf function - is borrowed from the C/C++ environment. It gives specific control over the presentation of individual elements; the formatted result is returned as a string that can be placed in another variable for use in format or print statements. Here is an example of how numeric date and time values would be formatted to show a date:


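@MoY = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
($min, $hour, $mday, $mon, $year) = (localtime)[1..5];
$stamp = sprintf("%02d-%s-%02d   %02d:%02d",       # keep the result in a variable
    $mday, $MoY[$mon], $year % 100, $hour, $min);  # $year counts from 1900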

Sample output: 01-May-98   13:54

In this example, the values of the local time and date would be placed within a format that specifies a two-digit day, a string for the month, and two digits each for the year, hour, and minute.
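To try this with repeatable output, the sketch below formats a fixed timestamp rather than the current time. The contents of @MoY, the $stamp variable, and the use of gmtime (to sidestep time-zone differences) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Month abbreviations indexed 0..11, matching the month number Perl returns.
my @MoY = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# 894030840 seconds after the epoch is 1 May 1998, 13:54:00 UTC.
my ($min, $hour, $mday, $mon, $year) = (gmtime(894030840))[1..5];

# The year is returned as years since 1900, so reduce it to two digits.
my $stamp = sprintf("%02d-%s-%02d   %02d:%02d",
    $mday, $MoY[$mon], $year % 100, $hour, $min);

print "$stamp\n";   # prints "01-May-98   13:54"
```

Replacing gmtime(894030840) with localtime gives the current local date and time in the same layout.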

More text control using format

If you want more control over the presentation of data, Perl's format statement is a better choice. It is similar in construction and use to the PRINT USING statement in BASIC: a series of lines containing formatting characters, each followed by a list of the variables to be inserted into it, in the order they must appear. format statements are coupled with the write statement (rather than print) and identified by a name. The format lines may be located anywhere within the script, but are normally close to the statements that use them. Though it is not a requirement, most formatted output involves more than the single variable ($_) that last month's sample script printed; in typical use, this means the record has already been broken into separate variables for further handling.

Figure 1 shows a code fragment that uses a format statement as discussed. The format is named in line 1; this name matches the output filehandle (LABELSOUT) opened further down, which is how the write statement finds the format. Line 2 defines a left-aligned text field of about thirty characters, while the following line indicates that the contents of $name are to be inserted into that field. Other formatting lines alternate with the names of the variables to be included within them. Note that lines 6 and 7 define a printed line that presents the contents of three variables. The single period in the first column (line 8) closes the format statement.

Following the formatting statements, the script opens the addr.txt file for input, a new file (labels.txt) for output, and starts to read through the contents of the input file. In this example, the input file is a tab-delimited text file with columns of text to be printed on labels. After removing the newline character with a chop command, the record is split at the tab characters and its fields are assigned to variables for further processing. Finally, the write statement (line 14) sends these variables through its associated format, resulting in correctly formatted output.
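The record-splitting step can be tried on its own, as in this minimal sketch. The sample record is hypothetical, and chomp is used here as the Perl 5 alternative to chop: it removes a trailing newline only when one is present, whereas chop always removes the last character.

```perl
use strict;
use warnings;

# A hypothetical tab-delimited record, as it would arrive from the input file.
my $record = "Pat Example\t1 Main St\tLowell\tMA\t01851\n";

chomp($record);   # drop the trailing newline, if there is one

# Split at each tab and assign the pieces to named variables.
my ($name, $address, $city, $state, $zip) = split(/\t/, $record);

print "$name\n$address\n$city, $state $zip\n";
```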

The format statement supports a number of other formatting characters that control how lines are displayed, including left-aligned and right-aligned text fields, numeric fields, and multi-line fields.
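The sketch below shows several of these field types together. The REPORT name, the variables, and their values are all hypothetical, and the output is written to an in-memory file so it can be inspected afterwards; treat it as an illustration rather than a complete report program.

```perl
use strict;
use warnings;

# Package variables so the format can see them; the values are hypothetical.
our ($item, $qty, $price, $note);

# @<<< is left-aligned, @>>> is right-aligned, and @##.## is numeric.
# A ^<<< field takes as much of $note as fits on one line; the ~~ marker
# repeats that line until $note has been used up.
format REPORT =
@<<<<<<<<< @>>>>> @##.##
$item,     $qty,  $price
   ^<<<<<<<<<<<<<<<<<<<~~
   $note
.

my $buffer = '';
open(REPORT, '>', \$buffer) or die "cannot open in-memory file: $!";

($item, $qty, $price) = ('Widget', 12, 3.5);
$note = 'Ships in two boxes; fragile contents inside.';
write REPORT;     # uses the format that shares the filehandle's name
close(REPORT);

print $buffer;
```

Be aware that a ^ field consumes the variable it prints, so $note is empty once the write statement has finished.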

Printing more text

The final method for printing formatted text is a technique borrowed from Unix shell script writers everywhere. When the task requires a large amount of text interspersed with a few values or variables, it becomes inconvenient to create a number of print or write statements to format and print each line. Instead, you can set a print statement going and let it run until it encounters a specific text string to stop it. This is shown in the code fragment in Figure 2.

Here's how this technique works:
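In outline: the statement ends with << followed by a marker of your choosing; every line after it, up to a line containing only that marker, is treated as one double-quoted string, so variables inside the text are replaced by their values. The marker must appear by itself at the start of its line, spelled exactly as it was after the <<. A minimal sketch, with a hypothetical title and the text stored in a variable before printing:

```perl
use strict;
use warnings;

my $headertext = 'Monthly Report';   # hypothetical page title

# Everything between "<<EOH;" and the line containing only EOH is one string.
my $page = <<EOH;
<html>
<head>
<title>$headertext</title>
</head>
EOH

print $page;
```

Writing print <<EOH; instead of the assignment sends the same text directly to the current output.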

Alert readers will note the similarity of the text in Figure 2 to the opening lines of a Web document. In fact, this example was taken from a Perl script used to generate a Web page using the CGI model. It's nearly impossible to talk about Perl without also discussing CGI: Perl remains the primary engine of CGI on the Web due to its ease of development, universal applicability, and large reserves of sample code.

Perl and CGI

CGI (Common Gateway Interface) was developed to allow an easy way to create Web content that changes according to external conditions, the user's request, or other factors. CGI applications replace the need for normal ("static") pages by providing programs to create information on the fly. To compare:

Perl works well for these situations because it can easily send its output back to the server. As the code sample in Figure 2 shows, it is relatively easy to create scripts that generate HTML based on selections made by the user or other events.
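As a rough sketch of a page that changes with the user's request, the script below reads the query string that a Web server passes to a CGI program in the QUERY_STRING environment variable. The build_page routine, the "name" parameter, and the page text are hypothetical, and a production script would need more thorough input handling than is shown here.

```perl
use strict;
use warnings;

# Build a complete CGI response for a query string such as "name=Pat".
sub build_page {
    my ($query) = @_;
    my %param;
    for my $pair (split(/&/, $query)) {
        my ($key, $value) = split(/=/, $pair, 2);
        next unless defined $key && defined $value;
        $value =~ tr/+/ /;                               # "+" encodes a space
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # decode %XX escapes
        $param{$key} = $value;
    }
    my $name = exists $param{name} ? $param{name} : 'visitor';

    # Escape HTML special characters before echoing user input.
    $name =~ s/&/&amp;/g;
    $name =~ s/</&lt;/g;
    $name =~ s/>/&gt;/g;

    return <<EOH;
Content-type: text/html

<html>
<head><title>Greeting</title></head>
<body>
<p>Hello, $name!</p>
</body>
</html>
EOH
}

# Outside a Web server QUERY_STRING is not set, so fall back to an empty string.
print build_page(defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '');
```

The blank line after the Content-type header is required: it separates the header from the body of the page.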

Of course, there are many other aspects to CGI programming - too many for this space and time. If there is interest I will resume this topic some time in the future.

Setting Up the Perl Environment

Before you can experiment with the scripts shown here, you will need to install the Perl interpreter on your system. While the actual process for doing this varies according to your operating system, there are two key concepts common to all installations:

Installation kits can be found at the Web locations listed below. Each has instructions and installation scripts specific to that operating system's requirements. Setting up Perl for a Windows NT system requires the use of an extension (typically ".pl") to identify the type of file and launch the interpreter. Windows 95 systems lack this capability, so scripts must be run by manually invoking the interpreter and passing it the name of the script:


c:\scripts> perl myscript.pl

(This example assumes that the Perl interpreter is located somewhere on the system's path and, thus, can be started by simply typing "perl" at the command prompt.)
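A quick way to confirm that the interpreter is working is a short script like the following sketch, saved under a name of your choosing (myscript.pl would match the command above). It uses two of Perl's built-in variables: $] holds the interpreter's version number and $^O the name of the operating system it was built for.

```perl
# A hypothetical first script for checking a new Perl installation.
my $report = "Perl version " . $] . " is running on " . $^O . "\n";
print $report;
```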

Conclusion

After two columns, there is still much to discuss about Perl, including its use of modules and objects and the specifics of Perl programs. Though it is more powerful than most scripting languages, Perl remains light enough to be useful for scripting as well. I definitely recommend experimenting with Perl in your own environment.

Next month we'll look at Java.

References

Acquiring Perl
http://language.perl.com/info/software.html

Use this page to locate the version of Perl you need - free, of course.


Figure 1: Sample of Formatted Output


1  format LABELSOUT =
2  @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
3  $name
4  @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
5  $address
6  @<<<<<<<<<<<<<<<<<<< @<  @<<<<
7  $city, $state, $zip
8  .
9  open(ADDRESSES, "<addr.txt")   || die "addr.txt: $!";    # input file
10 open(LABELSOUT, ">labels.txt") || die "labels.txt: $!";  # output file
11 while(<ADDRESSES>) {
12     chop;
13     ($name, $address, $city, $state, $zip) = split("\t");
14     write LABELSOUT;
15 }


Figure 2: Listing Large Amounts of Formatted Text


print <<EOH;
Content-type: text/html

<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<title>$headertext</title>
</head>

<body bgcolor="#FFFFFF">
EOH

# Copy the rest of the page from INPUTFILE, a filehandle that is
# assumed to have been opened earlier in the script.
while(<INPUTFILE>) {
    print;
}



Copyright © 1998 Dennis S. Barnes
Reprints of this article are permitted without notification if the source of the information is clearly identified