Using Script Languages, Part 5 (Perl, Part 2)
From "VS & Beyond", Access to Wang, May 1998
This month continues our series on scripting languages with some additional information on the Perl language. As I indicated previously, there is far more to Perl than scripting; indeed, it could be considered a complete application language unto itself. This time we'll look at how Perl can be used to produce text output for a variety of purposes, including simple reports and CGI (Common Gateway Interface) applications for the World Wide Web.
Perl has powerful ways to format and present data, as you'd expect from a language designed for reporting and data presentation. Though delivery of this information is primarily directed to Standard Output (e.g. the screen), there are some good alternatives to this.
Last month's script sample showed how the print statement dumps the contents of a variable to the screen (Standard Output) unless directed elsewhere. In that case, a file name was constructed and teamed with a select statement to intercept this output and direct it into the contents of a text file rather than to Standard Output. In this script, the print statement stood alone - there was nothing indicating what should be printed. This illustrates use of one of Perl's built-in variables - known as $_ (a dollar sign plus an underscore) - that automatically receives some types of input, including the contents of records read in from the input file loop. (Other built-in variables are available; consult Perl manuals for details.)
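The behavior described above can be sketched as follows. This is only an illustration, not last month's script: the sample records are made up, and an in-memory filehandle stands in for the text file so that the sketch leaves nothing behind on disk.

```perl
use strict;
use warnings;

# Hypothetical sample records; the article's script read them from a file.
my @records = ("first record\n", "second record\n");

# Open an in-memory filehandle so the sketch creates no file.
# A real script would open a named output file here instead.
my $captured = '';
open(my $out, '>', \$captured) or die "Cannot open output: $!";

my $previous = select($out);   # bare print statements now go to $out
for (@records) {               # each record is placed in $_ in turn
    print;                     # no arguments: prints the contents of $_
}
select($previous);             # restore the earlier default (normally STDOUT)
close($out);

print "Captured:\n$captured";
```

Restoring the previously selected filehandle, as the sketch does, keeps later print statements from landing in the output file by accident.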
By default, the print statement adds no formatting of its own: it writes its arguments exactly as given, with no separators between them and no newline at the end. The result may be a presentation that differs from what you had in mind. There are several ways to avoid this outcome:
Place formatting characters directly into the data stream. Example:
print $var1."\t".$var2."\t";
Use the sprintf (formatted print) statement to fill a variable with a formatted string.
Use a format statement to control the placement of text.
Send a number of previously-formatted lines to the output file until an identified end-of-data marker is encountered.
The first example illustrates how simple formatting can be done using string concatenation. In Perl's syntax, the periods between variable names and literal values tie the elements together to form a new string. If you've worked with C or some other script languages, you'll recognize the \t value as a means of specifying a tab character; it's enclosed in double quotes, and the Perl interpreter replaces it with a tab character as the print statement is executed.
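A short sketch of the concatenation approach, with made-up values standing in for the fields of a record. It also shows two related points that are standard Perl behavior rather than anything specific to the article's script: double-quoted strings interpolate variables directly, and single quotes switch off both interpolation and the \t escape.

```perl
use strict;
use warnings;

# Hypothetical values standing in for fields of a record.
my ($var1, $var2) = ("Smith", "Boston");

# The period joins strings; "\t" in double quotes becomes a real tab.
my $joined = $var1 . "\t" . $var2 . "\n";

# Double-quoted strings also interpolate variables, giving the same result.
my $interpolated = "$var1\t$var2\n";

# Single quotes switch off both features: \t stays a backslash and a "t".
my $literal = '$var1\t$var2';

print $joined;
print $interpolated;
print $literal, "\n";
```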
The second alternative - the sprintf function - is borrowed from the C/C++ environment. It's used for specific control of the presentation of individual elements, where the results of the formatting activity are placed within another variable for use in format or print statements. Here is an example of how numeric data items would be formatted to show a date:
sprintf("%02d-%s-%02d %02d:%02d", $mday, $MoY[$mon], $year, $hour, $min);
Sample output: 01-May-98 13:54
In this example, the values of the local time and date would be placed within a format that specifies a two-digit day, a string for the month, and two digits each for the year, hour, and minute.
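For readers who want to try this, here is a self-contained sketch. The helper name format_stamp and the contents of @MoY are assumptions made for illustration; the article does not show how its variables were filled. Note that localtime returns the year as a count of years since 1900, so the sketch reduces it to two digits explicitly.

```perl
use strict;
use warnings;

# Month abbreviations, indexed 0..11 to match localtime's month field.
my @MoY = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: format an epoch time as DD-Mon-YY HH:MM.
sub format_stamp {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year) = localtime($epoch);
    # localtime returns years since 1900, so reduce to two digits.
    return sprintf("%02d-%s-%02d %02d:%02d",
                   $mday, $MoY[$mon], $year % 100, $hour, $min);
}

# The formatted string is stored in a variable for later use.
my $stamp = format_stamp(time);
print "$stamp\n";
```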
If you want more control over the presentation of data, Perl's format statement is a better choice. It is similar in construction and use to the PRINT USING functions in BASIC: a series of formatting characters and a corresponding list of the variables that must be inserted into them, in the order they must appear. format statements are coupled with the write statement (rather than print) and identified by a name. The format lines may be located anywhere within the script, but are normally close to the statements that use them. Though it is not a requirement, it is assumed that most formatted output would have more to present than the single variable ($_) used in last month's sample script; in typical use, this means the record has already been broken into single variables for further handling.
Figure 1 shows a code fragment that uses a format statement as discussed. The format is named in line 1; this name corresponds to the output filehandle opened further down in the script. Line 2 defines a left-aligned field of about thirty characters, while the following line indicates that the contents of $name are to be inserted into that field. Other formatting lines alternate with the names of the variables to be included within them. Note that lines 6 and 7 define a printed line that presents the contents of three variables. The single period in the first column (line 8) closes the format statement.
Following the formatting statements, the script opens the addr.txt file for input, a new file (labels.txt) for output, and starts to read through the contents of the input file. In this example, the input file is a tab-delimited text file with columns of text to be printed on labels. After removing the newline character with a chop command, the record is split at the tab characters and its fields are assigned to variables for further processing. Finally, the write statement (line 14) sends these variables through the associated format, resulting in correctly formatted output.
The format statement has a number of other field characters that control how lines are displayed, including left-aligned and right-aligned text fields, numeric fields, and multi-line fields.
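As a rough sketch of some of these field types, the following fragment combines a left-aligned text field (@<<<), a right-aligned text field (@>>>), and a numeric field with two decimal places (@##.##). The format name REPORT, the variable names, and the sample rows are all made up for illustration, and the output goes to an in-memory filehandle rather than a file.

```perl
use strict;
use warnings;

# Package variables so the format can see them when it is compiled.
our ($item, $qty, $price);

format REPORT =
@<<<<<<<<< @>>> @##.##
$item,     $qty, $price
.

# The filehandle name matches the format name, as in Figure 1.
my $buffer = '';
open(REPORT, '>', \$buffer) or die "Cannot open in-memory handle: $!";

for my $row (["Widget", 12, 3.5], ["Gadget", 3, 19.999]) {
    ($item, $qty, $price) = @$row;
    write REPORT;              # fills the picture line with current values
}
close(REPORT);

print $buffer;
```

The numeric field rounds and pads its value, so the prices line up on the decimal point without any extra work in the script.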
The final method for printing formatted text is a technique borrowed from Unix shell script writers everywhere. When the task requires a large amount of text interspersed with a few values or variables, it becomes inconvenient to create a number of print or write statements to format and print each line. Instead, you can set a print statement going and let it run until it encounters a specific text string to stop it. This is shown in the code fragment in Figure 2.
Here's how this technique works:
A print statement is started in line 1; the string "EOH" is set as the stopping point.
Text lines from 2 through 11 are printed exactly as they appear in the script. Note that some characters that normally have special meaning in Perl (such as the quotation marks and the # sign) are treated as ordinary text here; variables such as $headertext, however, are still replaced by their values.
Printing is halted on line 12 with the occurrence of the string "EOH" on its own line. The semicolon on line 1 has already ended the print statement, so the lone semicolon on line 13 is harmless but not required.
Normal Perl statements resume in line 15. In this case, the contents of a file (opened on the filehandle INPUTFILE) are read and printed until there are no more lines.
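The following sketch shows how this technique treats special characters. The value of $headertext and the mail address are made up. With a plain terminator the text behaves like a double-quoted string, so variables are interpolated; putting the terminator in single quotes turns interpolation off, which is useful when the text itself contains $ or @ characters.

```perl
use strict;
use warnings;

my $headertext = "Sample Page";   # hypothetical title value

# Plain terminator: variables are interpolated,
# but # and " are ordinary text.
my $interpolated = <<EOH;
<title>$headertext</title>
<body bgcolor="#FFFFFF">
EOH

# Single-quoted terminator: nothing is interpolated, so $ and @ are literal.
my $literal = <<'EOH';
Send mail to webmaster@example.com about $headertext
EOH

print $interpolated, $literal;
```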
Alert readers will note the similarity of the text in Figure 2 with the starting text in a Web document. In fact, this example was taken from a Perl script used to generate a Web page using the CGI model. It's nearly impossible to talk about Perl without also discussing CGI: Perl remains the primary engine of CGI on the Web due to its ease of development, universal applicability, and large reserves of sample code.
CGI (Common Gateway Interface) was developed to allow an easy way to create Web content that changes according to external conditions, the user's request, or other factors. CGI applications replace the need for normal ("static") pages by providing programs to create information on the fly. To compare:
For a static page, the Web server accepts the address of an HTML file, opens that file, and sends its contents (text and commands in the HTML markup language) back to the Web browser of the requesting user.
When a CGI application is invoked, one or more parameters are passed to a program specified in the address. The program is run and its results are sent back to the requesting user.
Perl works well for these situations because it can easily send its output back to the server. As the code sample in Figure 2 shows, it is relatively easy to create scripts that generate code based on selections made by the user or other events.
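To make the comparison concrete, here is a minimal sketch of a CGI script that builds a page from a parameter in the address. It is not the script Figure 2 was taken from: the parameter name "name", the helper subroutines, and the page text are all assumptions. It relies on the CGI convention that parameters of a GET request arrive in the QUERY_STRING environment variable, and it escapes the value before placing it in the HTML.

```perl
use strict;
use warnings;

# Escape characters that have special meaning in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Split a query string such as "name=Wang&dept=VS" into a hash.
sub parse_query {
    my ($query) = @_;
    my %param;
    for my $pair (split /&/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                               # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # decode %XX escapes
        }
        $param{$key} = $value;
    }
    return %param;
}

# Build the complete response: header, blank line, then the HTML body.
sub build_page {
    my ($query) = @_;
    my %param = parse_query($query);
    my $name  = escape_html(defined $param{name} ? $param{name} : 'visitor');
    return <<EOH;
Content-type: text/html

<html>
<head><title>Greeting</title></head>
<body><p>Hello, $name!</p></body>
</html>
EOH
}

# Web servers pass GET parameters in the QUERY_STRING environment variable.
print build_page(defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '');
```

Hand parsing is shown only to expose the mechanics; scripts meant for real use would more likely rely on an existing module such as CGI.pm.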
Of course, there are many other aspects to CGI programming - too many for this space and time. If there is interest I will resume this topic some time in the future.
Before you can experiment with the scripts shown here, you will need to install the Perl interpreter on your system. While the actual process for doing this varies according to your operating system, there are two key concepts common to all installations:
The Perl interpreter must be installed on your system in a location that your scripts can find.
Where possible, Perl scripts should be identified by the operating system as executable. This identification can be external (file extensions, file permissions) or internal (embedded location information for the Perl interpreter).
Installation kits can be found at the Web locations listed below. Each has instructions and installation scripts specific to that operating system's requirements. Setting up Perl for a Windows NT system requires the use of an extension (typically ".pl") to identify the type of file and launch the interpreter. Windows 95 systems lack this capability, so scripts must be run by manually invoking the interpreter and passing it the name of the script:
c:\scripts> perl myscript.pl
(This example assumes that the Perl interpreter is located somewhere on the system's path and, thus, can be started by simply typing "perl" at the command prompt.)
After two columns, there is still much to discuss about Perl, including its use of modules and objects and the specifics of Perl programs. Though it is more powerful than most scripting languages, Perl remains light enough to be useful for scripting as well. I definitely recommend experimenting with Perl in your own environment.
Next month we'll look at Java.
Acquiring Perl
http://language.perl.com/info/software.html
Use this page to locate the version of Perl you need - free, of course.
Figure 1: Sample of Formatted Output
 1  format LABELSOUT =
 2  @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
 3  $name
 4  @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
 5  $address
 6  @<<<<<<<<<<<<<<<<<<< @< @<<<<
 7  $city, $state, $zip
 8  .
 9  open(ADDRESSES, "<addr.txt");    # input file
10  open(LABELSOUT, ">labels.txt");  # output file
11  while(<ADDRESSES>) {
12      chop;
13      ($name, $address, $city, $state, $zip) = split("\t");
14      write LABELSOUT;
15  }
Figure 2: Listing Large Amounts of Formatted Text
 1  print <<EOH;
 2  Content-type: text/html
 3
 4  <html>
 5  <head>
 6  <meta http-equiv="Content-Type"
 7  content="text/html; charset=iso-8859-1">
 8  <title>$headertext</title>
 9  </head>
10
11  <body bgcolor="#FFFFFF">
12  EOH
13  ;
14
15  while(<INPUTFILE>) {
16      print;
17  }
Copyright © 1998 Dennis S. Barnes
Reprints of this article are permitted without notification
if the source of the information is clearly identified