Using Script Languages, Part 4 (Perl, Part 1)
From "VS & Beyond", Access to Wang, April 1998
This month we cover the Perl language in our continuing series on script languages. Its name is usually expanded as Practical Extraction and Report Language (or, alternatively, Pathologically Eclectic Rubbish Lister), but it is also known as the "Swiss Army chainsaw of programming" because it is very effective but less than graceful in appearance.
Unlike the Web-specific script languages covered so far (VBScript and JavaScript), Perl is a general-purpose utility language. It has become the development environment of choice in several areas of information processing:
System administration, replacing shell scripts, AWK, sed, grep, and the like
Application development, as an alternative to C or C++ or other compiled development languages
Interactive Web applications, where it can retrieve data from databases and generate Web pages on the fly using the CGI (Common Gateway Interface) approach
Its strong capabilities as a reporting language are reminiscent of RPGII or the COBOL report generators, while its ability to retrieve and process text makes it an ideal replacement for larger, more cumbersome application programming languages. And if that weren’t enough, Perl is available free of charge for a wide range of systems.
In its native form, Perl works like a shell script language: scripts are developed using a text editor and run within an interpreter, producing on-screen output or, more often, diverting output to a text file or to the Standard Output buffer. It assumes use of a command prompt for its operation, but also works quietly in the background of scheduled batch jobs and interactive Web applications.
One of the main reasons Perl has become so popular is that it is available for most operating systems in use today. Perl scripts can be moved between any of these environments to take advantage of the large reserves of ready-made Perl applications, most available without charge. Thus, it is possible to take scripts developed on a PC and move them to a Unix, Macintosh or NT system with little or no change.
Perl takes much of its structure from the C language, which makes it somewhat difficult for those of us without that programming experience. Statements end with a semicolon, conditions are enclosed within parentheses, and sections are delimited by curly braces ("{" and "}"). By default, output is directed to Standard Output (normally the screen) unless redirected. Input sources include files, command-line arguments, or data streams ("pipes," or Standard Input).
Perl offers a number of ways to use its power. As a script language, it lets you develop and run simple applications quickly to produce results. Since it has network capabilities that include sockets, Perl applications can perform data transfers and other sophisticated communications tasks. In CGI applications, Perl directs streams of formatted output to the Standard Output device, feeding Web browsers with custom-generated content.
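The syntax rules above can be seen in a small sketch. The subroutine name and the sample data are made up for illustration, and the input is read from an in-memory string so the script runs anywhere; it is not taken from any particular Perl application.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the non-blank lines available on a filehandle.
sub count_nonblank {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {      # the condition sits inside parentheses
        if ($line =~ /\S/) {        # curly braces delimit each section
            $count++;               # every statement ends with a semicolon
        }
    }
    return $count;
}

# Sample data held in a string; passing \*STDIN to the helper instead
# would read from a pipe or a redirected file.
my $sample = "first\n\nsecond\nthird\n";
open(my $in, '<', \$sample) or die "Cannot open sample data: $!";
my $nonblank = count_nonblank($in);
close($in);

print "Non-blank lines: $nonblank\n";   # sent to Standard Output by default
```

Running the script with its output redirected (for example, to a file) sends the same line there instead of to the screen.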
Perl version 5 includes methods for creating reusable program objects with characteristics similar to those of C++ and other object-oriented languages. This means that it becomes possible to make effective use of library functions developed by others in your own work. Unlike other object-oriented languages, however, it is still possible to ignore these advanced features if you wish and use the language for simple scripting and applications work.
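As a rough illustration of the Perl 5 object features, the sketch below defines a tiny class and uses it. The Counter class and its method names are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical class: a simple named counter.
package Counter;

sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name}, count => 0 };
    return bless $self, $class;     # bless associates the hash with the class
}

sub increment {
    my ($self) = @_;
    $self->{count}++;
    return $self->{count};
}

sub report {
    my ($self) = @_;
    return "$self->{name}: $self->{count}";
}

package main;

my $records = Counter->new(name => 'records');
$records->increment for 1 .. 3;     # call the method three times
my $summary = $records->report;
print "$summary\n";
```

A script that never needs objects can simply leave out the package and bless lines and work with plain variables and subroutines.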
Like most other script languages, Perl does not use strong data typing; variables may contain numbers or text, and conversions between these formats are automatic. Variables may be used without defining them, though initializing the contents of variables is recommended; a common interpreter option is the -w ("warn") switch, which will nag you about any variables that are used before they have been given a value.
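The automatic conversions can be sketched as follows. The variable names are invented for the example; the "use warnings" line has much the same effect inside the script as the -w switch on the command line.

```perl
use strict;
use warnings;

my $quantity = "12";                    # text that happens to look like a number
my $total    = $quantity + 8;           # used as a number: 12 + 8
my $label    = $total . " records";     # used as text: "20" joined to " records"

print "$label\n";
```

No conversion functions are called; the operator (+ for arithmetic, . for joining text) decides how each value is treated.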
Most variables are of the scalar type: a single value within a variable. Scalar variables are named with a dollar sign to indicate their status as variables. For example, $var refers to a simple scalar variable. The same name can also be used for an array (@var, whose elements are referenced as $var[25]) or for an associative array (%var, whose elements are referenced as $var{key}); Perl treats these as separate variables. Arrays are similar to those found in other programming languages and may be one- or multi-dimensional. Associative arrays are one-dimensional arrays whose elements are indexed by text keys rather than by position, so a value can be stored and retrieved quickly by its key. Though cryptic, associative arrays provide powerful search and indexing capabilities.
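A short sketch of the three variable types follows. The names and values are placeholders for illustration; the multi-dimensional example assumes Perl 5, where such structures are built from references to arrays.

```perl
use strict;
use warnings;

my $var = "a single value";             # scalar
my @var = (10, 20, 30);                 # array: a separate variable from $var
my %var = (apple => 3, pear => 7);      # associative array, indexed by key

my $second = $var[1];                   # second element of @var
my $pears  = $var{pear};                # value stored under the key "pear"

# A two-dimensional array built from references to arrays.
my @grid = ([1, 2, 3], [4, 5, 6]);
my $cell = $grid[1][2];                 # second row, third column

# exists tests for a key without creating it.
my $has_plum = exists $var{plum} ? "yes" : "no";

print "$second $pears $cell $has_plum\n";
```

Note that a single element of an array or associative array is itself a scalar, which is why it is written with a dollar sign.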
Besides its abilities in substring control, Perl sports a number of text conversion and string manipulation capabilities. String information can be edited, indexed, searched, and modified in place. There are also a number of conversion functions, including base conversions (hexadecimal, octal) and time and date information.
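A few of these string and conversion functions are sketched below. All of the functions shown are Perl built-ins; the sample text and variable names are only illustrative, and gmtime is used rather than localtime so the result does not depend on the time zone.

```perl
use strict;
use warnings;

my $text = "Practical Extraction";

my $pos  = index($text, "Extraction");  # offset where the substring starts
my $word = substr($text, 0, 9);         # first nine characters

substr($text, 0, 9) = "Pragmatic";      # modify the string in place
(my $shout = $text) =~ tr/a-z/A-Z/;     # copy the text, then upper-case the copy

my $from_hex = hex("ff");               # hexadecimal text to a number
my $from_oct = oct("755");              # octal text to a number
my $to_hex   = sprintf("%x", 255);      # number back to hexadecimal text

# Break a time value (seconds since 1 January 1970) into date fields.
my ($mday, $mon, $year) = (gmtime(0))[3, 4, 5];
my $date = sprintf("%04d-%02d-%02d", $year + 1900, $mon + 1, $mday);

print "$pos $word | $text | $shout\n";
print "$from_hex $from_oct $to_hex $date\n";
```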
Perl provides a handful of ways to select and loop through information, including internal subroutines and the if ... elsif, unless, while, foreach, and until logic forms. Even the notorious goto form is supported for compatibility with older scripts.
Conditions tested can include complex statements, pattern matching (using regular expressions), and system operators (file or library existence, etc.).
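The logic forms and condition types described above might be combined as in the following sketch. The log lines, the tally names, and the file name are invented for the example.

```perl
use strict;
use warnings;

my @lines = ("ERROR disk full", "ok", "WARN low memory", "ERROR timeout");
my %tally = (error => 0, warn => 0, other => 0);

foreach my $line (@lines) {
    if ($line =~ /^ERROR\b/) {          # pattern match with a regular expression
        $tally{error}++;
    } elsif ($line =~ /^WARN\b/) {
        $tally{warn}++;
    } else {
        $tally{other}++;
    }
}

# until repeats its block for as long as the condition is false.
my $n = 1;
until ($n >= 100) {
    $n *= 2;
}

# File tests: -d is true for a directory, -e for anything that exists.
my $cwd_is_dir = (-d ".") ? 1 : 0;
unless (-e "no-such-file-for-this-sketch.txt") {
    print "The sample file is not present\n";
}

print "errors=$tally{error} warnings=$tally{warn} other=$tally{other} n=$n\n";
```

The unless form is simply an if with the test reversed, which often reads more naturally for "do this only when something is missing" checks.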
In its screen interaction Perl mostly resembles shell scripts: output is to Standard Output, which is normally the screen. For a script language, this primitive screen control is sufficient for most purposes. Versions of the Perl interpreter have been developed for the Macintosh and Windows operating systems that mimic the command-line orientation of the language, but there are no other regular ways of controlling the computer screen.
Perl has a large number of file and library test and control facilities, making it an effective choice for many system administration tasks. It can also perform time and date conversions, send and receive messages across networks, and interact with other processes using threads.
As rich as the Perl language is, its development environment remains poor by comparison. Those who have experienced a good integrated editor and debugger such as Wang’s will be disappointed to find they are reduced to simple text editors and a command-line debugger. Like most other script languages, Perl scripts are usually debugged by directing intermediate output to the screen, working with a section at a time. Some of this need is reduced by the availability of strong code libraries and sample applications.
The sample script (Figure 1, below) shows off some of the ways Perl can be used for utility work on any system. The script takes a large file and splits it into smaller files that contain no more than a specified number of records. The script creates as many of these smaller files as necessary.
Some comments on the script:
Literals and other variables are defined at the top of the script. Note the use of comments (any text to the right of a pound sign, #) placed to the right of the code they describe.
Unlike some utility languages (notably Wang Procedure language), Perl provides a regular means of opening and using data files. Note that the name of the DATAOUT file is constructed from values in variables combined with literal text. Besides normal files, Perl can read from and write to command strings and other processes.
Like RPGII and report generators, Perl normally expects to read an input file from start to finish. The while statement in this script performs this function.
The select DATAOUT statement directs output to the file; normally, the print statement below it would send its product to Standard Output.
As in C, it is possible to increment (or decrement) a counter by one without using an equals sign. The statement $reccounter++ raises the value of the record counter by one.
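The select and increment behavior noted in these comments can be tried in isolation with a sketch like the one below. It writes to an in-memory string rather than a real file so it leaves nothing behind; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $buffer = '';
open(my $out, '>', \$buffer) or die "Cannot open in-memory file: $!";

my $previous = select($out);    # print now goes to $out by default
print "captured line\n";
select($previous);              # restore the earlier default, normally STDOUT
close($out);

my $counter = 0;
$counter++;                     # up by one
$counter += 5;                  # up by five
$counter--;                     # down by one

print "Buffer holds: $buffer";
print "Counter is $counter\n";
```

Because select returns the handle that was previously the default, saving and restoring it keeps later print statements from going to the wrong place.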
Perl can work in a number of ways according to the requirements of the task. As a utility language, it can help you manage your system better and analyze data; as an application language, it can replace clumsier compiled languages. While its execution speed is not the fastest available, the ease of development and maintenance makes up for much of this.
Next month: more about advanced uses of Perl.
Programming Perl
Larry Wall and Randal L. Schwartz
O’Reilly & Associates
ISBN: 0-937145-64-1
The "camel book" (the cover features a picture of a camel) is an indispensable part of the Perl programmer’s shelf. Co-author Larry Wall developed Perl.
Learning Perl
Randal L. Schwartz
O’Reilly & Associates
ISBN: 1-56592-042-2
The "llama book" (the cover features a picture of a llama) presents some of the features of the language that are not apparent from the reference approach taken in Programming Perl.
The Perl Institute
http://www.perl.org
A comprehensive site of materials for Perl programming, including recent releases of the Perl interpreter and code samples.
The Perl Programming Pages
http://www.perl.com
Hosted by O’Reilly & Associates, this site offers a vast amount of reference material, sample programs, and tutorials.
Figure 1: Sample Perl Script
#!/usr/lib/perl
# split.pl - splits large files into smaller files
#            with the same record count

$fprefix     = "data";   # Prefix for output file names
$recsperfile = 5000;     # Records per output file
$filecounter = 1;        # Output file increment
$reccounter  = 0;        # Records written to the current output file

open(DATAFILE, "<data.txt") or die "Cannot open data.txt: $!";
open(DATAOUT, ">".$fprefix.$filecounter.".txt") or die "Cannot open output file: $!";
select DATAOUT;

while(<DATAFILE>) {
    if($reccounter >= $recsperfile) {   # Current file is full; start the next one
        $reccounter = 0;
        close DATAOUT;
        $filecounter++;
        open(DATAOUT, ">".$fprefix.$filecounter.".txt") or die "Cannot open output file: $!";
        select DATAOUT;
    }
    print;               # Writes the current record to the selected file
    $reccounter++;
}

close DATAOUT;
close DATAFILE;
exit 0;
Copyright © 1998 Dennis S. Barnes
Reprints of this article are permitted without notification
if the source of the information is clearly identified