The Shell Game

Unix shell script languages for the procedure programmer

From "Migration",  Access to Wang, September 1994

Every programming environment has some sort of batch language to act as the "glue" that ties programs together to form applications. Such languages are accessible, offering immediate execution and feedback and a restricted syntax that is easier to learn than complex development languages. At the very least, batch languages must be present for those non-interactive jobs most applications still require, like nightly file reorganization and posting programs. On the VS, these purposes are well served by Wang's Procedure Language.

There seem to be two opposing views on the use of Procedure Language in VS development. Many programmers never wrote applications as procedures, preferring the control and execution speed of compiled languages and using Procedure only as a job control language. Others coded entire applications within enormous Procedures. Those who tended to use Procedure as an application language - and I am one of them - think of Procedure as one of the reasons for the success of the VS with application developers.

As I became more involved with Unix, I was concerned that I would lose the benefits of Procedure - the integrated tools, straightforward syntax, and practical applications. In many ways, these fears proved true. Shell script languages are more powerful, but they lack the clarity and integration of Procedure, making development and maintenance more difficult. On the other hand, shell scripts extend the capabilities of the command line environments they support, allowing most commands to be entered directly at the prompt for development or testing.

This month I'll compare and contrast the operating environment of shell scripts and VS Procedure Language and describe a few production scenarios. Development techniques, more language elements, and further exploration of script usage will be covered in future columns.

The shell script environment

In the VS world, there are two types of files that can be directly executed: those with a file type of Program, and consecutive 80-byte text files whose first non-comment line contains "PROC" or "PROCEDURE". MS-DOS executables are identified by the file name extension: .COM, .EXE, or .BAT. In contrast, Unix does not know or care about file types and will attempt to run any file set up with the access rights to be executed - regardless of its name.

This is not to say there are no file types recognized by Unix: unlike the VS, the format of a file is determined at run time, not stored by the file system. As on other operating systems, Unix language compilers produce machine-specific object code whose instructions are used directly by the operating system. If the file is not in an object format, it is assumed to be a script and an interpreter is invoked, just as the VS operating system attempts to use the Procedure Interpreter if it finds PROCEDURE within a text file.
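As a minimal sketch of the two paragraphs above, the following Bourne shell fragment creates a file whose name says nothing about its contents, shows that it is refused until execute permission is granted, and then runs it. The file name nightly.report, the scratch directory, and the message text are assumptions made for illustration; the file deliberately contains no interpreter line, so it relies on the default shell being invoked.

```shell
#!/bin/sh
# Hypothetical scratch directory; $$ is the process ID of this shell.
workdir=${TMPDIR:-/tmp}/shellgame.$$
mkdir "$workdir"

# Create a one-line script under an arbitrary name with no extension
# that Unix would treat as special.
echo 'echo "posting complete"' > "$workdir/nightly.report"

# A newly created file normally has no execute permission, so the
# attempt to run it is refused.
if "$workdir/nightly.report" 2>/dev/null
then
    before="ran"
else
    before="refused"
fi

# Grant execute permission to the owner, then run the file by its path.
chmod u+x "$workdir/nightly.report"
after=`"$workdir/nightly.report"`

echo "before chmod: $before"
echo "after chmod:  $after"

# Remove the scratch directory.
rm -rf "$workdir"
```

Only the permission change separates the two attempts; the name of the file plays no part.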

Unlike the VS or MS-DOS environments, there is a wide choice of script languages in the Unix world, each with its own interpreter and syntax. Each flavor of Unix has a preferred shell script language which acts as the default unless there is evidence that another language should be used. In most Unix systems, this default interpreter is the Bourne shell, and this will be the shell used for all examples.

If you wish to use another interpreter instead of this default, your choice must be specified in a "comment" line on the first line of the script file, just as Procedures must contain a statement beginning with the word PROCEDURE that otherwise functions as a comment. A typical script language specification looks like this:


#!/bin/sh

The first character (pound sign) is the equivalent of the asterisk in Procedure language, denoting a comment line. The exclamation point following this character indicates that this is not an ordinary comment but should be interpreted as an indication of the shell environment to be used. Finally, /bin/sh is the full path of the Bourne shell interpreter, an object file. Most Unix systems expect scripts to contain a line such as this unless the system default shell is to be used, and most expect it to be the first line in the file. It is good practice to explicitly state the shell script language in every script file to ensure your commands get processed correctly.
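To show the line in context, here is a small sketch of a complete script file. The variable name and message text are hypothetical; only the first line has special meaning, and every other line beginning with a pound sign is an ordinary comment.

```shell
#!/bin/sh
# The line above names the interpreter and must be the first line.
# This line and the one before it are ordinary comments and are ignored.

# Hypothetical message used only for this illustration.
greeting="Run by the Bourne shell"
echo "$greeting"
```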

Once the file is opened and its contents examined, the interpreter is run and the file's contents passed to it. Like Procedure, shell script lines are interpreted directly as each line is read, but script commands can contain elements - metacharacters - that are expanded into more specific information prior to execution. An example of this is the asterisk, which indicates a pattern to match, like the "wild card" characters of MS-DOS and some VS utilities. In contrast, the Procedure Interpreter accepts each line verbatim and executes it.
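The expansion step can be seen in a short sketch. The file names below are invented for the example; the point is that the shell replaces the unquoted pattern with the matching names before echo ever runs, while quoting the asterisk suppresses the expansion.

```shell
#!/bin/sh
# Hypothetical scratch directory holding three sample files.
workdir=${TMPDIR:-/tmp}/glob.$$
mkdir "$workdir"
: > "$workdir/ledger.dat"
: > "$workdir/payroll.dat"
: > "$workdir/notes.txt"

# Unquoted: the shell expands *.dat into the matching file names,
# so echo receives two arguments and never sees the asterisk.
matched=`cd "$workdir" && echo *.dat`

# Quoted: the asterisk is passed to echo verbatim.
quoted=`cd "$workdir" && echo '*.dat'`

echo "expanded: $matched"
echo "verbatim: $quoted"

rm -rf "$workdir"
```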

Shell script interpreters and Procedures share one of the drawbacks of all interpreted languages: since each line is interpreted as it is read, a file may contain syntax errors in sections of the code that have not yet been executed. To combat this problem, the Procedure Interpreter can be set to review a file for syntax purposes without actually running it, returning information about any syntax errors found. The Bourne shell offers a rougher equivalent in its -n option, which reads a script without executing its commands; otherwise, a shell simply reports the line number of a problem area as it returns to the command prompt.
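A partial counterpart to the Procedure syntax review is available in Bourne-style shells through the -n option, which reads commands without executing them. The sketch below assumes two small hypothetical script files, one with an unclosed if statement; the wording of the error message differs from shell to shell, so only the return code is examined.

```shell
#!/bin/sh
# Hypothetical scratch directory for the two sample scripts.
workdir=${TMPDIR:-/tmp}/syntax.$$
mkdir "$workdir"

# A script with a deliberate error: the "if" is never closed with "fi".
cat > "$workdir/broken.sh" <<'EOF'
echo "starting"
if [ -f /etc/passwd ]
then
    echo "found"
EOF

# The same script with the missing "fi" supplied.
cat > "$workdir/good.sh" <<'EOF'
echo "starting"
if [ -f /etc/passwd ]
then
    echo "found"
fi
EOF

# -n parses each file but runs nothing, so neither produces any output.
broken_output=`sh -n "$workdir/broken.sh" 2>/dev/null`
broken_status=$?
good_output=`sh -n "$workdir/good.sh" 2>/dev/null`
good_status=$?

echo "broken.sh return code: $broken_status"
echo "good.sh return code:   $good_status"

rm -rf "$workdir"
```

This check catches only errors of syntax; a misspelled command name, for instance, is not detected until the line is actually run.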

Language elements

Shell script languages share most of the capabilities of other languages, including commands, variables, conditional branching, and comments. One large difference is that scripts can directly perform file I/O using the capabilities of Standard I/O (Standard Input and Standard Output). As their names imply, Standard Input accepts information from command statements or other programs and Standard Output passes information to other programs or to the command line. Programs that use Standard I/O can be fitted together using pipes, where the output of one process becomes the input of the next. (Many MS-DOS programs also allow the use of Standard I/O; the DOS shell, COMMAND.COM, shares many characteristics with Unix shell interpreters.)
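A brief sketch of redirection and pipes follows, assuming a hypothetical file of order lines. Standard Output is redirected into a file with >, Standard Input is taken from that file with <, and each | passes the output of one utility to the next.

```shell
#!/bin/sh
# Hypothetical scratch directory and data file.
workdir=${TMPDIR:-/tmp}/stdio.$$
mkdir "$workdir"

# Redirect the Standard Output of a group of commands into a file.
{
    echo pears
    echo apples
    echo pears
    echo figs
} > "$workdir/orders.txt"

# Read the file on Standard Input, then pipe through three utilities:
# sort orders the lines, uniq drops adjacent duplicates, wc -l counts
# what is left. tr removes the padding some versions of wc add.
distinct=`sort < "$workdir/orders.txt" | uniq | wc -l | tr -d ' '`

echo "distinct items ordered: $distinct"

rm -rf "$workdir"
```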

Scripts typically interact with programs through arguments (parameters) passed to the program and through Standard I/O. On the VS, Procedures use GETPARMs, ACCEPT and SWITCH statements, and passed parameters to interact with programs and other Procedures. Like Procedure, scripts can also capture the return codes of programs and change their behavior accordingly, but there is no effective replacement for the ERROR EXIT and CANCEL EXIT clauses of the Procedure RUN statement, so you must find other means of trapping and handling these sorts of production errors. One of these means is the third form of Standard I/O - Standard Error - which is an output stream used only for error conditions.
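One way to combine return codes with Standard Error is sketched below. The "posting program" is a stand-in written inline with sh -c, and its message and return code of 3 are invented for the example; a real job would run an actual program in its place.

```shell
#!/bin/sh
# Hypothetical file used to capture anything written to Standard Error.
errlog=${TMPDIR:-/tmp}/posting.$$.err

# Stand-in for a posting program: it writes a message to Standard Error
# (file descriptor 2) and exits with a return code of 3.
sh -c 'echo "ledger file is locked" >&2; exit 3' 2> "$errlog"

# $? holds the return code of the command that just finished.
status=$?

if [ "$status" -ne 0 ]
then
    message=`cat "$errlog"`
    echo "posting failed with code $status: $message"
else
    message=""
    echo "posting succeeded"
fi

rm -f "$errlog"
```

The return code must be saved immediately, because every later command replaces the value of $?.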

Like Procedure, shell scripts can use variables to hold the value of unknown or run-specific information. Like BASIC, most script languages do not require that these variables be declared before they are used. Bourne shell variable names must use alphanumeric characters or the underscore, cannot begin with a digit, and cannot contain spaces. Variables and their values exist only within the script under execution unless they are exported, which makes them available to any scripts or programs running beneath that script - like the GLOBAL declaration clause under Procedure.
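The effect of export can be shown with a short sketch. The variable names RUN_LIBRARY and RUN_VOLUME and their values are hypothetical; the child process is simply another copy of the shell started with sh -c.

```shell
#!/bin/sh
# No declaration is needed; assignment creates the variable.
RUN_LIBRARY=payroll      # known only to this script
RUN_VOLUME=SYS001
export RUN_VOLUME        # also passed to scripts and programs run from here

# The single quotes keep this shell from substituting the values itself,
# so the child shell reports what it actually received.
child_sees=`sh -c 'echo "library=[$RUN_LIBRARY] volume=[$RUN_VOLUME]"'`

echo "$child_sees"
```

The child sees an empty value for the variable that was not exported.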

Branching within Procedures consists of two statements: CALL and GOTO. The CALL statement works like a subroutine, while GOTO branches directly to a label anywhere in the procedure. Neither of these forms exists in the Bourne shell. Instead, there are several forms of conditional testing and looping, including variations of if and else, for, while, and until, the break command to stop looping, and case statements to interpret many types of input and adjust the script's execution accordingly. Other script languages offer other features, including internal procedures that act like FUNCTIONs in BASIC, effectively extending the language by creating new verbs.
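These constructs are easier to judge in use than in description. The sketch below walks a hypothetical list of input items with for, sorts them with case, leaves the loop early with break, counts with while, and reports with if and else. All values are invented for the example.

```shell
#!/bin/sh
total=0
skipped=""

# for: visit each word in the list in turn.
for item in 10 20 oops 30 stop 40
do
    # case: choose an action by matching the value against patterns.
    case "$item" in
        stop)   break ;;                          # leave the loop early
        [0-9]*) total=`expr $total + $item` ;;    # starts with a digit
        *)      skipped="$skipped $item" ;;       # anything else
    esac
done

# while: repeat as long as the test succeeds.
count=1
sequence=""
while [ "$count" -le 3 ]
do
    sequence="$sequence$count"
    count=`expr $count + 1`
done

# if / else: branch on the result of a test.
if [ -z "$skipped" ]
then
    verdict="all items numeric"
else
    verdict="skipped:$skipped"
fi

echo "total=$total sequence=$sequence $verdict"
```

Because the loop stops at the word stop, the final item in the list is never processed.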

Script languages also contain elements that extract system information and control the operating environment, just as the EXTRACT and SET verbs do in Procedure. These include control of the appearance of the prompt, the contents of the path, workstation definition, and substitution of commands through the alias function. Unix systems use a file named .profile to set up each user's working environment, just as most VS shops use Procedures to set usage constants and start the first programs for each user.
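A small .profile might look like the following sketch. Every value shown (the extra directory, the prompt text, the terminal type) is a hypothetical example rather than a recommended setting.

```shell
# Sample .profile fragment, read by the Bourne shell at login.

# Add a personal program directory to the end of the search path.
PATH=$PATH:$HOME/bin

# Change the primary prompt from the default "$ ".
PS1="payroll> "

# Tell full-screen programs what kind of workstation is in use.
TERM=vt100

# Export the settings so programs started from this shell inherit them.
export PATH PS1 TERM
```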

That's almost all there is to the language elements of the Bourne shell. Its real power and flexibility come from application of the "building blocks" of the Unix prompt - variable substitution, redirection, Standard I/O - along with the myriad of small, concise utilities that can be combined to form powerful applications. And combining these small, generic elements to form applications is one of the cornerstones of Unix application philosophy.

As you can see, mastery of the shell is one of the most important steps to effective use of Unix. By automating control of the shell, scripts unlock the real power of Unix.




Copyright © 1994 Dennis S. Barnes
Reprints of this article are permitted without notification if the source of the information is clearly identified