Perl idioms
In this course, we’ve made an effort to write Perl that is easy to understand by not taking common Perl shortcuts. But as you gain more experience, and as you read more scripts, you’ll come across commonly used (idiomatic) pieces of code that are terse or even obfuscated. Perl, like C, is a programming language that provides so many shortcuts that it’s easy to write code that no one else can understand.
In the first part of today’s class, we’re going to talk about some common Perl idioms that you’ll need to be aware of.
The $_ variable
The special variable $_ is used in Perl as an all-purpose default
argument in many situations.
$_ and reading lines from a file
Here is the way we have been reading lines from a file, where
$ARGV[ 0 ] is the name of the file as given in the first argument
from the command line:
if ( open( IN, "<$ARGV[ 0 ]" ) )
{
my( $dataLine );
while ( defined( $dataLine = <IN> ) )
{
print( STDOUT $dataLine );
}
}
After we open the file, we explicitly create a variable, $dataLine,
into which we place each line obtained from the file by the line input (angle,
diamond) operator <>. We have to check that
$dataLine contains a defined value each time we attempt to read from
the file, since the value will be undefined when we reach the end of the file.
Here is another way to write the same code. It is easier to write but harder to understand:
if ( open( IN, "<$ARGV[ 0 ]" ) )
{
while ( <IN> )
{
print( STDOUT $_ );
}
}
The difference here is that we didn’t create a variable for holding each line
as it is read from the file. When the line input operator <> is the only
thing in the condition of a while loop, and you don’t assign its result
to a variable, Perl assigns the result to the $_ variable. (This happens
only in a loop condition; elsewhere, a line that is read but not assigned is
simply discarded.) In addition, Perl checks to see if $_ contains a defined
value; if it does, then Perl executes the code inside the curly braces of the
while loop.
You’ll probably see this a lot in other people’s code because programmers want to save time entering code, so they tend to use the shortest code, even if it’s a bit obscure. (For more information, see Programming Perl, 3rd ed., pp. 80-81.)
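The following is a minimal sketch of why the implicit defined() test matters. It assumes an in-memory file (a filehandle opened on the hypothetical string $text with the three-argument form of open(), which this course has not used) so that it runs without a data file. The last line of the data is "0" with no trailing newline, which is a false value but a defined one.

```perl
use strict;
use warnings;

# Hypothetical data standing in for a file; the last line is "0" with no newline.
my $text = "first\nsecond\n0";
my @seen;

# Opening a filehandle on a reference to a string reads from that string.
if ( open( IN, '<', \$text ) )
{
    while ( <IN> )
    {
        # Perl treats this loop as while ( defined( $_ = <IN> ) ),
        # so the final "0" is processed even though "0" is a false value.
        push( @seen, $_ );
    }
    close( IN );
}

print( scalar( @seen ), " lines read\n" );
```

All three lines are read. If Perl tested only whether $_ was true, the loop would stop before processing the final line.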
$_ as the default argument for the print() function
There’s another simplification we can make to the above code. Where we had:
print( STDOUT $_ );
we can instead use:
print( STDOUT );
This is because when no argument other than the filehandle is given to the
print() function, the function takes as its argument the
$_ variable.
Our code now looks like this:
if ( open( IN, "<$ARGV[ 0 ]" ) )
{
while ( <IN> )
{
print( STDOUT );
}
}
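As a rough check that print() with only a filehandle really writes $_, here is a sketch that copies lines into a string instead of the screen. The names $text, $copy, and OUT are hypothetical, and the in-memory filehandles (opened on references to strings) are an assumption made only so the sketch runs without any files.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\n";   # hypothetical input standing in for a file
my $copy = '';                # everything printed to OUT is collected here

if ( open( IN, '<', \$text ) && open( OUT, '>', \$copy ) )
{
    while ( <IN> )
    {
        print( OUT );         # same as print( OUT $_ );
    }
    close( OUT );
    close( IN );
}

print( STDOUT $copy );
```

After the loop, $copy holds the same two lines as $text.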
$_ as the default argument for the chomp() function
chomp() is the function you use to remove a newline character from the end of
a value. Most commonly, chomp() is used when reading lines of input from the
keyboard or from a file. Here’s how we have used it when reading lines from a file:
while ( defined( $dataLine = <IN> ) )
{
chomp( $dataLine );
}
If we change the while loop so that it uses the $_ variable, we
have to apply chomp() to $_, like this:
while ( <IN> )
{
chomp( $_ );
}
As a convenience (and to save our aching fingers from having to enter too much
code), if we don’t give chomp() an argument, it takes
$_ as its default argument. This allows us to write our loop this way:
while ( <IN> )
{
chomp();
}
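Here is a small sketch of the argument-free chomp() in use. The data in $text and the array @lines are hypothetical, and the in-memory filehandle is an assumption that keeps the sketch self-contained.

```perl
use strict;
use warnings;

my $text = "ACGT\nTTGA\n";   # hypothetical input standing in for a file
my @lines;

if ( open( IN, '<', \$text ) )
{
    while ( <IN> )
    {
        chomp();              # removes the trailing newline from $_
        push( @lines, $_ );
    }
    close( IN );
}

# Joining with "|" makes it easy to see that the newlines are gone.
print( join( '|', @lines ), "\n" );
```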
$_ as the default operand of the m// regular expression matching operator
Here is an example of using a regular expression to scan a FASTA file for the sequence IDs, which follow immediately after a ">" character at the beginning of a line. The way we have written this code is:
while ( defined( $dataLine = <IN> ) )
{
if ( $dataLine =~ m/^>(\S+)/ )
{
print( $1 );
}
}
If we change our while loop so that it uses the $_
variable, we have to match the regular expression against $_,
like this:
while ( <IN> )
{
if ( $_ =~ m/^>(\S+)/ )
{
print( $1 );
}
}
As a convenience, when we omit the operand of the regular expression match operator,
Perl performs the match against $_, like this:
while ( <IN> )
{
if ( m/^>(\S+)/ )
{
print( $1 );
}
}
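To try the shortest form without a FASTA file on disk, the sketch below reads from a string. The two records in $fasta and the array @ids are made up for illustration, and the in-memory filehandle is an assumption that keeps the sketch self-contained. It collects the IDs and prints one per line.

```perl
use strict;
use warnings;

# Hypothetical two-record FASTA text standing in for a file.
my $fasta = ">seq1 first record\nACGT\n>seq2 second record\nTTGA\n";
my @ids;

if ( open( IN, '<', \$fasta ) )
{
    while ( <IN> )
    {
        if ( m/^>(\S+)/ )     # matched against $_ because no operand is given
        {
            push( @ids, $1 );
        }
    }
    close( IN );
}

print( join( "\n", @ids ), "\n" );
```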
$_ in a foreach loop
We can use a foreach loop when we want to examine each item in a list
or array for some purpose, as in this example:
@colors = ( 'red', 'yellow', 'blue' );
foreach $color ( @colors )
{
print( "$color\n" );
}
Each time through the loop, an element from the @colors array variable
is assigned to the $color scalar variable. If we don’t specify the
scalar variable to which each element is to be assigned, then Perl assigns the
element to the $_ variable, letting us simplify our code to:
@colors = ( 'red', 'yellow', 'blue' );
foreach ( @colors )
{
print( "$_\n" );
}
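Because foreach places each element in $_, the other $_ defaults described earlier work inside the loop as well. The sketch below is one way to combine them; the array @matches and the pattern are hypothetical.

```perl
use strict;
use warnings;

my @colors = ( 'red', 'yellow', 'blue' );
my @matches;

foreach ( @colors )
{
    if ( m/^b/ )              # the match is performed against $_
    {
        push( @matches, $_ );
    }
}

print( "@matches\n" );
```

Only the elements that begin with "b" are kept, so this prints blue.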
Obscure uses of $_
For other, more obscure uses of $_, see Programming Perl, 3rd
ed., pp. 658-659.
Using || as a short-circuit
Perl provides the C programming language-style logical &&
(and) and || (or) operators. Here are the truth
tables for these two operators, which we first introduced in
Lesson 4:
The && (and) operator:

| 1st value | 2nd value | Result |
|---|---|---|
| T | T | T |
| T | F | F |
| F | T | F |
| F | F | F |

The || (or) operator:

| 1st value | 2nd value | Result |
|---|---|---|
| T | T | T |
| T | F | T |
| F | T | T |
| F | F | F |
When we look at the truth table for the || operator, we can see that
either one or the other value has to be true, or both values have to be true, for
the result to be true. We can also see that if the first value is true, then we
don’t have to check the second value, because we know that if the first value
is true, then the result is always true. But if the first value is false, then we
have to check the second value to find out what the result will be.
Note: when we work with true and false values, we perform Boolean logic. Variables that can take only true or false values are called Boolean variables. All Perl variables are Boolean in the sense that all Perl variables evaluate to either a true value or a false value.
Here’s an example of how to use the && and || operators:
my( $first );
my( $second );
# In Perl, a variable is false if its value is 0 (zero), the string '0', the
# empty string (''), or undefined. Otherwise, a variable is true.
$first = 1; # a true value
$second = 0; # a false value
if ( $first && $second )
{
# execute this code if both $first and $second are true
}
else
{
# execute this code if either or both of $first and $second are false
# in this example, this code would be executed
}
if ( $first || $second )
{
# execute this code if either or both of $first and $second are true
# in this example, this code would be executed
}
else
{
# execute this code if neither $first nor $second is true
}
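Which values count as true can be surprising, so here is a short sketch that tests a handful of them. The list in @candidates is purely illustrative. Note that the string '0' is false, while strings such as '0.0' and '00' are true even though they look like zero.

```perl
use strict;
use warnings;

# Hypothetical values to test: the first four are false, the last four are true.
my @candidates = ( 0, '', '0', undef, '0.0', '00', ' ', 'false' );
my @verdicts;

foreach ( @candidates )
{
    if ( $_ )
    {
        push( @verdicts, 'true' );
    }
    else
    {
        push( @verdicts, 'false' );
    }
}

print( "@verdicts\n" );
```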
In Perl code, these operators work on expressions, and expressions can be lines of
code that result in a value. Knowing this, we can splice together two lines of code
with the || operator, and Perl handles things like this:
- If the expression on the left side of the || operator evaluates to a true value, then Perl will not evaluate the expression on the right side of the || operator because it knows that the result is already true.
- If the expression on the left side of the || operator evaluates to a false value, then Perl will evaluate the expression on the right side of the operator because it doesn’t yet know what the result will be.
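The two cases above can be observed directly. In this sketch, rightSide() is a hypothetical function that counts how many times it is evaluated; it is not part of Perl.

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical helper that records each time it is evaluated.
sub rightSide
{
    $calls++;
    return 1;
}

my $trueValue  = 1;
my $falseValue = 0;

my $first  = ( $trueValue  || rightSide() );   # left side is true: rightSide() is skipped
my $second = ( $falseValue || rightSide() );   # left side is false: rightSide() is evaluated

print( "rightSide() was called $calls time(s)\n" );
```

The function runs only once, for the expression whose left side is false.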
The result of this is applied in a tricky way to cause Perl to die()
when it can’t open a file. Remember that the open() function
returns a true value when it opens a file successfully. Up to this point,
we have been writing something like this:
if ( open( IN, "<$ARGV[ 0 ]" ) )
{
# Do something with the file...
}
else
{
die( "Can't open file $ARGV[ 0 ]: $!." );
}
But here’s how we can use the || operator as a code short-circuit.
On the left side of the operator, we attempt to open the file as normal.
If the open is successful, then the left side evaluates to a true value,
and the expression on the right side of the || is ignored. But if the
open fails, then the left side evaluates to a false value, and Perl must
evaluate the expression on the right side of the || operator to
determine the result.
So if we put the open() function on the left and a die()
function on the right of the || operator, we get what we want.
If the file is opened successfully, the
open() function returns true, and Perl ignores the
die() on the right. If the file is not opened successfully, the
open() function on the left returns false, and Perl must
evaluate the die() function on the right to determine the result. (And
of course die() causes Perl to die, and Perl never finds out the
result, but we don’t care because we made Perl do what we want it to do.)
The result is a tricky way to open a file and check the result of the open all on one line of code:
open( IN, "<$ARGV[ 0 ]" ) || die( "Can't open $ARGV[ 0 ]: $!.\n\n" );
For more information, see Programming Perl, 3rd ed., pp. 102-103.
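To watch the die() side of the || fire without ending the program, the sketch below wraps the idiom in an eval block, which this course has not covered and which is used here only to catch the error. The path in $fileName is hypothetical and is assumed not to exist on your machine.

```perl
use strict;
use warnings;

# Hypothetical path that is assumed not to exist on this machine.
my $fileName = '/no/such/directory/no-such-file.txt';

# eval catches the die() so this sketch can show the message and keep running;
# in a real script you would normally let die() end the program.
my $opened = eval
{
    open( IN, "<$fileName" ) || die( "Can't open $fileName: $!.\n" );
    close( IN );
    1;
};
my $message = $@;

if ( ! $opened )
{
    print( "Caught: $message" );
}
```

The parentheses around the arguments to open() matter here. Without them, || would bind more tightly to the file name string than to the result of open(), and die() would never be reached.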
unless, the reverse of if
if is used like this:
if ( $value )
{
# execute this code if $value is true
}
else
{
# execute this code if $value is false
}
Oftentimes, you’ll want to reverse this. One way to do this is:
if ( ! $value )
{
# execute this code if ! $value is true, that is, $value is false
}
else
{
# execute this code if ! $value is false, that is, $value is true
}
Perl supplies unless as the reverse of if, and
unless can be used in place of the code immediately above, like this:
unless ( $value )
{
# execute this code if $value is false
}
else
{
# execute this code if $value is true
}
When I think about this too hard, my brain starts to hurt. But I use
unless all the time in my code.
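A typical place to use unless is a check for something that is missing. In this sketch, @arguments is a hypothetical stand-in for @ARGV, left empty to mimic a script that was run with no arguments.

```perl
use strict;
use warnings;

# Hypothetical argument list; the empty list mimics a script run with no arguments.
my @arguments = ();
my $message;

unless ( @arguments )
{
    # An empty array is false, so this code is executed
    $message = "No file name was given.";
}
else
{
    $message = "Reading $arguments[ 0 ].";
}

print( "$message\n" );
```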
Obfuscated Perl
Obfuscation is the act of making something obscure or difficult to understand. A famous phrase from the 1970s was "eschew obfuscation", a difficult-to-understand phrase that means "avoid making things difficult to understand".
Above, we showed a simplified piece of code that reads a file line by line and prints the lines to the standard output:
if ( open( IN, "<$ARGV[ 0 ]" ) )
{
while ( <IN> )
{
print( STDOUT );
}
}
Since the default filehandle for the print() function is
STDOUT, we can simplify the line to:
print();
And since the parentheses after a function name are optional in Perl, we can shorten this line even further to:
print;
Here’s the newly shortened piece of code:
if ( open( IN, "<$ARGV[ 0 ]" ) )
{
while ( <IN> )
{
print;
}
}
Did you know that the final line in a block, where a block is lines of
code surrounded by curly braces ("{}"), doesn’t require
a semicolon after it? (Programming Perl, 3rd ed., p. 111.) Let’s
remove the semicolon after print, giving:
if ( open( IN, "<$ARGV[ 0 ]" ) )
{
while ( <IN> )
{
print
}
}
This can get even shorter. If you use the line input operator <>
without putting a filehandle inside, Perl will automagically open the file
indicated by the first argument from the command line, then read lines from that
file. Furthermore, once it’s done reading from that file, it will open the next
file indicated by the second argument from the command line, and so on. (If no
arguments are provided, then Perl reads from standard input, STDIN, which is
normally the keyboard.) And if a file can’t be opened, Perl prints a warning and
goes on to the next one. (See Programming Perl, 3rd ed., pp. 82-83 for more
information.)
This means we can shorten our code even more. Think of all the time we’ve wasted entering all that redundant code. Now we can have:
while ( <> )
{
print
}
Now we’re down to only four lines of code. But Perl doesn’t care if we make our code easy to read, so we can reduce the four lines to a single line, like this:
while ( <> ) { print }
Furthermore, Perl doesn’t care if the white space is in the code, so let’s take that out, too, like this:
while(<>){print}
When at least one file name is given on the command line, this one-liner is roughly equivalent to:
my( $fileName );
foreach $fileName ( @ARGV )
{
if ( open( IN, "<$fileName" ) )
{
my( $dataLine );
while ( defined( $dataLine = <IN> ) )
{
print( STDOUT $dataLine );
}
close( IN );
}
}
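The behavior of the empty <> can be tried without typing file names on the command line. The sketch below is only an illustration: it uses the standard File::Temp module to create two hypothetical temporary files, then fills in @ARGV by hand as if their names had been given as arguments.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Create two temporary files so the sketch needs no command-line arguments.
my( $firstHandle, $firstName ) = tempfile( UNLINK => 1 );
print( $firstHandle "line from file one\n" );
close( $firstHandle );

my( $secondHandle, $secondName ) = tempfile( UNLINK => 1 );
print( $secondHandle "line from file two\n" );
close( $secondHandle );

# Pretend the two names were given on the command line.
@ARGV = ( $firstName, $secondName );

my $output = '';
while ( <> )
{
    $output .= $_;            # lines arrive from the first file, then the second
}

print( $output );
```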
This sort of simplification can be a marvelous feature that saves the programmer coding time. Unfortunately, code of this type is impossible to understand unless you have a deep understanding of the programming language.
So when should you use the short form and when should you use the long form? If you’re writing a script that you’re going to use once and then throw away, use the short form. If you’re writing a script that’s going to grow complicated or that you’re going to give to someone else, then use the long form.