File Manipulation

Filehandles

You can write to a file as easily as you can write to the terminal. The first step (almost the only step) is to open the file with the open command. Here's an example:

   open(DOG,">/home/scotty/data/dogs");

DOG is the "filehandle"--the name by which you'll refer to the open file from now on. It's customary to use all caps for filehandles. The other thing inside the parens is the full pathname of the file; it's prefixed with a > so you can write to it. (Careful: opening with > creates the file if it doesn't exist and wipes out whatever was in it if it does. Without the > you could only read from it--we'll talk about that in a second.)

Now to write a line of text to the file, just do it like this:


    print DOG "This line goes into the file and not to the screen.\n";

Pretty easy, huh? I love Perl.

Since failure to successfully open a file can cause your program to go batty, it's a good idea to have the program exit gracefully if it fails to open the file. To do that, use this syntax for the open command:


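   open(DOG,">/home/scotty/data/dogs") || die "Couldn't open DOG: $!\n";

Now it will either open the file, or quit the program with an explanation why. (The special variable $! holds the system's reason for the failure, such as "Permission denied".)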
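
Putting those pieces together, here is a minimal sketch of a complete script that writes two lines to a file. The path /tmp/dogs_demo.txt and the dog names are made up for illustration; point it at any file you're allowed to write to.

```perl
#!/usr/bin/perl

# Hypothetical path for illustration; change it to suit your system.
$file = "/tmp/dogs_demo.txt";

# $! holds the operating system's reason if the open fails.
open(DOG, ">$file") || die "Couldn't open $file for writing: $!\n";

print DOG "Fido\n";
print DOG "Rex\n";

# Closing the filehandle flushes the buffered output to disk.
close(DOG) || die "Couldn't close $file: $!\n";
```

Run it twice and the file still holds just two lines, because > starts the file over each time.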

To read from a file, open it without the >:


   open(DOG,"/home/scotty/data/dogs") || die "Couldn't open DOG.\n";

(If the file doesn't exist, the program will quit.) Once the file is open, there are two common ways to get the information out. You can do it one line at a time, with lines like this:

   $x = <DOG>;

That copies the first line from DOG (or the next line, if you've already taken some lines out) and assigns it to $x. The <DOG> syntax is just like the <> we used earlier to get input from the keyboard; sticking DOG in there just tells Perl to get the input from the open file instead.
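
Here's a small sketch of reading one line at a time. It assumes a scratch file at /tmp/dogs_read_demo.txt (a made-up path), which the script creates first so it can run on its own. Note that each line arrives with its newline still attached, and that you get an undefined value once the file runs out.

```perl
#!/usr/bin/perl

# Hypothetical scratch file; create it first so the example stands alone.
$file = "/tmp/dogs_read_demo.txt";
open(DOG, ">$file") || die "Couldn't open $file for writing: $!\n";
print DOG "Fido\nRex\n";
close(DOG);

open(DOG, $file) || die "Couldn't open $file for reading: $!\n";
$first  = <DOG>;   # "Fido\n" (the newline comes along with the line)
$second = <DOG>;   # "Rex\n"
$third  = <DOG>;   # undefined: nothing left to read
close(DOG);

chomp($first);     # strip the trailing newline
chomp($second);

print "First dog: $first\n";
print "Second dog: $second\n";
print "No third dog.\n" unless defined($third);
```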

You can also do it in a loop. This program prints out all the contents of DOG:


  #!/usr/bin/perl

  open(DOG,"/home/scotty/data/dogs") || die "Couldn't open DOG.\n";

  while(<DOG>) {
       print;
  }

Notice how we didn't specify a variable to store each line from DOG in, and we didn't specify anything for the print command to print? This is a really important concept I should have introduced earlier: Perl features a default variable called $_. Basically, if you don't specify which variable you want to use, or if you use a command like print by itself, Perl assumes you want to use $_.

Some other common commands that can assume you're talking about $_:


  s/dog/cat/g;

That's a valid line all by itself; it means "substitute all occurrences of 'dog' in the variable $_ with 'cat'." Another popular type of construction is

  print if (/dog/);

That means "print $_ if $_ has 'dog' in it."

$_ shows up everywhere in Perl, just to make your life easier. For instance, foreach will store its elements in $_ if you're too lazy to name a variable yourself:

    foreach (@array) {
        print if (/Tonight/);
    }

You can assign from $_ or manipulate it just like any other variable, with commands like

   $_++;
   $x = $_;

You just have to do it before the next time $_ gets overwritten, for example by the next pass through a loop like

   while(<DOG>)

--it's a very temporary storage space.
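
To see $_ at work without needing a file, here's a short sketch that loops over an array of made-up sentences. It saves a copy of $_ in a named variable ($original, my own name for it) before the substitution changes it.

```perl
#!/usr/bin/perl

@lines = ("The dog barked.\n", "The cat slept.\n", "Hot dog stand.\n");

foreach (@lines) {
    next unless (/dog/);   # the match tests $_
    $original = $_;        # save a copy before changing $_
    s/dog/cat/g;           # the substitution acts on $_
    print "was: $original";
    print "now: $_";
}
```

One thing to watch: inside foreach, $_ stands for the actual array element, so the substitution changes what's stored in @lines too.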

When you're done with a file, don't forget to close it with the close command:


   close(DOG);

Filename Globbing

Perl can read all the filenames in a directory (/home/scotty/bin in the following example), apart from those whose names begin with a dot, with this syntax:

    while($x = </home/scotty/bin/*>) {
        ...
    }

One obvious and powerful use of this "filename globbing" is a loop like this:

    while($x = </home/scotty/bin/*>) {
        open(FILE,"$x") || die "Couldn't open $x for reading.\n";
        ...
    }

Thus, the following simple program will print all lines containing the word "dog" (along with the names of the files they came from) in the /home/scotty/bin directory:

    #!/usr/bin/perl

    while($x = </home/scotty/bin/*>) {
        open(FILE,"$x") || die "Couldn't open $x for reading.\n";
        while(<FILE>){
            if(/dog/) {
                print "$x: $_";
            }
        }
        close(FILE);
    }
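
If you'd like to try globbing without touching a real directory, here's a sketch that builds a scratch directory under /tmp (the directory and file names are made up), globs it, and cleans up after itself. It uses glob(), the function form of the angle-bracket syntax, which is handy when the pattern lives in a variable.

```perl
#!/usr/bin/perl

# Hypothetical scratch directory, created here so the example stands alone.
# $$ is the process ID, which keeps the name unique.
$dir = "/tmp/glob_demo_$$";
mkdir($dir, 0755) || die "Couldn't create $dir: $!\n";

foreach $name ("a.txt", "b.txt", ".hidden") {
    open(OUT, ">$dir/$name") || die "Couldn't open $dir/$name: $!\n";
    print OUT "dog\n";
    close(OUT);
}

# glob() returns the matching names as a list; * skips dot files.
@found = glob("$dir/*");

foreach $x (@found) {
    next unless (-f $x);    # skip anything that isn't a plain file
    print "$x\n";
}

# Clean up the scratch files and directory.
unlink(@found, "$dir/.hidden");
rmdir($dir);
```

Only a.txt and b.txt show up in the list; .hidden is skipped because the * pattern doesn't match names that begin with a dot.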

Opendir

Here's a key point of Perl philosophy: Perl tries to never limit you. Arrays can be as big as you want, strings as long as you want, and strings can contain anything. If you try to open a file, and the file doesn't open, the program hums right along; if a line is expecting three variables back from a subroutine, as in

   ($one, $two, $three) = &some_routine;

and it only gets back one, well, it won't care. It'll just leave the other variables undefined and move along.
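
A quick sketch of that list assignment in action. The body of some_routine here is a made-up stand-in that returns a single value:

```perl
#!/usr/bin/perl

# Hypothetical subroutine that returns only one value.
sub some_routine {
    return ("first");
}

($one, $two, $three) = &some_routine;

print "one is $one\n";
print "two is undefined\n"   unless defined($two);
print "three is undefined\n" unless defined($three);
```

All three print statements fire: $one gets the value, and the other two are left undefined without any complaint from Perl.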

Anyways, Perl doesn't have limits, but Unix sometimes does. A key example is the difference between filename globbing, which in older versions of Perl (before 5.6) relied on the Unix shell, and Perl's own opendir function.

On those older versions, try to glob a directory with 3000 files and your program will probably crash. But you can open a directory of 100,000 files with opendir (as long as your machine can handle it...)

Anyways, here it is:


    opendir(DIR,"/tmp") || die "Couldn't open /tmp.\n";
    while($file = readdir(DIR)){
          ....
    }

Note that readdir hands back every name in the directory, including the enigmatic . and .. entries. (They stand for the directory itself and its parent.) So here's a variation that skips any name beginning with a dot:

    opendir(DIR,"/tmp") || die "Couldn't open /tmp.\n";
    while($file = readdir(DIR)){
        next if ($file =~ /^\./);
          ....
    }
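
Here's one way the loop body might be filled in: a sketch that counts the plain files in /tmp. The defined() test, the -f check, the @files array, and the closedir call are my additions, not part of the fragments above.

```perl
#!/usr/bin/perl

$dir = "/tmp";

opendir(DIR, $dir) || die "Couldn't open directory $dir: $!\n";

# defined() keeps the loop going even for a file named "0".
while (defined($file = readdir(DIR))) {
    next if ($file =~ /^\./);        # skip ., .. and other dot files
    next unless (-f "$dir/$file");   # readdir gives bare names, so add the path
    push(@files, $file);
}

closedir(DIR);

print scalar(@files), " plain files in $dir\n";
```

The thing that trips people up is that readdir returns just the name, not the full path, so you need to stick the directory back on the front before you test or open the file.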

Comments or suggestions? Please write scotty@bluemarble.net.