A Lazy Afternoon

Copyright (c) 1999-2001 by Rich Morin
published in Silicon Carny, March 1999


I don't do much formal consulting, but occasionally a project crosses my desk by accident. If I have the time and/or inclination, it can be interesting to help out.

In this instance, a Perl programmer was having very limited success in getting a Perl script to work. The problem appeared to be in the area of some dbm (Unix database management) files that the script was supposed to read.

To simplify the problem, he had tried reading the files in a small, separate script. No luck. So, he sent me snapshots of the dbm files and asked me to take a look at them. I hacked up my own short script to print out the files:

    #!/usr/local/bin/perl
    foreach $file (qw(divisions jobs users)) {
      $cnt = 0;
      print("$file\n");
      dbmopen(%hash, $file, 0100);
      foreach $key (sort(keys(%hash))) {
        printf("  %-15s  <%s>\n", $key, $hash{$key});
        last if (++$cnt >= 5);
      }
      dbmclose(%hash);
      print("\n");
    }
      

This produced:

    divisions
      AP               <Applications>
      CS               <Comm Software>
    ...
      

The script appeared to work just fine, printing out three plausible sets of data. I sent the script and the output data back to the programmer, who reported that he got nothing but the headers when he ran the script. Inquiring, I found out that the dbm files had actually been written on a different machine.

A small light began to glimmer. The dbm facility is notorious for being machine- and even version-specific. In general, dbm files are best treated as local data for a particular machine. These dbm files had come from a different machine; perhaps their format wasn't right.

To test my theory, I tried running my script on a different machine (an Intel/FreeBSD box, rather than the original SPARC/SunOS one). As I suspected, I no longer got any output data. Knowing that Intel and SPARC machines have different byte orders, I dumped some of the data:

    sparc 1: od -x divisions.pag | head -1
    0000000  0012 03fe 03e8 03e6 03cf 03cd 03bf 03bd

    intel 1: od -x divisions.pag | head -1
    0000000  1200 fe03 e803 e603 cf03 cd03 bf03 bd03
      

Aha! Clearly, the two machines were seeing the bytes differently. Specifically, this was a "NUXI" problem, where pairs of bytes were being swapped. I suggested that the programmer attempt to correct the problem by reversing the byte pairs:

    intel 2: dd if=divisions.pag of=dpr conv=swab
    2+0 records in
    2+0 records out
    1024 bytes transferred in 0.000463 secs (2211621 bytes/sec)
    intel 3: od -x dpr | head -1
    0000000  0012 03fe 03e8 03e6 03cf 03cd 03bf 03bd
      

He tried that, but found that the files were still unreadable. So, the problem was more than just a simple byte-reversal. Too bad, but sometimes you don't get a break!
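For readers curious about what `conv=swab` actually does to the data, the swap can be sketched in a few lines of Perl. This is only an illustration, not something the job required: the helper names `swap_byte_pairs` and `words_as_hex` are made up for this example, the four sample bytes are taken from the start of the SPARC dump above, and the input is assumed to have an even number of bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Swap each adjacent pair of bytes, roughly what dd's conv=swab does.
# (Hypothetical helper; assumes an even number of bytes.)
sub swap_byte_pairs {
    my ($data) = @_;
    # Read the data as little-endian 16-bit words ('v'), then write the
    # same values back out as big-endian words ('n').
    return pack('n*', unpack('v*', $data));
}

# Show 16-bit words in hex for a given byte order:
# 'n*' is big-endian (SPARC), 'v*' is little-endian (Intel).
sub words_as_hex {
    my ($data, $template) = @_;
    return join(' ', map { sprintf('%04x', $_) } unpack($template, $data));
}

# The first four bytes of divisions.pag, as they sit in the file.
my $bytes = "\x00\x12\x03\xfe";

print words_as_hex($bytes, 'n*'), "\n";                   # 0012 03fe
print words_as_hex($bytes, 'v*'), "\n";                   # 1200 fe03
print words_as_hex(swap_byte_pairs($bytes), 'v*'), "\n";  # 0012 03fe
```

The bytes in the file are the same on both machines; only the interpretation of each 16-bit word differs, which is why swapping the pairs makes the Intel dump match the SPARC one.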

Because I was able to read the dbm files, the programmer asked me to convert them into a machine-independent format and send them back to him. With luck, he would then be able to reload his dbm files.

I wrote up a tiny pair of scripts for this task. The first one, somewhat amusingly, generated a sequence of Perl statements:

    #!/usr/local/bin/perl
    foreach $file (qw(divisions jobs users)) {
      dbmopen(%hash, $file, 0100);
      foreach $key (sort(keys(%hash))) {
        # Emit one Perl assignment statement per record.
        printf("\$%s{%s} = '%s';\n", $file, $key, $hash{$key});
      }
      dbmclose(%hash);
      print("\n");
    }
      

This produced:

    $divisions{AP} = 'Applications';
    $divisions{CS} = 'Comm Software';
    ...
      

The second script reloaded the dbm files in the obvious manner:

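    #!/usr/local/bin/perl
    dbmopen(%divisions, 'divisions', 0770);
    dbmopen(%jobs,      'jobs',      0770);
    dbmopen(%users,     'users',     0770);
    # fmt.out holds the generated assignment statements; the explicit
    # ./ keeps this working when "." is not on Perl's search path.
    require './fmt.out';
    dbmclose(%divisions);
    dbmclose(%jobs);
    dbmclose(%users);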
      

The entire exercise was performed via email, over the course of an hour or two. The programmer's stock options are no longer in jeopardy and I had an interesting and enjoyable diversion. I also had an amusing and instructive story to relate in my column!

So, what does the story have to teach us? First, although it's always a good idea to try solving problems on your own, it's also a good idea to know when you need help. In this instance, the programmer had tried enough to know that he needed help and was able to relate the problem specifics in an intelligent manner.

Second, don't be afraid to write throw-away code. If a few lines of Perl can answer a critical question, don't waste time looking around for more sophisticated solutions. My initial script told us something very valuable: the files were valid and readable.

Third, don't program if you don't have to. I could easily have written scripts to display and byte-swap the data, but I didn't need to do so. I knew that od could show me the data and that dd is capable of doing byte swaps (and a great deal more!); why not use the tools Unix provides?

Finally, don't be afraid to use "silly" data formats. In this instance, I wanted a self-documenting, machine-independent way of storing the data. Having looked over the input files, I knew that there weren't any embedded single quotes or other problems. So, this was really the simplest way to do the job!
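Had the data contained single quotes or backslashes, the same approach could still work with a little escaping. The following is a minimal sketch under that assumption, not the script used in the story: the helpers `single_quote` and `assignment` and the sample record `QA` are invented for illustration, and the hash name is assumed to be a plain identifier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap a string in single quotes, escaping the only two characters
# that are special inside a single-quoted Perl literal.
# (Hypothetical helper for this example.)
sub single_quote {
    my ($text) = @_;
    $text =~ s/([\\'])/\\$1/g;
    return "'$text'";
}

# Build one assignment statement for a key/value pair.
sub assignment {
    my ($hash_name, $key, $value) = @_;
    return sprintf("\$%s{%s} = %s;\n",
                   $hash_name, single_quote($key), single_quote($value));
}

# Sample data; the QA record is made up to exercise the escaping.
my %sample = (AP => 'Applications', QA => q(Tester's \ Desk));

# Generate the statements, as the dump script would.
my $code = join('', map { assignment('divisions', $_, $sample{$_}) }
                    sort keys %sample);
print $code;

# Reload them into a fresh hash, as the second script would,
# and confirm that every value survived the round trip.
our %divisions;
eval "$code 1" or die "reload failed: $@";
foreach my $key (sort keys %sample) {
    die "mismatch for $key" unless $divisions{$key} eq $sample{$key};
}
```

Quoting the keys as well costs nothing and removes the need to check that every key is a simple bareword.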

According to Larry Wall, laziness is a Perl virtue. I could perhaps have found a lazier way to solve this problem, but that might have been too much work! I have spent many pleasant hours learning about the Unix tool kit; in cases like this one, that effort pays off in a big way.

About the author

Rich Morin (rdm@cfcl.com) operates Prime Time Freeware (www.ptf.com), a publisher of books about Open Source software. Rich lives in San Bruno, on the San Francisco peninsula.