Saturday March 4, 2006

Polyglot Programming

Programmers who are facile with multiple languages frequently combine them, to great effect, in single projects. In Using PHP as a Macro Pre-processor, I only used two languages (HTML and PHP), but others (e.g., CSS, JavaScript) could easily have been added.

In a recent programming project, I found myself combining Perl, PHP, SQL, and YAML. Making things a bit more interesting, three of these languages were used in a single file. The results were a bit baroque, but the approach allowed me to get a lot of flexibility from a very small amount of code.

My client wanted a DBMS-driven report generator. Users would "subscribe" to specified reports, to be emailed on selected days. Early each morning, the program would perform the necessary SQL queries (etc), generate the reports, and email them to the appropriate users.

Although a CGI-based subscription page may be added at some point, an editable control file was all that I needed for the prototype. So, I sketched up a YAML (YAML Ain't Markup Language) configuration file, of the following form:

  # Describe each _query_, specifying any needed SQL statements
  # and/or executable scripts (e.g., output filters).

      sql:          |
        SELECT  foo
        FROM    Bar;
      script:       _script_

  # Describe each _report_, specifying each included _query_
  # and any surrounding text.

      subject:      'This is the subject line.'
      title:        'This is the title text.'
      description:  >
        This is a short (wrapped) description.
        - b:        |
            This is some block-formatted text.
        - t:        >
            This is some normal (wrapped) text.
        - q:        _query_

  # Describe each _user_, specifying each desired _report_
  # and when it should be delivered.

      address:      ''
      full_name:    'Rich Morin'
        - name:     _report_
          when:     'Mon Wed Fri'

Even if you don't know YAML or SQL, this should be pretty easy to read. As in Python, indentation is used to indicate structure. However, unlike Python, YAML disallows tabs (whew!).

Labels that end in colons (e.g., Reports:) are hash keys. A dash (-) indicates the start of a list element. Values may be strings (e.g., ''), hashes, or lists.

The | and > operators indicate that the following lines will contain, respectively, block-formatted or line-wrapped text. The next hash key or list sigil terminates this text.

Understanding the generated data structure is, however, a bit more complicated. Using the Load() function from Brian Ingerson's pure-Perl module, my script obtains a reference ($r) to the data structure. It can then access the enclosed content, as:

  $r->{Queries}{$query }{sql}             # SQL command for $query

  $r->{Reports}{$report}{content}[0]{b}   # some block-formatted text
  $r->{Reports}{$report}{content}[2]{q}   # the name of a _query_

  $r->{Users  }{reports}[0]{name}         # the name of a _report_
  $r->{Users  }{reports}[0]{when}         # a list (string, really) of days

The content area has a rather odd structure, in that each list element is a hash with only one element. This hack lets me add a content type flag (e.g., b, q, t), at a small cost in obscurity.

Enter PHP...

Given that I wrote the application and the "output filter" scripts in Perl, I was already using three languages (four, if you count the English text :-). However, things had only started to get ugly interesting. You see, many SQL queries are very similar in structure, varying only in specific details. According to the DRY (Don't Repeat Yourself) principle, this is a Bad Thing:

I wanted to "boil out" this repetition, but I didn't want to embed a macro facility (or whatever) in the report generator. Recognizing that laziness can be a virtue, I decided to use command-line PHP as a macro pre-processor.

The first steps were strictly administrative. I made the file executable, changed its extension to .php, and put a "shebang" line at the start of the file:

  #!/usr/bin/env php

I then had an executable file that would emit its own content (except for the shebang line), having first performed any requested PHP operations. One of the pleasant side benefits of this is that I no longer had to worry about the location of the file. Just put it into ~/bin and let the shell find it!

More critically, I could now mix PHP code into my YAML:

  <? # Loop over desired intervals.

    $i_hash  = array('today'     => 'NOW()',
                     'yesterday' => 'NOW() - INTERVAL 1 DAY',
                     'last_week' => 'NOW() - INTERVAL 7 DAY',

    $t_list  = array('foo', 'bar');

    foreach ($i_hash as $i_key => $i_val) {

      foreach ($t_list as $t_item) {
      <?= $t_item ?>_entries_for_<?= $i_key ?>:
          sql:            |
            SELECT    COUNT(*)
            FROM      Events
            WHERE     date       =  <?= "$i_val\n"  ?>
            AND       type       =  '<?= "$t_item\n" ?>';
  <? } ?>

This generates six queries, as follows:

          sql:            |
            SELECT    COUNT(*)
            FROM      Events
            WHERE     date       =  NOW()
            AND       type       =  'foo';

I should probably note that, unlike Perl's hashes, PHP's associative arrays are traversed in a predictable order, normally based on the order in which elements are defined. Alternatively, the array's keys can be sorted before the foreach loop traverses them:


In the actual application, a 200-line PHP/YAML/SQL file generated 500 lines of YAML/SQL. Although the file was admittedly more arcane-looking, it certainly met the DRY requirement and I felt more confident that I wasn't going to have copy-and paste errors, etc.

Technorati Tags: , , , ,

Polyglot Programming in Computers , Science , Technology - posted at Sat, 04 Mar, 23:00 Pacific | «e» | TrackBack