Sunday November 2, 2008

Using Ruby, Perl, Python, and PHP in concert...

About a year ago, I was tasked with "tidying up" the Protégé-Frames User's Guide, which consisted of a few hundred files of rather ugly machine-generated HTML. To reduce the effort of cleaning up and maintaining the files, as well as the chance of editing errors, I wanted to mechanize the process as much as possible. That meant eliminating the repetitive and voluminous header and footer blocks, auto-generating indexes and navigational links, and so on.

After writing some support code, I was able to re-cast the HTML into RHTML (HTML with embedded Ruby). A two-pass generation process allowed me to collect page names and navigational information, then generate both index pages and nicely cross-linked content pages. However, that wasn't the end of the story...
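For readers who want to see the shape of a two-pass scheme like this, here is a minimal sketch in Python. It is not the original RHTML support code (which was Ruby). The function names, the `<h1>`-based title lookup, and the alphabetical page order are all assumptions made for illustration.

```python
import re


def collect_titles(sources):
    """Pass 1: map each page name to a title (assumed to be the first <h1>)."""
    titles = {}
    for name, body in sources.items():
        match = re.search(r"<h1>(.*?)</h1>", body, re.S)
        # Fall back to the page name when no heading is found.
        titles[name] = match.group(1).strip() if match else name
    return titles


def render_site(sources, titles):
    """Pass 2: add previous/next links to each page and build an index."""
    order = sorted(sources)  # assumption: page order follows the file names
    pages = {}
    for i, name in enumerate(order):
        links = []
        if i > 0:
            prev = order[i - 1]
            links.append(f'<a href="{prev}.html">Previous: {titles[prev]}</a>')
        if i < len(order) - 1:
            nxt = order[i + 1]
            links.append(f'<a href="{nxt}.html">Next: {titles[nxt]}</a>')
        nav = '<p class="nav">' + " | ".join(links) + "</p>"
        pages[name] = nav + "\n" + sources[name]
    items = [f'<li><a href="{n}.html">{titles[n]}</a></li>' for n in order]
    index = "<ul>\n" + "\n".join(items) + "\n</ul>"
    return pages, index
```

The second pass can only write "Previous" and "Next" links because the first pass has already seen every page. That is the whole reason for making two passes.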

A new release of Protégé-Frames is coming out soon, so the folks at the Stanford Center for Biomedical Informatics Research wanted an updated User's Guide. They also wanted something that would allow their user community to assist in the editing process. They already had an instance of MediaWiki in place, so we decided to pour the Guide into that.

The first problem, obviously, was to turn the HTML into MediaWiki markup. Looking around, I found HTML::WikiConverter on the CPAN. This library converts HTML into a number of Wiki markup formats (including MediaWiki :-).

In a few cases where the WikiConverter had problems with its input, I was able to tweak my RHTML support code and/or filter the intermediate HTML. And, of course, I could also filter the resulting MediaWiki markup. Given that this was a one-shot application, I was willing to be a bit hacky, as long as the results were clean.
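As an illustration of the kind of post-filter that can be run over converted markup, here is a small Python sketch. The three cleanup rules are hypothetical examples, not the filters actually used on the Guide, and `clean_wikitext` is a made-up name.

```python
import re


def clean_wikitext(text):
    """Apply a few hypothetical cleanup rules to MediaWiki markup."""
    # Strip trailing spaces and tabs from every line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.M)
    # Drop leftover <br> tags that sit at the end of a line.
    text = re.sub(r"<br\s*/?>\s*\n", "\n", text)
    # Collapse runs of blank lines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"
```

Each rule is a plain text substitution, so a one-shot job can add or drop rules freely until the output looks clean.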

The next problem was to load the wiki pages and accompanying image files into MediaWiki. The "category killer" for this task is PyWikipediaBot, a suite that includes a support library and a number of front-end commands. With a bit of tweaking, PyWikipediaBot performed admirably, loading several hundred pages and images with no complaint.

So, in summary, the project used Ruby to generate HTML, Perl to convert this to MediaWiki markup, and Python to upload the result into a (PHP-based) wiki. Although using such a plethora of languages caused me some grief at times (no, this is Perl, not Ruby...), everything worked largely as advertised.

I keep hoping that Parrot (or whatever) will eventually allow me to use (say) Perl libraries from Ruby. However, in this case, nothing of the sort was necessary. Passing ASCII files between programs, in traditional Unix style, was enough to stitch these languages together.
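The file-passing approach can be sketched in a few lines of Python. This is only an illustration: `run_stage`, `run_pipeline`, and the `stageN.out` file names are assumptions, and each stage is taken to be any program that reads standard input and writes standard output.

```python
import subprocess
from pathlib import Path


def run_stage(command, src, dst):
    """Run one external program, feeding it src on stdin and saving stdout to dst."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # check=True raises CalledProcessError if the program exits non-zero.
        subprocess.run(command, stdin=fin, stdout=fout, check=True)


def run_pipeline(stages, first_input, workdir):
    """Chain the stages, keeping every intermediate file for inspection."""
    current = Path(first_input)
    for number, command in enumerate(stages, start=1):
        output = Path(workdir) / f"stage{number}.out"
        run_stage(command, current, output)
        current = output
    return current  # path of the final output file
```

Each entry in `stages` is an argument list such as `["some-filter", "--flag"]`, so the stages can be written in any language. Keeping the intermediate files on disk makes it easy to see which stage mangled the text when something goes wrong.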

Posted in Computers, Technology at Sun, 02 Nov, 23:20 Pacific

