Saturday July 18, 2009

Mechanizing the Path to Ruby 1.9

In What do we need to get on Ruby 1.9?, Yehuda Katz says he thinks it's time to get serious about migrating the Ruby community to Ruby 1.9. He then asks for specific information on show-stoppers: gems, plugins, tools, and such that don't yet work on Ruby 1.9. His request has already resulted in a lot of opinions and anecdotal evidence, but I think there is a fairly obvious way to get better information.

Précis

The basic idea is that we should create and use mechanized tools to help in assessing and tracking infrastructure dependencies and Ruby 1.9 compatibility issues. This involves harvesting and analyzing information, as discussed below, but it's just "a simple matter of software" (:-). Specifically, it doesn't appear to require any Computer Science breakthroughs or even a great deal of new code.

Dependency Analysis

Step one is to inventory the major archives of Ruby code and analyze their dependencies. RubyGems makes this pretty easy. Gem servers commonly provide YAML snapshots of Gem-related metadata (generated by "gem generate_index"). GitHub and RubyForge, for example, provide http://gems.github.com/yaml and http://gems.rubyforge.org/yaml. These (large :) pages contain everything we need to generate Gem dependency graphs.
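As a rough sketch of what that harvesting might look like, the script below pulls one of those YAML indexes and turns it into a name-to-dependencies graph. It assumes the entries deserialize into Gem::Specification objects once RubyGems is loaded; the exact top-level structure (and what newer YAML engines will permit) varies, so treat this as a starting point rather than a finished tool.

    #!/usr/bin/env ruby
    # Rough sketch: build a gem dependency graph from a server's YAML index.
    # Assumes the entries deserialize into Gem::Specification objects; newer
    # YAML engines may require the Gem classes to be explicitly permitted.

    require 'rubygems'
    require 'yaml'
    require 'open-uri'

    url   = ARGV[0] || 'http://gems.rubyforge.org/yaml'
    index = YAML.load(open(url).read)

    # Collect every Gem::Specification found anywhere in the loaded structure.
    specs = []
    walk  = lambda do |node|
      case node
      when Gem::Specification then specs << node
      when Hash  then node.each_value { |v| walk.call(v) }
      when Array then node.each       { |v| walk.call(v) }
      end
    end
    walk.call(index.respond_to?(:gems) ? index.gems : index)

    # name => [names of the gems it depends on]
    graph = Hash.new { |h, k| h[k] = [] }
    specs.each do |spec|
      graph[spec.name] |= spec.dependencies.map { |dep| dep.name }
    end

    graph.sort.each { |name, deps| puts "#{name}: #{deps.join(', ')}" }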

It may be a bit more work, but we can also get the same sort of information for plugins, tools, etc. So, we should be able to work out a relatively complete graph of Ruby infrastructure dependencies. This, by itself, would be quite useful to have.

For example, by knowing how many items (eg, Gems) depend on a given item, we can estimate how critical the item is to the ecosystem (Google's PageRank algorithm uses a variation on this technique). Fortunately, we don't have to contend with the effects of search engine optimization and such, but we still have to consider global popularity. If an item is used by a number of other items, but few apps use any of them, it may not be all that critical.
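As a first cut, a plain dependent count gets us most of the way there. Here's a tiny sketch of that ranking; the toy graph stands in for the real one harvested above, and a proper PageRank-style weighting (propagating importance through the graph) could be layered on later.

    # Rough sketch: rank gems by how many other gems depend on them -- the
    # degree-only first cut at a PageRank-style importance measure.
    graph = {
      'gem_a' => ['gem_c'],
      'gem_b' => ['gem_c', 'gem_d'],
      'gem_c' => [],
      'gem_d' => []
    }

    dependents = Hash.new(0)
    graph.each_value do |deps|
      deps.each { |dep| dependents[dep] += 1 }
    end

    dependents.sort_by { |_name, count| -count }.each do |name, count|
      puts format('%3d  %s', count, name)
    end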

Compatibility Analysis

Step two is to run each item's test suite, finding out which tests exhibit Ruby 1.9-specific problems. This is a much bigger step, for a variety of reasons. Let's look at some of them...

  • dependencies - Some tests may depend on items which are broken on Ruby 1.9. However, this is not a total show-stopper. Let's say that gem2 depends on gem1. Even if gem1 is broken, we may still be able to run a number of gem2's tests. For example, a given unit test may not actually use any part of gem1.

  • resources - The required resources are substantial. Processing and storage aren't a big deal: disk storage is cheap, at these levels, and most packages don't release all that often. However, development and administration are likely to require some Real Work (TM). This might happen on a volunteer basis, but some corporate sponsorship could certainly help it along.

  • security - All tests need to be run in a secure manner. Nobody wants to find out that running a test suite has trashed their machine. So, for example, it might be appropriate to run each package's tests in a fresh VM (see the rough harness sketch after this list).

  • setup - Some tests may require a lot of setup. If this hasn't been totally mechanized, the effort has to be balanced against the results. That said, it may be possible to "crowdsource" this effort, taking advantage of existing test setups in the wild.
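To make the first and third points above a bit more concrete, here's a rough harness sketch. The RUBY19 binary name, the unpacked_gems layout, and the bare "rake test" task are all assumptions; real packages use spec, features, and other task names, and each run really belongs inside a throwaway VM or sandbox rather than on the host.

    # Rough sketch: run each unpacked gem's test suite under a Ruby 1.9 binary
    # and record pass/fail. Paths, binary name, and task name are assumptions;
    # a real harness would isolate each run in a fresh VM.
    RUBY19  = ENV['RUBY19'] || 'ruby1.9'
    results = {}

    Dir['unpacked_gems/*'].each do |dir|
      next unless File.exist?(File.join(dir, 'Rakefile'))
      ok = system("cd #{dir} && #{RUBY19} -S rake test > rake.log 2>&1")
      results[File.basename(dir)] = ok ? 'pass' : 'FAIL'
    end

    results.sort.each { |name, status| puts "#{status}  #{name}" }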

Presentation

Step three is to present the information in a digestible and useful fashion. Ideally, we'd be able to get reports on the overall situation, showing us which issues are the most critical to resolve. Follow-up reports, allowing us to "drill down" into particular questions, would also be useful.

It's not clear (to me, at least) what kinds of reports we'll ultimately need. We're dealing with a large amount of highly-interrelated information. Diagramming all of the dependencies (eg, via GraphViz) might be impressive, but isn't going to be useful. Humans aren't all that good at reading complex diagrams. So, we'll need some finer-grained approaches.
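One plausible finer-grained approach is a per-item drill-down: given a single gem or app, list its transitive dependencies along with their 1.9 test status. The graph and results below are tiny stand-ins for the data the earlier steps would produce.

    # Rough sketch: a per-gem "drill down" report -- transitive dependencies
    # of one gem, annotated with a (made-up) 1.9 status from the test results.
    graph   = { 'my_app' => ['gem_a', 'gem_b'], 'gem_a' => ['gem_c'],
                'gem_b'  => [], 'gem_c' => [] }
    results = { 'gem_a' => 'pass', 'gem_b' => 'FAIL', 'gem_c' => 'pass' }

    def transitive(graph, name, seen = [])
      (graph[name] || []).each do |dep|
        next if seen.include?(dep)
        seen << dep
        transitive(graph, dep, seen)
      end
      seen
    end

    transitive(graph, 'my_app').each do |dep|
      puts "#{results[dep] || 'unknown'}  #{dep}"
    end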

Collaboration

Step four is to provide a forum for collaboration. If there's a convenient and well-known place to discuss outstanding issues, more developers may be interested in contributing. Wikipedia and (closer to home) RubySpec are existence proofs for this sort of thing, but building up a "critical mass" of volunteers is non-trivial. Again, some corporate sponsorship might be useful, lending needed credence and visibility.

Dan Kubb suggests that we could adopt some approaches taken by the CPAN (Perl's humongous archive of modules):

When new gems are downloaded, their specs could be run first, with installation only occurring if they pass. This would bring more compatibility issues to light than just blindly installing the gem and finding problems at runtime. Plus, if we provide a nice way for the gem to phone home with the results (after prompting for permission, of course), we could aggregate spec failures someplace, providing gem authors with a lot of information about platform- and version-specific bugs.

The CPAN Testers Matrix, for example, has a distributed testing system where volunteers can install a small app on their machine and it'll sync up with CPAN and run the tests for packages, reporting the results. Here is some example output for CGI-State 0.02.

If we had something like this, it would be easy to see what versions and platforms a gem works with. CPAN is the most advanced language-specific distribution system, so we should be looking to them for ideas and inspiration. They are at least 3-5 years ahead of anything we have available now, but a concerted effort could make significant progress on catching up.
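To make the "phone home" idea a bit more concrete, here's a minimal sketch. The aggregator URL and the payload fields are purely hypothetical, and a real version would hook into the gem install/test cycle rather than being run by hand.

    # Rough sketch: run a gem's specs, then (with permission) POST a small
    # result summary to an aggregator. The URL and fields are hypothetical.
    require 'net/http'
    require 'uri'
    require 'rbconfig'

    AGGREGATOR = URI.parse('http://example.com/test_reports')  # hypothetical

    def report(gem_name, version, passed)
      print "Send test results for #{gem_name}-#{version}? [y/N] "
      return unless $stdin.gets.to_s.strip.downcase == 'y'

      Net::HTTP.post_form(AGGREGATOR,
        'gem'      => gem_name,
        'version'  => version,
        'ruby'     => RUBY_VERSION,
        'platform' => RbConfig::CONFIG['arch'],
        'status'   => passed ? 'pass' : 'fail')
    end

    passed = system('rake spec > spec.log 2>&1')
    report('some_gem', '1.0.0', passed)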

Status

I don't know of anyone who is working on this exact project, but there is clearly a lot of existing work that could be leveraged. Harvesting Open Source files and metadata, for example, is far easier than it was in years past. There are also a variety of technologies (eg, Condor, Nanite) that can help in handling scaling issues.

As it happens, I'm already working on a project that harvests and analyzes Ruby code. PARSE (Punish All Ruby Software Equally) is supposed to run MetricFu, YARD, and some home-grown tools on a wide swath of Ruby code. Checking for Ruby 1.9 (and eventually, 2.*) compatibility is an obvious fit.
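For example, the cheapest possible probe is a syntax check: run every .rb file through "ruby -c" under a 1.9 binary and flag whatever no longer parses. (The RUBY19 name is an assumption; a real pass would go on to run the actual test suites.)

    # Rough sketch: flag files that fail to parse under Ruby 1.9.
    RUBY19 = ENV['RUBY19'] || 'ruby1.9'

    failures = Dir['**/*.rb'].reject do |file|
      system(%(#{RUBY19} -c "#{file}" > /dev/null 2>&1))
    end

    puts "#{failures.size} file(s) fail to parse under 1.9:"
    failures.each { |file| puts "  #{file}" }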

That said, the Ruby 1.9 migration effort will have its own needs and schedule, so it should have its own tool chain. (I don't want Ontiki or PARSE to be on its critical path, either. :-) However, I'd be more than happy to work with anyone who is interested in crafting such a tool chain.
