Saturday July 18, 2009
Mechanizing the Path to Ruby 1.9

In What do we need to get on Ruby 1.9?, Yehuda Katz says he thinks it's time to get serious about migrating the Ruby community to Ruby 1.9. He then asks for specific information on show-stoppers: gems, plugins, tools, and such that don't yet work on Ruby 1.9. His request has already resulted in a lot of opinions and anecdotal evidence, but I think there is a fairly obvious way to get better information.
The basic idea is that we should create and use mechanized tools to help in assessing and tracking infrastructure dependencies and Ruby 1.9 compatibility issues. This involves harvesting and analyzing information, as discussed below, but it's just "a simple matter of software" (:-). Specifically, it doesn't appear to require any Computer Science breakthroughs or even a great deal of new code.
Step one is to inventory the major archives of Ruby code and analyze their dependencies. RubyGems makes this pretty easy. Gem servers commonly provide YAML snapshots of Gem-related metadata (generated by "gem generate_index"). GitHub and RubyForge, for example, provide http://gems.github.com/yaml and http://gems.rubyforge.org/yaml. These (large :) pages contain everything we need to generate Gem dependency graphs.
It may be a bit more work, but we can also get the same sort of information for plugins, tools, etc. So, we should be able to work out a relatively complete graph of Ruby infrastructure dependencies. This, by itself, would be quite useful to have.
For example, by knowing how many items (eg, Gems) depend on a given item, we can estimate how critical the item is to the ecosystem (Google's PageRank algorithm uses a variation on this technique). Fortunately, we don't have to contend with the effects of search engine optimization and such, but we still have to consider global popularity. If an item is used by a number of other items, but few apps use any of them, it may not be all that critical.
Step two is to run each item's test suite, finding out which tests exhibit Ruby 1.9-specific problems. This is a much bigger step, for a variety of reasons. Let's look at some of them...
Step three is to present the information in a digestible and useful fashion. Ideally, we'd be able to get reports on the overall situation, showing us which issues are the most critical to resolve. Follow-up reports, allowing us to "drill down" into particular questions, would also be useful.
It's not clear (to me, at least) what kinds of reports we'll ultimately need. We're dealing with a large amount of highly interrelated information. Diagramming all of the dependencies (eg, via GraphViz) might be impressive, but isn't going to be useful. Humans aren't all that good at reading complex diagrams. So, we'll need some finer-grained approaches.
Step four is to provide a forum for collaboration. If there's a convenient and well-known place to discuss outstanding issues, more developers may be interested in contributing. Wikipedia and (closer to home) RubySpec are existence proofs for this sort of thing, but building up a "critical mass" of volunteers is non-trivial. Again, some corporate sponsorship might be useful, lending needed credence and visibility.
Dan Kubb suggests that we could adopt some approaches taken by the CPAN (Perl's humongous archive of modules):
When new gems are downloaded, their specs could be run first, with installation only occurring if they pass. This would bring more compatibility issues to light than just blindly installing the gem and finding problems at runtime. Plus, if we provide a nice way for the gem to phone home with the results (after prompting for permission, of course), we could aggregate spec failures someplace, providing gem authors with a lot of information about platform- and version-specific bugs.
I don't know of anyone who is working on this exact project, but there is clearly a lot of existing work that could be leveraged. Harvesting Open Source files and metadata, for example, is far easier than it was in years past. There are also a variety of technologies (eg, Condor, Nanite) that can help in handling scaling issues.
As it happens, I'm already working on a project that harvests and analyzes Ruby code. PARSE (Punish All Ruby Software Equally) is supposed to run MetricFu, YARD, and some home-grown tools on a wide swath of Ruby code. Checking for Ruby 1.9 (and eventually, 2.*) compatibility is an obvious fit.
That said, the Ruby 1.9 migration effort will have its own needs and schedule, so it should have its own tool chain. (Nor do I want Ontiki or PARSE to be on its critical path. :-) However, I'd be more than happy to work with anyone who is interested in crafting such a tool chain.