Infrastructure for Open Source

Copyright (c) 1999-2001 by Rich Morin
published in Silicon Carny, October 1999


The Open Source community has very uneven levels of automated support for its users and developers. What can (and should) we do to improve this situation?

As noted last month, the Perl community is served by a wide range of excellent books. It also has two magazines (and columns in others), thousands of pages of online documentation, a fancy bug-reporting system (perlbug), a distributed worldwide archive (the CPAN), and a plethora of mailing lists, newsgroups, and web sites.

Most Open Source projects, however, are far less fortunate. The typical Open Source project, in fact, is lucky to have a web site and a mailing list to its name. Its offerings may be relegated to temporary storage on a university FTP server, or worse, scattered over several disconnected servers.

Indexing

Despite the indexing efforts of Debian, FileWatcher, FreeBSD, Freshmeat, and others, particular packages can be very difficult for users to find.

The different indexes use different names for the same packages and create different hierarchies to contain them. FileWatcher, which currently tracks some 10,000 packages, is clearly the largest of these efforts. Its hierarchy is also the best-developed, in my estimation.
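To make the naming problem concrete, here is a minimal sketch of how a cross-index lookup might map each index's name for a package onto one canonical name. Everything in it is an assumption made for illustration: the alias table, the package names, and the function names are hypothetical and do not describe how any of the indexes mentioned above actually work.

```python
# Hypothetical alias table: canonical name -> names used by various indexes.
# All entries are invented for illustration.
ALIASES = {
    "wombat": {"wombat", "libwombat", "wombat-utils"},
    "gnu-frob": {"frob", "gfrob", "gnu-frob"},
}


def build_lookup(aliases):
    """Invert the alias table so any known name maps to its canonical name."""
    lookup = {}
    for canonical, names in aliases.items():
        for name in names:
            lookup[name.lower()] = canonical
    return lookup


def canonical_name(name, lookup):
    """Return the canonical name, or the lowercased input if it is unknown."""
    key = name.lower()
    return lookup.get(key, key)


LOOKUP = build_lookup(ALIASES)
print(canonical_name("Wombat-Utils", LOOKUP))
print(canonical_name("unknown-pkg", LOOKUP))
```

The hard part, of course, is not the lookup but building and maintaining the alias table, which is exactly the kind of shared work that no single index currently does.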

Nonetheless, it can be a trial for a user to find, download, unpack, and examine a multi-megabyte piece of software, only to find that it isn't relevant to the user's needs, hardware, or operating system environment. We need an easier way to let users browse the source code and documentation for arbitrary Open Source packages.

Porting and Installation

Even when a user has located the original tarball for a package, there may be difficulties in installing the package. Has the current version of the package been ported to the user's system? Where might this port reside? Is the installation painless and/or well-documented? What other packages need to be installed first?

The Debian and FreeBSD folks have automated solutions that lead the field in handling these matters. Unfortunately, the solutions only work for their own users. If you're on a Solaris system, the FreeBSD work is (mostly) irrelevant. For that matter, the Debian work is of no direct benefit to a FreeBSD user.

A generalized porting and installation solution is clearly needed, but it is far from clear who will develop it. The Debian and FreeBSD folks have their hands full, keeping track of more than 3000 packages each. The Red Hat folks seem to be content with RPMs and nobody else is even looking at the problem.

The Open Source Matrix

The sheer size of the Open Source Matrix is a critical part of the porting and installation problem. Assume, for purposes of discussion, that we wish to support five hardware architectures (e.g., Alpha, I386, PA-RISC, PowerPC, and SPARC) and ten Unixish operating system families (e.g., AIX, *BSD, Digital Unix, HP-UX, IRIX, Linux, Mac OS X, Solaris, SunOS, and UnixWare).

This gives us a matrix of 50 architecture/OS combinations, containing perhaps 25 worthwhile ports. Actually, the porting matrix is far larger than this; for instance, neither *BSD nor Linux is completely standardized. Nonetheless, this is a reasonable starting point for our analysis.
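The matrix described above can be enumerated in a few lines. This is only a sketch: the two lists come straight from the example in the text, while the variable names are my own. Which of the 50 cells count as "worthwhile" is a judgment call that the code does not try to encode.

```python
from itertools import product

# The five architectures and ten OS families used as examples in the text.
ARCHITECTURES = ["Alpha", "I386", "PA-RISC", "PowerPC", "SPARC"]
OS_FAMILIES = [
    "AIX", "*BSD", "Digital Unix", "HP-UX", "IRIX",
    "Linux", "Mac OS X", "Solaris", "SunOS", "UnixWare",
]

# Every architecture/OS pairing is one cell of the matrix.
matrix = list(product(ARCHITECTURES, OS_FAMILIES))

print(len(matrix))  # 5 architectures x 10 OS families = 50 cells
```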

Supporting 25 ports is a challenging task for any Open Source developer. How can Dave Developer gain access to that range of machines, let alone find the time to learn about their vagaries and port his software to them? That's an easy question; he won't.

Nor, being an isolated, overworked hacker, will he set up an Internet-accessible CVS repository to accept patches and ports from volunteers around the world. They might have the needed access, time, and skills, but Dave simply doesn't have the resources to perform the needed administration.

Similarly, Dave may not find the time to set up a bug reporting system, email lists, a well-developed web site, FAQs, and the other facilities he'd love to provide. He'd like to, but he doesn't have the time...

Now let's say that Paul Programmer wants to try porting Dave's "wombat" package to an Intel/Solaris machine. Paul can start from the base wombat distribution (whatever Dave Developer had at his site) or try to locate a few similar ports (e.g., SPARC/Solaris and Intel/FreeBSD). Either method will take skill, luck, and persistence.

Now step back and consider the big picture: 25 ports each of 10,000+ packages. Even using these (rather conservative) numbers, we have more than 250,000 ports to accomplish. And, of course, new packages (and new versions) are coming out of the woodwork every day! Without proper infrastructure, we can't even track this many items, let alone hope to make a dent in them.
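As a rough illustration of what "tracking this many items" involves, the sketch below works out the total and keeps one status record per package/architecture/OS combination. The class names, the status strings, and the sample entries for the "wombat" package are all hypothetical; a real system would need persistent storage, versions, and much more.

```python
from dataclasses import dataclass

# Figures taken from the text: 25 worthwhile ports for each of 10,000 packages.
PORTS_PER_PACKAGE = 25
PACKAGE_COUNT = 10_000
TOTAL_PORTS = PORTS_PER_PACKAGE * PACKAGE_COUNT  # 250,000


@dataclass(frozen=True)
class PortKey:
    """Identifies one port: a package on one architecture/OS combination."""
    package: str
    arch: str
    os_family: str


class PortTracker:
    """Hypothetical in-memory registry of port status."""

    def __init__(self):
        self._status = {}

    def record(self, package, arch, os_family, state):
        self._status[PortKey(package, arch, os_family)] = state

    def ports_of(self, package):
        """Return (arch, os_family, state) tuples for one package, sorted."""
        return sorted(
            (key.arch, key.os_family, state)
            for key, state in self._status.items()
            if key.package == package
        )


tracker = PortTracker()
tracker.record("wombat", "SPARC", "Solaris", "ported")       # invented status
tracker.record("wombat", "I386", "*BSD", "ported")           # invented status
tracker.record("wombat", "I386", "Solaris", "in progress")   # invented status

print(TOTAL_PORTS)
for entry in tracker.ports_of("wombat"):
    print(entry)
```

With a registry like this, Paul Programmer could at least see at a glance which neighboring ports of wombat already exist before starting his own.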

Automated Assistance

So, let's give the developers (and ourselves) some help. Let's create a set of CVS (or whatever) repositories that can hold every interesting version of every port of every significant Open Source package. At current mass storage prices, this would cost only a few thousand dollars.

Developers could use the repositories to find relevant versions of packages. With luck, they might even end up folding some of the variant strains back together, reducing the problem space a little.

By making these repositories accessible via web browsers, we could let Paul Programmer (and Alice Administrator) find and peruse a specific package's code and documentation. If Paul or Alice can browse a package in fifteen seconds, rather than fifteen minutes or an hour, they may be able to find what they need far more quickly.

While we're at it, let's provide a "dating service" for system vendors and developers. If Dave needs access to a particular kind of machine, for porting or regression testing of his software package, shouldn't the machine's vendor be willing to provide it? If the vendor knows that any incoming developers have been "vouched for" by a responsible party, there really isn't all that much of a risk...

Some developers (or interested users) might also be willing to maintain current information for specific packages, as long as they don't have to find an ISP, edit web pages, and the like. Thus, a semi-automated information-gathering system, based on one or more of the existing Open Source indexes, might be a real possibility.

By providing free, centralized support facilities, we might be able to greatly increase the ease with which all of us interact with Open Source software. In any case, I think it's worth a try.

About the author

Rich Morin (rdm@cfcl.com) operates Prime Time Freeware (www.ptf.com), a publisher of books about Open Source software. Rich lives in San Bruno, on the San Francisco peninsula.