Serious FTP

Copyright (c) 1999-2001 by Rich Morin
published in Silicon Carny, April 1999


Walnut Creek CDROM (www.cdrom.com) sells lots of CD-ROMs, but they give away even more data. Specifically, anyone who has Internet access is free to log into wcarchive (ftp.cdrom.com) and start downloading bits.

Even with a good Internet connection, however, you should expect to be at it for a while. At the present time, wcarchive resides on half a terabyte (500 gigabytes) of RAID 5 disk storage. Even if your 56 Kbps modem can deliver seven kilobytes per second, downloading the complete archive would take you about 70 million seconds. Even then, some of the files would be more than two years out of date, so a bit of "back and fill" would be needed.
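For the curious, the back-of-the-envelope arithmetic behind that figure can be sketched in a few lines of Python. This is only a rough sketch: it assumes decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes) and a perfectly steady connection, and the names are made up for illustration.

```python
# Rough download-time estimate, using the figures quoted in the text.
# Assumption: decimal units and a constant transfer rate.
ARCHIVE_BYTES = 500 * 10**9        # "half a terabyte (500 gigabytes)"
MODEM_BYTES_PER_SEC = 7 * 10**3    # "seven kilobytes per second"
SECONDS_PER_YEAR = 365 * 24 * 3600


def download_seconds(total_bytes, bytes_per_sec):
    """Return the time, in seconds, to move total_bytes at bytes_per_sec."""
    return total_bytes / bytes_per_sec


seconds = download_seconds(ARCHIVE_BYTES, MODEM_BYTES_PER_SEC)
years = seconds / SECONDS_PER_YEAR
print(f"{seconds / 1e6:.1f} million seconds, about {years:.1f} years")
```

Under these assumptions the result is a bit over 70 million seconds, or somewhat more than two years, which is where the "two years out of date" remark comes from.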

Of course, nobody uses wcarchive that way. Instead, they just drop in when they need the odd file or two. The FTP server is very accommodating; 3600 simultaneous download sessions is the current limit and an upgrade to 10,000 sessions is in the works.

This translates to about 800 GB per day of downloads. Bob Bruce (Walnut Creek's founder) says he's thinking about issuing a press release when they reach a terabyte a day, but 800 GB isn't all that shabby...

The Hardware

Because FTP archives don't do a lot of thinking, wcarchive doesn't need a massive cluster of CPUs. In fact, it gets by with a single 200 MHz P6 ("Pentium Pro") and a measly (!) 1 GB of RAM. The I/O support, however, is fairly impressive.

A six-channel Mylex RAID controller (DAC960SXI; Ultra-Wide SCSI-SCSI) is the centerpiece of the I/O subsystem. Two channels link it to the PC ("Personal Computer" !?!), via a dual-channel Adaptec card (AHA-3940AUW; PCI to Ultra-Wide SCSI). 256 MB of internal cache helps it to eliminate recurring disk accesses.

Four nine-drive disk arrays provide the actual storage. The two larger arrays use 18 GB (IBM) drives; the two smaller arrays use 9 GB (Micropolis, Quantum) drives. A separate 4 GB (Quantum) drive is used as the "system disk".
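As a sanity check on the "half a terabyte" figure, the drive counts above can be totted up as follows. This is a sketch under a stated assumption: the text does not say how the drives are grouped, so the usable figure simply supposes that each nine-drive array is a single RAID 5 group that gives up one drive's worth of space to parity, with no hot spares. The function names are illustrative only.

```python
# (drives per array, GB per drive), taken from the description above.
ARRAYS = [(9, 18), (9, 18), (9, 9), (9, 9)]


def raw_gb(arrays):
    """Total capacity of all drives, ignoring any RAID overhead."""
    return sum(drives * size for drives, size in arrays)


def raid5_usable_gb(arrays):
    """Usable capacity, ASSUMING one RAID 5 group per array
    (one drive's worth of parity per array, no hot spares)."""
    return sum((drives - 1) * size for drives, size in arrays)


print(f"raw: {raw_gb(ARRAYS)} GB, usable: {raid5_usable_gb(ARRAYS)} GB")
```

Either way, the total lands in the neighborhood of half a terabyte; the exact usable number depends on how the arrays are really configured.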

The output side is handled by a single Intel 100Base-T controller (Pro/100B PCI), which feeds into the Internet through a number of (shared) DS3 (45 Mbps) and OC3 (155 Mbps) circuits.
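To see how hard that single controller is working, the 800 GB per day figure quoted earlier can be converted to an average bit rate. The sketch below assumes decimal gigabytes, downloads spread evenly around the clock, and no allowance for protocol overhead, none of which is quite true in real life; the variable names are illustrative.

```python
# Average outbound rate implied by the daily download volume.
# Assumptions: decimal units, perfectly even load, no protocol overhead.
DAILY_BYTES = 800 * 10**9              # "about 800 GB per day"
SECONDS_PER_DAY = 24 * 3600
ETHERNET_BITS_PER_SEC = 100 * 10**6    # nominal 100Base-T rate

avg_bits_per_sec = DAILY_BYTES * 8 / SECONDS_PER_DAY
utilization = avg_bits_per_sec / ETHERNET_BITS_PER_SEC

print(f"{avg_bits_per_sec / 1e6:.1f} Mbps average, "
      f"{utilization:.0%} of the nominal Ethernet rate")
```

On these assumptions, the average comes to roughly three quarters of what a 100 Mbps link can nominally carry, so a terabyte a day would leave very little headroom on one such interface.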

A detailed description of the system is available at ftp.cdrom.com/archive-info/configuration; a picture of the machine is available at ftp.cdrom.com/archive-info/wcarchive.jpg.

The Software

The system software is rather prosaic: a copy of FreeBSD (www.freebsd.org), supplemented by home-grown FTP mirroring and server code. Because of the massive hardware support, the software "only" needs to keep the I/O going in an efficient and reliable manner.

FreeBSD, the "prosaic" operating system mentioned above, merits a bit more discussion. Like Linux, FreeBSD is Open Source. Anyone can examine, modify, and/or redistribute the source code. And, like Linux, an active user community helps the authors to find bugs, improve documentation, and generally support the OS.

Unlike Linux, FreeBSD is derived from the "Berkeley Unix" code that forms the foundation for most commercial Unix variants. When you use the "fast file system" (cylinder groups, long file names, symbolic links, etc.), TCP/IP networking, termcap, or even vi, you are using Berkeley Unix additions.

The version of BSD underlying FreeBSD, however, is "pure" BSD; don't look for the System V modifications you see in Solaris. Instead, think of it as SunOS, brought up to date with Kerberos, modern Sendmail, an updated file system, and more. Solid, fast, and free!

One of FreeBSD's finest innovations, the "Ports Collection", makes FreeBSD a delight for Open Source application users. The Ports Collection automates the downloading, building, and installation (including de-installation) of 2300+ Open Source packages. For more information, see www.freebsd.org/ports.

The Company

Walnut Creek CDROM has been around for several years now, so you are likely to be familiar with their offerings. You may not realize, however, that they provide the major financial support for FreeBSD.

The FreeBSD support has two purposes. First, it provides the company with a solid base to run wcarchive and other massive projects. Second, it ties in with the company's mission of making software (and data) economically accessible.

Bob Bruce, the firm's founder, is an interesting guy: laid back and somewhat conservative in manner, but productive and innovative in practice. Here is a possibly illustrative story.

When Bob started selling CD-ROMs, disc "caddies" were selling for $15 each. Bob thought that was rather high, so he started investigating the marketplace. A long-distance call to Japan got him Sony's fax number; a series of faxes got him in touch with the salespeople.

It turned out that caddies were available, in bulk, for only a few dollars each. Bulk, in this case, meant pallet-loads of 10,000 caddies. In an act of great faith, Bob purchased a pallet of caddies, then proceeded to sell them for five dollars each.

The results were everything he might have wished. Folks who bought his CD-ROMs added caddies to their orders; folks who bought piles of caddies added in a disc or two. Either way, Walnut Creek CDROM was making a name for itself.

Many pallet-loads later, the company is still selling caddies, making and distributing CD-ROMs, and giving away bits. Walnut Creek CDROM is a real Open Source success story; their breadth and depth of offerings (listed on www.cdrom.com) is well worth a look...

About the author

Rich Morin (rdm@cfcl.com) operates Prime Time Freeware (www.ptf.com), a publisher of books about Open Source software. Rich lives in San Bruno, on the San Francisco peninsula.