Building a Browser for the FreeBSD File Tree

Copyright (c) 2001 by Rich Morin
published in Silicon Carny, January 2001


The proof-of-concept demo for the Meta Project is a Web-based file tree browser for the FreeBSD distribution. With some 50K directories and files to cover, annotating everything by hand would be impractical. Instead, Rich Morin uses existing man page information and simple pattern recognition.

As mentioned last month, I'm working on a proof-of-concept demonstration for the Meta Project. By eliminating a large number of complicating issues (e.g., multiple operating systems, add-in packages, local variations, distributed access), I've been able to reduce the scale of the demo to something that a single volunteer can reasonably accomplish. The remainder, though highly limited in scope, should still give an idea of Meta's promise.

The demo, in any event, will be a Web-based file tree browser for the FreeBSD distribution. Users will interact with a set of 50K Web pages, each of which describes a given file or directory. A mock-up of a sample Web page is available online at http://www.cfcl.com/Meta/sample.html.

Even with such a limited goal, the amount of information that must be handled is rather imposing. Producing 50K annotation files by hand is out of the question, so a computer-assisted method will be required. My approach is to use existing information (e.g., from man pages) and simple pattern recognition to handle large portions of the filesystem, then create several hundred annotation files to cover gaps in the upper portions of the file tree.

The annotation files are largely in place and a man page parser has been written. My remaining problem is to recognize, understand, and mechanize the patterns I see. Fortunately, FreeBSD's excellent documentation, tool set, and internal organization make this approach quite possible.

File names and extensions

Experienced Unix users are familiar with a large number of file naming patterns. If a file is named Makefile, we know it's a control file for make(1). A file named readme (in whatever capitalization scheme) is obviously a text document.

Similarly, if a file has a well-known extension (e.g., a, c, h, html), we can be pretty confident of its type. The file(1) command makes use of extensions, along with magic numbers and other clues, in guessing the nature of files.

By using the file command's output, Meta can (fairly reliably) say things about the nature of almost any file. Adding some contextual help (e.g., from annotation files and the nature of the enclosing directory), it can determine even more information about the file.
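As a rough illustration of the name-based part of this, here is a minimal sketch in Python (the demo's own scripts are in Perl). The lookup tables, the descriptions in them, and the function name guess_by_name are all assumptions made up for this example; a real version would combine this with the output of file(1).

```python
import os

# Hypothetical lookup tables; the wording of each description is illustrative.
KNOWN_NAMES = {
    "makefile": "control file for make(1)",
    "readme": "text document",
}
KNOWN_EXTENSIONS = {
    ".a": "library archive",
    ".c": "C source file",
    ".h": "C header file",
    ".html": "HTML document",
}

def guess_by_name(path):
    """Return a rough description based on the file name alone, or None."""
    base = os.path.basename(path)
    # Whole-name matches are tried first, ignoring capitalization.
    if base.lower() in KNOWN_NAMES:
        return KNOWN_NAMES[base.lower()]
    # Otherwise fall back to the extension, if it is one we recognize.
    _, ext = os.path.splitext(base)
    return KNOWN_EXTENSIONS.get(ext.lower())
```

A file such as /usr/bin/vi matches neither table, so the function returns None and the decision is left to other clues.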

Full path names

The full path name of the file contains a great deal of information. In theory, each node in the path says something about the nature of the file. Let's look at the file /usr/bin/vi to see what we can find out from its name.

Any directory named bin is likely to contain commands that are executable by users, so vi is likely to be a user command. Because FreeBSD's /usr directory is not mounted in single-user mode, the files in /usr cannot be critical to basic system operation. So, /usr/bin/vi cannot be a critical command.

Nearly all commands have man pages, and the man pages for user commands are stored in section one. So, we can reasonably expect to find a man page for vi(1). The formatted (ASCII text) version of the man page, which can be displayed with cat(1), will be located in a file named /usr/share/man/cat1/vi.1.gz; the troff source code, if present, will be located in /usr/share/man/man1/vi.1.gz.
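The rule can be written down in a few lines. The following Python sketch is only an illustration: the function name, the one-entry table, and the default man_root of /usr/share/man (the location used in the man page discussion later in this article) are assumptions, and the candidate files still have to be checked for existence.

```python
import posixpath

# Assumption for this sketch: only directories named "bin" are mapped,
# and they are mapped to section 1 (user commands).
SECTION_BY_DIR = {"bin": "1"}

def man_page_candidates(command_path, man_root="/usr/share/man"):
    """List the formatted and troff man page files we would expect to find."""
    directory, name = posixpath.split(command_path)
    section = SECTION_BY_DIR.get(posixpath.basename(directory))
    if section is None:
        return []  # no rule applies; make no guess
    page = f"{name}.{section}.gz"
    return [
        posixpath.join(man_root, f"cat{section}", page),  # formatted text
        posixpath.join(man_root, f"man{section}", page),  # troff source
    ]
```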

Similarly, there are also patterns that relate entire directory sub-trees to each other. In FreeBSD, the build tree for /usr/bin is named /usr/src/usr.bin; the output files are stored in /usr/obj/usr/src/usr.bin. This regularity was needed to make the system build process work smoothly, but it can also be used by Meta!
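A sketch of this mapping, in Python, might look like the following. The function name and the default roots are assumptions for the example, and the rule is only a pattern: it should hold for the base system directories, but the result must still be checked against the actual tree.

```python
import posixpath

def build_dirs(installed_path, src_root="/usr/src", obj_root="/usr/obj"):
    """Guess the source and object directories for an installed file."""
    directory, name = posixpath.split(installed_path)
    # /usr/bin becomes usr.bin: drop the leading slash, join with dots.
    tree = directory.strip("/").replace("/", ".")
    src = posixpath.join(src_root, tree, name)
    # The object tree repeats the full source path under obj_root.
    obj = posixpath.join(obj_root, src.lstrip("/"))
    return src, obj
```

For /usr/bin/vi this yields /usr/src/usr.bin/vi and /usr/obj/usr/src/usr.bin/vi.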

In this case, the regularity tells us that /usr/src/usr.bin/vi is the build directory for /usr/bin/vi. Looking at the build directory, we see that it contains only a Makefile and some C header files. The Makefile, however, tells us that:

    SRCDIR=  ${.CURDIR}/../../contrib/nvi

So, we know that /usr/src/contrib/nvi contains the remaining source code for the command. The Web page can thus include links to both source directories.
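Extracting this link mechanically is straightforward for the simple case shown. The Python sketch below is an illustration only: it handles a single SRCDIR assignment and the ${.CURDIR} variable, not general make(1) syntax, and the function name is made up for the example.

```python
import posixpath
import re

# Matches a line such as:  SRCDIR=  ${.CURDIR}/../../contrib/nvi
ASSIGNMENT = re.compile(r"^\s*SRCDIR\s*=\s*(\S+)", re.MULTILINE)

def extra_source_dir(makefile_text, build_dir):
    """Resolve the SRCDIR setting of a Makefile found in build_dir, or None."""
    match = ASSIGNMENT.search(makefile_text)
    if match is None:
        return None
    # ${.CURDIR} stands for the directory in which make was started.
    value = match.group(1).replace("${.CURDIR}", build_dir)
    # Collapse the ../.. components into a plain absolute path.
    return posixpath.normpath(value)
```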

ls(1), stat(2), etc.

The filesystem itself can also provide useful information. By using the Perl equivalents of ls(1) and stat(2), Meta can find out assorted useful things. For instance, the link count for /usr/bin/vi is six, indicating there are five other names for the file. Because these names will all have the same inode number, we can find them by examining the /usr filesystem:

    % cd /usr/bin
    % ls -i vi
    8200 vi
    % ls -i * | grep 8200
    8200 ex
    8200 nex
    8200 nvi
    8200 nview
    8200 vi
    8200 view

We now know that ex, nex, nvi, nview, vi, and view all share the same binary; the program checks the name it was invoked under to decide how to behave. This is valuable information: these files are very strongly related! It also helps explain why there are no explicit build directories for ex, nex, etc. So, if we need to create a Web page for /usr/bin/ex, we can correctly fill in the build directory links.
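The same search can be done from a script. Here is a minimal sketch in Python rather than Perl; the function name is hypothetical, and it looks in a single directory only, whereas other names for a file could live anywhere on the same filesystem.

```python
import os
from collections import defaultdict

def hard_link_groups(directory):
    """Group the names in one directory that refer to the same inode."""
    groups = defaultdict(list)
    for entry in os.scandir(directory):
        info = os.lstat(entry.path)
        # A link count above one means the file has at least one other name.
        if entry.is_file(follow_symlinks=False) and info.st_nlink > 1:
            # Inode numbers are only unique within one device.
            groups[(info.st_dev, info.st_ino)].append(entry.name)
    # A one-name group means the other names are in some other directory.
    return sorted(sorted(names) for names in groups.values())
```

Run against /usr/bin on the system described above, one of the returned groups should be the six names in the listing.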

Man pages

Although man pages are not intended for consumption by programs, they contain a great deal of information that's fairly easy to extract. For instance, if a set of man pages shares a single file (as in the case of /usr/share/man/cat1/vi.1.gz), the commands described are sure to be closely related (though not necessarily, as in this case, the same file).

There are also some useful sections, such as Files and See Also, that can be parsed for linkage information. Finally, the main body of the man page text can be scanned for things that look like path names. For this demo, my scripts are reading the ASCII versions of the man pages; ultimately, however, Meta should be able to parse the troff source code in order to get more precise information.
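To give an idea of what such parsing involves, here is a small Python sketch that works on the text of a formatted man page. The function names and regular expressions are assumptions for this example, not the demo's parser. It assumes that section headings start in column one, that body text is indented, and that bold or underlined text is encoded with backspace overstrikes, as nroff output commonly does.

```python
import re

XREF = re.compile(r"\b([A-Za-z0-9_.+-]+)\((\d[a-z]?)\)")
# Two or more components, so that a lone "/" or "/tmp" is not picked up.
PATHNAME = re.compile(r"(?<![\w/])(/[\w.+-]+(?:/[\w.+-]+)+)")

def strip_overstrike(text):
    """Remove the character-backspace pairs used for bold and underline."""
    return re.sub(r".\x08", "", text)

def section_text(page, heading):
    """Return the indented body that follows one unindented heading."""
    body, inside = [], False
    for line in page.splitlines():
        if not inside:
            inside = line.strip().upper() == heading.upper()
        elif line and not line[0].isspace():
            break  # the next heading ends the section
        else:
            body.append(line)
    return "\n".join(body)

def see_also(page):
    """Return (name, section) pairs from the SEE ALSO section."""
    return XREF.findall(section_text(page, "SEE ALSO"))

def path_names(page):
    """Return things in the page that look like absolute path names."""
    found = {match.rstrip(".") for match in PATHNAME.findall(page)}
    return sorted(found)
```

A page would be passed through strip_overstrike first, then handed to see_also and path_names. The path pattern is deliberately conservative, so single-component names are missed; that trade-off is a choice made for this sketch.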

Graph theory

By bringing together the results of these kinds of rules and my own annotations regarding file linkage, Meta can assemble quite a large set of relationships. Then, using some simple graph theory, it can make educated guesses about which files are related enough to display on a given file's Web page. For instance, Meta can use the linkage information I entered for /usr/share/vi/{catalog,perl,tcl} to pull their names into the Web page for /usr/bin/vi. A bit of tuning will no doubt be required; if Meta were to follow every link it found, recursively, it would eventually bring in most of the file tree! I expect to find, however, that some simple rules will suffice to keep this from happening.
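One candidate for such a rule is a limit on how many links away from the starting file Meta is willing to go. The Python sketch below shows a breadth-first walk with a hop limit. It is an assumption about what a "simple rule" could look like, not the rule the demo uses, and the function name and max_hops parameter are made up for the example.

```python
from collections import deque

def related_files(links, start, max_hops=1):
    """Return {path: hops} for files within max_hops links of start."""
    # Treat each link as undirected: if A mentions B, B is related to A.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # far enough; do not expand this node
        for other in sorted(neighbors.get(node, ())):
            if other not in hops:
                hops[other] = hops[node] + 1
                queue.append(other)
    del hops[start]
    return hops

# Sample links; the last two entries are invented for illustration.
LINKS = [
    ("/usr/bin/vi", "/usr/share/vi/catalog"),
    ("/usr/bin/vi", "/usr/share/vi/perl"),
    ("/usr/bin/vi", "/usr/share/vi/tcl"),
    ("/usr/share/vi/perl", "/usr/bin/perl"),
    ("/usr/bin/perl", "/usr/lib/libperl.a"),
]
```

With max_hops=1, the walk from /usr/bin/vi stops at the three /usr/share/vi entries; raising the limit pulls in progressively more of the tree, which is exactly the behavior that needs tuning.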

Future directions

Once the basic Web pages are in place, there are a number of directions that Meta development could take. One intriguing possibility would be the addition of data flow diagrams for assorted subsystems, set up as clickable image maps. This could allow the user to see the relationships between sets of files, using the image map to access descriptions of the depicted files and links.

Another possibility, once the pattern recognition and file parsing code are well in hand, would be to add the contents of the FreeBSD Ports Collection to the mix. This would allow users to see how their system might look if, say, Apache or Majordomo were installed. Additional volunteers would, however, be needed to provide manual annotation for the add-on packages.

About the author

Rich Morin (rdm@cfcl.com) operates Prime Time Freeware (www.ptf.com), a publisher of books about Open Source software. Rich lives in San Bruno, on the San Francisco peninsula.