Copyright (c) 2001 by Rich Morin
published in Silicon Carny, January 2001
The proof-of-concept demo for the Meta Project is a Web-based file tree browser for the FreeBSD distribution. With some 50K directories and files to cover, annotating everything by hand would be impractical. Instead, Rich Morin uses existing man page information and simple pattern recognition.
As mentioned last month, I'm working on a proof-of-concept demonstration for the Meta Project. By eliminating a large number of complicating issues (e.g., multiple operating systems, add-in packages, local variations, distributed access), I've been able to reduce the scale of the demo to something that a single volunteer can reasonably accomplish. The remainder, though highly limited in scope, should still give an idea of Meta's promise.
The demo, in any event, will be a Web-based file tree browser for the FreeBSD distribution. Users will interact with a set of 50K Web pages, each of which describes a given file or directory. A mock-up of a sample Web page is available online at http://www.cfcl.com/Meta/sample.html.
Even with such a limited goal, the amount of information that must be handled is rather imposing. Producing 50K annotation files by hand is out of the question, so a computer-assisted method will be required. My approach is to use existing information (e.g., from man pages) and simple pattern recognition to handle large portions of the filesystem, then create several hundred annotation files to cover gaps in the upper portions of the file tree.
The annotation files are largely in place and a man page parser has been written. My remaining problem is to recognize, understand, and mechanize the patterns I see. Fortunately, FreeBSD's excellent documentation, tool set, and internal organization make this approach quite possible.
File names and extensions
Experienced Unix users are familiar with a large number of file naming
patterns. If a file is named Makefile, we know it's a control
file for make(1).
Similarly, if a file has a well-known extension (e.g., .a, .c, .h, .html),
we can be pretty confident of its type. By using the output of the file(1)
command, Meta can (fairly reliably) say things about the nature of almost
any file. With some contextual help (e.g., from annotation files and the
nature of the enclosing directory), it can determine even more about the file.
Full path names
The full path name of the file contains a great deal of information. In
theory, each node in the path says something about the nature of the
file. Let's look at the file
Any directory named
Nearly all commands have man pages, and the man pages for user commands
are stored in section one. So, we can reasonably expect to find a man
page for
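The naming and path rules described above lend themselves to simple table lookups. The article's own scripts aren't shown, so the following is only a Python sketch, not Meta's code. It guesses at a file's nature from its base name, its extension, and its parent directory. The rule tables are hypothetical illustrations. In particular, treating /usr/bin as the home of section-one commands is an assumption standing in for the directory rules above.

```python
import os

# Hypothetical rule tables, for illustration only.
NAME_RULES = {
    "Makefile": "control file for make(1)",
}
EXTENSION_RULES = {
    ".a": "static library (ar archive)",
    ".c": "C source file",
    ".h": "C header file",
    ".html": "HTML document",
}
# Assumption: files directly in /usr/bin are user commands, so their
# man pages should be in section one.
MAN_SECTION_BY_DIR = {
    "/usr/bin": 1,
}


def classify(path):
    """Return guesses about a file's nature, based only on its path."""
    guesses = []
    base = os.path.basename(path)
    if base in NAME_RULES:
        guesses.append(NAME_RULES[base])
    ext = os.path.splitext(base)[1]  # "" for names such as "Makefile"
    if ext in EXTENSION_RULES:
        guesses.append(EXTENSION_RULES[ext])
    section = MAN_SECTION_BY_DIR.get(os.path.dirname(path))
    if section is not None:
        guesses.append(f"expect man page {base}({section})")
    return guesses
```

For example, classify("/usr/bin/vi") returns ["expect man page vi(1)"]. A real rule set would be far larger and would combine these guesses with the annotation files.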
Similarly, some patterns relate entire directory
sub-trees to each other. In FreeBSD, the build tree for
In this case, the regularity tells us that
SRCDIR= ${.CURDIR}/../../contrib/nvi
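Because a line like the one above follows a fixed form, the link between a build directory and its source directory can be recovered mechanically. This Python sketch expands only ${.CURDIR}, which BSD make sets to the directory in which make runs. It ignores other variables and assignment operators such as +=. The Makefile location used in the example, /usr/src/usr.bin/vi, is an assumption for illustration.

```python
import os
import re

# A simple "SRCDIR= value" assignment; ?=, += and := are not handled.
SRCDIR_RE = re.compile(r"^SRCDIR\s*=\s*(\S+)\s*$")


def resolve_srcdir(makefile_text, curdir):
    """Return the normalized directory that SRCDIR names, or None.

    Only ${.CURDIR} is expanded; BSD make sets it to the directory in
    which make runs, normally the one holding the Makefile.
    """
    for line in makefile_text.splitlines():
        match = SRCDIR_RE.match(line)
        if match:
            value = match.group(1).replace("${.CURDIR}", curdir)
            return os.path.normpath(value)
    return None
```

With that assumed location, resolve_srcdir("SRCDIR= ${.CURDIR}/../../contrib/nvi", "/usr/src/usr.bin/vi") returns "/usr/src/contrib/nvi", tying the build directory to the contributed source tree.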
So, we know that ls(1), stat(3), etc.
The filesystem itself can also provide useful information. By using the
Perl equivalents of
% cd /usr/bin
% ls -i vi
8200 vi
% ls -i * | grep 8200
8200 ex
8200 nex
8200 nvi
8200 nview
8200 vi
8200 view
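The same check is easy to script. In place of the Perl equivalents mentioned above, here is a Python sketch that groups the names in one directory by device and inode number. This is what the ls -i | grep pipeline does by hand. It uses os.lstat so that symbolic links are reported as themselves rather than followed, as ls -i does.

```python
import os
from collections import defaultdict


def hard_link_groups(directory):
    """Group the names in one directory by the inode they refer to.

    Returns a sorted list of sorted name lists, one per inode that has
    more than one name in this directory.
    """
    by_inode = defaultdict(list)
    for name in os.listdir(directory):
        # lstat: report a symbolic link itself, not its target
        st = os.lstat(os.path.join(directory, name))
        by_inode[(st.st_dev, st.st_ino)].append(name)
    return sorted(sorted(names) for names in by_inode.values() if len(names) > 1)
```

Run against /usr/bin on the system shown above, the result would be expected to include the group ['ex', 'nex', 'nvi', 'nview', 'vi', 'view'].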
We now know that
Man pages
Although man pages are not intended for consumption by programs, they
contain a great deal of information that's fairly easy to extract. For
instance, if a set of man pages shares a single file (as in the case of
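As an illustration of how little machinery this takes, the Python sketch below works on the ASCII rendering of a man page. It splits the page at its flush-left, all-capitals section headings. It then pulls out three things:

* the names listed in the NAME section, which reveal when several commands share one page;
* the name(section) references in SEE ALSO;
* anything in the text that looks like an absolute path.

It assumes that overstrike sequences have already been removed (for example, by piping man output through col -b). It also assumes the NAME line uses the usual "name, name - description" form. The regular expressions are rough heuristics, not the author's parser.

```python
import re

# Flush-left, all-capital lines such as "NAME" or "SEE ALSO" start a section.
HEADING_RE = re.compile(r"^[A-Z][A-Z ]*[A-Z]$")
# Cross-references such as "stat(2)" or "curses(3)".
XREF_RE = re.compile(r"\b([A-Za-z0-9_.+-]+)\((\d[a-z]?)\)")
# Absolute paths such as "/etc/passwd"; the lookbehind skips "and/or" and "~/x".
PATH_RE = re.compile(r"(?<![\w/~])/(?:[\w.+-]+/)*[\w.+-]+")


def split_sections(text):
    """Map each section heading to its body, joined into one line."""
    sections = {}
    current = None
    for line in text.splitlines():
        if HEADING_RE.match(line):
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line.strip())
    return {head: " ".join(l for l in body if l) for head, body in sections.items()}


def extract_links(text):
    """Return the names, SEE ALSO references, and paths found in a man page."""
    sections = split_sections(text)
    # "ex, vi, view - text editors" -> ["ex", "vi", "view"]
    name_part = sections.get("NAME", "").split(" - ")[0]
    names = [n.strip() for n in name_part.split(",") if n.strip()]
    see_also = [f"{name}({sec})" for name, sec in XREF_RE.findall(sections.get("SEE ALSO", ""))]
    # Strip sentence-ending periods that the path pattern would otherwise keep.
    paths = sorted({p.rstrip(".") for p in PATH_RE.findall(text)})
    return {"names": names, "see_also": see_also, "paths": paths}
```

Parsing the troff source instead, as the article proposes, would avoid most of this guesswork, because the macros mark names, references, and paths explicitly.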
There are also some useful sections, such as Files and See Also, that can be parsed for linkage information. Finally, the main body of the man page text can be scanned for things that look like path names. For this demo, my scripts are reading the ASCII versions of the man pages; ultimately, however, Meta should be able to parse the troff source code in order to get more precise information.
Graph theory
By bringing together the results of these kinds of rules and my own
annotations regarding file linkage, Meta can assemble quite a large set
of relationships. Then, using some simple graph theory, it can make
educated guesses about which files are related enough to display on a
given file's Web page. For instance, Meta can use the linkage
information I entered for
Future directions
Once the basic Web pages are in place, there are a number of directions that Meta development could take. One intriguing possibility would be the addition of data flow diagrams for assorted subsystems, set up as clickable image maps. This could allow the user to see the relationships between sets of files, using the image map to access descriptions of the depicted files and links.
Another possibility, once the pattern recognition and file parsing code are well in hand, would be to add the contents of the FreeBSD Ports Collection to the mix. This would allow users to see how their system might look if, say, Apache or Majordomo were installed. Additional volunteers would, however, be needed to provide manual annotation for the add-on packages.
About the author
Rich Morin (rdm@cfcl.com) operates Prime Time Freeware (www.ptf.com), a publisher of books about Open Source software. Rich lives in San Bruno, on the San Francisco peninsula.