Integrating System Metadata and Documentation
Rich Morin, rdm@cfcl.com
Canta Forda Computer Laboratory
The modular nature of Unix and similar operating systems
provides great adaptability,
but there is a corresponding cost in complexity.
By integrating system metadata and documentation,
this complexity can be tamed.
Sections:
The Documentation Problem,
The Metadata Problem,
What's Under the Hood?,
Eclectic Systems,
What's the Problem?,
Existing Work,
Intriguing Ideas,
Design Goals,
The Meta Proposal,
Why XML?,
Possible Data Sets,
Supported Software,
Project Status,
Getting Started,
File Characterization and Conversion,
Other Opportunities,
The Meta Wiki
The Documentation Problem
Measured by sheer volume,
Unix and related systems are superbly documented.
We have a wealth of online and printed documentation.
Unfortunately, sheer volume isn't everything
or even (really) enough.
If administrators, programmers, or users
can't find the information they want,
in a reasonable period of time,
the documentation isn't doing its job.
And, to a large degree, it isn't.
The reason, in a nutshell,
is lack of integration.
Although individual documentation sets
(e.g., man pages, the perlinfo suite)
may be reasonably well integrated, the entirety is not.
To take a simple example,
consider the problem
of finding out the contents of an inode.
On my FreeBSD system,
I find man pages for
inode(5),
ls(1), and
stat(2).
These are all reasonable places to look;
why is it that none of them
contains a See Also reference to the others?
More generally,
although Perl has a fine stat function,
as well as some handy file test operators,
I won't find them listed as first-class citizens
in the man pages.
Rather, they are hidden down
in the perlfunc(1) page,
well out of the reach of apropos(1).
Multiply this by dozens of languages
and hundreds of tools
and you begin to see the problem.
In order to find the information I need,
I have to know which documentation subsystem to ask.
This, in a word, is broken.
I shouldn't have to know where information is located
in order to find it; that's what computers are for!
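To make the complaint concrete, here is a minimal Python sketch of the kind of single index this calls for. The entries and summaries are illustrative stand-ins, not text copied from real man or perlfunc pages, and a real index would harvest them automatically. The only point is that one lookup can span several documentation subsystems.

```python
from collections import defaultdict

# Illustrative entries: (subsystem, entry name, one-line summary).
# A real index would extract these from man page NAME sections,
# perlfunc(1), info nodes, package documentation, and so on.
ENTRIES = [
    ("man", "stat(2)", "get file status"),
    ("man", "ls(1)", "list directory contents"),
    ("man", "inode(5)", "format of a file system volume"),
    ("perlfunc", "stat", "return a file's status information"),
    ("perlfunc", "-X", "a file test operator"),
]


def build_index(entries):
    """Map each lower-cased word of a name or summary to the entries using it."""
    index = defaultdict(set)
    for subsystem, name, summary in entries:
        text = f"{name} {summary}".lower().replace("(", " ").replace(")", " ")
        for word in text.split():
            index[word].add((subsystem, name))
    return index


def lookup(index, keyword):
    """Return matches from every subsystem, sorted for stable output."""
    return sorted(index.get(keyword.lower(), set()))


if __name__ == "__main__":
    idx = build_index(ENTRIES)
    print(lookup(idx, "stat"))  # [('man', 'stat(2)'), ('perlfunc', 'stat')]
```

A query for "stat" finds both the system call and the Perl function, without the user having to know that perlfunc(1) exists.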
The situation gets even worse when we move away
from indexed documentation subsystems.
I wouldn't expect a book or a magazine
to have an online keyword index
(though that isn't such a bad idea!),
but what about research papers
and other documents
that accompany software packages?
Far from being indexed online,
these documents are very seldom mentioned
in the online documentation.
Worse, they may be in any of a variety of formats,
forcing the user to hunt for (and perhaps acquire)
the appropriate formatting tool(s),
just to see if the paper is actually useful.
"Information delayed is information denied"
(or some such sweeping assertion :-).
If it takes too much time and effort
to find the information I need,
I might as well not have it at all.
The Metadata Problem
If, as asserted above,
the documentation is broken,
the integration of system metadata is a complete disaster.
With sufficient effort,
almost any desired piece of metadata can be acquired.
The user might have to parse a data file,
run a command,
or grovel through some source code,
but it's all there.
Finding the appropriate data file, command, or source code is,
of course, the user's problem.
Just look it up in the documentation (:-).
Wouldn't it be a better idea for our community
to make all of this information
more easily available?
Having ascertained the information once,
why not make it readily available for everyone?
Simply storing copies of the system metadata
in consistent and documented locations and formats
would be a great start.
Folding in the system documentation
provides a complete resource for system information.
And that, indeed, is what Meta is all about.
What's Under the Hood?
Bob Young, of Red Hat,
compares binary software distributions to cars
that have the "hood welded shut".
Access to source code "opens the hood",
but it is far from a complete solution.
Without comprehensive documentation,
source code access can lead prospective maintainers
(however well qualified otherwise)
into wasted effort, frustration, and error.
To work productively,
car mechanics need a fair amount of documentation:
shop manuals,
catalogs of replacement and add-on parts, etc.
A mechanic at a car dealership will have all of these,
possibly in online form.
Back at the factory,
car designers and engineers will have access
to the complete manufacturing specifications
for every part in every model of the car, year by year.
In commercial aviation,
similar systems are used
to track the complete configuration of each airplane
in a fleet.
If a gasket is replaced,
the appropriate database record is updated.
Consequently, an airline mechanic never has to wonder
about the precise configuration or functioning
of a given subsystem.
Problems are also tracked globally,
generating replacement bulletins for problematic components.
Thus, mechanics are automatically informed
when a given part needs
to be checked, adjusted, repaired, or replaced.
As complicated as cars and airplanes are, however,
they are far simpler than operating systems.
Thus, computer operating systems could benefit even more
from this kind of support.
If every part of the operating software
on a running system were cataloged and documented,
a maintainer (e.g., administrator or developer)
would never have to guess at a file's format or purpose.
Like airplane parts and subsystems,
files (e.g., /etc/passwd)
and software packages (e.g., binutils, Sendmail)
can be described and tracked.
Although parts and files may be modified,
they seldom disappear or change in function.
Similarly, subsystems and packages tend to be
fairly consistent in content and structure
from release to release.
Thus, once written,
human annotation
(i.e., non-automated documentation)
needs little "maintenance".
Online catalogs of "replacement parts",
complete with descriptions and dependency information,
could allow maintainers
to perform fast and safe (sub)system upgrades.
With appropriate safeguards,
automated systems could perform recommended upgrades,
keeping systems up to date with important patches.
Not every administrator will want automated upgrades,
but most would be happy
to have current, browsable system documentation.
If all possible add-on packages were also documented
(including dependencies and caveats),
many installation hassles and risks would be reduced.
Eclectic Systems
Unix variants (e.g., HP-UX, Solaris, UnixWare)
and their Open Source cousins (e.g., *BSD, Linux)
play a major role in the world of computing.
In fact, these systems have become part
of the infrastructure of enterprises around the world.
Thus, their stability is of critical interest
to a large number of institutions.
Unlike more proprietary operating systems, however,
they are not the monolithic creations
of individual corporations.
Instead, they are assembled (and customized)
by a variety of "integrators"
(e.g., Debian, FreeBSD, Red Hat, Sun),
using "packages" (e.g., GNU tools, Perl, Sendmail)
that have been developed and maintained
by a wide-ranging community
of users, administrators, and programmers.
Thus:
- Distributions are highly modular at the file level.
Distributions contain thousands of files,
generating still more files as they are used.
Files can be added, modified, or replaced
to customize system behavior.
Taking advantage of this,
local sites commonly add commands,
customize subsystems, etc.
- The development process is distributed and cooperative.
Distributions contain elements (programs and complete subsystems)
developed by individuals and organizations around the world.
No centralized authority mandates engineering practices;
voluntary cooperation (i.e., "rough consensus and running code")
takes its place.
There is, at present,
no collective term which describes the union
of Unix and Open Source operating systems.
The term "Unixish systems" is memorable, if rather informal,
but legal constraints prevent its large-scale use.
So, given the extremely eclectic nature of these systems,
I will refer to them as "Eclectic Systems".
If this usage catches on,
we can adopt "Eclectix" as the generic name (:-).
What's the Problem?
This combination of complexity and variability
can make Eclectic Systems difficult to administer.
Automated administration tools
can solve a large class of "expected" problems,
but the human administrator must be able
to "jump in" when things break.
Finding out the purpose, or even the format,
of a given file
can be an exercise in frustration;
modifying files blindly can be a prelude to disaster!
And, although security concerns provide strong reasons
for administrators to keep versions current,
the difficulty of doing so is overwhelming to many.
Maintaining a truly robust and secure system
is such a time-consuming task
that many administrators simply give up
on following patches and minor updates,
relying instead on occasional major updates (and luck).
The sheer scale of Eclectic Systems
is a large part of the problem.
A typical system contains several thousand files;
any number of other "packages"
(drawn from a cast of thousands)
may have been added over time.
Running systems thus tend to be both unique and complex.
System administrators cannot be expected to remember all
of the details involved in a running system.
Good documentation helps,
but no static set of documentation
can ever be complete or current,
given the dynamic nature of production computers
and Open Source packages.
Only an integrated system
for documentation and metadata,
tracking both the local system configuration
and the world of Open Source offerings,
can provide current and complete information.
Fortunately, such a system is quite feasible.
Existing Work
A great deal of documentation and metadata
infrastructure exists for Eclectic Systems,
but it does not form a cohesive "system".
Here are some representative examples;
the Meta Wiki (see below)
contains a much more complete list.
Any significant Open Source package
will have an automated "build" mechanism,
generally supported by some form of make.
The GNU configuration and build suite
(e.g., autoconf, configure)
has been adopted by many developers,
standardizing (and greatly simplifying) the build process
for most packages on most systems.
Debian, FreeBSD, and Red Hat have each developed useful
and popular package management systems;
each of these systems supports a few thousand packages.
The systems differ in assorted respects,
but they all support automated installation of packages,
in either source or binary form.
All OS distributors, of course,
have "build" systems for their base distributions.
None of these systems provides much support
for automated documentation, however.
FileWatcher, Freshmeat, and other indexing systems
track thousands of Open Source packages
(FileWatcher and Freshmeat each track nearly 10K packages).
These systems contain high-level descriptions,
version information, etc.
SourceForge provides a range of support services,
from indexing and CVS access through email and Web presence
for Open Source packages.
Most online indexing systems are strongly optimized for
interactive use.
Provision for automated access is thus quite sketchy.
The CPAN (Comprehensive Perl Archive Network), however,
now provides a way to get XML responses to search queries.
Most popular packages include both user-
and programmer-level documentation.
This may be encoded in any of a number of formats
(e.g., ASCII, HTML, "man" pages, PDF, PostScript, TeX, troff).
Tools are typically available for manipulating
(e.g., browsing, converting, indexing, and/or searching)
all common formats.
Not every facility exists for each format, however.
Worse, no single tool handles all formats
and no universal "exchange" format has been adopted.
As a result, browsing and indexing
are only available for limited, disconnected subsets
of the available packages and documents.
In addition, users must learn (and remember)
how to operate (and perhaps administer) a number of tools.
It can thus involve considerable effort to "browse"
(i.e., search, examine, print)
documents in "unfamiliar" formats.
At the very least, this stifles examination
of new packages.
Finding out the formats (or even purposes)
of the files in a subsystem can be difficult.
Little existing documentation concerns itself
with individual files, let alone with file relationships.
Although the "man" pages of Eclectic Systems document many files,
most of these are executables.
Some control files are also written up,
but few other files receive explicit attention.
Many magazine articles, papers, and books
cover Open Source offerings.
Even when they are in electronic form, however,
their formats (e.g., HTML, PDF, PostScript)
do not lend themselves to automated indexing.
As a result, they are not well integrated
into a documentation "system".
In summary, a great deal of metadata exists
for Open Source software,
covering acquisition, building, installation,
modification, and use.
Unfortunately, it is not "tied together"
into a convenient, well-integrated whole.
Assorted problems
(e.g., name spaces, file formats, tool differences)
complicate maintenance and use
of both the packages and their metadata.
On a more global scale,
package management and indexing systems
do not share information in any formal way.
Much of the metadata collected for each package
by these systems is common,
but no formal mechanisms exist
for converting or exchanging this information.
For example, each FreeBSD "package maintainer"
must track (and discover!) package changes,
using informal mechanisms
(e.g., release notes and email),
supplemented by code examination and testing.
The resulting knowledge is then buried
in FreeBSD package configuration files.
Intriguing Ideas
Some intriguing ideas have been suggested,
and even tried out, in this general area.
A few are discussed below;
the Meta Wiki has a more complete list.
In "The Case for a New Business Model"
(Communications of the ACM, August 2000),
Philip G. Armour argues that software is
(like DNA, brains, hardware, and books)
a medium for storing knowledge.
Unfortunately, even when the software "works",
the domain-specific knowledge it embodies
tends to be buried in the source code.
I would contend that system metadata
is a perfect example of this phenomenon.
The XML-based Open Software Description Format (OSD)
is "a vocabulary used for describing software packages
and their dependencies for heterogeneous clients".
In addition to handling common packaging issues,
it deals with the problem of employing "push" technology
to update client machines.
Apple's A/UX system contained an annotated list
of every system file an administrator was likely to find
on a "vanilla" system.
In addition, a "flat file data base"
contained the initial characteristics
(e.g., checksum, size, permissions)
and acceptable variations (e.g., "must not get smaller")
for all files.
A system utility was able to use this data base
to perform emergency replacement of damaged system files.
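The A/UX approach is easy to approximate. The following Python sketch records a baseline (checksum, size, permissions) for a file and checks the file against a per-file rule. SHA-256, the rule names "exact" and "must-not-shrink", and the functions `characterize` and `check` are assumptions for illustration, not A/UX's actual data base format or utility.

```python
import hashlib
import os
import stat
import tempfile


def characterize(path):
    """Record a file's checksum, size, and permission bits."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    st = os.stat(path)
    return {"checksum": digest, "size": st.st_size, "mode": stat.S_IMODE(st.st_mode)}


def check(path, baseline, rule):
    """Compare a file with its baseline; return a list of problems (empty if OK).

    Permissions must always match.  The (hypothetical) rule then decides
    what else may vary:
      "exact"           - contents must be unchanged
      "must-not-shrink" - contents may change, but size may not decrease
    """
    if not os.path.exists(path):
        return ["missing"]
    current = characterize(path)
    problems = []
    if current["mode"] != baseline["mode"]:
        problems.append("permissions changed")
    if rule == "exact" and current["checksum"] != baseline["checksum"]:
        problems.append("contents changed")
    if rule == "must-not-shrink" and current["size"] < baseline["size"]:
        problems.append("file got smaller")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "messages")
        with open(log, "w") as f:
            f.write("boot ok\n")
        base = characterize(log)
        with open(log, "w") as f:  # rewrite the file with shorter contents
            f.write("boot\n")
        print(check(log, base, "must-not-shrink"))  # ['file got smaller']
```

Keeping the rule alongside the baseline is what lets a log file grow while a binary must stay byte-for-byte identical.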
Assorted package management and system administration tools
track the configuration (e.g., file characteristics and/or
installed packages) of running systems.
In general, however, these are not tied
into any overall documentation (let alone metadata) system.
Design Goals
Obviously, none of the efforts described above qualify as
"integrated metadata and documentation systems".
They demonstrate, however,
that significant parts of such a system are feasible.
Let's attempt to establish some design goals for such a system:
- Collaborative
The system should facilitate collaboration,
making it convenient for users, programmers, and projects
to share information.
Thus, collaborative support tools
(e.g., cite, CVS, Faqomatic, Jabber, wiki)
are an important part of the system infrastructure.
- Cooperative
The system must coexist gracefully
with existing systems and procedures.
Developers, for instance,
cannot be expected to restructure their packages
in order to "fit them into the system".
Nor should administrators be required
to log every change they make to their systems
(though some might, if it were easy).
- Evolutionary
The system should evolve out
of existing information systems
(e.g., the "man" pages, the FreeBSD Ports Collection).
Existing information should be "brought into the system"
through file characterization and (where appropriate) conversion
into a common, indexable format.
- Independent
The system should not be tied
to any particular operating system, computer architecture,
programming language, or package format.
Independence from human languages and cultures
is also a worthwhile goal, where practical.
- Modular
Just as applications simply "plug into" the defined interfaces
of the operating system,
package metadata should plug into an overall documentation system.
"man" pages currently do this, by and large,
but they are essentially unique in this attribute.
- Orthogonal
Each part of the system should concern itself only
with matters in its own province.
For example, a package's information
should specify its installation requirements,
not the settings for every possible target environment.
- Scalable
The system should be very scalable,
able to expand gracefully as new packages, OS environments,
information, and needs are added.
- Standardized
System interfaces should be standardized,
taking advantage of existing standards,
where appropriate.
- Voluntary
The system should not require participation,
let alone assistance, from any individual or organization.
The Meta Proposal
"Meta" is a proposed system
which is intended to meet these goals.
Instead of specifying all possible tools,
Meta defines a common interface
(i.e., access mechanism, data structure, database,
document definition, exchange format, language, view, ...)
to support them.
Thus, Meta is simply a unified interface
for system information,
where both "system" and "information"
are interpreted very broadly.
Less formally,
Meta is a convenient and consistent way
for programs (and thereby, humans)
to access a wide range of "system" information.
The interface is defined
as a spanning set of "metadata" files,
rather than any particular programs or libraries.
Just as any program can read /etc/passwd,
any properly-constructed program
will be able to make use of the metadata files.
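As an illustration, assume a package description is stored as a small XML file. The element and attribute names below (`package`, `purpose`, `version`, `requires`, `file`, `role`) and the package `frobnitz` are hypothetical, since Meta's schemas have not yet been defined. The sketch only shows that an ordinary program, using Python's standard `xml.etree.ElementTree` module, can read such a file about as easily as /etc/passwd.

```python
import xml.etree.ElementTree as ET

# A hypothetical Meta package description (element names are invented).
PACKAGE_XML = """\
<package name="frobnitz">
  <purpose>Example network daemon</purpose>
  <version id="1.2"/>
  <version id="1.3"/>
  <requires package="libfoo"/>
  <file path="/etc/frobnitz.conf" role="configuration"/>
  <file path="/usr/local/sbin/frobnitzd" role="executable"/>
</package>
"""


def summarize(xml_text):
    """Pull out the facts a browsing or installation tool might need."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "purpose": root.findtext("purpose"),
        "versions": [v.get("id") for v in root.findall("version")],
        "requires": [r.get("package") for r in root.findall("requires")],
        "config_files": [f.get("path") for f in root.findall("file")
                         if f.get("role") == "configuration"],
    }


if __name__ == "__main__":
    print(summarize(PACKAGE_XML)["config_files"])  # ['/etc/frobnitz.conf']
```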
The metadata files are abstract and declarative,
rather than imperative.
Unlike the package- and OS-specific "make" files
of the Ports Collection,
Meta has description files for packages,
hardware and software environments,
administrative preferences, etc.
Although this places complex burdens
on the supporting software,
the result is extremely worthwhile:
only one description file
needs to be created and maintained
for each package.
Further, by use of multiple inheritance,
many types and levels of "preferences"
can be specified.
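Meta does not yet specify how inherited preferences are resolved. As one possible model, the sketch below borrows Python's own class inheritance, whose C3 method resolution order decides which level wins when several levels supply the same setting. The levels (`CompanyPrefs`, `NodePrefs`, `DebugBuild`, `UserPrefs`) and their settings are hypothetical.

```python
# Each class is one hypothetical preference level; class attributes are settings.
class CompanyPrefs:
    prefix = "/usr/local"
    cflags = "-O"
    debug = False


class NodePrefs(CompanyPrefs):
    prefix = "/opt"  # this node installs elsewhere


class DebugBuild:
    cflags = "-g"
    debug = True


class UserPrefs(DebugBuild, NodePrefs):
    """A user's debug build on this node; DebugBuild is consulted before NodePrefs."""


def explain(prefs, key):
    """Return (value, supplying level), searching levels in C3 resolution order."""
    for level in prefs.__mro__:
        if key in vars(level):
            return vars(level)[key], level.__name__
    raise KeyError(key)


if __name__ == "__main__":
    for key in ("prefix", "cflags", "debug"):
        print(key, explain(UserPrefs, key))
```

Reporting which level supplied a value matters as much as the value itself: an administrator puzzled by a build setting can see at once whether it came from the company, the node, or the build type.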
Meta's file formats are defined
by XML (Extensible Markup Language) Schemas,
allowing them to be carefully structured and defined,
yet extensible.
XML is an increasingly popular standard,
supported by a large body of documentation,
programming language interfaces,
development tools, and expert practitioners.
Note:
Meta is being defined and prototyped for Eclectic Systems,
whose extremely modular nature
both invites assistance and helps to make the Meta approach feasible.
In addition, the large subset of Open Source technologies in Eclectic Systems
eases acquisition of system metadata.
These points aside,
there is nothing that would keep the Meta approach
from being adaptable to other environments.
Why XML?
The first thing that should be emphasized
about the use of XML
(or any other data representation language)
is that "mortals" (i.e., uninterested parties)
need not be aware of its use.
Thus, XML files need not be seen by ordinary users,
any more than, say,
makefiles are in the Ports Collection.
Nor would developers be required
to interact directly with XML files.
Either a forms-based interface or a "little language"
could be used to capture and/or edit the information.
The useful characteristics of XML
(e.g., strict declarative syntax,
tight yet extensible definitions)
help to enforce a level of discipline
and thereby maintain Meta's portability
and robustness.
In addition, the current work on XSL (Extensible Stylesheet Language)
and XSLT (XSL Transformations)
promises to yield powerful tools for data transformation.
XML is a well-defined
and increasingly popular standard
for publishing computer- and human-readable documents.
Notably, several major Open Source projects have committed
to adopting the DocBook DTD
(an XML-based format which is optimized
for technical documentation).
The definition of the term "publishing", however,
deserves a closer look.
There is little essential difference
between formatting a set of documents
(e.g., using LaTeX, TeX, and PostScript)
and building a binary package or software distribution.
In fact, make(1) is commonly employed in both tasks.
By the same token,
XML "style sheets" can be used to define many processes
(e.g., acquisition, building, installation)
that are needed to maintain operating system software.
By defining data types, both make and XML
enable a variety of data transformations.
Saying that a piece of text is a heading
and saying that a file contains C source code
(or a JPEG image)
are both examples of this.
In makefiles, however,
many of the transformations are "hard coded".
This "early binding" can detract greatly
from the flexibility and portability of makefiles.
XML documents, in contrast,
do not specify the exact nature of their data types,
let alone the types of transformations to be performed.
Similarly, style sheets are decoupled
from the implementation details of transformations.
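The difference between early and late binding can be sketched in a few lines of Python. Files carry only a type tag; separate, hypothetical "style sheets" decide what to do with each type. The type names and the converter names `c-to-html` and `text-to-html` are placeholders, not real tools.

```python
# Files carry only a type tag; nothing here says how to process them.
FILES = [("main.c", "c-source"), ("util.c", "c-source"), ("README", "text")]

# Two hypothetical "style sheets" bind types to transformations late.
# The converter names c-to-html and text-to-html are placeholders.
BUILD_SHEET = {"c-source": "cc -c {name}"}
PUBLISH_SHEET = {"c-source": "c-to-html {name}", "text": "text-to-html {name}"}


def apply_sheet(files, sheet):
    """Plan one command per file whose type the sheet handles; skip the rest."""
    return [sheet[ftype].format(name=name) for name, ftype in files if ftype in sheet]


if __name__ == "__main__":
    print(apply_sheet(FILES, BUILD_SHEET))  # ['cc -c main.c', 'cc -c util.c']
    print(apply_sheet(FILES, PUBLISH_SHEET))
```

Adding a new kind of processing (say, an indexer) means writing a new sheet; the typed file list itself never changes.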
Much of the information
which allows a set of files
to be compiled, linked, and packaged
could be used to generate an indexed hypertext version
of the source code.
Fold in the documentation
(using appropriate formatting tools)
and you are well on the way to a Meta implementation.
Possible Data Sets
Here are some informal descriptions
of possible Meta data sets,
hinting at their purposes, content, and format.
More exact definitions (e.g., XML Schemas)
will be developed as system needs and existing resources
become better understood.
The "official" versions of these files
may reside anywhere on the Internet;
copies can then be pulled in (and possibly cached)
as needed.
- build, install
build and installation "preferences"
for specific hardware architectures,
software environments,
organizational levels (e.g., company, network,
node, user) and types (e.g., debug, production).
- configuration
configuration status
(e.g., file modification and package installation dates)
for a given machine, network, etc.
- file, file tree
descriptions of files (e.g., format, purpose, type)
and file trees
- organization, person
contact and descriptive information
for organizations (e.g., companies, projects)
and individuals (e.g., authors, maintainers)
- package
general information about packages
(e.g., history, purpose, limitations)
- program
descriptions of programs (e.g., file access, purpose)
- version
specific information on package versions
(e.g., access information, release date, ID)
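For a flavor of what a "file" record might contain, the following Python sketch gathers what can be measured automatically (size, permissions) and leaves room for a human-supplied purpose. The element names and the `file_record` function are assumptions; the actual schema is still to be defined.

```python
import os
import stat
import tempfile
import xml.etree.ElementTree as ET


def file_record(path, purpose=None):
    """Build a hypothetical Meta "file" record for one file.

    Size and permissions are measured; the purpose, if known,
    comes from a human annotator.
    """
    st = os.stat(path)
    rec = ET.Element("file", path=path)
    ET.SubElement(rec, "size").text = str(st.st_size)
    ET.SubElement(rec, "mode").text = oct(stat.S_IMODE(st.st_mode))
    if purpose:
        ET.SubElement(rec, "purpose").text = purpose
    return rec


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.conf")
        with open(path, "w") as f:
            f.write("key = value\n")
        os.chmod(path, 0o644)
        rec = file_record(path, purpose="configuration for a hypothetical daemon")
        print(ET.tostring(rec, encoding="unicode"))
```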
Supported Software
With the addition of appropriate software
(big hand wave here :-),
Meta's data sets
could support a variety
of useful functions, including:
- search and examination of packages
A browser could allow examination
of descriptions, documentation, code,
and current configuration (if installed)
of arbitrary packages.
By tracking inquiries and accepting user feedback,
the browser could also serve
as a vehicle for information collection.
- documentation generation for packages, (sub)systems, etc.
File usage and other information
could be analyzed systematically,
yielding subsystem data flow diagrams,
navigable descriptions
("What files are used by this package, how, and why?"), etc.
- acquisition, build, installation, and removal of packages
This is the function currently met by the Ports Collection.
The proposed system would act similarly,
but would not be limited to a single OS variant
or to add-on packages.
- version control and updating of packages
By combining the local configuration status
with package-specific information,
a utility could suggest and/or implement updates,
tracking dependencies, local preferences, etc.
- archiving of all significant package versions
Definitive, mirrored repositories could be established,
containing every significant release
of every interesting package.
Aside from being a convenient resource,
this could serve as a precaution
against loss or damage.
- preformatting and indexing of code and documentation
By preformatting (e.g., into HTML or PDF) and indexing
the archived code and documentation,
a repository could make it conveniently available to all.
This would be particularly valuable to administrators
who wish to "look over" a package, prior to downloading it.
- merging of disparate versions
The archive could also help developers
to examine and merge disparate package versions.
Merging may be motivated by reduction of clutter
(e.g., folding together similar ports),
the need to support new environments
(e.g., combining disparate ports to
create a version with aspects of each), etc.
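The version control and updating function listed above can be sketched as follows, assuming that the local configuration status is reduced to a map of installed versions and that the package information gives each package's latest version and dependencies. The package names, the purely numeric version scheme, and the data layout are hypothetical; `graphlib.TopologicalSorter` (Python 3.9 and later) orders the updates so that dependencies come first.

```python
from graphlib import TopologicalSorter


def parse_version(v):
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def suggest_updates(installed, available):
    """List outdated installed packages, dependencies before dependents.

    installed: {package: installed_version}
    available: {package: (latest_version, [dependency, ...])}
    """
    outdated = sorted(
        p for p, v in installed.items()
        if p in available and parse_version(available[p][0]) > parse_version(v)
    )
    # Only ordering among packages that are themselves being updated matters here.
    graph = {p: [d for d in available[p][1] if d in outdated] for p in outdated}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    installed = {"libfoo": "1.2", "foo-client": "2.0", "barmail": "3.1"}
    available = {
        "libfoo": ("1.3", []),
        "foo-client": ("2.1", ["libfoo"]),
        "barmail": ("3.1", ["libfoo"]),
    }
    print(suggest_updates(installed, available))  # ['libfoo', 'foo-client']
```

Local preferences (e.g., "never upgrade barmail automatically") could be applied as a filter on `outdated` before ordering.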
Project Status
I have been playing with these ideas for some time
and have prototyped parts of the system for my own amusement.
As can be seen from the hand-waving in this paper,
many definitional issues are still up for grabs.
Thus, one of the biggest needs of the moment
is for experts on XML
and/or the characteristics
of specific information resources.
More to the point,
I am not in a position to create Meta on my own;
it is far too big a project for one person,
even assuming infinite time and wizardly attributes.
Having neither to offer, I am forced to ask for help.
Interested parties are therefore encouraged to contact me;
prospective volunteers and sponsors are particularly welcome.
Ignoring questions of Supported Software,
here are some arbitrary "levels" of Meta development:
- Level 1: Define Meta's basic concepts.
- Level 2: Define the purpose of each data set.
- Level 3: Define the content of each data set.
- Level 4: Define the XML encoding of each data set.
- Level 5: Populate the data sets, using scripts, etc.
- Level 6: Annotate the data sets, using Real Work (TM).
We are in pretty good shape (I think :-) on Level 1
and we're just beginning on Levels 2 and 3.
Obviously,
a completed Level 6 system will take many man-years of effort!
On the other hand,
the existing Eclectic Systems and Open Source packages didn't appear overnight
and they aren't going away any time soon.
Thus, we have time...
Here are some specific ways that individuals and/or organizations
can help to make Meta a reality:
- Give me your comments and suggestions!
For instance, it is very helpful to know
about more Existing Work and/or Intriguing Ideas.
- Help define the Meta data sets:
what's missing (e.g., data sets, fields),
what's wrong (e.g., poor organization), etc.
- Open Source development projects
should think about ways
that they could benefit from (and promote) Meta.
As the Meta data sets begin to take shape,
projects should start trying to create data sets.
- Open Source indexing and packaging projects
should think about ways to provide XML versions
of their public information.
- Eclectic Systems integrators should think about
the kinds of system metadata they could provide
and how Meta might help them
with their support needs
(e.g., administration, bug tracking, installation,
packaging, programming, security).
Getting Started
Assuming that Meta's goals and design seem reasonable,
the first order of business
is to gather some interested (and knowledgeable) parties
to specify (first cuts at) the data sets.
Some categories can be abstracted
from existing information systems;
others (e.g., relationship information)
will suggest themselves as we proceed.
Once some definitions are in place,
scripts can be written to extract information
from existing collections and indexes.
At the same time, simple tools should be created
to use the information.
Neither efficiency nor elegance is critical
to this effort;
we're just jump starting the system.
The result of these efforts
should be a "proof of concept" set of XML documents
and associated tools.
We can then iterate,
refining the definitions, improving the scripts, etc.
Eventually, if the basic concepts are reasonable,
a useful system should start to emerge.
If the system is sufficiently valuable and convenient,
it will be adopted by the community at large.
Ideally, the entire process
would have the active cooperation
of assorted developers and integrators.
This spreads the workload,
lessens the chance of "losing" needed items,
and may even avoid some politics.
No single organization or individual is required
for the process to succeed, however.
If a package's developer isn't interested in the effort,
the package can simply rely on older technologies
until someone else picks up the task.
File Characterization and Conversion
In a project of this scope,
there are many plausible "starting points".
One of the best involves characterizing files,
converting existing metadata where possible.
The "man" pages are a good example
of both file characterization and data conversion.
They are written in a simple and well-defined format
(i.e., "troff -man"),
have a well-developed formatting chain,
and (generally) follow conventions in file naming and location.
Thus, it is fairly simple
to recognize "man" pages
in any of the common formats
(e.g., ASCII, PostScript, troff).
Each "recognized" file can then be characterized
(e.g., as to format, topic, version)
and the information salted away as XML text.
Meanwhile, the troff source code
(being highly structured data)
can itself be parsed into XML.
This allows the publishing side of XML to be used,
generating HTML, indexed PDF, PostScript,
or whatever other format is desired.
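A minimal Python sketch of the troff-to-XML step follows. It understands only a tiny subset of the -man macros (.TH, .SH, a few font macros, and the \- escape) and emits a hypothetical XML layout, including `xref` elements parsed out of the SEE ALSO section. A real converter would need to handle far more of the macro package.

```python
import re
import xml.etree.ElementTree as ET

FONT_MACROS = {".B", ".I", ".BR", ".BI", ".IB", ".IR", ".RB", ".RI"}
XREF = re.compile(r"([A-Za-z_][\w.+-]*)\s*\((\d[A-Za-z]*)\)")


def man_to_xml(source):
    """Convert a tiny subset of troff -man into a hypothetical XML layout."""
    root = ET.Element("manpage")
    current = None
    for line in source.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == ".TH":  # .TH title section ...
            root.set("title", words[1])
            root.set("section", words[2])
        elif words[0] == ".SH":  # section heading
            heading = " ".join(words[1:]).strip('"')
            current = ET.SubElement(root, "section", name=heading)
        elif current is not None:
            if words[0] in FONT_MACROS:  # keep the text, drop the font change
                line = " ".join(words[1:]).replace('"', "")
            elif line.startswith("."):
                continue  # other requests (.PP etc.) are skipped in this sketch
            text = line.replace("\\-", "-")  # treat troff's \- as a plain hyphen
            current.text = f"{current.text} {text}" if current.text else text
    # Turn "name(section)" references in SEE ALSO into structured links.
    for sec in root.findall("section"):
        if sec.get("name") == "SEE ALSO":
            for name, num in XREF.findall(sec.text or ""):
                ET.SubElement(sec, "xref", name=name, section=num)
    return root


SAMPLE = r""".TH LS 1
.SH NAME
ls \- list directory contents
.SH "SEE ALSO"
.BR chmod (1),
.BR stat (2)
"""

if __name__ == "__main__":
    print(ET.tostring(man_to_xml(SAMPLE), encoding="unicode"))
```

Once the See Also entries are structured, hyperlinks (and missing cross-references, such as the ones noted earlier for inode(5), ls(1), and stat(2)) can be found mechanically.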
As files are parsed, interesting features can be extracted.
In a "man" page,
these might be keywords, "File" and "See Also" entries, etc.
In C source code, use of system resources
(e.g., data structures, include files, library functions)
could be characterized.
Detailed metadata of this sort has many uses;
automated indexing and hyperlink generation are obvious examples.
A programmer, for example,
might wish to look over examples of source code
that uses a particular combination of system resources.
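The following Python sketch indexes include files and called functions across a set of C sources, then answers the "combination of system resources" question. It relies on regular expressions rather than a real C parser, so it is only an approximation; the `include:` and `call:` key prefixes are an invented convention, and the sample files are hypothetical.

```python
import re
from collections import defaultdict

INCLUDE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
NOT_CALLS = {"if", "for", "while", "switch", "return", "sizeof"}


def index_sources(sources):
    """Map 'include:<header>' and 'call:<name>' keys to the files using them.

    sources maps file names to C source text.  The regular expressions are
    a crude stand-in for a C parser: they do not skip comments or string
    literals, and they also count function definitions and macro calls.
    """
    index = defaultdict(set)
    for fname, text in sources.items():
        for header in INCLUDE.findall(text):
            index["include:" + header].add(fname)
        for name in CALL.findall(text):
            if name not in NOT_CALLS:
                index["call:" + name].add(fname)
    return index


def files_using(index, *resources):
    """Return, sorted, the files that use every listed resource."""
    if not resources:
        return []
    return sorted(set.intersection(*(index.get(r, set()) for r in resources)))


SOURCES = {
    "read_num.c": '#include <stdio.h>\nint main(void) { int n; scanf("%d", &n); return n; }\n',
    "hello.c": '#include <stdio.h>\nint main(void) { puts("hello"); return 0; }\n',
}

if __name__ == "__main__":
    idx = index_sources(SOURCES)
    print(files_using(idx, "include:stdio.h", "call:scanf"))  # ['read_num.c']
```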
Any collection of files can be characterized,
but automated recognition and conversion
of arbitrary files
is not a trivial (or, in general, even a feasible) task.
Consider, for example,
the distribution tree
for an Open Source offering.
Each file and directory in the distribution has a purpose,
but what is it?
A program can make a good guess
as to many files' types and formats
(e.g., employing file naming and "magic" conventions).
Other files, showing up frequently enough
to exhibit a dependable pattern,
can be used to improve the recognition software.
Some files and many directories, however,
will always require human assistance.
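The first-pass guessing described above can be sketched in Python: check a handful of well-known magic numbers, then naming conventions, and otherwise report "unknown" so that a human can step in. The tables are deliberately tiny and the type labels are invented; the file(1) command and its magic database do this far more thoroughly.

```python
import os
import tempfile

# A few well-known "magic" byte signatures, checked first.
MAGIC = [
    (b"\x7fELF", "ELF binary"),
    (b"\x1f\x8b", "gzip-compressed data"),
    (b"%PDF-", "PDF document"),
    (b"%!PS", "PostScript document"),
    (b"#!", "interpreter script"),
]
# Naming conventions, checked when no signature matches.
NAMES = {"Makefile": "makefile", "README": "readme file"}
SUFFIXES = {".c": "C source", ".h": "C header", ".html": "HTML document"}


def guess_type(path):
    """Guess a file's type, or return 'unknown' so a human can characterize it."""
    with open(path, "rb") as f:
        head = f.read(8)
    for signature, kind in MAGIC:
        if head.startswith(signature):
            return kind
    base = os.path.basename(path)
    if base in NAMES:
        return NAMES[base]
    suffix = os.path.splitext(base)[1]
    if suffix in SUFFIXES:
        return SUFFIXES[suffix]
    if head.startswith((b".TH", b'.\\"')):  # typical first lines of a man page
        return "troff source (probably a man page)"
    return "unknown"


if __name__ == "__main__":
    samples = {"prog": b"\x7fELF\x02\x01", "ls.1": b".TH LS 1\n",
               "Makefile": b"all:\n", "notes": b"misc\n"}
    with tempfile.TemporaryDirectory() as d:
        for name, data in samples.items():
            path = os.path.join(d, name)
            with open(path, "wb") as f:
                f.write(data)
            print(name, "->", guess_type(path))
```

Each "unknown" result is a candidate for human annotation; patterns that recur often enough can then be added to the tables.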
In any case, some knowledgeable human
should make a pass over the result of the total process.
There are many things that can go wrong
with mechanical analysis of "raw" data.
Also, humans are very good at seeing patterns;
a human might well discern (or remember) information
(e.g., a file's purpose) that a program would not.
In the process of examining a software distribution,
certain questions are likely to arise:
"What is this file used for, anyway?",
"Why aren't these files in the same directory?"
If no good answers can be found,
the developer may be inspired to change the software
(rather than explain the unexplainable :-).
In addition, global indexing of the software
can make some questions easier to answer.
The scanf(3) family of functions, for example,
is widely regarded as a security sinkhole.
If use of library functions and system calls is indexed,
comprehensive examination becomes much more feasible.
In short, the documentation process
can be a significant aid to quality assurance.
Anecdotal evidence (e.g., from the OpenBSD Project)
indicates that many Eclectic Systems have hidden bugs;
this might be a good way to bring some of them to light.
Other Opportunities
Other development opportunities
include conversion of metadata
from existing indexing and package management systems
into XML.
Basically, any documentation resource
that hasn't been characterized (at least)
and converted (where appropriate)
is a golden opportunity!
Just as package distributions can be characterized,
so can installable operating system distributions.
An OS integrator might start
by characterizing all of the (5K or so) files
that comprise their "vanilla" distribution.
The bad news about such a project
is that it starts as a pretty big job
and never really gets done.
New packages arrive on the scene,
details change, etc.
The good news is that the task can be partitioned
(e.g., characterize the files in /etc)
and that most distributions
contain common files in (relatively) common places.
Also, many things are stable, even in a changing release.
The type, format, and purpose
of a make file, for example,
do not tend to change from release to release.
Finally, there is the possibility that a package
which an integrator is trying to characterize
has already been handled by its developer
or some other helpful party.
In short, we should expect a snowball effect.
The Meta Wiki
The Meta Project's Wiki web
(www.cfcl.com/twiki/bin/view/Meta)
is the center of activity
for the Meta Project.
Go there for current status information,
arguments and discussions,
implementation details,
and pointers to assorted resources.
Wiki webs,
in case you are wondering,
are collaboratively-editable sets of web pages.
They provide a facile and flexible way
to collect ideas into a cohesive document.
In particular,
this paper is simply an edited "snapshot"
of Meta's Wiki web,
as of early September, 2000.