Thursday April 13, 2006

Ontiki: first steps

Previous weblog entries (Ontiki: an ontology-aware wiki, Mechanically-augmented wikis) have discussed the possibility of creating structured wikis, using mechanical (i.e., software) augmentation. This entry is a very early status report, discussing my initial experiments and early progress in this effort.

Apologia

I subscribe to a number of mailing lists that discuss knowledge engineering and related topics. I've also skimmed through many related books and papers. The main thing that has become clear to me, from all of this study, is that knowledge engineering is a complex, subtle, and (as yet) ill-defined discipline.

If the experts can't agree on how to approach the creation of ontologies, or even how to define the relevant terminology, can an application programmer such as myself hope to do anything useful? Well, Larry Wall says that hubris, laziness, and impatience are the three great virtues of programmers. If I can harness the latter two properly, I may be able to justify the first...

Getting Started

A full Ontiki system will require the integration of a variety of technologies, but I have to start somewhere. So, following the techie commandment to "eat your own dog food", I decided to use a mechanically-augmented wiki to develop and display a naive ontology for the Unix operating system. See MBD: Case Study (Unix) for background information, etc.

The underlying wiki technology is provided by MediaWiki, a PHP-based wiki that is used by Wikipedia and many other substantial wikis. MediaWiki is convenient, portable, full-featured, and robust; it also has active user and developer communities.

Wiki Access

Although I am intrigued by the notion of using DBMS tables for inter-application communication, adding a page to MediaWiki involves updating about a dozen tables. Writing to those tables directly would require a much closer acquaintance with MediaWiki's internal logic than I care to have.

Fortunately, others have blazed a trail that I can follow. Pywikipedia is a Python-based "bot" (robot) framework for HTTP client scripts. Although it was originally developed for accessing Wikipedia, it can access any MediaWiki-based wiki. Using this framework, I can get and put the text of wiki pages, upload images, etc.

Data Storage

Of course, this raises the question of where to store the information that is used to create the wiki pages. I hope to migrate, in time, to an RDBMS, a knowledge base framework such as Protégé, and/or a knowledge representation and reasoning system such as PowerLoom. However, I'm still not sure which way to jump.

So, for the moment, I'm using an informal approach: a directory tree of several dozen hand-edited YAML files. Each directory corresponds to a class; a YAML file in each directory defines the class, lists its relationships, etc. The YAML files are readable, self-documenting, and extremely flexible in structure.
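As a rough illustration of the "one directory per class" layout, here is a minimal Python sketch. It is not the actual tooling: the file name class.yaml, the helper names, and the idea that each class is named after its directory are all assumptions made for the example. It only locates the definition files; parsing the YAML itself is left out.

```python
import os
import tempfile

def find_class_files(root, filename="class.yaml"):
    """Map each class name (its directory name) to its definition file.

    The file name "class.yaml" is a hypothetical convention.
    """
    classes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit sub-directories in a stable order
        if filename in filenames:
            name = os.path.basename(dirpath)
            classes[name] = os.path.join(dirpath, filename)
    return classes

def build_demo_tree(root):
    """Create a tiny, made-up class tree for demonstration purposes."""
    for rel in ("thing", "thing/file_node", "thing/is_interpreted"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "class.yaml"), "w") as fh:
            fh.write("name: %s\n" % os.path.basename(path))

with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(tmp)
    found = find_class_files(tmp)
    print(sorted(found))  # ['file_node', 'is_interpreted', 'thing']
```

Keying on the bare directory name keeps the sketch short, but two directories with the same name in different sub-trees would collide; a real tool would probably key on the relative path instead.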

Editing an extensive hierarchy of files is quite awkward using command-line tools (e.g., vi). Fortunately, my explorations into Ruby on Rails introduced me to the excellent TextMate editor. Using its Project drawer, I can view the directory hierarchy, disclose and hide sub-trees, and jump into any desired file at the click of a mouse.

Every so often, I use a Perl script to concatenate the files, process the definitions, and load the web pages. The script looks for problems (e.g., missing definitions) and generates assorted cross-references, indexes, etc. This run takes about five seconds per page, however, so I don't do it all that often.
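The checking and cross-referencing steps can be sketched in a few lines. The real script is written in Perl and is not shown here, so the following Python is only an illustration under stated assumptions: the definitions are taken to be already loaded into a dictionary, and the field names is_a and may_include, as well as the class names directory and symlink, are hypothetical.

```python
# Hypothetical field names that hold references to other classes.
REFERENCE_KEYS = ("is_a", "may_include")

def referenced_names(definition):
    """Return every class name that a single definition refers to."""
    names = []
    for key in REFERENCE_KEYS:
        names.extend(definition.get(key, []))
    return names

def find_missing(defs):
    """List names that are referenced somewhere but never defined."""
    missing = set()
    for definition in defs.values():
        for name in referenced_names(definition):
            if name not in defs:
                missing.add(name)
    return sorted(missing)

def build_cross_reference(defs):
    """Map each referenced class to the classes that refer to it."""
    xref = {}
    for source, definition in defs.items():
        for target in referenced_names(definition):
            xref.setdefault(target, set()).add(source)
    return {target: sorted(sources) for target, sources in xref.items()}

# Made-up sample data; "symlink" is deliberately left undefined.
defs = {
    "thing": {},
    "file_node": {"is_a": ["thing"]},
    "directory": {"is_a": ["file_node"],
                  "may_include": ["file_node", "symlink"]},
}

print(find_missing(defs))                         # ['symlink']
print(build_cross_reference(defs)["file_node"])   # ['directory']
```

The cross-reference is just the forward references inverted, which is why a single pass over the definitions is enough to produce a "what links here" index for each class.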

The Ontology

As mentioned above, this is a "naive" ontology. I don't expect it to support much in the way of deduction; it simply has to provide a way to organize classes and instances of entities and relationships. At this point, I'm only working with "abstract classes"; once I get these sketched in, I can look at lower levels (e.g., "concrete classes", "instances").

For simplicity and flexibility, I treat everything as entities. This includes not only conventional entities (e.g., file node), but also attributes (e.g., is interpreted) and relationships (e.g., may include). Attributes may have zero or one value; relationships may have two or more roles.
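One way to picture the "everything is an entity" rule is as a small class hierarchy. This is a sketch under my own assumptions, not the representation used in the YAML files: the Python class names and the role names container and member are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Entity:
    """Anything in the ontology: a thing, an attribute, or a relationship."""
    name: str

@dataclass
class Attribute(Entity):
    """An entity that carries zero or one value (None means no value)."""
    value: Optional[str] = None

@dataclass
class Relationship(Entity):
    """An entity that connects two or more roles."""
    roles: Tuple[str, ...] = ()

    def __post_init__(self):
        if len(self.roles) < 2:
            raise ValueError("a relationship needs two or more roles")

file_node = Entity("file node")
is_interpreted = Attribute("is interpreted")           # no value yet
may_include = Relationship("may include",
                           roles=("container", "member"))  # hypothetical roles

# Attributes and relationships are still entities.
print(isinstance(is_interpreted, Entity), isinstance(may_include, Entity))
```

Because attributes and relationships share the Entity base, they can themselves be described, indexed, and cross-referenced in exactly the same way as conventional entities.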

The top level of the ontology (thing) is currently a bit of a muddle, because so far I'm mostly just entering concepts. I suspect that a bit more order will emerge as I actually try to use these classes for something!

Anyway, feel free to take a look. The Abstract Class (AC) Index is structured (roughly :-) as an "is a" tree (i.e., taxonomy). However, because it allows multiple inheritance, it's really a directed acyclic graph (DAG).
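To make the tree-versus-DAG distinction concrete, here is a minimal Python sketch. It assumes the "is a" links are available as a dictionary from each class to its list of parents; the class names executable and script are hypothetical examples, not entries from the actual index.

```python
def ancestors(parents, name):
    """Collect every class reachable from `name` by following "is a" links."""
    seen = set()
    stack = list(parents.get(name, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def has_cycle(parents):
    """A class that is its own ancestor means the graph is not acyclic."""
    return any(name in ancestors(parents, name) for name in parents)

# "script" has two parents, so this is a DAG rather than a tree.
parents = {
    "thing": [],
    "file_node": ["thing"],
    "executable": ["thing"],
    "script": ["file_node", "executable"],
}

print(sorted(ancestors(parents, "script")))  # ['executable', 'file_node', 'thing']
print(has_cycle(parents))                    # False
```

With single inheritance each class would have exactly one path back to thing; multiple inheritance allows several paths, which is fine as long as no class ends up among its own ancestors.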

If you are interested in helping me fill in the ontology, and are knowledgeable about some part of Unix, please contact me. There are some substantial parts of the ontology (e.g., networking) that I've simply punted on, for lack of knowledge and/or time.

What's Next

Now that I can generate and upload wiki pages, I get to choose between enhancing their content and improving their appearance and usability. In all likelihood, I'll work on all of these areas. Comments, suggestions, and help are all welcome...

Image-mapped diagrams work well for context and navigation, but they aren't supported by MediaWiki "out of the box". So, I'm looking into ways to provide this capability. For examples of how I've used these diagrams in the past, see MBD: Case Study (FSW).

Getlets extend InterWiki notation, allowing wiki pages to draw upon arbitrary dynamic content. This could allow users to create their own dynamic reports, simply by editing a wiki page and asking for the right content.

PowerLoom is a well-regarded knowledge representation and reasoning system. I have downloaded a copy and am trying to get up to speed on it, as well as evaluate its suitability as a back-end server for Ontiki.

In short, there is no shortage of interesting research directions. Stay tuned; I'll let you know what I find out...

Posted in Computers, Technology at Thu, 13 Apr, 21:35 Pacific