Monday March 6, 2006

Mechanically-augmented wikis

I've been thinking about ways to augment wikis with mechanically-harvested information, navigation aids, etc. The result would have the convenience and flexibility of wikis, but wouldn't depend on humans to provide all of the content, links, etc. As an example, let's consider the problem of generating detailed documentation for large collections of software.

Following the Model-based Documentation approach, a documentation system should reflect (at least in part) the major entities and relationships of the system being documented. This consistency eases both development and use, because the same "mental model" works for both the system and the documentation.

Entities and Relationships

In the case of a software system, most of the entities will be files (e.g., documents, libraries, programs, source code) or programming constructs (e.g., data structures, methods, modules, objects). Some additional entities (e.g., bug reports, developers, requirements, tests, users) can provide useful context for the system model.

Given this list of entities, it's easy to think of relationships that we might find. Data structures and methods, for example, may be defined within the context of objects. The definitions reside in source code files, which are written and maintained by developers. Similarly, there are dependency relationships between source and object files, functions, etc.

A software system may have millions of entities (and many more relationships), but a workable ontology (i.e., collection of class definitions) can be quite compact. Indeed, if you have to define more than a few dozen classes, you're probably doing something wrong!
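To make the point about compactness concrete, here is a minimal sketch of what such class definitions might look like. The class names (Developer, Method, SourceFile) and relationship names (defined_in, maintained_by) are invented for illustration; they are assumptions, not part of any existing ontology.

```python
from dataclasses import dataclass

# Hypothetical class definitions; a real ontology might have a few dozen.
ENTITY_CLASSES = {"Developer", "Method", "SourceFile"}
RELATIONSHIP_CLASSES = {
    # relationship name: (subject class, object class)
    "defined_in": ("Method", "SourceFile"),
    "maintained_by": ("SourceFile", "Developer"),
}

@dataclass(frozen=True)
class Entity:
    cls: str    # one of ENTITY_CLASSES
    name: str

@dataclass(frozen=True)
class Relationship:
    cls: str    # one of the keys in RELATIONSHIP_CLASSES
    subject: Entity
    obj: Entity

def is_valid(rel: Relationship) -> bool:
    """Check a relationship instance against the class definitions."""
    expected = RELATIONSHIP_CLASSES.get(rel.cls)
    return expected == (rel.subject.cls, rel.obj.cls)

# A few instances; a real system could have millions of these.
parse = Entity("Method", "parse")
parser_c = Entity("SourceFile", "parser.c")
alice = Entity("Developer", "alice")
```

The class tables stay small no matter how many instances are harvested, which is the property the paragraph above relies on.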

Once the classes of entities and relationships are defined, you get to collect the instance data. Any program that deals with entities and relationships will have to store information about them. Harvesting this information may be tedious, but it is seldom challenging. See MBD: The Extraction Phase for an overview of this process.

Presentation

Once the instance data has been collected, it must be presented in a way that can be navigated and absorbed by humans. It surely isn't much use, just sitting in the database! Fortunately, this is reasonably well-trodden territory.

Documentation systems can present relationships as cross-references, use them to generate indexes and context diagrams, etc. They can also use the information to generate other charts, diagrams, and tables. Diagrams of class inheritance, data flows, and module dependencies can be very useful to programmers. Managers can use charts or tables that show how many bugs are getting past the development and testing teams.

With a bit more analysis, a documentation system can detect second-order attributes, such as "hot spots" in the code base. These might be defined by numerical measures, such as the amount of check-in activity or the number of bug reports they appear in. In short, any desired level of analysis can be performed, on a continuous and painstaking basis.
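As a rough illustration of a numerical "hot spot" measure, the sketch below adds one point per check-in and one point per bug report that mentions a file. The scoring rule, the threshold, and the file names are all assumptions chosen for the example; a real system would tune them.

```python
from collections import Counter

def hot_spots(checkins, bug_reports, threshold=3):
    """Return (file, score) pairs meeting the threshold, busiest first."""
    score = Counter(checkins)            # one point per check-in
    for files in bug_reports:
        score.update(set(files))         # one point per bug report naming the file
    return [(f, n) for f, n in score.most_common() if n >= threshold]

# Hypothetical harvested data: a check-in log and the files named in each bug report.
checkins = ["parser.c", "parser.c", "lexer.c", "main.c"]
bug_reports = [["parser.c", "lexer.c"], ["parser.c"]]

print(hot_spots(checkins, bug_reports))
```

Because the inputs are just harvested instance data, a calculation like this can be rerun whenever the data changes.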

The biggest apparent challenge comes from the fact that we may have millions of entities to present, with many more relationships. How can we present all of that, usefully, to the users? Gosh, I thought you'd never ask!

How about a Wiki?

Wikipedia has already demonstrated that MediaWiki can handle a million inter-related pages. In fact, because each Wikipedia page has a shadowing "discussion" page, the number is arguably two million! If humans can navigate Wikipedia, they can certainly navigate a (properly designed :-) documentation suite of the same scale.

As a useful first step, we could populate a MediaWiki database with machine-generated entries. Some simple ground rules and a modicum of care would keep the human- and machine-generated content from interfering with each other. For example, manual notations could be restricted to selected parts of machine-generated pages and/or to the related discussion pages.
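One possible ground rule, sketched below, is to fence the machine-generated part of each page between two marker comments and have the generator rewrite only that region. The marker strings and the refresh_page function are hypothetical; nothing here is a MediaWiki feature beyond the fact that wikitext allows HTML-style comments.

```python
# Hypothetical markers delimiting the machine-owned region of a page.
BEGIN = "<!-- BEGIN GENERATED -->"
END = "<!-- END GENERATED -->"

def refresh_page(old_text: str, generated: str) -> str:
    """Replace the marked region of a page, leaving everything else alone."""
    block = f"{BEGIN}\n{generated}\n{END}"
    start = old_text.find(BEGIN)
    end = old_text.find(END)
    if start == -1 or end == -1 or end < start:
        # No usable markers yet: put the generated block first, keep any old text after it.
        return block + ("\n" + old_text if old_text else "")
    return old_text[:start] + block + old_text[end + len(END):]

# First run creates the page; a human then adds notes; a second run updates it.
page = refresh_page("", "* defined in [[parser.c]]")
page += "\n== Notes ==\nHuman note."
page = refresh_page(page, "* defined in [[lexer.c]]")
print(page)
```

The human-written notes survive the second run because the generator never touches text outside the markers.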

It would be better, of course, to allow finer-grained mixing of content. I don't know how this should be done, but I'm quite confident that solutions will emerge if the general approach proves useful...

Adding Reports, etc.

It would also be nice to let users request the inclusion of particular "reports" in selected pages. It's not hard to imagine a declarative or functional notation that would safely let the user ask for a particular table or graph to be displayed. (Implementing it might be a challenge; imagining it is not. :-)

Given that the mechanically-generated wiki pages are organized around classes and instances of entities and relationships, the report definitions could also reside in the wiki. Some sort of object-oriented approach might allow a report to "do the right thing" when requested by a particular wiki page.
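One way a report might "do the right thing" is to look itself up by the class of the entity whose page requested it. The sketch below uses a small registry for that; the report decorator, the render function, and the entity fields are all invented for illustration.

```python
# Registry mapping (entity class, report name) to a report function.
REPORTS = {}

def report(entity_class, name):
    """Register a report function for one class of entity."""
    def register(func):
        REPORTS[(entity_class, name)] = func
        return func
    return register

@report("SourceFile", "summary")
def source_file_summary(entity):
    return f"{entity['name']}: {entity['lines']} lines"

@report("Developer", "summary")
def developer_summary(entity):
    return f"{entity['name']}: maintains {len(entity['files'])} files"

def render(entity, name):
    """Pick the report implementation by the requesting page's entity class."""
    func = REPORTS.get((entity["class"], name))
    if func is None:
        return f"(no '{name}' report for {entity['class']})"
    return func(entity)

print(render({"class": "SourceFile", "name": "parser.c", "lines": 120}, "summary"))
print(render({"class": "Developer", "name": "alice", "files": ["parser.c", "lexer.c"]}, "summary"))
```

A page would only need to name the report ("summary"); the entity class of the page decides which implementation runs.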

Following the same logic, we could allow users to add instances of relationships that they happen to know about. For example, "program foo creates log file /var/log/foo". Ideally, of course, there would be a way to integrate this information into the structure of the wiki.
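A statement such as "program foo creates log file /var/log/foo" could be turned into structured data with a very small controlled vocabulary. The sketch below assumes a fixed list of entity classes and verbs and a rigid "class name verb class name" word order; both are assumptions made for the example, not a proposed notation.

```python
import re

# Hypothetical controlled vocabulary.
ENTITY_CLASSES = ["log file", "program"]
VERBS = ["creates", "reads"]

_classes = "|".join(re.escape(c) for c in ENTITY_CLASSES)
_verbs = "|".join(re.escape(v) for v in VERBS)
PATTERN = re.compile(rf"^({_classes}) (\S+) ({_verbs}) ({_classes}) (\S+)$")

def parse_assertion(text):
    """Parse '<class> <name> <verb> <class> <name>' into a triple, or return None."""
    m = PATTERN.match(text.strip())
    if m is None:
        return None
    s_cls, s_name, verb, o_cls, o_name = m.groups()
    return ((s_cls, s_name), verb, (o_cls, o_name))

print(parse_assertion("program foo creates log file /var/log/foo"))
```

Anything that does not fit the pattern yields None, so free-form remarks would be left as ordinary wiki text rather than guessed at.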

My essay on Ontiki: an ontology-aware wiki sketches out some ideas about ways that this interaction might be supported. Semantic wikis, which add the strengths of semantically-aware (e.g., ontology-based) systems to wikis, are already a good start in this direction.

There are some interesting coordination problems, to be sure. How should the mechanized documentation system retrieve requests from the wiki, return reports, edit pages, etc.? My current thinking is that this should all be done through MediaWiki's underlying database, but that's the subject of another essay.

Mechanically-augmented wikis in Computers, Science, Technology - posted at Mon, 06 Mar, 21:31 Pacific

Comments

Great thoughts. I'm interested in taking a similar path with SemanticMediaWiki.

Specifically, I want to explore using a wiki as a "collective wisdom" tool for communities of people where each person might only have a few pieces of the puzzle.

For example, each wiki article would be a description of an element within a larger model (e.g., of a social system, economic system, or software system). The semantic metadata would be used to locate each element within the larger system. Bots would then surf the pages to build views and models based on that metadata (topic maps, or whatever).

The important features of this design are emergent structure, emergent properties, multiple viewpoints, and resiliency (not vulnerable to failure due to contraction, omission, duplication, circularity, etc.).

I completely agree with the idea of mechanically-augmented wikis. Mechanically-generated data is often not perfect, but putting it in a wiki allows people to correct and even augment it. I just launched WeRelate.org, a system to help people do genealogy with 1.8M mechanically-generated pages that sits on top of MediaWiki. Each page has a meta-data section followed by a wiki text section. Pages represent entities (with namespaces governing the schemas), with links representing relationships between entities. Sources of genealogical information are linked to the Places and Surnames they cover, Places are linked to related Places, Surnames to related Surnames, etc. I'll have to see how things go over time, but so far it's been a great fit.

[Sadly, this comment was automagically filed as spam (in March of 2006). I just noticed it today (in January of 2009). I have updated the links, attached the comment to my weblog entry, and sent apologies to Thomas Gries. -r]

The underlying idea of integrating arbitrary content from other sources (weather data and forecasts, to give one example) into wiki pages was the subject matter of the disclosure I gave in my presentation

"Getlets: extending the interwiki concept beyond any limits"

during the First International Wikimedia Conference "Wikimania 2005" in Frankfurt/Main, Germany, 04.-08.08.2005, in the workshop TG3 held on 05.08.2005.

Please respect my rights of first publication of this idea.

References:

[1] http://upload.wikimedia.org/wikipedia/commons/a/a9/Wikimania05_Workshop_TG3.pdf Presentation slides

[2] http://meta.wikimedia.org/wiki/Transwiki:Wikimania05/Workshop-TG3 Abstract of the presentation

[3] http://meta.wikimedia.org/wiki/Transwiki:Wikimania05 Conference Proceedings

[4] http://sourceforge.net/projects/getlets Project site on the SourceForge open source server (currently abstract only, no files)

It's great to hear about other folks that have similar ideas. Aside from giving us warm fuzzy feelings, it lets us know about interesting and useful variations and applications of our ideas.

Incidentally, Thomas Gries has an idea that could be very useful for embedding machine-generated content in human-edited pages. His Wikimania 2005 slide set Getlets: extending the interwiki concept beyond any limits describes a way to couple wikis with external sources in a loose and extremely flexible manner.

It turns out that MediaWiki supports transclusions. So, a page named Template:T12345 can be incorporated into another page by means of the syntax {{T12345}}. However, by default this only works for local pages.

In theory, "scary" (external) transclusions can be enabled by:

  • creating a row in the local interwiki table
  • setting the iw_trans Boolean for the row
  • setting the $wgEnableScaryTranscluding variable