<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Rich Morin :: tchotchkes</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/" />
    <link rel="self" type="application/atom+xml" href="http://www.cfcl.com/rdm/weblog/atom.xml" />
   <id>tag:www.cfcl.com,2010:/rdm/weblog//3</id>
    <link rel="service.post" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3" title="Rich Morin :: tchotchkes" />
    <updated>2010-07-03T23:56:18Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type 4.1</generator>
 

<entry>
    <title>How not to resolve online ordering problems</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001737.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1737" title="How not to resolve online ordering problems" />
    <id>tag:www.cfcl.com,2010:/rdm/weblog//3.1737</id>
    
    <published>2010-07-04T00:01:42Z</published>
    <updated>2010-07-03T23:56:18Z</updated>
    
    <summary>This afternoon, I received a call from an online vendor whose web site we recently used, attempting to make a purchase. He said that our recent order used a flag indicating that a 10% discount code was enabled, but we had no such discount authorized. I expected him to suggest an adjustment and/or ask exactly what we did, so that he could try to reproduce the problem....</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[This afternoon, I received a call from an online vendor whose web site we recently used,
attempting to make a purchase.
He said that our recent order used a flag indicating that a 10% discount code was enabled,
but we had no such discount authorized.
I expected him to suggest an adjustment and/or ask exactly what we did,
so that he could try to reproduce the problem.
<p>]]>
        <![CDATA[Instead, he started off about how the "only way" this could have happened
was that we (clearly computer programmers; he had looked us up on the web!)
had reverse-engineered the JavaScript code and sent a bogus flag to his server.
I told him that we were computer professionals and that we didn't do that sort of thing,
whether for fun or profit.
And that a 10% discount did not seem to be worth much effort, in any case.
<p>
I also said that he probably had a bug and suggested that he send us a note with the particulars,
asking for our help in tracking down the problem.
But no, he was on a roll.
"This is the first time in tens of thousands of sales that this has happened,
so it can't be a bug." or words to that effect,
followed by more accusations and implications of attempted system cracking on our part.
<p>
I told him that he was not taking the right approach to resolving the problem
and that he was also being insulting.
I said that we would be happy to help him reproduce the symptoms,
but that making accusations of perfidy and evil intent, without proof, wasn't the way to go about it.
He responded that he's cancelling the order;
whether he will send us the suggested note and/or try to track down the bug remains to be seen...
<p>
<h2>Ahem</h2>
<p>
As a computer programmer with 40+ years of experience,
I can definitively say that bugs can lurk for months or years
(and millions of user transactions) before emerging.
In terms of web commerce, "tens of thousands of sales" is a very small test sample.
In any case, the question is not how many times the expected logic path was followed,
but how this user got onto an unexpected path.
<p>
As a small-scale entrepreneur (I ran Prime Time Freeware for a decade or so),
I can also say that I would <i>never</i> have accused a customer of trying to scam a discount,
even if I was pretty sure s/he had done so.
Rather, I would have tried to:
<ul>
  <p><li>complete the sale and make the customer happy
  <p><li>make sure that my code rejects invalid discount codes
  <p><li>ascertain the details, in order to reproduce and resolve the problem
</ul>
'nuff said...]]>
    </content>
</entry>

<entry>
    <title>&quot;Automatic SketchUp: Creating 3-D Models in Ruby&quot;</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001735.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1735" title="&quot;Automatic SketchUp: Creating 3-D Models in Ruby&quot;" />
    <id>tag:www.cfcl.com,2010:/rdm/weblog//3.1735</id>
    
    <published>2010-04-28T05:01:39Z</published>
    <updated>2010-04-28T04:56:55Z</updated>
    
    <summary>Although I&apos;ve played with SketchUp Ruby from time to time over the past few years, my first real introduction occurred in the Fall of 2009. Igloo Studios, a well-known name in SketchUp modeling and training, brought me in to maintain some of their tools and create some others. Clearly, I had some homework to do....</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Ruby" />
    
        <category term="SketchUp" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[Although I've played with <a href="http://sketchup.google.com">SketchUp</a> Ruby
from time to time over the past few years,
my first real introduction occurred in the Fall of 2009.
<a href="http://www.igloostudios.com/igloo">Igloo Studios</a>,
a well-known name in SketchUp modeling and training,
brought me in to maintain some of their tools and create some others.
Clearly, I had some homework to do.]]>
        <![CDATA[<p>
Following my usual practice, I bought and skimmed several books on SketchUp.
I also dug through assorted SketchUp web sites, perused mailing lists, etc.
Unfortunately, none of this gave me the kind of "aha!" I was seeking.
So, although I was able to bumble along, I didn't feel all that confident.
<p>
Recently, however, I was able to purchase the book I needed six months ago:
<p>
<ul>
  <i>Automatic SketchUp: Creating 3-D Models in Ruby</i><br>
  Matthew Scarpino<br>
  Eclipse Engineering, 2010<p>
  <a href="http://www.autosketchup.com">www.autosketchup.com</a><br>
  <a href="http://www.amazon.com/Automatic-SketchUp-Creating-Models-Ruby/dp/0984059202">Amazon</a>
</ul>
<p>
<h2>Introducing Ruby</h2>
<p>
Three of the initial chapters introduce Ruby programming concepts and syntax.
I didn't need this information, but I read them anyway, out of curiosity.
I've thought about teaching Ruby as a first programming language,
so it was interesting to see how this book approached the task.
<p>
Because the book is not primarily about Ruby,
it doesn't begin by introducing a slew of Ruby concepts and syntax.
Instead, it presents a little bit of Ruby, then uses it with SketchUp.
Then, back to the well for a bit more Ruby.
I think this approach should work very well for aspiring Rubyists,
providing them with both motivation and reinforcement.
<p>
To be sure, this book won't turn the reader into a Ruby expert, but that's not the goal.
It certainly teaches enough Ruby to let new programmers attempt small plugins.
There is no shortage of good books on Ruby,
if the reader decides to get serious about the language.
<p>
<h2>Introducing SketchUp</h2>
<p>
There are a number of books that introduce SketchUp concepts and use,
so the book doesn't make much of an effort in this area.
This is reasonable, given that one alternative might be a much larger book.
Still, a few more explanatory paragraphs might not be out of place.
<p>
That said, the book really shines in the intermediate topics:
<p>
<ul>
  <p><li>What are the key classes and methods in the API?
  <p><li>How does the API handle components, groups, layers, materials, pages, etc?
  <p><li>How do Ruby concepts (eg, arrays, iterators, objects) work with the API?
</ul>
<p>
This part of the book is well worth the price of admission.
The API web site lists all of the classes and methods,
but this book actually explains how to <i>use</i> them.
Given the paucity of overview material available elsewhere,
this book is a badly needed resource.
<p>
In summary, if you've been trying to create or even modify SketchUp Ruby plugins,
this book should be close at hand.]]>
    </content>
</entry>

<entry>
    <title>Configuration File Trickery</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001729.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1729" title="Configuration File Trickery" />
    <id>tag:www.cfcl.com,2010:/rdm/weblog//3.1729</id>
    
    <published>2010-02-10T22:37:01Z</published>
    <updated>2010-02-10T23:32:53Z</updated>
    
    <summary> I tend to use a fairly minimal subset of YAML for configuration files. YAML supports my favorite data structures (lists and hashes) and is easy to read and edit (particularly if one ignores its syntax for declaring data types and such). However, in a recent project, I found myself using CSV (comma-separated value) files, instead....</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Ruby" />
    
        <category term="SketchUp" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[<p>
I tend to use a fairly minimal subset
of <a href="http://en.wikipedia.org/wiki/YAML">YAML</a> for configuration files.
YAML supports my favorite data structures (lists and hashes) and is easy to read and edit
(particularly if one ignores its syntax for declaring data types and such).
However, in a recent project,
I found myself using <a href="http://en.wikipedia.org/wiki/Comma-separated_values">CSV</a>
(comma-separated value) files, instead.]]>
        <![CDATA[<p>
The initial motivation for this choice was my client's preference.
He uses spreadsheets (typically, <a href="http://en.wikipedia.org/wiki/Numbers_%28software%29">Numbers</a>)
on a regular basis and wondered if we could use them to enter configuration information.
I couldn't see any reason to object, so we started down that path.

<h2>Syntax, Structure, and Semantics</h2>
<p>
Any data representation format has to deal with three kinds of problems:

<ul>
<p><li><b>syntax</b> -
    distinguishing and representing data elements</li>

<p><li><b>structure</b> -
    representing relationships among data elements</li>

<p><li><b>semantics</b> -
    determining the "meaning" of data elements</li>
</ul>

<p>
YAML handles syntactic and structural problems very nicely.
I don't have to worry about how to parse the files
and the imported data structures can be arbitrarily complex.
At worst, my code will need to build occasional supplementary indexes.
<p>
In general, I tend to handle semantic problems in my own code.
I'm beginning to realize that
<a href="http://en.wikipedia.org/wiki/Semantic_Web"
  >Semantic Web</a> technology
offers some interesting alternatives
for large-scale data-integration projects,
but that's far beyond the scope of this posting...

<h2>Cooking Up Hashes</h2>
<p>
Although CSV handles syntactic problems, it isn't nearly as flexible as YAML
when it comes to representing data structures.
Ruby's CSV library turns a CSV file into a row-major list of lists.
In contrast, I typically want to use the data as a nested hash.
<p>
For example, I might want to define some of the CSV labels as top-level keys,
letting the others share the lowest level of the hash.
A representative spreadsheet might look like this:

<ul><table>
  <tr><th width=70 align=left>Name</th><th width=70 align=left>Value</th><th align=left>Description</th></tr>
  <tr><th align=left>age</th><td>11</td><td>Age, in years</td></tr>
  <tr><th align=left>height</th><td>22</td><td>Height, in inches</td></tr>
  <tr><th align=left>weight</th><td>3</td><td>Weight, in pounds</td></tr>
</table></ul>

<p>
My <tt>cook_csv</tt> routine creates a multi-level hash,
using the raw data and a list of field names as arguments.
I can then access the hash, much as if I'd loaded it from YAML:
<ul><pre>
  rare = cook_csv(raw, [:Name])
  ...
  age_descr = rare[:age][:Description]
</pre></ul>

In some cases, there may be rows which have identical values in the lowest-level key.
Specifying 0 as the last field name causes the row index to be used as a key;
this keeps the rows from over-writing each other in the hash.
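<p>
The <tt>cook_csv</tt> routine itself isn't shown here,
so the following is only a minimal sketch of how such a routine might work.
It assumes that the first row holds the column labels,
that labels and key values are turned into symbols,
and that a key of 0 means "use the row index at this level".
The internals are guesses, not the original code.

```ruby
# Hypothetical reconstruction of cook_csv; the original routine is not shown.
# raw  - row-major list of lists; the first row holds the column labels
# keys - column labels to use as successive hash levels; a 0 in the list
#        means "use the (zero-based) data row index as the key"
def cook_csv(raw, keys)
  labels = raw.first.map { |label| label.to_sym }
  cooked = {}
  raw.drop(1).each_with_index do |row, index|
    fields = Hash[labels.zip(row)]
    node   = cooked
    keys.each do |key|
      level = key == 0 ? index : fields.delete(key).to_sym
      node  = (node[level] ||= {})
    end
    node.update(fields)   # remaining columns share the lowest level
  end
  cooked
end

# The spreadsheet above, as Ruby's CSV library would deliver it.
raw = [
  ['Name',   'Value', 'Description'],
  ['age',    '11',    'Age, in years'],
  ['height', '22',    'Height, in inches'],
  ['weight', '3',     'Weight, in pounds']
]

rare      = cook_csv(raw, [:Name])
age_descr = rare[:age][:Description]
puts age_descr
```

With <tt>[:Name, 0]</tt> as the key list, two rows named <tt>age</tt>
would land in <tt>rare[:age][0]</tt> and <tt>rare[:age][1]</tt>,
rather than over-writing each other.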

<h2>Generating Variations</h2>
<p>
One of my spreadsheets summarizes information on a set of product models.
Naively, each row would contain the model number and a set of characteristics.
However, this would have made the spreadsheet unwieldy and error-prone.
<p>
Fortunately, the vendor's model names contain values and/or hints to the characteristics.
By borrowing a bit of magic
from <a href="http://en.wikipedia.org/wiki/Regular_expression">regular expressions</a>,
I was able to reduce the number of rows by an order of magnitude.
Let's say that our naive spreadsheet looks as follows:

<ul><table>
  <tr><th align=left width=110>Model</th><th align=left width=50>Type</th>
      <th align=left width=40>H</th><th align=left width=40>W</th><th align=left>D</th></tr>
  <tr><td align=left>ABC83344</td><td align=left>ABC</td>
      <td align=left>8</td><td align=left>33</td><td align=left>44</td></tr>
  <tr><td align=left>ABC113355</td><td align=left>ABC</td>
      <td align=left>11</td><td align=left>33</td><td align=left>55</td></tr>
  <tr><td align=left>ABC143366</td><td align=left>ABC</td>
      <td align=left>14</td><td align=left>33</td><td align=left>66</td></tr>
  <tr><td>...</td></tr>
  <tr><td align=left>ABC203366</td><td align=left>ABC</td>
      <td align=left>20</td><td align=left>33</td><td align=left>66</td></tr>
</table></ul>

<p>
There's a pattern here, if only we could take advantage of it.
It wouldn't be easy to parse, given that the H field varies in size,
but it's quite easy to <i>generate</i>.
<p>
Specifically, we can use regular expressions to create the variations,
then capture the sub-fields for use elsewhere in the row.
My <tt>exp_models</tt> routine does just this,
reducing the snippet above to:

<ul><table>
  <tr><th align=left width=75>Model</th>
      <th align=left width=50>Type</th>
      <th align=left width=30>H</th>
      <th align=left width=30>W</th>
      <th align=left width=30>D</th>
      <th align=left>RegExp</th></tr>
  <tr><td align=left>\1\2\3\4</td>
      <td align=left>\1</td>
      <td align=left>\2</td>
      <td align=left>\3</td>
      <td align=left>\4</td>
      <td align=left>{ABC} {8,11,14,17,20} {33} {44,55,66}</td></tr>

</table></ul>

The data has certainly gotten more complex (some might say, inscrutable),
but the reduction in rows is an enormous win.
In the actual application, I'm averaging about an 8:1 reduction in rows,
which more than compensates (IMHO) for the added complexity.
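<p>
For the curious, here is a rough sketch of how the expansion could be done.
It is not the actual <tt>exp_models</tt> code:
the hash-based row format is an assumption,
and so is the idea that <i>every</i> combination of the bracketed values gets generated
(a real product line would probably need some way to skip invalid combinations).

```ruby
# Hypothetical reconstruction of exp_models; the original routine is not shown.
# template - hash of column label => cell text, using \1, \2, ... as placeholders
# spec     - brace-delimited alternatives, eg '{ABC} {8,11} {33}'
def exp_models(template, spec)
  # Turn '{ABC} {8,11}' into [['ABC'], ['8', '11']].
  choices = spec.scan(/\{([^}]*)\}/).map { |match| match.first.split(',') }
  first, *rest = choices

  # Generate every combination, then fill in the placeholders for each one.
  first.product(*rest).map do |combo|
    row = {}
    template.each do |label, cell|
      row[label] = cell.gsub(/\\(\d)/) { combo[$1.to_i - 1] }
    end
    row
  end
end

template = {
  'Model' => '\1\2\3\4',
  'Type'  => '\1',
  'H'     => '\2',
  'W'     => '\3',
  'D'     => '\4'
}

rows = exp_models(template, '{ABC} {8,11,14,17,20} {33} {44,55,66}')
puts rows.length           # 1 * 5 * 1 * 3 = 15 rows from a single line
puts rows.first['Model']   # ABC83344
```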
<p>
Similar pre-processing techniques can be used with other tools.
For example, it's not uncommon to find YAML
and <a href="http://en.wikipedia.org/wiki/eRuby">embedded Ruby</a>
being used to generate test fixtures and other support files.

<h2>DRYing Things Out</h2>
<p>
One of the spreadsheets contains about 100 rows of part specifications,
with each row containing more than a dozen values.
Because there is a lot of repetition
(ie, parts that have the same, or related, values),
it would be nice to refer to the values symbolically.
<p>
So, I store assorted (sub-)expressions in a Macros spreadsheet,
reducing the Parts spreadsheet to symbols and literal values.
Because the results are actually going to be used as "dynamic attributes"
in <a href="http://en.wikipedia.org/wiki/SketchUp">Google SketchUp</a> models,
my code doesn't actually have to evaluate them, just assemble and output them.
So, a relatively simple bit of macro pre-processing does the trick.
<p>
Each row in the Macros spreadsheet looks like this:

<ul><table>
  <tr><th align=left width=100>Name</th><th align=left width=200>Formula</th><th align=left>Description</th></tr>
  <tr><th align=left>%CA</th><td align=left>(%RAD * %RAD * 3.14)</td><td align=left>area of a circle</td></tr>
  <tr><th align=left>%RAD</th><td align=left>(%DIA * 0.5)</td><td align=left>radius of a circle</td></tr>
  <tr><th>...</th></tr>
</table></ul>

After loading the relevant CSV files, my code iterates over their content,
replacing symbols (eg, <tt>%DIA</tt>)
with corresponding, fully-resolved expressions.
The iteration continues as long as unresolved symbols are present
and progress is being made.
<p>
This approach is working just fine,
with the small problem that the resulting expressions
can be rather repetitive:

<ul>
( (%DIA * 0.5) * (%DIA * 0.5) * 3.14)
</ul>

If this became a real issue (eg, making SketchUp run too slowly),
I could bring in some code to simplify the expressions.
However, at present, the brute force version is working just fine. 
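<p>
The substitution loop described above can be sketched in a few lines.
This is an illustration, not the project's code:
the method name, the <tt>%[A-Z]+</tt> symbol pattern, and the pass limit
(a guard against macros that refer to themselves) are all assumptions.

```ruby
# Hypothetical sketch of the macro pass; names and limits are illustrative.
# text       - an expression that may contain macro symbols such as %RAD
# macros     - hash of symbol name => formula
# max_passes - safety limit, in case a macro (directly or indirectly) uses itself
def expand_macros(text, macros, max_passes = 20)
  max_passes.times do
    # Unknown symbols (eg, %DIA) are left alone for SketchUp to resolve.
    expanded = text.gsub(/%[A-Z]+/) { |name| macros.fetch(name, name) }
    return expanded if expanded == text   # no progress was made, so stop
    text = expanded
  end
  raise ArgumentError, "macros still expanding after #{max_passes} passes"
end

macros = {
  '%CA'  => '(%RAD * %RAD * 3.14)',
  '%RAD' => '(%DIA * 0.5)'
}

puts expand_macros('%CA', macros)
```

Run against the two macros shown in the table,
this yields the same repetitive expression as above (modulo white space).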

<h3>Naive Inheritance</h3>
<p>
Some of my spreadsheets contain vendor-specific information,
others do not, and some contain a mixture.
I handle this by allowing both common and vendor-specific instances
of each spreadsheet.
If there is only one spreadsheet of a given type, the program uses it.
If two spreadsheets are present, the program concatenates them
(discarding the intervening header row).
<p>
Even though this doesn't provide the kind of flexibility I enjoy in Ruby,
it allows me to segregate the information in a convenient manner.
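<p>
In code, the rule amounts to something like the sketch below.
The method and argument names are made up for illustration,
and putting the common rows ahead of the vendor-specific ones is an assumption.

```ruby
# Hypothetical sketch; the names and the ordering are assumptions.
# common, vendor - row-major lists of lists (header row first), or nil if absent
def merge_sheets(common, vendor)
  return vendor if common.nil?
  return common if vendor.nil?
  common + vendor.drop(1)   # discard the intervening (vendor) header row
end

common = [['Name', 'Value'], ['depth', '24']]
vendor = [['Name', 'Value'], ['finish', 'oak']]

merge_sheets(common, vendor).each { |row| puts row.join(',') }
```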

<h2>Conclusions</h2>
<p>
I used a few other data-munging tricks in this project
(eg, ignoring empty rows so that they can be used for spacing),
but these transformations were certainly the most important ones.
<p>
<a href="http://en.wikipedia.org/wiki/Domain-specific_language">DSLs</a> (domain-specific languages)
are all the rage in the Ruby community.
I hope that some of the notions above have shown you that they can also be used
to good effect in solving data representation problems.
Incidentally, Martin Fowler has an
<a href="http://my.safaribooksonline.com/9780132107549">upcoming book</a> on DSLs;
you might want to keep your eye out for it...]]>
    </content>
</entry>

<entry>
    <title>SketchUp MashUp HeadsUp</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001716.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1716" title="SketchUp MashUp HeadsUp" />
    <id>tag:www.cfcl.com,2009:/rdm/weblog//3.1716</id>
    
    <published>2009-11-10T00:40:57Z</published>
    <updated>2009-11-10T00:38:22Z</updated>
    
    <summary>My recent post, Using Cucumber with SketchUp, discussed one possible way to combine Google SketchUp with other packages. However, there are lots of other possible mash-ups. This entry discusses some of these; other suggestions are welcome......</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Ruby" />
    
        <category term="Semantic Web" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[My recent post,
<a href="http://www.cfcl.com/rdm/weblog/archives/001714.html"
  >Using Cucumber with SketchUp</a>,
discussed one possible way to combine
<a href="http://www.google.com">Google</a>
<a href="http://en.wikipedia.org/wiki/SketchUp">SketchUp</a>
with other packages.
However, there are lots of other possible mash-ups.
This entry discusses some of these;
other suggestions are welcome...]]>
        <![CDATA[<p>
<b>Note:</b> As in the Cucumber post,
my implementation notions will focus on the Mac OS X environment.
Windows certainly supports some of the needed facilities (eg,
<a href="http://en.wikipedia.org/wiki/Daemon_%28computer_software%29">daemons</a>,
<a href="http://en.wikipedia.org/wiki/Inter-process_communication">IPC</a>),
but I'm not in a position to suggest specific approaches.
I have no idea whether Windows has anything equivalent to
<a href="http://en.wikipedia.org/wiki/AppleScript">AppleScript</a>.
In short, <a href="http://dictionary.reference.com/browse/YMMV"
            >Your Mileage May Vary</a>.

<p>
<h2>Development Tools</h2>
<p>
As most software developers have noticed,
it's really handy to have well-polished development tools.
Here are some possibilities that might match the needs of professional SketchUp users.

<p>
<h3>Revision Control</h3>
<p>
<a href="http://en.wikipedia.org/wiki/Revision_control"
  >Revision control</a> systems can save your efforts
in an organized and annotated way.
Think "Save As" with automated support
for branch handling, check-in messages, and revision history.
<p>
These systems are particularly useful when multiple developers are involved,
because they keep the developers from stepping on each other's changes.
I'm not sure whether or how SketchUp documents could be merged in an automated fashion,
but collections of Ruby scripts and/or text-based data (eg, configuration) files
are obvious candidates for branching and merging.
<p>
My preference would be to use an
<a href="http://en.wikipedia.org/wiki/Open_source">open source</a>,
<a href="http://en.wikipedia.org/wiki/Distributed_revision_control"
  >distributed revision control</a> system
such as <a href="http://en.wikipedia.org/wiki/Git_%28software%29">Git</a>
or <a href="http://en.wikipedia.org/wiki/Mercurial">Mercurial</a>,
integrated into SketchUp by means of an intermediary plugin.
However, any form of revision control could provide significant benefits.

<p>
<h3>Build Automation</h3>
<p>
Once revision control is in place, some
<a href="http://en.wikipedia.org/wiki/Build_automation"
  >build automation</a> can be brought into play.
After performing some regression tests
(eg, via <a href="http://cukes.info/">Cucumber</a>),
a build system might
<a href="http://en.wikipedia.org/wiki/Rendering_%28computer_graphics%29"
  >render</a> some images and/or animation videos,
upload changes to Google's
<a href="http://sketchup.google.com/3dwarehouse">3D Warehouse</a>, etc.

<p>
<h2>External Storage</h2>
<p>
SketchUp makes extensive use of file-based I/O.
Files are used for documents, libraries, and scripts,
as well as for various forms of input and output.
However, SketchUp isn't really network-enabled.
SketchUp plugins should be able to make use of network-based resources,
gaining capabilities and reducing the size of documents.

<p>
<h3>Databases, etc.</h3>
<p>
Databases are good at providing rapid access to large amounts of information.
In the context of SketchUp,
a database could support dynamic menu generation, access to metadata, etc.
There are various types of databases, optimized for differing needs.
<p>
When most people speak of a database, they are referring to a
<a href="http://en.wikipedia.org/wiki/Relational_database_management_system"
  >relational database management system</a> (RDBMS).
These come in a wide variety of configurations,
but we don't need an enterprise-grade system
to support a typical SketchUp user's needs.
<p> 
Conveniently, Mac OS X ships with a copy of <a href="http://www.sqlite.org/">SQLite</a>,
a <a href="http://www.sqlite.org/mostdeployed.html">widely deployed</a>
single-file <a href="http://en.wikipedia.org/wiki/SQL">SQL</a> database engine.
Although it might be possible to use SQLite from within a SketchUp plugin
(eg, via <a href="http://sqlite-ruby.rubyforge.org/sqlite3/faq.html"
           >SQLite/Ruby</a>), it may be easier to use a lightweight
<a href="http://www.simplicidade.org/notes/archives/2009/02/sqlite_server.html">SQLite server</a>.
In either case, it doesn't look like rocket science.
<p>
As hinted earlier, however,
there are a variety of databases that don't fit the RDBMS paradigm.
For example, there are
<a href="http://en.wikipedia.org/wiki/Document-oriented_database"
  >document-oriented databases</a> such as Apache CouchDB, MongoDB, and Riak.
The hallmark of these systems is flexibility:
because they don't demand that a database schema be defined in advance,
new documents and indexing schemes can be added at any time.
<p>
None of these databases are installed by default on Mac OS X,
but they aren't all that hard to install.
So, if your application needs rapid access based on arbitrary indexes,
one of these might be just the thing.
<p>
Finally, there are a number of systems that store graph-structured information,
perform user-definable inferencing, and do other cute tricks.
Most of these support <a href="http://en.wikipedia.org/wiki/Semantic_Web"
                        >Semantic Web</a> standards such as
<a href="http://en.wikipedia.org/wiki/Resource_Description_Framework"
  >Resource Description Framework</a> (RDF) and
<a href="http://en.wikipedia.org/wiki/Web_Ontology_Language"
  >Web Ontology Language</a> (OWL),
so it's possible to pick one that is well tuned to a particular set of needs.

<p>
<h3>Internet Resources</h3>
<p>
There's no reason a plugin shouldn't be able to access Internet resources.
These could include Google's 3D Warehouse, collections of plugins,
or even Semantic Web-style
<a href="http://en.wikipedia.org/wiki/Linked_Data">Linked Data</a>.
The critical thing to keep in mind is that SketchUp doesn't have to be a silo,
limited to the files on the local desktop.

<p>
<h2>Dynamic Components</h2>
<p>
<b>Note:</b>
The initial version of this blog entry neglected this topic.
<p>
SketchUp 7 introduces a powerful construct called
<a href="http://sketchup.google.com/product/dcs.html">dynamic components</a>.
Among other things, these allow modelers to register named attributes for components.
Changing the value of the attribute then causes changes in the component, and vice versa.
Making matters more interesting, attributes can be modified in several ways.
<p>
Like spreadsheet cells, attributes can be defined as literals or expressions.
Attributes defined as literals can be modified via user actions or Ruby API calls.
Attributes defined as expressions, however,
are modified whenever a referenced attribute changes its value.
<p>
So, it's possible to set up several attributes as functions of a common attribute.
By varying the common attribute, the other values can be varied in parallel, corresponding ways.
This can be used, for example, to produce synchronized animations.
<p>
Expressions have the sorts of convenience functions (eg, SIN, SQRT)
that you might expect to see in a spreadsheet.
By combining these in algebraic ways, all sorts of behavior can be produced.
However, that's just the tip of the iceberg.

<p>
<h2>Virtualization</h2>
<p>
<b>Note:</b>
The initial version of this blog entry neglected this topic.
<p>
Although it's possible to run SketchUp in batch mode,
Mac OS X imposes some awkward constraints:

<ul>
  <p><li>Only one copy of an app can run at a time.
  <p><li>Only one app can run in the foreground.
</ul>

So, even on a multi-core Mac,
we can't simply run several copies of SketchUp in batch mode.
Nor can a modeler start up a background testing or rendering job and keep working.
Fortunately, there appears to be a reasonable (and legal!) workaround.
<p>
Apple has modified the Mac OS X Server EULA
to allow one or more licensed copies to be run under a VM.
VMware has taken advantage of this in VMware Fusion 2.0 beta 2,
offering a way to run multiple copies of Mac OS X Server.
I'm sure there are gnarly details to work out,
but this sounds very promising.
Here's a marketing
<a href="http://www.youtube.com/watch?v=kkOoz0-Kb8o&#38;hl=en&#38;fs=1">video</a> and a
<a href="http://blogs.vmware.com/teamfusion/2008/08/best-practices.html">blog entry</a>.


<p>
<h3>The Ruby Connection</h3>
<p>
If a Ruby plugin adds a method to the <tt>DCFunctionsV1</tt> class,
it gets added to SketchUp's built-in set of functions.
For example, here's how an arctangent function could be added,
as discussed in this
<a href="http://forums.sketchucation.com/viewtopic.php?f=10&#38;t=21474&#38;p=180654&#38;hilit=atan2#p180654"
  >thread</a>:

<ul><pre>
class DCFunctionsV1
&#160; protected
&#160; def atan2(a)
&#160;   return Math::atan2(a[0], a[1]).radians
&#160; end
end
</pre></ul>

<p>
This could be extremely convenient:
parenthesized algebraic expressions are all very well,
but they have some real limitations.
They become awkward when complicated calculations are needed,
can't handle selection (eg, if or case statements), etc.
So, Ruby methods are a much nicer way to express complicated calculations.
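<p>
As an illustration of the selection case, here is a made-up function
that would be painful to write as a single algebraic expression.
The function name, the thresholds, and the use of inches are all invented for this sketch;
only the pattern (a protected method on <tt>DCFunctionsV1</tt>
that receives its arguments as an array) comes from the atan2 example.

```ruby
# Inside SketchUp, this reopens the existing DCFunctionsV1 class;
# anywhere else, it simply defines a fresh class of that name.
class DCFunctionsV1
  protected

  # Hypothetical function: choose a shelf count from a cabinet height.
  # As in the atan2 example, the arguments arrive as an array.
  def shelf_count(a)
    height = a[0].to_f   # assumed to be in inches
    if    height >= 60 then 4
    elsif height >= 36 then 3
    else  2
    end
  end
end

# Quick check outside SketchUp (send is needed because the method is protected).
puts DCFunctionsV1.new.send(:shelf_count, [48])
```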
<p>
However, Ruby methods can do a <i>lot</i> more than performing calculations.
For example, a method could access a file or even a remote server.
For that matter, it could even make calls into the SketchUp API.
I'm not exactly sure how I'll end up using this connection,
but I'm <i>quite</i> sure I'll find some interesting things to do with it!]]>
    </content>
</entry>

<entry>
    <title>Using Cucumber with SketchUp</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001714.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1714" title="Using Cucumber with SketchUp" />
    <id>tag:www.cfcl.com,2009:/rdm/weblog//3.1714</id>
    
    <published>2009-11-02T05:27:58Z</published>
    <updated>2009-11-02T05:24:11Z</updated>
    
    <summary>Cucumber is a Ruby-based tool (technically, a domain-specific language) that helps programmers and their clients define and agree on tests of program behavior. These tests can be used to guide development, enforce acceptance criteria, and detect regressions. Although it is popular in the larger Ruby community, Cucumber has not been used (as far as I can tell) to develop Ruby-based extensions for Google SketchUp. This seems like an unfortunate situation; perhaps it&apos;s time to see what can be done about...</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Ruby" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[<a href="http://cukes.info/">Cucumber</a>
is a <a href="http://en.wikipedia.org/wiki/Ruby_%28programming_language%29">Ruby</a>-based tool
(technically, a <a href="http://en.wikipedia.org/wiki/Domain-specific_language"
                  >domain-specific language</a>)
that helps programmers and their clients define and agree on tests of program behavior.
These tests can be used to guide development, enforce acceptance criteria, and detect regressions.
<p>
Although it is popular in the larger Ruby community,
Cucumber has not been used (as far as I can tell) to develop Ruby-based extensions
for Google <a href="http://en.wikipedia.org/wiki/SketchUp">SketchUp</a>.
This seems like an unfortunate situation;
perhaps it's time to see what can be done about it...]]>
        <![CDATA[<p>
<h2>Background</h2>
<p>
The following information is largely intended
for readers who are unfamiliar with Cucumber and/or SketchUp,
but other readers should probably skim it, just in case...

<p>
<h3>Cucumber</h3>
<p>
Cucumber was created to support a software development technique called
<a href="http://en.wikipedia.org/wiki/Behavior_driven_development">Behavior Driven Development</a> (BDD).
In this technique, development starts with a description of desired behavior (ie, a feature),
then proceeds "inward" to create tests (ie, steps), make them pass,
and then refactor the code into civility.
<p>
A Cucumber <b>feature</b> describes some desired behavior,
using language that the client should find comfortable.
It begins with some context information,
then specifies one or more <b>scenarios</b> composed of single-line <b>steps</b>.
Here is a simple example:

<ul><pre>
Feature: Demonstration
  In order to demonstrate Cucumber
  As a blogger
  I want to use some values from a Scenario

  Scenario: Capture and use values
    Given this example is about 'adding numbers'
    And I specify values such as 1 and 2
    Then the result should equal 3
    And this step should be undefined
</pre></ul>

The typical client should have no trouble reading this description
and determining whether it expresses his or her understanding of the desired behavior.
Use of familiar concepts and language reduces the chance of ambiguity or confusion.
In short, the client can feel comfortable
about using Cucumber features as acceptance criteria.
<p>
However, the description is not <i>really</i> written in English.
Rather, it uses a (minimalistic, but easily extensible)
<a href="http://en.wikipedia.org/wiki/Controlled_natural_language">controlled natural language</a>.
So, the programmer can create a collection of <b>step definitions</b>
that match specific steps and perform appropriate actions:

<ul><pre>
Given /^this example is about '(.*)'$/ do |topic|
  puts "    >> topic: '#{topic}'"
end

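And /^I specify values such as (.*) and (.*)$/ do |v1, v2|
  @v1, @v2 = v1.to_i, v2.to_i   # captures arrive as strings
end

Then /^the result should equal (.*)$/ do |result|
  # a step fails only if its block raises, so raise on a mismatch
  raise "expected #{@v1 + @v2}, got #{result}" unless result.to_i == @v1 + @v2
end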
</pre></ul>

In general, clients will never need to look at step definitions.
Ruby programmers, however, will recognize them as code blocks,
guarded by <a href="http://en.wikipedia.org/wiki/Regular_expression">regular expressions</a> (REs).
When a line in a scenario matches a step definition, Cucumber runs the corresponding block.
For flexibility, portions of the line (eg, 'adding numbers', 1) can be "captured"
and passed to the code inside the block.
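<p>
To make the matching idea concrete, here is a toy dispatcher in plain Ruby.
It is only a sketch of the general technique, not Cucumber's implementation;
the <tt>define_step</tt> and <tt>run_step</tt> helpers are hypothetical names,
and the leading keywords (Given, And, Then) are assumed to have been stripped already.

```ruby
# Toy dispatcher: NOT Cucumber's implementation, just the matching idea.
STEP_DEFINITIONS = {}

# Register a block under a regular expression (hypothetical helper).
def define_step(pattern, &block)
  STEP_DEFINITIONS[pattern] = block
end

# Find the first pattern that matches the line and call its block
# with the captured substrings; return :undefined if nothing matches.
def run_step(line)
  STEP_DEFINITIONS.each do |pattern, block|
    match = pattern.match(line)
    return block.call(*match.captures) if match
  end
  :undefined
end

define_step(/^this example is about '(.*)'$/) { |topic| "topic=#{topic}" }
define_step(/^I specify values such as (\d+) and (\d+)$/) do |v1, v2|
  v1.to_i + v2.to_i   # captures arrive as strings, so convert them
end

puts run_step("this example is about 'adding numbers'")
puts run_step("I specify values such as 1 and 2")
puts run_step("this step should be undefined")
```

Each capture group in the RE becomes one block parameter, in order.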
<p>

When Cucumber is run, it generates output which reports on problems
(eg, overlapping or missing steps, syntax errors) and summarizes the results:

<ul><pre>
Feature: Demonstration
  In order to demonstrate Cucumber
  As a blogger
  I want to use some values from a Scenario

  Scenario: Capture and use values               # features/foo.feature:6
    >> topic: 'adding numbers'
    Given this example is about 'adding numbers' # features/step_definitions/foo.rb:1
    And I specify values such as 1 and 2         # features/step_definitions/foo.rb:5
    Then the result should equal 3               # features/step_definitions/foo.rb:9
    And this step should be undefined            # features/foo.feature:10

1 scenario (1 undefined)
4 steps (1 undefined, 3 passed)
0m0.003s

You can implement step definitions for undefined steps with these snippets:

Then /^this step should be undefined$/ do
  pending
end
</pre></ul>

<p>
Note that Cucumber deals gracefully with undefined steps.
It reports that they (and the scenarios that use them) are undefined,
then generates sample code for the needed step definition(s).
Sweet!
<p>
When all of the tests pass and the code is in acceptable condition,
the feature is ready for inspection by the client.
If the program does not act as the client expects and/or desires,
the feature and test descriptions can be examined to locate and resolve the discrepancy.
Passing sets of descriptions are retained as a useful form
of <a href="http://en.wikipedia.org/wiki/Regression_testing">regression testing</a>.

<p>
<h3>SketchUp</h3>
<p>
SketchUp is a powerful and free
(<a href="http://en.wiktionary.org/wiki/free_as_in_beer">as in beer</a>) tool
for creating and rendering 3D models.
These models can be exported
to <a href="http://en.wikipedia.org/wiki/Google_Earth">Google Earth</a>,
turned into animations, etc.
Although SketchUp is most commonly used to model buildings,
it can be used to model a wide variety of objects.
However, it is limited to modeling surfaces, as opposed to 3D volumes.
<p>
<a href="http://en.wikipedia.org/wiki/SketchUp_Ruby">SketchUp Ruby</a>,
an embedded Ruby <a href="http://en.wikipedia.org/wiki/Application_programming_interface">API</a>,
provides a convenient and flexible way to create and add extensions (aka macros, plugins, Rubies).
These extensions can be used to import, examine, and modify data, place "observers" on objects, etc.
So, the API provides a powerful mechanism for testing (eg, controlling, inspecting) extensions.
If we could use this mechanism under Cucumber,
the benefits of BDD would be available in a totally different environment!
<p>
For more information on SketchUp (etc), see:

<ul>
  <li><a href="http://sketchup.google.com/3dwarehouse/">Google 3D warehouse</a>
  <p>
  <li><a href="http://code.google.com/apis/sketchup/">Google SketchUp Ruby API</a>
  <li><a href="http://code.google.com/apis/sketchup/docs/index.html">SketchUp Ruby API Documentation
</a>
  <p>
  <li><a href="http://sketchup.google.com/download/rubyscripts.html">Google SketchUp - Ruby Scripts</a>
  <li><a href="http://www.crai.archi.fr/RubyLibraryDepot/Ruby/RUBY_Library_Depot.htm">Ruby Library Depot</a>
  <li><a href="http://smustard.com">Smustard</a>
</ul>

<p>
<h2>Analysis and Design</h2>
<p>
It might be possible to make Cucumber run as a SketchUp extension,
but this approach looks neither easy nor robust.
Cucumber and SketchUp are far too likely to get in each other's way.
Also, if either program changed, we might have to rework our modifications.
So, let's run them as separate processes,
letting Cucumber initiate, interact with, and monitor SketchUp plugins.
<p>
Cucumber can be used to test various kinds of programs,
but it is most commonly used to test web servers.
So, it supports the specification of many small steps (eg, HTTP requests),
possibly accompanied by some setup and/or teardown code.
To match this orientation to SketchUp, we need ways for Cucumber to:

<ul>
  <p><li>install plugins, then launch SketchUp

  <p><li>run, interact with, and monitor plugins
</ul>

<p>
<h3>High-level Interaction</h3>
<p>
Our step definitions will need to interact with SketchUp at a high level:
pushing buttons, selecting menu items, etc.
Although SketchUp makes no provision for this, Apple's
<a href="http://en.wikipedia.org/wiki/AppleScript#Open_Scripting_Architecture"
  >Open Scripting Architecture</a> (OSA) and
<a href="http://www.apple.com/accessibility">accessibility</a> support
give us a convenient "back door".
<p>
OSA makes it possible to send arbitrary
<a href="http://en.wikipedia.org/wiki/Apple_events">Apple Events</a> to applications.
Apple's
<a href="http://developer.apple.com/mac/library/releasenotes/UserExperience/RN-AccessibilityInspector/index.html"
  >Accessibility Inspector</a> allows any sufficiently motivated and qualified coder
to examine applications and develop event description code
(eg, using <a href="http://en.wikipedia.org/wiki/AppleScript">AppleScript</a>
or <a href="http://rubyosa.rubyforge.org">RubyOSA</a>).
However, none of this is easy enough for lazy folks like me.
<p>
Fortunately, <a href="http://prefabsoftware.com">PreFab Software</a>'s
<a href="http://prefabsoftware.com/uibrowser/">UI Browser</a>
offers a very convenient way to generate AppleScript code.
Basically, the programmer points the UI Browser at an application,
navigates to the widget of interest, and specifies the desired action.
The resulting code can then be edited (eg, generalized), stored in a file,
and used via the <tt>osascript</tt> command.
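<p>
As a rough sketch of how Ruby code might drive <tt>osascript</tt>,
the following builds a GUI-scripting command for selecting a menu item.
The process name ("SketchUp"), menu name ("Plugins"), and item name ("Cuke_0001")
are assumptions for illustration, and both helper methods are hypothetical.
The sketch only constructs the command; it does not run it.

```ruby
# Build the AppleScript for clicking a menu item via System Events
# (GUI scripting). The process, menu, and item names are parameters.
def menu_click_script(process, menu, item)
  %(tell application "System Events" to tell process "#{process}" to ) +
    %(click menu item "#{item}" of menu "#{menu}" ) +
    %(of menu bar item "#{menu}" of menu bar 1)
end

# Build the argument list for osascript; -e passes one line of script.
def osascript_command(script)
  ["osascript", "-e", script]
end

script = menu_click_script("SketchUp", "Plugins", "Cuke_0001")
command = osascript_command(script)
puts command.inspect
# On a Mac with GUI scripting enabled, one could then run:
#   system(*command)
```

Passing the command as an argument list avoids a round of shell quoting.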

<p>
<h3>Low-level Interaction</h3>
<p>
Although it is quite possible for a plugin to operate autonomously,
it would be convenient for Cucumber to be able
to direct (and monitor) the plugin at a lower level.
For example, Cucumber could start up a plugin,
then feed it a series of actions and report on the results.
<p>
<a href="http://en.wikipedia.org/wiki/Distributed_Ruby">Distributed Ruby</a> (DRb)
should provide a convenient and powerful way for Ruby code in Cucumber
to interact with objects in SketchUp (and vice versa).
However, I haven't (yet) been able to <tt>require</tt> DRb within SketchUp.
Anyone who knows how to do this (on Mac OS X) is implored to get in touch!
In the meanwhile, I plan to use named pipes, temporary files, etc.

<p>
<h3>General Approach</h3>
<p>
Bringing these details together, we arrive at the following general approach:

<ul>
  <p><li>Using Cucumber, we create a working directory of configuration data,
         AppleScript (interaction) and Ruby (plugin) code, etc.
         We store the directory's path in an environment variable
         and launch SketchUp, possibly specifying one or more documents.

  <p><li>SketchUp loads assorted plugin files,
         including one for Cucumber initialization.

  <p><li>The Cucumber plugin saves the directory's path,
         then processes each Ruby file in the directory
         (eg, running require, creating an initialization flag file).

  <p><li>The Ruby files perform various setup actions.
         Some provide infrastructure (eg, "helper" methods).
         Some create test methods and register them
         as "Plugin" menu items (eg, Cuke_0001) in SketchUp.

  <p><li>When Cucumber detects the last flag file, it waits a couple of seconds,
         then uses AppleScript to select the menu item for the first test.

  <p><li>Some test plugins may act as "servers",
         allowing Cucumber to specify and monitor fine-grained activities.

  <p><li>As each test plugin finishes,
         it creates a "flag" file for the run.
         Cucumber can use these (and a fallback timer) to pace its activities.
</ul>

At this writing, I am able to use the command line
(supplemented by AppleScript and Ruby code)
to launch SketchUp, run plugins, display and dismiss message boxes, etc.
In short, the high-level plumbing is basically in place.
Hooking it up to Cucumber will take some work,
but I don't expect any show-stoppers to emerge.
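<p>
The flag-file pacing described above could be sketched along these lines.
The <tt>wait_for_flag</tt> helper, the flag file name, and the timing values
are all hypothetical; the point is simply a polling loop with a fallback timer.

```ruby
require "tmpdir"

# Poll for a flag file until it appears or the fallback timer expires.
# Returns true if the file showed up, false on timeout.
def wait_for_flag(path, timeout: 10.0, interval: 0.1)
  deadline = Time.now + timeout
  until File.exist?(path)
    return false if Time.now >= deadline
    sleep interval
  end
  true
end

Dir.mktmpdir do |dir|
  flag = File.join(dir, "Cuke_0001.done")   # hypothetical flag name
  puts wait_for_flag(flag, timeout: 0.3)    # no file yet, so this times out
  File.write(flag, "")                      # what a finishing plugin would do
  puts wait_for_flag(flag, timeout: 0.3)
end
```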

<p>
<h2>Thoughts on Testing</h2>
<p>
SketchUp's API provides a wealth of methods
that allow plugins to examine and modify the current model.
Based on these, we should be able to build up some "helper" methods
akin to the ones used to test Ruby on Rails applications.
However, that's mostly a pipe dream at this point.
Meanwhile, we need to develop testing approaches
that work well with both Cucumber and SketchUp.

<p>
<h3>Case Study: Stairs</h3>
<p>
The <tt>stairs.rb</tt> script, described in the
<a href="http://code.google.com/apis/sketchup/docs/gsrubyapi_examples.html"
  >Google SketchUp Ruby API Developer's Guide</a>,
is supposed to generate a simple staircase (ie, 12 treads and 12 risers).
How can we verify that it is behaving correctly?
<p>
For simplicity, we'll start with an empty model, then run the plugin.
We can then examine the model to see what entities the plugin produced.
Here are some assertions to consider testing:

<ul>
  <li>There should be 24 faces and 73 edges (3 edges/face, plus one).
  <li>All faces should be rectangular and have the same width.
  <li>The faces should form a series, joined by "width" edges.
  <li>The series should begin with a riser and end with a tread.
  <li>Risers should be vertical and have the same height.
  <li>Treads should be horizontal and have the same depth.
</ul>

<p>
These assertions are at a level that would work for a client,
but they also look reasonably easy to test (famous last words :-).
Once I have a bit more plumbing in place,
I'll give them a try and find out where the monsters lurk...
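<p>
Some of these assertions can be tried out on plain data, without SketchUp.
The sketch below builds a hypothetical staircase as lists of corner points
(the dimensions are invented, and none of this uses the SketchUp API),
then counts faces and distinct edges and classifies the first and last faces.

```ruby
# Build a hypothetical staircase as plain data: each face is a list of
# four [x, y, z] corner points. This stands in for a SketchUp model.
def staircase(steps: 12, width: 36, rise: 7, run: 11)
  faces = []
  steps.times do |i|
    y0 = i * run
    z0 = i * rise
    y1 = y0 + run
    z1 = z0 + rise
    # riser: vertical face from height z0 up to z1
    faces.push([[0, y0, z0], [width, y0, z0], [width, y0, z1], [0, y0, z1]])
    # tread: horizontal face at height z1
    faces.push([[0, y0, z1], [width, y0, z1], [width, y1, z1], [0, y1, z1]])
  end
  faces
end

# An edge is an unordered pair of corner points, so shared edges collapse.
def unique_edges(faces)
  faces.flat_map { |f| f.each_index.map { |i| [f[i], f[(i + 1) % 4]].sort } }.uniq
end

def tread?(face)
  face.map { |p| p[2] }.uniq.size == 1    # all corners at one height
end

def riser?(face)
  face.map { |p| p[1] }.uniq.size == 1    # all corners at one depth
end

faces = staircase
puts faces.size
puts unique_edges(faces).size
puts riser?(faces.first) && tread?(faces.last)
```

The edge count works out because 24 rectangles have 96 sides,
and each of the 23 joins between neighboring faces shares one "width" edge.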

<p>
<h3>Reducing Confusion</h3>
<p>
It won't always be possible to start with an empty model.
For example, the plugin being tested may need entities to operate on.
So, there is lots of opportunity for the test code to get confused.
Which entities (eg, edges, faces) "belong together"?
Which entities did the plugin create, modify, etc?
<p>
SketchUp can't answer these sorts of questions,
but it provides a facility which can be used to do so.
Any object can be given any number of attributes.
If the plugin under test leaves a suitable trail of attributes,
the test code should be able to follow it.
Again, this is mostly speculation at this point,
but I expect to use SketchUp attributes pretty heavily
to reduce confusion in testing.

<p>
<h3>Project Wiki</h3>
<p>
For brevity, I have glossed over quite a few implementation details.
These may be found, in their current (evolving) state, in my SketchUp project wiki:

<ul>
  <li><a href="http://cfcl.com/twiki/bin/view/Projects/SketchUp/AppLaunching"
        >App Launching</a>
  <li><a href="http://cfcl.com/twiki/bin/view/Projects/SketchUp/AppleEvents"
        >Apple Events</a>
  <li><a href="http://cfcl.com/twiki/bin/view/Projects/SketchUp/CucumberLinkage"
        >Cucumber Linkage</a>
  <li><a href="http://cfcl.com/twiki/bin/view/Projects/SketchUp/DistributedRuby"
        >Distributed Ruby</a>
</ul>]]>
    </content>
</entry>

<entry>
    <title>Visualizing RDF and OWL data models</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001703.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1703" title="Visualizing RDF and OWL data models" />
    <id>tag:www.cfcl.com,2009:/rdm/weblog//3.1703</id>
    
    <published>2009-10-20T20:52:30Z</published>
    <updated>2009-10-20T20:51:28Z</updated>
    
    <summary>The data models for Resource Description Framework (RDF) and Web Ontology Language (OWL) can be a bit difficult to understand, even at the simplest level. Here are some visualizations (and explanations) I&apos;ve found useful: Table Abstract 3D Space Directed Graph Decorated Hierarchy Sets and Mappings Collections of Triples I&apos;d be delighted to hear about other ways of thinking about this. All comments and/or corrections are welcome......</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Semantic Web" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[The data models for
<a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">Resource Description Framework</a> (RDF) and
<a href="http://en.wikipedia.org/wiki/Web_Ontology_Language">Web Ontology Language</a> (OWL)
can be a bit difficult to understand, even at the simplest level.
Here are some visualizations (and explanations) I've found useful:

<ul>
  <p><li><a href="#T"   >Table</a>
  <p><li><a href="#A3DS">Abstract 3D Space</a>
  <p><li><a href="#DG"  >Directed Graph</a>
  <p><li><a href="#DH"  >Decorated Hierarchy</a>
  <p><li><a href="#SaM" >Sets and Mappings</a>
  <p><li><a href="#CoT" >Collections of Triples</a>
</ul>

I'd be delighted to hear about other ways of thinking about this.
All comments and/or corrections are welcome...]]>
        <![CDATA[<p>
<a name="T"><h2>Table</h2></a>
<p>
The most obvious representation of RDF (and thereby, OWL)
is a rectangular table where each row encodes a simple "fact".
For example, here is a trivial ontology:

<ul>
<table>
  <tr><th width=100 align=left>Subject</th>
      <th width=100 align=left>Predicate</th>
      <th width=100 align=left>Object</th></tr>
  <tr><td>Dog</td><td>is_a</td><td>Thing</td></tr>
  <tr><td>Cat</td><td>is_a</td><td>Thing</td></tr>
  <tr><td>Cat</td><td>teases</td><td>Dog</td></tr>
  <tr><td>Dog</td><td>chases</td><td>Cat</td></tr>
</table>
<p>
</ul>

This table is quite similar to ones found in a
<a href="http://en.wikipedia.org/wiki/Relational_database_management_system"
>relational database management system</a> (RDBMS),
in that cells contain literal values (eg, integers or strings) or keys
(eg, <a href="http://en.wikipedia.org/wiki/Uniform_Resource_Identifier">URIs</a>).
However, some of the details are quite different.
For example:

<ul>
  <p><li>
    RDF - uses a single table for everything<br>
    RDBMS - uses (many) separate tables
  <p><li>
    RDF - object can be a literal value or a key<br>
    RDBMS - column data types are constrained
  <p><li>
    RDF - queries can follow chains of inferences<br>
    RDBMS - queries examine explicit connections
</ul>

Conveniently, this visualization of RDF is echoed in the simplest form of 
<a href="http://en.wikipedia.org/wiki/Turtle_%28syntax%29">Turtle</a> syntax:

<ul><pre>
Dog   is_a     Thing.
Cat   is_a     Thing.
Cat   teases   Dog.
Dog   chases   Cat.
</pre></ul>

<p>
<a name ="A3DS"><h2>Abstract 3D Space</h2></a>
<p>
As pointed out in Chapter 3 of
<a href="http://www.amazon.com/dp/047041801X">Semantic Web Programming</a>,
the tabular representation of RDF statements can be visualized
as points in an abstract three-dimensional (3D) space:

<ul>
  <p><li>The axes represent the subject, predicate, and object.<br>
  <p><li>Each axis contains every possible URI or literal.
  <p><li>Each point represents a small atom of information.
  <p><li>Each point is independent of all other points.
  <p><li>Definitions (ie, triples) are unordered.
  <p><li>Identical definitions are ignored.
</ul>

<p>
<ul><img src="http://www.cfcl.com/rdm/weblog/images/3dData.png" width=60% /></ul>
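<p>
The "unordered" and "duplicates are ignored" properties are exactly those of a mathematical set.
Here is a small Ruby sketch, using the sample ontology, that treats each triple as one point:

```ruby
require "set"

# Each triple is one point in the subject/predicate/object space.
triples = Set.new
triples.add(["Dog", "is_a", "Thing"])
triples.add(["Cat", "is_a", "Thing"])
triples.add(["Cat", "teases", "Dog"])
triples.add(["Dog", "chases", "Cat"])
triples.add(["Dog", "is_a", "Thing"])   # duplicate: the set ignores it

reordered = Set.new(triples.to_a.reverse)

puts triples.size            # number of distinct points
puts triples == reordered    # order of definition does not matter
```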

<p>
<a name="DG"><h2>Directed Graph</h2></a>
<p>
Ontologies are an application of
<a href="http://en.wikipedia.org/wiki/Graph_%28mathematics%29#Directed_graph">directed graphs</a>:
each RDF triple is an edge, each literal or URI is a node.
Diagramming (a subset of) the graph can allow readers to follow connections, recognize patterns, etc.
Here is our sample ontology, as drawn by (and encoded for) the
<a href="http://en.wikipedia.org/wiki/Graphviz">Graphviz</a> utility.

<table><tr>
<td><img src="http://www.cfcl.com/rdm/weblog/images/DAG.png" /></td>
<td><pre>
digraph DAG { rankdir = "BT";
  Dog -> Thing [label = "  is_a"];
  Cat -> Thing [label = "  is_a"];
  Dog -> Cat   [label = "  chases"];
  Cat -> Dog   [label = "  teases"];
}
</pre></td>
</tr></table>
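<p>
Graphviz input of this kind is easy to generate from a list of triples.
The following is a minimal sketch; <tt>triples_to_dot</tt> is a hypothetical helper,
and it assumes that every subject and object is already a valid Graphviz identifier.

```ruby
# Turn [subject, predicate, object] triples into Graphviz DOT text.
def triples_to_dot(triples, name: "Ontology")
  lines = ["digraph #{name} { rankdir = \"BT\";"]
  triples.each do |subject, predicate, object|
    lines.push(%(  #{subject} -> #{object} [label = "  #{predicate}"];))
  end
  lines.push("}")
  lines.join("\n")
end

triples = [
  %w[Dog is_a Thing],
  %w[Cat is_a Thing],
  %w[Dog chases Cat],
  %w[Cat teases Dog],
]
puts triples_to_dot(triples)
```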

<p>
<a name="DH"><h2>Decorated Hierarchy</h2></a>
<p>
Although RDF is mostly concerned with entities and relations,
OWL raises the level of discourse to include classes, restrictions, and more.
In particular, OWL structures ontologies as class hierarchies, decorated by other relations.
<p>
This simplifies the notation, because "is_a" relations can be expressed by indentation (etc).
Here is an OWLish (ASCII Art) representation of our ontology:

<ul><pre>
Thing
|
|`--  Cat  (teases Dog)
 `--  Dog  (chases Cat)
</pre></ul>

<p>
<a name="SaM"><h2>Sets and Mappings</h2></a>
<p>
Each RDF relation defines a
<a href="http://en.wikipedia.org/wiki/Map_%28mathematics%29">mapping</a> between two entities.
OWL provides the ability to restrict the domain and range sets of mappings,
providing protections analogous to the data type and foreign key constraints found in an RDBMS.

<p>
<ul><img src="http://www.cfcl.com/rdm/weblog/images/SetMaps.png" width=60% /></ul>

<p>
<a name="CoT"><h2>Collections of Triples</h2></a>
<p>
Most production ontologies make use of pre-defined collections of triples.
Indeed, making use of such collections is widely considered to be a Best Practice,
because it reduces error, increases interoperability, etc.
<p>
Some of these collections (eg, rdf, rdfs, owl) define general, structural relations.
These are used pretty universally, though it's common to pick and choose among owl subsets.
Other pre-defined collections (eg, bibo, core, foaf) define topical relations
of use in particular domains of discourse.
For example, foaf contains relations pertaining to human interactions,
while bibo and core handle bibliographic information.
<p>
If the relevant pre-defined collections of triples don't cover everything
(or don't cover it in the desired manner),
an ontology may define some local, structural triples.
Finally, after the ontology is loaded into a triplestore,
it will be "populated" by instances (eg, "Berners-Lee knows Hendler"):
<p>
<ul><img src="http://www.cfcl.com/rdm/weblog/images/RDF_NS.png" /></ul>]]>
    </content>
</entry>

<entry>
    <title>Semantic Web Installfest - meeting notes</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001702.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1702" title="Semantic Web Installfest - meeting notes" />
    <id>tag:www.cfcl.com,2009:/rdm/weblog//3.1702</id>
    
    <published>2009-10-20T00:01:24Z</published>
    <updated>2009-10-19T23:59:01Z</updated>
    
    <summary>I recently organized an Installfest for the San Francisco Semantic Web Meetup. It was held at PariSoMa, a co-working venue in SoMa (South of Market Street in San Francisco). Given that this was my first attempt at pulling together this sort of &quot;hands on&quot; meeting for the group, I think it went pretty well. I had several reasons for organizing the meeting, but the primary one was to bring together a group of SemWeb enthusiasts who want to learn about...</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Semantic Web" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[I recently organized an
<a href="http://www.meetup.com/The-San-Francisco-Semantic-Web-Meetup/calendar/11433952/"
  >Installfest</a> for the
 <a href="http://www.meetup.com/The-San-Francisco-Semantic-Web-Meetup/"
      >San Francisco Semantic Web Meetup</a>.
It was held at <a href="http://parisoma.com">PariSoMa</a>,
a co-working venue in SoMa (South of Market Street in San Francisco).
Given that this was my first attempt at pulling together this sort of "hands on" meeting for the group,
I think it went pretty well.
<p>
I had several reasons for organizing the meeting,
but the primary one was to bring together a group of SemWeb enthusiasts
who want to learn about the technology, try things out, discuss alternatives, etc.
I love hearing about successful Semantic Web projects,
but I also want to "get my hands dirty" using the tools.]]>
        <![CDATA[<p>
<h2>Venue, Timing, Attendance</h2>
<p>
As I had hoped, holding the event on a Sunday afternoon allowed folks
to come in from the South Bay, etc.
The timing also worked well with PariSoMa's informal setting
to produce a relaxed feeling among the attendees.
The resulting group conversations covered a wide range of Semantic Web topics,
contributing greatly to the value of the event.
<p>
Parking in SoMa is wide-open and free on Sundays.
I was able to park right in front of the venue,
greatly easing the task of bringing in equipment and supplies.
This helped to compensate for the fact that PariSoMa's loft
is at the top of a substantial (30+ riser) staircase.
<p>
Although we had 30+ RSVPs (and the Meetup page shows 24 attendees),
my count was more like 15.
This was a fine size for an initial meeting,
but I wish that more of the no-shows had registered as "Maybe".
Fortunately, I was able to donate the leftover food to PariSoMa
for use at their party (scheduled just after the Meetup).
<p>
I'm grateful to PariSoMa for providing the venue and to
<a href="http://franz.com">Franz Inc</a> for subsidizing the refreshments.
Even with careful shopping, food and drinks add up!
I'd like to thank the folks who donated money at the meeting,
bringing the event back into the black...
<p>
<h2>Agenda and Reality</h2>
<p>
Although I had sketched out a loose
<a href="http://cfcl.com/twiki/bin/view/Projects/Semweb/2009_10_18_main">agenda</a>,
I didn't really expect to keep the group on track,
or work very hard to do so.
As <a href="http://en.wikipedia.org/wiki/Helmuth_von_Moltke_the_Elder"
     >Helmuth von Moltke the Elder</a> observed,
"No plan of operations extends with certainty beyond the first encounter
with the enemy's main strength" (ie, no plan survives contact with the enemy).
<p>
So, I gave a short presentation
(<a href="http://cfcl.com/twiki/bin/view/Projects/Semweb/2009_10_18_swpt"
   >slides</a>)
on the tool set we would be installing,
covering its inspiration, interactions and roles, etc.
I also discussed visualization of RDF and OWL
and Prot&#233;g&#233; (eg, history, user interface).
The attendees then spent about an hour quietly installing the tool set,
mostly using archives from the USB thumb drive I passed around.
<p>
Most of the laptops were running Mac OS X,
which my <a href="http://cfcl.com/twiki/bin/view/Projects/Semweb/Setup_SWP_OSX"
           >HowTo</a> seemed to handle pretty well.
There were also a couple of other OSes (eg, Linux, MS Windows) in evidence,
but nobody seemed to have much trouble installing the tools on them.
<p>
Once the installation was out of the way,
I gave a short "show and tell" on Prot&#233;g&#233; 4.
There was a small glitch when I tried to load an example ontology:
Prot&#233;g&#233; 4 was unable to get a list of packages from the
<a href="http://owl.cs.manchester.ac.uk/repository/">TONES Repository</a>.
Fortunately, I was able to use
<a href="http://en.wikipedia.org/wiki/CURL">cURL</a> to grab a copy of the
<a href="http://www.co-ode.org/ontologies/pizza/">Pizza Ontology</a>.
<p>
<h2>Follow-on Plans</h2>
<p>
PariSoMa has scheduled the room for us again in November (11/15)
and may make a short presentation on some of their SemWeb-related ideas.
I'm also hoping for presentations from Franz and Zemanta.
Finally, I'll try to have some exercises, presentations, and discussion topics ready.
If you're in the area and interested in this sort of thing, please come!
<p>
Public planning and follow-up discussions for these meetings will take place on the
<a href="http://groups.google.com/group/swsfba">sw:sfba</a>
(Semantic Web: San Francisco Bay Area) mailing list.
Announcements will be made there and on the
<a href="http://www.meetup.com/The-San-Francisco-Semantic-Web-Meetup/"
  >San Francisco Semantic Web Meetup</a>'s web site and mailing list.
]]>
    </content>
</entry>

<entry>
    <title>Improving the Conciseness of Turtle and SPARQL</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001701.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1701" title="Improving the Conciseness of Turtle and SPARQL" />
    <id>tag:www.cfcl.com,2009:/rdm/weblog//3.1701</id>
    
    <published>2009-10-13T07:54:59Z</published>
    <updated>2009-10-13T06:51:08Z</updated>
    
    <summary>RDF/XML, the &quot;official&quot; serialization format for RDF (Resource Description Framework) was never designed for use by humans. Turtle (Terse RDF Triple Language) is a great improvement, but it&apos;s still a bit verbose for my tastes. SPARQL, being largely modeled after Turtle, shares many of its limitations....</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Semantic Web" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[<a href="http://www.w3.org/TR/rdf-syntax-grammar/">RDF/XML</a>,
the "official" serialization format for
<a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>
(Resource Description Framework), was never designed for use by humans.
<a href="http://en.wikipedia.org/wiki/Turtle_(syntax)">Turtle</a>
(Terse RDF Triple Language) is a great improvement,
but it's still a bit verbose for my tastes.
SPARQL, being largely modeled after Turtle, shares many of its limitations.]]>
        <![CDATA[<p>
<h2>A Look at Turtle</h2>
<p>
Turtle is a <a href="http://en.wikipedia.org/wiki/Domain_Specific_Language">DSL</a>
(domain-specific language) for RDF.
Several features help to make it concise:
<ul>
<p><li>RDF-oriented syntax
<p>
Unlike RDF/XML, Turtle is not a specialization (ie, dialect)
of a general-purpose (aka "Digital Tupperware") format.
So, many syntax elements and structural levels simply disappear.

<p><li><tt>@base</tt> and <tt>@prefix</tt> directives
<p>
These directives provide a convenient, if limited, mechanism for shortening 
<a href="http://en.wikipedia.org/wiki/Uniform_Resource_Identifier">URIs</a>
(Uniform Resource Identifiers).
So, many of the longest tokens are significantly shortened.

<p><li>comma and semi-colon symbols
<p>
The comma and semi-colon symbols provide ways to reduce explicit repetition in triples.
The semi-colon gets rid of repeated subjects;
the comma gets rid of repeated subject/predicate pairs.
</ul>
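<p>
To illustrate how the comma and semi-colon collapse repetition,
here is a rough Ruby sketch that groups triples and emits abbreviated Turtle.
The <tt>to_turtle</tt> helper and the <tt>ex:Rex</tt> triples are invented for the example;
a real serializer would also have to handle prefixes, literals, and escaping.

```ruby
# Group triples by subject, then by predicate, and emit Turtle that uses
# ";" for repeated subjects and "," for repeated subject/predicate pairs.
def to_turtle(triples)
  by_subject = triples.group_by { |subject, _, _| subject }
  by_subject.map do |subject, rows|
    by_predicate = rows.group_by { |_, predicate, _| predicate }
    clauses = by_predicate.map do |predicate, group|
      objects = group.map { |_, _, object| object }
      "#{predicate} #{objects.join(', ')}"
    end
    "#{subject} #{clauses.join(" ;\n    ")} ."
  end.join("\n")
end

triples = [
  ["ex:Canine", "rdf:type",        "owl:Class"],
  ["ex:Canine", "rdfs:subClassOf", "ex:Mammal"],
  ["ex:Rex",    "ex:likes",        "ex:Bones"],
  ["ex:Rex",    "ex:likes",        "ex:Walks"],
]
puts to_turtle(triples)
```

Four input triples come out as two statements: one using a semi-colon, one using a comma.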

Unfortunately, the syntax can still be needlessly verbose.
Consider this example code from Chapter 4 of
<a href="http://www.amazon.com/dp/047041801X">Semantic Web Programming</a>:

<pre>
  @prefix    ex:              &lt;http://example.org/>.
  ex:Mammal  rdf:type         owl:Class.
  ex:Canine  rdf:type         owl:Class;
             rdfs:subClassOf  ex:Mammal.
  ex:Human   rdf:type         owl:Class;
             rdfs:subClassOf  ex:Mammal.
</pre>

Clearly, the "<tt>ex:</tt>" prefix and the semi-colon help,
but why are we repeating so much information?
Combining the comma symbol with a bit of OWL magic gives us:

<pre>
  h:irs_sCO  owl:inverseOf    rdfs:subClassOf.
  h:ir_Type  owl:inverseOf    rdf:type.

  @prefix    ex:              &lt;http://example.org/>.
  owl:Class  h:ir_Type        ex:Canine, ex:Human, ex:Mammal.
  ex:Mammal  h:irs_sCO        ex:Canine, ex:Human.
</pre>

Given that the "<tt>h</tt>" (helper) predicates can be defined elsewhere,
this gives quite a reduction in visible code size and apparent complexity.
But even this is a bit redundant.
Why do we need to say that "<tt>ex:Canine</tt>" is an "<tt>owl:Class</tt>"?
Doesn't the "<tt>rdfs:subClassOf</tt>" predicate imply this?
<p>
Another minor annoyance is the fact that <tt>@prefix</tt> definitions
can't be used in defining other ones.
So, we get in-line repetition of the form:

<pre>
  @prefix    ex_foo:          &lt;http://really.verbose.example.org/foo>.
  @prefix    ex_bar:          &lt;http://really.verbose.example.org/bar>.
  @prefix    ex_baz:          &lt;http://really.verbose.example.org/baz>.
</pre>

The repetition of "<tt>http://really.verbose.example.org/</tt>" is needlessly verbose.
Worse, it violates the
<a href="http://en.wikipedia.org/wiki/Don%27t_repeat_yourself"
  >Don't Repeat Yourself</a> (DRY) principle, formally stated as:
<blockquote>
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
</blockquote>

So, for example, if the URL needs to be changed, large numbers of lines may need editing...
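<p>
One way around this, sketched here in Ruby, is to generate the directives
from a single definition of the shared base.
The <tt>prefix_lines</tt> helper is hypothetical; it simply shows the base URL being stated once.

```ruby
# State the shared base once, then derive each @prefix line from it.
BASE = "http://really.verbose.example.org/"

def prefix_lines(base, names)
  names.map { |name| "@prefix ex_#{name}: <#{base}#{name}>." }
end

puts prefix_lines(BASE, %w[foo bar baz])
```

If the URL changes, only the <tt>BASE</tt> line needs editing.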

<p>
<h2>Idioms, Patterns, etc.</h2>
<p>
Every programming language produces a collection of
<a href="http://en.wikipedia.org/wiki/Programming_idiom">programming idioms</a> and
<a href="http://en.wikipedia.org/wiki/Design_pattern">design patterns</a>.
However, it has been observed that many programming design patterns
are simply workarounds for limitations of specific programming languages.
In RDF, such patterns are seen in expressions of complex relationships (eg, "second cousin")
and multi-way relationships (eg, "John drove his car to Boston on Thursday").
<p>
Imagine a DSL that could express such concepts simply and directly,
with last-minute translation into RDF.
Aside from easing the burden on humans,
this could make the system less brittle,
because the translations could be modified at any time.
This general approach (eg, functions, macros, methods, templates)
has worked well in other areas of computer engineering;
it seems reasonable to look into it for RDF.
<p>
Expressing multi-way relationships (ie, N-ary predicates) is awkward in RDF,
because they have to be mapped into sets of binary relationships.
There are several languages which handle N-ary predicates, including
<a href="http://en.wikipedia.org/wiki/Common_logic">Common Logic</a>,
<a href="http://en.wikipedia.org/wiki/Conceptual_graph">Conceptual Graphs</a>, and
<a href="http://en.wikipedia.org/wiki/Object-Role_Modeling">Object-Role Modeling</a>.
Perhaps one of these could be a starting point for a DSL.
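<p>
To show what the mapping into binary relationships looks like,
here is a small Ruby sketch that turns the "John drove his car to Boston on Thursday"
example into triples hanging off a single event node.
The <tt>nary_to_triples</tt> helper and all of the <tt>ex:</tt> names are assumptions
made up for this example; a real DSL might well choose a different encoding.

```ruby
# Sketch: map one N-ary statement onto binary triples via an event node.
# The "ex:" names and the nary_to_triples helper are hypothetical.
def nary_to_triples(node, type, roles)
  triples = [[node, 'rdf:type', type]]
  roles.each { |role, value| triples.push([node, role, value]) }
  triples
end

trip = nary_to_triples('_:trip1', 'ex:Drive',
                       'ex:driver'      => 'ex:John',
                       'ex:vehicle'     => 'ex:JohnsCar',
                       'ex:destination' => 'ex:Boston',
                       'ex:day'         => 'ex:Thursday')

# Print each triple in a Turtle-like form.
trip.each { |subj, pred, obj| puts "#{subj} #{pred} #{obj} ." }
```

One four-way statement becomes five triples, which is the sort of bookkeeping a DSL could hide.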
<p>
As long as we're asking for a pony,
wouldn't it be nice to use the same DSL syntax in rules, statements, and queries?
<a href="http://en.wikipedia.org/wiki/SPARQL">SPARQL</a> and
<a href="http://spinrdf.org">SPIN</a> have some interesting notions for this sort of thing;
let's see what we can borrow from them.
 
<p>
<h2>Constraints and Possible Solutions</h2>
<p>
The designers of
<a href="http://en.wikipedia.org/wiki/triplestore">RDF triplestores</a>
already have daunting challenges to handle.
So, we need to leave the basic storage model of triplestores alone.
However, we are free to use a DSL for editing,
then process it into triples (eg, RDF/XML, Turtle) for loading, etc.
Following this general approach,
here is a sampling of possible techniques.

<p>
<h3>Macro and/or Template Processors</h3>
<p>
Macro processors (eg,
<a href="http://en.wikipedia.org/wiki/C_preprocessor">cpp</a>,
<a href="http://en.wikipedia.org/wiki/M4_%28computer_language%29">m4</a>)
have been used for decades to solve problems of this sort.
More recently,
<a href="http://en.wikipedia.org/wiki/Template_processor">template processors</a>
(eg, <a href="http://en.wikipedia.org/wiki/ERuby">eRuby</a>)
have found their way into use in code generation.
Unfortunately, neither of these techniques yields flexible, attractive DSLs.
So, current solutions tend to be based on dedicated translators
or embedded (ie, language-based) DSLs.

<p>
<h3>Dedicated Translators</h3>
<p>
A dedicated translator can bring a great deal of processing power to the task.
For example, it can use a specially-crafted parser, code generator, etc.
This is a bit of a heavy-weight solution, however,
so let's leave it as a last resort.

<p>
<h3>Language-based DSLs</h3>
<p>
Concise programming languages such as
<a href="http://en.wikipedia.org/wiki/Ruby_%28programming_language%29">Ruby</a> and
<a href="http://en.wikipedia.org/wiki/Scala_%28programming_language%29">Scala</a>
are commonly extended with language-based DSLs.
Most instances of this generate code in the host language, but some do not.
The <a href="http://erector.rubyforge.org/">Erector</a> project, for example,
generates HTML by means of some carefully-contrived Ruby classes.
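<p>
In the same spirit, a language-based DSL for RDF might look something like the following sketch.
<tt>TurtleSketch</tt> and its <tt>subclasses</tt> method are hypothetical names of my own,
not an existing library, and the output is only Turtle-like text.

```ruby
# Sketch of an embedded DSL: Ruby blocks in, Turtle-style lines out.
# TurtleSketch and its methods are hypothetical, not an existing library.
class TurtleSketch
  attr_reader :lines

  def initialize
    @lines = []
  end

  # Declare that each of subs is a subclass of sup.
  def subclasses(sup, *subs)
    subs.each { |sub| @lines.push("#{sub} rdfs:subClassOf #{sup}.") }
  end

  # Run the caller's block against a fresh sketch and return its lines.
  def self.build
    sketch = new
    yield sketch
    sketch.lines
  end
end

lines = TurtleSketch.build do |t|
  t.subclasses 'ex:Mammal', 'ex:Canine', 'ex:Human'
end
puts lines
```

Because the translation lives in one method, it could be changed later without touching the statements that use it.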

<p>
<h2>Resources</h2>
<p>
Here are some resources that may be interesting and/or useful...
<ul>

<p><li><a href="http://OntologyDesignPatterns.org"
         >Ontology Design Patterns</a>

<p><li><a href="http://odps.sourceforge.net"
         >Ontology Design Patterns (ODPs) Public Catalog</a>

<p><li><a href="http://phaneron.rickmurphy.org/?p=35"
         >RDFS Idioms for the Working Semiotician</a>

<p><li><a href="http://lists.w3.org/Archives/Public/semantic-web/2010Jan/0068.html"
         >Requirements for a possible "RDF 2.0"</a>

</ul>]]>
    </content>
</entry>

<entry>
    <title>No Time for Docs at RubyConf 2009</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001700.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1700" title="No Time for Docs at RubyConf 2009" />
    <id>tag:www.cfcl.com,2009:/rdm/weblog//3.1700</id>
    
    <published>2009-10-05T19:05:48Z</published>
    <updated>2009-10-05T19:02:01Z</updated>
    
    <summary>I don&apos;t envy conference program committee members. They have a difficult and generally thankless job, made worse by occasional rants such as this one. However, I really have to say something about RubyConf 2009....</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[I don't envy conference program committee members.
They have a difficult and generally thankless job,
made worse by occasional rants such as this one.
However, I really have to say something about
<a href="http://www.rubyconf.org">RubyConf 2009</a>.]]>
        <![CDATA[<p>
Don't get me wrong;
I expect to really enjoy this year's conference.
There are some fascinating-looking talks (and some of my favorite speakers) on the schedule.
However, I'm really disappointed that there are no (zero, nada, zip) talks on documentation.
<p>
I won't criticize any of the 40+ talks that got accepted,
though there are a couple that I wouldn't walk across the street to hear.
I will simply ask whether <i>all</i> of these talks
are more important for the Ruby community to hear than <i>any</i>
of the proposed talks on documentation.
<p>
In particular, I would have liked to see Loren Segal
present a talk on <a href="http://yard.soen.ca/">YARD</a>,
a very promising candidate to replace <a href="http://rdoc.sourceforge.com">RDoc</a>.
I'm not the only person who likes YARD.
Yehuda Katz used it in <a href="http://merbivore.com">Merb</a>
and expects to use it in <a href="http://rubyonrails.org/merb">Rails 3</a>.
Dan Kubb uses YARD in <a href="http://datamapper.org">DataMapper</a>
and has also written <a href="http://wiki.github.com/dkubb/yardstick">Yardstick</a>,
a tool for verifying YARD documentation coverage.
<p>
So, YARD is likely to have a substantial impact on Ruby documentation.
Maybe the conference attendees should have a chance to hear about it,
discuss its strengths and weaknesses, etc.
But no, YARD (and documentation in general)
just wasn't important enough to get on the bill.
Maybe next year...
]]>
    </content>
</entry>

<entry>
    <title>Modeling for Network Administration</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001698.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1698" title="Modeling for Network Administration" />
    <id>tag:www.cfcl.com,2009:/rdm/weblog//3.1698</id>
    
    <published>2009-09-23T21:59:06Z</published>
    <updated>2009-09-23T21:56:28Z</updated>
    
    <summary>Large-scale computer networks can involve hundreds of thousands of components (eg, computers, programs, routers) and an even larger number of connections. These systems are also highly interdependent: the loss of a single component or connection can have far-reaching effects....</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[Large-scale computer networks can involve hundreds of thousands of components
(eg, computers, programs, routers) and an even larger number of connections.
These systems are also highly interdependent:
the loss of a single component or connection can have far-reaching effects.
<p>]]>
        <![CDATA[To make optimal decisions, administrators need complete, accurate information
about the network's components, connections, and dependencies.
Some of this information is relatively static:
<p>
<ul>
  <p><li>
    Which programs run on which machines at which locations?
  <p><li>
    What are the characteristics (eg, connectivity, location, capacity) of each machine?
</ul>
<p>
Other data is more dynamic:
<p>
<ul>
  <p><li>
    What is the load average of this machine?
  <p><li>
    Are this program's problems recent in nature?
</ul>
<p>
Finally, some questions require a global perspective:
<p>
<ul>
  <p><li>
    What are the most critical parts of the system?
  <p><li>
    Which parts are having the most problems?
</ul>
<p>
It's quite possible to make use of fragmentary information to solve specific problems,
but a comprehensive system model and an integrated data repository
may be better tools for overall planning and analysis.
These tools can support a wide range of needs,
from problem resolution through capacity planning and "what if" analysis.
A system model can also aid in the design and even manage the configuration
of the monitoring infrastructure.
<p>
Because the number of components and relationships is so large,
it might appear that creating and maintaining such a model
would be an immense and even unrealistic task.
However, this need not be the case.
As <a href="http://en.wikipedia.org/wiki/George_E._P._Box">George E. P. Box</a> observed,
"Essentially, all models are wrong, but some are useful".
So, the trick is to create a model which is correct and complete enough to be useful,
while ignoring enough detail to be tractable.
<p>
For example, many functional characteristics and most implementation details
of the software and hardware components can be safely ignored:
they simply aren't needed for overall network administration.
Knowing a computer's exact position in a relay rack might be useful,
but it probably doesn't need to be stored in the model.
<p>
The other mitigating factor is that networks
have relatively few <i>kinds</i> of components and relationships.
So, a reasonably small ontology (ie, formal description of an area of discourse)
can describe everything of interest in the system.
This structure can then be filled in
with mechanically-harvested instance data, annotations and observations, etc.
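<p>
As a toy illustration, the following Ruby sketch stores a few made-up facts as triples
and answers one of the "global" questions above: which components have the most direct dependents?
The component names, the predicates, and the <tt>dependents</tt> helper are all assumptions;
a real model would live in a triplestore and use a proper ontology.

```ruby
# Sketch: a tiny network model as [subject, predicate, object] triples.
# All component names and predicates here are made up for illustration.
MODEL = [
  ['web1',   'runs_on',    'rack1_host3'],
  ['db1',    'runs_on',    'rack1_host4'],
  ['web1',   'depends_on', 'db1'],
  ['report', 'depends_on', 'db1'],
  ['report', 'depends_on', 'web1']
].freeze

# Count how many components directly depend on each target.
def dependents(model)
  counts = Hash.new(0)
  model.each do |_subj, pred, obj|
    counts[obj] += 1 if pred == 'depends_on'
  end
  counts
end

counts = dependents(MODEL)
# List the most depended-upon components first.
counts.sort_by { |_name, n| -n }.each { |name, n| puts "#{name}: #{n}" }
```

Only two kinds of relationship appear here, which hints at how small the vocabulary can be.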
<p>
In particular, it's quite possible that a human will know of (or discern)
patterns or relationships that the monitoring software cannot.
A well-designed model will have a place for such observations,
along with questions, comments, etc.
<p>
<h3>Getting There</h3>
<p>
<a href="http://en.wikipedia.org/wiki/Semantic_Web"
  >Semantic Web</a> technology (eg,
<a href="http://en.wikipedia.org/wiki/Web_Ontology_Language"
  >OWL</a>,
<a href="http://en.wikipedia.org/wiki/Resource_Description_Framework"
  >RDF</a>,
<a href="http://en.wikipedia.org/wiki/Triplestore"
  >triplestores</a>)
is well suited to handling this sort of problem.
There are Open Source tools that can handle billions of RDF triples;
even a large computer network should not strain their capacity.
<p>
A bit of web browsing (see Related Work, below)
confirmed that I'm far from the only person to have these ideas.
In fact, some research ontologies have already been developed.
However, it doesn't appear that they have yet entered the mainstream.
For example, none of the "Monitoring as a Service" pages I found
mention system modeling and/or ontologies as part of their strategy.
<p>
So, there's definitely room for experimentation and collaboration.
Using research ontologies as a starting point,
some large sites could evolve a "first cut" at an industry-wide standard.
This could be tried out with existing Semantic Web and network monitoring tools,
then augmented and bulletproofed to meet the needs of a production tool.
Could be interesting...
<p>
<h3>Related Work</h3>
<p>
The following list was produced by a small amount of Googling.
So, although it is indicative of research in this area,
it is by no means comprehensive.

<ul>
  <p><li>
    <a href="http://nets.ii.uam.es/publications/dsom06.pdf"
      >An Ontology-Based Approach to the Description and Execution<br>
       of Composite Network Management Processes 
       for Network Monitoring</a> (PDF)
  <p><li>
    <a href="http://www.springerlink.com/content/7622234433656114/"
      >An Ontology-Based Host Resources Monitoring Approach in Grid Environment</a>
  <p><li>
    <a href="http://www.fp7-moment.eu/publications/accepted/Monitoring%20ontology_%20J.Lopez.pdf"
      >Application of ontologies for the integration<br>
       of network monitoring platforms</a> (PDF)
  <p><li>
    <a href="http://www.springerlink.com/content/u5m35824048w60l2/"
      >Ontology-Based Network Management:<br>
       Study Cases and Lessons Learned</a>
  <p><li>
    <a href="http://eternity.iu.hio.no/theses/pdf/master2007/karim.pdf"
      >Towards an ontology for System Administration.<br>
       Case Study: Backup Operation</a> (PDF)
</ul>]]>
    </content>
</entry>

<entry>
    <title>Safety Nets for OWLs</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001695.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1695" title="Safety Nets for OWLs" />
    <id>tag:www.cfcl.com,2009:/rdm/weblog//3.1695</id>
    
    <published>2009-09-08T23:41:35Z</published>
    <updated>2009-09-08T23:39:25Z</updated>
    
    <summary>I&apos;ve been programming for several decades, but I&apos;m relatively new to ontology development in general and OWL in particular. So, I&apos;m certainly not an expert on the range of work in this area. However, I think I see some areas where programming best practices and tools could provide useful &quot;safety nets&quot; for OWL-based ontology developers....</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Ruby" />
    
        <category term="Semantic Web" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[I've been programming for several decades, but I'm relatively new to
<a href="http://en.wikipedia.org/wiki/Ontology_%28computer_science%29"
  >ontology</a> development in general and
<a href="http://en.wikipedia.org/wiki/Web_Ontology_Language">OWL</a> in particular.
So, I'm certainly not an expert on the range of work in this area.
However, I think I see some areas where programming best practices and tools
could provide useful "safety nets" for OWL-based ontology developers.]]>
        <![CDATA[<p>
Substantial development projects (whether in programming or ontology development)
quickly pass beyond the point where a developer can remember every detail,
let alone comprehend the implications of a given change.
Like <a href="http://en.wikipedia.org/wiki/Metaprogramming"
  >metaprogramming</a> in
<a href="http://en.wikipedia.org/wiki/Dynamic_programming_language"
      >dynamic programming languages</a> (eg,
<a href="http://en.wikipedia.org/wiki/Perl">Perl</a>,
<a href="http://en.wikipedia.org/wiki/Python_%28programming_language%29"
      >Python</a>,
<a href="http://en.wikipedia.org/wiki/Ruby_%28programming_language%29"
      >Ruby</a>),
the use of reasoners and inference in OWL provides great expressive power.
However, a small-looking change can have far-reaching results.
<p>
If multiple developers are involved,
things get even more treacherous.
Details may be lost or mis-communicated.
Changes may interact in unforeseen ways.
Finally, if an external client is involved,
another level of coordination is needed.
<p>
<h2>Best Practices</h2>
<p>
So, programmers have developed assorted
<a href="http://en.wikipedia.org/wiki/Best_practice"
  >best practices</a> (and supporting tools)
that provide at least partial solutions for these concerns.
Some of these practices are nearly universal;
others are used mostly in the
<a href="http://en.wikipedia.org/wiki/Agile_software_development"
      >agile software development</a>
and dynamic programming language communities:

<ul>
  <p><li>
    <a href="http://en.wikipedia.org/wiki/Design_pattern"
      >Design patterns</a>
  <p><li>
    <a href="http://en.wikipedia.org/wiki/Revision_control"
      >Revision control</a>
  <p><li>
    <a href="http://en.wikipedia.org/wiki/Software_testing"
      >Testing</a> and
    <a href="http://en.wikipedia.org/wiki/Continuous_integration"
      >Continuous Integration</a>
  <p><li>
    <a href="http://en.wikipedia.org/wiki/Test-driven_development"
      >Test-</a> and
    <a href="http://en.wikipedia.org/wiki/Behavior_driven_development"
      >Behavior-Driven Development</a>
</ul>
<p>
<h3>Design Patterns</h3>
<p>
Most modern programmers and many ontologists are already familiar
with <a href="http://en.wikipedia.org/wiki/Design_pattern"
       >design patterns</a>, but others may not be, so here is a summary:
<blockquote>
Each pattern describes a problem that occurs over and over again in our environment,
and then describes the core of the solution to that problem,
in such a way that you can use this solution a million times over,
without ever doing it the same way twice.
<p>
-- Christopher Alexander, "A Pattern Language"
</blockquote>

If a problem matches a particular design pattern,
using the pattern can reduce both effort and risk.
Design patterns also aid system documentation,
because the developer can simply mention the pattern(s) being used.
Of course, a pattern may be used inappropriately or incorrectly,
but documenting the name of the pattern may even help here,
by allowing readers to check the implementation against the intent. 
<p>

<a href="http://www.amazon.com/dp/0123735564"
  >Semantic Web for the Working Ontologist</a>
(Allemang and Hendler; Morgan Kaufmann)
is the first book I found that met my needs as an aspiring ontologist.
It uses (admittedly simple) design patterns
as a tool for teaching modeling, ontology creation, etc.
<a href="http://www.amazon.com/dp/047041801X"
  >Semantic Web Programming</a>
(Hebeler, et al; Wiley), another fine book,
also has a chapter on "Semantic Web Patterns and Best Practices".
<p>
<a href="http://ontologydesignpatterns.org/wiki/Main_Page"
  >Ontology Design Patterns</a>
is a Semantic Web portal dedicated to ontology design patterns (ODPs).
The portal, which was started under the <a href="http://www.neon-project.org">NeOn project</a>,
collects, evaluates, organizes, and publishes ODPs.

In a related effort, research is being done to evaluate the effectiveness of ODP use.
For example, I recently heard Eva Blomqvist speak at
<a href="http://kcap09.stanford.edu">K-CAP 2009</a> about
<a href="http://portal.acm.org/ft_gateway.cfm?id=1597743&#38;type=pdf&#38;coll=GUIDE&#38;dl=ACM"
  >Experiments on pattern-based ontology design</a>.
As one might suspect, the results are encouraging:
"... ontology quality is improved, coverage of the task increases, usability is improved,
and common modeling mistakes can be avoided".

<p>
<h3>Revision Control</h3>
<p>
There are
<a href="http://en.wikipedia.org/wiki/Comparison_of_revision_control_software">dozens</a>
of <a href="http://en.wikipedia.org/wiki/Revision_control"
     >revision control</a> systems, with varying architectures and feature sets.
What they have in common is the ability to "snapshot" the state of a project,
so that it can be inspected and/or recovered at some future time.
Typically, each snapshot includes a "commit" message,
giving the developer's thoughts on significant changes that the snapshot contains.
<p>
<a href="http://en.wikipedia.org/wiki/Git_%28software%29">Git</a>
is a representative example of the state of the art in this area.
As a <a href="http://en.wikipedia.org/wiki/Distributed_revision_control"
       >distributed revision control</a> system,
it supports the use of multiple "repositories".
Developers can create new repositories at will,
modify them, then (as appropriate) merge them back together.
<a href="http://en.wikipedia.org/wiki/GitHub">GitHub</a>, a popular hosting site,
currently hosts more than 100,000 repositories.
<p>
Clearly, some of these capabilities could benefit ontologists,
if the implementation details could be worked out.
My own partly-baked idea (PBI) would be to give each OWL-based ontology editor
(eg, <a href="http://en.wikipedia.org/wiki/Prot&#233;g&#233;_%28software%29">Prot&#233;g&#233;</a>) the capability
to save (and import) text-based snapshots of the asserted ontology.
Ideally, these should be:

<ul>
  <p><li>
    <b>canonical</b> -
    serialized in a consistent manner, to allow comparisons
  <p><li>
    <b>high-level</b> -
    ignoring implementation details (eg, RDF) where possible
  <p><li>
    <b>human-readable</b> -
    commented and pretty-printed for ease of comprehension
  <p><li>
    <b>standardized</b> -
    to allow exchange between different ontology editors
</ul>

Some of these criteria (eg, standardization) can be deferred for the moment,
in the interest of getting something in place for experimentation, etc.
In the longer term, however, all of these attributes (and probably more)
would be nice to have.
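<p>
To make the "canonical" criterion a bit more concrete, here is a minimal Ruby sketch.
It assumes that the asserted ontology can be reduced to a list of triples,
then writes them without duplicates and in sorted order,
so that two snapshots of the same assertions produce identical text.
The <tt>canonical_lines</tt> helper is hypothetical; this is not how any existing editor saves files.

```ruby
# Sketch: write triples in a stable order so two snapshots diff cleanly.
# The triple arrays and the canonical_lines helper are hypothetical.
def canonical_lines(triples)
  triples.uniq.sort.map { |subj, pred, obj| "#{subj} #{pred} #{obj} ." }
end

v1 = [
  ['ex:Human',  'rdfs:subClassOf', 'ex:Mammal'],
  ['ex:Canine', 'rdfs:subClassOf', 'ex:Mammal']
]

# Same assertions, different order, one duplicate.
v2 = [
  ['ex:Canine', 'rdfs:subClassOf', 'ex:Mammal'],
  ['ex:Human',  'rdfs:subClassOf', 'ex:Mammal'],
  ['ex:Canine', 'rdfs:subClassOf', 'ex:Mammal']
]

puts canonical_lines(v1) == canonical_lines(v2)
```

A revision control system comparing these two snapshots would see no change, which is the desired result.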
<p>
<h3>Testing and Continuous Integration</h3>
<p>
There are various, overlapping types
of <a href="http://en.wikipedia.org/wiki/Software_testing"
     >software testing</a>.
However, all of them use the computer
to check the software's behavior and report unexpected results.
Here is a partial catalog:
<ul>
  <p><li>
    <a href="http://en.wikipedia.org/wiki/Unit_testing">Unit Testing</a>
    tests small portions (ie, units) of the source code in isolation.
  <p><li>
    <a href="http://en.wikipedia.org/wiki/Integration_testing">Integration Testing</a>
    tests whether the units work together properly.
  <p><li>
    <a href="http://en.wikipedia.org/wiki/Acceptance_testing">Acceptance Testing</a>
    tests whether the source code, as a whole, performs adequately.
  <p><li>
    <a href="http://en.wikipedia.org/wiki/Regression_testing">Regression Testing</a>
    tests whether the system has regressed
    (ie, reverted) to previous, undesirable behavior.
</ul>
By creating and using sets of tests,
developers can ensure that no tested behavior will fail without notice.
This is far from a complete guarantee of perfection,
but it is also far from useless.
Suites of regression tests provide a valuable "safety net"
for development (eg, debugging, feature addition, refactoring).
Unit tests, as discussed below,
can be usefully integrated into the design and implementation process.
<p>
Many development efforts have found testing so useful
that they choose to have it performed on a continuous basis.
This practice (and more) is embodied in
<a href="http://en.wikipedia.org/wiki/Continuous_integration"
  >continuous integration</a> (CI).
Each time a developer commits a change,
the CI suite runs a battery of tests.
If an error is detected, developers are notified immediately.
<p>
In an ontology-development environment,
a CI system might run a series of queries against the ontology
whenever a reasoner reported a clean result.
This would alert the ontologist to new or changed axioms
which modify the expected behavior of the ontology.
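<p>
Here is a rough Ruby sketch of that idea.
It stands in for the reasoner with a plain list of triples and a single hand-written "query",
so every name in it (<tt>ONTOLOGY</tt>, <tt>EXPECTED</tt>, <tt>subclasses_of</tt>,
<tt>changed_answers</tt>) is an assumption rather than a real API.

```ruby
# Sketch: rerun saved "queries" and report answers that changed.
# The ontology is a plain triple list; a real setup would query a reasoner.
ONTOLOGY = [
  ['ex:Canine', 'rdfs:subClassOf', 'ex:Mammal'],
  ['ex:Human',  'rdfs:subClassOf', 'ex:Mammal']
].freeze

# Hypothetical query: the direct subclasses of a class, sorted.
def subclasses_of(triples, sup)
  hits = triples.select { |_s, pred, o| [pred, o] == ['rdfs:subClassOf', sup] }
  hits.map { |s, _pred, _o| s }.sort
end

# Expected answers, recorded when the ontology was last known to be good.
EXPECTED = { 'ex:Mammal' => ['ex:Canine', 'ex:Human'] }.freeze

# Return the classes whose current answer differs from the recorded one.
def changed_answers(triples, expected)
  expected.reject { |sup, want| subclasses_of(triples, sup) == want }.keys
end

puts changed_answers(ONTOLOGY, EXPECTED).inspect
```

An empty list means that nothing has drifted; anything else is worth a look before the change is accepted.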
<p>
<h3>Test- and Behavior-Driven Development</h3>
<p>
In <a href="http://en.wikipedia.org/wiki/Test-driven_development"
     >Test-Driven Development</a> (TDD), tests are always written first.
The newly-added tests, which check for <i>desired</i> behavior, are then run.
These tests should fail, confirming that they check for an unimplemented behavior.
The developer then modifies the software until all the tests pass.
Finally, the developer checks for "code smells"
(things that work, but are not cleanly implemented) and refactors as needed.
<p>
<a href="http://en.wikipedia.org/wiki/Behavior_driven_development"
  >Behavior-Driven Development</a> (BDD) extends TDD,
bringing the "client" into the process.
In BDD, the developer and client develop and agree on a desired set of behaviors,
encoded in a constrained natural language.
The developer then creates the needed code,
testing it against the agreed-upon specification.
<p>
When the tests pass, the developer shows the results to the client.
This may result in acceptance or a further refinement of the specification.
In any case, the client is part of the process
and the agreed-upon criteria are mechanically guaranteed to be met.
<p>
It strikes me that the relationship between a client and a programmer
is quite analogous to that between a domain expert and an ontologist.
If something like BDD could be used in ontology development,
any number of mis-communications might be detected and eliminated.
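<p>
As a partly-baked illustration, the following Ruby sketch checks a few sentences,
written in a very constrained form, against a list of asserted triples.
The sentence pattern, the <tt>required_triple</tt> and <tt>unmet</tt> helpers,
and the sample data are all assumptions.
It also checks only direct assertions; a real tool would ask a reasoner about inferred ones.

```ruby
# Sketch: check plain-language statements against asserted triples.
# The sentence pattern and helper names are assumptions for illustration.
TRIPLES = [
  ['ex:Canine', 'rdfs:subClassOf', 'ex:Mammal'],
  ['ex:Human',  'rdfs:subClassOf', 'ex:Mammal']
].freeze

PATTERN = /\AEvery (\S+) is a (\S+)\z/

# Turn one agreed-upon sentence into the triple it requires.
def required_triple(sentence)
  match = PATTERN.match(sentence)
  raise ArgumentError, "unrecognized sentence: #{sentence}" if match.nil?

  [match[1], 'rdfs:subClassOf', match[2]]
end

# Return the sentences that the ontology does not yet satisfy.
def unmet(sentences, triples)
  sentences.reject { |s| triples.include?(required_triple(s)) }
end

spec = ['Every ex:Canine is a ex:Mammal', 'Every ex:Feline is a ex:Mammal']
puts unmet(spec, TRIPLES)
```

The domain expert can read (and argue about) the sentences, while the ontologist gets a mechanical check.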
<p>
<h2>Getting There...</h2>
<p>
The programming community has already pioneered the practices described above.
In fact, much of the infrastructure they have developed might be applicable
to OWL-based ontology development.
So, I don't believe that there are any particularly difficult technical issues.
<p>
The real difficulties will lie in changing people's attitudes and behavior.
Getting ontologists (let alone domain experts) to accept these practices
may be an uphill battle.
However, the evidence from the software development arena is pretty compelling,
so I think the attempt is worthy of consideration.]]>
    </content>
</entry>

<entry>
    <title>Apple Bug Report 7109559: UI problems in iTunes</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001689.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1689" title="Apple Bug Report 7109559: UI problems in iTunes" />
    <id>tag:www.cfcl.com,2009:/rdm/weblog//3.1689</id>
    
    <published>2009-07-31T20:54:48Z</published>
    <updated>2009-07-31T20:53:00Z</updated>
    
    <summary>Not everyone knows that Apple has a way to submit bug reports. Or, more to the point, that they actually read them. However, it turns out that Apple engineers interact continuously with a bug tracking system called RADAR and that there is a reliable way for &quot;civilians&quot; to submit reports to it. I&apos;ve submitted quite a few reports (about one report a month, for several years) and the responses were always polite and generally clueful....</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Books, Movies, Music" />
    
        <category term="Computers" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[Not everyone knows that <a href="http://www.apple.com">Apple</a>
has a way to submit bug reports.
Or, more to the point, that they actually <i>read</i> them.
However, it turns out that Apple engineers interact continuously
with a bug tracking system called RADAR
and that there is a reliable way for "civilians" to submit reports to it.
I've submitted quite a few reports
(about one report a month, for several years)
and the responses were always polite and generally clueful.]]>
        <![CDATA[<p>
If you'd like to give Apple your thoughts on occasion,
just sign up <a href="http://developer.apple.com/products/membership.html">here</a>
for a (free) "ADC Online Membership" with the
<a href="http://developer.apple.com">Apple Developer Connection</a> (ADC).
Then, go to the
<a href="https://bugreport.apple.com/cgi-bin/WebObjects/RadarWeb.woa">Bug Reporter</a>
login page and dive in!
<p>
Reporting bugs is occasionally effective in promoting changes,
though there's no guarantee of that.
It's also a back-door way to find out that you've simply missed a feature.
In any case, I'd recommend it to any confirmed Apple user.
<p>
I recently submitted a bug report about "UI problems in iTunes".
I've submitted similar complaints before, without results,
so I'm reproducing a formatted version of my report below.
If you agree (or disagree!) with my suggestions,
send Apple your own report...
<p>
<h2>The Report</h2>
<p>
I use iTunes quite regularly, both as a way to load my iPod
and as a way to play music at my desk.
By and large, it's a nice program, but it has some annoying UI problems.
<p>
<h3>Background</h3>
<p>
I have a Mac Pro, an iPod HiFi (model A1121),
and a 60 GB iPod (model A1099).
I have ~11K songs (~50 GB) stored in iTunes.
Almost all of the songs were ripped from CDs,
as I have no tolerance for
<a href="http://en.wikipedia.org/wiki/Digital_rights_management"
  >digital rights management</a> (DRM).
<p>
<h3>Hierarchical Views</h3>
<p>
In general, I listen to music by a particular artist.
So, I click on "LIBRARY > Music", select the "Artist" column,
then scroll down to the desired artist's area.
I can then double-click on a starting point and minimize the window.
<p>
Unfortunately, scrolling down 11K entries is inconvenient.
I realize that I can click on an entry and type in a couple of characters,
but this isn't very efficient, either.
<p>
What I'd LIKE is the kind of flexibility offered by the Finder.
For example, the list view could provide disclosure triangles,
letting me see as much detail into a given artist's works as
desired (eg, show all albums or even all songs in an album).
Column view would also be nice.
<p>
I realize that iTunes' ability to select columns
makes this a bit more complicated than the equivalent situation
in the Finder, but I don't think it's all THAT hard.
For example, if I arrange the columns in a particular order,
that order could be used for the List and Column View hierarchies
(eg, Artist, Album by Artist, Name, etc.).
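<p>
To show that the column order really is enough to define a hierarchy,
here is a small Ruby sketch that nests a list of songs by whatever columns it is given.
The song records and the <tt>nest</tt> helper are made up for this example;
I have no idea how iTunes stores its library internally.

```ruby
# Sketch: build a nested view from songs, using the column order as levels.
# The song records and the nest helper are hypothetical.
SONGS = [
  { artist: 'Artist A', album: 'Album X', name: 'Song 1' },
  { artist: 'Artist A', album: 'Album X', name: 'Song 2' },
  { artist: 'Artist A', album: 'Album Y', name: 'Song 3' },
  { artist: 'Artist B', album: 'Album Z', name: 'Song 4' }
].freeze

# Group rows by the first column, then recurse on the remaining columns.
def nest(rows, columns)
  return rows.map { |row| row[:name] } if columns.empty?

  first, *rest = columns
  groups = rows.group_by { |row| row[first] }
  groups.transform_values { |group| nest(group, rest) }
end

tree = nest(SONGS, [:artist, :album])
puts tree['Artist A'].keys.inspect
```

Rearranging the columns (eg, Album before Artist) would simply mean passing them to <tt>nest</tt> in a different order.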
<p>
<h3>Column Selection Behavior</h3>
<p>
If I select a song, then click on a different column heading,
the song should stay in view.
This is the behavior in the Finder and many other apps;
iTunes should follow it, as well.
<p>
This would let me find out, without a lot of hassle,
about any other versions I might have of a given song.
Assuming that a song is already selected (eg, under Artist),
all I would have to do is click on the Name heading
to find similarly-named songs.]]>
    </content>
</entry>

<entry>
    <title>How to Chase Away Help</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001685.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1685" title="How to Chase Away Help" />
    <id>tag:www.cfcl.com,2009:/rdm/weblog//3.1685</id>
    
    <published>2009-07-28T20:59:39Z</published>
    <updated>2009-07-28T20:58:30Z</updated>
    
    <summary>I&apos;m currently looking at a wiki page for an (unnamed) Open Source project. The page has a number of minor errors; I&apos;d be happy to help the developer(s) clean it up. Except......</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        I&apos;m currently looking at a wiki page for an (unnamed) Open Source project.
The page has a number of minor errors;
I&apos;d be happy to help the developer(s) clean it up.
Except...
        <![CDATA[<p>
<h2>Today's Saga</h2>
<p>
This isn't an open wiki; I need to create an account and then log in.
There isn't any "create account" link, so I click the "Log in" link.
I'm taken to a page entitled "Log in / create account".  Cool.
Except that there doesn't appear to be any way to create an account.
<p>
I try giving the page a new Username and Password combination,
in case that is its way of creating accounts.
This results in an unhelpful nastygram:
"Login error: There is no user by the name "Rich_Morin". Check your spelling."
<p>
I happen to click a link entitled "talk for this ip".
It takes me to a page which has a "create an account" link.
However, clicking this link leads me to another nastygram:
"You do not have permission to create new user accounts, for the following reason:
The action you have requested is limited to users in the group Sysops."
<p>
Riiiight...
Going back to the wiki page, I find this instruction:
"Please, send any comment to the project's mailing list".
Clicking the link, I get a Firefox nastygram: "This Connection is Untrusted".
Apparently, the mailing list sign-up page is set up to use HTTPS,
but isn't providing trusted identification.
After battling past some other dialogs,
I confirm the "Security Exception", get to the subscription page,
and sign up for the mailing list.
<p>
As promised, I receive a confirmation email, which I return.
After receiving the list's welcome email,
I'm finally in a position to ask about getting write access to the wiki.
Gosh; that was easy! (NOT).
<p>
<h2>And Worse...</h2>
<p>
Amazingly, this isn't the worst experience I've had in trying to offer help.
On a couple of occasions, I've emailed errata
and received vituperative responses rejecting my help:
"Maybe you have time to worry about this sort of thing, but I don't...".
<p>
More commonly, I find myself staring at the man page (or equivalent)
for an Open Source package.
The page has an error which I'd like to report,
but I know that it will take me at least a half-hour to do so.
<p>
If there is an online bug-tracking system,
I get to register,
make a desultory search for the issue,
pick the nearest category,
and post my "bug report".
Otherwise, I may find myself groveling through the package's source code to find the right file,
editing the file, and creating a diff (in whatever format the project requires).
<p>
Depending on my level of motivation,
I may also download and install the documentation tool chain used by the project,
so that I can ensure that my fix actually formats properly.
Finally, I submit my patch and (possibly) defend it in the ensuing discussion.
<p>
<h2>Do I really care?</h2>
<p>
When I visit a project's web site for the first time,
I'm not really very committed.
Frankly, I'm not even sure that this package will do what I need.
If I have to jump through a bunch of hoops to simply ask a question,
I may well go elsewhere.
<p>
Nor are most readers as compulsive as I am about reporting errors.
If it's <i>really easy</i> to report an error, they may take the time to do so.
Otherwise, they are more likely to shrug their shoulders and get on with their lives.
So, <b>make it easy for them</b>.
<p>
<a href="http://en.wikipedia.org/wiki/Alan_Cooper">Alan Cooper</a> says:
"Where there is output, let there be input."
He's mostly referring to interactivity in user interfaces,
but the principle applies here, as well.
Set up your documentation, web pages, and other content
so that the user can easily make comments, ask questions, etc.
By treating users as if you want their help,
you may actually get some...]]>
    </content>
</entry>

<entry>
    <title><![CDATA[A Triple Sm&ouml;rg&aring;sbord]]></title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001684.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1684" title="A Triple Sm&amp;ouml;rg&amp;aring;sbord" />
    <id>tag:www.cfcl.com,2009:/rdm/weblog//3.1684</id>
    
    <published>2009-07-27T05:56:24Z</published>
    <updated>2009-07-27T05:53:11Z</updated>
    
    <summary><![CDATA[I recently had the pleasure of attending OSCON 2009, O'Reilly's broad-spectrum Open Source conference. It was quite an event: really, a triple sm&ouml;rg&aring;sbord....]]></summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Politics" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[I recently had the pleasure of attending
<a href="http://en.oreilly.com/oscon2009">OSCON 2009</a>,
<a href="http://oreilly.com">O'Reilly's</a> broad-spectrum
<a href="http://en.wikipedia.org/wiki/Open_source">Open Source</a> conference.
It was quite an event: really,
a triple <a href="http://en.wikipedia.org/wiki/Smorgasbord">sm&ouml;rg&aring;sbord</a>.]]>
        <![CDATA[<p>
I haven't attended any OSCONs in the last several years,
partly because they were being held about twelve hours away, by car.
Fortunately for me, this one was being held in San Jose, CA.
This called for about a third as much driving,
and it was also divided into ten bite-size (45 minute) commutes.
So, a no-brainer...
<p>
<h2>The Food</h2>
<p>
Most of us like to complain about the food at conferences
(often with good reason),
but the conference organizers are as unhappy as anyone.
Not only do they have to eat the same food as the attendees,
but they also have to figure out how to fit it into the budget.
<p>
Well over a decade ago, a conference organizer told me
that our "sandwich boxes" were costing $15 apiece.
I shudder to think what large conference venues charge today.
In short, if you want fancy lunches at conferences,
you had better be prepared to pay handsomely for them...
<p>
That said, OSCON 2009's lunches were at least as palatable
as any at the conferences I've attended in recent years.
Even WWDC, which used to have <i>very nice lunches</i>,
was totally mediocre the last time I attended (in 2008).
So, I was amazed and delighted by the lunch offerings
that <a href="http://www.google.com">Google</a> sponsored
on the first day of the conference:
<ul>
<p><li>The Network Buffet Menu
<p>
<ul>
<li>Hearts of Lettuce
<li>Nugget Potato Salad with Feta &amp; Kalamata Olives
<li>Seafood Salad with Melon and a Lemony Dressing
<li>West Coast Smoked Fish and Seafood Platter
<li>Baked Salmon Fillets with a Roma Tomato Fondue
<li>Mexican Pepper Rice Roasted Autumn Vegetables
<li>Potato Lasagna w/ Fennel &amp; Emmenthal
</ul>
<p><li>Carving Station
<p>
<ul>
<p><li>Roasted Angus Top Sirloin with Peppercorn Trio
</ul>
<p>
<p><li>Dessert
<p>
<ul>
<li>New York Style Cheesecake
<li>Sliced Fresh Fruit and Berries
<li>Dark Chocolate Mousse Torte
</ul>
</ul>
<p>
No, this wasn't the best feed I've ever had at a conference.
One year, <a href="http://www.tek.com">Tektronix</a>
sponsored a "planked salmon" bake for the
<a href="http://www.usenix.org">USENIX</a> annual conference.
It was held at a private park outside Portland (OR)
and had a fireworks display as a closer.
I also had a fine feed at Microsoft's First International Conference on CD ROM;
I still remember the dessert tables fondly (:-).
<p>
Those two would be very hard to top,
but this was certainly in the top five over a span of 30+ years.
So, kudos to Platinum Sponsor Google
for sponsoring the meal and O'Reilly for making it happen.
Maybe we can get a competition going for next year's best meal!
<p>
<h2>BOFs, Sessions, and Tutorials</h2>
<p>
The BOFs, sessions, and tutorials provided a different,
but equally amazing, buffet.
Looking at the
<a href="http://en.oreilly.com/oscon2009/public/schedule/full">schedule</a>,
I often had to make difficult decisions.
Would I rather hear about Perl 6 or JRuby?
I'd like to keep up on Rubinius,
but I'd also like to know more about Thunderbird.
<p>
I have no idea whether I got to all of the best sessions (for me),
but I <i>can</i> say that I enjoyed and learned from all of them
(even when, in some cases, they weren't quite what I had expected).
The only problem, really, was mental overload.
<a href="http://en.wikipedia.org/wiki/Damian_Conway">Damian Conway</a>,
for instance,
is always an energetic and engaging speaker;
he is also quite technical at times, so be ready to listen <i>hard</i>.
<p>
In any event, I attended presentations and discussions
on a variety of Open Source tools, including
<a href="http://en.wikipedia.org/wiki/CouchDB"                         >CouchDB</a>,
<a href="http://en.wikipedia.org/wiki/Gearman"                         >Gearman</a>,
<a href="http://en.wikipedia.org/wiki/Git_%28software%29"              >Git</a>,
<a href="http://neo4j.org"                                             >Neo4j</a>,
<a href="http://en.wikipedia.org/wiki/Perl"                            >Perl 6</a>,
<a href="http://en.wikipedia.org/wiki/Prot&#233;g&#233;_%28software%29"          >Prot&#233;g&#233;</a>,
<a href="http://en.wikipedia.org/wiki/Ruby_on_Rails"                   >Rails 3</a>,
<a href="http://en.wikipedia.org/wiki/Rubinius"                        >Rubinius 1.0</a>, and
<a href="http://en.wikipedia.org/wiki/Sesame_%28framework%29"          >Sesame</a>.
I also talked to assorted folks in the exhibit area
about packages as disparate as
<a href="http://en.wikipedia.org/wiki/GRASS_GIS"                       >GRASS GIS</a>,
<a href="http://en.wikipedia.org/wiki/PostgreSQL"                      >PostgreSQL</a>,
<a href="http://en.wikipedia.org/wiki/R_%28programming_language%29"    >R</a>, and
<a href="http://en.wikipedia.org/wiki/Sage_%28mathematics_software%29" >Sage</a>.
<p>
I also went to some talks that weren't particularly technical.
Douglas Crockford's talk on the history of
<a href="http://en.wikipedia.org/wiki/JSON"                            >JSON</a>,
for example, was a fascinating "behind the scenes" look
at how an informal standard battled its way into prominence,
despite the presence of a strongly-hyped competitor
(<a href="http://en.wikipedia.org/wiki/XML"                            >XML</a>).
<p>
At Addison Berry's session, I learned about the
<a href="http://writingopensource.com/"                                >Writing Open Source</a> group,
which sponsors a conference and forums aimed at improving the state
of Open Source documentation.
Although I'm quite interested in this area,
my focus tends to be on enabling technology
such as wikis and documentation generators.
So, I was fascinated to hear about the <i>social</i> aspects
of creating and maintaining documentation.
<p>
Some of the keynotes were also fascinating and inspiring.
I may not be able to do much to help Open Source gain traction in our government,
but I'm certainly enthusiastic about the idea.
In any case, other folks are highly involved and making substantial progress.
<a href="http://www.sunlightlabs.com/"                                 >Sunlight Labs</a>
and
<a href="http://www.redhat.com/open-source-government/"                >Open Source in Government</a>
are great places to find out about these efforts.
<p>
<h2>The Software</h2>
<p>
The real sm&ouml;rg&aring;sbord, however, is the immense spread of software
that the Open Source community is continually developing and maintaining,
both for itself and for the rest of the world.
<a href="http://en.wikipedia.org/wiki/Chris_DiBona">Chris DiBona</a>'s keynote
was chock full of statistics:
how many million lines of code are available in C, Perl, PHP, Ruby, etc.
<p>
In fact, I heard a common complaint from several different groups:
"nobody in the mainstream Open Source world knows we exist".
Sometimes this was because the software was so specialized
that only experts in a given discipline could be expected to use it.
Other times, however, it was simply buried under the competition.
What a problem for the Open Source community to have!
<p>
<h2>A Retrospective</h2>
<p>
It has been fascinating for me to watch the evolution
of Free and Open Source Software (FOSS) over the last few decades.
In the early 1980's, when I first heard about
<a href="http://en.wikipedia.org/wiki/Richard_Stallman">Richard Stallman</a>'s agenda,
my reaction was sympathetic but skeptical:
did he <i>really</i> believe that a complete,
free set of operating system software could be created?
<p>
Over the following fifteen years,
as I edited free software collections
for the Sun User Group and Prime Time Freeware,
I was frequently delighted by the novel and useful packages
I discovered on Internet FTP archives.
And, at <a href="http://en.wikipedia.org/wiki/Tim_O'Reilly">Tim O'Reilly</a>'s
Free Software Summit (the 1998 meeting that adopted the term "Open Source"),
I started to get a sense of the movement's possibilities.
<p>
Over the past decade,
I've been overwhelmed by the advances that the FOSS community has made
in gaining visibility among the general public,
as well as developing software, infrastructure, and community.
OK, I'm a slow learner,
but I think we might really be onto something...]]>
    </content>
</entry>

<entry>
    <title>Mechanizing the Path to Ruby 1.9</title>
    <link rel="alternate" type="text/html" href="http://www.cfcl.com/rdm/weblog/archives/001674.html" />
    <link rel="service.edit" type="application/atom+xml" href="http://www.cfcl.com/cgi-bin/mt/mt-atom.cgi/weblog/blog_id=3/entry_id=1674" title="Mechanizing the Path to Ruby 1.9" />
    <id>tag:www.cfcl.com,2009:/rdm/weblog//3.1674</id>
    
    <published>2009-07-18T21:59:26Z</published>
    <updated>2009-07-18T22:01:05Z</updated>
    
    <summary>In What do we need to get on Ruby 1.9?, Yehuda Katz says he thinks it&apos;s time to get serious about migrating the Ruby community to Ruby 1.9. He then asks for specific information on show-stoppers: gems, plugins, tools, and such that don&apos;t yet work on Ruby 1.9. His request has already resulted in a lot of opinions and anecdotal evidence, but I think there is a fairly obvious way to get better information....</summary>
    <author>
        <name>Rich</name>
        <uri>http://www.cfcl.com/~rdm/weblog</uri>
    </author>
    
        <category term="Computers" />
    
        <category term="Technology" />
    
    <content type="html" xml:lang="en" xml:base="http://www.cfcl.com/rdm/weblog/">
        <![CDATA[In <a href="http://yehudakatz.com/2009/07/17/what-do-we-need-to-get-on-ruby-1-9/"
     >What do we need to get on Ruby 1.9?</a>, Yehuda Katz says he thinks it's time
to get serious about migrating the Ruby community to Ruby 1.9.
He then asks for specific information on show-stoppers:
gems, plugins, tools, and such that don't yet work on Ruby 1.9.
His request has already resulted in a lot of opinions and anecdotal evidence,
but I think there is a fairly obvious way to get better information.]]>
        <![CDATA[<p>
<h2>Precis</h2>
<p>
The basic idea is that we should create and use mechanized tools
to help in assessing and tracking infrastructure dependencies
and Ruby 1.9 compatibility issues.
This involves harvesting and analyzing information, as discussed below,
but it's just "a simple matter of software" (:-).
Specifically, it doesn't appear to require any Computer Science breakthroughs
or even a great deal of new code.
<p>
<h3>Dependency Analysis</h3>
<p>
Step one is to inventory the major archives of Ruby code
and analyze their dependencies.
<a href="http://rubygems.org">RubyGems</a> makes this pretty easy.
Gem servers commonly provide
YAML snapshots of Gem-related metadata
(generated by "<tt>gem generate_index</tt>").
<a href="http://github.com">GitHub</a> and
<a href="http://rubyforge.org">RubyForge</a>, for example,
provide
<a href="http://gems.github.com/yaml"
  >http://gems.github.com/yaml</a> and
<a href="http://gems.rubyforge.org/yaml"
  >http://gems.rubyforge.org/yaml</a>.
These (large :) pages contain everything we need
to generate Gem dependency graphs. 
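<p>
As a rough illustration, here is a minimal sketch (in Python, purely for convenience)
of turning harvested metadata into a dependency graph.
It assumes that the YAML has already been reduced to simple name/dependency records;
the <tt>GEM_SPECS</tt> data, the gem names, and the <tt>build_graph</tt> helper
are all hypothetical, not part of any Gem server's actual output.

```python
from collections import defaultdict

# Hypothetical records, standing in for whatever a real harvester
# would extract from a gem server's metadata snapshot.
GEM_SPECS = [
    {"name": "gem1", "dependencies": []},
    {"name": "gem2", "dependencies": ["gem1"]},
    {"name": "gem3", "dependencies": ["gem1"]},
    {"name": "gem4", "dependencies": ["gem2", "gem3"]},
]

def build_graph(specs):
    """Return (depends_on, used_by) adjacency maps keyed by gem name."""
    depends_on = defaultdict(set)
    used_by = defaultdict(set)
    for spec in specs:
        name = spec["name"]
        depends_on[name]  # touch the key so gems with no dependencies still appear
        for dep in spec["dependencies"]:
            depends_on[name].add(dep)
            used_by[dep].add(name)
    return depends_on, used_by

depends_on, used_by = build_graph(GEM_SPECS)
print(sorted(used_by["gem1"]))
```

<p>
Keeping both directions around is a design convenience:
<tt>depends_on</tt> answers "what does this item need?",
while <tt>used_by</tt> answers "who gets hurt if this item breaks?".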
<p>
It may be a bit more work,
but we can also get the same sort of information for plugins, tools, etc.
So, we should be able to work out a relatively complete graph
of Ruby infrastructure dependencies.
This, by itself, would be quite useful to have.
<p>
For example, by knowing how many items (eg, Gems) depend on a given item,
we can estimate how critical the item is to the ecosystem
(<a href="http://google.com"
   >Google</a>'s
<a href="http://en.wikipedia.org/wiki/PageRank"
  >PageRank</a> algorithm uses a variation on this technique).
Fortunately, we don't have to contend with the effects of
<a href="http://en.wikipedia.org/wiki/Search_engine_optimization"
  >search engine optimization</a> and such,
but we still have to consider global popularity.
If an item is used by a number of other items,
but few apps use any of them, it may not be all that critical.
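<p>
The counting idea can be sketched in a few lines of Python.
This is only a crude first cut, assuming a hypothetical <tt>USED_BY</tt> map
(item name to the set of items that depend on it directly);
it counts direct and indirect dependents and is <i>not</i> PageRank itself.
The function names are mine, made up for the example.

```python
def transitive_dependents(used_by, name):
    """Collect every item that depends on `name`, directly or indirectly."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for dependent in used_by.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return seen

def rank_by_dependents(used_by, names):
    """Return (name, count) pairs, most depended-upon items first."""
    counts = {n: len(transitive_dependents(used_by, n)) for n in names}
    # Sort by descending count, breaking ties alphabetically.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

# Hypothetical reverse-dependency map: gem1 is used by gem2 and gem3, etc.
USED_BY = {
    "gem1": {"gem2", "gem3"},
    "gem2": {"gem4"},
    "gem3": {"gem4"},
}
ALL_GEMS = ["gem1", "gem2", "gem3", "gem4"]

for name, count in rank_by_dependents(USED_BY, ALL_GEMS):
    print(name, count)
```

<p>
A raw count like this says nothing about how many apps actually use each item,
so a real ranking would need some measure of usage folded in as a weight.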
<p>
<h3>Compatibility Analysis</h3>
<p>
Step two is to run each item's test suite,
finding out which tests exhibit Ruby 1.9-specific problems.
This is a much bigger step, for a variety of reasons.
Let's look at some of them...

<ul>
  <p><li><b>dependencies</b> -
    Some tests may depend on items which are broken on Ruby 1.9.
    However, this is not a total show-stopper.
    Let's say that gem2 depends on gem1.
    Even if gem1 is broken, we may still be able to run a number of gem2's tests.
    For example, a given unit test may not actually <i>use</i> any part of gem1.

  <p><li><b>resources</b> -
    The required resources are substantial.
    Processing and storage aren't a big deal:
    disk storage is cheap, at these levels,
    and most packages don't release all that often.
    However, development and administration are likely to require some Real Work (TM).
    This <i>might</i> happen on a volunteer basis,
    but some corporate sponsorship could certainly help it along.

  <p><li><b>security</b> -
    All tests need to be run in a secure manner.
    Nobody wants to find out that running a test suite has trashed their machine.
    So, for example, it might be appropriate to run each package's tests in a fresh VM.

  <p><li><b>setup</b> -
    Some tests may require a lot of setup.
    If this hasn't been totally mechanized,
    the effort has to be balanced against the results.
    That said, it may be possible to "crowdsource" this effort,
    taking advantage of existing test setups in the wild.
  <p>
</ul>
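<p>
The dependency issue, in particular, lends itself to a bit of mechanized triage.
The Python sketch below assumes we already have a dependency map
and a set of items whose own test suites failed;
both data sets, the <tt>triage</tt> function, and its three labels are hypothetical.
It visits dependencies before their dependents,
so a failure that <i>might</i> be inherited can be told apart
from one that is clearly the item's own.

```python
from graphlib import TopologicalSorter  # requires Python 3.9 or later

def triage(depends_on, failed):
    """Label each item, visiting dependencies before their dependents.

    depends_on: item name -> set of names it depends on
    failed:     names whose own test suite failed on the new interpreter
    """
    labels = {}
    # static_order() yields each item only after all of its dependencies.
    for name in TopologicalSorter(depends_on).static_order():
        deps = depends_on.get(name, ())
        troubled_deps = {d for d in deps if labels[d] != "ok"}
        if name not in failed:
            # Tests can pass even when a dependency is in trouble.
            labels[name] = "ok"
        elif troubled_deps:
            # The failure may be inherited; re-check once the deps are fixed.
            labels[name] = "possibly blocked"
        else:
            labels[name] = "broken"
    return labels

# Hypothetical inputs, for illustration only.
DEPENDS_ON = {
    "gem1": set(),
    "gem2": {"gem1"},
    "gem3": {"gem1"},
    "gem4": {"gem2"},
}
FAILED = {"gem1", "gem2"}

labels = triage(DEPENDS_ON, FAILED)
for name in sorted(labels):
    print(name, labels[name])
```

<p>
Note that <tt>TopologicalSorter</tt> raises a <tt>CycleError</tt> on circular dependencies,
so a production version would have to decide how to handle those.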
<p>
<h3>Presentation</h3>
<p>
Step three is to present the information in a digestible and useful fashion.
Ideally, we'd be able to get reports on the overall situation,
showing us which issues are the most critical to resolve.
Follow-up reports, allowing us to "drill down" into particular questions,
would also be useful.
<p>
It's not clear (to me, at least) what kinds of reports we'll ultimately need.
We're dealing with a large amount of highly-interrelated information.
Diagramming all of the dependencies
(eg, via <a href="http://en.wikipedia.org/wiki/GraphViz">GraphViz</a>)
might be impressive, but isn't going to be useful.
Humans aren't all that good at reading complex diagrams.
So, we'll need some finer-grained approaches.
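<p>
One possible finer-grained report is a per-item "drill down"
that lists only the failing items somewhere beneath a given item.
Here is a minimal Python sketch, assuming a hypothetical dependency map and failure set;
the <tt>blockers</tt> and <tt>drill_down</tt> names and the report format
are invented for the example.

```python
def blockers(depends_on, failed, name):
    """Return the failing items that `name` depends on, directly or not."""
    found = set()
    seen = set()
    stack = list(depends_on.get(name, ()))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        if dep in failed:
            found.add(dep)
        stack.extend(depends_on.get(dep, ()))
    return found

def drill_down(depends_on, failed, name):
    """Format a one-line report for a single item."""
    hits = sorted(blockers(depends_on, failed, name))
    status = "FAIL" if name in failed else "pass"
    listing = ", ".join(hits) if hits else "none"
    return f"{name}: {status}; failing dependencies: {listing}"

# Hypothetical inputs, for illustration only.
DEPENDS_ON = {
    "gem1": set(),
    "gem2": {"gem1"},
    "gem3": {"gem1"},
    "gem4": {"gem2"},
}
FAILED = {"gem1", "gem2"}

print(drill_down(DEPENDS_ON, FAILED, "gem4"))
print(drill_down(DEPENDS_ON, FAILED, "gem1"))
```

<p>
A one-line answer per item is a lot easier to digest than a full graph,
and it can be generated on demand for whichever item a developer cares about.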
<p>
<h3>Collaboration</h3>
<p>
Step four is to provide a forum for collaboration.
If there's a convenient and well-known place to discuss outstanding issues,
more developers may be interested in contributing.
<a href="http://en.wikipedia.org">Wikipedia</a> and (closer to home)
<a href="http://rubyspec.org">RubySpec</a>
are existence proofs for this sort of thing,
but building up a "critical mass" of volunteers is non-trivial.
Again, some corporate sponsorship might be useful,
lending needed credence and visibility.
<p>
Dan Kubb suggests that we could adopt some approaches taken
by the <a href="http://en.wikipedia.org/wiki/CPAN">CPAN</a>
(Perl's humongous archive of modules):
<blockquote>
When new gems are downloaded, their specs could be run first,
with installation only occurring if they pass.
This would bring more compatibility issues to light
than just blindly installing the gem and finding problems at runtime.
Plus, if we provide a nice way for the gem to phone home with the results
(after prompting for permission, of course),
we could aggregate spec failures someplace,
providing gem authors with a lot of information
about platform- and version-specific bugs.
<p>
The <a href="http://cpantesters.org">CPAN Testers Matrix</a>, for example,
has a distributed testing system where volunteers can install a small app on their machine
and it'll sync up with CPAN and run the tests for packages, reporting the results.
Here is some example output for
<a href="http://matrix.cpantesters.org/?dist=CGI-State+0.02">CGI-State 0.02</a>.
<p>
If we had something like this,
it would be easy to see what versions and platforms a gem works with.
CPAN is the most advanced language-specific distribution system,
so we should be looking to them for ideas and inspiration.
They are at least 3-5 years ahead of anything we have available now,
but a concerted effort could make significant progress on catching up.
</blockquote>
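<p>
To make the aggregation idea a bit more concrete,
here is a small Python sketch of tallying reported results into a version/platform matrix.
It is only loosely inspired by the CPAN Testers output;
the report format, the sample data, and the <tt>build_matrix</tt> function
are all assumptions of mine, not a description of how CPAN Testers works.

```python
from collections import Counter

# Hypothetical reports, as volunteers' machines might submit them:
# (gem, gem version, Ruby version, platform, passed?)
REPORTS = [
    ("gem1", "0.2", "1.8.7", "linux", True),
    ("gem1", "0.2", "1.9.1", "linux", False),
    ("gem1", "0.2", "1.9.1", "linux", False),
    ("gem1", "0.2", "1.9.1", "darwin", True),
    ("gem2", "1.0", "1.9.1", "linux", True),
]

def build_matrix(reports, gem, version):
    """Tally pass/fail counts per (Ruby version, platform) cell."""
    matrix = {}
    for name, ver, ruby, platform, passed in reports:
        if (name, ver) != (gem, version):
            continue  # report is for some other gem or release
        cell = matrix.setdefault((ruby, platform), Counter())
        cell["pass" if passed else "fail"] += 1
    return matrix

matrix = build_matrix(REPORTS, "gem1", "0.2")
for (ruby, platform), cell in sorted(matrix.items()):
    print(f"ruby {ruby:6} {platform:8} pass={cell['pass']} fail={cell['fail']}")
```

<p>
Keeping counts (rather than a single pass/fail flag) per cell
makes it possible to spot flaky results as well as consistent failures.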
<p>
<h2>Status</h2>
<p>
I don't know of anyone who is working on this exact project,
but there is clearly a lot of existing work that could be leveraged.
Harvesting Open Source files and metadata, for example,
is far easier than it was in years past.
There are also a variety of technologies (eg,
<a href="http://www.cs.wisc.edu/condor/">Condor</a>,
<a href="http://github.com/ezmobius/nanite/tree/master">Nanite</a>)
that can help in handling scaling issues.
<p>
As it happens,
I'm already working on a project that harvests and analyzes Ruby code.
<a href="http://cfcl.com/twiki/bin/view/Projects/Ontiki/PARSE">PARSE</a>
(Punish All Ruby Software Equally)
is supposed to run
<a href="http://metric-fu.rubyforge.org">MetricFu</a>,
<a href="http://yard.soen.ca">YARD</a>, and
some home-grown tools on a wide swath of Ruby code.
Checking for Ruby 1.9 (and eventually, 2.*) compatibility is an obvious fit.
<p>
That said, the Ruby 1.9 migration effort will have its own needs and schedule,
so it should have its own tool chain.
(Nor do I want Ontiki or PARSE to be on its critical path. :-)
However, I'd be more than happy to work with anyone who is interested
in crafting such a tool chain.]]>
    </content>
</entry>

</feed> 


