Sunday, June 18, 2006

RDF Actually Isn't That Hard

That's right. Not hard. Easy, in fact. RDF just hangs with a bad crowd. I'll explain:

I suspect the biggest handicap for the Resource Description Framework and its proponents is its association with XML. I'd also posit that the second-largest source of confusion is its use of URIs as something other than URLs. Now, the populace has only just begun to understand the concept of a URL as a globally unique locator for porn, news, games, porn and shopping. The relatively few tech-savvy types who actually understand XML understand it as a way to serialize an ordered, hierarchical data structure. RDF is actually a lot simpler.

RDF is one or more statements about one or more things.

Expository — you know — like your first essay in grade school.

That's right. Things. Unique things. A dog. A banana. Some glasses. Or possibly classes of things. Houses. Cars. People.

Statements. Like "knows" or "is a type of" or "recommends". Let's try one:

<Dorian> <wears> <glasses>.

Sweet. How about another?

<The glasses> <are made by> <ic!berlin>.

Looks kind of like English, huh? Well, this is (effectively) REAL RDF SYNTAX!

(Incidentally, <Dorian> <recommends> <ic!berlin>.)
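If it helps to see the shape in code, here is a minimal Python sketch that treats each statement as a plain (subject, predicate, object) tuple. The names `statements` and `objects_of` are made up for illustration, the glasses get a single label so the two statements line up, and nothing here is a real RDF library.

```python
# Each statement is a (subject, predicate, object) tuple.
statements = {
    ("Dorian", "wears", "glasses"),
    ("glasses", "are made by", "ic!berlin"),
    ("Dorian", "recommends", "ic!berlin"),
}

def objects_of(subject, predicate, triples):
    """Return every object that completes 'subject predicate ?'."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

print(objects_of("Dorian", "recommends", statements))
```

Asking "what does Dorian recommend?" is just a filter over the tuples. That really is most of the data model.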

So, I say to those that might want to learn RDF:

Ignore RDF/XML.

Seriously. It just complicates things. Suppose I wanted to describe the statements above as RDF/XML. It would probably look something like this:


<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foo="http://doriantaylor.com/messing/with/your/head/with/RDF#">
  <rdf:Description rdf:about="http://doriantaylor.com/">
    <foo:wears rdf:nodeID="GLASSES"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="GLASSES">
    <foo:manufacturer rdf:resource="http://ic-berlin.de/"/>
  </rdf:Description>
  <!-- oh, and incidentally -->
  <rdf:Description rdf:about="http://doriantaylor.com/">
    <foo:recommends rdf:resource="http://ic-berlin.de/"/>
  </rdf:Description>
</rdf:RDF>

That's right. The near-English from above has been mangled almost completely beyond recognition. Try it, it's valid. It's just not legible. The only thing RDF/XML and non-RDF XML have in common is the raw syntax.
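To check that the XML really does boil down to the same three statements, here is a rough Python sketch that uses only the standard library's `xml.etree.ElementTree`. It is not an RDF/XML parser. It assumes nothing fancier than `rdf:Description` elements carrying `rdf:about` or `rdf:nodeID`, with property elements that point somewhere via `rdf:resource` or `rdf:nodeID`. The helper names `node` and `triples` are mine, and this copy of the document uses `rdf:nodeID` for both mentions of the glasses so that they refer to the same local node.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """\
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foo="http://doriantaylor.com/messing/with/your/head/with/RDF#">
  <rdf:Description rdf:about="http://doriantaylor.com/">
    <foo:wears rdf:nodeID="GLASSES"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="GLASSES">
    <foo:manufacturer rdf:resource="http://ic-berlin.de/"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://doriantaylor.com/">
    <foo:recommends rdf:resource="http://ic-berlin.de/"/>
  </rdf:Description>
</rdf:RDF>
"""

def node(element):
    """Name the resource an element points at: a URI, or a local '_:' label."""
    for attr in ("about", "resource"):
        uri = element.get(f"{{{RDF}}}{attr}")
        if uri is not None:
            return uri
    # Assumption: anything without a URI carries rdf:nodeID in this sketch.
    return "_:" + element.get(f"{{{RDF}}}nodeID")

def triples(xml_text):
    """Yield (subject, predicate, object) for each property element."""
    root = ET.fromstring(xml_text)
    for description in root.findall(f"{{{RDF}}}Description"):
        subject = node(description)
        for prop in description:
            # ElementTree spells a qualified name as '{namespace}local';
            # RDF concatenates the two parts to get the predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            yield (subject, predicate, node(prop))

for triple in triples(DOC):
    print(triple)
```

Three tuples come out, one per property element. All the angle brackets were packaging.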

"So Dorian, what about that funny namespace URI you used for 'foo'?"

That's the other part.

URIs don't actually have to be web pages.

Although in this case, I'd probably want the URI "http://doriantaylor.com/messing/with/your/head/with/RDF#" to point to an RDF schema that nicely explained what I understand the verbs "wears" and "recommends" to mean. But I digress. Just remember the following:

A Uniform Resource Identifier identifies resources uniformly. Shocking, I know.

http://www.cnn.com/ is only ever going to point to CNN's homepage (unless, of course, they neglect to pay their domain bill).
urn:isbn:096139210X will only ever refer to a particular favourite book of mine.
tel:+1-900-HOT-CHIX is only ever going to be the destination of my date for Friday night.

Um, yeah.
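For the skeptical, a small Python sketch using the standard library's `urllib.parse.urlsplit` shows that all three are perfectly respectable URIs, and that only one of them names a host you could connect to. The `uris` list is just the three examples above.

```python
from urllib.parse import urlsplit

uris = [
    "http://www.cnn.com/",
    "urn:isbn:096139210X",
    "tel:+1-900-HOT-CHIX",
]

for uri in uris:
    parts = urlsplit(uri)
    # Only the http URI has a network location; the other two are just names.
    print(parts.scheme, repr(parts.netloc), parts.path)
```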

The important part about URIs in RDF is that they represent globally unique resources — things, categories, ideas. Suppose I were to replace my original example with URIs:

<http://doriantaylor.com/> <http://doriantaylor.com/messing/with/your/head/with/RDF#wears> <genid:GLASSES>.
<genid:GLASSES> <http://doriantaylor.com/messing/with/your/head/with/RDF#manufacturer> <http://ic-berlin.de/>.
<http://doriantaylor.com/> <http://doriantaylor.com/messing/with/your/head/with/RDF#recommends> <http://ic-berlin.de/>.

When I swap URIs in, it becomes clear that those are the only things in the world I can possibly be talking about. The only loose thread is, collectively, what I consider "to wear", what a "manufacturer" is, and what it means "to recommend". This is where stuff like RDF Schema and OWL comes in, which I consider out of the scope of this post. I will say, however, that it's usually better to pick a lingua franca to describe certain things than to come up with your own, as I oh-so-naughtily did above.
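Statements in this form are also trivial for a program to read. The following Python sketch parses the three lines with one regular expression and then follows two statements in a row to find out who makes whatever doriantaylor.com wears. The pattern covers only this toy syntax (three bracketed terms and a full stop), not the full N-Triples grammar, and `NS`, `STATEMENT` and `parse` are names I made up.

```python
import re

NS = "http://doriantaylor.com/messing/with/your/head/with/RDF#"

LINES = f"""\
<http://doriantaylor.com/> <{NS}wears> <genid:GLASSES>.
<genid:GLASSES> <{NS}manufacturer> <http://ic-berlin.de/>.
<http://doriantaylor.com/> <{NS}recommends> <http://ic-berlin.de/>.
"""

# Three <...> terms followed by a full stop: the whole grammar of this sketch.
STATEMENT = re.compile(r"^<([^>]*)>\s*<([^>]*)>\s*<([^>]*)>\s*\.$")

def parse(text):
    """Turn each non-empty line into a (subject, predicate, object) tuple."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = STATEMENT.match(line)
        if match is None:
            raise ValueError(f"not a statement: {line!r}")
        triples.append(match.groups())
    return triples

graph = parse(LINES)
me = "http://doriantaylor.com/"
# First hop: what do I wear? Second hop: who manufactures those things?
worn = [o for s, p, o in graph if s == me and p == NS + "wears"]
makers = [o for s, p, o in graph if s in worn and p == NS + "manufacturer"]
print(makers)
```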

One other item: that genid:GLASSES represents an item that is local to the set of statements, in order to tie a group of statements together. I can just as easily replace it with xxx:HGLAUGAHLGA or urn:x-foo:bizzle or http://doriantaylor.com/possessions/glasses where there might be a nice picture of me wearing my glasses. In fact, having all resources refer to something globally unique is preferred, but a generated ID can suffice in a pinch.
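Here is a quick Python sketch of why the local name doesn't matter: swap it for something else everywhere it appears, and every question you could ask of the statements gets the same answer. The predicates are shortened to bare words for readability, and `rename` and `makers_of_worn_things` are hypothetical helpers, not part of any RDF toolkit.

```python
def rename(triples, old, new):
    """Swap one identifier for another wherever it is a subject or object."""
    def swap(term):
        return new if term == old else term
    return [(swap(s), p, swap(o)) for s, p, o in triples]

def makers_of_worn_things(triples, wearer):
    """Follow 'wears' and then 'manufacturer', starting from wearer."""
    worn = {o for s, p, o in triples if s == wearer and p == "wears"}
    return sorted(o for s, p, o in triples if s in worn and p == "manufacturer")

graph = [
    ("http://doriantaylor.com/", "wears", "genid:GLASSES"),
    ("genid:GLASSES", "manufacturer", "http://ic-berlin.de/"),
]
renamed = rename(graph, "genid:GLASSES", "urn:x-foo:bizzle")

# Same answer before and after the rename.
print(makers_of_worn_things(graph, "http://doriantaylor.com/"))
print(makers_of_worn_things(renamed, "http://doriantaylor.com/"))
```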

Why are we doing all of this?

For the computers, of course! The poor darlings work so hard but they're really not that bright. Especially when it comes to icky human things like semantics. The idea is, if we give them enough clues, they will work really hard to help us get a better picture of the world around us.

If people could advertise their work in a unified way, our computers could sort and filter this information based on what it actually is, rather than on the words it contains or what refers to it.

And that has some seriously cool potential.