Reification, Triples, Quads and not getting it…

I’ve been working with RDF for almost 3 years now. There’s not much evidence of that here, and I was recently challenged on why that is.

In large part it’s because I don’t get it. There are a lot of things I’m still struggling with in terms of how to think about solutions when using RDF and how best to work with it. Sure, I can write SPARQL with patterns several levels deep. Sure, I can work with Turtle and RDF/XML in several programming languages (Java, XSLT, PHP and sed of course). I think I even understand how to think in an open-world way.

But one big thing has bugged the hell out of me for ages and ages…

I WANT QUADS

At least, I thought I did. And I thought I was alone, but then I got this in an email from Alan Dix:

One of the LBi attendees mentioned a community site they had designed for a client that allowed users to create linkages between things on the site (e.g. song/artists) … and then annotate the links. This led to short discussion (on one of my old hobby horses) on the way RDF privileges nodes over relationships because statements of triples are not labelled (do not have URIs). While the system described would have required everything to have been reified if done using RDF technology.

This sums up one of the things I’ve been struggling with so much – that there is no way to refer to the arc between two nodes. When we describe a node, we give it an instance URI and say

<http://example.com/foo> a <http://example.com/schema#thing> .

but standard practice is to use the predicate itself: we simply write "a" (Turtle shorthand for rdf:type) rather than something like:

<http://…/foo> <http://…/relns/1234> <http://…/schema#thing> .
<http://…/relns/1234> means rdf:type .

This means that while every ‘thing’ has its own URI, every type relationship uses the same URI, so you can’t refer to a particular instance of a relationship directly. A URI identifying the triple would act as a surrogate, allowing you to say "the predicate of statement 97824". It is also appealing because it could act as a surrogate for the object when the object is a literal.

I was thinking about a problem that involved incrementing a value, and my thinking led me to want an update facility like "increment the object of statement 87642".

Now that was just plain wrong-thinking! A statement only has identity by virtue of what it says, unlike a row in an RDBMS table, which has identity because of its key rather than its contents. That is, saying "increment field 3 of row 87642" makes sense, but saying "increment the object of statement 87642" does not: as soon as the object is incremented, it is a different statement. So giving triples an identity that allows their predicate or object to be modified is not consistent with the way RDF works.
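The difference between row identity and statement identity can be sketched in a few lines of Python. This is only an illustration, not how any particular triple store works: the URI, the `count`/`ex:count` predicates and the values are hypothetical, and a graph is modelled as a plain set of (subject, predicate, object) tuples.

```python
# Row identity vs statement identity: a minimal sketch.
# The URI, the "count"/"ex:count" predicates and the values are hypothetical.

Triple = tuple[str, str, int]

# RDBMS-style: row 87642 is identified by its key, not by its contents,
# so a field can change while it stays "row 87642".
rows = {87642: ["http://example.com/foo", "count", 41]}
rows[87642][2] += 1  # still row 87642, now holding 42

# RDF-style: a triple *is* its (subject, predicate, object) values.
# Python tuples are immutable and compare by value, which mirrors that.
graph: set[Triple] = {("http://example.com/foo", "ex:count", 41)}

def increment_object(graph: set[Triple], old: Triple) -> Triple:
    """'Incrementing' the object can only retract one statement and assert another."""
    s, p, o = old
    new = (s, p, o + 1)
    graph.discard(old)  # the old statement is no longer in the graph...
    graph.add(new)      # ...and a different statement takes its place
    return new

old = ("http://example.com/foo", "ex:count", 41)
new = increment_object(graph, old)
print(new == old)  # False: a new statement, not a modified one
print(graph)       # {('http://example.com/foo', 'ex:count', 42)}
```

There is no "statement 87642" left to point at afterwards; the only handle on either statement is its full content.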

Another problem I was thinking about involved how many times a statement had been made. Imagine a very simple tagging statement like:

<http://…/something> tags:taggedWith "Interesting" .

Knowing how many times that statement had been made would, for example, give you relative sizes for a tag cloud.
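The count can't simply be read off the graph, because RDF 1.1 defines an RDF graph as a set of triples: asserting the same triple a second time adds nothing. A minimal Python sketch, modelling the graph as a set and using a hypothetical `http://example.com/something` in place of the post's elided subject URI:

```python
# A graph modelled as a set of triples; the subject URI is a hypothetical
# stand-in and "tags:taggedWith" is kept as an unexpanded prefixed name.
Triple = tuple[str, str, str]

graph: set[Triple] = set()
tagging: Triple = ("http://example.com/something", "tags:taggedWith", "Interesting")

# Three different users make the same statement...
for _ in range(3):
    graph.add(tagging)

# ...but the graph only records that the statement holds, not how often.
print(len(graph))  # 1
```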

This is a desire to refer to the statement as a whole, rather than, as in my earlier wrong-thinking, to address the parts of a statement. Other common problems I’ve come across while discussing this involve provenance or audit – who said what, and when; how a statement came to be.

Whenever I tried to discuss this, I would get a blanket "REIFICATION" response. I’d read the reification spec and re-read it, and it took me ages to see why I kept getting pointed that way.

If a triple only has identity by virtue of what it says, and giving it any other identity leads to the kind of wrong-thinking I described earlier, then the only way to identify a statement is by what it says – and that’s all reification is.

So, if I want to know about the tagging statement from earlier, I can ask:

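PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# the tags: namespace is not given here; this IRI is a hypothetical placeholder
PREFIX tags: <http://example.com/tags#>

DESCRIBE ?statement WHERE {
  ?statement a rdf:Statement .
  ?statement rdf:subject <http://.../something> .
  ?statement rdf:predicate tags:taggedWith .
  ?statement rdf:object "Interesting" .
}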

This lets us identify a statement purely by what it says, with no notion of identity beyond that.
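To make the pattern concrete, here is a minimal Python sketch of reification over an in-memory set of triples. It assumes plain strings for URIs and literals, `_:bN` labels for blank nodes, a hypothetical `http://example.com/something` standing in for the elided subject URI, and a hypothetical `ex:saidBy` property for provenance. Each time the tagging statement is made, a fresh resource describes it. Matching on rdf:type, rdf:subject, rdf:predicate and rdf:object then plays the role of the query's WHERE clause, and the number of matches gives the tag-cloud weight.

```python
import itertools

# Reification over an in-memory set of triples: a minimal sketch.
# Assumptions: URIs and literals are plain strings, blank nodes are "_:bN",
# and ex:saidBy is a hypothetical provenance property.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = tuple[str, str, str]

_bnode_ids = itertools.count(1)

def reify(graph: set[Triple], s: str, p: str, o: str, said_by: str) -> str:
    """Describe one making of the statement (s, p, o) with a fresh resource."""
    stmt = f"_:b{next(_bnode_ids)}"
    graph.update({
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
        (stmt, "ex:saidBy", said_by),
    })
    return stmt

def statements_about(graph: set[Triple], s: str, p: str, o: str) -> set[str]:
    """Find statement resources purely by what they say, like the SPARQL pattern."""
    def having(pred: str, value: str) -> set[str]:
        return {x for (x, q, v) in graph if q == pred and v == value}
    return (having(RDF + "type", RDF + "Statement")
            & having(RDF + "subject", s)
            & having(RDF + "predicate", p)
            & having(RDF + "object", o))

graph: set[Triple] = set()
thing, tag = "http://example.com/something", "Interesting"
for user in ["ex:alice", "ex:bob", "ex:carol"]:
    reify(graph, thing, "tags:taggedWith", tag, said_by=user)

matches = statements_about(graph, thing, "tags:taggedWith", tag)
print(len(matches))  # 3: one resource per time the statement was made
print(sorted(v for (x, q, v) in graph if x in matches and q == "ex:saidBy"))
# ['ex:alice', 'ex:bob', 'ex:carol']
```

One design point worth knowing: in RDF, reification triples describe a statement without asserting it, so a store that also wants the tag itself to hold would keep the plain triple alongside them.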

So the conclusion is that I’m wrong to want a URI for each triple, and I need to fix my wrong-thinking and embrace reification, just as soon as stores have really good support for it ;-)
