Reification, Triples, Quads and not getting it…

I’ve been working with RDF for almost 3 years now. There’s not much evidence of that here and I was recently challenged on why that is.

In large part it’s because I don’t get it. There are a lot of things I’m still struggling with: how to think about solutions when using RDF, and how best to work with it. Sure, I can write SPARQL with patterns several levels deep. Sure, I can work with Turtle and RDF/XML in several programming languages (Java, XSLT, PHP and sed of course). I think I even understand how to think in an open-world way.

But one big thing has bugged the hell out of me for ages and ages…

I WANT QUADS

At least, I thought I did. And I thought I was alone, but then I got this in an email from Alan Dix:

One of the LBi attendees mentioned a community site they had designed for a client that allowed users to create linkages between things on the site (e.g. song/artists) … and then annotate the links. This led to short discussion (on one of my old hobby horses) on the way RDF privileges nodes over relationships because statements of triples are not labelled (do not have URIs). While the system described would have required everything to have been reified if done using RDF technology.

This sums up one of the things I’ve been struggling with so much – that there is no way to refer to the arc between two nodes. When we describe a node we use an instance URI, we say

<http://example.com/foo> a <http://example.com/schema#thing> .

but standard practice when specifying a predicate is simply to use the predicate’s own URI. We write "a" (shorthand for rdf:type) rather than:

<http://…/foo> <http://…/relns/1234> <http://…/schema#thing> .
<http://…/relns/1234> means rdf:type .

This means that while all ‘things’ have unique URIs, all type relationships use the same URI, so you can’t refer to a particular instance of a relationship directly. A URI identifying the triple would act as a surrogate, allowing you to say "The predicate on statement 97824". This is appealing because it could also act as a surrogate for the object, where the object is a literal.

I was thinking about a problem involving incrementing a value, where I was thinking in a way that led me to want an update facility like "Increment the object of statement 87642".

Now that was just plain wrong-thinking! A statement only has identity by virtue of what it says, unlike a row in an RDBMS table, which has identity because of its position in the table. That is, saying "increment field 3 of row 87642" makes sense, but saying "Increment the object of statement 87642" does not. It doesn’t because, as soon as the object is incremented, it is a different statement. So, having triple identity to allow modification of the predicate or the object is not consistent with the way RDF works.
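To make that concrete, here is a minimal sketch in Python. It assumes a graph modelled as a plain set of (subject, predicate, object) tuples; the `ex:count` predicate and the `increment` helper are made up for illustration and are not part of any RDF library.

```python
# A graph modelled as a set of (subject, predicate, object) tuples.
# The ex:count predicate is hypothetical, used only for illustration.
graph = {("http://example.com/foo", "ex:count", 3)}

def increment(graph, subject, predicate):
    """'Increment' a value by retracting one statement and asserting another."""
    old = next(t for t in graph if t[0] == subject and t[1] == predicate)
    new = (subject, predicate, old[2] + 1)
    graph.remove(old)   # the old statement is gone ...
    graph.add(new)      # ... and a different statement takes its place
    return old, new

old, new = increment(graph, "http://example.com/foo", "ex:count")
print(old == new)    # False: changing the object gives a different statement
print(old in graph)  # False: nothing was modified in place
```

There is no "statement 87642" that survives the change; the only handle on a statement is its content.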

I was thinking about a problem involving how many times a statement had been made. So, imagine a very simple tagging statement like:

<http://…/something> tags:taggedWith "Interesting" .

I wanted to know how many times a statement had been made; with tagging, that would give you relative sizes for a tag cloud, for example.

This is a desire for a way to refer to the statement as a whole, rather than my previous wrong-thinking which was a desire to address the parts of a statement. Other common problems that I’ve come across discussing this are around provenance or audit – who said what, when; how did that statement come to be.
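A rough sketch of the counting problem in Python, assuming each assertion is recorded as a (subject, predicate, object) tuple. The subject URI, the "Dull" tag and the `tag_weights` helper are invented for the example.

```python
from collections import Counter

# Every time someone makes a statement, it is appended to this list.
# The subject URI and the "Dull" tag are hypothetical.
assertions = [
    ("http://example.com/a", "tags:taggedWith", "Interesting"),
    ("http://example.com/a", "tags:taggedWith", "Interesting"),
    ("http://example.com/a", "tags:taggedWith", "Dull"),
]

# The statement itself is the key: it is identified by what it says.
times_made = Counter(assertions)

def tag_weights(times_made, subject):
    """Relative tag sizes for one subject, e.g. for a tag cloud."""
    counts = {o: n for (s, p, o), n in times_made.items()
              if s == subject and p == "tags:taggedWith"}
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

weights = tag_weights(times_made, "http://example.com/a")
```

Note that the count is a fact about the statement as a whole, keyed on its subject, predicate and object together, not on any separate identifier.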

Whenever I tried to discuss this I would get a blanket "REIFICATION" response. I’d read the reification spec and re-read it, and it took me ages to get why I kept getting pointed that way.

If a triple only has identity by virtue of what it says, and giving it identity other than that leads to the kind of wrong-thinking I described earlier, then the only way to identify a statement is by virtue of what it says. That’s all reification is.

So, if I want to know about the tagging statement from earlier:

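PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# The tags: prefix must also be declared, using the tagging vocabulary's namespace.
DESCRIBE ?statement WHERE {
  ?statement a rdf:Statement .
  ?statement rdf:subject <http://.../something> .
  ?statement rdf:predicate tags:taggedWith .
  ?statement rdf:object "Interesting" .
}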

This allows us, simply, to identify a statement purely on the basis of what it says rather than any notion of identity other than that.
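The same idea can be sketched in Python without a triple store. This is only an illustration, assuming a graph held as a set of tuples; the `reify` and `find_statements` helpers, the `_:s1` node name, the `ex:assertedBy` predicate and the example.com URIs are all hypothetical. Only the four rdf: terms come from the RDF vocabulary.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(node, triple):
    """Describe a triple with the standard rdf:Statement vocabulary."""
    s, p, o = triple
    return {
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
    }

def find_statements(graph, triple):
    """Find statement nodes purely by what the statement says."""
    s, p, o = triple
    wanted = {(RDF + "type", RDF + "Statement"), (RDF + "subject", s),
              (RDF + "predicate", p), (RDF + "object", o)}
    nodes = {t[0] for t in graph}
    return {n for n in nodes
            if wanted <= {(tp, to) for (ts, tp, to) in graph if ts == n}}

tagging = ("http://example.com/something", "tags:taggedWith", "Interesting")

# Reify the tagging statement, then hang provenance off the statement node.
graph = reify("_:s1", tagging)
graph.add(("_:s1", "ex:assertedBy", "http://example.com/people/alice"))

matches = find_statements(graph, tagging)
print(matches)  # {'_:s1'}
```

The lookup never needs to know the node name in advance. Like the query above, it matches on subject, predicate and object, and anything else said about that node (who asserted it, when) comes along with it.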

So the conclusion is, I’m wrong to want a URI for each triple and I need to fix my wrong thinking and embrace reification; just as soon as stores have real good support for it ;-)

SKOS, Linked Data and LCSH!

The inimitable Ed Summers has been working inside the Library of Congress, building examples and demonstrators of how LC could be getting themselves into the semantic web, the linked-data web.

It appears he’s got fed up of waiting for the support, permission and infrastructure he so richly deserves to get this data out there and he’s been and gone and done something smart outside.

lcsh.info is now a home where you can find a copy of the Library of Congress Subject Headings available in SKOS.

This is a great piece of work and fits in perfectly with the work I’ve been doing on Semantic Marc.

After much discussion with Ed he’s provided two URI schemes: the primary scheme is based on the LC Control Number, and the second is based on the natural language term of the heading.

So, the LCSH/SKOS URIs for Beer (a subject close to my heart) are:

http://lcsh.info/label/Beer which currently redirects to http://lcsh.info/sh85012832

The concept URIs then do content negotiation to return either RDF or HTML representations.

The URI scheme based on the natural language term is something I’ve bent Ed’s ear about constantly, mainly because it makes it possible to link bibliographic data into the LCSH data without the need for a lookup, so I’m chuffed to see it. However, what I badgered Ed for was wrong.

After a long discussion with Tom Heath about stuff I now understand why my suggestion to Ed, to simply redirect from the term to the LCCN-based URI, was wrong: using a redirect hides the relationship between the term and its control number from the data layer, leaving the meaning implicit in the HTTP conversation.

What Tom suggested, and I hope edsu can do, is to provide a response to the term URI that explains its relationship with the LCCN URI.
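The difference between the two approaches can be sketched in Python. This is not how lcsh.info is implemented. The 302 status code, the function names and above all the rdfs:seeAlso predicate are assumptions standing in for whatever a real service would choose.

```python
# Placeholder predicate: the right property for linking a term URI to its
# concept URI is a modelling decision this sketch does not settle.
SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

def redirect_response(concept_uri):
    """Redirect only: the link exists in the HTTP exchange, not in the data."""
    return 302, {"Location": concept_uri}, ""

def describing_response(term_uri, concept_uri, predicate=SEE_ALSO):
    """Answer the term URI with a triple stating the relationship."""
    body = f"<{term_uri}> <{predicate}> <{concept_uri}> .\n"
    return 200, {"Content-Type": "text/turtle"}, body

term = "http://lcsh.info/label/Beer"
concept = "http://lcsh.info/sh85012832"

status, headers, body = describing_response(term, concept)
print(body)
```

With the redirect, a client that keeps only the data it fetched has no record that the term and the control number are connected. With the describing response, that connection is a statement like any other.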

Great work Ed.

Don't touch anything!

Biometrics really annoy me. In a previous life building secure authentication systems for Egg, a major internet bank, I did a great deal of research into biometrics. Whichever biometric you pick, there are issues for a substantial minority (think those with glaucoma, burns victims, those with no hands). There is also the fundamental problem that you can’t ‘reset’ a biometric in the same way as a password or a certificate.

Even more annoying is the way in which proponents seem to ignore even the most compelling evidence against biometrics – such as the obvious fact that your fingerprint is neither secure nor secret. Nor is it non-reproducible.

Something that Germany’s interior minister, Wolfgang Schäuble, has just found out.

Data Portability

Data Portability is a great campaign, starting to gain some momentum, about ensuring the data you put into sites like Facebook and LinkedIn is available for you to move between sites as you choose. Some major sites, including Facebook, have agreed to work with the group to develop standards for portable data, but there is still a long way to go.

So, dull bit done. There’s a host of videos going around promoting Data Portability.

Here are the best two (IMHO) so far…

Connect, Control, Share, Remix by Michael Pick

and Get Your Data Out! by (friend and colleague) Danny Ayers