Reification, Triples, Quads and not getting it…
I’ve been working with RDF for almost 3 years now. There’s not much evidence of that here and I was recently challenged on why that is.
In large part it’s because I don’t get it. There are a lot of things I’m still struggling with in terms of how to think about solutions when using RDF and how best to work with it. Sure, I can write SPARQL with patterns several levels deep. Sure, I can work with Turtle and RDF/XML in several programming languages (Java, XSLT, PHP and sed of course). I think I even understand how to think in an open-world way.
But one big thing has bugged the hell out of me for ages and ages…
I WANT QUADS
At least, I thought I did. And I thought I was alone, but then I got this in an email from Alan Dix:
One of the LBi attendees mentioned a community site they had designed for a client that allowed users to create linkages between things on the site (e.g. song/artists) … and then annotate the links. This led to short discussion (on one of my old hobby horses) on the way RDF privileges nodes over relationships because statements of triples are not labelled (do not have URIs). While the system described would have required everything to have been reified if done using RDF technology.
This sums up one of the things I’ve been struggling with so much: that there is no way to refer to the arc between two nodes. When we describe a node we use an instance URI, we say
<http://example.com/foo> a <http://example.com/schema#thing> .
but standard practice is to use the predicate’s own URI directly, so we simply write "a" rather than:
<http://…/foo> <http://…/relns/1234> <http://…/schema#thing> .
<http://…/relns/1234> means rdf:type .
This means that while all ‘things’ have unique URIs, all type relationships use the same URI, meaning you can’t refer to the instance of a relationship directly. A URI identifying the triple would act as a surrogate, allowing you to say "The predicate on statement 97824". This is also appealing because it could act as a surrogate for the object, where the object is a literal.
I was thinking about a problem involving incrementing a value, where I was thinking in a way that led me to want an update facility like "Increment the object of statement 87642".
Now that was just plain wrong-thinking! A statement only has identity by virtue of what it says, unlike a row in an RDBMS table, which has identity because of its position in the table. That is, saying "increment field 3 of row 87642" makes sense, but saying "Increment the object of statement 87642" does not. It doesn’t because as soon as the object is incremented it is a different statement. So, having triple identity to allow modification of the predicate or the object is not consistent with the way RDF works.
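A minimal sketch of that point in Python, assuming a graph is modelled as a plain set of (subject, predicate, object) tuples. The `schema#count` predicate and the `increment_object` helper are made up for illustration; this is not how any particular store implements updates.

```python
# A triple is an immutable tuple; a graph is a set of such tuples.
# The URIs and the "count" predicate are hypothetical.

def increment_object(graph, triple):
    """Return a new graph where `triple` is swapped for one with object + 1."""
    s, p, o = triple
    updated = (s, p, o + 1)
    # There is no "statement 87642" to update in place: the old statement
    # is removed and a different statement is added.
    return (graph - {triple}) | {updated}


old = ("http://example.com/foo", "http://example.com/schema#count", 3)
new = ("http://example.com/foo", "http://example.com/schema#count", 4)

graph = {old}
graph = increment_object(graph, old)

print(old in graph)  # False
print(new in graph)  # True
print(old == new)    # False
```

The tuple has no identity apart from its three values, so "incrementing" it can only ever mean replacing it.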
I was thinking about a problem involving how many times a statement had been made. So, imagine a very simple tagging statement like:
<http://…/something> tags:taggedWith "Interesting" .
I wanted to know how many times a statement had been made; with tagging, for example, that would give you the relative sizes for a tag cloud.
This is a desire for a way to refer to the statement as a whole, rather than my previous wrong-thinking which was a desire to address the parts of a statement. Other common problems that I’ve come across discussing this are around provenance or audit – who said what, when; how did that statement come to be.
Whenever I tried to discuss this I would get a blanket "REIFICATION" response. I’d read the reification spec and re-read it, and it took me ages to get why I kept getting pointed that way.
If a triple only has identity by virtue of what it says, and giving it identity other than that leads to the kind of wrong-thinking I described earlier, then the only way to identify a statement is by virtue of what it says – that’s all reification is.
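So, if I want to know about the tagging statement from earlier, I can ask:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# The tags: prefix must be declared as well; its namespace is not given here.
DESCRIBE ?statement WHERE {
  ?statement a rdf:Statement .
  ?statement rdf:subject <http://.../something> .
  ?statement rdf:predicate tags:taggedWith .
  ?statement rdf:object "Interesting" .
}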
This allows us, simply, to identify a statement purely on the basis of what it says rather than any notion of identity other than that.
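Here is a rough Python sketch of what that lookup amounts to, again assuming a graph is just a set of tuples. `reify`, `find_statements`, the `_:st1` style node labels and the example.com URIs are all hypothetical stand-ins, and the function returns only the matching statement nodes rather than a full DESCRIBE result.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical URIs standing in for the elided ones in the post.
SOMETHING = "http://example.com/something"
TAGGED_WITH = "http://example.com/tags#taggedWith"


def reify(node, s, p, o):
    """Return the four reification triples that describe statement (s, p, o)."""
    return {
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
    }


def find_statements(graph, s, p, o):
    """Return every node that reifies (s, p, o), mirroring the WHERE clause."""
    candidates = {n for (n, pred, obj) in graph
                  if pred == RDF + "type" and obj == RDF + "Statement"}
    return {n for n in candidates
            if (n, RDF + "subject", s) in graph
            and (n, RDF + "predicate", p) in graph
            and (n, RDF + "object", o) in graph}


graph = reify("_:st1", SOMETHING, TAGGED_WITH, "Interesting")
graph |= reify("_:st2", SOMETHING, TAGGED_WITH, "Boring")

print(find_statements(graph, SOMETHING, TAGGED_WITH, "Interesting"))  # {'_:st1'}
```

The statement node is found only through the subject, predicate and object it describes, which is the whole point.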
So the conclusion is, I’m wrong to want a URI for each triple and I need to fix my wrong thinking and embrace reification; just as soon as stores have real good support for it ;-)
7 Comments to Reification, Triples, Quads and not getting it…
So, still trying to get into RDF and understand this kind of stuff, but do I read you right to say that you no longer currently wish for quads, you realize that everything CAN be done with triples, at least with reification?
No, you do want quads. If you use a BNode for the context position then the store will assign a unique identifier for the statement. If you only do this for the “distinct triples” (s,p,o) then you have unique identifiers for your statements in the triple store and you can go ahead and make assertions about those statements.
Reification is wrong here from two perspectives. First, reification is a statement model and says nothing about whether or not the statement itself is asserted. Second, while a lot of people have handled this problem using reification, it blows up the size of the database considerably since it adds 4 assertions for each original triple.
Use quads.
Use a bnode for the context position.
You’ll be fine.
Caveat: SPARQL allows the quad position to be interpreted in a variety of manners and the semantics basically depend on your application’s commitment to how it is going to manage the context position. Different applications can do different things and this can lead to confusion when you try to combine the data together.
I’m with Jonathan — if you’ve come to a great realization, I don’t get it yet (why quads aren’t needed) and would appreciate some further pointers.
April 3, 2008
WRT the tag cloud example, and aside from any arguments over the
semantics of duplicate statements, many stores do not support them*, so
the answer to the SPARQL query you use here will be the same no matter
how many times you assert or reassert the statement
<http://…/something> tags:taggedWith "Interesting" . So I guess what
you really want is for stores to internally reify statements so that one
assertion of s p o can be differentiated from another. In addition to
this, large swathes of the RDF community consider reification in RDF to
be fundamentally broken, due to the inability to differentiate between
quoted and asserted statements.
An alternative approach could be to make the modelling slightly more
complex, along the lines of:
<http://…/tags/123> a <http://…/schema/Tag> .
<http://…/tags/123> tags:tagValue "Interesting" .
<http://…/taggings/abc> a <http://…/schema/Tagging> ;
    tags:tag <http://…/tags/123> ;
    tags:thingBeingTagged <http://…/something> .
Incrementing the tag count when someone else creates the same tag is a
case of adding a few statements:
<http://…/taggings/def> a <http://…/schema/Tagging> ;
    tags:tag <http://…/tags/123> ;
    tags:thingBeingTagged <http://…/something> .
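As a rough illustration of why this counts where a plain triple does not, here is a small Python sketch with the graph modelled as a set of tuples. The shortened identifiers (`tags/123`, `taggings/abc`, `something`) and the `tagging` and `tag_counts` helpers are placeholders, not part of any real vocabulary or store API.

```python
from collections import Counter

RDF_TYPE = "rdf:type"


def tagging(tagging_id, tag, thing):
    """Return the three triples that describe one act of tagging."""
    return {
        (tagging_id, RDF_TYPE, "schema:Tagging"),
        (tagging_id, "tags:tag", tag),
        (tagging_id, "tags:thingBeingTagged", thing),
    }


def tag_counts(graph, thing):
    """Count the taggings of `thing`, keyed by each tag's literal value."""
    taggings = {s for (s, p, o) in graph
                if p == "tags:thingBeingTagged" and o == thing}
    values = {s: o for (s, p, o) in graph if p == "tags:tagValue"}
    return Counter(values[o] for (s, p, o) in graph
                   if p == "tags:tag" and s in taggings)


# Asserting the same plain triple twice leaves a single statement in a set.
plain = set()
plain.add(("something", "tags:taggedWith", "Interesting"))
plain.add(("something", "tags:taggedWith", "Interesting"))
print(len(plain))  # 1

# Each Tagging has its own identifier, so repeated taggings stay distinct.
graph = {("tags/123", RDF_TYPE, "schema:Tag"),
         ("tags/123", "tags:tagValue", "Interesting")}
graph |= tagging("taggings/abc", "tags/123", "something")
graph |= tagging("taggings/def", "tags/123", "something")
print(tag_counts(graph, "something")["Interesting"])  # 2
```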
If you add graph support on top of this, I guess you could partition the
individual taggings into distinct graphs and get some rudimentary
provenance information (or you could add properties to the Tagging
instances). I can’t help but feel that I’m missing something here, so if
we’re at cross purposes here, sorry.
*I’m sure there’s better documentation about this, but
http://lists.w3.org/Archives/Public/semantic-web/2005Oct/0193.html was
all I could find right off
Cheers,
Sam
April 3, 2008
I suspect my terminology has confused things – when many people talk about quads they are talking about the fourth aspect being the graph that a triple belongs to.
I was saying that I have reached the conclusion that all the ways in which I was thinking about triples having identity, other than by virtue of what they state, were wrong. That’s not to say others don’t have good arguments in support – I just don’t have any, anymore.
As far as the rest of the comments go… eh? It took me several minutes to parse most of those sentences ;-)
I don’t like reification; it adds too much data for too little result. Reification can be replaced with proper use of graphs in many cases. For example, the best way of tracking a multiset of tags is to keep each origin’s personal tags in a separate graph and use a quad store that can efficiently query all graphs in a single triple pattern.
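A small Python sketch of the graph-per-origin idea, assuming quads are (subject, predicate, object, graph) tuples held in a set. The graph names, the shortened identifiers and the `match`, `tag_cloud` and `who_said` helpers are hypothetical; a real quad store would answer the same pattern through its own query interface.

```python
from collections import Counter

# One graph per person who tagged; all names here are placeholders.
quads = {
    ("something", "tags:taggedWith", "Interesting", "graphs/alice"),
    ("something", "tags:taggedWith", "Interesting", "graphs/bob"),
    ("something", "tags:taggedWith", "Dull", "graphs/bob"),
}


def match(quads, s=None, p=None, o=None):
    """Match one triple pattern across all graphs; None acts as a wildcard."""
    return [q for q in quads
            if (s is None or q[0] == s)
            and (p is None or q[1] == p)
            and (o is None or q[2] == o)]


def tag_cloud(quads, thing):
    """Count how many graphs assert each tag value for `thing`."""
    return Counter(o for (_s, _p, o, _g)
                   in match(quads, s=thing, p="tags:taggedWith"))


def who_said(quads, thing, value):
    """List the graphs (origins) that assert a given tag for `thing`."""
    return sorted(g for (_s, _p, _o, g)
                  in match(quads, s=thing, p="tags:taggedWith", o=value))


cloud = tag_cloud(quads, "something")
print(cloud["Interesting"])  # 2
print(cloud["Dull"])         # 1
print(who_said(quads, "something", "Interesting"))  # ['graphs/alice', 'graphs/bob']
```

The same triple appears once per graph, so the multiset falls out of the fourth position without adding four extra statements per tag.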
April 2, 2008