Multi-Tenant Configuration Schema

Are you writing multi-tenant software? Are you using RDF at all? Do you want to keep track of your tenants?

You might want to comment on the first draft of the new Multi-Tenant Configuration Schema.

This schema attempts to describe a simple set of concepts and relationships about tenants within a multi-tenant software system. It avoids anything that would constitute application configuration, but will happily co-exist with classes and properties to do that. The documentation is sparse currently, awaiting questions and comment so that I can expand on areas that require further explanation. Comment here, or email me.

Panlibus » Blog Archive » Library of Congress launch Linked Data Subject Headings

Agree with this summary from Richard

On the surface, to those not yet bought in to the potential of Linked Data, and especially Linked Open Data, this may seem like an interesting but not necessarily massive leap forward. I believe that what underpins the fairly simple functional user interface they provide will gradually become core to bibliographic data becoming a first-class citizen in the web of data.

Overnight this uri ‘http://id.loc.gov/authorities/sh85042531’ has now become the globally available, machine and human readable, reliable source for the description for the subject heading of ‘Elephants’ containing links to its related terms (in a way that both machines and humans can navigate). This means that system developers and integrators can rely upon that link to represent a concept, not necessarily the way they want to [locally] describe it. This should facilitate the ability for disparate systems and services to simply share concepts and therefore understanding – one of the basic principles behind the Semantic Web.

from Panlibus » Blog Archive » Library of Congress launch Linked Data Subject Headings.

Great to see LoC doing this stuff and getting it out there.

Domain Specific Editing Interface using RDFa and jQuery

I wrote back in January about Resource Lists, Semantic Web, RDFa and Editing Stuff. This was based on work we’d done in Talis Aspire.

Several people suggested this should be written up as a fuller paper, so Nad, Jeni and I wrote it up as a paper for the SFSW 2009 workshop. It’s been accepted and will be published there, but unfortunately due to work priorities that have come up we won’t be able to attend.

A draft of the paper is here: A Pattern for Domain Specific Editing Interfaces Using Embedded RDFa and HTML Manipulation Tools.

The camera ready copy will be published in the conference proceedings. Feedback welcomed.

BlueBlog: How and Why Glue is Using Amazon SimpleDB instead of a Relational Database

Alex blogs over at Adaptive Blue about their use of Amazon’s SimpleDB to power their browser add-on Glue.

The post is interesting, and the comments useful. What I noticed, though, is that they’re using natural keys…

The solution that Glue uses relies on data duplication. Each Person and each Thing in our system has a unique key. In the case of a Person, the key is the username. In the case of a Thing, the key is a combination of the type, its name and an attribute, like author for a book or director for a movie, which provides a way to disambiguate among the objects that have the same type and the same name.

via BlueBlog: How and Why Glue is Using Amazon SimpleDB instead of a Relational Database.

Resource Lists, Semantic Web, RDFa and Editing Stuff

Some of the work I’ve been doing over the past few months has been on a resource lists product that helps lecturers and students make best use of the educational material for their courses.

One of the problems we hoped to address really well was the editing of lists. Historically products that do this have been deemed cumbersome and difficult by academic staff who will often produce lists as simple documents in Word or the like.

We wanted to make an editing interface that really worked for the academic community so they could keep the lists as accurate and current as they wanted.

Chris Clarke, our Programme Manager, and Fiona Grieg, one of our pilot customers, describe the work in a W3C case study. Ivan Herman then picks up on one of the ways we decided to implement editing: using RDFa within the HTML DOM. In the case study Chris describes it like this:

The interface to build or edit lists uses a WYSIWYG metaphor implemented in Javascript operating over RDFa markup, allowing the user to drag and drop resources and edit data quickly, without the need to round trip back to the server on completion of each operation. The user’s actions of moving, adding, grouping or editing resources directly manipulate the RDFa model within the page. When the user has finished editing, they hit a save button which serialises the RDFa model in the page into an RDF/XML model which is submitted back to the server. The server then performs a delta on the incoming model with that in the persistent store. Any changes identified are applied to the store, and the next view of the list will reflect the user’s updates.

This approach has several advantages. First, as Andrew says

One thing I hadn’t used until recently was RDFa. We’ve used it on one of the main admin pages in our new product and it’s made what was initially quite a complex problem much simpler to implement.

The problem that’s made simpler is this – WYSIWYG editing of the page was best done using DOM manipulation techniques, and most easily using existing libraries such as prototype. But what was being edited isn’t really the visual document, it is the underlying RDF model. Trying to keep a version of the model in a JS array or something in synch with the changes happening in the DOM seemed to be a difficult (and potentially bug-ridden) option.

By using RDFa we can distribute the model through the DOM and have the model updated by virtue of having updated the DOM itself. Andrew describes this process nicely:

Currently using Jeni Tennison’s RDFQuery library to parse an RDF model out of an XHTML+RDFa page we can mix this with our own code and end up with something that allows complex WYSIWYG editing on a reading list. We use RDFQuery to parse an initial model out of the page with JavaScript and then the user can start modifying the page in a WYSIWYG style. They can drag new sections onto the list, drag items from their library of bookmarked resources onto the list and re-order sections and items on the list. All this is done in the browser with just a few AJAX calls behind the scenes to pull in data for newly added items where required. At the end of the process, when the Save button is pressed, we can submit the ‘before’ and ‘after’ models to our back-end logic which builds a Changeset from before and after models and persists this to a data store on the Talis Platform.

Building a Changeset from the two RDF models makes quite a complex problem relatively straightforward. The complexity now just being in the WYSIWYG interface and the dynamic updating of the RDFa in the page as new items are added or re-arranged.

As Andrew describes, the editing starts by extracting a copy of the model. This allows the browser to maintain before and after models. That is useful because, when both are posted to the server, the before model can be used to spot editing conflicts with someone else making a concurrent edit – this is an improvement on how Chris described it in the case study.
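A minimal sketch of the before/after idea, in Python rather than the code we actually run. It assumes a model is simply a set of (subject, predicate, object) tuples; `build_changeset` and `has_conflict` are hypothetical names, and the conflict rule shown (any triple from the before model missing from the store) is only one simple possibility.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

# A triple is (subject, predicate, object), all plain strings in this sketch.
Triple = Tuple[str, str, str]


class Changeset(NamedTuple):
    removals: FrozenSet[Triple]   # in 'before' but not in 'after'
    additions: FrozenSet[Triple]  # in 'after' but not in 'before'


def build_changeset(before: Set[Triple], after: Set[Triple]) -> Changeset:
    """Diff two models using plain set difference."""
    return Changeset(frozenset(before - after), frozenset(after - before))


def has_conflict(before: Set[Triple], stored: Set[Triple]) -> bool:
    """Assumed rule: if the store no longer holds everything the editor
    saw when it extracted 'before', someone else has edited concurrently."""
    return not before <= stored
```

Because the diff is only two set differences, the server-side work stays small; the hard part remains keeping the RDFa in the page correct while the user edits.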

There are some gotchas in this approach though. Firstly, some of the nodes have two-way links:

<http://example.com/lists/foo> <http://purl.org/vocab/resourcelist/schema#contains> <http://example.com/items/bar>
<http://example.com/items/bar> <http://purl.org/vocab/resourcelist/schema#list> <http://example.com/lists/foo>

So that the relationship from the list to the item gets removed when the item is deleted from the DOM, we use the @rev attribute. This allows us to put the relationship from the list to the item with the item, rather than with the list.

The second issue is that we use rdf:Seq to maintain the ordering of the lists, so when the order changes in the DOM we have to do a quick traversal of the DOM changing the sequence predicates (_1, _2 etc) to match the new visual order.
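The renumbering step can be sketched as follows. This is Python over a plain list, not the in-browser JavaScript that walks the DOM, and `sequence_triples` is a hypothetical helper; the only fixed part is the RDF namespace and its `_1`, `_2` membership properties.

```python
from typing import List, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def sequence_triples(seq_uri: str, ordered_items: List[str]) -> List[Tuple[str, str, str]]:
    """Regenerate the rdf:_1, rdf:_2, ... triples so that they match
    the current visual order of the items."""
    return [
        (seq_uri, f"{RDF_NS}_{position}", item)
        for position, item in enumerate(ordered_items, start=1)
    ]
```

Rebuilding every membership triple after a move is simpler than patching individual positions, and the lists are short enough for that not to matter.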

Neither of these were difficult problems to solve :-)

My thanks go out to Jeni Tennison who helped me get the initial prototype of this approach working while we were at Swig back in November.

Exploring OpenLibrary Part Two

This post also appears on the n2 blog.

More than two weeks on from my last look at the OpenLibrary authors data and I’m finally finding some time to look a bit deeper. Last time I finished off thinking about the complete list of distinct dates within the authors file and how to model those.

Where I’ve got to today is tagged as day 2 of OpenLibrary in the n2 subversion.

First off, a correction – foaf:Name should have been foaf:name. Thanks to Leigh for pointing that out. I haven’t fixed it in this tag, as I tagged before I realised I’d forgotten it, but next time, honestly.

It’s clear that there is some stuff in the data that simply shouldn’t be there, things that cannot possibly be a birth date such as [from old catalog] and *. and simply ,. When I came across —oOo— I was somewhat dismayed. MARC data, where most of this data has come from, has a long and illustrious history, but one of the mistakes made early on was to put display data into the records in the form of ISBD punctuation. This, combined with the real inflexibility of most ILSs and web-based catalogs, has forced libraries to hack their records with junk like —oOo— to fix display errors. This one comes from Antonio Ignacio Margariti.

In total there are only 6,156 unique birth date datums and 4,936 unique death dates. Of course there is some overlap, so in total there are only 9,566 datums to worry about overall.

So what I plan to do is to set up the recognisable patterns in code and discard anything I don’t recognise as a date or date range. Doing that may mean I lose some date information, but I can add that back in later as more patterns get spotted. So far I’ve found several patterns (shown here using regex notation)…

“^[0-9]{1,4}$” – A straightforward number of 4 digits or fewer, no letters, punctuation or whitespace. These are simple years. Last week I popped them in using bio:date. That’s not strictly within the rules of the bio schema, as that really requires a date formatted in accordance with ISO8601. Ian had already implied his displeasure with my use of bio:date and suggested I use the more relaxed dc elements date. However, on further chatting, what we actually have is a date range within which the event occurred, so we need to show that the event happened somewhere within a date range. This can be solved using the W3C Time Ontology, which allows for better description.

I spent some time getting hung up on exactly what is being said by these date assertions on a bio:Birth event. That is, are we saying that the birth took place somewhere within that period, or that the event happened over that period? This may seem a daft question to ask, but as others start modelling events in people’s bios this could easily become indistinguishable. Say I want to model my grandfather’s experience of the second world war. I’d very likely model that as an event occurring over a four year period. So, I feel the need to distinguish between an event happening over a period and an event happening at an unknown time within a period. I thought I was getting too pedantic about this, but Ian assured me I’m not and that the distinction matters.

The model we end up with is like this


@prefix bio: <http://vocab.org/bio/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix mine: <http://example.com/mine/schema#> .
@prefix time: <http://www.w3.org/2006/time#> .

<http://example.com/a/OL149323A>
	foaf:Name "Schaller, Heinrich";
	foaf:primaryTopicOf <http://openlibrary.org/a/OL149323A>;
	bio:event <http://example.com/a/OL149323A#birth>;
	a foaf:Person .

<http://example.com/a/OL149323A#birth>
	dc:date <http://example.com/a/OL149323A#birthDate>;
	a bio:Birth .

<http://example.com/names/schallerheinrich>
	mine:name_of <http://example.com/a/OL149323A>;
	a mine:Name .

<http://example.com/dates/gregorian/ad/years/1900>
	time:unitType time:unitYear;
	time:year "1900";
	a time:DateTimeDescription .

<http://example.com/a/OL149323A#birthDate>
	time:inDateTime <http://example.com/dates/gregorian/ad/years/1900>;
	a time:Instant .

The simple year accounts for 731,304 of the 748,291 birth dates and for 13,151 of the 181,696 death dates, about 80% of the dates overall. Following the 80/20 rule almost perfectly, the remaining 20% is going to be painful. It has been suggested I should stop here, but it seems a shame to not have access to the rest if we can dig in, and I can, so…

First of the remaining correct entries are the approximate years, recorded as ca. 1753 or (ca.) 1753 and other variants of that. These all suffer from leading and trailing junk, but I’ll catch the clean ones of these with “^[(]?ca\.[)]? ([0-9]{1,4})$”. The difficulty with these is that you can’t really convert them into a single year or even a date range, as what people consider to be within the “circa” will vary widely in different contexts. So, the interval can be described in the same way as a simple year, but the relationship with the author’s birth is not simply time:inDateTime. I haven’t found a sensible circa predicate, so for now I’ll drop into mine.


@prefix bio: <http://vocab.org/bio/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix mine: <http://example.com/mine/schema#> .
@prefix time: <http://www.w3.org/2006/time#> .

<http://example.com/a/OL151554A>
	foaf:Name "Altdorfer, Albrecht";
	foaf:primaryTopicOf <http://openlibrary.org/a/OL151554A>;
	bio:event <http://example.com/a/OL151554A#birth>;
	bio:event <http://example.com/a/OL151554A#death>;
	a foaf:Person .

<http://example.com/a/OL151554A#birth>
	dc:date <http://example.com/a/OL151554A#birthDate>;
	a bio:Birth .

<http://example.com/a/OL151554A#death>
	dc:date <http://example.com/a/OL151554A#deathDate>;
	a bio:Death .

<http://example.com/names/altdorferalbrecht>
	mine:name_of <http://example.com/a/OL151554A>;
	a mine:Name .

<http://example.com/dates/gregorian/ad/years/1480>
	time:unitType time:unitYear;
	time:year "1480";
	a time:DateTimeDescription .

<http://example.com/a/OL151554A#birthDate>
	mine:circaDateTime <http://example.com/dates/gregorian/ad/years/1480>;
	a time:Instant .
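
The two patterns found so far, and the plan to discard anything unrecognised, can be sketched like this. It is Python for illustration only; the regular expressions are the ones quoted above, while `classify_date` and its return shape are assumptions of the sketch.

```python
import re
from typing import Optional, Tuple

# The two patterns described in the post.
SIMPLE_YEAR = re.compile(r"^[0-9]{1,4}$")
CIRCA_YEAR = re.compile(r"^[(]?ca\.[)]? ([0-9]{1,4})$")


def classify_date(raw: str) -> Optional[Tuple[str, int]]:
    """Return ("year", n) or ("circa", n) for recognised values.
    Anything else is discarded by returning None."""
    if SIMPLE_YEAR.match(raw):
        return ("year", int(raw))
    circa = CIRCA_YEAR.match(raw)
    if circa:
        return ("circa", int(circa.group(1)))
    return None
```

A "year" result would map to time:inDateTime and a "circa" result to mine:circaDateTime; new patterns can be added as further branches as they get spotted.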

Ok, it’s time to stop there until next time. I have several remaining forms to look at and some issues of data cleanup.

Next time I’ll be looking at parsing out date ranges of a few years, shown in the data as 1103 or 4. These will go in as longer date time descriptions so no new modelling needed.

Then we have centuries, 7th cent., again just a broader date time description required I hope. There are some entries for works from before the birth of Christ, such as 127 B.C. I’ll have to take a look at how those get described. Then we have entries starting with an l like l854. I had thought that these may indicate a different calendaring system, but it appears not. Perhaps it’s bad OCRing as there are also entries like l8l4. Not sure what to do with those just yet.

In terms of data cleanup, there are dates in the birth_date field of the form d. 1823 which means that it’s actually a death date. There are also dates prefixed with fl. which means they are flourishing dates. These are used when a birth date is unknown but the period in which the creator was active is known. These need to be pulled out and handled separately.
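Pulling those prefixed values out might look something like the sketch below. It is Python for illustration; `route_birth_field` is a hypothetical name, and the exact spacing after d. and fl. in the real data is an assumption, so the pattern tolerates any amount of whitespace.

```python
import re
from typing import Tuple

# "d." marks a death date and "fl." a flourishing date that has ended up
# in the birth_date field. Whitespace after the prefix is assumed optional.
PREFIXED = re.compile(r"^(d|fl)\.\s*(.+)$")


def route_birth_field(raw: str) -> Tuple[str, str]:
    """Decide which kind of date a birth_date value really holds,
    returning (kind, remaining text with the prefix stripped)."""
    match = PREFIXED.match(raw)
    if match is None:
        return ("birth", raw)
    kind = "death" if match.group(1) == "d" else "flourished"
    return (kind, match.group(2))
```

The remaining text still needs to go through the date patterns afterwards, since a value like d. 1823 only tells us which event the year belongs to.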

Of course, I haven’t dealt with the leading and trailing punctuation yet or those that have names mixed in with the dates, so still much work to do in transforming this into a rich graph.