Resource Lists, Semantic Web, RDFa and Editing Stuff

Some of the work I’ve been doing over the past few months has been on a resource lists product that helps lecturers and students make best use of the educational material for their courses.

One of the problems we hoped to address really well was the editing of lists. Historically products that do this have been deemed cumbersome and difficult by academic staff who will often produce lists as simple documents in Word or the like.

We wanted to make an editing interface that really worked for the academic community so they could keep the lists as accurate and current as they wanted.

Chris Clarke, our Programme Manager, and Fiona Grieg, one of our pilot customers, describe the work in a W3C case study. Ivan Herman then picks up on one of the ways we decided to implement editing, using RDFa within the HTML DOM. In the case study Chris describes it like this:

The interface to build or edit lists uses a WYSIWYG metaphor implemented in Javascript operating over RDFa markup, allowing the user to drag and drop resources and edit data quickly, without the need to round trip back to the server on completion of each operation. The user’s actions of moving, adding, grouping or editing resources directly manipulate the RDFa model within the page. When the user has finished editing, they hit a save button which serialises the RDFa model in the page into an RDF/XML model which is submitted back to the server. The server then performs a delta on the incoming model with that in the persistent store. Any changes identified are applied to the store, and the next view of the list will reflect the user’s updates.

This approach has several advantages. First, as Andrew says

One thing I hadn’t used until recently was RDFa. We’ve used it on one of the main admin pages in our new product and it’s made what was initially quite a complex problem much simpler to implement.

The problem that’s made simpler is this – WYSIWYG editing of the page was best done using DOM manipulation techniques, and most easily using existing libraries such as Prototype. But what was being edited isn’t really the visual document, it is the underlying RDF model. Trying to keep a version of the model in a JS array or something in sync with the changes happening in the DOM seemed to be a difficult (and potentially bug-ridden) option.

By using RDFa we can distribute the model through the DOM and have the model updated by virtue of having updated the DOM itself. Andrew describes this process nicely:

Currently using Jeni Tennison’s RDFQuery library to parse an RDF model out of an XHTML+RDFa page we can mix this with our own code and end up with something that allows complex WYSIWYG editing on a reading list. We use RDFQuery to parse an initial model out of the page with JavaScript and then the user can start modifying the page in a WYSIWYG style. They can drag new sections onto the list, drag items from their library of bookmarked resources onto the list and re-order sections and items on the list. All this is done in the browser with just a few AJAX calls behind the scenes to pull in data for newly added items where required. At the end of the process, when the Save button is pressed, we can submit the ‘before’ and ‘after’ models to our back-end logic which builds a Changeset from before and after models and persists this to a data store on the Talis Platform.

Building a Changeset from the two RDF models makes quite a complex problem relatively straightforward. The complexity now just being in the WYSIWYG interface and the dynamic updating of the RDFa in the page as new items are added or re-arranged.

As Andrew describes, the editing starts by extracting a copy of the model, which allows the browser to maintain before and after models. This is useful because, when both are posted to the server, the before model can be used to spot editing conflicts with someone else making a concurrent edit – an improvement on how Chris described it in the case study.
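The before/after idea can be sketched as a set difference. This is a minimal illustration in Python, assuming triples are modelled as plain tuples; build_changeset and has_conflict are hypothetical names, and the output is not the Talis Platform Changeset format.

```python
# Triples are modelled as plain (subject, predicate, object) tuples.
# A sketch of the idea only, not the product's actual server code.

def build_changeset(before, after):
    """Return the triples to remove from, and to add to, the store."""
    before, after = set(before), set(after)
    return {"remove": before - after, "add": after - before}


def has_conflict(before, stored):
    """True if the store no longer matches the model the editor started from."""
    return set(before) != set(stored)


LIST = "http://example.com/lists/foo"
CONTAINS = "http://purl.org/vocab/resourcelist/schema#contains"

before = {(LIST, CONTAINS, "http://example.com/items/bar")}
after = {(LIST, CONTAINS, "http://example.com/items/baz")}

changes = build_changeset(before, after)
```

If has_conflict(before, stored) is true, someone else has saved in the meantime and the changes should not be applied blindly.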

There are some gotchas in this approach though. Firstly, some of the nodes have two-way links:

<http://example.com/lists/foo> <http://purl.org/vocab/resourcelist/schema#contains> <http://example.com/items/bar>
<http://example.com/items/bar> <http://purl.org/vocab/resourcelist/schema#list> <http://example.com/lists/foo>

So that the relationship from the list to the item gets removed when the item is deleted from the DOM we use the @rev attribute. This allows us to put the relationship from the list to the item with the item, rather than with the list.

The second issue is that we use rdf:Seq to maintain the ordering of the lists, so when the order changes in the DOM we have to do a quick traversal of the DOM changing the sequence predicates (_1, _2 etc) to match the new visual order.
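The renumbering step can be sketched in a few lines. This is Python over a plain list rather than the JavaScript DOM traversal the product uses, and sequence_triples is a hypothetical helper name.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def sequence_triples(seq_uri, item_uris):
    """Emit rdf:_1, rdf:_2, ... triples that match the given visual order."""
    return [
        (seq_uri, f"{RDF}_{position}", item)
        for position, item in enumerate(item_uris, start=1)
    ]


# After a drag and drop, the items are read in their new order
# and the membership predicates are regenerated from scratch.
triples = sequence_triples(
    "http://example.com/lists/foo/items",
    ["http://example.com/items/b", "http://example.com/items/a"],
)
```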

Neither of these was a difficult problem to solve :-)

My thanks go out to Jeni Tennison who helped me get the initial prototype of this approach working while we were at Swig back in November.

Exploring OpenLibrary Part Two

This post also appears on the n2 blog.

More than two weeks on from my last look at the OpenLibrary authors data and I’m finally finding some time to look a bit deeper. Last time I finished off thinking about the complete list of distinct dates within the authors file and how to model those.

Where I’ve got to today is tagged as day 2 of OpenLibrary in the n2 subversion.

First off, a correction – foaf:Name should have been foaf:name. Thanks to Leigh for pointing that out. I haven’t fixed it in this tag (I tagged before I realised I’d forgotten it), but next time, honestly.

It’s clear that there is some stuff in the data that simply shouldn’t be there, things that cannot possibly be a birth date such as [from old catalog] and *. and simply ,. When I came across —oOo— I was somewhat dismayed. MARC data, where most of this data has come from, has a long and illustrious history, but one of the mistakes made early on was to put display data into the records in the form of ISBD punctuation. This, combined with the real inflexibility of most ILSs and web-based catalogs, has forced libraries to hack their records with junk like —oOo— to fix display errors. This one comes from Antonio Ignacio Margariti.

In total there are only 6,156 unique birth date values and 4,936 unique death date values. Of course there is some overlap, so there are only 9,566 distinct values to worry about overall.

So what I plan to do is to set up the recognisable patterns in code and discard anything I don’t recognise as a date or date range. Doing that may mean I lose some date information, but I can add that back in later as more patterns get spotted. So far I’ve found several patterns (shown here using regex notation)…

“^[0-9]{1,4}$” – A straightforward number of 4 digits or fewer, with no letters, punctuation or whitespace. These are simple years; last week I popped them in using bio:date. That’s not strictly within the rules of the bio schema, as that really requires a date formatted in accordance with ISO 8601. Ian had already implied his displeasure with my use of bio:date and suggested I use the more relaxed dc elements date. However, on further chatting, what we actually have is a date range within which the event occurred, so we need to show that the event happened somewhere within that range. This can be solved using the W3C Time Ontology, which allows for better description.
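The simple-year pattern can be tried out like this. It is a small Python sketch rather than the PHP in the n2 repository, and parse_simple_year is a hypothetical name.

```python
import re

# The pattern quoted above: one to four digits and nothing else.
SIMPLE_YEAR = re.compile(r"^[0-9]{1,4}$")


def parse_simple_year(value):
    """Return the year as an int, or None if the value is not a bare year."""
    # fullmatch is used so that a trailing newline is rejected as well.
    return int(value) if SIMPLE_YEAR.fullmatch(value) else None
```

Anything that returns None here is either junk or one of the other date forms, so nothing is lost by trying this pattern first.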

I spent some time getting hung up on exactly what is being said by these date assertions on a bio:Birth event. That is, are we saying that the birth took place somewhere within that period, or that the event happened over that period? This may seem a daft question to ask, but as others start modelling events in people’s bios this could easily become indistinguishable. Say I want to model my grandfather’s experience of the second world war. I’d very likely model that as an event occurring over a four year period. So, I feel the need to distinguish between an event happening over a period and an event happening at an unknown time within a period. I thought I was getting too pedantic about this, but Ian assured me I’m not and that the distinction matters.

The model we end up with is like this


@prefix bio: <http://vocab.org/bio/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix mine: <http://example.com/mine/schema#> .
@prefix time: <http://www.w3.org/2006/time#> .

<http://example.com/a/OL149323A>
	foaf:name "Schaller, Heinrich";
	foaf:primaryTopicOf <http://openlibrary.org/a/OL149323A>;
	bio:event <http://example.com/a/OL149323A#birth>;
	a foaf:Person .

<http://example.com/a/OL149323A#birth>
	dc:date <http://example.com/a/OL149323A#birthDate>;
	a bio:Birth .

<http://example.com/names/schallerheinrich>
	mine:name_of <http://example.com/a/OL149323A>;
	a mine:Name .

<http://example.com/dates/gregorian/ad/years/1900>
	time:unitType time:unitYear;
	time:year "1900";
	a time:DateTimeDescription .

<http://example.com/a/OL149323A#birthDate>
	time:inDateTime <http://example.com/dates/gregorian/ad/years/1900>;
	a time:Instant .

The simple year accounts for 731,304 of the 748,291 birth dates and for 13,151 of the 181,696 death dates, about 80% of the dates overall. Following the 80/20 rule almost perfectly, the remaining 20% is going to be painful. It has been suggested I should stop here, but it seems a shame to not have access to the rest if we can dig in, and I can, so…

First of the remaining correct entries are the approximate years, recorded as ca. 1753 or (ca.) 1753 and other variants of that. These all suffer from leading and trailing junk, but I’ll catch the clean ones of these with “^[(]?ca\.[)]? ([0-9]{1,4})$”. The difficulty with these is that you can’t really convert them into a single year or even a date range, as what people consider to be within the “circa” will vary widely in different contexts. So, the interval can be described in the same way as a simple year, but the relationship with the author’s birth is not simply time:inDateTime. I haven’t found a sensible circa predicate, so for now I’ll drop into mine.


@prefix bio: <http://vocab.org/bio/0.1/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix mine: <http://example.com/mine/schema#> .
@prefix time: <http://www.w3.org/2006/time#> .

<http://example.com/a/OL151554A>
	foaf:name "Altdorfer, Albrecht";
	foaf:primaryTopicOf <http://openlibrary.org/a/OL151554A>;
	bio:event <http://example.com/a/OL151554A#birth>;
	bio:event <http://example.com/a/OL151554A#death>;
	a foaf:Person .

<http://example.com/a/OL151554A#birth>
	dc:date <http://example.com/a/OL151554A#birthDate>;
	a bio:Birth .

<http://example.com/a/OL151554A#death>
	dc:date <http://example.com/a/OL151554A#deathDate>;
	a bio:Death .

<http://example.com/names/altdorferalbrecht>
	mine:name_of <http://example.com/a/OL151554A>;
	a mine:Name .

<http://example.com/dates/gregorian/ad/years/1480>
	time:unitType time:unitYear;
	time:year "1480";
	a time:DateTimeDescription .

<http://example.com/a/OL151554A#birthDate>
	mine:circaDateTime <http://example.com/dates/gregorian/ad/years/1480>;
	a time:Instant .
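The circa pattern can be exercised the same way as the simple year. Again this is a Python sketch with a hypothetical function name; it only recognises the clean forms and leaves the leading and trailing junk for later.

```python
import re

# The pattern quoted above: "ca. 1753" or "(ca.) 1753".
CIRCA_YEAR = re.compile(r"^[(]?ca\.[)]? ([0-9]{1,4})$")


def parse_circa_year(value):
    """Return the approximate year as an int, or None if the form is not recognised."""
    match = CIRCA_YEAR.fullmatch(value)
    return int(match.group(1)) if match else None
```

A value that matches here would be linked with mine:circaDateTime rather than time:inDateTime.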

Ok, it’s time to stop there until next time. I have several remaining forms to look at and some issues of data cleanup.

Next time I’ll be looking at parsing out date ranges of a few years, shown in the data as 1103 or 4. These will go in as longer date time descriptions, so no new modelling is needed.

Then we have centuries, 7th cent., again just a broader date time description required I hope. There are some entries for works from before the birth of Christ – 127 B.C.. I’ll have to take a look at how those get described. Then we have entries starting with an l like l854. I had thought that these may indicate a different calendaring system, but it appears not. Perhaps it’s bad OCRing as there are also entries like l8l4. Not sure what to do with those just yet.

In terms of data cleanup, there are dates in the birth_date field of the form d. 1823 which means that it’s actually a death date. There are also dates prefixed with fl. which means they are flourishing dates. These are used when a birth date is unknown but the period in which the creator was active is known. These need to be pulled out and handled separately.
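Pulling those prefixed values out could look something like this. It is a sketch only: classify_birth_date_field is a hypothetical name, and it assumes the prefixes appear exactly as d. and fl. at the start of the value.

```python
# Prefixes that mean the value in birth_date is not really a birth date.
PREFIXES = (("d.", "death"), ("fl.", "flourished"))


def classify_birth_date_field(value):
    """Return (kind, remaining text) for a raw birth_date value."""
    text = value.strip()
    for prefix, kind in PREFIXES:
        if text.startswith(prefix):
            return kind, text[len(prefix):].strip()
    return "birth", text
```

The remaining text can then go through the same year patterns as any other date.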

Of course, I haven’t dealt with the leading and trailing punctuation yet or those that have names mixed in with the dates, so still much work to do in transforming this into a rich graph.

Exploring OpenLibrary Part One

This post also appears on the n2 blog.

I thought it was about time I got around to taking a better look at what might be possible with the OpenLibrary data.

My plan is to try and convert it into meaningful RDF and see what we can find out about things along the way. The project is an own-time project mostly, so progress isn’t likely to be very rapid. Let’s see how it goes. I’ll diary here as stuff gets done.

To save me typing loads of stuff out here, today’s source code is tagged and in the n2 subversion as day 1 of OpenLibrary.

Day one, 3rd October 2008, I downloaded the authors data from OpenLibrary and unzipped it. I’m also downloading the editions data from OpenLibrary, but that’s bigger (1.8GB) so I’m playing with the author data while that comes down the tubes.

The data has been exported by OpenLibrary as JSON, so is pretty easy to work with. I’m going to write some PHP scripts on the command line to mess with it and it looks great for doing that.

Each line of the JSON in the authors file represents a single author, although some authors will have more than one entry. Taking a look at Iain Banks (aka Iain M Banks) we have the following entries:


{"name": "Banks, Iain", "personal_name": "Banks, Iain", "key": "\/a\/OL32312A", "birth_date": "1954", "type": {"key": "\/type\/type"}, "id": 81616}
{"name": "Banks, Iain.", "type": {"key": "\/type\/type"}, "id": 3011389, "key": "\/a\/OL954586A", "personal_name": "Banks, Iain."}
{"type": {"key": "\/type\/type"}, "id": 9897124, "key": "\/a\/OL2623466A", "name": "Iain Banks"}
{"type": {"key": "\/type\/type"}, "id": 9975649, "key": "\/a\/OL2645303A", "name": "Iain Banks         "}
{"type": {"key": "\/type\/type"}, "id": 10565263, "key": "\/a\/OL2774908A", "name": "IAIN M. BANKS"}
{"type": {"key": "\/type\/type"}, "id": 10626661, "key": "\/a\/OL2787336A", "name": "Iain M. Banks"}
{"type": {"key": "\/type\/type"}, "id": 12035518, "key": "\/a\/OL3127859A", "name": "Iain M Banks"}
{"type": {"key": "\/type\/type"}, "id": 12078804, "key": "\/a\/OL3137983A", "name": "Iain M Banks         "}
{"type": {"key": "\/type\/type"}, "id": 12177832, "key": "\/a\/OL3160648A", "name": "IAIN M.BANKS"}

In total the file contains 4,174,245 entries. First job is to get a more manageable set of data to work with. So, I wrote a short script to extract 1 line in every 10 from a file. The resulting sample author data file contains 417,424 entries. This is more manageable for quick testing of what I’m doing.
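The sampling step is about as small as a script gets. A Python sketch of the same idea (the original was a PHP command-line script, and every_nth is a hypothetical name):

```python
def every_nth(lines, n=10):
    """Yield every nth line (the 10th, 20th, ...) from an iterable of lines."""
    for number, line in enumerate(lines, start=1):
        if number % n == 0:
            yield line
```

Wired up to the pipes model it becomes sys.stdout.writelines(every_nth(sys.stdin)). Taking the 10th, 20th, ... line of 4,174,245 gives 417,424 lines, which matches the sample size above.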

So now we can start writing some code to produce some RDF. Given the size of these files, I need to stream the data in and out again in chunks. The easiest format I find for that is turtle, which has the added benefit of being human readable. YMMV. Previously I’ve streamed stuff out using n-triples. That has some great benefits too, like being able to generate different parts of the graph, for the same subject, in different parts of the file and then bring them together using a simple command line sort. It’s also a great format for chunking the resulting data into reasonable size files, as breaking on whole lines doesn’t break the graph, whereas with rdf/xml and turtle it does.

So, I may end up dropping back to n-triples, but for now I’m going to use turtle.

I also like working on the command line and love the unix pipes model, so I’ll be writing the cli (command line) tools to read from STDIN and write to STDOUT so I can mess with the data using grep, sed, awk, sort, uniq and so on.

First things first, let’s find out what’s really in the authors data. Reading the json line by line and converting each line into an associative array is simple in PHP, so let’s do that, keep track of all the keys we find in the arrays and recurse into the nested arrays to look at them – then dump the result out. The arrays contain this set of keys:

alternate_names
alternate_names
alternate_names\1
alternate_names\2
alternate_names\3
bio
birth_date
comment
date
death_date
entity_type
fuller_name
id
key
location
name
numeration
personal_name
photograph
title
type
type\key
website
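The key-collecting walk described above can be sketched like this. It is Python rather than the PHP I used, the function names are hypothetical, and the backslash separator and the way list positions are numbered are assumptions made to resemble the listing.

```python
import json


def collect_keys(value, prefix="", found=None):
    """Recursively record every key path in a decoded JSON value."""
    found = set() if found is None else found
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return found
    for key, child in items:
        path = f"{prefix}\\{key}" if prefix else str(key)
        found.add(path)
        collect_keys(child, path, found)
    return found


def keys_in_file(lines):
    """Collect the distinct key paths across a file of one JSON object per line."""
    found = set()
    for line in lines:
        if line.strip():
            collect_keys(json.loads(line), found=found)
    return sorted(found)
```

Because it takes any iterable of lines, keys_in_file(sys.stdin) fits straight into a pipeline.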

So, they have names, birth dates, death dates, alternate names and a few other bits and pieces. And they have a ‘key’ which turns out to be the resource part of the OpenLibrary url. That means we can link back into OpenLibrary nice and easy. Going back to our previous Iain Banks examples, we want to create something like this for each one:


@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix bio: <http://vocab.org/bio/0.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://example.com/a/OL32312A>
	foaf:Name "Banks, Iain";
	foaf:primaryTopicOf <http://openlibrary.org/a/OL32312A>;
	bio:event <http://example.com/a/OL32312A#birth>;
	a foaf:Person .

<http://example.com/a/OL32312A#birth>
	bio:date "1954";
	a bio:Birth .

This gives us a foaf:Person for the author and tracks his birth date using a bio:Birth event. While tracking the birth as a separate entity may seem odd it gives the opportunity to say things about the birth itself. We’ll model death dates the same way, for the same reason. I’ve written some basic code to generate foaf from the OpenLibrary authors.
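As a rough idea of what that generation looks like, here is a Python sketch (the tagged code is PHP, and author_to_turtle and turtle_literal are hypothetical names). The predicates are copied from the example above, with foaf:name in lower case. It assumes the @prefix lines are written once at the top of the stream and only handles the name and birth date.

```python
import json


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def author_to_turtle(line, base="http://example.com"):
    """Turn one line of the authors file into Turtle for the person and their birth."""
    author = json.loads(line)
    key = author["key"]  # e.g. "/a/OL32312A"
    has_birth = "birth_date" in author
    lines = [
        f"<{base}{key}>",
        f"\tfoaf:name {turtle_literal(author['name'])};",
        f"\tfoaf:primaryTopicOf <http://openlibrary.org{key}>;",
    ]
    if has_birth:
        lines.append(f"\tbio:event <{base}{key}#birth>;")
    lines.append("\ta foaf:Person .")
    if has_birth:
        lines += [
            "",
            f"<{base}{key}#birth>",
            f"\tbio:date {turtle_literal(author['birth_date'])};",
            "\ta bio:Birth .",
        ]
    return "\n".join(lines) + "\n"
```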

Linking back to the OpenLibrary url has been done here using foaf:primaryTopicOf. I didn’t use owl:sameAs because the url at OpenLibrary is that of a web page, whereas the uri here (http://example.com/a/OL32312A) represents a person. Clearly a person is not the same as a web page that contains information about them.

The only thing worrying me is that the uris we’re using are constructed from OpenLibrary’s keys. This makes matching them up with other data sources hard. Matching with other data sources requires a natural key, but there’s not enough data in these author entries to create one. The best I can do is to create a natural key that will enable people to discover the group of authors that share a name.


@prefix mine: <http://example.com/mine/schema#> .
<http://example.com/names/banksiain>
	mine:name_of <http://example.com/a/OL32312A>;
	a mine:Name .

These uris will enable me to find authors that share the same name easily, either because they do share the same name or because they’re duplicates. The natural key is simply the author’s name with any casing, whitespace or punctuation stripped out. That might need to evolve as I start looking at the names in more detail later.
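The natural key is simple enough to show in full. A Python sketch, with name_key as a hypothetical name:

```python
import re


def name_key(name):
    """Lower-case the name and strip out whitespace and punctuation."""
    return re.sub(r"[\W_]+", "", name.lower())
```

With this, "IAIN M.BANKS" and "Iain M Banks" with its trailing spaces end up under the same key, while "Banks, Iain" and "Iain Banks" do not, which is one of the ways the key might need to evolve.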

Next step is to look in more detail at the dates in here. We have some simple cases of trailing whitespace or trailing punctuation, but also some more interesting cases of approximate dates or possible ranges – these occur for historical authors mostly. The complete list of distinct dates within the authors file is in svn. If you know anything about dates, feel free to throw me some free advice on what to do with them…

Pages, Screens, MVC and not getting it…

MVC Web

About two years ago my colleague Ian Davis and I were talking about different approaches to building web applications. I was advocating that we use ASP.Net; the framework it provides for nesting controls within controls (server controls and user controls) is very powerful. I was describing it as a component-centric approach where we could build pages rapidly by plugging controls together.

Ian was describing a page-centric approach, and advocating XSLT (within PHP) as one of several possible solutions. He was suggesting that his approach was both simpler and that we could be more productive using it. Having spent two years working with ASP.Net I was not at all convinced.

Two years on and I think I finally get what he was saying. What can I say, I’m a slow learner. The difference in our opinions was based on two different underlying mental models.

The ASP.Net mental model is that of application software. It tries to bring the way we build windows software to the web. ASP.Net has much of the same feature set that we have if building a Windows Forms app; it’s no coincidence that the two models are now called Windows Forms and Web Forms. In this model we think about the forms, or screens, that we have to build and consider the data on which they act as secondary – a parameter to the screen to tell it which user, or expense claim or whatever to load for editing.

In this mental model we end up focussing on the verbs of the problem. We end up with pages called ‘edit.aspx’, ‘createFoo.aspx’ and ‘view.aspx’; where view is in the verb form, not the noun. ASP.Net is not unique in this, the same model exists in JSP and many people use PHP this way – it’s not specific to any technology, it’s a style of thinking and writing.

Ian’s mental model is different. Ian’s mental model is that of the web. The term URL means Uniform _Resource_ Locator. It doesn’t say Uniform _Function_ Locator. A URL is meant to refer to a noun, not a verb. This may seem like an esoteric or pedantic distinction to be making, but it affects the way we think about the structure of our applications and changing the way we think about solving a problem is always interesting.

If we think about URLs as being only nouns, no verbs, then we end up with a URL for every important thing in our site. Those URLs can then be bookmarked and linked easily. We can change code behind the scenes without changing the URLs as the URLs refer to objects that don’t change rather than functions that do.

So if URLs refer to nouns, how do we build any kind of functionality? That’s tied up in something else that Ian was saying a long time ago – when he asked me “What’s the difference between a website and a web API?”. My mental model, building web applications the way we build windows apps, was leading me to consider the UI and the API as different things. Ian was seeing them as one and the same. When I was using URIs to refer to verbs I found this hard to conceptualise, but thinking about URIs as nouns it becomes clearer – that’s what REST is all about. URIs are nouns and then the HTTP verbs give you your functionality.

That realisation and others from working on Linked Open Data means I now think they’re one and the same too.

At Talis we’ve done a few projects this way. Most notably our platform, but also Project Cenote some time ago and a few internal research projects more recently. The clearest of these so far is the product I’m working on right now to support reading lists (read course reserves in the US) in Higher Education. We’re currently in pilot with University of Plymouth, here’s one of their lists on Financial Accounting and Reporting. The app is built from the ground up as Linked Data and does all the usual content negotiation goodness. We still have work to do on putting in RDFa or micro-formats and cross references between the html and rdf views – so it’s not totally linked data yet.

What I’ve found is that this approach to building web apps beats anything else I’ve worked with (in roughly chronological order – Lotus Domino, Netscape Application Server, PHP3, Vignette StoryServer, ASP, PHP4, ASP.Net, JSP, PHP5).

The model is inherently object-oriented, with every object (at least those of meaning to the outside world) having a URI and every object responding to the standard HTTP verbs, GET, PUT, POST, DELETE. This is object-orientation at the level of the web, not at the level of a server-side language. That’s a very different thing to what JSP does, where internally the server-side code may be object-oriented, but the URIs refer to verbs, so look more procedural or perhaps functional.

It’s also inherently MVC, with GET requests asking for a view (GET should never cause a change on the server) and PUT, POST and DELETE being handled by controllers. With MVC though, we typically think of that as happening in lots of classes in a single container, like ASP.Net or Tomcat or something like that. This comes from two factors in my experience. Firstly the friction between RDBMS models and object models and secondly the relatively poor performance of most databases. These two things combine to drive people to draw the model into objects alongside the views and controllers.

The result of this is usually that it’s not clear how update behaviour should be divided between the model and the controllers and how display behaviour should be divided between the model and the views. As a result the whole thing becomes complex and confused. That doesn’t even start to take into account the need for some kind of persistence layer that handles the necessary translation between object model and storage.

We’ve not done that. We’ve left the model in a store, in this case a Talis Platform store, but it could be any triple store. That’s what the diagram at the top shows, the model staying separate from views and controllers… and having no behaviour.

A simple example may help, how about tagging something within an application. We have the thing we’re tagging, which we’ll call http://example.com/resources/foo and the collection of tags attached to foo which we’ll call http://example.com/resources/foo/tags. A http GET asking for /resources/foo would be routed to some view code which reads the model and renders a page showing information about foo, and would show the tags too of course. It would also render a form for adding a tag which simply posts the new tag to /resources/foo/tags.

The POST gets routed to some controller logic which is responsible for updating the model.

The UI response to the POST is to show /resources/foo again, which will now have the additional tag. Most web development approaches would simply return the HTML in response to the POST, but we can keep the controller code completely separate from the view code by responding to a successful POST with a 303 See Other with a location of /resources/foo which will then re-display with the new tag added.

“The response to the request can be found under a different URI and SHOULD be retrieved using a GET method on that resource. This method exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource.” rfc2616
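The tagging round trip can be sketched without any framework at all. This is Python rather than our PHP, the handler names are hypothetical, and a dictionary stands in for the model that really lives in the store.

```python
# Stand-in for the model held in the store: resource path -> list of tags.
TAGS = {"/resources/foo": ["rdf"]}


def handle_post_tag(path, form):
    """Controller: add the posted tag, then redirect to the resource's view."""
    resource = path[: -len("/tags")]  # "/resources/foo/tags" -> "/resources/foo"
    TAGS.setdefault(resource, []).append(form["tag"])
    return 303, {"Location": resource}, b""


def handle_get(path):
    """View: render the resource with its tags; it never changes the model."""
    tags = ", ".join(TAGS.get(path, []))
    body = f"<h1>{path}</h1><p>Tags: {tags}</p>"
    return 200, {"Content-Type": "text/html"}, body.encode()


status, headers, _ = handle_post_tag("/resources/foo/tags", {"tag": "semweb"})
```

The controller returns no HTML of its own. The browser follows the Location header with a GET, and the view code renders the page with the new tag.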

This model is working extremely well for us in keeping the code short and very clear.

We route requests to code through a dispatcher. An .htaccess file in Apache sends all requests (except those for which a file exists) to the dispatcher, which uses a set of patterns to match the URI to a view or a controller, depending on whether the request is a GET or a POST.
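A dispatcher along those lines fits in a handful of lines. This sketch is in Python rather than PHP, and the route table, handler names and patterns are made up for illustration.

```python
import re

# (HTTP method, URI pattern, name of the view or controller to call)
ROUTES = [
    ("GET", re.compile(r"^/resources/(?P<id>[^/]+)$"), "view_resource"),
    ("POST", re.compile(r"^/resources/(?P<id>[^/]+)/tags$"), "add_tag_controller"),
]


def dispatch(method, path):
    """Return the handler name and captured URI parts for a request."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if route_method == method and match:
            return handler, match.groupdict()
    return None, {}
```

A request that matches nothing comes back as (None, {}), which the caller would turn into a 404 or 405.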

Ian has started formalising this approach into a framework he’s called Paget.

LDOW2008


Linked Data on the Web 2008, a workshop attached to WWW2008, was today’s main attraction for about 100 of us here in Beijing.

Sir Tim opened the workshop again this year and was popping in at regular intervals throughout the day. His opening included the great soundbite: "Linked Data is the Semantic Web done right, and the Web done right".

What he means by that is that the linked open data project is the closest we have to his original vision of a web of knowledge.

A number of new RDF browsers were on show this year, with Sir Tim showing off version 2 of The Tabulator, which provides read/write functionality via HTTP POST or WebDAV. Visual browsers, including relation browsers, popped up a few times, including a great session from Uldis Bojars on Browsing Linked Data with Fenfire (standing in for Tuukka Hastrup), which looks like a great tool for playing with small amounts of semweb data. Like The Tabulator v2, Fenfire allows editing of the data – must take a look at that.

The Linked Data map continues to grow, but there were no new announcements today, although I know a few people have stuff lined up but just aren’t ready to ‘go big’ with it just yet. This map featured in every set of slides during the day by my reckoning. In ten years’ time people will be showing that on slides just after the photo of Sir Tim’s first server at CERN!

Kingsley Idehen kept the session real by keeping up the "give us a demo" riff throughout the day – a constant reminder that most of this stuff is practical, real stuff now. Although, being a fairly academic crowd there was plenty of discussion of theory too.

The key theme that I took away from the day was that disambiguation is (one of) the next big problem(s) the Semantic Web has to tackle. Without this having been discussed ahead of time, several people presented discussions or solutions on the problem. I presented on the use of natural keys in URIs as one aspect (a paper written by myself, Nadeem Shabir and Danny Ayers); Humboldt was suggested as a possible candidate by Kingsley Idehen and others, and Afraz Jaffri spoke explicitly about URI Disambiguation in the Context of Linked Data. Other aspects of disambiguation were presented by Alexandre Passant in Meaning Of A Tag: A collaborative approach to bridge the gap between tagging and Linked Data.

Another great sign of this all getting real was the warm reception that Paul’s talk on Open Data Licensing got. In the same way that the open-source community matured and embraced licensing, and the way that the blogosphere, photo and video sharing sites matured and embraced Creative Commons, we need the Linked Data community to mature, and one aspect of that is to embrace licensing – clear statements of the terms under which all of this data is published.

The day’s sessions ended on a great discussion of the issues of distributed conversation, statements spread across the web, when we have combined the URI as identifier and the URI as locator of the content. That is, how do you ask http://example.com what it knows about http://foo.com. Or, more importantly, how do you know that you should ask http://example.com in the first place. Given that we want to be able to let anyone say anything about anything else, and that we want to be able to find everything that has been said about something, this is also going to be a key focus.

The need for disambiguation and the need for discovery of statements overlap in what Paolo Bouquet presented – An Entity Name System for Linking Semantic Web Data. Paolo presents an attempt to model a (philosophically) DNS like system, ENS, to address these problems. Controversially he suggests everyone use identifiers within their own internet domains to make statements, then use ENS based identifiers as the single unique reference off which everyone’s URIs then hang. Now that got the debate going!

More of my photos are tagged LDOW2008 on Flickr.

Strong vs Weak

My colleague Ian Davis pointed at a great post by Nelson Minar.

The deeper problem with SOAP is strong typing. WSDL accomplishes its magic via XML Schema and strongly typed messages. But strong typing is a bad choice for loosely coupled distributed systems. The moment you need to change anything, the type signature changes and all the clients that were built to your earlier protocol spec break. And I don’t just mean major semantic changes break things, but cosmetic things like accepting a 64 bit int where you used to only accept 32 bit ints, or making a parameter optional. SOAP, in practice, is incredibly brittle. If you’re building a web service for the world to use, you need to make it flexible and loose and a bit sloppy. Strong typing is the wrong choice.

This difference in capability and philosophy between weak and strong typing is still something I’m getting a handle on. On the one hand I like the flexibility of weakly typed languages like Smalltalk and more recently Ruby, but on the other hand strong typing gives us great gifts like Intellisense and compile-time type checking (which I still find useful when I’m using someone else’s API).

I’ve been watching Teqlo for a while (haven’t had my preview key yet though) and it seems they also appreciate strong typing.

The questions in my mind are:

Do Teqlo know something I don’t about the trends for web services and typing?

Could Teqlo do what they’re trying to without strong typing?

If web services go weakly typed will Teqlo have a business?

Teqlo

I’ve been watching Teqlo for a little while (since before they renamed from Abgenial Systems) and they look very interesting. They’ve been pretty coy about what they’re doing other than letting us know they’re broadly about letting you combine web services into applications without doing any programming. Anyways, they finally got a video out showing precisely… nothing. Although the UI looks like it’ll be quite pretty…

I’ve signed up for the preview, so I hope dissing the video doesn’t stop them sending me a login…

Anti-Cache

I got a call from a friend the other day with a little bugette on his secure site. Bits of it kept getting cached by the browser – he’d been through the problem with the team and it turned out that several chunks of code generating pages that should not be cached by the browser were forgetting to set the necessary headers.

We started talking and I suggested that actually the page code was the wrong place and that we should get it into one place – in the webserver.

To do this we slapped together a quick NSAPI SAF to add the headers we wanted – in this case Expires, Cache-Control and Pragma.
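The same "set it in one place" idea can be shown outside NSAPI. This is a sketch of a Python WSGI wrapper, not the NSAPI SAF we wrote; no_cache is a hypothetical name and the header values are assumptions, since the ones we actually used are not listed here.

```python
# Assumed values; only the three header names come from the text above.
NO_CACHE_HEADERS = [
    ("Expires", "0"),
    ("Cache-Control", "no-store, no-cache, must-revalidate"),
    ("Pragma", "no-cache"),
]


def no_cache(app):
    """Wrap a WSGI app so every response carries the anti-cache headers."""
    blocked = {name.lower() for name, _ in NO_CACHE_HEADERS}

    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            # Drop whatever the page code set for these headers, then add ours.
            kept = [(k, v) for k, v in headers if k.lower() not in blocked]
            return start_response(status, kept + NO_CACHE_HEADERS, exc_info)

        return app(environ, start)

    return wrapped
```

The page code no longer has to remember anything. Whatever it sets for these three headers is overridden at the one place every response passes through.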
