What next?

I’m leaving Talis.

For the past seven years I have had the great fortune to learn a huge amount from awesome people. That has put me in the position of having some great conversations about what I’ll be doing next, and those conversations are exciting. More on that in a later post. First, though: how can someone be happy to be leaving a great company?

Back in late 2004 I joined a small library software vendor with some interesting challenges, Talis. Since then we have become one of the best known Linked Data and Semantic Web brands in the world. On that journey I have learnt so much. I’ve learnt everything from an obscure late 1960s data format (MARC) to Big Data conversions using Hadoop. The technology has been the least of it though.

I’ve been rewarded with the opportunity to hear some of the smartest people in the world speak in some amazing places. I’ve pair-programmed with Ian Davis and had breakfast with Tim Berners-Lee; I’ve seen the Rockies in Banff and walked the Great Wall of China. As a result of our brand and the work we’ve done, I’ve been invited to help write the first license for open data, to train government departments on how to publish their data, and to talk about dinosaurs with Tom Scott at the BBC.

Talis has always been about the people. People in Talis (Talisians); people outside we’ve worked with and bounced ideas off; customers who have allowed us to help with exciting projects. I have made some great friends and been taught some humbling lessons.

Amongst the sharpest highlights has been an enormously rewarding day job: at the start, re-imagining Talis Base and then Talis Prism; later, seeding an education-focussed business; and recently, building an expert, international consultancy.

I joined Talis expecting to stay for a few years and found the journey so rewarding it has kept me for so much longer. It’s now time for my journey and Talis to diverge as I think about doing something different.

I still have a couple of consulting engagements to finalise, so if you’re one of those then please don’t panic; we’ll be talking soon.

Getting over-excited about Dinosaurs…

I had the great pleasure, a few weeks ago, of working with Tom Scott and Michael Smethurst at the BBC on extensions to the Wildlife Ontology that sits behind Wildlife Finder.

In case you hadn’t spotted it (and if you’re reading this I can’t believe you haven’t), Wildlife Finder provides its information in both HTML and RDF — Linked Data, giving a machine-readable version of the documents for those who want to extend or build on top of it. Readers of this blog will have seen Wildlife Finder showcased in many, many Linked Data presentations.

The initial data modelling work was a joint venture between Tom Scott of the BBC and Leigh Dodds of Talis, and they built an ontology that is simple, elegant and extensible. So, when I got a call asking if I could help them add dinosaurs into the mix I was chuffed — getting paid to talk about dinosaurs!

Like most children, and we’re all children really, I got over-excited and rushed up to London to find out more. Tom and I spent some time working through changes and he, being far more knowledgeable than I on these matters, let me down gently.

Dinosaurs, of course, are no different to other animals in Wildlife Finder — other than being dead for a while longer…

This realisation made me feel a little below average in the biology department I can tell you. It’s one of those things you stumble across that is so obvious once someone says it to you and yet may well not have occurred to you without a lot of thought.


Introducing the Web of Data

** This post originally appeared on Talis’ Platform Consulting Blog **

So, the blog is fairly new, but we’ve been here a while. Those of you who know us already may know that Talis is more than 40 years old!

During that time the company has seen many changes in the technology landscape and has been at the forefront of many of them.

Linked Data is not so different: we’ve been doing Linked Data and Semantic Web work for several years now, and we think we’ve learned some lessons along the way.

If you’ve been to one of our open days, or paid really close attention to our branding, you’ll have noticed the strapline shared innovation™. We like to share what we’re doing and have been a little lax at talking about our consulting work here — expect that to change. 🙂

In the meantime I wanted to point to something we’ve been sharing for a while: course materials for learning about Linked Data. We originally designed this course for government departments working with data.gov.uk, refined it based on our experience there, and went on to deliver it to many teams throughout the BBC.

It’s now been delivered dozens of times, both to interested groups and inside companies whose teams had no previous knowledge but wanted to get into this technology fast.

In the spirit of sharing, the materials are freely available on the web and licensed under the Creative Commons Attribution License (CC-By).

Take a look and let us know what you think:

http://bit.ly/intro-to-web-of-data

We're hiring…

Fancy a job building great web apps? Interested in being an early part of publishing large amounts of data on the semantic web? Want to help build fantastically useful search interfaces to be used by millions of people? We’re hiring.

We’re looking for a Web Application Technical Lead who knows how to build great web interfaces and wants to get into the next wave of the web: Linked Data and the Semantic Web.

The role is to lead the development of Talis Prism, a flagship product for us and for our customers. Those customers are the biggest public and academic libraries in the UK, so Prism gets used by millions of people all over the country every day.

The job spec (pdf) gives you more detail, but one of the things we ask is that you take a pop at answering any two from the following three questions.

  1. Ensuring web applications work effectively across different browsers is hard. Explain how you would go about ensuring a web application functions correctly with Yahoo’s list of A-grade browsers, covering both development and testing approaches.
  2. URIs play a very significant role in the way a site appears on the web; WordPress blogs, for example, have a variety of URI schemes they can use. HttpRange-14 adds further implications for the use of #-based URI schemes. Outline a URI scheme for a car dealership website and explain the trade-offs made.
  3. If you were asked to write a book based on your technical expertise, what would the title be and what chapters would it contain?

Now, because I’m really friendly (and because it’s my blog), I’ll give you some pointers on what we might be looking for.

With question 1, you’ve got to recognise that Prism is a SaaS product with a frequent release cycle, currently releasing to the live service once a month. That means any answer that talks about specs, manual test plans or requirements documents isn’t going to get you very far. Think about what you’d need to do if we wanted to do continuous deployment – from checkin to release in less than 30 minutes, say.

On question 2 we’ll be looking for your understanding of how HTTP URIs work and how different choices interact with browser caching, proxy servers and server-side code. If you don’t know what HttpRange-14 is then read the draft TAG finding on dereferencing HTTP URIs, and take a look at How to Publish Linked Data on the Web.
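To make the hash/slash trade-off concrete, here is a hedged sketch (hypothetical URIs, and just one of many reasonable schemes rather than a model answer): a slash-based scheme in which /id/cars/123 identifies the car itself and /doc/cars/123 the document about it, served by a tiny Rack app.

require 'rubygems'
require 'rack'
require 'thin'

# Hypothetical slash-based URI scheme for the car dealership question:
#   /id/cars/123  identifies the car itself (a non-information resource)
#   /doc/cars/123 is the document describing it
# Per httpRange-14, a GET on the car's own URI answers 303 See Other,
# pointing at the document about the car.
dealership = lambda do |env|
  case env['PATH_INFO']
  when %r{\A/id/cars/(\w+)\z}
    [303, { 'Location' => "/doc/cars/#{$1}" }, []]
  when %r{\A/doc/cars/(\w+)\z}
    [200, { 'Content-Type' => 'text/html' },
     ["<html><body>Details of car #{$1}</body></html>"]]
  else
    [404, { 'Content-Type' => 'text/plain' }, ['Not Found']]
  end
end

Rack::Handler::Thin.run(dealership, :Port => 4000)

A #-based scheme (say /cars#123) avoids the 303 round trip entirely, because clients strip the fragment before making the request; the trade-off is that the single /cars document is what gets fetched and cached, however many cars it describes.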

Question 3, if it cropped up in a book on interviews and job applications, would be answered as “an ideal opportunity to re-present the information on your CV”. That’s because most people who interview haven’t really read your CV, so you have to say things several times. We will have read your CV; in fact we’ll have gone through it with a fine-tooth comb, checking all the dates and cross-referencing the technologies listed. We’ll have checked out all the sites and companies you list – even if you don’t give us links to them. We like to know who we’re interviewing, so we’ll have googled you and looked you up on Facebook, LinkedIn, Twitter and anywhere else we think you might hang out. Please don’t feel stressed about that; we’re not going to be upset if there’s a photo of you drunk at a party or if you once tweeted the F word. So there’s no need for the book you’d write to be a game of buzzword bingo – we’re just curious about what excites and motivates you.

All in all though, we’re looking for great people to come and help us do great stuff. Get in touch!

Multi-Tenant Configuration Schema

Are you writing multi-tenant software? Are you using RDF at all? Do you want to keep track of your tenants?

You might want to comment on the first draft of the new Multi-Tenant Configuration Schema.

This schema attempts to describe a simple set of concepts and relationships about tenants within a multi-tenant software system. It avoids anything that would constitute application configuration, but will happily co-exist with classes and properties to do that. The documentation is sparse currently, awaiting questions and comment so that I can expand on areas that require further explanation. Comment here, or email me.
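As a purely illustrative example of the kind of data the schema is aimed at (the namespace, class and property names below are invented for this sketch, not taken from the draft), using the ruby-rdf gem:

require 'rubygems'
require 'rdf'

# Hypothetical vocabulary for illustration only; see the draft schema
# for the real classes and properties.
mt = RDF::Vocabulary.new('http://example.com/schemas/multi-tenant#')

graph  = RDF::Graph.new
tenant = RDF::URI('http://example.com/tenants/acme')

graph << [tenant, RDF.type, mt[:Tenant]]        # acme is a tenant...
graph << [tenant, mt[:shortName], 'acme']       # ...with a short name
graph << [tenant, mt[:accountStatus], 'active'] # ...and an account status

puts graph.dump(:ntriples)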

Ruby Mock Web Server

I spent the afternoon today working with Sarndeep, our very smart automated test guy. He’s been working on extending what we can do with rspec to cover testing of some more interesting things.

Last week he and Elliot put together a great set of tests using MailTrap to confirm that we’re sending the right mails to the right addresses under the right conditions. Nice tests to have for a web app that generates email in a few cases.

This afternoon we were working on a mock web server. We use a lot of RESTful services in what we’re doing, and being able to test our app’s handling of error conditions is important. We’ve had a static web server set up for a while, with particular requests and responses configured in it, but we’ve never really liked it: the responses are all separate from the tests, and the server is another Apache vhost that has to be set up when you first check out the app.

So we’d decided a while ago that we wanted a little Ruby-based web server we could control from within the rspec tests, and that’s what we built a first cut of this afternoon.

require File.expand_path(File.dirname(__FILE__) + "/../Helper")
require 'rubygems'
require 'rack'
require 'thin'

# A mock web server implementing the Rack interface so it can be hosted
# by Thin and controlled from within rspec tests.
class MockServer
  def initialize
    @expectations = []
  end

  # Register an expected request and its canned response. The env is a
  # hash in Rack Environment format containing only the entries you care
  # about; the response is a standard Rack [status, headers, body] triple.
  def register(env, response)
    @expectations << [env, response]
  end

  def clear
    @expectations = []
  end

  def call(env)
    @expectations.each_with_index do |(expected_env, response), index|
      # An expectation matches if every entry it specifies is present,
      # with the same value, in the incoming request's env.
      if expected_env.all? { |key, value| env[key] == value }
        @expectations.delete_at(index) # each expectation is used only once
        return response
      end
    end
    # No expectation matched; return a valid Rack response rather than nil
    # so the failure is intelligible.
    [500, { 'Content-Type' => 'text/plain' }, ['MockServer: unexpected request']]
  end
end

mock_server = MockServer.new
mock_server.register({ 'REQUEST_METHOD' => 'GET' },
                     [200, { 'Content-Type' => 'text/plain', 'Content-Length' => '11' }, ['Hello World']])
mock_server.register({ 'REQUEST_METHOD' => 'GET' },
                     [200, { 'Content-Type' => 'text/plain', 'Content-Length' => '11' }, ['Hello Again']])
Rack::Handler::Thin.run(mock_server, :Port => 4000)

The MockServer implements the Rack interface so that it can be hosted in the Thin web server from inside the rspec tests. Expectations are registered with the MockServer; the first parameter is simply a hashtable in the same format as the Rack Environment. You only specify the entries that you care about; any that you don’t specify are not compared with the request. Expectations don’t have to occur in order (except where the environment you give is ambiguous, in which case they match first in, first matched).
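To give a feel for how this is driven from a test, here’s a rough, hypothetical sketch (not one of our real specs): the mock server runs once in a background thread, and expectations are registered per example.

require 'net/http'
require 'uri'

describe "client error handling" do
  before(:all) do
    @mock_server = MockServer.new
    # Thin blocks the thread it runs in, so start it in the background.
    Thread.new { Rack::Handler::Thin.run(@mock_server, :Port => 4000) }
    sleep 1 # crude wait for Thin to start accepting connections
  end

  before(:each) { @mock_server.clear }

  it "copes with the remote service being unavailable" do
    @mock_server.register({ 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/things' },
                          [503, { 'Content-Type' => 'text/plain' }, ['Service Unavailable']])
    response = Net::HTTP.get_response(URI.parse('http://localhost:4000/things'))
    response.code.should == '503'
  end
end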

As a first venture into writing more in Ruby than an rspec test I have to say I found it pretty sweet – there was only one issue with getting at array indices that tripped me up, but Ross helped me out with that and it was pretty quickly sorted.

Plans for this include putting in a verify() and making it thread safe so that multiple requests can come in parallel. Any other suggestions (including improvements on my non-idiomatic code) very gratefully received.

Resource Lists, Semantic Web, RDFa and Editing Stuff

Some of the work I’ve been doing over the past few months has been on a resource lists product that helps lecturers and students make best use of the educational material for their courses.

One of the problems we hoped to address really well was the editing of lists. Historically, products that do this have been deemed cumbersome and difficult by academic staff, who will often produce lists as simple documents in Word or the like.

We wanted to make an editing interface that really worked for the academic community so they could keep the lists as accurate and current as they wanted.

Chris Clarke, our Programme Manager, and Fiona Grieg, one of our pilot customers, describe the work in a W3C case study. Ivan Herman then picks up on one of the ways we decided to implement editing, using RDFa within the HTML DOM. In the case study Chris describes it like this:

The interface to build or edit lists uses a WYSIWYG metaphor implemented in Javascript operating over RDFa markup, allowing the user to drag and drop resources and edit data quickly, without the need to round trip back to the server on completion of each operation. The user’s actions of moving, adding, grouping or editing resources directly manipulate the RDFa model within the page. When the user has finished editing, they hit a save button which serialises the RDFa model in the page into an RDF/XML model which is submitted back to the server. The server then performs a delta on the incoming model with that in the persistent store. Any changes identified are applied to the store, and the next view of the list will reflect the user’s updates.

This approach has several advantages. First, as Andrew says:

One thing I hadn’t used until recently was RDFa. We’ve used it on one of the main admin pages in our new product and it’s made what was initially quite a complex problem much simpler to implement.

The problem that’s made simpler is this – WYSIWYG editing of the page was best done using DOM manipulation techniques, most easily with existing libraries such as Prototype. But what was being edited isn’t really the visual document; it’s the underlying RDF model. Trying to keep a version of the model in a JS array or the like in sync with the changes happening in the DOM seemed a difficult (and potentially bug-ridden) option.

By using RDFa we can distribute the model through the DOM and have the model updated by virtue of having updated the DOM itself. Andrew describes this process nicely:

Currently using Jeni Tennison’s RDFQuery library to parse an RDF model out of an XHTML+RDFa page we can mix this with our own code and end up with something that allows complex WYSIWYG editing on a reading list. We use RDFQuery to parse an initial model out of the page with JavaScript and then the user can start modifying the page in a WYSIWYG style. They can drag new sections onto the list, drag items from their library of bookmarked resources onto the list and re-order sections and items on the list. All this is done in the browser with just a few AJAX calls behind the scenes to pull in data for newly added items where required. At the end of the process, when the Save button is pressed, we can submit the ‘before’ and ‘after’ models to our back-end logic which builds a Changeset from before and after models and persists this to a data store on the Talis Platform.

Building a Changeset from the two RDF models makes quite a complex problem relatively straightforward. The complexity now lies just in the WYSIWYG interface and in dynamically updating the RDFa in the page as new items are added or re-arranged.
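As a minimal sketch of that delta (assuming the before and after models arrive as plain lists of subject/predicate/object triples; the real code goes on to build a Talis Platform Changeset from the result):

# Given before and after models as arrays of [subject, predicate, object]
# triples, the changeset is just the two set differences.
def changeset(before, after)
  {
    :removals  => before - after, # triples the edit deleted
    :additions => after - before  # triples the edit introduced
  }
end

list     = 'http://example.com/lists/foo'
contains = 'http://purl.org/vocab/resourcelist/schema#contains'

before = [[list, contains, 'http://example.com/items/bar']]
after  = [[list, contains, 'http://example.com/items/baz']]

changeset(before, after)
# => { :removals  => [[list, contains, '.../items/bar']],
#      :additions => [[list, contains, '.../items/baz']] }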

As Andrew describes, the editing starts by extracting a copy of the model. This allows the browser to maintain before and after models, which is useful: when the two get posted to the server, the before model can be used to spot whether someone else has made a conflicting concurrent edit – an improvement on how Chris described it in the case study.

There are some gotchas in this approach though. Firstly, some of the nodes have two-way links:

<http://example.com/lists/foo> <http://purl.org/vocab/resourcelist/schema#contains> <http://example.com/items/bar> .
<http://example.com/items/bar> <http://purl.org/vocab/resourcelist/schema#list> <http://example.com/lists/foo> .

So that the relationship from the list to the item gets removed when the item is deleted from the DOM, we use RDFa’s @rev attribute. This allows us to put the relationship from the list to the item within the markup of the item, rather than with the list.

The second issue is that we use rdf:Seq to maintain the ordering of the lists, so when the order changes in the DOM we have to do a quick traversal of the DOM, changing the sequence predicates (rdf:_1, rdf:_2 and so on) to match the new visual order.
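As an illustration of that renumbering step (in the product this happens in JavaScript over the RDFa in the DOM; plain Ruby is used here only to show the idea, and the URIs are illustrative):

RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'

# Rebuild the rdf:Seq membership triples (rdf:_1, rdf:_2, ...) for a
# list so they match the new visual order of its items.
def renumber(seq_subject, items_in_visual_order)
  items_in_visual_order.each_with_index.map do |item, i|
    [seq_subject, "#{RDF_NS}_#{i + 1}", item]
  end
end

renumber('http://example.com/lists/foo',
         ['http://example.com/items/baz', 'http://example.com/items/bar'])
# => [['http://example.com/lists/foo', '..._1', '.../items/baz'],
#     ['http://example.com/lists/foo', '..._2', '.../items/bar']]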

Neither of these were difficult problems to solve 🙂

My thanks go out to Jeni Tennison, who helped me get the initial prototype of this approach working while we were at Swig back in November.