Putting Government Data online – Design Issues

Government data is being put online to increase accountability, contribute valuable information about the world, and to enable government, the country, and the world to function more efficiently. All of these purposes are served by putting the information on the Web as Linked Data. Start with the “low-hanging fruit”. Whatever else, the raw data should be made available as soon as possible. Preferably, it should be put up as Linked Data. As a third priority, it should be linked to other sources. As a lower priority, nice user interfaces should be made to it — if interested communities outside government have not already done it. The Linked Data technology, unlike any other technology, allows any data communication to be composed of many mixed vocabularies. Each vocabulary is from a community, be it international, national, state or local; or specific to an industry sector. This optimizes the usual trade-off between the expense and difficulty of getting wide agreement, and the practicality of working in a smaller community. Effort toward interoperability can be spent where most needed, making the evolution with time smoother and more productive.

from Tim Berners-Lee, Putting Government Data online – Design Issues.
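TimBL's point about mixing vocabularies can be sketched in a few lines. Dublin Core (dcterms) and FOAF are real, widely used vocabularies; the dataset, department, and local "gov" vocabulary URIs below are purely hypothetical illustrations, not anything a government service has published.

```python
# One description drawing terms from several communities' vocabularies at once.
DCT = "http://purl.org/dc/terms/"      # Dublin Core (international)
FOAF = "http://xmlns.com/foaf/0.1/"    # FOAF (web community)
GOV = "http://example.gov.uk/def/"     # hypothetical local-government vocabulary

# Each triple is (subject, predicate, object).
triples = [
    ("http://example.gov.uk/id/dataset/road-spend", DCT + "title", "Road maintenance spend"),
    ("http://example.gov.uk/id/dataset/road-spend", DCT + "publisher", "http://example.gov.uk/id/dept/transport"),
    ("http://example.gov.uk/id/dept/transport", FOAF + "name", "Department for Transport"),
    ("http://example.gov.uk/id/dataset/road-spend", GOV + "financialYear", "2008-2009"),
]

def vocabularies_used(triples):
    """Return the set of vocabulary namespaces the predicates come from."""
    return {pred.rsplit("/", 1)[0] + "/" for (_, pred, _) in triples}

print(vocabularies_used(triples))
```

The point is that the publisher only had to mint the one term (`financialYear`) that no wider community had agreed on; everything else reuses existing vocabularies.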

Official Google Research Blog: Large-scale graph computing at Google

from Official Google Research Blog: Large-scale graph computing at Google.

If you squint the right way, you will notice that graphs are everywhere. For example, social networks, popularized by Web 2.0, are graphs that describe relationships among people. Transportation routes create a graph of physical connections among geographical locations. Paths of disease outbreaks form a graph, as do games among soccer teams, computer network topologies, and citations among scientific papers. Perhaps the most pervasive graph is the web itself, where documents are vertices and links are edges. Mining the web has become an important branch of information technology, and at least one major Internet company has been founded upon this graph.

As with Map/Reduce, logic programming or OO, having more ways of thinking about a problem is a good thing :-)
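The web-as-graph idea from the quote above can be sketched in a few lines: documents are vertices, links are edges, and the "major Internet company founded upon this graph" famously ranks pages by link structure. The tiny link structure and the plain power-iteration PageRank below are illustrative assumptions on my part, not Google's implementation.

```python
# Adjacency list: page -> pages it links to (hypothetical pages).
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
}

def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over an adjacency list.

    Assumes every page has at least one outlink, so rank mass is conserved.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += share
        rank = new
    return rank

ranks = pagerank(links)
# c.html collects links from both a and b, so it ends up ranked highest.
print(max(ranks, key=ranks.get))
```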

Sir Tim Berners-Lee to advise the Government on public information delivery – PublicTechnology.net

From: Sir Tim Berners-Lee to advise the Government on public information delivery – PublicTechnology.net

The Prime Minister has announced the appointment of the man credited with inventing the World Wide Web, Sir Tim Berners-Lee, as expert adviser on public information delivery. The announcement was part of a statement on constitutional reform made in the House of Commons this afternoon.

Sir Tim Berners-Lee is currently director of the World Wide Web Consortium, which oversees the web’s continued development. He will head a panel of experts who will advise the Minister for the Cabinet Office on how government can best use the internet to make non-personal public data as widely available as possible.

He will oversee the work to create a single online point of access for government-held public data and develop proposals to extend access to data from the wider public sector, including selecting and implementing common standards. He will also help drive the use of the internet to improve government consultation processes.

TimBL talked about this at TED2009.

This is fantastic news, of course. Ambitious timescales, following the lead of the Obama administration, opening up government data for re-use as well as public oversight. All very good things.

The technical challenges in doing this will be very interesting. First off, the service will undoubtedly be Linked Data – the pattern of the Semantic Web or Web of Data. TimBL has been describing the efforts of the Linked Open Data community as “the web done right” for some time now. Linked Data is also the approach taken by the US administration and is really starting to gather pace, just like the early days of the document web. That will be interesting to see, as it’s a different discipline to developing a basic HTML site, with a different set of balances and trade-offs in the data modeling, granularity, URI design and so on.
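As a sketch of the kind of URI-design decision involved, here is the common Linked Data convention of separating the URI for a thing from the URI for the document describing it, with a 303 redirect between them. The domain and paths are hypothetical, and this is just the well-known community pattern, not anything announced for the government service.

```python
# A thing ("non-information resource") gets an /id/ URI; a 303 redirect sends
# clients to the /doc/ URI of a document describing it. Hypothetical URIs.

def redirect_for(uri):
    """Map a thing URI to the (status, location) a server might respond with."""
    if "/id/" in uri:
        return (303, uri.replace("/id/", "/doc/", 1))
    return (200, uri)  # already a document URI: serve it directly

print(redirect_for("http://data.example.gov.uk/id/school/100866"))
```

Decisions like this – one URI per thing, one per document, how much data each document carries – are exactly the modeling and granularity trade-offs that don’t arise when building a basic HTML site.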

Second up will be scaling to meet the traffic demand. As both a high-profile Linked Data service and UK government data, it will be in high demand from day one. Coping with peak traffic loads is not technically difficult as long as someone has their eye on that ball from the start. It’s likely that demand for this data will be global, at least from those exploring what has been published, so traffic could get very high indeed. One of the aspects that might make this easier is that it will almost certainly be read-only for the foreseeable future, and that allows far more flexibility (and simplicity) in the approach to scaling.

Talking of it being read-only… Being a high-profile data source, there will need to be a focus on securing it – not to prevent access, but to prevent unauthorised changes. Given the current atmosphere surrounding MPs’ expense claims and the level of voting in the recent European parliament elections, it seems obvious that this will be a target for disgruntled and technically adept individuals both here and abroad. The read-only nature of the service helps make this easier, as does the Linked Data approach, as it is similar in many security respects to the web of documents we have today – that is, securing it is well understood.

Definitely a project to watch closely.

[Disclosure – I work for Talis, a software company that offers a semantic web platform for doing this kind of publishing]