OTTO - Controllerism Instrument at djtechtools.com

Thursday, July 2nd, 2009 | Interaction Design | No Comments

from OTTO - Controllerism Instrument at djtechtools.com:

Controllerism continues to take small leaps forward as the software and techniques improve, but the giant steps are going to happen in the realm of performance interfaces. Without a solid controller surface that has been designed to play like an instrument we won’t be able to leave the realm of noodling and enter the fabled land of flow.

Specialised controllers continue to evolve and this is a really interesting, focussed, loop controller.

Why hash tags are broken, and ideas for what to do instead.

Wednesday, July 1st, 2009 | Internet Technical, Semantic Web | 10 Comments

I was at Moseley Bar Camp last Sunday and there were some great sessions. Andy Mabbett stood up to lead a discussion entitled Let’s Play Tag: recent developments and emerging issues in the use of tagging for added semantic richness.

Andy was looking for discussion on how to solve the problem of ambiguity in hash tags - a popular technique for categorising community tweets on twitter. His example is classic event tagging: the tag for the event was #mbcamp, which works fine for the duration of a Sunday afternoon event, but what if you want tags to be more enduring?

Andy took us step-by-step through the issue of ambiguity of usernames as tags on twitter and flickr and described some of the issues of differing tag normalisation rules.

Andy also asked why we tag?

  • To add semantic richness?
  • To help your friends find stuff?
  • To help machines in 100 years find stuff?
  • Don’t know

I tag for all of those reasons, but not on twitter. On twitter I use hash tags to contribute to an in-the-moment conversation that’s happening at a particular event or on something topical.

Andy’s issue, then, is with the value of these tags longer term and on more enduring stuff like blog posts, photos on flickr and so on. Perhaps 100 years might be pushing it, but it’s worth thinking about.

The problem with hash tags comes from the tension between finding something specific enough for the moment, something short enough to not use up too many of the 140 characters and something easy to remember. That’s two forces pulling one way (shorter) and only one pulling the other.

The shorter the tag goes the easier it is to remember and to type, and the fewer characters it uses up, but it also becomes more likely to clash with others. Perhaps some mainstream trends might get away with very short tags, I thought. #fb, for example, surely means facebook, but looking at how it is used, the references to facebook appear to be far outweighed by the noise.

So, twitter’s 140 character limit and the profusion of clients means we can only have short, easy to remember text tags, but the need for disambiguation and to be more specific means we need something longer.

We could solve the ambiguity problem by using something like a guid, but that’s not easy to remember or type, and is generally quite long. The length issue could be solved by encoding it using unicode characters. Twitter counts multi-byte UTF8 characters as single characters, which is correct, and this opens up some interesting unique tags for those willing to forego the easy typing.

By long I mean cf629dc3-d425-4707-8119-1f35d35d7687 which is a fairly typical GUID and is 36 characters long. That’s too long if you only have 140 characters to play with. The length comes from the need to encode it as ASCII. Twitter, where our length obsession comes from, doesn’t require characters to be ASCII. The 140 character limit is for 140 UTF8 characters, so we can use a much greater range of characters to represent the same degree of uniqueness in a shorter UTF8 string.

UTF8 isn’t ideal as a starting point, though, as the number of bytes per character varies. Unicode code points up to ffff are nice simple 2 byte indexes, so we match each group of 4 hex digits from the GUID to a unicode character, then use the UTF8 encoding for those to write it down. By using unicode and UTF8 it becomes just a handful of characters, just 8 for this GUID.

cf62 콢, 9dc3 鷃, d425 퐥, 4707 䜇, 8119 脙, 1f35 ἵ, d35d 퍝, 7687 皇

This gives us a tag of #콢鷃퐥䜇脙ἵ퍝皇 which is not easy to type, would be difficult for many to visually identify and could, for all I know, be extremely offensive to those who read CJK, Hangul or Greek. I may have got lucky with that GUID too; there may be GUIDs that don’t produce valid unicode characters.
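To make the mapping concrete, here is a minimal sketch in Python (my choice of language for illustration; the function name guid_to_tag is made up). It assumes the GUID is written as 32 hex digits with optional dashes. One case it has to refuse: any group between d800 and dfff falls in the range Unicode reserves for surrogates, which are not characters and cannot be written as UTF8. Other groups may land on unassigned or control code points, which this sketch does not try to detect.

```python
def guid_to_tag(guid: str) -> str:
    """Map each group of 4 hex digits in a GUID to one unicode character."""
    hex_digits = guid.replace("-", "").lower()
    if len(hex_digits) != 32:
        raise ValueError("expected a GUID with 32 hex digits")
    chars = []
    for i in range(0, len(hex_digits), 4):
        group = hex_digits[i:i + 4]
        code_point = int(group, 16)  # raises ValueError on non-hex input
        # d800-dfff is reserved for surrogates: not characters, and not
        # encodable as UTF-8, so a GUID containing such a group is rejected.
        if 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"group {group} is a surrogate, not a character")
        chars.append(chr(code_point))
    return "#" + "".join(chars)


tag = guid_to_tag("cf629dc3-d425-4707-8119-1f35d35d7687")
# Number of characters after the '#', and their size in bytes as UTF-8.
print(len(tag) - 1, len(tag[1:].encode("utf-8")))  # prints: 8 24
```

So the 8 characters still cost 24 bytes on the wire; the saving only exists because the limit is counted in characters rather than bytes.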

But, as it’s a GUID it gives a very high confidence that it is unique, it’s only 8 characters long and works as a unicode tag on Flickr and a unicode tag on Hashtags. Just don’t look at the raw URLs in the source of the page…

What we lose with that approach is a good deal of ease-of-use. I certainly wouldn’t try this technique at an event.

If you’re prepared to lose a little usability, maybe giving people an easy place to grab a copy/paste version of the tag, then you could produce something more easily readable, if not easy to type: #dɯɐɔqɯ for example. I might be tempted to do that, or add a graphic symbol or something.
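The upside-down tag is just mbcamp reversed, with each letter swapped for a look-alike. A small sketch of that in Python, assuming a hand-made lookup table (FLIPPED and flip_tag are names invented for this example, and the table only covers the letters needed here):

```python
# Hypothetical lookup table: only the letters needed for this example.
# \u0250 is a turned a, \u0254 an open o, \u026f a turned m.
FLIPPED = {"a": "\u0250", "b": "q", "c": "\u0254", "m": "\u026f", "p": "d"}


def flip_tag(tag: str) -> str:
    """Reverse the tag and swap each letter for an upside-down look-alike."""
    word = tag.lstrip("#")
    # Letters missing from the table are left as they are.
    return "#" + "".join(FLIPPED.get(ch, ch) for ch in reversed(word))


flipped = flip_tag("#mbcamp")
```

A fuller table would be needed for other tags, and plenty of letters have no convincing upside-down twin.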

There’s something else that nags at me about hash tags, though. They’re really not very webby. You rely on search and on hashtags.org and other specific tools to make sense of them. They can be easily abused, as Habitat showed recently.

So are there other ways to think about tagging? Ways that work with the web rather than just on the web. Examples from those applications where the 140 character limit does not apply? Blog posts, web pages, flickr images and so on?

What if we decided that our requirements for tagging were:

  1. A very high degree of uniqueness
  2. Anyone can get information about the tag easily
  3. Spam and content visible on the tag controlled by the tag owner
  4. That the tag can be enduring
  5. That the tag can be used anywhere on the web easily
  6. That content using the tag can be found with search
  7. That content using the tag can be found without search
  8. That no particular service or piece of software is necessary

In its essence, tagging is about saying this comment, blog post or image is about this event, concept, product etc. In the blogging world it’s very common to say this post is about the content in this other post. We do that through trackbacks and through simple links. Many blogs accept trackbacks and look at the referring page information so that they can provide links, alongside comments, to other posts referring to them.

A similar thing happens with Google’s PageRank algorithm. Words used in links to a page, as well as the content of the referring page, contribute to the way a page is indexed.

The Semantic Web bases everything on URIs (the difference between URI and URL is not important here). If you want to give something a name you don’t pick a word, you use a URI.

I wonder if we could use URIs as tags? And how that would meet the needs above. Say we were to use http://wxwm.org.uk/moseleybarcamp/2009/June to mean the event that happened last weekend.

It has a very high degree of uniqueness, so it meets our first requirement. It can be put straight into a browser and can provide a page giving details of the event, so it’s easy for anyone to get information about the tag. The page at that address can be as clever, or as dumb, as it likes about showing things that link to it - so tag spam can be removed. The link is under control of the domain owner, so can be as enduring as you want to make it. Almost everywhere on the web allows you to post links, so it’s easy to use. Links to a specific URL can be easily searched for in Google and other search engines, and in Flickr and Twitter. Most browsers will send referring page information when requesting the URL, so content can be tracked without search - this means you can find out about unindexed and intranet sites referencing the tag. The URL can be a static page, or a script, it can monitor referrers and spam filter - or not. There is no centralised service needed nor any specific software.
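As a rough illustration of the "script that monitors referrers" option, here is a sketch of a tag page written as a plain WSGI application in Python. Everything in it is an assumption made for the example: the names tag_app, referrers and BLOCKED_HOSTS, the in-memory store, and the spam.example host. A real page would persist what it sees and would need a better spam filter than a host blocklist.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical in-memory store: tag path -> set of pages that linked to it.
referrers = defaultdict(set)
# Hypothetical blocklist the tag owner maintains to keep spam off the page.
BLOCKED_HOSTS = {"spam.example"}


def tag_app(environ, start_response):
    """Serve a tag page and remember which pages referred visitors to it."""
    path = environ.get("PATH_INFO", "/")
    referrer = environ.get("HTTP_REFERER", "")
    host = urlsplit(referrer).hostname  # None when no referrer was sent
    if host and host not in BLOCKED_HOSTS:
        referrers[path].add(referrer)
    lines = [f"Tag: {path}", "Referenced from:"] + sorted(referrers[path])
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# To try it locally:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, tag_app).serve_forever()
```

Because the page only records what browsers choose to send, it will miss visitors whose browsers withhold the referrer, so it complements search rather than replacing it.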

Oh, and it could easily be made to work as Linked Data, the pattern for publishing data on the semantic web, to provide machine-readable information about the event and the conversation happening around it…

I think that only leaves the issue of URI length. I can’t get close to the 8 characters of the guid, or the 6 of mbcamp, but using bit.ly I can make a memorable short URL such as http://bit.ly/utf8tag that redirects to a much longer one, and as bit.ly don’t re-use URLs the bit.ly link remains as unique and almost as enduring (subject to bit.ly’s survival) as your own.

Putting Government Data online - Design Issues

Wednesday, June 24th, 2009 | Internet Social Impact, Open Data, Semantic Web | No Comments

Government data is being put online to increase accountability, contribute valuable information about the world, and to enable government, the country, and the world to function more efficiently. All of these purposes are served by putting the information on the Web as Linked Data. Start with the “low-hanging fruit”. Whatever else, the raw data should be made available as soon as possible. Preferably, it should be put up as Linked Data. As a third priority, it should be linked to other sources. As a lower priority, nice user interfaces should be made to it, if interested communities outside government have not already done it. The Linked Data technology, unlike any other technology, allows any data communication to be composed of many mixed vocabularies. Each vocabulary is from a community, be it international, national, state or local; or specific to an industry sector. This optimizes the usual trade-off between the expense and difficulty of getting wide agreement, and the practicality of working in a smaller community. Effort toward interoperability can be spent where most needed, making the evolution with time smoother and more productive.

from Tim Berners-Lee Putting Government Data online - Design Issues.

STI International - Service Web 3.0 - The Future Internet Video - Quicktime - medium

This video explains really well what I’ve been doing the past few years at Talis.

The original can be found at STI International - Service Web 3.0 - The Future Internet Video - Quicktime - medium.

Official Google Research Blog: Large-scale graph computing at Google

Thursday, June 18th, 2009 | Software Engineering | No Comments

from Official Google Research Blog: Large-scale graph computing at Google.

If you squint the right way, you will notice that graphs are everywhere. For example, social networks, popularized by Web 2.0, are graphs that describe relationships among people. Transportation routes create a graph of physical connections among geographical locations. Paths of disease outbreaks form a graph, as do games among soccer teams, computer network topologies, and citations among scientific papers. Perhaps the most pervasive graph is the web itself, where documents are vertices and links are edges. Mining the web has become an important branch of information technology, and at least one major Internet company has been founded upon this graph.

Just like Map/Reduce, Logic programming or OO, having more ways of thinking about a problem is a good thing :-)

Sir Tim Berners-Lee to advise the Government on public information delivery - PublicTechnology.net

From: Sir Tim Berners-Lee to advise the Government on public information delivery - PublicTechnology.net

The Prime Minister has announced the appointment of the man credited with inventing the World Wide Web, Sir Tim Berners-Lee as expert adviser on public information delivery. The announcement was part of a statement on constitutional reform made in the House of Commons this afternoon.

Sir Tim Berners-Lee, who is currently director of the World Wide Web Consortium which oversees the web’s continued development, will head a panel of experts who will advise the Minister for the Cabinet Office on how government can best use the internet to make non-personal public data as widely available as possible.

He will oversee the work to create a single online point of access for government held public data and develop proposals to extend access to data from the wider public sector, including selecting and implementing common standards. He will also help drive the use of the internet to improve government consultation processes.

TimBL talked about this at TED2009 and the video is below:

This is fantastic news, of course. Ambitious timescales, following the lead of the Obama administration, opening up government data for re-use as well as public oversight. All very good things.

The technical challenges in doing this will be very interesting. First off, the service will undoubtedly be Linked Data - the pattern of the Semantic Web or Web of Data. TimBL has been describing the efforts of the Linked Open Data community as “the web done right” for some time now. Linked data is also the approach taken by the US administration and is really starting to gather pace just like the early days of the document web. That will be interesting to see as it’s a different discipline to developing a basic html site, with a different set of balances and trade-offs in the data modeling, granularity, URI design and so on.

Second up will be scaling to meet the traffic demand. As both a high profile linked data service and UK government data it will be highly in demand from day one. Coping with peak traffic loads is not technically difficult as long as someone has their eye on that ball from the start. It’s likely that demand for this data will be global, at least from those exploring what has been published, so traffic could get very high indeed. One of the aspects that might make this easier is that it will almost certainly be read-only for the foreseeable future, and that allows far more flexibility (and simplicity) in the approach to scaling.

Talking of it being read-only… Being a high profile data-source there will need to be a focus on securing it, not to prevent access, but to prevent unauthorised changes. Given the current atmosphere surrounding MPs’ expense claims and the level of voting in the recent European parliament elections it seems obvious that this will be a target for disgruntled and technically adept individuals both here and abroad. The read-only nature of the service helps make this easier, as does the linked data approach, as that is the same in many security respects as the web of documents we have today - that is, securing it is well understood.

Definitely a project to watch closely.

[Disclosure - I work for Talis, a software company that offers a semantic web platform for doing this kind of publishing]

data and anti-data

php -r "include 'moriarty/moriarty.inc.php'; include 'moriarty/changeset.class.php'; \$data=file_get_contents('megarecord.rdf.xml'); \$cs = new ChangeSet(array('before'=> \$data)) ; echo \$cs->to_rdfxml();" > removal_changeset.rdf.xml

The Evolution of Cell Phone Design Between 1983-2009 | Webdesigner Depot

Friday, May 22nd, 2009 | Interaction Design | No Comments

Cell phones have evolved immensely since 1983, both in design and function.

From the Motorola DynaTAC, that power symbol that Michael Douglas wielded so forcefully in the movie “Wall Street”, to the iPhone 3G, which can take a picture, play a video, or run one of the thousands of applications available from the Apple Store.

There are thousands of models of cell phones that have hit the streets between 1983 and now.

We’ve picked a few of the more popular and unusual ones to take you through the history of this device that most of us consider a part of our everyday lives.

from The Evolution of Cell Phone Design Between 1983-2009 | Webdesigner Depot.

Scripting and Development for the Semantic Web (SFSW2009)

Tuesday, May 19th, 2009 | Internet Technical, Semantic Web | No Comments

The following papers have been accepted for SFSW2009:

from Scripting and Development for the Semantic Web (SFSW2009).

Looks like a great line-up. As neither Nad, Jeni nor I are able to attend, our paper will be presented (briefly) by Chris Clarke.

Multi-Tenant Configuration Schema

Are you writing multi-tenant software? Are you using RDF at all? Do you want to keep track of your tenants?

You might want to comment on the first draft of the new Multi-Tenant Configuration Schema.

This schema attempts to describe a simple set of concepts and relationships about tenants within a multi-tenant software system. It avoids anything that would constitute application configuration, but will happily co-exist with classes and properties to do that. The documentation is sparse currently, awaiting questions and comment so that I can expand on areas that require further explanation. Comment here, or email me.
