Paget, MVC, RMR and why words matter

Thursday, December 18th, 2008 | Uncategorized | 3 Comments

I wrote a little while back about Pages, Screens and MVC. The motivation for the post was to help me explain why my thoughts around software and the web had changed over time. It also tied in nicely with Ian’s first post on Paget. The second round of Paget makes substantial changes and improves on the original design in many ways.

Following that, Ian points us at Paul James’s post, Introducing the RMR Web Architecture. Ian says The Web is RMR not MVC, and in the comments on that post we see some discussion about RMR being simply MVC using other names. We had a similar discussion over email internally.

Naming things is important to how we think about them, as Paul James says:

Alan, partly this is just a question of naming, but then a difference in naming can lead to a difference in thinking.

His second statement goes on to say what I said in Pages, Screens, MVC and not getting it.

As Ian points out above, it is more about binding actions to resources (models) rather than to controllers and of removing (or limiting) the need for routing. You say “The Controller is there to bind the system to HTTP”, but I feel that there should be no need for any binding as long as we work with HTTP to begin with rather than forcing our ways upon it.

Working with HTTP rather than forcing our ways upon it is very much the same thing as building something that is of the web rather than simply on the web.

The key reason I agree RMR is a better model is that in MVC the things we address with URIs are either views or controllers. As Andrew Davy comments:

I think the danger of MVC is that unless you explicitly use it as Alan does you default into an RPC design. (ending up with “URIs” like /customer/1/delete .. shudder!)

And this is the crux of it for me - URIs are nouns, not verbs, and they address resources and representations of resources.

By thinking about the problem in terms of RMR rather than MVC we naturally change the way we structure the code to provide different representations or how we map particular methods to code that handles them. RMR provides a way of talking about the problem that is of the web rather than of Smalltalk. RMR provides a language and a way of thinking that doesn’t obscure the mechanics of the web.
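To make the difference concrete, here is a minimal sketch of binding HTTP methods directly to a resource, so that the URI stays a noun (/customer/1) and the verb comes from HTTP itself rather than from a path like /customer/1/delete. Every name in it (CUSTOMERS, Customer, handle) is made up for illustration; it is not taken from Paget or from Paul James's post, and a real framework would also deal with representations, headers and so on.

```python
# Hypothetical names throughout: a sketch of the RMR idea, not Paget's code.

CUSTOMERS = {1: {"name": "Alice"}}  # stand-in for a real data store

HTTP_METHODS = {"GET", "PUT", "POST", "DELETE"}


class Customer:
    """The resource addressed by a noun-like URI such as /customer/1."""

    def __init__(self, customer_id):
        self.customer_id = customer_id

    def get(self):
        # Return a status code and a representation (here just a dict).
        if self.customer_id not in CUSTOMERS:
            return 404, None
        return 200, CUSTOMERS[self.customer_id]

    def delete(self):
        # The verb arrives as the HTTP method, not as part of the URI.
        if self.customer_id not in CUSTOMERS:
            return 404, None
        del CUSTOMERS[self.customer_id]
        return 204, None


def handle(method, resource):
    """Dispatch an HTTP method to the same-named method on the resource."""
    if method not in HTTP_METHODS:
        return 405, None
    handler = getattr(resource, method.lower(), None)
    if handler is None:
        return 405, None  # the resource does not support this method
    return handler()
```

With this shape, handle("DELETE", Customer(1)) removes the customer, and a method the resource does not implement, such as POST here, comes back as 405 without any controller or routing table being involved.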

That seems like more than just a change of words to me.

Free book usage data from the University of Huddersfield » “Self-plagiarism is style”

Monday, December 15th, 2008 | Library Tech | No Comments

I’m very proud to announce that Library Services at the University of Huddersfield has just done something that would have perhaps been unthinkable a few years ago: we’ve just released a major portion of our book circulation and recommendation data under an Open Data Commons/CC0 licence. In total, there’s data for over 80,000 titles derived from a pool of just under 3 million circulation transactions spanning a 13 year period.

Free book usage data from the University of Huddersfield.

This is great to see from my POV on two levels: first, a great set of data that has the potential to be really useful and, secondly, a clear statement of the terms under which it’s released.

Awesome Dave Pattern.

20 foot multi-touch screen

Friday, December 12th, 2008 | Interaction Design, Other Technical | 3 Comments

I haven’t posted anything on multitouch stuff for a while, mainly because I haven’t seen anything that’s really that new or exciting - I haven’t been looking too hard either, so feel free to correct me in the comments.

Today, though, Mike tweeted a link to a video of NUI Group building a 20′ multi-touch screen for an event in Dubai.

pointers are awesome

Wednesday, December 10th, 2008 | Other Technical | 1 Comment

This video explaining pointers is awesome.

History, Context and Interpretation

Sunday, November 23rd, 2008 | Uncategorized | 1 Comment

I was talking about the US election with a friend recently. He’s a historian by degree, though not by trade. We were discussing the different ways in which you could choose to understand speeches by Obama, McCain, Biden and Palin and how much we really knew about their background.

The reason for the conversation was lost by the end. We started out trying to decide what McCain had meant by something he’d said; I can’t for the life of me remember what it was.

The discussion, though, was about having to understand the background of the candidates in order to be able to interpret what they were saying and what they intended to do. This is an interesting thing to think about when speaking as well. It is the reverse of what Humpty Dumpty said in Chapter 6 of Through the Looking Glass.

When I use a word,’ Humpty Dumpty said, in rather a scornful tone, `it means just what I choose it to mean — neither more nor less.

The reverse of this is that words mean whatever the listener decides they mean. Everything is based on context and experience. In philosophy, the distinction between knowledge that is true regardless of experience and knowledge that is true based on experience is the distinction between a priori and a posteriori. What Humpty and Alice are discussing is basically the notion that there is no a priori definition of language - that words mean one thing to the speaker and another to the receiver.

It strikes me that the same is true for solutions in computing. Beauty is most definitely in the eye of the beholder. I have said, and believed, for a long time now that particular technologies, techniques or approaches are not “better” in an absolute sense than others unless discussed in the context of how they apply to a particular problem. Even when looking at the application of, say, a language to a particular problem it’s not that one is better than another; merely that they have different tradeoffs. The common discussions about statically typed languages versus dynamically typed languages are a great example of this. Chris Smith wrote an excellent piece on what to know before debating type systems back in 2007.

I had a great conversation about some of this stuff with Daniel at work on Friday. We argued, or rather chatted, for so long that he was late leaving and got the dreaded “where are you?” call on his mobile. Sorry Daniel.

We started off talking about Google Web Toolkit, GWT.

With Google Web Toolkit (GWT), you write your AJAX front-end in the Java programming language which GWT then cross-compiles into optimized JavaScript that automatically works across all major browsers. During development, you can iterate quickly in the same “edit - refresh - view” cycle you’re accustomed to with JavaScript, with the added benefit of being able to debug and step through your Java code line by line. When you’re ready to deploy, GWT compiles your Java source code into optimized, standalone JavaScript files. Easily build one widget for an existing web page or an entire application using Google Web Toolkit.

This is obviously worthy of consideration as the code comes from Google - so it must be good. But good for what? It strikes me as odd that you would want to develop an application in Java to be “compiled” into JavaScript. The approach to development promoted by the GWT folks is that you develop in Java running in the JVM, so you can debug your code, then compile down to JavaScript for deployment. This separation between the development and deployment execution environments obviously has to be handled carefully to keep everything working identically. This puts me off as it seems like unnecessary risk and complexity when writing an AJAX app in JavaScript has never seemed tough. So I wanted to understand who’s using GWT to try and understand what problem they’re using it to solve.

The post, from the start of this year, lists 8 interesting applications. Looking through them I see one of the first and most obvious things about them all is they’re software delivered into the browser. By that I mean that they’re windows-style GUI applications that happen to use the browser as a convenient distribution mechanism. GWT makes a lot of sense in this context as it supports that conceptual model. If you want to write something that’s more native to the web, with Cool URIs, a RESTful interface and that works as a part of a larger whole then it may make less sense.

So the number one problem it solves is to abstract away the web so that delivering more traditional software interfaces into the browser is made easier. That seems like a sensible thing to do. What else is it trying to do? An insight into its philosophy can be gleaned, perhaps, from a post on the Google blog from August last year entitled Google Web Toolkit: Towards a better web.

Instead of spending time becoming JavaScript gurus and fighting browser quirks, developers using GWT spend time productively coding and debugging in the robust Java programming language, using their existing Java tools and expertise.

So, there are several things all wrapped up in that sentence: the implication that learning JavaScript is time consuming, that browsers have lots of quirks, that Java supports productive coding and that existing Java tools and expertise are strong. All of those things will be true in some contexts and not in others. Given the lacklustre take-up of GWT within Google, perhaps it’s not true within Google.

On the other hand, listening to a couple of the GWT team present on it at Google Developer Day 2008 we get a different impression.

If you’re looking to deliver a fairly traditional GUI within the browser and you’re happy and productive working in Java then GWT looks like a good tool. Maybe it works well for more webby apps as well, but that’s not what they’re showcasing. But GWT is just one thing, and we make decisions about what technologies, techniques and approaches to adopt every single day. Like anything else, it’s all about context.

Schroedinger’s WorldCat

Tuesday, November 18th, 2008 | Uncategorized | No Comments

Karen Calhoun and Roy Tennant of OCLC have recorded a podcast with Richard Wallis as part of the Talking with Talis series (disclosure: I work for Talis). The podcast discusses the recently published changes to OCLC’s record usage policy. I wrote about the legal aspects of OCLC’s change from guideline to policy before and why OCLC’s policy changes matter. It’s great that they’ve come on a podcast to talk about this stuff.

I do think it’s a shame though that this podcast didn’t form November’s Library 2.0 gang. There are several regulars on the gang who would have some great angles to pick up on in this discussion. I guess it just didn’t work out right in everyone’s diaries.

Broadly the content of the podcast covers the background to the change, the legal situation, how the policy may affect things like SaaS offerings, competitors to WorldCat, OCLC’s value-add, the non-profit status, OCLC’s role as “switch” for libraries on the web and finally some closing comments. This is an hour well-filled with insights into why the policy says what it says and why it says it how it does.

I’m going to start with Karen’s and Roy’s closing comments as they seem to be the most useful starting point to understanding the answers that precede them.

Roy - @54:09 : Yeah, well I just want to make it clear that really we are trying to make it easier for our member institutions to use their data in interesting new ways. To become more effective, more efficient. I think we’re backing that up with real services, we’re exposing their data to them in useful ways that can be processed by software. So I think this is a good direction for us and I think the new policy is a part of that new direction.

Karen - @54:48 : Well I guess I would just like to re-iterate that we have tried to make the updated policy as open as it can possibly be. To make it possible to foster these innovative new uses for WorldCat data to make it underpin a whole process of exposing library collections in lots of places on the web, the basis for our being able to partner with many organisations, both commercial and non-commercial, to encourage that process of exposing library collections and helping libraries to stay strong. So we’ve had to balance that against some economic realities of where our funding comes from and the need to protect our ability to survive as an organisation. So it’s not perfect, it’s really far from perfect. It represents this kind of uncomfortable balancing act and our hope is that this updated policy will be merely a first step in being able to facilitate more partnerships and more sharing of data and further loosening our data sharing policies as the years go by, so I guess that’s how I’d like to close.

Roy is doing his best here. I’ve met him and talked about stuff. I like Roy and he’s smart. I suspect, from reading between the lines, that he thinks the only way to change OCLC is from the inside. The mere fact that Karen and Roy recorded a podcast on this stuff is a huge leap forward from the OCLC of a couple of years ago. But on this policy I feel he is misguided. There is a constraining factor to working with services like OCLC’s grid that means only member libraries can innovate and only in ways that happen to be facilitated by the grid services. They’re a piece of the puzzle, but only one piece. Making the entire database available for anyone to innovate on top of is another piece - and probably the most important piece if libraries are to be allowed to really innovate.

I agree with Ed Corrado’s wrapping up in his post Talis Podcast about OCLC WorldCat Record Use Policy with Karen Clahoun and Roy Tennant:

I believe Roy and other OCLC employees when they say that want to make it possible for libraries to “use their data in interesting new ways to become more effective, more efficient.” Roy and the other people I know who work for OCLC really do care about libraries. I just don’t see how the policy does this. While the current people speaking on behalf of OCLC may want to approve as many WorldCat record use requests as possible, they may not always be the ones making the decisions. This is why I want as much of these rights enumerated in the policy, instead of hiding behind a request form that “OCLC reserves the right to accept or reject any proposed Use or Transfer of WorldCat Records which, in OCLC’s reasonably exercised discretion, does not conform to the Policy for Use and Transfer of WorldCat Records.”

Karen must be praised for her incredible candor in her closing remarks. The policy, she says, is far from perfect and has to be so in order to protect OCLC’s business position. You see, OCLC face the classic innovator’s dilemma. To truly innovate they must cannibalize their own revenue stream. Normally when faced with the innovator’s dilemma an established company faces the prospect of someone else innovating faster than they can. This is what OCLC fears and is trying to prevent. Keeping the data locked away gives them time to innovate by preventing anyone else from damaging their revenue stream before they’re ready. The question you have to ask is how long do libraries have left to innovate their way out of decline and is it long enough for the OCLC tanker to turn itself around?

Karen herself gives us an answer in the podcast. She refers back to OCLC’s 2005 Perceptions of Libraries Report in which they say that 84% of information seekers start with a search engine. Libraries are in danger of being marginalised in the web environment, she says. The context of the answer is a discussion about the need for OCLC to act as a giant “internet switch” on the web, directing searches from the likes of Google Books to a local library.

In answer to the same question, why do libraries need a switch, Roy says:

Roy - @43:47 : I can’t imagine that search engines want a world where they go and crawl everyone’s library catalog and they end up with 5 million, you know, versions of Harry Potter. That just makes no sense whatsoever. I think from the perspective of both end users and search engines really what they are going to want is the kind of situation that we’ve been able to provide which is you know there is one place where you can go to for an item and then you get shunted down to the local library that has that item again very quickly and painlessly. I think back to the days when Gopher was around and the Veronica search engine and when people exposed their library catalogs that way it was horrifying. You would do a search in the system and you’d find a library in Australia had a book but you couldn’t do anything with that information. So I don’t think that’s the world we want to see necessarily. I wouldn’t want to see it.

Let’s just look at that opening sentence again.

Roy - @43:47 : I can’t imagine that search engines want a world where they go and crawl everyone’s library catalog…

Really? I would think that’s exactly what the search engines want. The web is a level playing field where anyone, anywhere can get the number one spot on any search engine. Not through being a big player with the budget to buy the top slot, but by being the most relevant result. Reconciling different references to the same concept is a core strength for the search engines. And that’s without even considering the disambiguation and clarification potential of web-based semantics. The switch that OCLC describe is an adequate way of addressing the problem that libraries have right now - a few dominant search engines, OPACs that do not play nicely with search engines and a lot of the data in a central place at OCLC.

How does the OCLC model scale though? What about all the libraries that can’t be part of the OCLC game? OCLC wants to be the single, complete source for this data, but the barriers to entry (mostly cost) are too high for this to be possible. The barrier to publishing data on the web is very low, that’s one of the many great things about it. And seriously, Roy, are you really comparing the capabilities of Veronica with what Google, Yahoo and MSN do today? Have you seen SearchMonkey?

A few moments later, in response to a question about location information, Roy goes on to say:

Roy - @45:12 : Oh boy, I’d sure like to see them try. I mean, again, I don’t think they’re even interested in that problem. Again, I don’t think they could do an effective job at it and I don’t think they would want to. You know the Googles of the world are making deals with Amazon, you know, we’re not necessarily the folks that they really want to do business with. The fact that we’re big enough we can sit down and talk to them on behalf of our members I think is an important point. For us to think that individual libraries would have enough leverage to get that kind of attention I think is obviously ridiculous.

I’m not sure what Roy was getting at with this, but the search engines sure do seem interested both in little sites, like this blog, and in location data. While the guys at OCLC are smart, I’d put a whole heap of cash on Google being able to do location based search ranking a whole lot more effectively than they can. Not sure? Google has mapped all the hairdressers on the web, and will show me hairdressers local to Bournville. There are no doubt many reasons search engines aren’t doing this kind of thing and more for libraries - the quality of the data presented by the OPAC is one reason. The restrictive agreements data providers like OCLC put on the libraries is another. Both of these issues can be fixed. A monopoly player to centralize and restrict access to all the data is not a necessary component for libraries to be a valuable part of the web.

Following my earlier post on OCLC’s Intellectual Property claims I was looking forward to hearing what OCLC had to say on this. I know that Richard had many questions about this sent in following his request for questions. This was Karen’s response…

Karen - @17:14 : Well, I know from reading the guidelines, which is pretty much the extent of what I know, that the whole issue of the copyrighting of the database goes back to 1982. I’m really not familiar with that history and I don’t know a whole lot about Copyright Law, so I really don’t feel knowledgeable enough to talk about all the details around the copyrighting of the database. I do know that the copyright is on the compilation as a whole and that’s about the extent of what I know. I also don’t have a legal background so I just don’t feel like I’m qualified to answer that. I have been forwarding all of the questions and commentary about that to our legal department and they are working on those issues, I’m not sure what will come of that but they are working on the commentary and the questions that they got.

Richard then asks specifically about the 1982 Copyright date. That precedes the Feist Publications v. Rural Telephone Service decision that both Jonathan Rochkind and I keep pointing out.

Karen - @18:56 : I don’t know a whole lot about it Richard. I can tell you, that having been the head of the database quality unit for so many years, OCLC makes a tremendous investment in WorldCat. It isn’t just a pile of records that we’ve gotten from the members. And I don’t mean to denigrate the value of those records in any way. As they come to OCLC and come into the database, over the years we have invested a very large effort in maintaining the quality of that database and even improving it. When I was in charge of the database quality group for example we wrote an algorithm, probably the world’s best algorithm at that time, to automatically detect duplicate records and to merge them. It was an artificial intelligence approach at that time, very very state of the art, we also created a number of algorithmic methods for managing the forms of heading, doing automated authority control in WorldCat and we corrected millions of headings. Since my return I’ve become familiar with all the things that have come out of the office of research and been moved into production in WorldCat that FRBRise the records in the database, that have created WorldCat identities based on what we learned from doing that automated authority control back in the early 90s. So it’s really not the same database that we get from members, it’s really much improved, we continue to do a huge amount of work to make the database as valuable as it is. So we have a stake, not just the members have a stake in WorldCat, OCLC is a big stakeholder and a curator of the WorldCat database.

I am saddened that Karen, as Vice President of WorldCat and Metadata Services for OCLC, should be so ill-prepared to answer questions on the intellectual property aspects of WorldCat. What is clear, though, is that Karen is true to her word about her level of understanding. Copyright is a temporary monopoly, an exclusivity, granted to the creator of something original and expressive. Legislatures all over the world developed Copyright as a means of encouraging creative expression by protecting the creator’s ability to make a living from it for a period of time. Feist Publications v. Rural Telephone Service is a crucial case as it specifically addresses the compilation right that Karen refers to. Not only that, but the court specifically stated that the purpose of copyright is not to reward the effort involved in collecting information, but to promote the progress of science and useful arts.

That is, the court does not want organizations to be able to monopolize data. They want people to be able to innovate freely.

What Karen describes is a vast amount of knowledge of the data and the domain and how to do fantastic stuff with it. Like I’ve said many, many times, OCLC has lots of smart people and they have an important part to play. I believe that part will earn them money, but there is no basis other than contract law under which they can prevent the propagation of the WorldCat data. That’s why they’re attempting to change the contract libraries operate under to include the previously voluntary guidelines.

But OCLC’s business model is what needs to change, not its contract with libraries. It’s Schroedinger’s WorldCat: it is both alive and dead at the same time, and as long as they can keep the lid of the box shut nobody knows for sure which it is. The library world doesn’t need a cat in a box, it needs a free cat.

There is so much more to talk about in this podcast. You have to listen to it. Also must read posts:

Annoyed Librarian: How I Learned to Stop Worrying and Love OCLC

To use a prison metaphor, it’s clear that librarians dropped the soap decades ago.

Karen Coyle’s Metalogue (the comments)

Jonathan Rochkind: more OCLC

The most important negative part of the policy, which it doesn’t sound like they discussed much in the interview (?) is that any use is prohibited which “substantially replicates the function, purpose, and/or size of WorldCat.” That means that clearly OCLC would deny permission for uses they believe to be such, but also that OCLC is asserting that with or without such an agreement, such use is prohibited, by libraries or by anyone else.

Ed Corrado: Talis Podcast about OCLC WorldCat Record Use Policy with Karen Clahoun and Roy Tennant

One of the key things that Karen and Roy repeated a few times during the podcast (and OCLC people have mentioned previously in other venues) is that the goal with this policy is to drive traffic to libraries museums and archives. They also have repeated that they hope it will make it easier for libraries, museums, and archives to use their data. It is not that I am not hearing them on this second point, but I still do not see how this “tiger’s role (territorial and instinctive)” approach accomplishes this.

Stefano Mazzocchi: Rule #1 for Surviving Paradigm Shifts: Don’t S**t Where You Eat

You could think of OCLC like a Wikipedia for library cards, but there is one huge difference: there is no freedom to fork. Basically, by using OCLC’s data you agree to protect their existence.

More OCLC Policy…

Friday, November 14th, 2008 | Uncategorized | 3 Comments

There have been quite a few great posts and ideas circulating on how to respond to OCLC’s change in Record Usage Policy. The key thing is to act now - you really don’t have long to stop this if you want to. OCLC wish the new policy to come into effect in Spring 2009.

This change is important as it moves the policy from being guidelines, that member libraries are politely asked to adhere to, to part of your contract with OCLC, that can be enforced. That’s a major restriction on the members and far from being more open.

So, to the ideas… Let’s start with Aaron Swartz’s post. Aaron is one of the folks responsible for OpenLibrary, so has a significant stake in this. His post, Stealing Your Library: The OCLC Powergrab, is a great explanation of why you should care about this. He finishes by asking that we sign up to a petition to Stop the OCLC powergrab!

Next up we have various suggestions circulating about libraries providing their own licensing statements. For example, in More on OCLC’s policies Jonathan Rochkind suggests putting it in the 996, just like the OCLC 996 Policy Link.

Whether submitting the cataloging to OCLC, or anywhere else. Add your own 996 explaining that the data is released under Open Data Commons.

I had a little think about this and would suggest specifically using the ODC PDDL link as follows.

996 a ODCPDDL i This record is released under the Open Data Commons Public Domain Dedication and License. You are free to do with it as you please. u http://www.opendatacommons.org/odc-public-domain-dedication-and-licence/

Just like the OCLC policy link. This doesn’t go as far as Aaron asks, with his suggestion

Second, you put your own license on the records you contribute to OCLC, insisting that the entire catalog they appear in must be available under open terms.

The problem here is that there really isn’t any Intellectual Property in a MARC Record. It may take effort, skill and diligence to create a good quality record. Creative or original, in terms of Copyright, it is not.
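Going back to the 996 suggestion for a moment, here is a small sketch of how a library script might assemble that field before adding it to each record. The format_marc_field helper is hypothetical, and the “$” subfield delimiter is only a common display convention that I am assuming; the subfield codes (a, i, u), the wording and the URL are the ones suggested above.

```python
# Hypothetical helper: renders a MARC field for display, not a MARC serialiser.

PDDL_URL = "http://www.opendatacommons.org/odc-public-domain-dedication-and-licence/"


def format_marc_field(tag, subfields):
    """Render a field as 'TAG $a value $i value ...'.

    The '$' delimiter is an assumed display convention; the field above
    shows the subfield codes without any delimiter.
    """
    rendered = " ".join(f"${code} {value}" for code, value in subfields)
    return f"{tag} {rendered}"


licence_field = format_marc_field("996", [
    ("a", "ODCPDDL"),
    ("i", "This record is released under the Open Data Commons Public Domain "
          "Dedication and License. You are free to do with it as you please."),
    ("u", PDDL_URL),
])

print(licence_field)
```

Keeping the field as a list of (code, value) pairs makes it easy to feed into whatever MARC tooling the library already uses instead of the display string.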

The issue of OCLC claiming these rights in catalog data has even made it onto Slashdot where they’re covering how this Non-Profit Org Claims Rights In Library Catalog Data. Unfortunately the comments are the usual Slashdot collection of ill-informed, semi-literate ramblings based on nothing more than a cursory glance at the post. Someone even appears to confuse OCLC and LoC in their response. Ho hum.

Also worth mentioning is that Ryan Eby is keeping track of news and happenings with the OCLC Record Usage Policy on the Code4Lib Wiki.

Richard recorded a new Talking with Talis podcast yesterday. This will be posted on Talis’s Panlibus blog. I’ll be covering that as soon as I’ve had chance to listen to it properly.

If you’re on any mailing lists where this is being discussed, or spot other blog posts I should read then let me know in the comments.

OCLC, Record Usage, Copyright, Contracts and the Law

Thursday, November 6th, 2008 | IP Law, Internet Social Impact | 20 Comments
FUD truck by John Markos on Flickr

NB: This is my own blog. The opinions I publish do not necessarily reflect those of my employer. I am not a lawyer, but I did ask James Grimmelmann for his thoughts.

Over on Metalogue, Karen Calhoun has been clarifying OCLC’s thinking behind its intention to change the usage policy for records sourced from WorldCat. It’s great to see OCLC communicating this stuff, albeit a tad late given the furore that had already ensued. The question still remains though, are they right to be doing what they are?

Firstly, in the interest of full disclosure, let me make it perfectly clear that I work for Talis. I enjoy working for Talis and I agree with Talis’s vision. I have to say that because Karen is clearly not happy with us:

OCLC has been severely criticized for its WorldCat data sharing policies and practices. Some of these criticisms have come from people or organizations that would benefit economically if they could freely replicate WorldCat.

OCLC believe that Talis is one of those organisations, and we are. There are others too: LibraryThing, Reddit, OpenLibrary, Amazon, Google. Potentially many libraries could benefit too.

This isn’t the first time I’ve talked about OCLC’s business model. I wrote an open letter to Karen Calhoun some time ago, talking about the issues of centralised control. The same concerns raise themselves again now. I feel there are several misconceptions in what Karen writes that I would like to offer a different perspective on.

First off, OCLC has no right to do this. That sounds all moral and indignant. I don’t mean it that way. What I mean is, they have literally no right in law - or at least only a very limited one.

Karen talks a lot about Creative Commons in her note, and it’s apparent that they even considered using a Creative Commons license:

And yes, while we considered simply adopting a Creative Commons license, we chose to retain an OCLC-specific policy to help us re-express well-established community practice from the Guidelines.

There is an important thing to know about CC. Applying a Creative Commons License to data is utterly worthless. It may indicate the intent of the publisher, but has absolutely no legal standing. This is because CC is a license scheme based on Copyright. Data is not protected by Copyright. The courts settled this in Feist Publications v. Rural Telephone Service.

This means that when Karen Coombs asks for several rights for the data:

1. Perpetual use - once I’ve downloaded something from OCLC I’ve got the right to use it forever, period, end of story. This promotes a bunch of things including the LOCKSS principle in the event something happens to OCLC
2. Right to share - records I’ve downloaded I’ve got the right to share with others
This means share in any fashion which the library sees fit, be it Z39.50 access, SRU/W, OAI, or transmission of records via other means
3. Right to migrate format - Eventually, libraries may stop using MARC or need to move records into a non-MARC system. So libraries need the right to transform their records

it is simply a matter of the members telling OCLC that’s how it’s gonna be. For those not under contract with OCLC - you have these rights already!

Therein lies the nub of OCLC’s problem. In Europe the database would be afforded legal protection simply by virtue of having taken effort or investment to create, the so-called sui generis right. US law does not have any such protection for databases. I know this because I was heavily involved in the development of the Open Data Commons PDDL and a real-life lawyer told me.

So, other legal remedies that might be used to enforce the policy could include a claim for misappropriation - reaping where one has not sown. This would be under state, rather than federal, law, though NBA v. Motorola suggests that misappropriation may only apply if for some reason OCLC were unable to continue their service as a result. James Grimmelmann tells me:

RS: If I understand correctly that would mean the only option left for enforcing restrictions on the use of the data would be contractual. Have I missed something obvious?

JG: I could see a claim for misappropriation under state law — OCLC has invested effort in creating WorldCat, and unauthorized use would amount to “reaping where one has not sown,” in the classic phrase from INS v. AP. I doubt, however, that such a claim would succeed, since misappropriation law is almost completely preempted by copyright. Recent statements of misappropriation doctrine (e.g., NBA v. Motorola) suggest that it might remain available only where the plaintiff’s service couldn’t be provided at all if the defendant were allowed to do what it’s doing. I don’t think that applies here. So you’re right, it’s only contractual.

Without any solid legal basis on which to build a license directly, the policy falls back to being simply a contract - and with any contract you can decide if you wish to accept it or not. That, I suspect, is why OCLC wish to turn the existing guidelines into a binding contract.

So, OCLC members have the choice as to whether or not they accept the terms of the contract, but what about OpenLibrary? Some have suggested that this change could scupper that effort due to the viral nature of the reference to the usage policy in the records ultimately derived from WorldCat.

Nonsense. This is a truck load of FUD created around the new OCLC policy. Those talking about this possibility are right to be concerned, of course, as that may well be OCLC’s intent, but it doesn’t hold water. Given that the only enforcement of the policy is as a contract, it is only binding on those who are party to the contract. If OpenLibrary gets records from OCLC member libraries the presence of the policy statement does not create a contract, so OpenLibrary would not be considered party to the contract and not subject to enforcement of it. That is, if they haven’t signed a contract with OCLC this policy means nothing to them. They are under no legal obligation to adhere to it.

This is why OCLC are insisting that everyone has an upfront agreement with them. They know they need a contract. James Grimmelmann, who confirmed my interpretations of US law for me, said this in his reply this morning:

JG: Let me add that it is possible for entities that get records from entities that get records from OCLC to be parties to OCLC’s contracts; it just requires that everyone involved be meticulous about making everyone else they deal with agree to the contract before giving them records. But as soon as some entities start passing along records without insisting on a signature up front, there are players in the system who aren’t bound, and OCLC has no contractual control over the records they get.

Jonathan Rochkind also concludes that OCLC’s focus on Copyright is bogus:

All this is to say, the law has changed quite a bit since 1982. If OCLC is counting on a copyright, they should probably have their legal counsel investigate. I’m not a lawyer, it doesn’t seem good to me–and even if they did have copyright, I can’t see how this would prevent people from taking sets of records anyway, as long as they didn’t take the whole database. But I’m still not a lawyer.

This is OCLC’s fear, that the WorldCat will get out of the bag.

The comparisons with other projects that use licenses such as CC or GFDL, and even open-source licenses, are also entirely without merit.

To understand why, we have to understand the philosophy behind the use of licenses. In OCLC’s case the intention is to restrict the usage of the data in order to prevent competing services from appearing. In the case of Wikipedia and open-source projects the use of licenses is there to allow the community to fork the project in order to prevent monopoly ownership - i.e. to allow competing versions to appear. There are many versions of Linux and the community is better for that: the good ones thrive and the bad ones die. When a good one goes bad others rise up to take its place, starting from a point just before things went bad. If this is what OCLC want they must allow anyone to take the data, all of it, easily and create a competing service - under the same constraints, that the competing service must also make its data freely available. That’s what the ODC PDDL was designed for.

The reason this works in practice is that these are digital goods. In economic terms that means they are non-rival - if I give you a copy I still have my own copy, unlike a rival good where giving it to you would mean giving it up myself. OCLC has built a business model based on the notion that its data is a rival good, but the internet, cheap computing and a more mature understanding show that to be broken.

Jonathan Rochkind also talks about a difference in intent in criticising OCLC’s comparison with Creative Commons:

But there remains one very big difference between the CC-BY-NC license you used as a model, and the actual policy. Your actual policy requires some recipients of sharing to enter into an agreement with OCLC (which OCLC can refuse to offer to a particular entity). The CC-BY-NC very explicitly and intentionally does NOT require this, and even _removes_ the ability of any sharers to require this.

This is a very big difference, as the entire purpose of the CC licenses is to avoid the possibility of someone requiring such a thing. So your policy may be like CC-BY-NC, while removing it’s very purpose.

Striving to prevent the creation of an alternative database is anti-competitive, reduces innovation and damages the member libraries in order to protect OCLC corp.

Their [OCLC's record usage guidelines] stated rationale for imposing conditions on libraries’ record sharing is that “member libraries have made a major investment in the OCLC Online Union Catalog and expect other member libraries, member networks and OCLC to take appropriate steps to protect the database.”

This makes no sense. The investment has been made now. The money is gone. What matters now is how much it costs libraries to continue to do business. Those costs would be reduced by making the data a commodity. Several centralised efforts have the potential to do just that, but the internet itself has that potential too, a potential OCLC has been working against for a long time. Their fight has taken the form of asking member libraries and software authors like Terry Reese not to upset the status quo by facilitating easy access to the Z39.50 network and now this change to the policy.

What underlies this is a lack of trust in the members. OCLC know that if an alternative emerged its member libraries would move based on merit, and OCLC clearly doesn’t believe it could compete on that level playing field. They are saying that they require a monopoly position in order to be viable.

However, what’s good for members and what’s good for OCLC are not one and the same thing. Members’ investment would be better protected by ensuring that the data is as promiscuously copied as possible. If members were to force OCLC to release the entire database under terms that ensure anyone who takes a copy must also make that copy available to others under the same terms then competition and market would be created. Competition and market are what drive innovation both in features and in cost reduction. In fact, it would create exactly the kind of market that has caused US legislators to refuse a database right, repeatedly. Think about it.

Above all, don’t be fooled that this data is anything but yours. The database is yours. All of yours.

If WorldCat were being made available in its entirety like this, it would be entirely reasonable to put clauses in to ensure any union catalogs taking the WorldCat data had to also publish their data reciprocally. That route leads us to a point where a truly global set of data becomes possible - where World(Cat) means world rather than predominantly affluent American libraries.

Surely OCLC, with its expertise in service provision, its understanding of how to analyse this kind of data, its standing in the community and not to forget its substantial existing network of libraries and librarians would continue to carve out a substantial and prestigious role for itself?

I’ve met plenty of folks from OCLC and they’re smart. They’ll come up with plenty of stuff worth the membership fee - it just shouldn’t be the data you already own.

Orecchiette with Broccoli and Anchovies

Thursday, October 23rd, 2008 | Food | 4 Comments

I got a chain email forwarded a week or so ago that I came oh so close to actually joining in with… Then I remembered that no good can ever come of an email chain letter, no matter how good the idea.

So, I’ve decided to start an equivalent game of blog tag. The topic? One of my favourites - FOOD!

I’m going to share with you one of my all-time favourite quick, easy week-day recipes and then tag five others in the hope they’ll do the same.

Orecchiette with Broccoli and Anchovies (serves 2)

 

First things first - please don’t be afraid of the anchovies. The anchovies in here are in oil, not in salt, which means they’re not like the dry, salty crap you get on pizzas. In this recipe they work with the chilli to create a wonderfully warm and comforting meal. There’s a heat and depth to the flavours that just makes you feel better about a bad day from the first mouthful to the last.

Ingredients

  • 250g Orecchiette pasta (or another short pasta, but the Orecchiette is best)
  • 1 medium head of broccoli, with a good main stalk
  • 2 garlic cloves
  • 50g can of anchovies in oil
  • A large pinch of dried chilli flakes
  • A substantial quantity of Parmesan, grated
  • A knob of butter

Method

  • Put a large pan of water on to boil; you need a pan large enough for the pasta and the broccoli florets, once chopped.
  • Drain a generous tablespoon of the oil from the anchovies into a large frying pan and set on a medium-low heat.
  • Chop the broccoli into small (one mouthful) florets and set aside.
  • Peel or wash the stalk, trim off any dry woody bits and finely chop.
  • Finely chop the garlic.
  • Roughly chop the anchovies (yes, all of them, honestly, trust me).
  • Set the pasta cooking, and set a timer for three minutes less than the cooking time.
  • Put the chopped broccoli stalks, garlic, anchovies and chilli in the frying pan.
  • Fry the anchovy and broccoli mix gently, stirring occasionally to stop it sticking. If it starts to dry out, turn the heat to low and cover.
  • Three minutes before the end of the pasta’s cooking time throw in the broccoli florets to cook.
  • Once cooked, drain the pasta and broccoli, holding back a little of the water, and stir in with the anchovy and broccoli mix. Stir through some of the grated Parmesan, drop in the knob of butter and spoon over a few tablespoons of the pasta water (you did remember to catch some, didn’t you?).
  • Cover and leave for a minute.
  • Serve with more of the Parmesan over the top and a good twist of black pepper.

Drink

A deep red wine goes well with this, something with some fruit to it. Maybe a Merlot, Cabernet Sauvignon or Pinotage.

Tagging (as in the children’s game, aka tig in the UK)

Would the following people please step up and deliver a recipe and tag five further food-interested people:

  • mauvedeity Because I bet the heathen eats some really tasty stuff.
  • nadeem.shabir Because he keeps promising to share family recipes (but failing to deliver).
  • Sarah B. Because she’s a real foodie (hasn’t got kids, so has time and money to cook proper).
  • Ross Because an American perspective would be nice, maybe a nice pumpkin pie ;-).
  • Zach Because he lives in the countryside and might suggest something good to do with wild rabbit.

Please remember to comment/trackback here so we can follow the thread and to tag a further five victims :-)

Photo: Orecchiette with Broccoli and Anchovies by su-lin on Flickr, licensed under Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 Generic

Betamax, VHS and RDF

Wednesday, October 22nd, 2008 | Random Thought | 1 Comment

I was chatting to a guy a few weeks ago, a Technical Account Manager at a reasonably good consultancy. We got chatting as we’re both “in IT”. I don’t actually consider myself to be “in IT” but that’s another story.

The conversation was somewhat one-sided, with this chap, let’s call him Harry, wanting to tell me all about what he does and his illustrious career with a wide range of technologies. He wasn’t interested in what I did, so I listened.

Harry explained how the consultancy he works for is doing pretty well, despite the economic situation. His group, a team of technology specialists, were not doing so well, however. Harry didn’t understand why, and we quickly moved on.

From not doing well, Harry went on to detail his incredible career in technology: putting in DEC equipment in the mid 80s (when everyone else was putting in PCs), networking several companies with Token Ring (in the late 90s when everyone else was putting in Ethernet), setting up large internal data centres based on Novell and/or IBM OS/2 (when everyone else was putting in Windows). Harry had even thrown out early copies of Microsoft Office in one company to put in Lotus 1-2-3 and Ami Pro. Great decisions, choosing best-of-breed solutions from great suppliers.

The consistency of these “wrong” decisions seemed to have passed Harry by as he was saying how all of these technologies were “the best”, but were subsequently beaten in the marketplace by inferior products. I suspect Harry still has a Betamax video recorder tucked away somewhere.

What’s common across all of the products that succeeded is that they are superior in some way that the market defines, not in the way that Harry defined. They were successful in many respects simply because they were successful. That is, success begets success.

Many people are highly skeptical about the Semantic Web and RDF in particular, but in large part it seems to be in roughly the state the web was in during the very early 90s. One of the browsers (Tabulator) is something that Tim Berners-Lee has written and is touting around as an example of what could be done, sites on the Semantic Web can still (just) be drawn on a single slide, and lots of people are still looking at RDF and saying “it won’t work”.

But all of that misses the point. It will be successful if we make it successful. That is, it lives and dies not by how it compares to other approaches of representing data, but by how many people publish stuff this way.
