Schroedinger's WorldCat

Karen Calhoun and Roy Tennant of OCLC have recorded a podcast with Richard Wallis as part of the Talking with Talis series (disclosure: I work for Talis). The podcast discusses the recently published changes to OCLC’s record usage policy. I wrote about the legal aspects of OCLC’s change from guideline to policy before and why OCLC’s policy changes matter. It’s great that they’ve come on a podcast to talk about this stuff.

I do think it’s a shame though that this podcast didn’t form November’s Library 2.0 gang. There are several regulars on the gang who would have some great angles to pick up on in this discussion. I guess it just didn’t work out right in everyone’s diaries.

Broadly the content of the podcast covers the background to the change, the legal situation, how the policy may affect things like SaaS offerings, competitors to WorldCat, OCLC’s value-add, the non-profit status, OCLC’s role as “switch” for libraries on the web and finally some closing comments. This is an hour well-filled with insights into why the policy says what it says and why it says it how it does.

I’m going to start with Karen’s and Roy’s closing comments as they seem to be the most useful starting point to understanding the answers that precede them.

Roy – @54:09 : Yeah, well I just want to make it clear that really we are trying to make it easier for our member institutions to use their data in interesting new ways. To become more effective, more efficient. I think we’re backing that up with real services, we’re exposing their data to them in useful ways that can be processed by software. So I think this is a good direction for us and I think the new policy is a part of that new direction.

Karen – @54:48 : Well I guess I would just like to re-iterate that we have tried to make the updated policy as open as it can possibly be. To make it possible to foster these innovative new uses for WorldCat data to make it underpin a whole process of exposing library collections in lots of places on the web, the basis for our being able to partner with many organisations, both commercial and non-commercial, to encourage that process of exposing library collections and helping libraries to stay strong. So we’ve had to balance that against some economic realities of where our funding comes from and the need to protect our ability to survive as an organisation. So it’s not perfect, it’s really far from perfect. It represents this kind of uncomfortable balancing act and our hope is that this updated policy will be merely a first step in being able to facilitate more partnerships and more sharing of data and further loosening our data sharing policies as the years go by, so I guess that’s how I’d like to close.

Roy is doing his best here. I’ve met him and talked about stuff. I like Roy and he’s smart. I suspect, from reading between the lines, that he thinks the only way to change OCLC is from the inside. The mere fact that Karen and Roy recorded a podcast on this stuff is a huge leap forward from the OCLC of a couple of years ago. But on this policy I feel he is misguided. There is a constraining factor to working with services like OCLC’s grid that means only member libraries can innovate and only in ways that happen to be facilitated by the grid services. They’re a piece of the puzzle, but only one piece. Making the entire database available for anyone to innovate on top of is another piece – and probably the most important piece if libraries are to be allowed to really innovate.

I agree with Ed Corrado’s wrapping up in his post Talis Podcast about OCLC WorldCat Record Use Policy with Karen Clahoun and Roy Tennant:

I believe Roy and other OCLC employees when they say that want to make it possible for libraries to “use their data in interesting new ways to become more effective, more efficient.” Roy and the other people I know who work for OCLC really do care about libraries. I just don’t see how the policy does this. While the current people speaking on behalf of OCLC may want to approve as many WorldCat record use requests as possible, they may not always be the ones making the decisions. This is why I want as much of these rights enumerated in the policy, instead of hiding behind a request form that “OCLC reserves the right to accept or reject any proposed Use or Transfer of WorldCat Records which, in OCLC’s reasonably exercised discretion, does not conform to the Policy for Use and Transfer of WorldCat Records.”

Karen must be praised for her incredible candor in her closing remarks. The policy, she says, is far from perfect and has to be so in order to protect OCLC’s business position. You see, OCLC face the classic innovator’s dilemma. To truly innovate they must cannibalize their own revenue stream. Normally when faced with the innovator’s dilemma an established company faces the prospect of someone else innovating faster than they can. This is what OCLC fears and is trying to prevent. Keeping the data locked away gives them time to innovate by preventing anyone else from damaging their revenue stream before they’re ready. The question you have to ask is how long do libraries have left to innovate their way out of decline and is it long enough for the OCLC tanker to turn itself around?

Karen herself gives us an answer in the podcast. She refers back to OCLC’s 2005 Perceptions of Libraries Report in which they say that 84% of information seekers start with a search engine. Libraries are in danger of being marginalised in the web environment, she says. The context of the answer is a discussion about the need for OCLC to act as a giant “internet switch” on the web, directing searches from the likes of Google Books to a local library.

In answer to the same question, why do libraries need a switch, Roy says:

Roy – @43:47 : I can’t imagine that search engines want a world where they go and crawl everyone’s library catalog and they end up with 5 million, you know, versions of Harry Potter. That just makes no sense whatsoever. I think from the perspective of both end users and search engines really what they are going to want is the kind of situation that we’ve been able to provide which is you know there is one place where you can go to for an item and then you get shunted down to the local library that has that item again very quickly and painlessly. I think back to the days when Gopher was around and the Veronica search engine and when people exposed their library catalogs that way it was horrifying. You would do a search in the system and you’d find a library in Australia had a book but you couldn’t do anything with that information. so I don’t think that’s the world we want to see necessarily. I wouldn’t want to see it.

Let’s just look at that opening sentence again.

Roy – @43:47 : I can’t imagine that search engines want a world where they go and crawl everyone’s library catalog…

Really? I would think that’s exactly what the search engines want. The web is a level playing field where anyone, anywhere can get the number one spot on any search engine. Not through being a big player with the budget to buy the top slot, but by being the most relevant result. Reconciling different references to the same concept is a core strength for the search engines. And that’s without even considering the disambiguation and clarification potential of web-based semantics. The switch that OCLC describe is an adequate way of addressing the problem that libraries have right now – a few dominant search engines, opacs that do not play nicely for search engines and a lot of the data in a central place at OCLC.

How does the OCLC model scale though? What about all the libraries that can’t be part of the OCLC game? OCLC wants to be the single, complete source for this data, but the barriers to entry (mostly cost) are too high for this to be possible. The barrier to publishing data on the web is very low, that’s one of the many great things about it. And seriously, Roy, are you really comparing the capabilities of Veronica with what Google, Yahoo and MSN do today? Have you seen SearchMonkey?

A few moments later, in response to a question about location information, Roy goes on to say

Roy – @45:12 : Oh boy, I’d sure like to see them try. I mean, again, I don’t think they’re even interested in that problem. Again, I don’t think they could do an effective job at it and I don’t think they would want to. You know the Google’s of the world are making deals with Amazon, you know, we’re not necessarily the folks that they really want to do business with. The fact that we’re big enough we can sit down and talk to them on behalf of our members I think is an important point. For us to think that individual libraries would have enough leverage to get that kind of attention I think is obviously ridiculous.

I’m not sure what Roy was getting at with this, but the search engines sure do seem interested both in little sites, like this blog, and in location data. While the guys at OCLC are smart, I’d put a whole heap of cash on Google being able to do location-based search ranking a whole lot more effectively than they can. Not sure? Google has mapped all the hairdressers on the web, and will show me hairdressers local to Bournville. There are no doubt many reasons search engines aren’t doing this kind of thing and more for libraries – the quality of the data presented by the opac is one reason. The restrictive agreements data providers like OCLC put on the libraries are another. Both of these issues can be fixed. A monopoly player to centralize and restrict access to all the data is not a necessary component for libraries to be a valuable part of the web.

Following my earlier post on OCLC’s Intellectual Property claims I was looking forward to hearing what OCLC had to say on this. I know that Richard had many questions about this sent in following his request for questions. This was Karen’s response…

Karen – @17:14 : Well, I know from reading the guidelines, which is pretty much the extent of what I know, that the whole issue of the copyrighting of the database goes back to 1982. I’m really not familiar with that history and I don’t know a whole lot about Copyright Law, so I really don’t feel knowledgeable enough to talk about all the details around the copyrighting of the database. I do know that the copyright is on the compilation as a whole and that’s about the extent of what I know. I also don’t have a legal background so I just don’t feel like I’m qualified to answer that. I have been forwarding all of the questions and commentary about that to our legal department and they are working on those issues, I’m not sure what will come of that but they are working on the commentary and the questions that they got.

Richard then asks specifically about the 1982 Copyright date. That precedes the Feist Publications v. Rural Telephone Service decision that both Jonathan Rochkind and I keep pointing to.

Karen – @18:56 : I don’t know a whole lot about it Richard. I can tell you, that having been the head of the database quality unit for so many years, OCLC makes a tremendous investment in WorldCat. It isn’t just a pile of records that we’ve gotten from the members. And I don’t mean to denigrate the value of those records in any way. As they come to OCLC and come into the database, over the years we have invested a very large effort in maintaining the quality of that database and even improving it. When I was in charge of the database quality group for example we wrote an algorithm, probably the world’s best algorithms at that time to automatically detect duplicate records and to merge them. It was an artificial intelligence approach at that time, very very state of the art, we also created a number of algorithmic methods for managing the forms of heading, doing automated authority control in WorldCat and we corrected millions of headings. Since my return I’ve become familiar with all the things that have come out of the office of research and been moved into production in worldcat that FRBRise the records in the database, that have created worldcat identities based on what we learned from doing that automated authority control back in the early 90s. So it’s really not the same database that we get from members, it’s really much improved, we continue to do a huge amount of work to make the database as valuable as it is. So we have a stake, not just the members have a stake in worldcat, OCLC is a big stakeholder and a curator of the worldcat database.

I am saddened that Karen, as Vice President WorldCat and Metadata Services for OCLC, should be so ill-prepared to answer questions on the intellectual property aspects of WorldCat. What is clear, though, is that Karen is true to her word about her level of understanding. Copyright is a temporary monopoly, an exclusivity, granted to the creator of something original and expressive. Legislatures all over the world developed Copyright as a means of encouraging creative expression by protecting the creator’s ability to make a living from it for a period of time. Feist Publications v. Rural Telephone Service is a crucial case as it specifically addresses the compilation right that Karen refers to. Not only that, but the court stated that the purpose of copyright in a compilation is not to reward the effort involved in collecting information, but to promote the progress of science and useful arts.

That is, the court does not want organizations to be able to monopolize data. They want people to be able to innovate freely.

What Karen describes is a vast amount of knowledge of the data and the domain and how to do fantastic stuff with it. Like I’ve said many, many times, OCLC has lots of smart people and they have an important part to play. I believe that part will earn them money, but there is no basis other than contract law under which they can prevent the propagation of the WorldCat data. That’s why they’re attempting to change the contract libraries operate under to include the previously voluntary guidelines.

But OCLC’s business model is what needs to change, not its contract with libraries. It’s Schroedinger’s WorldCat: it is both alive and dead at the same time, and as long as they can keep the lid of the box shut nobody knows for sure which it is. The library world doesn’t need a cat in a box, it needs a free cat.

There is so much more to talk about in this podcast. You have to listen to it. Also, some must-read posts:

Annoyed Librarian: How I Learned to Stop Worrying and Love OCLC

To use a prison metaphor, it’s clear that librarians dropped the soap decades ago.

Karen Coyle’s Metalogue (the comments)

Jonathan Rochkind: more OCLC

The most important negative part of the policy, which it doesn’t sound like they discussed much in the interview (?) is that any use is prohibited which “substantially replicates the function, purpose, and/or size of WorldCat.” That means that clearly OCLC would deny permission for uses they believe to be such, but also that OCLC is asserting that with or without such an agreement, such use is prohibited, by libraries or by anyone else.

Ed Corrado: Talis Podcast about OCLC WorldCat Record Use Policy with Karen Clahoun and Roy Tennant

One of the key things that Karen and Roy repeated a few times during the podcast (and OCLC people have mentioned previously in other venues) is that the goal with this policy is to drive traffic to libraries museums and archives. They also have repeated that they hope it will make it easier for libraries, museums, and archives to use their data. It is not that I am not hearing them on this second point, but I still do not see how this “tiger’s role (territorial and instinctive)” approach accomplishes this.

Stefano Mazzocchi: Rule #1 for Surviving Paradigm Shifts: Don’t S**t Where You Eat

You could think of OCLC like a Wikipedia for library cards, but there is one huge difference: there is no freedom to fork. Basically, by using OCLC’s data you agree to protect their existence.
