The records in a university library catalogue typically have many different origins: created by the library, obtained from a national library or a book supplier etc. So, who ‘owns’ them? And what are the legal implications of making them available to others when this involves copying, transferring them into different formats, etc.?
The JISC has just commissioned a study to explore some of these issues as they apply to UK university libraries and to provide practical guidance to library managers who may be interested in making their catalogue records available in new ways. Outcomes are expected by the end of 2009.
The proceedings jumped the line to farce when Fritz Attaway and a colleague from the MPAA pulled out a cinematic demonstration of just how to camcord a movie from your television screen. (You start with a $900 HD video camera, a tripod, a flat-screen television, and a room that can be completely darkened.) Tim Vollmer captured the whole scene on a video of his own. Mind you, this is the same industry that has lobbied to make a crime of camcording in movie theaters, telling us how to frame shots properly from the television. (As Fred Benenson notes, they’re also demonstrating DRM’s impossibility of closing the “analog hole.”)
In the short run, the Google Book Search settlement will unquestionably bring about greater access to books collected by major research libraries over the years. But it is very worrisome that this agreement, which was negotiated in secret by Google and a few lawyers working for the Authors Guild and AAP (who will, by the way, get up to $45.5 million in fees for their work on the settlement—more than all of the authors combined!), will create two complementary monopolies with exclusive rights over a research corpus of this magnitude. Monopolies are prone to engage in many abuses.
The Book Search agreement is not really a settlement of a dispute over whether scanning books to index them is fair use. It is a major restructuring of the book industry’s future without meaningful government oversight. The market for digitized orphan books could be competitive, but will not be if this settlement is approved as is.
A court in Sweden has jailed four men behind The Pirate Bay (TPB), the world’s most high-profile file-sharing website, in a landmark case.
The French passed a law that forces ISPs to withdraw internet access based upon accusations of infringement by Copyright holders.
This is what many in New Zealand protested about and got delayed if not completely withdrawn.
Update: The NZ government have suspended the introduction of Section 92a (via MiramarMike)
An interesting campaign to ‘blackout’ your online presence to campaign for change to one of NZ’s clauses started today. Protest Against Guilt Upon Accusation Laws in NZ — Creative Freedom Foundation (creativefreedom.org.nz). I spotted this a few days ago thanks to Mike Brown who tweeted about it.
What’s interesting about the law is that it changes the presumption of guilt quite significantly. Currently in most Copyright jurisdictions if someone is infringing your copyright then the first thing you’d do (after asking them politely to stop) is take out an injunction against them. This involves persuading the court that you have enough of a case that the (alleged) infringer should be told to stop until the case is heard. The bar for getting an injunction is, then, quite high.
What Section 92 of the Copyright Amendment Act does is compels ISPs (and there is a broad definition of that term in the law) to take down sites or revoke internet access when an accusation of infringement is made. The clause looks like this:
Internet service provider liability
92A Internet service provider must have policy for terminating accounts of repeat infringers
- (1) An Internet service provider must adopt and reasonably implement a policy that provides for termination, in appropriate circumstances, of the account with that Internet service provider of a repeat infringer.
- (2) In subsection (1), repeat infringer means a person who repeatedly infringes the copyright in a work by using 1 or more of the Internet services of the Internet service provider to do a restricted act without the consent of the copyright owner.
The potential downsides of a law like this are many, but one of the biggest is the impact it is likely to have on fair-use. Fair use is not explicitly defined and is clarified by case law, though some common examples are often postulated – parody, criticism, illustration are often quoted. There are also tests around the commercial impact of the use.
What this means is that Copyright is not absolute, it’s a negotiation between creators and the state to strike a balance that is most effective for the country’s cultural and economic prosperity. This clause changes that for internet based uses by preventing that negotiation and also by makes people more fearful by increasing the immediate penalty for an accusation of infringement from very little to the loss of internet service. That could be enough to close many small businesses.
The reasoning behind the bill is one of practicality. Those with large catalogs of Copyright works, such as the music labels, are having a really tough time preventing copying on the internet (because teh internet is one big copying machine). The reason is that the current laws make pursuing people difficult and expensive as the RIAA have found out in the states. The solution in Section 92, though, may be a little heavy handed. ISPs are likely to comply with the law, and the cheapest thing for them to do is simply take down anything they’re asked to. ISPs are a commodity, they don’t have big profit margins to use up helping you keep your content up online.
It is easier for ISPs, Internet Service Providers, to cut off anyone who might be breaking the law.
Now, this seems to be a more and more common perception. That it would be too much trouble to ask a copyright holder to file suit and that ISPs look perfectly placed to handle issues. What that misses though is that ISPs are not at all equipped to perform any kind of arbitration, so with an individual customer on one side and a large, wealthy corporate lawyer on the other the ISP will always play it safe.
If this were happening in the US then I wouldn’t even have blogged about it, but it seems odd to me that this is happening at almost exactly the same time, and in the same city, as Webstock, one of the best web conferences in the world.
NB: This is my own blog. The opinions I publish do not necessarily reflect those of my employer. I am not a lawyer, but I did ask James Grimmelmann for his thoughts.
Over on Metalogue, Karen Calhoun has been clarifying OCLC’s thinking behind its intention to change the usage policy for records sourced from WorldCat. It’s great to see OCLC communicating this stuff, albeit a tad late given the furore that had already ensued. The question still remains though, are they right to be doing what they are?
Firstly, in the interest of full disclosure, let me make it perfectly clear that I work for Talis. I enjoy working for Talis and I agree with Talis’s vision. I have to say that because Karen is clearly not happy with us:
OCLC has been severely criticized for its WorldCat data sharing policies and practices. Some of these criticisms have come from people or organizations that would benefit economically if they could freely replicate WorldCat.
This isn’t the first time I’ve talked about OCLC’s business model. I wrote an open letter to Karen Calhoun some time ago, talking about the issues of centralised control. The same concerns raise themselves again now. I feel there are several mis-conceptions in what Karen writes that I would like to offer a different perspective on.
First off, OCLC has no right to do this. That sounds all moral and indignant. I don’t mean it that way. What I mean is, they have literally no right in law – or at least only a very limited one.
Karen talks a lot about Creative Commons in her note, it’s apparent that they even considered using a Creative Commons license
And yes, while we considered simply adopting a Creative Commons license, we chose to retain an OCLC-specific policy to help us re-express well-established community practice from the Guidelines.
There is an important thing to know about CC. Applying a Creative Commons License to data is utterly worthless. It may indicate the intent of the publisher, but has absolutely no legal standing. This is because CC is a license scheme based on Copyright. Data is not protected by Copyright. The courts settled this in Feist Publications v. Rural Telephone Service.
This means that when Karen Coombs asks for several rights for the data:
1. Perpetual use – once I’ve downloaded something from OCLC I’ve for the right to use it forever period end of story. This promotes a bunch of things including the LOCKSS principle in the event something happens to OCLC
2. Right to share – records I’ve downloaded I’ve got the right to share with others
This means share in any fashion which the library sees fit, be it Z39.50 access, SRU/W, OAI, or transmission of records via other means
3. Right to migrate format – Eventually, libraries may stop using MARC or need to move records into a non-MARC system. So libraries need the right to transform their records
it is simply a matter of the members telling OCLC that’s how it’s gonna be. For those not under contract with OCLC – you have these rights already!
Therein lies the nub of OCLC’s problem. In Europe the database would be afforded legal protection simply by virtue of having taken effort or investment to create, the so called sui-generis right. US law does not have any such protection for databases. I know this because I was heavily involved in the development of the Open Data Commons PDDL and a real-life lawyer told me.
So, other legal remedies that might be used to enforce the policy could include a claim for misappropriation – reaping where one has not sown. This would be under state, rather than federal, law. Though NBA v. Motorola suggests that misappropriation may only apply if for some reason OCLC were unable to continue their service as a result. James Grimmelmann tells me
RS: If I understand correctly that would mean the only option left for enforcing restrictions on the use of the data would be contractual. Have I missed something obvious?
JG: I could see a claim for misappropriation under state law — OCLC has invested effort in creating WorldCat, and unauthorized use would amount to “reaping where one has not sown,” in the classic phrase from INS v. AP. I doubt, however, that such a claim would succeed, since misappropriation law is almost completely preempted by copyright. Recent statements of misappropriation doctrine (e.g., NBA v. Motorola) suggest that it might remain available only where the plaintiff’s service couldn’t be provided at all if the defendant were allowed to do what it’s doing. I don’t think that applies here. So you’re right, it’s only contractual.
Without any solid legal basis on which to build a license directly, the policy falls back to being simply a contract – and with any contract you can decide if you wish to accept it or not. That, I suspect, is why OCLC wish to turn the existing guidelines into a binding contract.
So, OCLC members have the choice as to whether or not they accept the terms of the contract, but what about OpenLibrary? Some have suggested that this change could scupper that effort due to the viral nature of the reference to the usage policy in the records ultimately derived from WorldCat.
Nonsense. This is a truck load of FUD created around the new OCLC policy. Those talking about this possibility are right to be concerned, of course, as that may well be OCLC’s intent, but it doesn’t hold water. Given that the only enforcement of the policy is as a contract, it is only binding on those who are party to the contract. If OpenLibrary gets records from OCLC member libraries the presence of the policy statement does not create a contract, so OpenLibrary would not be considered party to the contract and not subject to enforcement of it. That is, if they haven’t signed a contract with OCLC this policy means nothing to them. They are under no legal obligation to adhere to it.
This is why OCLC are insisting that everyone has an upfront agreement with them. They know they need a contract. James Grimmelmann, who confirmed my interpretations of US Law for me said this in his reply this morning
JG: Let me add that it is possible for entities that get records from entities that get records from OCLC to be parties to OCLC’s contracts; it just requires that everyone involved be meticulous about making everyone else they deal with agree to the contract before giving them records. But as soon as some entities start passing along records without insisting on a signature up front, there are players in the system who aren’t bound, and OCLC has no contractual control over the records they get.
Jonathan Rochkind also concludes that OCLC’s focus on Copyright is bogus:
All this is to say, the law has changed quite a bit since 1982. If OCLC is counting on a copyright, they should probably have their legal counsel investigate. I’m not a lawyer, it doesn’t seem good to me–and even if they did have copyright, I can’t see how this would prevent people from taking sets of records anyway, as long as they didn’t take the whole database. But I’m still not a lawyer.
This is OCLC’s fear, that the WorldCat will get out of the bag.
The comparisons with other projects that use licenses such as CC or GFDL, and even open-source licenses are also entirely without merit.
To understand why we have to understand the philosophy behind the use of licenses. In OCLC’s case the intention is to restrict the usage of the data in order to prevent competing services from appearing. In the case of wikipedia and open-source projects the use of licenses is there to allow the community to fork the project in order to prevent monopoly ownership – i.e. to allow competing versions to appear. There are many versions of Linux, the community is better for that, the good ones thrive and the bad ones die. When a good one goes bad others rise up to take its place, starting from a point just before things went bad. If this is what OCLC want they must allow anyone to take the data, all of it, easily and create a competing service – under the same constraints, that the competing service must also make its data freely available. That’s what the ODC PDDL was designed for.
The reason this works in practice is that these are digital goods, in economic terms that means they are non-rival – if I give you a copy I still have my own copy, unlike a rival good where giving it to you would mean giving it up myself. OCLC has built a business model based on the notion that its data is a rival good, but the internet, cheap computing and a more mature understanding shows that to be broken.
Jonathan Rochkind also talk about a difference in intent in criticising OCLC’s comparison with Creative Commons:
But there remains one very big difference between the CC-BY-NC license you used as a model, and the actual policy. Your actual policy requires some recipients of sharing to enter into an agreement with OCLC (which OCLC can refuse to offer to a particular entity). The CC-BY-NC very explicitly and intentionally does NOT require this, and even _removes_ the ability of any sharers to require this.
This is a very big difference, as the entire purpose of the CC licenses is to avoid the possibility of someone requiring such a thing. So your policy may be like CC-BY-NC, while removing it’s very purpose.
Striving to prevent the creation of an alternative database is anti-competitive, reduces innovation and damages the member libraries in order to protect OCLC corp.
Their [OCLC's record usage guidelines] stated rationale for imposing conditions on libraries’ record sharing is that “member libraries have made a major investment in the OCLC Online Union Catalog and expect other member libraries, member networks and OCLC to take appropriate steps to protect the database.”
This makes no sense. The investment has been made now. The money is gone. What matters now is how much it costs libraries to continue to do business. Those costs would be reduced by making the data a commodity. Several centralised efforts have the potential to do just that, but the internet itself has that potential too, a potential OCLC has been working against for a long time. Their fight has taken the form of asking member libraries and software authors like Terry Reese not to upset the status quo by facilitating easy access to the Z39.50 network and now this change to the policy.
What underlies this is a lack of trust in the members. OCLC know that if an alternative emerged its member libraries would move based on merit, and OCLC clearly doesn’t believe it could compete on that level playing field. They are saying that they require a monopoly position in order to be viable.
However, what’s good for members and what’s good for OCLC are not one and the same thing. Members’ investment would be better protected by ensuring that the data is as promiscuously copied as possible. If members were to force OCLC to release the entire database under terms that ensure anyone who takes a copy must also make that copy available to others under the same terms then competition and market would be created. Competition and market are what drive innovation both in features and in cost reduction. In fact, it would create exactly the kind of market that has caused US legislators to refuse a database right, repeatedly. Think about it.
Above all, don’t be fooled that this data is anything but yours. The database is yours. All of yours.
If WorldCat were being made available in its entirety like this, it would be entirely reasonable to put clauses in to ensure any union catalogs taking the WorldCat data had to also publish their data reciprocally. That route leads us to a point where a truly global set of data becomes possible – where World(Cat) means world rather than predominantly affluent American libraries.
Surely OCLC, with its expertise in service provision, its understanding of how to analyse this kind of data, its standing in the community and not to forget its substantial existing network of libraries and librarians would continue to carve out a substantial and prestigious role for itself?
I’ve met plenty of folks from OCLC and they’re smart. They’ll come up with plenty of stuff worth the membership fee – it just shouldn’t be the data you already own.
Just blogged for work about Thomson Reuters suing George Mason University over EndNote.
It seems slightly surprising that an organisation like Thomson Reuters, who are doing cool stuff elsewhere with projects like Calais, would be asking for $10 million from an institution like George Mason University.
Unsurprisingly neither the George Mason site nor the Thomson Reuters site have any information regarding the dispute.
Various news sites are reporting on an interesting Copyright claim going through the New York courts right now. The BBC Says:
Author JK Rowling is to testify in a New York court this week over plans to publish an unofficial Harry Potter encyclopaedia.
The case centers on the suggestion that the encyclopaedia, composed from material on fan-site The Harry Potter Lexicon and original work by the author, Steve Vander Ark.
The Wall Street Journal’s Law Blog has a very fair-use centric piece written by Dan Slater, while others pit their language as "JK Rowling in bid to defend Copyright". And this distinction is at the heart of the matter. To what extent does JK own the notion of Harry Potter and everything in the world she created and to what extent does she only own the books themselves.
The case will be bringing up some interesting arguments about just what Copyright protects.
The first steps of the Semantic Web are now a short distance behind us and some organisations are starting to pick up the pace. With more and more data coming online, marked up for linking and sharing in a web of data, perhaps it’s time to look again at the trade-off of different intellectual property rights.
Back in November of 2004 James Boyle published A Natural Experiment in the Financial Times. This piece sees him debating the merits of intellectual property rights over data with Thomas Hazlett and Richard Epstein. His primary thrust is that we should be making policy decisions in this area based on empirical data about the economic benefits one way or another. Something all three protagonists agree on.
Much has changed between 2004 and now, not least our understanding of how the web can affect the way we collaborate, share, communicate; it fundamentally affects the way we live. We chat, we blog, we Twitter, we Flickr and we Joost. Content flows from person to person in unprecedented ways and at unprecedented speeds. This changes the nature of the experiment that Boyle talks about.
If the database right were working, we would expect positive answers to three crucial questions. First, has the European database industry’s rate of growth increased since 1996, while the US database industry has languished? [...] Second, are the principal beneficiaries of the database right in Europe producing databases they would not have produced otherwise? [...] Third, [...] is the right promoting innovation and competition rather than stifling it?
Boyle’s first two questions centre around the creation of databases and his third, by his own admission, is difficult to measure. If one of our primary goals for the growth of the Internet is to have a web of data that can be linked and accessed across the globe we may be better served by assessing how companies might make data open.
Boyle asks for, and discusses, the empirical evidence of databases being created in the EU and US. The differences in numbers should provide insight into the economic ups and downs as the EU adopted a robust database right in 1996 while the US ruled against such protection in 1991. I am interested in how we expect the growth of data on the Semantic Web to differ in the two jurisdictions.
Boyle explains that the US Chamber of Commerce oppose the creation of a database right in the US
[The US Chamber of Commerce] believe that database providers can adequately protect themselves with contracts, technical means such as passwords, can rely on providing tied services and so on.
And therein lies the rub. Without appropriate protection of intellectual property we have only two extreme positions available: locked down with passwords and other technical means; or wide open and in the public-domain. Polarising the possibilities for data into these two extremes makes opening up an all or nothing decision for the creator of a database.
With only technical and contractual mechanisms for protecting data, creators of databases can only publish them in situations where the technical barriers can be maintained and contractual obligations can be enforced.
We don’t tolerate this with creative works, our photographs, our blog posts and so on. Why would we expect it to make sense for databases? Whether or not it makes sense comes down to whether or not it is beneficial to society. We allow Copyright in order to provide adequate remuneration to be collected by the creator of a work. We allow patents to allow the recovery of development costs for an invention. Which is database right more like?
Patent is a very broad monopoly. If I had a patent on the clock, a mechanical means of measuring the passing of time, nobody else would be able to make clocks. Copyright, on the other hand is much narrower only allowing me to protect the specific design of my clocks. This is where it can get confusing with databases. Database right in the EU is like Copyright. It is a monopoly, but only on that particular aggregation of the data. The underlying facts are still not protected and there is nothing to stop a second entrant from collecting them independently.
Richard Epstein points to this in his contribution
The question is why do databases fall outside [the general principle of copyright], when the costs of compilation are in many cases substantial for the initial party and trivial for anyone who receives judicial blessing to copy the base? In answering this question, it will not do to say, as the Supreme Court said in the well known decision in Feist Publications v. Rural Telephone Service, (1991) that these compilations are not “original” in the sense that it requires no thought to check the spelling of the entries and to put them all in alphabetical order. But that obvious point should be met with an equally obvious rejoinder. If it requires no thought or intelligence to put the information together, then why not ask the second entrant into the market to go through the same drudge work as the first.
This is exactly what we see happening with Open Street Map. Ordnance Survey in the UK have rights over the map data they have collected. The protection covers the collection of geospatial data that they have created, they are not granted a monopoly in geospatial data.
This leaves a special case of databases, those which are created at low cost as a by-product of normal business. Examples used in Boyle’s article are telephone numbers, television schedules and concert times. Boyle gives us the answer directly
the [European] court ruled that the mere running of a business which generates data does not count as “substantial investment” enough to trigger the database right.
This reminds me strongly of The Smell of Food and the Sound of Coins a folk tale in which a wise judge decides that a restaurateur may charge for the smell of food wafting from his restaurant, however the appropriate price is the sound of coins chinking together.
That a database right may not and should not apply in all cases, and that there is a requirement to restrict anti-competitive practices, does not necessarily extend to the conclusion that a right is not required.
It seems to me that much of the debate around intellectual property rights has focussed on how they are used to keep things closed. Having suggested earlier that we have only the abilities to keep databases locked away or in contrast open them completely, I’d like to consider what it might mean to have a database right for keeping things open.
In response to Thomas Hazlett’s contribution Boyle asks
How many databases are now created and maintained entirely “free” and thus escape commercial directories altogether? There are obviously many, both in the scientific and the consumer realm. One can no more omit these from consideration, than one can omit free software from the software market.
This strikes me as a great comparison to consider. Taking one of the most prevalent free software licenses, the Gnu Public License, what might that look like for data?
One of the primary functions of the GPL is that it enforces Copyleft – the requirement to license derivative, and even complimentary, works under an the same license. That is, any commercial software that makes use of GPL code must, under the terms of the license, also be released under the GPL. The viral nature of this license is possible only because of the backing of Copyright.
Without a database right communities have no mechanism to publish openly and still insist upon this kind of Share-Alike agreement.
Consider the impact of this for situations where you you might use the idea of promiscuous copying to maintain the availability of data. Promiscuous copying relies on two things, lots of copies being made and lots of copies being available. Without the necessary licensing in place there is no mechanism with which to compel those who have copies to make those available. Public Domain means, by definition, no restriction – that means I can lock it away again.
Copyleft is just one position along a spectrum where ‘locked away’ and ‘free as a bird’ sit at each end. What the web shows us is that other business models form crucial parts of the eco-system. Epstein picks up on the controlling aspect of Boyle’s argument:
They can control their list of subscribers; give them each passwords; charge them based on the amount of the information that is used, or some other agreed-upon formula; and require them not to sell or otherwise transfer the information to third parties without the consent of the data base owner.
Imagine if this were true of Copyright material on the web? It has been, and still is on the occasional site. But mostly copyright owners are starting to see the value of publishing content online and they are underpinning the delivery of that content to consumers with other business models. Without Copyright the types of business that could participate would be reduced.
Epstein goes on to say:
The contractual solution is surely preferable, because general publication will allow for use by others that may not offend the copyright law, but which will block the possibility of payment for the costly information that is supplied.
And again, the very heart of the matter. If we are to encourage those who have large databases to make them open, to post them on the Semantic Web, we must provide them with models and solutions that are preferable to technical barriers and restrictive contracts. Allowing them to pick their own position on the spectrum seems to me to be a necessity in that. You can see any form of protection in two lights. When Boyle says
They make inventors disclose their inventions when they might otherwise have kept them secret.
They allow inventors to disclose their inventions when they might otherwise have had to keep them secret.
That’s why we’ve invested in a license to do this, properly, clearly and in a way that stays Open.
Rob Styles is Programme Manager for Data Services at Talis, a UK company building Semantic Web technologies. Rob Styles is not a lawyer.