There is no “metadata”

For a while I’ve been avoiding using the term metadata for a few reasons. I’ve had a few conversations with people about why and so I thought I’d jot the thoughts down here.

First of all, the main reason I stopped using the term is that it means too many different things. Wikipedia recognises metadata as an ambiguous term:

The term metadata is an ambiguous term which is used for two fundamentally different concepts (types). Although the expression “data about data” is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at design time the application contains no data. In this case the correct description would be “data about the containers of data”. Descriptive metadata, on the other hand, is about individual instances of application data, the data content. In this case, a useful description (resulting in a disambiguating neologism) would be “data about data content” or “content about content” thus metacontent. Descriptive, Guide and the National Information Standards Organization concept of administrative metadata are all subtypes of metacontent.

and even within the world of descriptive metadata the term is used in many different ways.

I have always found a better, more accurate, complete and consistent term, such as catalogue, provenance, audit, licensing and so on. I haven’t come across a situation yet where a more specific term hasn’t helped everyone understand the data better.

Data is just descriptions of things and if you say what aspects of a thing you are describing then everyone gets a better sense of what they might do with that. Once we realise that data is just descriptions of things, written in a consistent form to allow for analysis, we can see the next couple of reasons to stop using metadata.

Meta is a relative term. Ralph Swick of W3C is quoted as saying:

What’s metadata to you, is someone else’s fundamental data.

That is to say, whether you consider something meta or not depends totally on your context and the problem you’re trying to solve. Often several people in the room will consider this differently.

If we combine that thought with the more specific naming of our data then we get the ability to think about descriptions of descriptions of descriptions. Which brings me on to something else I observe. By thinking in terms of data and metadata we talk, and think, in a vocabulary limited to two layers. Working with Big Data and Graphs I’ve learnt that’s not enough.

Taking the example of data about TV programming from today’s RedBee post we could say:

  1. The Mentalist is a TV Programme
  2. The Mentalist is licensed to Channel 5 for broadcast in the UK
  3. The Mentalist will be shown at 21.00 on Thursday 12 April 2012

Statement 2 in that list is licensing data, statement 3 is schedule data. This all comes under the heading of descriptive metadata. Now, RedBee are a commercial organisation who put constraints on the use of their data. So we also need to be able to say things like

  • Statements 1, 2 and 3 are licensed to BBC for competitor analysis

This statement is also licensing data, about the metadata… So what is it? Descriptive metametadata?

Data about data is not a special case. Data is just descriptions of things and remains so whether the things being described are people, places, TV programmes or other data.
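As a rough illustration of that point, here is a minimal Python sketch in which a statement can describe either a plain thing or another statement, so there is no separate "metadata" type at all. The `Statement` class, its field names, the `depth` helper and the "catalogue" label for statement 1 are all my own assumptions for the example, not any particular library or RedBee's actual model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Statement:
    """A description of a thing; the thing may itself be a Statement."""
    subject: Union[str, "Statement"]
    predicate: str
    obj: str
    kind: str  # what aspect is described, e.g. "licensing" (hypothetical labels)


def depth(s: Statement) -> int:
    """Count how many layers of statements sit beneath this one."""
    return 1 + depth(s.subject) if isinstance(s.subject, Statement) else 0


# Statements 1 to 3 from the list above ("catalogue" is an assumed label).
s1 = Statement("The Mentalist", "is a", "TV Programme", "catalogue")
s2 = Statement("The Mentalist", "is licensed to",
               "Channel 5 for broadcast in the UK", "licensing")
s3 = Statement("The Mentalist", "will be shown at",
               "21.00 on Thursday 12 April 2012", "schedule")

# The licence on the statements themselves: same class, same kind of data.
about = [Statement(s, "is licensed to", "BBC for competitor analysis", "licensing")
         for s in (s1, s2, s3)]

# Selecting by the specific term works across every layer.
licensing = [s for s in (s1, s2, s3, *about) if s.kind == "licensing"]
```

Filtering on `kind == "licensing"` picks up the Channel 5 licence and the BBC licences alike, and `depth` shows that nothing limits the nesting to two layers.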

That’s why I try to replace the term metadata with something more useful whenever I can.

Building a simple HTTP-to-Z39.50 gateway using Yaz4j and Tomcat | Index Data

Yaz4J is a wrapper library over the client-specific parts of YAZ, a C-based Z39.50 toolkit, and allows you to use the ZOOM API directly from Java. Initial version of Yaz4j has been written by Rob Styles from Talis and the project is now developed and maintained at IndexData. ZOOM is a relatively straightforward API and with a few lines of code you can write a basic application that can establish connection to a Z39.50 server. Here we will try to build a very simple HTTP-to-Z3950 gateway using yaz4j and the Java Servlet technology.

from Building a simple HTTP-to-Z39.50 gateway using Yaz4j and Tomcat | Index Data.

I wrote Yaz4J a couple of years ago now and it’s great to see it getting some use outside of Talis.

Good roundup of new eReaders at CES on CNN

Las Vegas, Nevada (CNN) — The first generation of electronic readers had little more than black-and-white text. The second generation had black-and-white text, simple graphics and Web connectivity.

Glimpses of the third generation are on display this week at the International Consumer Electronics Show, where manufacturers are previewing e-readers with color screens, interactive graphics and magazine-style layouts.

from Bold new e-readers grab attention at CES –

ShelterIt – My digital think-tank: On identity

Did you notice what just happened? I used used an URI as an identifier for a subject. If you popped that URI into your browser, it will take you to WikiPedia’s article on the book and provide a lot of info there in human prose about this book, and this would make it rather easy for Bob to say that, yes indeed, that’s the same book I’ve got. So now we’ve got me and Bob agreeing that we have the same book.

from ShelterIt – My digital think-tank: On identity.

Great piece by Alexander Johannesen about the future of library data, semantic web and the difficulties of getting from here to there.

yaz4j | Index Data

yaz4j is a toolkit for Java which includes a wrapper for the ZOOM API of YAZ. This allows developers to write Z39.50/SRU clients in Java. yaz4j supports both search and scan. See the javadoc for details.

from yaz4j | Index Data.

I wrote Yaz4J a couple of years ago when I needed a robust Z39.50 client. The underlying work is done by Index Data’s Yaz library, wrapped for use in Java using JNI (and yes, JNI does work fine and yes it does work cross-platform, we have it running on Linux, Windows and OS X). I hadn’t ever found the time to properly structure and mavenise the code or release it properly so it’s very pleasing that Adam Dickmeiss and Mike Taylor from Index Data along with Juan Cayetano have tidied it all up and published it under a home on Index Data’s site.


Conversation with Bruce D’Arcus on Motivation for MODS Ontology « Musings

The problem from my standpoint is that MODS has some really odd, library-specific, design choices that I don’t think map very well to the wider world. A central concept like mods:name, with mods:role as a child of that, really makes no sense, and conflicts with more common modeling you see in DC, FRBR ,etc.

It’s semantics are also really loose.

So you have to ask yourself, just how linked could a MODS view in RDF really be?

from Conversation with Bruce D’Arcus on Motivation for MODS Ontology.

Panlibus » Blog Archive » Library of Congress launch Linked Data Subject Headings

Agree with this summary from Richard:

On the surface, to those not yet bought in to the potential of Linked Data, and especially Linked Open Data, this may seem like an interesting but not necessarily massive leap forward. I believe that what underpins the fairly simple functional user interface they provide will gradually become core to bibliographic data becoming a first-class citizen in the web of data.

Overnight this uri ‘’ has now become the globally available, machine and human readable, reliable source for the description for the subject heading of ‘Elephants’ containing links to its related terms (in a way that both machines and humans can navigate). This means that system developers and integrators can rely upon that link to represent a concept, not necessarily the way they want to [locally] describe it. This should facilitate the ability for disparate systems and services to simply share concepts and therefore understanding – one of the basic principles behind the Semantic Web.

from Panlibus » Blog Archive » Library of Congress launch Linked Data Subject Headings.

Great to see LoC doing this stuff and getting it out there.

Why you can't find a library book in your search engine | Technology | The Guardian

Wendy Grossman, in The Guardian, covers the difficulties of libraries publishing their catalogue data online.

Despite the internet’s origins as an academic network, when it comes to finding a book, e-commerce rules. Put any book title into your favourite search engine, and the hits will be dominated by commercial sites run by retailers, publishers, even authors. But even with your postcode, you won’t find the nearest library where you can borrow that book. (The exception is Google Books, and even that is limited.)

via Why you can’t find a library book in your search engine | Technology | The Guardian.

I get a namecheck and a quote at the end:

Rob Styles, a programme manager for Talis’s data services, says: “The main reason I think libraries need freedom to innovate is because we don’t know what they’re going to look like”.