There is no “metadata”

For a while I’ve been avoiding using the term metadata for a few reasons. I’ve had a few conversations with people about why and so I thought I’d jot the thoughts down here.

First of all, the main reason I stopped using the term is because it means too many different things. Wikipedia recognises metadata as an ambiguous term

The term metadata is an ambiguous term which is used for two fundamentally different concepts (types). Although the expression “data about data” is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at design time the application contains no data. In this case the correct description would be “data about the containers of data”. Descriptive metadata, on the other hand, is about individual instances of application data, the data content. In this case, a useful description (resulting in a disambiguating neologism) would be “data about data content” or “content about content” thus metacontent. Descriptive, Guide and the National Information Standards Organization concept of administrative metadata are all subtypes of metacontent.

and even within the world of descriptive metadata the term is used in many different ways.

I have always found a better, more accurate, complete and consistent term. Such as catalogueprovenanceauditlicensing and so on. I haven’t come across a situation yet where a more specific term hasn’t helped everyone understand the data better.

Data is just descriptions of things and if you say what aspects of a thing you are describing then everyone gets a better sense of what they might do with that. Once we realise that data is just descriptions of things, written in a consistent form to allow for analysis, we can see the next couple of reasons to stop using metadata.

Meta is a relative term. Ralph Swick of W3C is quoted as saying

What’s metadata to you, is someone else’s fundamental data.

That is to say, wether you consider something meta or not depends totally on your context and the problem you’re trying to solve. Often several people in the room will consider this differently.

If we combine that thought with the more specific naming of our data then we get the ability to think about descriptions of descriptions of descriptions. Which brings me on to something else I observe. By thinking in terms of data and metadata we talk, and think, in a vocabulary limited to two layers. Working with Big Data and Graphs I’ve learnt that’s not enough.

Taking the example of data about TV programming from todays RedBee post we could say:

  1. The Mentalist is a TV Programme
  2. The Mentalist is licensed to Channel 5 for broadcast in the UK
  3. The Mentalist will be shown at 21.00 on Thursday 12 April 2012

Statement 2 in that list is licensing data, statement 3 is schedule data. This all comes under the heading of descriptive metadata. Now, RedBee are a commercial organisation who put constraints on the use of their data. So we also need to be able to say things like

  • Statements 1, 2 and 3 are licensed to BBC for competitor analysis

This statement is also licensing data, about the metadata… So what is it? Descriptive metametadata?

Data about data is not a special case. Data is just descriptions of things and remains so wether the things being described are people, places, TV programmes or other data.

That’s why I try to replace the term metadata with something more useful whenever I can.

Getting over-excited about Dinosaurs…

I had the great pleasure, a few weeks ago, of working with Tom Scott and Michael Smethurst at the BBC on extensions to the Wildlife Ontology that sits behind Wildlife Finder.

In case you hadn’t spotted it (and if you’re reading this I can’t believe you haven’t) Wildlife Finder provides its information in HTML and RDF — Linked Data, providing a machine-readable version of the documents for those who want to extend or build on top of it. Readers of this blog will have seen Wildlife Finder showcased in many, many Linked Data presentations.

The initial data modelling work was a joint venture between Tom Scott of BBC and Leigh Dodds of Talis and they built an ontology that is simple, elegant and extensible. So, when I got a call asking if I could help them add Dinosaurs into the mix I was chuffed — getting paid to talk about dinosaurs!

Like most children, and we’re all children really, I got over-excited and rushed up to London to find out more. Tom and I spent some time working through changes and he, being far more knowledgeable than I on these matters, let me down gently.

Dinosaurs, of course, are no different to other animals in Wildlife Finder — other than being dead for a while longer…

This realisation made me feel a little below average in the biology department I can tell you. It’s one of those things you stumble across that is so obvious once someone says it to you and yet may well not have occurred to you without a lot of thought.


Schneier on Security: A Taxonomy of Social Networking Data

A Taxonomy of Social Networking Data

At the Internet Governance Forum in Sharm El Sheikh this week, there was a conversation on social networking data. Someone made the point that there are several different types of data, and it would be useful to separate them. This is my taxonomy of social networking data.

from Schneier on Security: A Taxonomy of Social Networking Data.

Follow the link for a useful breakdown of data in any community site or service.

Conversation with Bruce D’Arcus on Motivation for MODS Ontology « Musings

The problem from my standpoint is that MODS has some really odd, library-specific, design choices that I don’t think map very well to the wider world. A central concept like mods:name, with mods:role as a child of that, really makes no sense, and conflicts with more common modeling you see in DC, FRBR ,etc.

It’s semantics are also really loose.

So you have to ask yourself, just how linked could a MODS view in RDF really be?

from Conversation with Bruce D’Arcus on Motivation for MODS Ontology.

Multi-Tenant Configuration Schema

Are you writing multi-tenant software? Are you using RDF at all? Do you want to keep track of your tenants?

You might want to comment on the first draft of the new Multi-Tenant Configuration Schema.

This schema attempts to describe a simple set of concepts and relationships about tenants within a multi-tenant software system. It avoids anything that would constitute application configuration, but will happily co-exist with classes and properties to do that. The documentation is sparse currently, awaiting questions and comment so that I can expand on areas that require further explanation. Comment here, or email me.

Resource List Ontology

Working on our current project, Nad and I have been modelling resource lists – think course reading lists, but containing more than just reading material. A list is simply a collection of resources grouped into sections and annotated by the academic to give students guidance when using the list.

A resource list can take much more than just books. We’re using Bibliontology to model the resources, so more or less anything can be accurately described. We’ve requested a couple of additions and are still debating those.

The structure of the list is described using a new ontology which we’ve published on, but with a base URI. The ontology lives here:

It also works just fine with the AIISO ontology we published a few days ago – so you can say which parts of your institution are using which lists.