For a while I’ve been avoiding using the term metadata for a few reasons. I’ve had a few conversations with people about why and so I thought I’d jot the thoughts down here.
First of all, the main reason I stopped using the term is that it means too many different things. Wikipedia recognises metadata as an ambiguous term:
The term metadata is an ambiguous term which is used for two fundamentally different concepts (types). Although the expression “data about data” is often used, it does not apply to both in the same way. Structural metadata, the design and specification of data structures, cannot be about data, because at design time the application contains no data. In this case the correct description would be “data about the containers of data”. Descriptive metadata, on the other hand, is about individual instances of application data, the data content. In this case, a useful description (resulting in a disambiguating neologism) would be “data about data content” or “content about content” thus metacontent. Descriptive, Guide and the National Information Standards Organization concept of administrative metadata are all subtypes of metacontent.
and even within the world of descriptive metadata the term is used in many different ways.
I have always been able to find a more accurate, complete and consistent term, such as catalogue, provenance, audit or licensing. I haven’t yet come across a situation where a more specific term hasn’t helped everyone understand the data better.
Data is just descriptions of things and if you say what aspects of a thing you are describing then everyone gets a better sense of what they might do with that. Once we realise that data is just descriptions of things, written in a consistent form to allow for analysis, we can see the next couple of reasons to stop using metadata.
Meta is a relative term. Ralph Swick of W3C is quoted as saying:
What’s metadata to you, is someone else’s fundamental data.
That is to say, whether you consider something meta or not depends entirely on your context and the problem you’re trying to solve. Often several people in the same room will see it differently.
If we combine that thought with the more specific naming of our data then we get the ability to think about descriptions of descriptions of descriptions, which brings me on to something else I’ve observed. By thinking in terms of data and metadata we talk, and think, in a vocabulary limited to two layers. Working with big data and graphs I’ve learnt that two layers are not enough.
Taking the example of data about TV programming from today’s RedBee post, we could say:
- The Mentalist is a TV Programme
- The Mentalist is licensed to Channel 5 for broadcast in the UK
- The Mentalist will be shown at 21.00 on Thursday 12 April 2012
Statement 2 in that list is licensing data and statement 3 is schedule data. This all comes under the heading of descriptive metadata. Now, RedBee are a commercial organisation who put constraints on the use of their data, so we also need to be able to say things like:
- Statements 1, 2 and 3 are licensed to BBC for competitor analysis
This statement is also licensing data, about the metadata… So what is it? Descriptive metametadata?
Data about data is not a special case. Data is just descriptions of things and remains so whether the things being described are people, places, TV programmes or other data.
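To make that concrete, here is a rough Python sketch of the Mentalist example. It is only an illustration: the `Statement` class, its `kind` field and the `depth` helper are names I have made up for this post, and labelling statement 1 as "catalogue" is my own assumption. The point is that a statement about other statements uses exactly the same shape as a statement about a TV programme, and is named for what it describes rather than for how "meta" it is.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Statement:
    """A description of a thing (hypothetical model, for illustration only)."""
    # The thing described: a plain name, another statement, or several statements.
    subject: Union[str, "Statement", Tuple["Statement", ...]]
    predicate: str
    obj: str
    kind: str  # a specific name for the data, e.g. "licensing" or "schedule"


# Statements 1 to 3 describe a TV programme.
programme = Statement("The Mentalist", "is a", "TV Programme",
                      kind="catalogue")  # "catalogue" is an assumed label
licence = Statement("The Mentalist", "is licensed to",
                    "Channel 5 for broadcast in the UK", kind="licensing")
schedule = Statement("The Mentalist", "will be shown at",
                     "21.00 on Thursday 12 April 2012", kind="schedule")

# Statement 4 describes statements 1 to 3, with no special "meta" type needed.
data_licence = Statement((programme, licence, schedule), "are licensed to",
                         "BBC for competitor analysis", kind="licensing")


def depth(thing) -> int:
    """Count how many layers of description lead down to a plain named thing."""
    if isinstance(thing, Statement):
        return 1 + depth(thing.subject)
    if isinstance(thing, tuple):
        return max((depth(t) for t in thing), default=0)
    return 0  # a plain name is not itself a description
```

Both `licence` and `data_licence` carry the kind "licensing"; the only difference is what they describe. Nothing in the model caps the nesting at two layers, so a statement about `data_licence` would work in just the same way.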
That’s why I try to replace the term metadata with something more useful whenever I can.