Over the past weekend I participated in a Twitter conversation on the topic of meaning, data, transformation and packaging. The conversation is too long to repost here, but searching for @metadata_maven's tweets from July 11-12 should pick most of it up. Aside from my usual frustration at the message limitations in Twitter, there seemed to be a lot of confusion about what exactly we mean by ‘meaning’ and how it gets expressed in data. I had a Skype conversation with @jonphipps about it, and thought I could reproduce it here in a way that adds to the original conversation and perhaps clarifies a few things. [Probably good to read the Twitter conversation ahead of reading the rest of this.]
Jon Phipps: I think the problem that the people in that conversation are trying to address is that MARC has done triple duty: as a local and global serialization (format) for storage, supporting indexing and display; as a global data interchange format; and as a focal point for creating agreement about the rules everyone is expected to follow to populate the data (AACR2, RDA). If you walk away from that, even if you don’t kill it, nothing else is going to be able to serve that particular set of functions. But that’s the way everyone chooses to discuss bibframe, or schema.org, or any other ‘MARC replacement’.
Diane Hillmann: Yeah, but how does ‘meaning’ merely expressed on a wiki page help in any way? Isn’t the idea to have meaning expressed with the data itself?
Jon Phipps: It depends on whether you see RDF as a meaning transport mechanism or a data transport mechanism. That’s the difference between semantic data and linked data.
Diane Hillmann: It’s both, don’t you think?
Jon Phipps: Semantic data is the smart subset of linked data.
Diane Hillmann: Nice tagline 🙂
Jon Phipps: Zepheira, and now DC, seem to be increasingly looking at RDF as merely linked data. I should say a transport mechanism for ‘linked’ data.
Diane Hillmann: It’s easier that way.
Jon Phipps: Exactly. Basically what they’re saying is that meaning is up to the receiver’s system to determine. A dc:title of ‘Mr.’ is fine in that world; it even validates according to the ‘new’ AP thinking. It’s all easier for the data producers if they don’t have to care about vocabularies. But the value of RDF is that it’s brilliantly designed to transport knowledge, not just data. RDF data is intended to live in a world where any Thing can be described by any Thing, and all of those descriptions can be aggregated over time to form a more complete description of the Thing Being Described. Knowledge transfer really benefits from Semantic Web concepts like inferences and entailments and even truthiness (in addition to just validation). If you discount and even reject those concepts in a linked data world, then you might as well ship your data around as CSV or even SQL files and be done with it.
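[An aside for readers who think in code: here is a minimal sketch of what aggregation and entailment buy you, using plain Python tuples rather than a real RDF library. Every ex: name, the sample data, and the helper functions are hypothetical, invented only for illustration. The one rule applied is the standard RDFS subproperty rule: if a property is declared a subproperty of another, any statement made with the first also holds for the second.]

```python
# Triples are plain (subject, predicate, object) tuples.
# All "ex:" identifiers and the sample data are hypothetical.
SUBPROPERTY_OF = "rdfs:subPropertyOf"

# The vocabulary carries meaning: ex:titleProper is a kind of dc:title.
vocabulary = {("ex:titleProper", SUBPROPERTY_OF, "dc:title")}

# Two independent sources describe the same Thing.
library_a = {("ex:book1", "ex:titleProper", "Moby Dick")}
library_b = {("ex:book1", "dc:creator", "Herman Melville")}


def aggregate(*sources):
    """Merge any number of triple sets into one graph."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def entail_subproperties(triples):
    """Apply the RDFS subproperty rule until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        sub_of = {(s, o) for s, p, o in inferred if p == SUBPROPERTY_OF}
        for s, p, o in list(inferred):
            for sub, sup in sub_of:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


def describe(triples, subject):
    """Return every (predicate, object) pair known about a subject."""
    return sorted((p, o) for s, p, o in triples if s == subject)


graph = entail_subproperties(aggregate(vocabulary, library_a, library_b))
for predicate, value in describe(graph, "ex:book1"):
    print(predicate, value)
```

[A receiver that only knows dc:title can still find the title, because the vocabulary travelled with the data. Drop the vocabulary triple and the same receiver sees an opaque ex:titleProper string, which is the CSV situation.]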
One of the things about MARC is that it’s incredibly semantically rich (http://marc21rdf.info) and has also been brilliantly designed by a lot of people over a lot of years to convey an equally rich body of bibliographic knowledge. But throwing away even a small portion of that knowledge in pursuit of a far dumber linked data holy grail is a lot like saying that since most people only use a relatively limited number of words (especially when they’re texting) we have no need for a 50,000 word, or even a 5,000 word, dictionary.
MARC makes knowledge transfer look relatively easy because the knowledge is embedded in a vocabulary every cataloger learns and speaks fairly fluently. It looks like it’s just a (truly limiting) data format, so it’s easy to think that replacing it is just a matter of coming up with a fresh new format, like RDF. But it’s going to be a lot harder than that, which is tacitly acknowledged by the many-faceted effort to permanently dumb down bibliographic metadata, and it’s one of the reasons why I think bibframe.org, bibfra.me, and schema.org might end up being very destructive, given the way they’re being promoted (be sure to Park Your MARC somewhere).
[That’s why we’re so focused on the RDA data model (which can actually be semantically richer than MARC), why we helped create http://marc21rdf.info, and why we’re working at building out our RDF vocabulary management services.]
Diane Hillmann: This would be a great conversation to record for a podcast 😉
Jon Phipps: I’m not saying proper vocabulary management is easy. Look at us, for instance: we haven’t bothered to publish the OMR vocabs and only one person has noticed (so far). But they’re in active use in every OMR-generated vocab.
The point I was making was that we were no better, as publishers of theoretically semantic metadata, at making sure the data was ‘meaningful’ by making sure that the vocabs resolved, had definitions, etc.
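[Another aside in code: a rough sketch of the "had definitions" half of that check. It assumes triples held as plain Python tuples and that definitions are recorded with skos:definition. The ex: names, the sample data, and the undefined_properties helper are all hypothetical, and checking that a vocabulary actually resolves over the network is left out.]

```python
# Triples are plain (subject, predicate, object) tuples.
# All "ex:" identifiers and the sample data are hypothetical.
DEFINITION = "skos:definition"


def undefined_properties(data, vocabulary):
    """List properties used in the data that have no non-empty definition."""
    defined = {s for s, p, o in vocabulary if p == DEFINITION and o.strip()}
    used = {p for s, p, o in data}
    return sorted(used - defined)


vocabulary = {
    ("ex:titleProper", DEFINITION, "The chief name of a resource."),
}
data = {
    ("ex:book1", "ex:titleProper", "Moby Dick"),
    ("ex:book1", "ex:mystery", "42"),  # used, but never defined anywhere
}

missing = undefined_properties(data, vocabulary)
print(missing)
```

[Anything that shows up in that list is a property whose meaning the receiver has to guess.]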
[P.S. We’re now working on publishing our registry vocabularies.]