In my post last week, I mentioned a paper that Gordon Dunsire, Jon Phipps and I had written for last month’s IFLA Satellite Meeting in Paris, “Linked Data in Libraries: Let’s make it happen!” (note the videos!). I wanted to talk about the paper and why we wrote it, but I’m not just going to summarize it–I wouldn’t want to spoil the paper for anyone!
The paper, “Versioning Vocabularies in a Linked Data World”, was written in part because we’d seen far too many examples of vocabulary management and distribution that paid little or no attention to the necessity to maintain vocabularies over time and to make them available (over and over again, of course) to the data providers using them. It goes without saying that the vocabularies were expected to change over time, but in too many cases, vocabulary owners distributed changes in document form, or as files with new data embedded but no indication of what had changed, or worse: nothing.
We have been thinking about this problem for a long time. Even the earliest instance of the NSDL Registry (precursor of the current Open Metadata Registry, or OMR, as we like to call it) incorporated a ‘history’ view of the data, basically the ‘who, what, when’ of every change made in every vocabulary. Later on, we added the ability to declare ‘versions’ of the vocabularies themselves, taking advantage of that granular history data, for those trying to manage the updating of their ‘product’ in a rational manner. Sadly enough, not very many of our users took advantage of that feature, and we’re not entirely sure why not, but there it was. Jon has always been frustrated with our first passes at this problem, and after Gordon and I discussed the problem with others at DC-2013 last year, and my rant about the lack of version control on id.loc.gov came out, it seemed time to think about the issue again.
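For readers who think in code, here is a rough sketch of the idea behind that ‘history’ view and the ‘versions’ built on top of it. It is an illustration only, not the OMR’s actual data model: the `Change` record, the `changes_between` and `summarize` helpers, and the example.org URIs are all made up for this post. The assumption is simply that if every change is logged with its ‘who, what, when’, then a declared version is just a point in time, and the list of what changed between two versions falls out of the log.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    """One hypothetical history entry: who changed what, and when."""
    who: str
    when: datetime
    uri: str                # the vocabulary term that was changed
    prop: str               # the property that was changed, e.g. "label"
    old: Optional[str]      # None when the value is being added
    new: Optional[str]      # None when the value is being removed


def changes_between(history: List[Change], start: datetime, end: datetime) -> List[Change]:
    """Return the changes made after `start`, up to and including `end`."""
    return [c for c in history if start < c.when <= end]


def summarize(changes: List[Change]) -> List[str]:
    """Render a readable 'what changed' list, oldest change first."""
    ordered = sorted(changes, key=lambda c: c.when)
    return [
        f"{c.when:%Y-%m-%d} {c.who}: {c.uri} {c.prop} {c.old!r} -> {c.new!r}"
        for c in ordered
    ]


# Made-up history for a made-up vocabulary.
UTC = timezone.utc
history = [
    Change("alice", datetime(2014, 1, 10, tzinfo=UTC),
           "http://example.org/vocab/a", "label", None, "A"),
    Change("bob", datetime(2014, 3, 5, tzinfo=UTC),
           "http://example.org/vocab/a", "label", "A", "Alpha"),
]

# Two declared 'versions', each just a timestamp over the same history.
v1 = datetime(2014, 2, 1, tzinfo=UTC)
v2 = datetime(2014, 4, 1, tzinfo=UTC)

for line in summarize(changes_between(history, v1, v2)):
    print(line)
```

The point of the sketch is that nobody has to write the change list by hand: someone holding the first version can ask for everything logged since, which is exactly the information missing from a file distributed with ‘new data embedded but no indication of what had changed’.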
At that point we were also planning our own big-time versioning event: the unpublished first version of the RDA Element Sets was about to make its re-debut in ‘published’ form, reorganized, and with new URIs. Jon was also working on the GitHub connection with the OMR underlying the new RDA Registry site, working in a more automated mode as planned. He and Gordon and I had been discussing a new approach for some time, based on the way software is versioned and distributed, which is well supported in Git and GitHub. So, as we drove back from ALA Midwinter in Philadelphia in January of this year, Jon and I blocked out the paper we’d agreed to do with Gordon on how we thought versioning should work in the semantic vocabulary world.
Consider: how do all of us computer nerds update our applications? Do we have to go to all sorts of websites (sometimes, but not always, prompted by an email) to determine which applications have changed and invoke an update? Well, sure, sometimes we do (particularly when they want more money!), but since the advent of the App Store and Google Play, we can do our updates much more easily. For the most part those updates are ‘pushed’ to us so that we can decide whether or not to update; we are told in a general way what has changed, and we click … and it’s done.
This is the way updates should happen in the Semantic Web data world, increasingly dependent on element sets and value vocabularies to provide descriptions of products of all kinds in order to provide access, drive sales or eyeballs, or support effective connections between resources. Now that we’re all reconciled to using URIs instead of text (even if our data hasn’t yet made that transition), shouldn’t we consider an important upside of that change, a simpler and more useful way to update our data?
So, I’ll quit there–go read the paper and let us know what you think. Don’t miss Gordon’s slides from Paris, available on his website. Note especially the last question on his final slide: “Is it time to get serious about linked data management?” We think it’s past time. After all, ‘management’ is our middle name.
Note: As of this week, the video of Gordon’s presentation in Paris is available.