A decade ago, when the Open Metadata Registry (OMR) was just being developed as the NSDL Registry, the vocabulary world was a very different place than it is today. Back then we were tightly focussed on SKOS (not fully cooked yet, but Jon was on the WG that was developing it, so we felt pretty secure diving in).
But we were thinking about versioning in the Open World of RDF even then. The NSDL Registry kept careful track of all changes to a vocabulary (who, what, when) and the only way to get data in was through the user interface. We ran an early experiment in making versions based on dynamic, timestamp-based snapshots (we called them ‘time slices’, Git calls them ‘commit snapshots’) available for value vocabularies, but this failed to gain any traction. Partly that was because, well, it was a decade ago and simply too early, and partly because, while it attempted to solve an Open World problem with versioned URIs, it created a new set of problems for Closed World experimenters. Ultimately, we left the versions issue to sit and stew for a bit (6 years!).
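For readers who want a concrete picture of a ‘time slice’, here is a minimal sketch of the general idea: replay a who/what/when change log up to a chosen timestamp to get the vocabulary as it stood at that moment. This is an illustration under our own assumptions, not the NSDL Registry's actual code; the `Change` record, the `time_slice` function, the editor names, and the example.org URI are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """One logged edit: who changed what, and when (hypothetical record shape)."""
    who: str
    when: datetime
    concept_uri: str
    prop: str
    value: Optional[str]  # None means the property was removed


def time_slice(changes: List[Change], at: datetime) -> Dict[str, Dict[str, str]]:
    """Rebuild the vocabulary state as of `at` by replaying changes in time order."""
    state: Dict[str, Dict[str, str]] = {}
    for change in sorted(changes, key=lambda c: c.when):
        if change.when > at:
            break  # everything after this point is later than the slice
        props = state.setdefault(change.concept_uri, {})
        if change.value is None:
            props.pop(change.prop, None)
        else:
            props[change.prop] = change.value
    return state


# Hypothetical log for a single concept whose preferred label was revised.
URI = "http://example.org/vocab/c1"
log = [
    Change("editor_a", datetime(2006, 3, 1), URI, "skos:prefLabel", "Maps"),
    Change("editor_b", datetime(2007, 5, 9), URI, "skos:prefLabel", "Cartographic materials"),
]

early = time_slice(log, datetime(2006, 12, 31))
late = time_slice(log, datetime(2008, 1, 1))
print(early[URI]["skos:prefLabel"])
print(late[URI]["skos:prefLabel"])
```

The point of the sketch is that a slice is computed on demand from the log rather than stored as a separate copy, which is what makes it ‘dynamic’; how such a slice would be exposed through a versioned URI is left out here.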
All that began to change in 2008 as we started working with RDA and needed to move past value vocabularies into properties and classes, and beyond that into issues around uploading data into the OMR. More recently, Git and GitHub have taken off and given us a way to make some important jumps in functionality, culminating in the OMR/GitHub-based RDA Registry. It sounds easy and intuitive now, but it sure wasn’t at the time, and what most people don’t know is that the OMR is still where RDA/RDF data originates: it wasn’t supplanted by Git/GitHub, but is chugging along in the background. The OMR’s RDF CMS is still visible and usable by all, but folks managing larger vocabularies now have more options.
One important aspect of the use of Git and GitHub was the ability to rethink versioning.
Just about a year ago our paper on this topic (Versioning Vocabularies in a Linked Data World, by Diane Hillmann, Gordon Dunsire and Jon Phipps) was presented at the IFLA Satellite meeting in Paris. We used as our model the way software on our various devices and systems is updated: more and more, these changes happen without much (if any) interaction with us.
In the world of vocabularies defining the properties and values in linked data, most updating is still very manual (if done at all), and the important information about what has changed and when is often hidden behind web pages or downloadable files that provide no machine-understandable connections identifying changes. And just solving the change management issue does little to solve the inevitable ‘vocabulary rot’ that can make published ‘linked data’ less and less meaningful, accurate, and useful over time.
Building stable change management practices is a critical missing piece of the linked data publishing puzzle. The problem will grow exponentially as language versions and inter-vocabulary mappings start to show up as well, and it won’t be too long before that happens.
Please take a look at the paper and join in the conversation!