I think I mentioned somewhere that one of my current projects is part of the NISO Bibliographic Roadmap effort to develop vocabulary best practices in the areas of use/reuse, documentation and preservation. The project itself builds on a full-day vocabulary workshop held at the DCMI conference in 2011.
I share co-chair duties on the Use/Reuse subgroup with Daniel Lovins of NYU. As this project has moved along, the overlaps between the three subgroups tasked with writing the best practices documentation (the other two are tasked specifically with documentation and preservation) have become more and more apparent. We've decided to issue our final recommendations together in one document, rather than spend our time trying to untie the Gordian Knot that is vocabulary issues in toto.
At some point I fielded a question from one of the group members about the difference between vocabulary maintenance and vocabulary preservation. It was a great question, and one that I hadn’t really thought much about. To some extent each topic has a basic assumption: for a vocabulary being actively maintained, preservation isn’t much of an issue. But when, for whatever reason, the vocabulary ceases being actively maintained, preservation becomes the issue. One question the Roadmap project is trying to address is how to tell the difference, given the dearth of data generally provided about the who, how and when of ownership and management of vocabularies, much less what policies and practices are being followed. There’s a whole range of issues around preservation: how it should be done and by whom, and what kind of context to wrap around the supine resource.
In some respects, the problem of vocabulary preservation is not easily separated from the loss of funding for projects that build vocabularies, or that build the tools used to develop and manage them, almost all of which began as funded projects. This suggests we have a preservation problem built on a sustainability problem. The report draft as it stands cites several projects that were initially funded, in whole or in part, to address issues around vocabulary provision for particular research or practice communities, but that have not received funding to build out or maintain their tools or their resources. This is a significant issue: without the ability to maintain the infrastructure supporting the structural or conceptual vocabularies required to describe resources being aggregated or distributed, there is no such thing as proper distribution or maintenance of any data resources. Consider how many projects are now being funded to tackle the problems of 'big data', particularly scientific research data, all of whose solutions depend to some extent on metadata vocabularies.
This concern is not that different from what we hear constantly about the physical infrastructure supporting our transportation systems, which is crumbling in ways not unlike our data distribution systems (although without the scary safety implications). Many of us on the academic side of these questions take it for granted that funding comes and goes, and don't consider the longer-term implications of these ebbs and flows, or how funding agency priorities (generally focused on innovation rather than maintenance) affect the vocabulary environment. But it remains clear that a big reason we talk about vocabulary preservation is that there are long-term implications of depending on funded projects to build and maintain the infrastructure around vocabularies, as well as the vocabularies themselves.
I speak from experience on this issue. The Open Metadata Registry (OMR) was built on funding from the US National Science Foundation (NSF), and when that funding ended in 2007, it should have died the death of most such project-based tools. That it didn't is due almost entirely to the fact that it was built by two very stubborn people who were pretty hooked on the usefulness of vocabularies. For several years we searched in vain for additional funding, but at that time funding was scarce, and what little there was went to much sexier undertakings. For much of that time the OMR survived on a 'too cheap to fail' strategy, in which early grant funding had paid ahead for the long-term costs of server space and basic technical upkeep.
As far as I know, the OMR is one of the few free, general-purpose vocabulary development and maintenance tools that has survived past its initial funding. At this point a large percentage of the library world's vocabularies are maintained using the OMR, and although it is built to let any of them migrate at any time to another repository or tool, there is not currently much of anything available for them to migrate to. This suggests that sustainable funding is a far more critical issue than we've yet considered.
At present the OMR developers are preparing a proposal to fund a sustainability plan for the OMR: two to three years spent bringing it up to date as a viable, community-driven Free Open Source Software project with long-term institutional support for its infrastructure, so that it can be maintained after our retirement. If the plan works, the OMR will remain a place where vocabularies will be safe, secure, and maintainable, with room for maps and application profiles that use those vocabularies. Other commercial and research-based options still exist, but they are not necessarily friendly to limited budgets or to organizations with limited technical support.
During the later days of the National Science Digital Library (NSDL), under whose aegis the OMR was developed, the NSF began to include sustainability requirements in its grant process. I don't know how many other granting agencies mandate those kinds of requirements, but judging by the vast amount of abandoned detritus left by decades of research funding, more certainly should.