I hope many of you have delved into the NISO Technical Report on Issues in Vocabulary Management, released this past summer. In my opinion, one of the most important insights contained in that report came about as the group reviewed vocabulary management tools and development projects over the past few years, most of them still cited as if they were alive and kicking. But for the most part, they weren’t.

“Finally, there is the issue of sustainability of the vocabularies selected for use in data. A sustainable vocabulary is protected by organizational or institutional commitments, policies that make clear who makes those commitments and what they mean, as well as a record of responsible maintenance and growth. A vocabulary without those commitments may not be sustainable over time and may be a questionable investment for organizations seeking to use the vocabulary in their data.”

Why is this important? A key concept in the notion of sustainability of vocabularies is expressed in the use of the term ‘investment’. A data provider who chooses to use a particular vocabulary for their data has (whether they recognize it or not) an investment in that vocabulary, and thus an important interest in its sustainability. Consider the challenge of keeping those data instances relevant over time in an open world (which librarians have been doing for decades in a very closed world). Doing so requires a combination of sustainably managed vocabularies and tools that provide change notification and other services to the maintainers of data containing those vocabularies. Inexplicably, there has been very little discussion of this, despite the fact that just about every model for future metadata development presumes the increased use of public vocabularies.

The NISO group spent some time on these issues:

“In some respects, the problem of vocabulary discovery and availability for general use is not easily separated from issues around the loss of funding for projects building vocabulary development or management tools, almost all of which were initially developed in time limited circumstances. This report cites several projects that were initially funded in whole or part to address issues around vocabulary provision in particular research or practice communities, but that have not received funding to extend or maintain their tools or their vocabularies. This is a significant problem, as without the ability to support the structural or conceptual vocabularies required to describe resources being aggregated or distributed, there is no such thing as useful distribution or maintenance of any data instances using those vocabularies. The many projects now being funded to consider the problems of ‘big data,’ particularly scientific research data, all depend on stable metadata vocabularies.

This concern is not that different from what we hear regularly about the crumbling physical infrastructure supporting our transportation systems (although without the scary safety implications). Many on the academic side of these questions take it for granted that funding comes and goes, and may not consider the longer-term implications of these ebbs and flows. But the reason to talk about vocabulary sustainability is that there are long-term implications of depending on funded projects to build and maintain the infrastructure around vocabularies used for linked open data, not to mention the vocabularies themselves.”

So what’s the solution for this set of problems? Is it to convince funding agencies to include sustainability plans as a requirement for receiving support (and if they’re not going to fund projects for more than two years, how effective would that be)? Should we cede the development of vocabularies to institutional entities who can afford to maintain them over time (e.g., the Getty)? Would a more stable funding model cover tools as well?

The current funding models have left a legacy of junk in space: many projects that are still on the net and cited throughout the literature but no longer function, and so cannot provide an infrastructure for vocabulary management that we can all build upon. I wonder whether the discussion of these issues will ever become urgent. I hope it does before I retire from the scene!

By Diane Hillmann, November 30, 2017, 11:21 pm (UTC-5)

Currently 1 comment

  1. Comment by John Graybeal

    Diane, great post, I could not agree more. We created MMI’s Ontology Registry and Repository many years ago because we saw the exact same scenario then in marine science, but these omissions are a constant in science funding.

    Now that science practitioners (and some funders) are pressing funders and publishers ever more strongly for *open* data, I think the science communities will inevitably come to realize that the published data is itself a form of “junk in space” unless it can be semantically understood. And the tools we are working on will be a part of the salvage operation that will professionalize the practices of the community.
