[Continuing from a post earlier today]
The second, and not unrelated, announcement concerned the end of the printed Red Books, which have traditionally represented LCSH in its most official form. The LC report to CC:DA announced that their publication would cease:
In 2012, LC conducted an extensive study on the impact and opportunities of changes in the bibliographic framework and the technological environment on the future distribution of its cataloging data and products. LC’s transition from print to online-only for cataloging documentation is a response to a steadily declining customer base for print and the availability of alternatives made possible by advances in technology. This shift will enable the Library to achieve a more sustainable financial model and better serve its mission in the years ahead.
Certainly there’s not much to argue with here: consumers have spoken, and LC, like every institution and service provider, needs to pay attention. More troubling, though, is what the online-only policy actually means. The announcement includes some information on a planned PDF version of LCSH, and points to that PDF, along with Cataloger’s Desktop and Classification Web (both behind paywalls), as the remaining complete and up-to-date options.
Notably absent from that announcement is any mention of LCSH on id.loc.gov. Many of us are well aware of the gaps that keep this version from being complete and up to date, and indeed the introduction to the service points out that:
LCSH in this service includes all Library of Congress Subject Headings, free-floating subdivisions (topical and form), Genre/Form headings, Children’s (AC) headings, and validation strings* for which authority records have been created. The content includes a few name headings (personal and corporate), such as William Shakespeare, Jesus Christ, and Harvard University, and geographic headings that are added to LCSH as they are needed to establish subdivisions, provide a pattern for subdivision practice, or provide reference structure for other terms. This content is expanded beyond the print issue of LCSH (the “red books”) with inclusion of validation strings. *Validation strings: Some authority records are for headings that have been built by adding subdivisions. These records are the result of an ongoing project to programmatically create authority records for valid subject strings from subject heading strings found in bibliographic records. The authority records for these subject strings were created so the entire string could be machine-validated. The strings do not have broader, narrower, or related terms.
It’s not clear to me that the caveats in this introduction are either widely read or completely understood, and I suspect that if you asked a random sample of catalogers how useful the service is and what it does and doesn’t include, you’d get a wide variety of responses. And of course, updates to the subject headings follow the same ‘versioning’ pattern as the relators: files are reloaded periodically, with little of the kind of versioning that could support notifications to users or help anyone update linked data that uses LCSH outside of ILSs or traditional central services like OCLC.
What LC has done could serve as a case study in how not to handle versioning of semantics in a public vocabulary. If we accept the premise that vocabulary semantics will change, there are very few ways to build stable systems that can rely on linked data. The preferred option is to use vocabularies from systems that provide stable URIs for past, present, and future versions of the vocabulary; the less desirable alternative is to create a local, stable shadow vocabulary and map it to the public vocabulary, over which you have little or no control. Mapping vocabularies in this way lets you keep your own system, your own ‘knowledge base’, semantically stable while still maintaining semantic integration with the global pool of linked data. Clearly, this is an expensive proposition. And these issues of reuse vs. extension are already under heavy discussion in a number of contexts, the public schema.org discussion lists among them.
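To make the second, less preferred option a little more concrete, here is a minimal Python sketch of what a shadow-vocabulary mapping layer might check after each periodic reload. It rests on assumptions of my own, not on anything LC provides: the public vocabulary arrives as full snapshots reduced to a dictionary of URI to preferred label, each local concept has its own stable URI plus a mapping to one public URI, and a changed preferred label serves as a rough proxy for a change in meaning. The URIs, labels, the LocalConcept type, and the check_mappings function are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LocalConcept:
    """A concept in a hypothetical local shadow vocabulary."""
    local_uri: str   # stable URI minted and controlled locally; never changes
    label: str       # the label the local system currently relies on
    public_uri: str  # the public concept it is mapped to (e.g. via skos:exactMatch)


def check_mappings(concepts: list[LocalConcept],
                   public_snapshot: dict[str, str]) -> list[tuple[str, str, str]]:
    """Compare local mappings against a freshly reloaded public snapshot.

    public_snapshot maps public URI -> preferred label (an assumed format).
    Returns (local_uri, status, detail) tuples for mappings needing review:
      - "missing": the mapped public URI is absent from the new snapshot
      - "label_changed": the public label no longer matches the local one
    Local URIs are left untouched; only mappings are flagged.
    """
    report = []
    for concept in concepts:
        if concept.public_uri not in public_snapshot:
            report.append((concept.local_uri, "missing", concept.public_uri))
        elif public_snapshot[concept.public_uri] != concept.label:
            report.append((concept.local_uri, "label_changed",
                           public_snapshot[concept.public_uri]))
    return report


if __name__ == "__main__":
    # Placeholder URIs and labels; not real LCSH identifiers.
    local = [
        LocalConcept("http://example.org/local/c1", "Heading one", "http://example.net/public/1"),
        LocalConcept("http://example.org/local/c2", "Heading two", "http://example.net/public/2"),
        LocalConcept("http://example.org/local/c3", "Heading three", "http://example.net/public/3"),
    ]
    # A later snapshot: one label revised, one concept gone.
    snapshot = {
        "http://example.net/public/1": "Heading one",
        "http://example.net/public/2": "Heading two (revised)",
    }
    for row in check_mappings(local, snapshot):
        print(row)
    # ('http://example.org/local/c2', 'label_changed', 'Heading two (revised)')
    # ('http://example.org/local/c3', 'missing', 'http://example.net/public/3')
```

Because the local URIs never change, records in the local system stay put and only the flagged mappings need human review. Label comparison is a crude stand-in, though: a real workflow would also have to compare broader, narrower, and related terms and scope notes, which is exactly the work that proper versioning on the publisher’s side would spare everyone downstream.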
Several related issues here would also benefit from broader discussion. Large public vocabularies have tended to make an incomplete transition from print to online, getting stuck, as LC has, using print-era file management processes to manage change behind a ‘service’ front end that isn’t really designed for the job it’s being asked to do. What needs to be examined, soon and in public, is the relationship between these files and the legacy data that hangs over our heads like a boulder of Damocles. Clearly, we need more than access to files (whether one at a time or in batches); we need services that help libraries manage and improve their data. These needs are especially critical for organizations doing the important work of integrating legacy and project data, and trying to devise workflows that let them make full use of the legacy public vocabularies.
Ignoring or denying these issues while making important changes to the vocabularies LC manages on behalf of cultural heritage communities across the globe does everyone a disservice. No one expects LC to come up with all the answers, just as it could not be expected (in the past, or now) to build the vocabularies without the help of the community. NACO, SACO and PCC were, and are, models of collaboration. Why not build on that strength and push more of the discussion about needs and solutions into that same eager, and very competent, community?