The Library of Congress recently made a couple of announcements, which I’ve been thinking about in the context of the provision of linked data services.
In May, LC announced that a ‘reconciliation’ had been done on the LC relators, in part to bring them into conformance with RDA role terms. This is not at all a bad thing, but the manner in which the revisions were accomplished and presented on id.loc.gov points up some serious issues with the strategy LC is currently using to manage these vocabularies.
As part of this ‘reconciliation’ LC made a variety of changes to the old list. Some definitions were changed, but in most cases the code and derived URI remained the same, creating a situation where the semantics become unreliable. It’s not easy to determine which ones have changed, because the old file was overwritten, the previous version can’t be accessed through the service, and as far as I can tell, there’s no definitive list of changes available. The only clues in the new file are the ‘Change Notes’ (textual notes with dates of changes), though what actually changed is not specified. An example can be found under the term ‘Binder’ (code bnd), where the change note has two dates:
In another example, the term ‘Film editor’, the definition now starts: “A person who, following the script and in creative cooperation with the Director, selects, arranges, and assembles the filmed material, …” whereas the old usage note referred to “… a person or organization who is an editor of a motion picture film …”. This is a clear and significant change of definition because the reference to the organization entity has been dropped. Curiously the definition for the term ‘Scenarist’ continues to refer to “A person or organization who is the author of a motion picture screenplay …”, although the definition was changed at the same time. Perhaps the difference occurs because the change note for ‘Film editor’ refers to “FIAF”, which is probably the International Federation of Film Archives (the announcement refers to FIAT, a probable typo).
This M.O. may be perfectly satisfactory to support most human uses of the vocabulary, but it is clearly not all that useful for machines operating in a linked data environment. I was alerted to some of these issues by a colleague who had built a map based on the prior version; that map now needs to be completely revised (and without a list of changes, this becomes a very laborious process). It’s also my understanding that the JSC just recently updated some of the relationship definitions for the most recent update of the RDA Toolkit, which are now out of sync with the ‘reconciled’ relator terms.
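For anyone who happens to have kept a local copy of the prior version, the missing list of changes can at least be approximated by comparing two snapshots keyed on the relator code. What follows is only a minimal sketch under assumptions of my own: each snapshot is taken to be a plain mapping from code to definition text, the names `diff_snapshots` and `normalize` are hypothetical, and the sample codes and definitions are invented for illustration rather than taken from id.loc.gov.

```python
from typing import Dict, List

# Assumption: a snapshot is a simple mapping of relator code -> definition text.
Snapshot = Dict[str, str]


def normalize(text: str) -> str:
    """Collapse whitespace and case so cosmetic edits are not reported."""
    return " ".join(text.split()).lower()


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Classify codes as added, removed, or redefined between two snapshots."""
    added = sorted(code for code in new if code not in old)
    removed = sorted(code for code in old if code not in new)
    # Same code (and therefore the same derived URI), different definition:
    # this is the case that silently breaks existing maps.
    redefined = sorted(
        code
        for code in old
        if code in new and normalize(old[code]) != normalize(new[code])
    )
    return {"added": added, "removed": removed, "redefined": redefined}


# Invented sample data, not real LC relator entries.
old_terms = {
    "aaa": "A person or organization who does X.",
    "bbb": "A person who does Y.",
    "ccc": "A person who does Z.",
}
new_terms = {
    "aaa": "A person who does X.",       # definition narrowed
    "bbb": "A  person who does Y.",      # whitespace-only difference
    "ddd": "A person who does W.",       # new term
}

changes = diff_snapshots(old_terms, new_terms)
for kind, codes in changes.items():
    print(kind, codes)
```

A comparison like this only flags that a definition differs; deciding whether the difference is a real semantic change (dropping ‘organization’, say) still takes a human reader.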
A number of questions arise as a result of this, perhaps chief among them the basic one of whether it makes sense to reconcile these vocabularies at all. Because this work was not discussed publicly before the reconciled vocabulary was unveiled (I might be wrong about this, but I’m sure someone will correct me if I missed something), the potential effect on legacy data is unknown, as are any other options for dealing with the issues created by lack of established process or opportunity for public comment. If you accept the premise that we will continue to live in an environment of multiple vocabularies for multiple uses, there are other strategies–mapping and extension, for instance–that might have a better chance to improve usefulness while avoiding the kinds of reliability and synchronization problems these changes bring to the fore.
In addition to the process issues, a strong case could be made that the current services presented under the id.loc.gov umbrella might benefit from some discussion about how the data is intended to be used and managed. Not everyone is tied to traditional ILSs now, and perhaps fewer will be in future, if current interest in linked data continues. Are all users of these vocabularies going to be expected to flush their caches of data every time a new ‘version’ of the underlying file is loaded? How would they know of changes happening behind the scenes (unless, of course, they are careful readers of LC’s announcements)? If LC expects to provide services for linked data users, these issues must be discussed openly and use cases defined so that appropriate decisions can be made. At a minimum, these practices need to be examined in the context of linked data principles that call for careful change to definitions and URIs to minimize surprises and loss of backward compatibility.
[To be continued]