Not long ago I encountered the analysis of BibFrame published by Rob Sanderson with contributions by a group of well-known librarians. It’s a pretty impressive document, well organized and clearly referenced. But there’s also a significant amount of personal opinion in it, the nature of which is somewhat masked by the references to others holding the same opinion.

I have a real concern about some of those points where the assertion of ‘best practices’ is particularly arguable. The one that most sticks in my craw shows up in section 2.2.5:

2.2.5 Use Natural Keys in URIs
References: [manning], [ldbook], [gld-bp], [cooluris]

Although the client must treat URIs as opaque strings, it is good practice to construct URIs in a systematic and human readable fashion for both instances and ontology terms. A natural key is one that appears in the information about the resource, such as some unique identifier for the resource, or the label of the property for ontology terms. While the machine does not care about structure, memorability or readability of URIs, the developers that write the code do. Completely random URIs introduce difficult to detect semantic and algorithmic errors in both publication and consumption of the data.

Analysis:

The use of natural keys is a strength of BIBFRAME, compared to similarly scoped efforts in similar communities such as the RDA and CIDOC-CRM vocabularies which use completely opaque numbers such as P10001 (hasRespondent) or E33 (Linguistic Entity). RDA further misses the target in this area by going on to define multiple URIs for each term with language tagged labels in the URI, such as rda:hasRespondent.en mapping to P10001. This is a different predicate from the numerical version, and using owl:sameAs to connect the two just makes everyone’s lives more difficult unnecessarily. In general, labels for the predicates and classes should be provided in the ontology document, along with thorough and understandable descriptions in multiple languages, not in the URI structure.

This sounds fine so long as you accept the idea that ‘natural’ means English, because, of course, all developers, no matter their first language, must be fluent enough in English to work with English-only standards and applications. This misuse of ‘natural’ reminds me of other problematic usages, such as the former practice in the adoption community (of which I have been a part for 40 years) where ‘natural’ was routinely used to refer to birth parents, thus relegating adoptive parents to the ‘un-natural’ realm. So in this case, if ‘natural’ means English, are all other languages inherently un-natural in the world of development? The library world has been dominated by ‘Anglo-American’ notions of standard practice for a very long time, and happily, RDA is leading away from that, both in governance and in development of vocabularies and tools.

The Multilingual strategy adopted by RDA is based on the following points:

  1. More than a decade of managing vocabularies has convinced us that opaque identifiers are extremely valuable for managing URIs, because they need not be changed as labels change (only as definitions change). The kinds of ‘churn’ we saw in the original version of RDA (2008-2013) convinced us that label-based URIs were a significant problem (and cost) that became worse as the vocabularies grew over time.
  2. We get the argument that opaque URIs are often difficult for humans to use, but the tools we’re building (the RDA Registry as a case in point) are intended to give human developers what they want for their tasks (human-readable URIs, in a variety of languages) while ensuring that the URIs for properties and values are set up based on what machines need. In this way, changes in the lexical (human-readable) URIs can be maintained properly without costly changes to the canonical URIs that travel with the data content itself.
  3. The multiple language translations (and distributed translation management by language communities) also enable humans to build discovery and display mechanisms for users that are speakers of a variety of languages. This has been a particularly important value for national libraries outside the US, but also potentially for libraries in the US meeting the needs of non-English language communities closer to home.

It’s too easy for the English-first library development community to insist that URIs be readable in English and to turn a blind eye to the degree that this imposes understanding of the English language and Anglo-American library culture on the rest of the world. This is not automatically the intellectual gift that the distributors of that culture assume it to be. It shouldn’t be necessary for non-Anglo-American catalogers to learn and understand Anglo-American language and culture in order to express metadata for a non-Anglo audience. This is the rough equivalent of the Philadelphia cheese steak vendor who put up a sign reading “This is America. When ordering speak in English”.

We understand that for English-speaking developers http://bibframe.org/vocab/title is initially easier to use than http://rdaregistry.info/Elements/w/P10088 or even (heaven forefend!) “130_0#$a” (in RDF: http://marc21rdf.info/elements/1XX/M1300_a). That’s why RDA provides http://rdaregistry.info/Elements/w/titleOfTheWork.en but also, eventually, http://rdaregistry.info/Elements/w/拥有该作品的标题.ch and http://rdaregistry.info/Elements/w/tieneTítuloDeLaObra.es, et al. (you do understand Latin of course). These ‘unnatural’ Lexical Aliases will be provided by the ‘native’ language speakers of their respective national library communities.
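For developers who want to see the idea in concrete terms, here is a minimal Python sketch of how lexical aliases can sit in front of a canonical URI. The URIs are the ones quoted above; the in-memory lookup table and the function names (canonicalize, aliases_by_language) are assumptions made purely for illustration, and are not a description of how the RDA Registry actually stores or resolves its aliases.

```python
# Illustrative sketch only: a hard-coded alias table, NOT the RDA Registry's
# actual resolution mechanism. All function names here are hypothetical.

CANONICAL_TITLE = "http://rdaregistry.info/Elements/w/P10088"

# Human-readable lexical aliases, each pointing at the same opaque canonical URI.
LEXICAL_ALIASES = {
    "http://rdaregistry.info/Elements/w/titleOfTheWork.en": CANONICAL_TITLE,
    "http://rdaregistry.info/Elements/w/拥有该作品的标题.ch": CANONICAL_TITLE,
    "http://rdaregistry.info/Elements/w/tieneTítuloDeLaObra.es": CANONICAL_TITLE,
}


def canonicalize(uri: str) -> str:
    """Return the canonical URI for a lexical alias; other URIs pass through unchanged."""
    return LEXICAL_ALIASES.get(uri, uri)


def aliases_by_language(canonical: str) -> dict[str, str]:
    """Map each language suffix (the part after the last '.') to its alias URI."""
    return {
        alias.rsplit(".", 1)[1]: alias
        for alias, target in LEXICAL_ALIASES.items()
        if target == canonical
    }


if __name__ == "__main__":
    # A developer writes with the readable alias; the stored data carries the canonical URI.
    print(canonicalize("http://rdaregistry.info/Elements/w/tieneTítuloDeLaObra.es"))
    print(sorted(aliases_by_language(CANONICAL_TITLE)))
```

The point of the design is that a label can be reworded, or a new translation added, by editing only the alias table; data that already carries the canonical URI never has to be touched.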

As one of the many thousands of librarians who ‘speak’ MARC to one another, despite our language differences, I am loath to give up that international language to an English-only world. That seems like a step backwards.

By Diane Hillmann, January 3, 2016, 5:05 pm (UTC-5)

Currently 1 comment

  1. Comment by Karen Coyle

    Logically, the best “best practice” would be for our tools to use labels, since URIs are for machines, not humans. The catch that I see is that URIs are unique, and labels are not. So the “readable” URI is unique within that domain, which may mean that it’s not exactly what you’d like to show the person doing input (titleOfTheWork). If we want to use labels then we have to think about creating unique, unambiguous labels within each namespace. That’s no more difficult than coming up with the strings that make up the readable URI, but that is a function that then limits our freedom in developing human-facing labels. I begin to think that we need two sets of labels: those for developers, that are unique and may not be highly intuitive, and those for users, which can be as friendly as we’d like them to be, even using natural language phrases. It’s the difference between “titleOfTheWork” and “title of the work”. We need both because they serve different functions.
