In my post last week, I mentioned a paper that Gordon Dunsire, Jon Phipps and I had written for the IFLA Satellite Meeting in Paris last month, “Linked Data in Libraries: Let’s make it happen!” (note the videos!). I wanted to talk about the paper and why we wrote it, but I’m not just going to summarize it–I wouldn’t want to spoil the paper for anyone!

The paper, “Versioning Vocabularies in a Linked Data World”, was written in part because we’d seen far too many examples of vocabulary management and distribution that paid little or no attention to the necessity to maintain vocabularies over time and to make them available (over and over again, of course) to the data providers using them. It goes without saying that the vocabularies were expected to change over time, but in too many cases, vocabulary owners distributed changes in document form, or as files with new data embedded but no indication of what had changed, or worse: nothing.

We have been thinking about this problem for a long time. Even the earliest instance of the NSDL Registry (precursor of the current Open Metadata Registry, or OMR, as we like to call it) incorporated a ‘history’ view of the data, basically the ‘who, what, when’ of every change made in every vocabulary. Later on, we added the ability to declare ‘versions’ of the vocabularies themselves, taking advantage of that granular history data, for those trying to manage the updating of their ‘product’ in a rational manner. Sadly enough, not very many of our users took advantage of that feature, and we’re not entirely sure why, but there it was. Jon has always been frustrated with our first passes at this problem, and after Gordon and I discussed it with others at DC-2013 last year, and my rant about the lack of version control came out, it seemed time to think about the issue again.
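To make the idea concrete, the granular history underneath ‘versions’ can be pictured as an append-only log of change events, with a version being nothing more than a named point in that log. A minimal sketch (all class and field names here are hypothetical, not the OMR’s actual schema):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Change:
    """One 'who, what, when' entry in a vocabulary's history."""
    who: str    # agent making the change
    what: str   # e.g. "added concept X", "changed label of Y"
    when: str   # ISO 8601 timestamp

@dataclass
class VocabularyHistory:
    changes: list = field(default_factory=list)
    versions: dict = field(default_factory=dict)  # tag -> index into changes

    def record(self, who, what, when):
        self.changes.append(Change(who, what, when))

    def declare_version(self, tag):
        """A 'version' is just a named point in the granular history."""
        self.versions[tag] = len(self.changes)

    def changes_since(self, tag):
        """What a data provider needs when updating their 'product'."""
        return self.changes[self.versions[tag]:]

h = VocabularyHistory()
h.record("gordon", "added concept 'Carrier'", "2013-01-10")
h.declare_version("1.0.0")
h.record("jon", "changed label of 'Carrier'", "2013-06-02")
print(len(h.changes_since("1.0.0")))  # 1 change since version 1.0.0
```

The point is that once every change is recorded, both fine-grained notifications and coarse-grained versioned releases fall out of the same data.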

At that point we were also planning our own big-time versioning event: the unpublished first version of the RDA Element Sets was about to make its re-debut in ‘published’ form, reorganized, and with new URIs. Jon was also working on the GitHub connection between the OMR and the new RDA Registry site it underlies, moving us toward a more automated mode, as planned. He and Gordon and I had been discussing a new approach for some time, based on the way software is versioned and distributed, which is well supported in Git and GitHub. So, as we drove back from ALA Midwinter in Philadelphia in January of last year, Jon and I blocked out the paper we’d agreed to write with Gordon on how we thought versioning should work in the semantic vocabulary world.
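The software-style approach works roughly the way semantic versioning does for code: tagged releases whose numbers signal how disruptive a change is. A consumer-side sketch (the tags and the significance rules here are illustrative, not a published spec):

```python
def parse_version(tag):
    """Parse a semantic-version tag like 'v2.1.0' into comparable parts."""
    return tuple(int(p) for p in tag.lstrip("v").split("."))

def needs_update(installed, published):
    """Does the consumer's copy lag behind the published release?"""
    return parse_version(published) > parse_version(installed)

def change_significance(installed, published):
    """Rough read of how disruptive an update is, semver-style."""
    old, new = parse_version(installed), parse_version(published)
    if new == old:
        return "no change"
    if new[0] > old[0]:
        return "major: semantics may have changed, review before updating"
    if new[1] > old[1]:
        return "minor: additions only, usually safe"
    return "patch: corrections, safe"

print(needs_update("v1.2.0", "v2.0.0"))        # True
print(change_significance("v1.2.0", "v2.0.0")) # major: ...
```

Under a scheme like this, a data provider can safely auto-apply patch releases and flag major ones for human review.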

Consider: how do all of us computer nerds update our applications? Do we have to go to all sorts of websites (sometimes, but not always, prompted by an email) to determine which applications have changed and invoke an update? Well, sure, sometimes we do (particularly when they want more money!), but since the advent of the App Store and Google Play we can do our updates much more easily. For the most part those updates are ‘pushed’ to us, we are told in a general way what has changed, we decide whether we want to update or not, and we click … and it’s done.

This is the way updates should happen in the Semantic Web data world, increasingly dependent on element sets and value vocabularies to provide descriptions of products of all kinds in order to provide access, drive sales or eyeballs, or support effective connections between resources. Now that we’re all reconciled to using URIs instead of text (even if our data hasn’t yet made that transition), shouldn’t we consider an important upside of that change, a simpler and more useful way to update our data?

So, I’ll quit there–go read the paper and let us know what you think. Don’t miss Gordon’s slides from Paris, available on his website. Note especially the last question on his final slide: “Is it time to get serious about linked data management?” We think it’s past time. After all, ‘management’ is our middle name.

Note: As of this week the video of Gordon’s presentation in Paris is now available.

By Diane Hillmann, September 22, 2014, 12:01 pm (UTC-5)

Some of you have probably noted that we’ve been somewhat quiet recently, but as usual, it doesn’t mean nothing is going on, more that we’ve been too busy to come up for air to talk about it.

A few of you might have noticed a tweet from the PBCore folks on a conversation we had with them recently. There’s a fuller note on their blog, with links to other posts describing what they’ve been thinking about as they move forward on upgrading the vocabularies they already have in the OMR.

Shortly after that, a post from Bernard Vatant of the Linked Open Vocabularies project (LOV) came over the W3C discussion list for Linked Open Data. Bernard is a hero to those of us toiling in this vineyard, and LOV is one of the go-to places for those interested in what’s available in the vocabulary world and the relationships between those vocabularies. Bernard was criticizing the recent release of the DBpedia Ontology, having seen the announcement and, as is his habit, gone in to try to add the new ontology to LOV. His gripes fell into a couple of important categories:

* the ontology namespace was dereferenceable, but what he found there was basically useless (his word)
* finding the ontology content itself required making a path via the documentation at another site to get to the goods
* the content was available as an archive that needed to be opened to get to the RDF
* there was no versioning available, thus no way to determine when and where changes were made
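The first two gripes are easy to check mechanically: dereference the namespace URI with content negotiation and see whether you actually get machine-readable RDF back. A rough sketch using Python’s standard library (the media-type list and the pass/fail rule are my own simplification):

```python
import urllib.request
import urllib.error

RDF_TYPES = ("text/turtle", "application/rdf+xml", "application/ld+json")

def looks_like_rdf(status, content_type):
    """Did dereferencing the namespace actually yield RDF, rather than
    an HTML landing page or an error?"""
    return status == 200 and content_type.split(";")[0].strip() in RDF_TYPES

def check_namespace(uri, accept="text/turtle"):
    """Dereference a namespace URI, asking for RDF via content negotiation;
    returns (HTTP status, Content-Type)."""
    req = urllib.request.Request(uri, headers={"Accept": accept})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        return err.code, ""

# An HTML landing page fails the check even though the URI 'works':
print(looks_like_rdf(200, "text/html; charset=utf-8"))  # False
print(looks_like_rdf(200, "text/turtle"))               # True
```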

I was pretty stunned to see that a big, important ontology was released in that way–so was Bernard, apparently, although since that release there has been a meeting of the minds, and the DBpedia Ontology is now resident in LOV. But as I read the post and its critique, my mind harked back to the conversation with PBCore. The issues Bernard brought up were exactly the ones we were discussing with them–how to manage a vocabulary, what tools were available to distribute the vocabulary to ensure easy re-use and understanding, the importance of versioning, providing documentation, etc.

These were all issues we’d been working hard on for RDA, and are still working on behind the RDA Registry. Clearly, there are a lot of folks out there looking for help figuring out how to provide useful access to their vocabularies and to maintain them properly. We’re exploring how we might do similar work for others (so ask us!).

Oh, and if you’re interested in our take on vocabulary versioning, take a look at our recent paper on the subject, presented at the IFLA satellite meeting on LOD in Paris last month.

I plan on posting more about that paper and its ideas later this week.

By Diane Hillmann, September 15, 2014, 2:31 pm (UTC-5)

A few days ago, while catching up with list traffic on RDA-L, I stumbled on a conversation between two librarians that got me thinking. They were talking about the myriad of changes in their ILS’s designed to make MARC usable with RDA. It’s a topic I still see a lot of on the lists, and it always makes me grind my teeth.

One reason for the dental destruction is that invariably the changes are small and niggly ones, the sort that make life annoying for those catalogers trying to apply RDA in a world still defined by MARC-based systems. It’s hardly news that those systems are often inflexible, but that’s primarily because they were built to supply services in an environment where data sharing was very centralized, change happened no more frequently than every six months, and everyone was prepared and in sync by the time records containing updated structure started flowing. This world is either gone or on its last legs, depending on your perspective.

What I saw underlying that conversation was the assumption that the only way change could happen was if the ILS’s themselves changed; in other words if the ILS vendors decided to lead rather than follow. The situation now is that system vendors say they’ll build RDA compliant systems when their customers ask for them, and libraries say that they’ll use ‘real’ RDA when there are systems that can support it. This is a dance of death, and nobody wins.

For years I’ve been telling librarians that they need to bug their vendors about this state of affairs, but I’m not sure there’s much of a future in that strategy, given that most of the librarians are not yet able to tell the vendors what they want in any detail, and the vendors have been unwilling to build their expertise or invest in any substantive way until they think they have customers ready to buy. This strategy of ‘wait and see’ undoubtedly has its attractions–given that it’s cheap–and the vendors don’t yet see that their current customer base has any alternatives.

But there are alternatives, albeit ones that require some initiative and investment by either libraries or vendors willing to step forward and perhaps gain a competitive advantage. In essence this strategy is based on the notion that if vendors won’t smarten up their systems, a solution that treats the vendor systems like ‘dumb consumers’ might be the best bet for moving forward. If indeed the tight coordination necessary in the past to share data and manage change is no longer optimal, distributed data need not use an ILS as its primary ‘node’ for storage and management of data. It could just sit there, waiting for some other machine that wished to share data (in or out), and still run its functions for the OPAC and pass data to OCLC.

But I suggest it’s no longer necessary for that ILS to be the center of our concern and attention, particularly if we see value in participating in the linked data world. The functions of creating and maintaining data could be accomplished elsewhere, preferably in a ‘system’ (maybe just a cache with services) designed to ingest, manage, and expose statement-based data in a variety of formats–including MARC, for as long as we want. Thus, the export of library data and the serving of it to users via an OPAC could be accomplished without giving up the MARC-based ILS, cutting the cord to OCLC, or upsetting current partners and collaborators.

Yes, we’d still have some work to do, but not nearly as much as we think. We have long understood that the old ideal of a fully ‘integrated’ system is unnecessarily cumbersome and limits our ability to improve our services. Some libraries have bolted discovery platforms on their OPAC that meet their needs better than their ILS does. Others use ERM systems to manage electronic subscriptions. Maybe it’s time to do the same with some of the other backend services that have passed their sell-by date.

Let the dis-integration begin!

By Diane Hillmann, February 18, 2014, 4:33 pm (UTC-5)

Presentations on innovative ways to gather data outside the library silo are happening all over ALA–generally hosted by committees and interest groups using speakers already planning to be at the conference. A great example of the kind of presentation I’m talking about was the Sunday session sponsored by the ALCTS CaMMS Cataloging & Classification Research Interest Group and presented by the ProMusicaDB project, with founder Christy Crowl and metadata librarian Kimmy Szeto. They provided a veritable feast of slides and stories, all of them illustrating the new ways that we’ll all be operating in the very near future. Their slides should be available on the ALCTS Cataloging and Classification Research IG site sometime soon. [Full disclosure: I spoke at that session too--see previous blog post for more details.]

On the Saturday of Midwinter, I attended 2 parts of the CC:DA meeting (I had to leave to do a presentation to another group in the middle), but I dutifully returned for the last part. It was probably a mistake–my return occurred during the last gasp of a perfectly awful discussion. I had a brief chat with Peter Rolla (the current chair) after the meeting, and continued to think about why I was so appalled during the last part of the meeting. Later, when held hostage in a meeting by a conversation in which I had little interest, I wrote up some of my thoughts.

I would describe the discussion as one of the endless number of highly detailed conversations on improving the RDA rules that have been a “feature” of CC:DA meetings for the past few years. To be honest, I have a limited tolerance for such discussions, though I usually enjoy some of the ones at a less excruciating level of detail.

Somehow this discussion struck me as even more circular than most, and seemed to be aimed at “improving” the rules by limiting the choices allowed to catalogers–in a sense, by mechanizing the descriptive process to an extreme degree. Now, I’m no foe of using automated means to create descriptive metadata, either as a sole technique or (preferably) as a first pass for catalogers or editors to complete. I think we ought to know a lot more about what can be done using technology rather than continue to flog any remaining potential for rule changes intended to push catalogers to supply a level of consistency that isn’t really achievable by humans. If you want consistency–particularly in transcription–use machines. Humans are far better employed reviewing the product, correcting errors, and adding information to improve its usefulness.

But in cataloging circles, discussing the use of automated methods is generally considered off-topic. When the [technological] revolution comes, catalogers will be the first to go, or so it is too often believed. Copy cataloging and other less ‘professional’ means of cutting costs and increasing productivity are not a happy topic of conversation for this group.

But, looking ahead, I see no letup in this trajectory without some changes. Catalogers love rules, and rules are endlessly improvable, no? Maybe, maybe not, but just put a tech services administrator in the room for some of these discussions, and you’re likely to get a reaction pretty close to mine. But to my mind, the total focus on rules rather than a more practical approach to address the inevitability of change in the business of cataloging is doing more towards ensuring that the human role in the process will be limited in ways that make little sense, except monetarily.

What we need here is to change the conversation, and no group is more qualified to do that than CC:DA. To do that, it’s absolutely necessary that its membership become more knowledgeable about what is now possible in automating metadata creation. Without that kind of awareness, it’s impossible to start thinking and discussing how to focus less of CC:DA’s efforts on the parts of the cataloging process that should be done by machines, and more on what still needs humans to accomplish. There are several ways to do this. One is by dedicating some of CC:DA’s conference time to bringing in folks who understand the technology issues to demonstrate, discuss, and collaborate.

Catalogers and their roles have been changing greatly over the past few years, and promises of more change must be taken seriously. Then the ultimate question might be asked: if resistance is futile (and it surely is), how can catalogers learn enough to help frame that change?

By Diane Hillmann, February 5, 2014, 4:23 pm (UTC-5)

Prior to Midwinter I posted the list of presentations I was doing over the course of Midwinter. It seemed only fair to report on some of those sessions, and to share my slides. I thought about posting them separately, since my posts tend to balloon fairly significantly once I get writing (those of you who know me are free to point out that I talk like that, too)–but given that I’ve decided to post more often, y’all will have to live with length.

Saturday, January 25, 2014, 3:00-4:00 p.m., “A Consideration of Holdings in the World Beyond MARC” [Slides]

There were two speakers at this session, at which I spoke second. The first speaker was Rebecca Guenther, who spoke on BibFrame generally as well as the BibFrame approach to holdings. BibFrame currently has a fairly simple approach, for now limited to the simpler holdings needs for non-serials. This is the easy [easier?] part of course, and it will be interesting to see how serial holdings will be integrated with the model.

My presentation briefly surveyed other important holdings work in progress, including a project at the Deutsche Nationalbibliothek (DNB), the ONIX for Serials Coverage Statement, and the current proposals, before closing with a brief report on a project my group is considering that would do for MARC Holdings (sample) what we’ve already set up for MARC Bibliographic data.

What struck me when I was setting up the presentation was the amazing variety of work going on in this area. I really didn’t expect that, I confess. But by immersing myself in holdings as I hadn’t done for many moons, I found I was looking at an awful lot of very recent work. And it wasn’t just the diversity of approaches that surprised me, but the varied results as well. The efforts ran the gamut from very complex and comprehensive approaches (ONIX and MFHD) to much simpler approaches. The functions anticipated for each colored the diverse outcomes to a great extent. The ONIX XML schema was easily the most complex–with some ideas based on the MFHD work.

The effort is, like BibFrame, still in the process of jelling. When looking for the evidence of holdings, I found myself on a path that had already been abandoned (though it then showed no signs of abandonment). Richard Wallis pointed me at the right place, and the slides have been corrected to fix that problem.

Sunday, January 26, 2014, 8:30-10:00 a.m., “The Other Side of Linked Data: Managing Metadata Aggregation” [Slides]

This session also included two presentations; mine was first this time. My focus was that most people think Linked Open Data (LOD) is about libraries exposing their data to the world, but that’s only half of LOD. The other half is taking advantage of the data others (libraries and non-libraries) are exposing openly. The two fundamental things about the LOD world are both ideas that tend to explode minds. First is the realization that we’re not talking about highly OCLC-curated MARC records, pre-aggregated for easy ingest into traditional library systems. Instead, we are talking about management of statements (which may indeed arrive as records, but to be useful in this multiple-choice world must be shredded on the way in and re-aggregated on the way out). There are many new skills we’ll have to learn (and an awful lot of assumptions that we’ll need to examine closely and maybe toss out the window). This is daunting, but hardly rocket surgery, and the sooner we get going, the better off we’ll be.
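‘Shredded on the way in and re-aggregated on the way out’ is easier to see in miniature. Here’s a toy sketch (the property names and example URI are made up) that turns a record into statements and rebuilds a record view from them:

```python
def shred(record, subject):
    """Shred a record into (subject, property, value) statements."""
    return [(subject, prop, v)
            for prop, values in record.items()
            for v in (values if isinstance(values, list) else [values])]

def reaggregate(statements, subject):
    """Re-aggregate on the way out: rebuild a record view for one subject,
    possibly from statements contributed by many sources."""
    record = {}
    for s, p, v in statements:
        if s == subject:
            record.setdefault(p, []).append(v)
    return record

rec = {"title": "Linked Data in Libraries", "creator": ["Dunsire", "Phipps"]}
stmts = shred(rec, "http://example.org/doc/1")  # URI is illustrative
print(len(stmts))  # 3 statements: one title, two creators
print(reaggregate(stmts, "http://example.org/doc/1"))
```

Once the data lives as statements, the re-aggregated ‘record’ can just as easily mix statements contributed by several sources, which is the whole point.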

The second presentation was from a group working at the Digital Public Library of America (DPLA), which is confronting many of these issues. Their announcement stated:

“This talk will introduce and outline the challenges of aggregating disparate metadata flavors from the perspective of both DPLA staff and representative hubs. We will review next steps and emerging frontiers as well, including improvements to normalization at the hub level and wider adoption of controlled vocabularies and formats for geospatial metadata and usage rights statements.”

And this was exactly what they did. They provided a very juicy look at the real world that faces anyone attempting to deal with the current metadata chaos. This is definitely work to follow, because where they are now will change over time and with experience, providing the rest of us with some really useful insights. Their slides are available from the IG site.

Sunday, January 26, 2014, 10:30-11:30 a.m., “Mapmakers” [Slides]

The Mapmakers presentation was designed to highlight some research I’ve been involved in, along with my colleagues Jon Phipps and Gordon Dunsire. This topic has not yet attracted a vast following, but it should as experience with new schemas and value vocabularies expands. As is usual, there was another presentation just before ours, one that gave an exciting view of innovative work in expanding our notion of authority, in particular gathering and managing data from a broad variety of sources. That work is encountering challenges very similar to the DPLA’s, though in some ways even more daunting, since they often have to develop the sources and bring them into the LOD world themselves.

That presentation focused on work done in the ProMusicaDB project, with founder Christy Crowl and metadata librarian Kimmy Szeto sharing the podium. There was a feast of slides and stories, all of them illustrating the new ways that we’ll all be operating in the very near future. Their slides (including a demo) should be available on the ALCTS Cataloging and Classification Research IG site very shortly (though not yet as of this writing).

By Diane Hillmann, February 3, 2014, 12:14 pm (UTC-5)
Saturday, January 25, 2014, 3:00-4:00 p.m., A Consideration of Holdings in the World Beyond MARC [PCC 203B]

Sunday, January 26, 2014, 8:30-10:00 a.m., The Other Side of Linked Data: Managing Metadata Aggregation [PCC 102A]

Sunday, January 26, 2014, 10:30-11:30 a.m., Mapmakers [PCC 102A]

Most ALA watchers have noticed a shift from ‘invited talks’ at Interest Group and Committee meetings to requests for proposals from the chairs, from which pool the speakers are chosen. This is, of course, in parallel with changes going on with other professional conferences, and it’s an interesting shift for a number of reasons.

There’s a democratization aspect to this change–the chairs are no longer limited in their choice to people they already know about, thereby potentially increasing the possibility that new and different ideas will get an airing. Maybe this Midwinter someone will come up with an absolutely wonderful and unexpected presentation that rockets the speaker from the unknown mob to the smaller roster of interesting known speakers. This is a good thing, I believe, even though the chances of witnessing such a rocket launch are dauntingly small.

For someone who has been around a long time (and noisily, it must be said), this shift means that I don’t need to wait for invitations to do presentations based on some chair’s idea of what might interest their group (but may no longer interest me); I can go ahead and respond to the calls that appeal to me. I’d like to think that the result is something fresh enough to be interesting for me to prepare and for an audience to listen to, without being totally divorced from prior talks that represent earlier phases. An odd result of this shift in process is that speakers who submit proposals to various committees don’t generally know who else will be speaking at a particular program until after their proposal has been approved, and maybe not even then. This particular aspect has already led to some very interesting lineups at meetings across the conference.

Because I take seriously the idea of not re-using previous talks to the extent that I could become horribly boring, I tend to apply for things that allow me to explore something that isn’t unrelated to what I’ve done before, but at least requires that I rethink something or try a different approach than I’ve used before to expose what I (and the people I work with) are thinking about. I think that’s pretty much what most audiences are looking for, right?

So below are my talks for ALA Midwinter. I may be accompanied by one or another of my colleagues on a couple of these, and will surely have their help building the presentations.

A Consideration of Library Holdings in the World Beyond MARC

Of all the MARC 21 formats, Holdings was the one most clearly designed for machine manipulation. It is granular, flexible, and intended to be used at either a detailed or summary level. It has sometimes frightened potential users because it looks complex (even where it isn’t), and in its ‘native’ form is not particularly human friendly. Some of the complexity arises because there are both display and prediction aspects in the encoding, and not all library systems have developed predictive serial check-in systems supported by MARC Holdings.
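A taste of why the format looks complex but is machine-friendly: captions and patterns (field 853) live apart from the enumeration and chronology values (field 863), and the human-friendly display string is computed by pairing them. This sketch is greatly simplified (real MARC Holdings links the paired fields with a $8 subfield and handles chronology captions, alternative numbering schemes, and much more):

```python
def holdings_display(captions, values):
    """Pair an 853 caption/pattern with an 863 enumeration/chronology
    to produce the display string that MARC Holdings doesn't store.
    Keys are subfield codes; subfield handling is greatly simplified."""
    parts = []
    for code, caption in captions.items():
        if code in values:
            parts.append(f"{caption}{values[code]}")
    return ":".join(parts)

# 853: $a v. $b no.   863: $a 23 $b 4
print(holdings_display({"a": "v.", "b": "no."}, {"a": "23", "b": "4"}))
# v.23:no.4
```

Because the values are stored granularly rather than as a display string, the same data can drive prediction, summarization, or display, which is exactly what makes the format useful to machines.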

Some of the bibliographic metadata efforts now going forward ignore the existing MARC Holdings, sometimes in favor of simpler solutions based on the perception of the waning need for predictive check-in for digital subscriptions. Not much effort has been expended to bring the MARC Holdings format forward into the discussions about changing requirements and re-use of existing standards.

For the ALCTS CRS Committee on Holdings Information, Saturday, January 25, 2014, 3:00-4:00 p.m., PCC 203B.

Holdings has been an interest of mine since I was a law librarian representing the American Association of Law Libraries on MARBI. In the early computer era in libraries, when digital publication was the exception, law publishers demonstrated a great deal of creativity in their publication of updating services, from loose-leaf services to the regular republication of standard tools, and law catalogers always had the best examples of holdings problems. These days, most of those materials have been subsumed by various digital tools, which have their own complexities, particularly in the context of versions, republication and compilation.

But the question remains–does what we learned from the pre-digital world of holdings functionality have relevance in the digital era?

The Other Side of Linked Data: Managing Metadata Aggregation

Most of the current activity in the library LOD world has been on publishing library data out of current silos. But part of the point of linked data for libraries is that it opens up data built by others for use within libraries, and has the potential for greater integration of library data within the larger data world. The sticking point for most librarians is that data building and distribution outside the familiar world of MARC seems like a black box, the key held by others.

Traditionally, libraries have relied on specialized system vendors to build the functionality they needed to manage their data. But the discussions I’ve heard too often result in librarians wanting vendors to tell them what they’re planning, and vendors asking librarians what they need and want. In the context of this stalemate, it behooves both library system vendors and librarians to explore the issues around management of more fine-grained metadata so that an informed dialogue around requirements can begin.

For the ALCTS Metadata Interest Group, Sunday, January 26, 2014, 8:30-10:00 a.m., PCC 102A

Transitioning from a rigidly record-based system to a more flexible environment where statement-level information can be aggregated and managed is difficult to envision from the vantage point of our current MARC-based world. This has led to a gap between what we know and the wider world of linked open data we’d like to participate in. One of the critical steps is to understand how such a world might look, and what it requires of us and our systems. The goal is to be able to move some of that improved understanding to the point of innovation and development.


It’s very clear that there will be no single answer to moving bibliographic metadata into the world beyond MARC, no direct ‘replacement’ for the simple walled garden we all have lived in for 40+ years. While it’s certainly true that the emerging global universe of bibliographic description has continued to expand and seems more chaotic than ever, there are still commonalities of understanding with the world beyond our garden walls that we’re only beginning to identify. How then can we begin to expose our understanding to that universe and develop some consensus paths forward? Specifically, what are the possibilities for using semantic mapping to provide us with the flexibility and extensibility we need to build our common future?

For the ALCTS CaMMS Cataloging & Classification Research Interest Group, Sunday, Jan. 26, 10:30-11:30, PCC 102A.

Librarians too often see ‘mapping’ and think ‘crosswalking’, but the reality is that these are quite different strategies. Crosswalking was a natural fit for the MARC environment, where the ‘one, best’ crosswalk would logically be developed centrally and implemented as part of current application needs. But the limitations of crosswalking make much less sense as we transition into a world where the Semantic Web has begun to take hold (of our heads, if not our systems!).

In the Semantic Web world, maps can contain a variety of relationships (not just the crosswalk ‘same as’), and central development and control is neither necessary nor very useful. That doesn’t mean we’re all on our own, though; collaboration is still our best strategy.
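The difference is easy to see in miniature. Where a crosswalk records only one ‘same as’ target, a semantic map can qualify each match (the relation names below follow SKOS’s mapping properties; the element names themselves are made up for illustration):

```python
# A crosswalk can only say "same as"; a semantic map qualifies the match.
crosswalk = {"ex:title": "dct:title"}  # one target, no nuance

semantic_map = [
    # (source, relation, target) -- relations are SKOS mapping properties
    ("ex:title",        "skos:exactMatch", "dct:title"),
    ("ex:caption",      "skos:closeMatch", "dct:title"),
    ("ex:journalTitle", "skos:broadMatch", "dct:title"),
]

def matches(term, minimum="skos:closeMatch"):
    """Consumers choose how strict a match they need; a crosswalk
    offers no such choice."""
    strength = {"skos:exactMatch": 2, "skos:closeMatch": 1, "skos:broadMatch": 0}
    return [t for s, rel, t in semantic_map
            if s == term and strength.get(rel, 0) >= strength[minimum]]

print(matches("ex:title"))         # ['dct:title'] via exactMatch
print(matches("ex:journalTitle"))  # [] -- broadMatch is below the threshold
```

Because the map carries the relation type, each consumer can decide how strict a match it needs, rather than having one central crosswalk decide for everyone.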

By Diane Hillmann, December 9, 2013, 3:26 pm (UTC-5)

[Continuing from a post earlier today]

The second, and not unrelated, announcement had to do with the end of printed versions of the Red Books, which have traditionally represented LCSH in its most official form. In the LC report to CC:DA the cessation of publication of the Red Books was announced:

In 2012, LC conducted an extensive study on the impact and opportunities of changes in the bibliographic framework and the technological environment on the future distribution of its cataloging data and products. LC’s transition from print to online-only for cataloging documentation is a response to a steadily declining customer base for print and the availability of alternatives made possible by advances in technology. This shift will enable the Library to achieve a more sustainable financial model and better serve its mission in the years ahead.

Certainly there’s not much to argue with here–consumers have spoken, and LC, like every institution and service provider, needs to pay attention. But more troubling is what the online-only policy really means. The announcement includes some information on a planned PDF version of LCSH, and points to that PDF, plus the Cataloger’s Desktop and Classification Web products (both behind paywalls) as the remaining complete and up-to-date options.

Notable for its absence in that announcement is any comment on LCSH as published on LC’s linked data service. Many of us are well aware of the gaps that make this version less than complete and up-to-date, and indeed the introduction to the service points out that:

LCSH in this service includes all Library of Congress Subject Headings, free-floating subdivisions (topical and form), Genre/Form headings, Children’s (AC) headings, and validation strings* for which authority records have been created. The content includes a few name headings (personal and corporate), such as William Shakespeare, Jesus Christ, and Harvard University, and geographic headings that are added to LCSH as they are needed to establish subdivisions, provide a pattern for subdivision practice, or provide reference structure for other terms. This content is expanded beyond the print issue of LCSH (the “red books”) with inclusion of validation strings.

*Validation strings: Some authority records are for headings that have been built by adding subdivisions. These records are the result of an ongoing project to programmatically create authority records for valid subject strings from subject heading strings found in bibliographic records. The authority records for these subject strings were created so the entire string could be machine-validated. The strings do not have broader, narrower, or related terms.

It’s not clear to me that the caveats in this introduction are either widely read or completely understood, but I suspect that if you surveyed random catalogers about how useful the service is and what it does and doesn’t include, you’d get a wide variety of responses. And of course, the updating strategy for the subject headings operates under the same ‘versioning’ pattern as the relators: files are reloaded periodically, and there isn’t much in the way of the sort of versioning that could support notifications to users or support updating linked data that uses LCSH outside of ILS’s or traditional central services like OCLC.
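A wholesale file reload pushes the change-detection work onto every consumer: to find out what changed, you diff the old load against the new one yourself. A sketch (the identifiers below are made up, and a real diff would compare full authority records, not just labels):

```python
def diff_vocabulary(old, new):
    """Compare two wholesale reloads of a vocabulary (URI -> label)
    to recover the change information the publisher didn't provide."""
    added   = {u: new[u] for u in new.keys() - old.keys()}
    removed = {u: old[u] for u in old.keys() - new.keys()}
    changed = {u: (old[u], new[u])
               for u in old.keys() & new.keys() if old[u] != new[u]}
    return added, removed, changed

old = {"lcsh:sh1": "Aeroplanes", "lcsh:sh2": "Cookery"}
new = {"lcsh:sh1": "Airplanes",  "lcsh:sh3": "Cooking"}
added, removed, changed = diff_vocabulary(old, new)
print(sorted(added))    # ['lcsh:sh3']
print(sorted(removed))  # ['lcsh:sh2']
print(changed)          # {'lcsh:sh1': ('Aeroplanes', 'Airplanes')}
```

This works, but every consumer repeating it for every reload is exactly the waste that publisher-side versioning and notifications would eliminate.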

What LC has done could serve as a case study in how not to handle versioning of semantics in a public vocabulary. If we accept the premise that vocabulary semantics will change over time, there are very few ways to build stable systems that rely on linked data. The preferred option is to use vocabularies from systems that provide stable URIs for past, present, and future versions of the vocabulary; the less attractive option is to create a local, stable shadow vocabulary and map it to the public vocabulary over which you have little or no control. Mapping vocabularies in this way gives you the opportunity to maintain the semantic stability of your own system, your own ‘knowledge base’, while still providing the ability to maintain semantic integration with the global pool of linked data. Clearly, this is an expensive proposition. And it’s not as if these issues of reuse vs. extension are not currently under heavy discussion in a number of contexts: on public discussion lists, for instance.
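The shadow-vocabulary strategy can be sketched in a few lines. This is a minimal illustration, not anyone’s actual implementation: the `` and `` URIs and the term names are hypothetical stand-ins, and a real system would store the mappings as SKOS triples (e.g. skos:exactMatch) in an RDF store rather than a Python dict.

```python
# A minimal sketch of the 'shadow vocabulary' approach: local data
# always references stable local URIs; only this mapping layer knows
# about the public vocabulary, so a change there means updating one
# mapping, not touching any local records. All URIs are hypothetical.

LOCAL = ""
PUBLIC = ""
EXACT_MATCH = ""  # SKOS mapping property

# Local term -> (mapping property, current public equivalent)
mappings = {
    LOCAL + "binder": (EXACT_MATCH, PUBLIC + "bnd"),
    LOCAL + "filmEditor": (EXACT_MATCH, PUBLIC + "flm"),
}

def public_equivalent(local_uri):
    """Resolve a stable local URI to its current public counterpart."""
    _prop, target = mappings[local_uri]
    return target

def remap(local_uri, new_public_uri):
    """Repoint one mapping when the public vocabulary changes;
    local URIs, and the data that uses them, stay untouched."""
    prop, _old = mappings[local_uri]
    mappings[local_uri] = (prop, new_public_uri)
```

If the public vocabulary redefined or replaced its film-editor term, only the `filmEditor` mapping would be repointed; every local record using the local URI would remain semantically stable.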

There are a number of related issues here that would also benefit from broader discussion. Large public vocabularies have tended to make an incomplete transition from print to online, getting stuck, like LC, using the file management processes of the print era to manage change behind a ‘service’ front end that isn’t really designed to do the job it’s being asked to do. What needs to be examined, soon and in public, is the relationship between these files and the legacy data that hangs over our heads like a boulder of Damocles. Clearly, we don’t just need access to files (whether one at a time or in batches); we need more of the kinds of services that support libraries in managing and improving their data. These needs are especially critical for organizations engaged in the important work of integrating legacy and project data, and trying to figure out a workflow that allows them to make full use of the legacy public vocabularies.

Ignoring or denying these issues as important changes are made to the vocabularies that LC manages, on behalf of the cultural heritage communities across the globe, does a disservice to everyone. No one expects LC to come up with all the answers, just as they could not be expected (in the past, or now) to build the vocabularies themselves without the help of the community. NACO, SACO and PCC were, and are, models of collaboration. Why not build on that strength and push more of the discussion about needs and solutions into that same eager, and very competent, community?

By Diane Hillmann, July 23, 2013, 3:20 pm (UTC-5)

The Library of Congress recently made a couple of announcements, which I’ve been thinking about in the context of the provision of linked data services.

In May, LC announced that a ‘reconciliation’ had been done on the LC relators, in part to bring them into conformance with RDA role terms. This is not at all a bad thing, but the manner in which the revisions were accomplished and presented points up some serious issues with the strategy LC is currently using to manage these vocabularies.

As part of this ‘reconciliation’ LC made a variety of changes to the old list. Some definitions were changed, but in most cases the code and derived URI remained the same, creating a situation where the semantics become unreliable. It’s not easy to determine which terms have changed: the old file was overwritten, the previous version can’t be accessed through the service, and as far as I can tell, there’s no definitive list of changes available. The only clues in the new file are ‘Change Notes’–textual notes with the dates of changes, though the nature of those changes is not specified. An example can be found under the term ‘Binder’ (code bnd), where the change note has two dates:

2013-05-15: modified
1970-01-01: new

In another example, the term ‘Film editor’, the definition now starts: “A person who, following the script and in creative cooperation with the Director, selects, arranges, and assembles the filmed material, …” whereas the old usage note referred to “… a person or organization who is an editor of a motion picture film …”. This is a clear and significant change of definition, because the reference to the organization entity has been dropped. Curiously, the definition for the term ‘Scenarist’ continues to refer to “A person or organization who is the author of a motion picture screenplay …”, although that definition was changed at the same time. Perhaps the difference occurs because the change note for ‘Film editor’ refers to “FIAF”, which is probably the International Federation of Film Archives (the announcement refers to FIAT, a probable typo).

This M.O. may be perfectly satisfactory to support most human uses of the vocabulary, but it is clearly not all that useful for machines operating in a linked data environment. I was alerted to some of these issues by a colleague building a map based on the prior version, which now needs to be completely revised (and without a list of changes, this becomes a very laborious process). It’s also my understanding that the JSC just recently updated some of the relationship definitions in the most recent release of the RDA Toolkit; those definitions are now out of sync with the ‘reconciled’ relator terms.
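For what it’s worth, the change list LC didn’t publish is mechanically easy to produce when both versions of a file are retained, which is exactly why overwriting the old file stings. Here is a rough sketch, treating each version as a set of (term, property, value) triples; the definitions are abridged and paraphrased from the examples above, and in practice the triples would be parsed from the published RDF files rather than typed in.

```python
# Sketch: diff two versions of a vocabulary held as sets of
# (subject, predicate, object) triples. Set arithmetic yields exactly
# what was added, removed, and redefined between releases.

DEF = "definition"

old_version = {
    ("relators/flm", DEF, "a person or organization who is an editor of a motion picture film"),
    ("relators/bnd", DEF, "a person who binds items"),
}
new_version = {
    ("relators/flm", DEF, "a person who, following the script and in creative cooperation with the Director, selects, arranges, and assembles the filmed material"),
    ("relators/bnd", DEF, "a person who binds items"),
}

added = new_version - old_version
removed = old_version - new_version

# A term appearing on both sides of the diff was redefined,
# not added or deleted.
redefined = {s for (s, _p, _o) in added} & {s for (s, _p, _o) in removed}

print(sorted(redefined))  # -> ['relators/flm']
```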

A number of questions arise as a result of this, perhaps chief among them the basic one of whether it makes sense to reconcile these vocabularies at all. Because this work was not discussed publicly before the reconciled vocabulary was unveiled (I might be wrong about this, but I’m sure someone will correct me if I missed something), the potential effect on legacy data is unknown, as are any other options for dealing with the issues created by lack of established process or opportunity for public comment. If you accept the premise that we will continue to live in an environment of multiple vocabularies for multiple uses, there are other strategies–mapping and extension, for instance–that might have a better chance to improve usefulness while avoiding the kinds of reliability and synchronization problems these changes bring to the fore.

In addition to the process issues, a strong case could be made that the current services presented under LC’s linked data umbrella might benefit from some discussion about how the data is intended to be used and managed. Not everyone is tied to traditional ILSs now, and perhaps fewer will be in the future, if current interest in linked data continues. Are all users of these vocabularies expected to flush their caches of data every time a new ‘version’ of the underlying file is loaded? How would they even know that change is happening behind the scenes (unless, of course, they are careful readers of LC’s announcements)? If LC expects to provide services for linked data users, these issues must be discussed openly and use cases defined so that appropriate decisions can be made. At a minimum, these practices need to be examined in the context of linked data principles, which call for careful change to definitions and URIs to minimize surprises and loss of backward compatibility.
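One piece of plumbing that could answer the ‘how would they know’ question is ordinary HTTP cache validation. The sketch below only simulates the exchange in plain Python; a real consumer would send an If-None-Match header with its cached ETag and treat a 304 response as ‘your copy is still good’ and a 200 as ‘the file changed behind the scenes, refetch it’. Whether a given vocabulary service emits usable validators is exactly the kind of question that needs public discussion.

```python
# Simulated HTTP conditional GET: the consumer keeps the last ETag it
# saw and asks "has this changed?" instead of re-downloading and
# re-parsing the whole vocabulary file on a schedule.

def check_for_update(server, cached_etag):
    """Mirror a conditional GET: (False, etag) stands in for
    '304 Not Modified'; (True, etag) for '200 OK, new content'."""
    if server["etag"] == cached_etag:
        return False, cached_etag
    return True, server["etag"]

vocab_service = {"etag": '"relators-2013-05-v1"'}

changed, etag = check_for_update(vocab_service, None)   # first fetch
assert changed

changed, etag = check_for_update(vocab_service, etag)   # nothing new
assert not changed

vocab_service["etag"] = '"relators-2013-05-v2"'         # silent reload
changed, etag = check_for_update(vocab_service, etag)
print(changed)  # -> True
```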

[To be continued]

By Diane Hillmann, July 23, 2013, 2:35 pm (UTC-5)

Many of you have heard me say “Time flies, whether you’re having fun or not”–and that has certainly been the case since I got back from the NISO Roadmap meeting a few weeks ago. Somehow, with my head down, I missed part 1 of Roy Tennant’s post “The Post-MARC Era, Part 1: ‘If It’s Televised, It Can’t Be the Revolution’”. I’m old enough to remember the ’60s and the call to revolution that Gil Scott-Heron referred to, and in fact had a small part in it–but since it WAS live, I’ve no evidence to present about my participation; you’ll just have to believe me.

On the other hand, I’ve been very involved in the revolution under discussion in the remainder of his post, and there’s quite a bit of video to confirm that, including at the beginning of the NISO Roadmap meeting, where Gordon Dunsire and I tossed a few thought-bombs out before the conversation got going. I think it validates Roy’s point about participation to say that the points we made came up frequently in the subsequent small group sessions, which were not, I believe, on the video feed. What I observed as a participant was that more than a few folks left with new information and (I hope) more expanded thinking about what the revolution is about than they came in with.

Despite the fact that I’ve acquired an undeserved reputation for being a MARC hater, I actually think that we should continue to use the semantics of MARC, and get rid of the ancient encoding standard. It’s in some ways a Dr. Jekyll and Mr. Hyde problem we have here, and we’re about to kill the ‘wrong MARC’ in our exasperated search for something simpler, because we can’t seem to get clear about what MARC is and isn’t. The reality is that the MARC semantics represent the accumulated experience in library description from the days of the 3 x 5 card with the hole in the bottom (see Gordon Dunsire’s presentation on that evolution). We’ll clearly need to map the semantics of our legacy data forward, but that doesn’t require that we carry along the ‘classic’ MARC encoding. Consider the old days of the telegraph, where messages were encoded using dots and dashes. Those messages were translated into written English for end users, who didn’t need to know Morse Code to read them. Now we use telephone messaging and email for those kinds of communications, and Morse Code doesn’t figure in there anywhere.
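To make the Morse-code point concrete, here is a toy sketch of carrying MARC semantics forward while dropping the encoding. The property URIs and the two-entry mapping table are invented for illustration only; the point is simply that the meaning of a tag/subfield pair becomes a property, while nothing of the ISO 2709 transmission format survives.

```python
# Toy illustration: keep the MARC *semantics* (what 245 $a means) as
# RDF-style properties, discard the MARC *encoding*. The URIs are
# hypothetical placeholders, not a real published element set.

PROPERTY_MAP = {
    ("245", "a"): "",  # title proper
    ("260", "b"): "",  # name of publisher
}

def field_to_triple(record_uri, tag, subfield, value):
    """Re-express one MARC data element as a triple; nothing about
    leader bytes, directories, or field terminators survives."""
    return (record_uri, PROPERTY_MAP[(tag, subfield)], value)

print(field_to_triple("", "245", "a", "Moby Dick"))
```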

In addition, we need to look past all those rarely used MARC fields and recognize that they are only irrelevant in an environment that looks very much like our current one, with artisanal catalog records and top-down standards development. That’s not really what we’re hoping for, as we wrap our minds around what an environment based on linked open data might free us to do differently. When systems were built to process MARC-encoded records, those systems needed to be updated at regular intervals, with all the sharing partners moving in lockstep. It was very expensive to manage the code that was the plumbing of those systems, and the specialized fields didn’t add much value. But remember that each of those proposals for change was extensively discussed and formally considered. I was there for many of those discussions, and recognize that not all of the proposals were accepted, but a considerable number were–and then not always (or often) used after they were included in MARC. Before we label all that effort wasted, and attempt to re-litigate all those decisions, let’s take a closer look at the real costs of moving those fields forward in the very different environment we’re envisioning, where the costs are differently distributed and everyone need not move in lockstep. It’s entirely possible that some new communities will find these specialized fields very relevant, even though libraries have not.

Roy quotes from the BibFrame announcement, which states:

“A major focus of the initiative will be to determine a transition path for the MARC 21 exchange format in order to reap the benefits of newer technology while preserving a robust data exchange that has supported resource sharing and cataloging cost savings in recent decades.”

It’s still unclear to me (and I’m not alone here) that we really need a ‘transition path for the MARC 21 exchange format’. Why can’t we join the rest of the world, which is tootling along quite nicely, thank you, without a bespoke exchange format? We have several useful sets of semantics, built collaboratively over the past half century–why would we need to start over? I generally read the BibFrame discussions, but rarely participate, mostly because it all seems like a reinvention of something that doesn’t need reinventing, and I have no time for that. Whatever the BibFrame people come up with will be mappable to and from the other ongoing bibliographic standards, and whoever wants to use it for exchange can certainly do that, but it will never have the penetration in the library market that MARC has.

It’s also a bit mysterious what ‘preserving a robust data exchange’ actually means. Are we talking about maintaining the current exchange of records using OCLC as the centralized node through which everything passes? What part of that ‘preservation’ is about preserving the income streams inherent in the current distribution model? What is it about linked open data, without a central node, that isn’t robust enough?

Roy ends his post with something that I didn’t expect, but definitely applaud:

“Watching the NISO event over the last two days crystallized for me that I had fallen into the trap of thinking that the Library of Congress or NISO or OCLC (my employer) would come along and save us all. I forgot that for a revolution to occur it can’t come from the seats of the existing power structure. True change only happens when everyone is involved. Those organizations may implement and support the changes that the revolution produces, but anything dictated from on high will not be a revolution. The revolution will not be piped into our cubicles, ready for easy consumption. The revolution will be live.”

We could start by no longer waiting for LC to deliver an RDF version of MARC 21, unencumbered by 50-year-old encoding standards. We already have that. Yes, it needs some work, but it’ll get done a lot faster if we can get some help from the 99% of the library world. Give us a holler if you’re interested.

Clearly the revolution is not happening on the BibFrame discussion list, it is happening elsewhere.

By Diane Hillmann, May 14, 2013, 4:34 pm (UTC-5)

I saw the announcement a few weeks ago about the demise of MARBI and the creation of the new ALCTS/LITA Metadata Standards Committee. My first reaction was ‘uh oh,’ and I flashed back to the beginnings of the DCMI Usage Board. The DCUB still exists, but in a sort of limbo, as DCMI reorganizes itself after the recent change of leadership.

I was a charter member, and, with Rebecca Guenther, wrote up the original proposal for the organization of the group. It was based to some extent on MARBI–not a surprise, since Rebecca and I were veterans of that group. But there were some ambiguities in the plan for the UB that came back to bite us over the next few years–primarily having to do with essential questions about what the group was supposed to be doing, and how to accomplish its goals. These difficulties had little to do with the organizational aspects–how many members, questions of voting (which changed over time), or issues of documentation and dissemination–all of which were settled fairly easily when the group was set up (and can be found here).

It struck me, as I was reading the announcement, that it might be useful for me to revisit some of the issues that came up with the DCMI Usage Board while I was a member, and think about whether they are relevant to the new ALCTS/LITA Metadata Standards Committee. I hope this perspective may be useful for ALCTS and LITA as they get this committee going, because, frankly, I see dragons all over the place. [I should emphasize here that these are personal opinions, and don’t represent any position of the DCMI Executive group, of which I am a member.]

So, here’s a quote from the announcement describing the Committee’s responsibilities:

“The ALCTS/LITA Metadata Standards Committee will play a leadership role in the creation and development of metadata standards for bibliographic information. The Committee will review and evaluate proposed standards; recommend approval of standards in conformity with ALA policy; establish a mechanism for the continuing review of standards (including the monitoring of further development); provide commentary on the content of various implementations of standards to concerned agencies; and maintain liaison with concerned units within ALA and relevant outside agencies.”

I see a lot of big and important words in this paragraph, and would like to see some of them defined more carefully for this new context. For instance, what does ‘a leadership role in the creation and development of metadata standards’ really mean? The prospective committee members are folks who have day jobs and are likely to meet in person twice a year (perhaps in multiple meetings) at each ALA meeting, but they have been given an enormous brief, or so it seems.

First of all, what is a ‘standard’? Are ‘standards’ in this context only those which have been vetted by a standards body like NISO or ISO? Some ‘standards’ that are in relatively broad use in the bibliographic environment are in fact developed within the walls of just one institution (e.g., LC’s MODS, MADS, etc.) and though they may eventually acquire some mechanism for user participation, their definition as standards is largely self-declared by their managing institution. For that matter, how about metadata element sets developed by international bodies, like IFLA, or W3C, or Dublin Core? ALA is a voting member of NISO, which suggests to me that a clear definition of what a standard IS will be an essential step, even before an examination of the notion of what a ‘leadership role’ might be.

Then there’s the notion that standards (however defined) will be proposed to this committee for review and evaluation. Proposed by whom? Reviewed by what criteria, and evaluated by what mechanism?

For the DCUB, the brief of the group changed over time, as DCMI grew and shifted focus. At first, the UB’s brief was the review of proposals for new metadata terms. That turned out to be far more difficult than it seemed on the surface, because in order to evaluate those proposals, there first needed to be criteria for evaluation. Eventually it became clear that there were an infinite number of elements desired by an ever-increasing number of communities, and whether any or all of these should be part of what was supposed to be a general set of properties became an issue. Finally, after much discussion, it was determined that the Dublin Core was not going to be the arbiter of all terms desired by all people, and the UB stopped reviewing proposals for new terms.

Another historical tidbit illustrates a possible pitfall. At one point (I’m afraid I can’t remember the timing on this), the UB was approached by a public broadcasting group that was developing a metadata schema based on Dublin Core, and they wanted us to review what they’d done and give them some feedback. So, the UB looked over what they’d done, and provided them with feedback–mostly about how they’d structured their schema, rather than the specific terms they used.

Some time later, it was pointed out to me that the Wikipedia entry on PBCore said that the UB had ‘reviewed’ their schema, in a manner implying that we’d given some stamp of approval, which we had certainly not done. Wikipedia being what it is, I went in and clarified the statement. You can probably see what I added by checking out the Wikipedia entry, and you might want to look at some of the PBCore vocabularies in the Open Metadata Registry Sandbox (this is a good example, but you’ll note that they didn’t get beyond ‘A’).

The RDA effort is a classic case of how much more difficult it is to develop standards than it seems at the start–and also how important process and timeliness are to the eventual determination of who will actually use the standard. The RDA development effort was started long enough ago that during the long process of development — originally begun as a classic closed-door-experts-only effort — the whole world changed.

In 2007, as part of that process, I got involved in the effort to build the vocabularies necessary for RDA to be used in a Semantic Web environment, in parallel with the continuing development of the guidance instructions, under the aegis of the DCMI/RDA Task Group (now the DCMI Bibliographic Metadata Task Group). The completion of that work (in the hands of the JSC for review and publication since 2009) has stalled, as the JSC spends its limited time entertaining proposals for changing the guidelines it just recently finished. Meanwhile, time marches ever onward, and many of those who were once waiting for the RDA vocabularies to be completed have concluded that they may never be, and have started looking elsewhere for metadata element sets.

In the meantime, LC began its BibFrame project roughly two years ago. That effort, as it’s been described so far, seems unlikely to consider RDA as a significant part of its ‘solution’. Various other large users and purveyors of bibliographic data have begun to use a variety of build-your-own schemas to expose their data as linked data, the (somewhat) New Big Thing. It’s illustrative to note that these don’t tend to use RDA properties.

There was a time that MARC ruled the library world, and there’s still a nostalgia in some quarters for that world of many certainties and fewer choices. That time isn’t coming back, no matter how many new committees we set up to try to control the new, chaotic world of bibliographic data. The fact is that our world is moving too fast, and in our anxiety to get things ‘right’ we continue to build and maintain cumbersome ‘standards’ using complex processes that no longer work for us. We’re still trying to insist that the ‘continuing review’, ‘evaluation’ and ‘recommendation’ processes have clear value, but a realistic look at the current environment suggests that they may no longer be of value, or even possible.

I have no inside knowledge of how all this will come out, but I’d be much happier if the new ALCTS/LITA Metadata Standards Committee either receives or builds for itself a much clearer and achievable set of goals and tasks than they seem to have been given.

It’s a jungle out there.

By Diane Hillmann, October 26, 2012, 11:01 am (UTC-5)