A few days ago, while catching up with list traffic on RDA-L, I stumbled on a conversation between two librarians that got me thinking. They were talking about the myriad changes to their ILSs designed to make MARC usable with RDA. It’s a topic I still see a lot of on the lists, and it always makes me grind my teeth.

One reason for the dental destruction is that invariably the changes are small and niggly ones, the sort that make life annoying for those catalogers trying to apply RDA in a world still defined by MARC-based systems. It’s hardly news that those systems are often inflexible, but that’s primarily because they were built to supply services in an environment where data sharing was very centralized, change happened no more frequently than every six months, and everyone was prepared and in sync by the time updated records started flowing. This world is either gone or on its last legs, depending on your perspective.

What I saw underlying that conversation was the assumption that the only way change could happen was if the ILSs themselves changed; in other words, if the ILS vendors decided to lead rather than follow. The situation now is that system vendors say they’ll build RDA-compliant systems when their customers ask for them, and libraries say that they’ll use ‘real’ RDA when there are systems that can support it. This is a dance of death, and nobody wins.

For years I’ve been telling librarians that they need to bug their vendors about this state of affairs, but I’m not sure there’s much of a future in that strategy, given that most of the librarians are not yet able to tell the vendors what they want in any detail, and the vendors have been unwilling to build their expertise or invest in any substantive way until they think they have customers ready to buy. This strategy of ‘wait and see’ undoubtedly has its attractions–given that it’s cheap–and the vendors don’t yet see that their current customer base has any alternatives.

But there are alternatives, albeit ones that require some initiative and investment by either libraries or vendors willing to step forward and perhaps gain a competitive advantage. In essence this strategy is based on the notion that if vendors won’t smarten up their systems, a solution that treats the vendor systems like ‘dumb consumers’ might be the best bet for moving forward. If indeed the tight coordination once necessary to share data and manage change is no longer optimal, distributed data need not use an ILS as its primary ‘node’ for storing and managing data. The ILS could just sit there, waiting for some other machine that wants to share data (in or out), while still running its functions for the OPAC and passing data to OCLC.

But I suggest it’s no longer necessary for that ILS to be the center of our concern and attention, particularly if we see value in participating in the linked data world. The functions of creating and maintaining data could be accomplished elsewhere, preferably in a ‘system’ (maybe just a cache with services) designed to ingest, manage, and expose statement-based data in a variety of formats–including MARC, for as long as we want. Thus, the export of library data and the serving of it to users via an OPAC could be accomplished without giving up the MARC-based ILS, cutting the cord to OCLC, or upsetting current partners and collaborators.

Yes, we’d still have some work to do, but not nearly as much as we think. We have long understood that the old ideal of a fully ‘integrated’ system is unnecessarily cumbersome and limits our ability to improve our services. Some libraries have bolted discovery platforms on their OPAC that meet their needs better than their ILS does. Others use ERM systems to manage electronic subscriptions. Maybe it’s time to do the same with some of the other backend services that have passed their sell-by date.

Let the dis-integration begin!

By Diane Hillmann, February 18, 2014, 4:33 pm (UTC-5)

Presentations on innovative ways to gather data outside the library silo are happening all over ALA–generally hosted by committees and interest groups using speakers already planning to be at the conference. A great example of the kind of presentation I’m talking about was the Sunday session sponsored by the ALCTS CaMMS Cataloging & Classification Research Interest Group and presented by the ProMusicaDB project, with founder Christy Crowl and metadata librarian Kimmy Szeto. They provided a veritable feast of slides and stories, all of them illustrating the new ways that we’ll all be operating in the very near future. Their slides should be available on the ALCTS Cataloging and Classification Research IG site sometime soon. [Full disclosure: I spoke at that session too--see previous blog post for more details.]

On the Saturday of Midwinter, I attended two parts of the CC:DA meeting (I had to leave to do a presentation to another group in the middle), but I dutifully returned for the last part. It was probably a mistake–my return occurred during the last gasp of a perfectly awful discussion. I had a brief chat with Peter Rolla (the current chair) after the meeting, and continued to think about why I was so appalled during the last part of the meeting. Later, when held hostage in a meeting by a conversation in which I had little interest, I wrote up some of my thoughts.

I would describe the discussion as one of the endless number of highly detailed conversations on improving the RDA rules that have been a “feature” of CC:DA meetings for the past few years. To be honest, I have a limited tolerance for such discussions, though I usually enjoy some of the ones at a less excruciating level of detail.

Somehow this discussion struck me as even more circular than most, and seemed to be aimed at “improving” the rules by limiting the choices allowed to catalogers–in a sense, by mechanizing the descriptive process to an extreme degree. Now, I’m no foe of using automated means to create descriptive metadata, either as a sole technique or (preferably) for submission to catalogers or editors to complete. I think we ought to know a lot more about what can be done using technology rather than continue to flog any remaining potential for rule changes intended to push catalogers to supply a level of consistency that isn’t really achievable by humans. If you want consistency–particularly in transcription–use machines. Humans are far better utilized reviewing the product, correcting errors, and adding information to improve its usefulness.

But in cataloging circles, discussing the use of automated methods is generally considered off-topic. When the [technological] revolution comes, catalogers will be the first to go, or so it is too often believed. Copy cataloging and other less ‘professional’ means of cutting costs and increasing productivity are not happy topics of conversation for this group.

But, looking ahead, I see no letup in this trajectory without some changes. Catalogers love rules, and rules are endlessly improvable, no? Maybe, maybe not, but just put a tech services administrator in the room for some of these discussions, and you’re likely to get a reaction pretty close to mine. But to my mind, the total focus on rules, rather than a more practical approach to the inevitability of change in the business of cataloging, does more to ensure that the human role in the process will be limited in ways that make little sense, except monetarily.
What we need here is to change the conversation, and no group is more qualified to do that than CC:DA. To do that it’s absolutely necessary that its membership become more knowledgeable about what is now possible in automating metadata creation. Without that kind of awareness, it’s impossible to start thinking and discussing how to focus less of CC:DA’s efforts on that part of the cataloging process which should be done by machines, and more on what still needs humans to accomplish. There are several ways to do this. One is by dedicating some of CC:DA’s conference time to bringing in those folks who understand the technology issues to demonstrate, discuss, and collaborate.

Catalogers and their roles have been changing greatly over the past few years, and promises of more change must be taken seriously. Then the ultimate question might be asked: if resistance is futile (and it surely is), how can catalogers learn enough to help frame that change?

By Diane Hillmann, February 5, 2014, 4:23 pm (UTC-5)

Prior to Midwinter I posted the list of presentations I was doing over the course of Midwinter. It seemed only fair to report on some of those sessions, and to share my slides. I thought about posting them separately, since my posts tend to balloon fairly significantly once I get writing (those of you who know me are free to point out that I talk like that, too)–but given that I’ve decided to post more often, y’all will have to live with length.

Saturday, January 25, 2014, 3:00-4:00 p.m., “A Consideration of Holdings in the World Beyond MARC” [Slides]

There were two speakers at this session; I spoke second. The first speaker was Rebecca Guenther, who spoke on BibFrame generally as well as the BibFrame approach to holdings. BibFrame currently has a fairly simple approach, for now limited to the simpler holdings needs of non-serials. This is the easy [easier?] part of course, and it will be interesting to see how serial holdings will be integrated with the model.

My presentation briefly surveyed other important holdings work in progress, from a project at the Deutsche Nationalbibliothek (DNB), the ONIX for Serials Coverage Statement, and the current proposals for schema.org, to a brief report on a project my group is considering that would do for MARC Holdings (sample) what we’ve already set up for MARC Bibliographic data at marc21rdf.info.

What struck me when I was setting up the presentation was the amazing variety of work going on in this area. I really didn’t expect that, I confess. But by immersing myself in holdings as I hadn’t done for many moons, I found I was looking at an awful lot of very recent work. And it wasn’t just the diversity of approaches that surprised me, but the varied results as well. The efforts ran the gamut from very complex and comprehensive approaches (ONIX and MFHD) to much simpler approaches. The functions anticipated for each colored the diverse outcomes to a great extent. The ONIX XML schema was easily the most complex–with some ideas based on the MFHD work.

The schema.org effort is, like BibFrame, still in the process of jelling. When looking for evidence of schema.org holdings, I found myself on a path that had already been abandoned (though at the time it showed no signs of abandonment). Richard Wallis pointed me at the right place, and the slides have been corrected to fix that problem.

Sunday, January 26, 2014, 8:30-10:00 a.m., “The Other Side of Linked Data: Managing Metadata Aggregation” [Slides]

This session also included two presentations; mine was first this time. My focus was that most people think Linked Open Data (LOD) is about libraries exposing their data to the world, but that’s only half of LOD. The other half is taking advantage of the data others (libraries and non-libraries) are exposing openly. The two fundamental things about the LOD world are both ideas that tend to explode minds. First is the realization that we’re not talking about highly OCLC-curated MARC records, pre-aggregated for easy ingest into traditional library systems. Instead, we are talking about management of statements (which may indeed be records as originally ingested, but to be useful in this multiple-choice world must be shredded on the way in and re-aggregated on the way out). There are many new skills we’ll have to learn (and an awful lot of assumptions that we’ll need to examine closely and maybe toss out the window). This is daunting, but hardly rocket surgery, and the sooner we get going, the better off we’ll be.
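To make the shred-and-reaggregate idea a little more concrete, here’s a minimal sketch in plain Python–the record and its field names are invented for illustration. A record comes in, is broken into individual statements that each carry their own provenance, and a record-like view is re-assembled on the way out for whatever consumer asks for it.

```python
# Minimal sketch: a record is "shredded" into statements on the way in and
# re-aggregated on the way out. Record structure and field names are invented.

record = {
    "id": "rec-001",
    "title": "An example title",
    "creator": "Example, Author",
    "subject": ["Metadata", "Linked data"],
}

def shred(record, source):
    """Break a record into (subject, property, value, provenance) statements."""
    statements = []
    for prop, value in record.items():
        if prop == "id":
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            statements.append((record["id"], prop, v, source))
    return statements

def reaggregate(statements, resource_id):
    """Re-assemble a record-like view of one resource from the statement pool."""
    view = {"id": resource_id}
    for subject, prop, value, _source in statements:
        if subject == resource_id:
            view.setdefault(prop, []).append(value)
    return view

pool = shred(record, source="library-a-export")
print(reaggregate(pool, "rec-001"))
```

The point is less the code than the shape of the work: once every statement carries its own provenance, data from many sources can sit in the same pool and still be pulled back out in whatever aggregation a particular consumer needs.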

The second presentation was from a group working at the Digital Public Library of America (DPLA), which is confronting many of these issues. Their announcement stated:

“This talk will introduce and outline the challenges of aggregating disparate metadata flavors from the perspective of both DPLA staff and representative hubs. We will review next steps and emerging frontiers as well, including improvements to normalization at the hub level and wider adoption of controlled vocabularies and formats for geospatial metadata and usage rights statements.”

And this was exactly what they did. They provided a very juicy look at the real world that faces anyone attempting to deal with the current metadata chaos. This is definitely work to follow, because where they are now will change over time and with experience, providing the rest of us with some really useful insights. Their slides are available from the IG site.

Sunday, January 26, 2014, 10:30-11:30 a.m., “Mapmakers” [Slides]

The Mapmakers presentation was designed to highlight some research I’ve been involved in, along with my colleagues Jon Phipps and Gordon Dunsire. This topic has not received a vast following, but should as experience with new schemas and value vocabularies expands. As is usual, there was another presentation just before ours that gave an exciting view of innovative work in expanding our notion of authority, in particular gathering and managing data from a broad variety of sources. That work is encountering challenges very similar to the DPLA’s, though in some ways theirs is harder, since they often have to develop the sources and bring them into the LOD world.

That presentation focused on work done in the ProMusicaDB project, with founder Christy Crowl and metadata librarian Kimmy Szeto sharing the podium. There was a feast of slides and stories, all of them illustrating the new ways that we’ll all be operating in the very near future. Their slides (including a demo) should be available on the ALCTS Cataloging and Classification Research IG site very shortly (though not yet as of this writing).

By Diane Hillmann, February 3, 2014, 12:14 pm (UTC-5)
Saturday, January 25, 2014, 3:00-4:00 p.m., A Consideration of Holdings in the World Beyond MARC [PCC 203B]

Sunday, January 26, 2014, 8:30-10:00 a.m., The Other Side of Linked Data: Managing Metadata Aggregation [PCC 102A]

Sunday, January 26, 2014, 10:30-11:30 a.m., Mapmakers [PCC 102A]

Most ALA watchers have noticed a shift from ‘invited talks’ at Interest Group and Committee meetings to requests for proposals from the chairs, from which pool the speakers are chosen. This is, of course, in parallel with changes going on with other professional conferences, and it’s an interesting shift for a number of reasons.

There’s a democratization aspect to this change–the chairs are no longer limited in their choice to people they already know about, thereby potentially increasing the possibility that new and different ideas will get an airing. Maybe this Midwinter someone will come up with an absolutely wonderful and unexpected presentation that rockets the speaker from the unknown mob to the smaller roster of interesting known speakers. This is a good thing, I believe, even though the chance of witnessing such a rocket launch is dauntingly small.

For someone who has been around long enough (and noisily, it must be said), this shift means that I don’t need to wait for invitations to do presentations based on some chair’s idea of what might interest their group (but may no longer interest me); I can go ahead and respond to the calls that appeal to me. I’d like to think that the result is something fresh enough to be interesting for me to prepare and an audience to listen to, without being totally divorced from prior talks that represent earlier phases. An odd result of this shift in process is that speakers who submit proposals to various committees don’t generally know who else will be speaking at a particular program until after their proposal has been approved, and maybe not even then. This particular aspect has already led to some very interesting lineups at meetings across the conference.

Because I take seriously the idea of not re-using previous talks to the point of becoming horribly boring, I tend to apply for things that let me explore something related to what I’ve done before, but that at least requires me to rethink it or try a different approach to exposing what I (and the people I work with) are thinking about. I think that’s pretty much what most audiences are looking for, right?

So below are my talks for ALA Midwinter. I may be accompanied by one or another of my colleagues on a couple of these, and will surely have their help building the presentations.

A Consideration of Library Holdings in the World Beyond MARC

Of all the MARC 21 formats, Holdings was the one most clearly designed for machine manipulation. It is granular, flexible, and intended to be used at either a detailed or summary level. It has sometimes frightened potential users because it looks complex (even where it isn’t), and in its ‘native’ form is not particularly human friendly. Some of the complexity arises because there are both display and prediction aspects in the encoding, and not all library systems have developed predictive serial check-in systems supported by MARC Holdings.

Some of the bibliographic metadata efforts now going forward ignore the existing MARC Holdings, sometimes in favor of simpler solutions based on the perception of the waning need for predictive check-in for digital subscriptions. Not much effort has been expended to bring the MARC Holdings format forward into the discussions about changing requirements and re-use of existing standards.

For the ALCTS CRS Committee on Holdings Information, Saturday, January 25, 2014, 3:00-4:00 p.m., PCC 203B.

Holdings has been an interest of mine since I was a law librarian representing the American Association of Law Libraries on MARBI. In the early computer era in libraries, when digital publication was the exception, law publishers demonstrated a great deal of creativity in their publication of updating services, from loose-leaf services to regular republication of standard tools, and law catalogers always had the best examples of holdings problems. These days, most of those materials have been subsumed by various digital tools, which have their own complexities, particularly in the context of versions, republication, and compilation.

But the question remains–does what we learned from the pre-digital world of holdings functionality have relevance in the digital era?

The Other Side of Linked Data: Managing Metadata Aggregation

Most of the current activity in the library LOD world has been on publishing library data out of current silos. But part of the point of linked data for libraries is that it opens up data built by others for use within libraries, and has the potential for greater integration of library data within the larger data world. The sticking point for most librarians is that data building and distribution outside the familiar world of MARC seems like a black box, the key held by others.

Traditionally, libraries have relied on specialized system vendors to build the functionality they needed to manage their data. But the discussions I’ve heard too often result in librarians wanting vendors to tell them what they’re planning, and vendors asking librarians what they need and want. In the context of this stalemate, it behooves both library system vendors and librarians to explore the issues around management of more fine-grained metadata so that an informed dialogue around requirements can begin.

For the ALCTS Metadata Interest Group, Sunday, January 26, 2014, 8:30-10:00 a.m., PCC 102A

Transitioning from a rigidly record-based system to a more flexible environment where statement-level information can be aggregated and managed is difficult to envision from the vantage point of our current MARC-based world. This has led to a gap between what we know and the wider world of linked open data we’d like to participate in. One of the critical steps is to understand how such a world might look, and what it requires of us and our systems. The goal is to be able to move some of that improved understanding to the point of innovation and development.

Mapmakers

It’s very clear that there will be no single answer to moving bibliographic metadata into the world beyond MARC, no direct ‘replacement’ for the simple walled garden we all have lived in for 40+ years. While it’s certainly true that the emerging global universe of bibliographic description has continued to expand and seems more chaotic than ever, there are still commonalities of understanding with the world beyond our garden walls that we’re only beginning to identify. How then can we begin to expose our understanding to that universe and develop some consensus paths forward? Specifically, what are the possibilities for using semantic mapping to provide us with the flexibility and extensibility we need to build our common future?

For the ALCTS CaMMS Cataloging & Classification Research Interest Group, Sunday, Jan. 26, 10:30-11:30, PCC 102A.

Librarians too often see ‘mapping’ and think ‘crosswalking’, but the reality is that these are quite different strategies. Crosswalking was a natural fit for the MARC environment, where the ‘one, best’ crosswalk would logically be developed centrally and implemented as part of current application needs. But the limitations of crosswalking make much less sense as we transition into a world where the Semantic Web has begun to take hold (of our heads, if not our systems!).

In the Semantic Web world, maps can contain a variety of relationships (not just the crosswalk ‘same as’), and central development and control is neither necessary nor very useful. That doesn’t mean we’re all on our own–collaboration is still our best strategy.
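For anyone who wants to see the difference rather than take my word for it, here’s a hedged little sketch in Python using rdflib–the element URIs are made up–showing a map that records equivalence, hierarchy, and looser association with the standard SKOS mapping properties, instead of forcing everything into the crosswalk’s ‘same as’:

```python
# Sketch only: a map between two (invented) element sets using SKOS mapping
# properties, which can express more than one kind of relationship.
from rdflib import Graph, Namespace
from rdflib.namespace import SKOS

OURS = Namespace("http://example.org/ourvocab/")
THEIRS = Namespace("http://example.org/theirvocab/")

g = Graph()
g.bind("skos", SKOS)

# Exact equivalence -- the only relationship a traditional crosswalk can express.
g.add((OURS.title, SKOS.exactMatch, THEIRS.title))

# Our narrower element maps "up" to their broader one.
g.add((OURS.subtitle, SKOS.broadMatch, THEIRS.title))

# A looser association is recorded as such, not forced into equivalence.
g.add((OURS.publisherName, SKOS.relatedMatch, THEIRS.imprint))

print(g.serialize(format="turtle"))
```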

By Diane Hillmann, December 9, 2013, 3:26 pm (UTC-5)

[Continuing from a post earlier today]

The second, and not unrelated, announcement had to do with the end of printed versions of the Red Books, which have traditionally represented LCSH in its most official form. In the LC report to CC:DA the cessation of publication of the Red Books was announced:

In 2012, LC conducted an extensive study on the impact and opportunities of changes in the bibliographic framework and the technological environment on the future distribution of its cataloging data and products. LC’s transition from print to online-only for cataloging documentation is a response to a steadily declining customer base for print and the availability of alternatives made possible by advances in technology. This shift will enable the Library to achieve a more sustainable financial model and better serve its mission in the years ahead.

Certainly there’s not much to argue with here–consumers have spoken, and LC, like every institution and service provider, needs to pay attention. But more troubling is what the online-only policy really means. The announcement includes some information on a planned PDF version of LCSH, and points to that PDF, plus the Cataloger’s Desktop and Classification Web products (both behind paywalls) as the remaining complete and up-to-date options.

Notable for its absence in that announcement is any comment on LCSH on id.loc.gov. Many of us are well aware of the gaps that make this version less than complete and up-to-date, and indeed the introduction to the service points out that:

LCSH in this service includes all Library of Congress Subject Headings, free-floating subdivisions (topical and form), Genre/Form headings, Children’s (AC) headings, and validation strings* for which authority records have been created. The content includes a few name headings (personal and corporate), such as William Shakespeare, Jesus Christ, and Harvard University, and geographic headings that are added to LCSH as they are needed to establish subdivisions, provide a pattern for subdivision practice, or provide reference structure for other terms. This content is expanded beyond the print issue of LCSH (the “red books”) with inclusion of validation strings.

*Validation strings: Some authority records are for headings that have been built by adding subdivisions. These records are the result of an ongoing project to programmatically create authority records for valid subject strings from subject heading strings found in bibliographic records. The authority records for these subject strings were created so the entire string could be machine-validated. The strings do not have broader, narrower, or related terms.

It’s not clear to me that the caveats in this introduction are either widely read or completely understood, but I think if you surveyed a random selection of catalogers about how useful the service is and what it does and doesn’t include, you’d get a wide variety of responses. And of course, the updating strategy for the subject headings operates under the same ‘versioning’ pattern as the relators: files are reloaded periodically, and there isn’t much in the way of the sort of versioning that could support notifications to users or updates to linked data that uses LCSH outside of ILSs or traditional central services like OCLC.

What LC has done could serve as a case study in how not to handle versioning of semantics in a public vocabulary. If we accept the premise that vocabulary semantics will change, there are very few ways to build stable systems that rely on linked data. The preferred option is to use vocabularies from services that provide stable URIs for past, present, and future versions of the vocabulary; the less attractive one is to create a local, stable shadow vocabulary and map it to the public vocabulary over which you have little or no control. Mapping vocabularies in this way gives you the opportunity to maintain the semantic stability of your own system, your own ‘knowledge base’, while still providing the ability to maintain semantic integration with the global pool of linked data. Clearly, this is an expensive proposition. And it’s not as if these issues of reuse vs. extension are not currently under heavy discussion in a number of contexts: on the public schema.org discussion lists, for instance.
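To make the shadow-vocabulary option concrete, here’s a rough sketch in Python with rdflib. The local URIs are hypothetical; the LC URI follows the id.loc.gov relators pattern mentioned above. The local term carries the definition your data was actually built against, and a skos:exactMatch points at the public URI you don’t control–so if the public definition drifts, you adjust the mapping rather than silently inheriting new semantics.

```python
# Rough sketch: a local "shadow" term, pinned to the definition we rely on,
# mapped to the public term we don't control. Local URIs are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

LOCAL = Namespace("http://example.org/shadow/relators/")
LC = Namespace("http://id.loc.gov/vocabulary/relators/")

g = Graph()
g.bind("skos", SKOS)

binder = LOCAL["bnd-2013-05"]                      # dated, locally stable URI
g.add((binder, RDF.type, SKOS.Concept))
g.add((binder, SKOS.prefLabel, Literal("Binder", lang="en")))
g.add((binder, SKOS.definition,
       Literal("The definition as of the snapshot our data was built against.")))
g.add((binder, SKOS.exactMatch, LC.bnd))           # link out to the public term

print(g.serialize(format="turtle"))
```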

There are a number of related issues here that would also benefit from broader discussion. Large public vocabularies have tended to make an incomplete transition from print to online, getting stuck, like LC, attempting to use the file management processes of the print era to manage change behind a ‘service’ front end that isn’t really designed to do the job it’s being asked to do. What needs to be examined, soon and in public, is what the relationship is between these files and the legacy data which hangs over our heads like a boulder of Damocles. Clearly, we’re not just in need of access to files (whether one at a time or in batches) but require more of the kinds of services that support libraries in managing and improving their data. These needs are especially critical to those organizations engaged in the important work of integrating legacy and project data, and trying to figure out a workflow that allows them to make full use of the legacy public vocabularies.

Ignoring or denying these issues as important changes are made to the vocabularies that LC manages, on behalf of the cultural heritage communities across the globe, does a disservice to everyone. No one expects LC to come up with all the answers, just as they could not be expected (in the past, or now) to build the vocabularies themselves without the help of the community. NACO, SACO and PCC were, and are, models of collaboration. Why not build on that strength and push more of the discussion about needs and solutions into that same eager, and very competent, community?

By Diane Hillmann, July 23, 2013, 3:20 pm (UTC-5)

The Library of Congress recently made a couple of announcements, which I’ve been thinking about in the context of the provision of linked data services.

In May, LC announced that a ‘reconciliation’ had been done on the LC relators, in part to bring them into conformance with RDA role terms. This is not at all a bad thing, but the manner in which the revisions were accomplished and presented on id.loc.gov points up some serious issues with the strategy LC is currently using to manage these vocabularies.

As part of this ‘reconciliation’ LC made a variety of changes to the old list. Some definitions were changed, but in most cases the code and derived URI remained the same, creating a situation where the semantics become unreliable. It’s not easy to determine which terms have changed, because the old file was overwritten, the previous version can’t be accessed through the service, and as far as I can tell, there’s no definitive list of changes available. The only clue in the new file is the ‘Change Note’–a textual note with dates of changes–though what actually changed is not specified. An example can be found under the term ‘Binder’ (code bnd), where the change note has two dates:

2013-05-15: modified
1970-01-01: new

In another example, the term ‘Film editor’, the definition now starts: “A person who, following the script and in creative cooperation with the Director, selects, arranges, and assembles the filmed material, …” whereas the old usage note referred to “… a person or organization who is an editor of a motion picture film …”. This is a clear and significant change of definition because the reference to the organization entity has been dropped. Curiously the definition for the term ‘Scenarist’ continues to refer to “A person or organization who is the author of a motion picture screenplay …”, although the definition was changed at the same time. Perhaps the difference occurs because the change note for ‘Film editor’ refers to “FIAF”, which is probably the International Federation of Film Archives (the announcement refers to FIAT, a probable typo).

This M.O. may be perfectly satisfactory to support most human uses of the vocabulary, but it is clearly not all that useful for machines operating in a linked data environment. I was alerted to some of these issues by a colleague building a map based on the prior version, which now needs to be completely revised (and without a list of changes, this becomes a very laborious process). It’s also my understanding that the JSC just recently updated some of the relationship definitions for the most recent update of the RDA Toolkit, which are now out-of-sync with the ‘reconciled’ relator terms.
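Absent a published list of changes, about the best a downstream maintainer can do is diff snapshots themselves–assuming they happened to keep a copy of the prior file, which the service itself does not retain. Here’s a rough sketch in Python with rdflib; the file names are invented, and the choice of skos:definition as the property to compare is illustrative (the actual dumps may model definitions differently). It simply reports which term URIs gained or changed definitions between two snapshots.

```python
# Rough sketch: diff two locally saved vocabulary snapshots to find terms whose
# definitions changed. File names are invented; property choice is illustrative.
from rdflib import Graph
from rdflib.namespace import SKOS

def definitions(path):
    g = Graph()
    g.parse(path, format="turtle")
    return {str(s): str(o) for s, o in g.subject_objects(SKOS.definition)}

old = definitions("relators-2012-snapshot.ttl")
new = definitions("relators-2013-05-15.ttl")

for uri in sorted(old.keys() & new.keys()):
    if old[uri] != new[uri]:
        print("CHANGED ", uri)
for uri in sorted(new.keys() - old.keys()):
    print("ADDED   ", uri)
```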

A number of questions arise as a result of this, perhaps chief among them the basic one of whether it makes sense to reconcile these vocabularies at all. Because this work was not discussed publicly before the reconciled vocabulary was unveiled (I might be wrong about this, but I’m sure someone will correct me if I missed something), the potential effect on legacy data is unknown, as are any other options for dealing with the issues created by lack of established process or opportunity for public comment. If you accept the premise that we will continue to live in an environment of multiple vocabularies for multiple uses, there are other strategies–mapping and extension, for instance–that might have a better chance to improve usefulness while avoiding the kinds of reliability and synchronization problems these changes bring to the fore.

In addition to the process issues, a strong case could be made that the current services presented under the id.loc.gov umbrella might benefit from some discussion about how the data is intended to be used and managed. Not everyone is tied to traditional ILSs now, and perhaps fewer will be in future, if current interest in linked data continues. Are all users of these vocabularies going to be expected to flush their caches of data every time a new ‘version’ of the underlying file is loaded? How would they know of change happening behind the scenes (unless, of course, they are careful readers of LC’s announcements)? If LC expects to provide services for linked data users, these issues must be discussed openly and use cases defined so that appropriate decisions are enabled. At a minimum, these practices need to be examined in the context of linked data principles that call for careful change to definitions and URIs to minimize surprises and loss of backward compatibility.

[To be continued]

By Diane Hillmann, July 23, 2013, 2:35 pm (UTC-5)

Many of you have heard me say “Time flies, whether you’re having fun or not”–and that has certainly been the case since I got back from the NISO Roadmap meeting a few weeks ago. Somehow, with my head down, I missed part 1 of Roy Tennant’s post “The Post-MARC Era, Part 1: ‘If It’s Televised, It Can’t Be the Revolution’”. I’m old enough to remember the ’60s and the call to revolution that Gil Scott-Heron referred to, and in fact had a small part in it–but since it WAS live, I’ve no evidence to present about my participation; you’ll just have to believe me.

On the other hand, I’ve been very involved in the revolution under discussion in the remainder of his post, and there’s quite a bit of video to confirm that, including at the beginning of the NISO Roadmap meeting, where Gordon Dunsire and I tossed a few thought-bombs out before the conversation got going. I think it validates Roy’s point about participation to say that the points we made came up frequently in the subsequent small group sessions, which were not, I believe, on the video feed. What I observed as a participant was that more than a few folks left with some new information and (I hope) some expanded thinking about what the revolution was about–more than they came in with.

Despite the fact that I’ve acquired an undeserved reputation for being a MARC hater, I actually think that we should continue to use the semantics of MARC, and get rid of the ancient encoding standard. It’s in some ways a Dr. Jekyll and Mr. Hyde problem we have here, and we’re about to kill the ‘wrong MARC’ in our exasperated search for something simpler, because we can’t seem to get clear about what MARC is and isn’t. The reality is that the MARC semantics represent the accumulated experience in library description from the days of the 3 x 5 card with the hole in the bottom (see Gordon Dunsire’s presentation on that evolution). We’ll clearly need to map the semantics of our legacy data forward, but that doesn’t require that we carry along the ‘classic’ MARC encoding. Consider the old days of the telegraph, where messages were encoded using dots and dashes. Those messages were translated into written English for end users, who didn’t need to know Morse Code to read them. Now we use telephone messaging and email for those kinds of communications, and Morse Code doesn’t figure in there anywhere.
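Here’s a toy illustration of that Morse-code point. The field sample and the property URI below are invented (marc21rdf.info publishes real element URIs, which I’m not reproducing here); what matters is that the semantic unit–‘245 $a carries the title proper’–survives intact once the ISO 2709 packaging is thrown away and the statement is expressed directly as data.

```python
# Toy illustration: keep the MARC semantics, drop the MARC encoding. The field
# sample and property URI are invented stand-ins, not real marc21rdf.info URIs.
from rdflib import Graph, Literal, Namespace, URIRef

# What one piece of a legacy record boils down to once the encoding is stripped:
legacy_fields = [("245", "a", "An example title proper")]

M = Namespace("http://example.org/marc-elements/")   # stand-in namespace
resource = URIRef("http://example.org/resource/1")

g = Graph()
for tag, subfield, value in legacy_fields:
    prop = M["M" + tag + "__" + subfield]            # illustrative naming only
    g.add((resource, prop, Literal(value)))

print(g.serialize(format="turtle"))
```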

In addition, we need to look past all those rarely used MARC fields, and recognize that they are only irrelevant in an environment that looks very much like our current one, with artisanal catalog records and top-down standards development. That’s not really what we’re hoping for, as we wrap our minds around what an environment based on linked open data might free us to do differently. When systems were built to process MARC-encoded records, those systems needed to be updated at regular frequencies and all the sharing partners moved in lockstep. It was very expensive to manage the code that was the plumbing of those systems, and the specialized fields didn’t add much value. But remember that each of the proposals for change was extensively discussed and formally vetted. I was there for many of those discussions, and recognize that not all of them were accepted, but a considerable number were, and then not always (or often) used after they were included in MARC. Before we label all that effort wasted, and attempt to re-litigate all those decisions, let’s take a closer look at the real costs of moving those fields forward, in the very different environment we’re envisioning, where the costs are differently distributed and everyone need not move in lockstep. It’s entirely possible that some new communities will find these specialized fields very relevant, even though libraries have not.

Roy quotes from the BibFrame announcement, which states:

“A major focus of the initiative will be to determine a transition path for the MARC 21 exchange format in order to reap the benefits of newer technology while preserving a robust data exchange that has supported resource sharing and cataloging cost savings in recent decades.”

It’s still unclear to me (and I’m not alone here) that we really needed a ‘transition path for the MARC 21 exchange format’. Why can’t we join the rest of the world, which is tootling along quite nicely, thank you, without a bespoke exchange format? We have several useful sets of semantics, built collaboratively over the past half century–why would we need to start over? I generally read the BibFrame discussions, but rarely participate, mostly because it all seems like a reinvention of something that doesn’t need reinventing, and I have no time for that. Whatever the BibFrame people come up with will be mappable to and from the other ongoing bibliographic standards, and whoever wants to use it for exchange can certainly do that, but it will never have the penetration in the library market that MARC has.

It’s also a bit mysterious what ‘preserving a robust data exchange’ actually means. Are we talking about maintaining the current exchange of records using OCLC as the centralized node through which everything passes? What part of that ‘preservation’ is about preserving the income streams inherent in the current distribution model? What is it about linked open data, without a central node, that isn’t robust enough?

Roy ends his post with something that I didn’t expect, but definitely applaud:

“Watching the NISO event over the last two days crystallized for me that I had fallen into the trap of thinking that the Library of Congress or NISO or OCLC (my employer) would come along and save us all. I forgot that for a revolution to occur it can’t come from the seats of the existing power structure. True change only happens when everyone is involved. Those organizations may implement and support what the changes that the revolution produces, but anything dictated from on high will not be a revolution. The revolution will not be piped into our cubicles, ready for easy consumption. The revolution will be live.”

We could start by no longer waiting for LC to deliver an RDF version of MARC 21, unencumbered by 50 year old encoding standards. We already have that, at marc21rdf.info. Yeah, it needs some work, but it’ll get done a lot faster if we can get some help from the 99% of the library world. Give us a holler if you’re interested.

Clearly the revolution is not happening on the BibFrame discussion list; it is happening elsewhere.

By Diane Hillmann, May 14, 2013, 4:34 pm (UTC-5)

I saw the announcement a few weeks ago about the demise of MARBI and the creation of the new ALCTS/LITA Metadata Standards Committee. My first reaction was ‘uh oh,’ and I flashed back to the beginnings of the DCMI Usage Board. The DCUB still exists, but in a sort of limbo, as DCMI reorganizes itself after the recent change of leadership.

I was a charter member, and, with Rebecca Guenther, wrote up the original proposal for the organization of the group. It was based to some extent on MARBI–not a surprise, since Rebecca and I were veterans of that group. But there were some ambiguities in the plan for the UB that came back to bite us over the next few years–primarily having to do with essential questions about what the group was supposed to be doing, and how to accomplish its goals. These difficulties had little to do with the organizational aspects–how many members, questions of voting (which changed over time), or issues of documentation and dissemination, all of which were settled fairly easily when the group was set up (and can be found here.)

It struck me, as I was reading the announcement, that it might be useful for me to revisit some of the issues that came up with the DCMI Usage Board while I was a member, and think about whether they are relevant to the new ALCTS/LITA Metadata Standards Committee. I hope this perspective may be useful for ALCTS and LITA as they get this committee going, because, frankly, I see dragons all over the place. [I should emphasize here that these are personal opinions, and don’t represent any position of the DCMI Executive group, of which I am a member.]

So, here’s a quote from the announcement describing the Committee’s responsibilities:

“The ALCTS/LITA Metadata Standards Committee will play a leadership role in the creation and development of metadata standards for bibliographic information. The Committee will review and evaluate proposed standards; recommend approval of standards in conformity with ALA policy; establish a mechanism for the continuing review of standards (including the monitoring of further development); provide commentary on the content of various implementations of standards to concerned agencies; and maintain liaison with concerned units within ALA and relevant outside agencies.”

I see a lot of big and important words in this paragraph, and would like to see some of them defined more carefully for this new context. For instance, what does ‘a leadership role in the creation and development of metadata standards’ really mean? The prospective committee members are folks who have day jobs and are likely to meet in person twice a year (perhaps in multiple meetings) at each ALA meeting, but they have been given an enormous brief, or so it seems.

First of all, what is a ‘standard’? Are ‘standards’ in this context only those which have been vetted by a standards body like NISO or ISO? Some ‘standards’ that are in relatively broad use in the bibliographic environment are in fact developed within the walls of just one institution (e.g., LC’s MODS, MADS, etc.) and though they may eventually acquire some mechanism for user participation, their definition as standards is largely self-declared by their managing institution. For that matter, how about metadata element sets developed by international bodies, like IFLA, or W3C, or Dublin Core? ALA is a voting member of NISO, which suggests to me that a clear definition of what a standard IS will be an essential step, even before an examination of the notion of what a ‘leadership role’ might be.

Then there’s the notion that standards (however defined) will be proposed to this committee for review and evaluation. Proposed by whom? Reviewed by what criteria, and evaluated by what mechanism?

For the DCUB, the brief of the group changed over time, as DCMI grew and shifted focus. At first, the UB’s brief was the review of proposals for new metadata terms. That turned out to be far more difficult than it seemed on the surface, because in order to evaluate those proposals, there first needed to be criteria for evaluation. Eventually it became clear that there were an infinite number of elements desired by an ever increasing number of communities, and whether any or all of these should be part of what was supposed to be a general set of properties became an issue. Finally, after much discussion, it was determined that the Dublin Core was not going to be the arbiter of all terms desired by all people, and the UB stopped reviewing proposals for new terms.

Another historical tidbit illustrates a possible pitfall. At one point (I’m afraid I can’t remember the timing on this), the UB was approached by a public broadcasting group that was developing a metadata schema based on Dublin Core, and they wanted us to review what they’d done and give them some feedback. So, the UB looked over what they’d done, and provided them with feedback–mostly about how they’d structured their schema, rather than the specific terms they used.

Some time later, it was pointed out to me that the Wikipedia entry on PBCore said that the UB had ‘reviewed’ their schema, in a manner implying that we’d given some stamp of approval, which we had certainly not done. Wikipedia being what it is, I went in and clarified the statement. You can probably see what I added by checking out the Wikipedia entry, and you might want to look at some of the PBCore vocabularies in the Open Metadata Registry Sandbox (this is a good example, but you’ll note that they didn’t get beyond “A”)

The RDA effort is a classic case of how much more difficult it is to develop standards than it seems at the start–and also how important process and timeliness are to the eventual determination of who will actually use the standard. The RDA development effort was started long enough ago that during the long process of development — originally begun as a classic closed-door-experts-only effort — the whole world changed.

In 2007, as part of that process, I got involved in the effort to build the vocabularies necessary for RDA to be used in a Semantic Web environment, in parallel with the continuing development of the guidance instructions and under the aegis of the DCMI/RDA Task Group (now the DCMI Bibliographic Metadata Task Group). The completion of that work (since 2009 in the hands of the JSC for review and publication) has stalled, as the JSC spends its limited time entertaining proposals for changing the guidelines it just recently finished. Meanwhile, time continues to march ever onward, and many of those who were once waiting for the RDA vocabularies to be completed have concluded that they may never be, and have started looking elsewhere for metadata element sets.

In the meantime, LC itself began its BibFrame project roughly two years ago. That effort, as it’s been described so far, seems unlikely to consider RDA as a significant part of its ‘solution’. Various other large users and purveyors of bibliographic data have begun to use a variety of build-your-own schemas to expose their data as linked data, the (somewhat) New Big Thing. It’s illustrative to note that these don’t tend to use RDA properties.

There was a time that MARC ruled the library world, and there’s still a nostalgia in some quarters for that world of many certainties and fewer choices. That time isn’t coming back, no matter how many new committees we set up to try to control the new, chaotic world of bibliographic data. The fact is that our world is moving too fast, and in our anxiety to get things ‘right’ we continue to build and maintain cumbersome ‘standards’ using complex processes that no longer work for us. We’re still trying to insist that the ‘continuing review’, ‘evaluation’ and ‘recommendation’ processes have clear value, but a realistic look at the current environment suggests that they may no longer be of value, or even possible.

I have no inside knowledge of how all this will come out, but I’d be much happier if the new ALCTS/LITA Metadata Standards Committee either receives or builds for itself a much clearer and achievable set of goals and tasks than they seem to have been given.

It’s a jungle out there.

By Diane Hillmann, October 26, 2012, 11:01 am (UTC-5)

A couple of weeks ago I made a short presentation at a linked data session at the American Association of Law Libraries (AALL). Many of the audience members were people I’ve known since I was a baby librarian (this is the group where I started my career as a presenter, and they invite me back every couple of years to talk about this kind of stuff). One of the questions from the audience was one I hear fairly often: “Who’s paying for this?” I always assume, perhaps wrongly, that the questioner is responding to pressure from an administrator, but in fact anyone with administrative and budget responsibilities–particularly in Technical Services, which has been under budget siege for decades–does and should think about costs.

What I said to her was that we were all going to pay for it, and it seems to me that this isn’t just a platitude–given that the culture of collaboration we have developed assures us that many (though certainly not all) of the costs associated with the change we contemplate will be shared, as in the effort noted in my previous post. But the costs are difficult to assess at this stage, because we don’t know how long the transition will be nor exactly what the ‘end result’ will look like. If indeed we have three options for change—metamorphosis, evolution, and revolution—it seems we’ve not yet made the decision on what it’s going to be. If there are still some hoping for metamorphosis—where everything happens inside the pupa and the result is much more attractive (but the DNA hasn’t changed), well, it may be too late for that option.

Evolution–defined as creative destruction and adaptation leading to a stronger result, with similar (but not identical) DNA–is much more what I’d like to see. If we look carefully at what we have–the MARC semantics, the SemWeb-friendly RDA vocabularies, and above all the strong community culture–and build on those assets, we have a fighting chance at a good result. The trick is that we don’t have millennia to accomplish this; we have a couple of years at best, if we work really quickly and keep our wits about us. The interesting thing about evolution is that when the environment changes and the affected species either adapt or disappear, it’s never entirely clear what that adaptation will look like prior to the point of no return.

As for revolution–perhaps that’s the possible result where Google and its partners take over the things we used to do in libraries when we brought users and resources together. They’re doing metadata now (not as well as we do, I’m thinking) but if we keep trying to make our ‘catalogs’ work better instead of getting ourselves out there on the Web, I don’t think the result will be pretty.

By Diane Hillmann, August 7, 2012, 3:39 pm (UTC-5)

A few years ago, I wrote an article for a collection of writings in honor of Tom Turner, a talented metadata librarian at Cornell who sadly died too young. That article, “Looking back—looking forward: reflections of a transitional librarian” (Metadata and Digital Collections: Festschrift in honor of Tom Turner), although it meanders around a bit (she said after reading it over for the first time in a couple of years), is a pretty good view of where my head has been these past couple of decades. A lot has happened, and I’ve been lucky enough to have been in the thick of a lot of it.

One question that occasionally comes up is about what was in that kool-aid I drank that caused me to jump ship from many of the traditional library ways of thinking to something quite different. There’s not a simple answer to that. It was a combination of things, certainly, but perhaps best expressed towards the end of the article:

“In October of 2004, I attended a panel presentation where three experts were asked to inform library practitioners by providing “evaluation” information about a large Dublin Core-based metadata repository. It was, in a small way, yet another version of the blind men and the elephant. The first presenter provided large numbers of tables giving simple numeric tallies of elements used in the repository, with no more analysis than a relational database might reveal in ten minutes. The second provided results of a research project where users were carefully observed and questioned about what information they used when making decisions about what they were given in search results—i.e. useful data, and a good start on determining the usefulness of metadata, but with no attention paid to the metadata that was used behind the scenes, well before any user display was generated. The third presenter, a young computer scientist, relied almost entirely on tools developed for textual indexing, and, concluding that the diversity of the metadata was a problem, suggested that the leaders of the project should insist that all data providers follow stricter standards.

These presentations seemed sadly reflective of most attempts to approach the problems of creating and sharing metadata in the world beyond MARC. Traditional libraries built a strong culture of metadata sharing and an enormous shared investment in training and documentation around the MARC standard. The MARC development process codified the body of knowledge and practice that supported this culture of sharing and collaboration, building, in the process, a community of metadata experts who took their expertise into a number of specialized domains. We clearly are now at a critical juncture. Moving forward in both realms, traditional and “new” metadata requires that we understand clearly where we have been and what has been the basis for our past success. To do that we need much better research and evaluation of our legacy and current models, a clearer articulation of short term and long term goals, and a strategy for attaining those goals that is openly endorsed and supported by stakeholders in the library community.”

Looking back, I’m not sure I can exactly pinpoint the moment when we stopped thinking clearly about the road ahead, but until very recently, it sure looked like we had. It was probably somewhere around the time when it seemed like RDA would never be finished in our lifetimes, and that even if finished, would be too much like AACR2 to even consider implementing. Somewhere around that time, CC:DA drafted a ‘no-confidence’ memo to the JSC, and I showed Don Chatham (ALA Publishing) a late draft of it while he was at the DC-2006 conference in Mexico. At that meeting, a few of us suggested to Don that it might be a good idea to have the JSC get together with DCMI and see what could be done, which ended up being the ‘London Meeting’ of five years ago that changed the conversation significantly. This spring a five year anniversary celebration was held in London, where a big part of the discussion was about just how big that impact was, in terms of where the library community is today. As usual, nothing like a good crisis to get things moving.

It will be no surprise to readers of this blog, but I spend a fair bit of time thinking and talking about what comes next, and some of the focus of that question, particularly lately, has to do with MARC. A big part of the problem of answering that “what’s next?” question is that we tend to use ‘MARC’ not just to refer to the specification but as a stand-in for traditional practices and library standards as a whole, and this muddies the conversation considerably. What too often happens is that the ‘good stuff’ of MARC, the semantics that represent years of debate about use cases and descriptive issues, doesn’t get separately, or properly, discussed. Those semantics ought really to be seen as a distillation of all those things we know about bibliographic description, tested by time and generations of catalogers, and very much worth retaining. How we express those semantics needs to be updated, for sure (many are already expressed in the RDA vocabularies), but differentiating between baby and bathwater is clearly a necessary part of moving ahead.

One of the things about the library community that few outsiders understand is the degree to which it has developed a culture of collaboration–one I’ve never seen anywhere else. Librarians collectively build and contribute bibliographic and name authority records and subject and classification proposals, participate (passionately) in endless debates about interpretation of rules and policies, and, most remarkably, write this stuff up and share it extensively, asking for comments (and most often getting them). Participating in other communities after having been part of the one built by librarians is very often a frustrating thing–my expectations of participation are clearly too high.

A great example of how this works is NACO (the Name Authority Cooperative Program), through which participants build and maintain good authority records. In my days as Authorities Librarian at Cornell, I helped increase Cornell’s participation in the program by a significant amount, an accomplishment of which I’m still quite proud. Recently I noted a new post to the RDA-L list about some PCC (Program for Cooperative Cataloging) committee work around authority records:

“The PCC Acceptable Headings Implementation Task Group (PCCAHITG) has successfully tested the programming code for Phase 1 in preparation for the LC/PCC Phased Implementation of RDA as described in the document entitled “The phased conversion of the LC/NACO Authority File to RDA” found at the Task Group’s page.”

The activities of this group display some of the important characteristics of successful community activity: the goals are clearly stated, and timelines spelled out; assumptions are tested and results analysed; conclusions and recommendations are written, explained and exposed for comment. This particular activity is noteworthy because it is a collaboration between the Library of Congress and a group of PCC member institutions, and is Phase 1 of a set of planned activities designed to move cooperatively built and maintained authority data into compliance with RDA.

As for the old broads? I was recently reminded of this wonderful quote from Bette Davis: “If you want a thing well done, get a couple of old broads to do it.” So true, so true …

By Diane Hillmann, August 5, 2012, 12:56 pm (UTC-5)