A few weeks ago I attended the opening of an amber exhibition at our wonderful Museum of the Earth, which is only about 6 miles from my house. The exhibit had a little of everything: science, history, geography … and jewelry. I have to admit (and this will surprise no one who knows me) that the jewelry was a big draw, and I went laden (literally) with a varied selection of my own collection of amber. Hey, laugh if you will, but these days I work at home, and have very few opportunities to wear jewelry of any kind, so this opening was irresistible.
But enough about jewelry; I want to talk about bugs and bibs! As you might expect in a science museum, there was far more emphasis on amber as a carrier (so to speak) of bits and pieces of the past, particularly the biological past. As a preservation medium, amber is hard to beat, though, of course, there are limitations in terms of the size of the biological specimen. I didn’t realize it, but apparently fake amber is everywhere, and one way to recognize the bio fakes is that they include specimens too big to be slowed down by sticky tree sap. The exhibit had some nice fakes, including a small snake in plastic colored to look like amber.
The interest of the scientist in amber is that it stops the process of decay for those creatures lucky (or unlucky) enough to be captured in its grasp. The amber captures a moment in a bug’s short life in a way that allows us to examine it closely and in detail in our own time, millions of years later. In much the same way, the Study of the North American MARC Records Marketplace by R2 Consulting captures a moment in time, very likely too late to have much of an effect on the future, but just in time to capture the state of the cataloging world before the tsunami arrives.
But the R2 report is as fascinating to a metadata maven as a bug in amber is to a biologist. It describes in detail the current world of cataloging distribution, focusing on the “dysfunctional market” that has grown like Topsy around distribution of MARC records. It gets exactly right the disconnect between the librarian sense that “records want to be free” and the business approach that production costs must be recouped and profit margins maintained for there to be any point in participation at all, and comes down predictably in support of the latter view.
It’s a fascinating read, particularly if the fact that LC commissioned the report is kept in mind—because this is hardly a context-free analysis. I was particularly interested in the description of the businesses outside of libraries supplying MARC records either as contractors or as part of a materials supply chain. As a former denizen of one of the large academics that R2 identifies as part of the “green tier” (more about that later), I was aware of the fact of that portion of the MARC marketplace, but had little contact with it.
The gist of the report is that the MARC distribution network is a dysfunctional hybrid, partly librarianishly “free” and partly a commercial marketplace. The authors feel that it should be possible to increase the supply of MARC records from “the community” without relying on poor beleaguered LC to supply them, and they give us a multitude of statistics to support that assertion. They believe that there’s enough time to accomplish this and save everybody money before the promised changes come to pass, and all must be re-thought.
My comments on this report, informed by my well-known biases, fall into a few convenient categories:
Dysfunctional? Probably …
Much of the first portion of the report is devoted to a description of the current “marketplace” and a discussion of the survey results that illuminate and inform the description. It’s here that R2 makes the case that LC is subsidizing the whole shebang, to the benefit of everyone else.
“Both libraries and vendors (at least the good ones) rely on “service” to their respective clienteles to distinguish themselves, but there are important distinctions in their respective definitions of the term. In the commercial world, service must exist within a context of profitability, in which all costs are covered and some additional increment is contributed to the company’s continued growth and as a return on the capital initially invested. The library service ethic is much more open‐ended and less directly constrained by costs.”
The report contains much interesting description of what the authors perceive as the bifurcated market, one which, in their view, inhibits the growth of useful marketplace incentives to increase output:
“This tension ‐‐ between community values and commercial values, between idealism and pragmatism, between social responsibility and private benefit – has deeply affected some aspects of the library market. Cataloging, regarded by many as the heart of librarianship, is one of those areas.”
It’s pretty clear where the authors come down in this conflict between “community” and “commercial” values:
“The impulse to share records for which the costs have not been fully recovered may make sense as a form of community good, but is not sustainable without some form of subsidy or exchange. From the commercial viewpoint, it’s simply bad business.”
And, perhaps more to the point:
“It should not go unnoticed that LC itself provides open access to its MARC records via multiple channels. The prevalence of open databases is a key factor in the economic confusion that plagues the MARC Record Market … “
The report goes on to a rather interesting and revealing categorization of the complex MARC marketplace into three tiers. The “Green Tier” includes the “ … oldest, most traditional segment of the market, in which nearly all MARC records originate.” This tier includes both libraries and businesses, as well as OCLC, and is, as such, a mix of the “community” and “commercial” as described earlier. The big thing is that they’re contributors to the marketplace, even if also consumers. According to R2’s statistics, this tier includes 97% of academic libraries, 63% of public libraries and a similar proportion of school libraries.
The next tier down (and it’s clearly down, in this categorization) is called the “Blue” or “opportunistic” tier, including by the authors’ definition “ … non-OCLC libraries and underfunded libraries without adequate cataloging capacity.” More interestingly, this tier “ … is also home to open database providers, and the pervasive (did they mean to say “pernicious”?) Z39.50 protocols used to locate and obtain MARC records free of charge.” But R2 makes note of the shifting borders between tiers: “Both in Canada and in the US, historically ‘green libraries’ are adopting ‘blue tier’ practices and expectation, as library budgets are cut and as Z39.50 targets proliferate. Nearly all libraries, regardless of size or type are strategically patient, periodically re-searching the ‘blue tier’ for certain records to become ‘available’; but for ‘blue tier’ libraries, this is the primary approach to cataloging … Open Access and Open Archives Initiatives reside in the blue tier, strongly supported by the basic philosophical stance that access to information should be free.”
The “bottom” tier is the non-library “purple” tier, and this description clearly defines the real threat to the current MARC world, not just the fuzzy-wuzzy library community notion of sharing: “The non-library (purple) tier operates to a large extent without appreciation for or experience with MARC records, and without much regard for the library market in general. It is important to remain aware of activity in this segment, of course, because developments here pose the most significant competitive threats to the traditional values and economic structures of the ‘traditional green tier,’ and even the ‘opportunistic blue tier.’ This is the place where newer technologies and non-MARC data formats are used and developed.”
Obviously, we have met the enemy of libraries, and according to R2 it happens to be us. But wait, there are some unexpected companions in the nasty “purple” tier. In addition to the usual suspects, like Google and Amazon, we find … “OCLC pro-actively operates within the “traditional green tier” and within the “purple non-library” tier. OCLC member libraries, however, are also very active in the “opportunistic blue tier,” sharing records in ways that may conflict with OCLC’s proprietary intent.”
The battle lines seem clearly drawn here, with the “information wants to be free” crowd clearly the enemy, whether in sheep’s clothing as traditional librarians or explicitly displaying wolfish teeth as a member of that unappreciative crowd that cares little about the current MARC marketplace and would like to see the library data silo dismantled brick by brick. No matter that we seek these changes for the benefit of libraries struggling to live within their budgets and to innovate to serve their users as well–shame, shame!
The R2 Solution
The report’s authors actually manage to ask THE most relevant question that should be (and often is) on our minds, but only to dismiss it as out of scope:
“The practice of cataloging has never before faced the level of scrutiny it now enjoys … or endures. Two types of question predominate. First, are traditional cataloging and the MARC record—even after modernization by RDA and FRBR—still necessary in an era of full‐text indexing, OpenURL linking, and other discovery options? While this is a worthy question, it is fortunately not within the purview of this report.”
Leaving aside the odd assumption that RDA and FRBR represent the “modernization” of the traditional MARC record, they couch the issue only in the context of a limited number of technologies, never mentioning the gorilla in the room, the data being built by others outside our comfy and bounded silo. Then they go on to pose the questions they would rather address:
“How do we as a profession understand and explain the costs and benefits of producing and distributing cataloging records? Where and by whom are most original records produced? What incentives exist to stimulate production? What are the barriers that discourage production? How does the library market assign value to the work of cataloging? What is the return on any organization’s investment in producing original catalog records? How does shared cataloging and free or low‐cost distribution of records affect the market? To what degree is market activity subsidized by LC and by the work of individual libraries?”
The problem is that, without an answer to Question #1, the other questions seem hardly relevant.
“As noted there, the market is in need of adjustment, if it is to create an incentive for producers while retaining the community ethic of free sharing of data. The ethic of the cooperative can only be sustained if the full costs of production are borne by the community.”
It seems to me that the market will be adjusted, and the recognition of the full costs of traditional cataloging and the plunging ROI as we address Question #1 will hasten that readjustment, but probably not in the direction R2 predicts or that those seeking compensation for their MARC record production might want.
The authors provide some telling glimpses into their world view in their discussion about crosswalks:
“ONIX to MARC record translations and fully operable MARC to non‐MARC metadata crosswalks could dramatically alter this three‐tiered landscape. To date, major players in the blue and purple tiers have failed to buy into the concept of shared bibliographic and authority data. While some efforts to encourage cross‐market cooperation are underway (notably the OCLC/NISO forum), fierce competition flourishes within and between each tier of the market. Even more problematic, each tier has distinctly different needs and incentives, making it difficult to establish an adequate degree of shared urgency and/or investment in new solutions.” [RIN]
Clearly, in a world where the only relevant data one can see “out there” is ONIX, crosswalks seem a no-brainer, but to call this view “limited” seems far too kind.
Ultimately, R2 thinks we still have time to tweak the marketplace and flog out more MARC records by identifying and marshaling unused capacity (e.g., hidden catalogers) and providing economic incentives. In my view, this is a flawed argument, and takes away from the need to plan for the transition to a much different future. I agree that MARC will indeed be used by libraries for some time, but as a lossy exchange format, not the lynchpin of the library data world. R2’s strategy prolongs the old world, jeopardizing the possibilities of moving forward in a timely manner.
The Sacred Cow Effect
Sadly, the whole report, interesting though it is as a biological specimen, fails utterly to examine the data activity outside libraries except to demonize it and its proponents. In making the Library of Congress into Poor Nell, the authors also deny the innovations in creating and reusing data that LC itself has accomplished, for instance, the American Memory Project, the LC Flickr Project, and many other digital initiatives that have proactively (and openly) pushed the metadata envelope in ways that inspire and engage us. The report fails also to understand that the changes they fear, the ones that they rightly expect to undermine the current marketplace completely, are already nibbling ravenously around the edges of MARC and its traditional marketplace, and will hardly take the 5-10 years R2 predicts to become real.
Last summer at ALA in Chicago, a small group of us pulled together a linked data program, hearteningly well attended, where Eric Miller persuasively predicted that the return on investment for integrating “free” metadata from “the cloud” will trump traditional concerns about quality. [Miller] Mainstream entities like the New York Times are moving aggressively into the linked data space, seeking to merge their data with the likes of DBpedia and Freebase. [Sandhaus]
Consider this from MMA partner Jon Phipps: “The future cataloging marketplace will have to compete with ‘free and more than good enough’. Like the people who initially sneered at Google for being too simplistic and ignoring metadata when it came to searching, the professional cataloging community ignores (or tries to fend off) the enormous future output of Linked-Data-enabled systems at its peril. By opening up a clear relationship between the semantic web and library data sets, the RDA vocabularies represent a threat to the hegemony of catalogers. The RDA vocabularies are a disruptive, game-changing technology.” [Phipps]
The reality is that it’s not just the marketplace that’s changing, it’s also the profession. As part of the analysis of why the numbers of catalogers reported in their survey don’t lead to the expected output levels, R2 speculates that “These data lead us to ask what catalogers are doing. Bob Wolven and others suggest that catalogers are being called upon to apply their knowledge of cataloging principles to new initiatives; and specifically to creating metadata for digital and archival collections.” [Wolven] R2 seems to imply that this is a bad thing, taking away resources from the business of actually churning out MARC records, but certainly these newer roles are critical to the survival and renewal of libraries, far more than shoring up current MARC record production.
The solutions the R2 report poses, from paying more attention to recouping cataloging costs to re-centralizing creation of cataloging records, would, if taken up, actively undermine a transition to participation in a more open, linked data world. They represent a step backward, in a community that has already internalized the values of sharing and decentralized data critical to seeing value in the world of openly accessible data lying on our doorstep.
Oddly enough, the report ends with a quote from my old friend Sherman Clarke (unattributed, so most likely as a comment to the survey):
“We collectively need to have a model that allows us to do some of the building of BIBCO records mechanically or through accretion of metadata from institutional records or other record loads. OCLC already does considerable building of the master record from incoming records; what we need is something more like the metadata that is becoming usual in NewGen environments. If someone adds a tag or review or picture, that becomes available in the master cluster. Not a BIBCO record, but a BIBCO cloud of metadata for a particular manifestation of a work/expression.”
Yup, you got it, Sherman. The change we need is not really about records, or catalogers; it’s a new way to think about information and added value.
[Miller] Miller, Eric. “Linked Data and Libraries: Grassroots Program: From Legacy Data to Linked Data, Preparing Libraries for Web 3.0.” Available at: http://zepheira.com/talks/ala-em-lod.pdf
[R2] Study (for the Library of Congress) of the North American MARC Records Marketplace, October 2009, R2 Consulting LLC, Ruth Fischer, Rick Lugg. Available at: http://www.loc.gov/bibliographic-future/news/MARC_Record_Marketplace_2009-10.pdf
[RIN] Research Information Network. (2009). Creating catalogues: bibliographic records in a networked world. Available at: http://www.rin.ac.uk/files/creating_catalogues_REPORT_June09.pdf
[Sandhaus] Sandhaus, Evan. “150 Years of Semantic Technology.” Presentation at the Cornell University Libraries Metadata Working Group Forum, Nov. 13, 2009. Slides will be available from: http://metadata-wg.mannlib.cornell.edu/forum/index.php?date=2009-11-13
[Wolven] Wolven, Robert. (2008). In search of a new model: Columbia University Libraries: Robert Wolven reflects on what’s next for cooperative cataloging. netConnect, 1/15/2008. Available at: http://www.libraryjournal.com/article/CA6514925.htm