Recently I retweeted the following:
“nice quote “your data ages like fine wine, whereas your software applications age like fish” in @mattwall’s j.mp/o8zsQG (via @edsu)”
Since then I’ve been thinking about the important lesson encapsulated in those less-than-140 characters, and how we’ve not really internalized this lesson in LibraryLand, no matter how many times we’ve migrated data. I remember many years ago, when I was working in the Cornell Law Library in the catalog card era, we were told by a university official that in case of fire, everyone should grab a shelf list drawer or two and head out the door. We were pretty stunned by this instruction, but they’d worked it all out: that catalog was the biggest investment the library had, and the only way to re-create it after a fire (if for nothing else than to determine the insurance to be paid for all those lost books) was via that shelf list.
Although a lot has changed since then, and most of those catalog cards were long ago recycled as scrap paper, the data they contained is (are?) still around, and still powering the online catalogs at Cornell. The catalog card drawers themselves were part of an ancient (and esthetically pleasing) piece of furniture, rescued from Boardman Hall, which was torn down in the 1950s to make way for Olin Library, a move many believe was a terrible mistake (Olin is the only modern building on Cornell’s Arts Quad). But I digress.
Like most libraries, Cornell used OCLC’s services to create catalog cards, not paying much attention to the data being created as part of that process until well down the road. Also like many, Cornell actually had a clutch of ‘holding libraries,’ physical spaces associated with particular schools and programs, each creating what was effectively its own database via OCLC. But unlike most, Cornell bit that multiple-records bullet early, and when the data was loaded into NOTIS, there was only one iteration of a bibliographic record, with all the local ‘holding libraries’ attached to it. A mini-version of OCLC’s ‘master record’ is one way to look at it, I suppose. It was a sensible, if not particularly popular, move, and we all had occasion later to thank our lucky stars we had crossed that bridge as a group, rather than as individuals, when we saw the headaches our comrades were coping with.
My last data migration for Cornell was the one that moved data from the old NOTIS system to Voyager, and it was a year-long project that, if nothing else, reaffirmed my biases towards standard data. Although, like everyone else, we had some standard data (MARC bibs and authorities) and a lot of non-standard data (acquisitions and circulation), the bibliographic portion was, we agreed, the most important part, because everything else ‘hung off’ that bib record. Clearly, the data remained where our investment lay—by the end we weren’t even installing new versions of NOTIS in all the modules we used (and the ones we did install turned out to be mistakes). NOTIS was very old fish indeed by the time we moved to Voyager, and Voyager now, like most of the so-called ‘new generation’ of integrated library systems based on relational databases, is fast becoming a pungent geriatric fish as well.
Enough of looking back (interesting as that can be). The questions now revolve around how different we think our future will look. Will we continue to use/reuse our considerable legacy of data to build the services we want moving forward? If so, what steps do we need to take to transform our legacy data to RDA or any other more modern packaging for our data? We have a large number of value vocabularies as well as the MARC 21 schema we still rely on, which we will need to consider part of that plan for reuse.
I’ve seen a lot of ‘new rules for data’, but these are mine:
–Data should be able to be encoded in a variety of ways, to suit a variety of functions, uses, and systems
–Data should be managed at a granular, statement level, but also be available in a variety of record ‘formats’ (with records being understood as primarily an on-the-fly method of aggregating data for a variety of downstream users)
–Although current data is expressed mostly as text strings, data improvement strategies will be designed to change most of them to URIs as soon as practicable.
–Data definitions and specifications will be easily available on the web, allowing mapping to be simpler and easier to tweak
And the most important rule:
–Never, never make data decisions to fit the system flavor of the month, and ‘out’ any system that degrades our data as the price of functionality
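For those who think better in code, here is a minimal sketch of what the statement-level and strings-to-URIs rules might look like in practice. Everything in it is an assumption made up for illustration: the `Statement` shape, the property names, the `bib:1` identifiers, and the example.org URIs are hypothetical and come from no real schema or system.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    """One granular assertion about a resource (hypothetical shape)."""
    subject: str
    prop: str
    value: str


# Hypothetical lookup from legacy text strings to URIs (placeholder URIs only).
STRING_TO_URI = {
    "Ithaca, N.Y.": "http://example.org/place/ithaca",
    "eng": "http://example.org/language/eng",
}


def upgrade_to_uris(statements, lookup):
    """Swap a string value for a URI when one is known; keep the rest as-is."""
    return [s._replace(value=lookup.get(s.value, s.value)) for s in statements]


def build_record(statements, subject, wanted_props):
    """Aggregate the statements about one subject into a record, on the fly."""
    record = defaultdict(list)
    for s in statements:
        if s.subject == subject and s.prop in wanted_props:
            record[s.prop].append(s.value)
    return dict(record)


# Illustrative data only.
statements = [
    Statement("bib:1", "title", "A hypothetical title"),
    Statement("bib:1", "language", "eng"),
    Statement("bib:1", "placeOfPublication", "Ithaca, N.Y."),
    Statement("bib:2", "title", "Another hypothetical title"),
]

upgraded = upgrade_to_uris(statements, STRING_TO_URI)

# Two different 'records' built from the same statements for different users.
brief = build_record(upgraded, "bib:1", {"title", "language"})
full = build_record(upgraded, "bib:1", {"title", "language", "placeOfPublication"})
print(brief)
print(full)
```

The point of the sketch is that the statements are what gets managed, while the ‘record’ is just one view assembled on request. Strings with no known URI are left alone, so the upgrade can happen gradually.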
This is not to say that the transition of our old data to what we need for a newer environment is going to be seamless, lossless or even easy. It will be none of those things. But I would contend that it’s not rocket science either, and we’d be well advised not to indulge in needless hand-wringing until we’ve explored the issues more fully. Stay tuned …