I’ve been back from Chicago for just over a week now, but I’m still reflecting on a very successful Jane-athon pre-conference the Friday before Midwinter. And the good news is that our participant survey responses agree with the “successful” part, plus contain a lot of food for thought going forward. More about that later …

There was a lot of buzz in the Jane-athon room that day, primarily from the enthusiastic participants, working together at tables, definitely having the fun we promised. Afterwards, the buzz came from those who wished they’d been there (many on Twitter #Janeathon) and others who wanted us to promise to do it again. Rest assured–we’re planning on another one in San Francisco at ALA Annual, but it will probably be somewhat different, because by then we’ll have a better support infrastructure and will be able to be more concrete about the question of ‘what do you do with the data once you have it?’ If you’re particularly interested in that question, keep an eye on the rballs.info site, where new resources and improvements will be announced.

Rballs? What the heck are those? Originally they were meant to be ‘RIMMF-balls’, but then we started talking about ‘resource-balls’, and other such wanderings. The ‘ball’ part was suggested by ‘tar-balls’ and ‘mudballs’ (mudball was a term of derision in the old MARBI days, but Jon and I started using it more generally when we were working on aggregated records in NSDL).

So, how did we come up with such a crazy idea as a Jane-athon anyway? The idea came from Deborah Fritz, who’d been teaching about RDA for some time, plus working with her husband Richard on the RIMMF (RDA In Many Metadata Formats) tool, which is designed to allow creation of RDA data and export to RDF. The tool was upgraded to version 3 for the Jane-athon, and Deborah added some tutorials so that Jane-athon participants could get some practice with RIMMF beforehand (she also did online sessions for team leaders and coaches).

Deborah and I had discussed many times the frustration we shared with the ‘sage on the stage’ model of training, which left attendees unhappy with its limitations. They wanted something concrete–they usually said–something they could get their teeth into. Something that would help them visualize RDA out of the context of MARC. The Jane-athon idea promised to do just that.

I had done a prototype session of the Jane-athon with some librarians from the University of Hawaii (Nancy Sack did a great job organizing everything, even though a dodgy plane made me a day late to the party!). We got some very useful evaluations from that group, and those contributed to the success of the official Chicago debut.

So a crazy idea, bolstered by a lot of work and a whole lot of organizational effort, actually happened, and was even better than we’d dared to hope. There was a certain chaos on the day, which most people accepted with equanimity, and an awful lot of learning of the best kind. The event couldn’t have happened without Deborah and Richard Fritz, Gordon Dunsire, and Jon Phipps, each of whom had a part to play. Jamie Hennelly from ALA Publishing was instrumental in making the event happen, despite his reservations about herding the organizer cats.

And, as the cherry on top: After the five organizers finished their celebratory dinner later in the evening after the Jane-athon, we were all out on the sidewalk looking for cabs. A long black limousine pulled up, and the driver asked us if we wanted a ride. Needless to say, we did, and soon pulled up in style in front of the Hyatt Regency on Wacker. Sadly, there was no one we knew at the front of the hotel, but many looked askance at the somewhat scruffy mob who piled out of the limo, no doubt wondering who the heck we were.

What’s up next? We think we’re on the path of a new data sharing paradigm, and we’ll run with that for the next few months, and maybe riff on that in San Francisco. Stay tuned! And do download a copy of RIMMF and play–there are rballs to look at and use for your purposes.

P.S. A report of the evaluation survey will be on RDA-L sometime next week.

By Diane Hillmann, February 14, 2015, 2:43 pm (UTC-5)

The planning for the Midwinter Jane-athon pre-conference has been taking up a lot of my attention lately. It’s a really cool idea (credit to Deborah Fritz) to address the desire we’ve been hearing for some time for a participatory, hands-on session on RDA. And let’s be clear, we’re not talking about the RDA instructions–this is about the RDA data model, vocabularies, and RDA’s availability for linked data. We’ll be using RIMMF (RDA in Many Metadata Formats) as our visualization and data creation tool, setting up small teams with leaders who’ve been prepared to support the teams and a wandering phalanx of coaches to give help on the fly.

Part of the planning has to do with building a set of RIMMF ‘records’ to start with, to which participants can add their own resources and explore the rich relationships in RDA. We’re calling these ‘r-balls’ (a cross between RIMMF and tarballs). These zipped-up r-balls will be available for others to use for their own homegrown sessions, along with instructions for using RIMMF and setting up a Jane-athon (or other themed -athon), and also how to contribute their own r-balls for the use of others. In case you’ve not picked it up, this is a radically different training model, and we’d like to make it possible for others to play, too.
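
Mechanically, an r-ball is just a zipped-up folder of record files. A minimal sketch of the packaging step, in Python–the file names and internal layout here are invented for illustration, since real r-balls follow RIMMF’s own folder conventions:

```python
import tempfile
import zipfile
from pathlib import Path

def make_rball(record_dir, rball_path):
    """Bundle a folder of RIMMF record files into a zipped 'r-ball'.
    Preserves the folder's relative structure inside the archive."""
    record_dir = Path(record_dir)
    with zipfile.ZipFile(rball_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(record_dir.rglob("*")):
            if f.is_file():
                zf.write(f, f.relative_to(record_dir))
    return rball_path

# Demo with throwaway files (record names are hypothetical):
tmp = Path(tempfile.mkdtemp())
(tmp / "records").mkdir()
(tmp / "records" / "emma-manifestation.xml").write_text("<manifestation/>")
(tmp / "records" / "emma-work.xml").write_text("<work/>")
rball = make_rball(tmp / "records", tmp / "jane.rball.zip")
names = zipfile.ZipFile(rball).namelist()
```

Anyone receiving the zip can unpack it into their own RIMMF installation, which is what makes the ‘use it for your own homegrown session’ part work.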

That’s the plan for the morning. After lunch we’ll take a look at what we’ve done, and prise out the issues we’ve encountered, and others we know about. The hope is that the participants will walk out the door with both an understanding of what RDA is (more than the instructions) and how it fits into the emerging linked data world.

I recently returned from a trip to Honolulu, where I did a prototype Jane-athon workshop for the Hawaii Library Association. I have to admit that I didn’t give much thought to how difficult it would be to do solo, but I did have the presence of mind to give the organizer of the workshop some preliminary setup instructions (based on what we’ll be doing in Chicago) to ensure that there would be access to laptops with software and records pre-loaded, and a small cadre of folks who had been working with RIMMF to help out with data creation on the day.

The original plan included a day before the workshop with a general presentation on linked data and some smaller meetings with administrators and others in specialized areas. It’s a format I’ve used before and the smaller meetings after the presentation generally bring out questions that are unlikely to be asked in a larger group.

What I didn’t plan for was that I wouldn’t be able to get out of Ithaca on the appointed day (the day before the presentation) thanks not to bad weather, but instead to a non-functioning plane which couldn’t be repaired. So after a phone discussion with Hawaii, I tried again the next day, and everything went smoothly. On the receiving end there was lots of effort expended to make it all work in the time available, with some meetings dribbling into the next day. But we did it, thanks to organizer Nancy Sack’s prodigious skills and the flexibility of all concerned.

Nancy asked the Jane-athon participants to fill out an evaluation, and sent me the anonymized results. I really appreciated that the respondents added many useful (and frank) comments to the usual range of questions. Those comments in particular were very helpful to me, and were passed on to the other MW Jane-athon organizers. One of the goals of the workshop was to help participants visualize, using RIMMF, how familiar MARC records could be automatically mapped into the FRBR structure of RDA, and how that process might begin to address concerns about future workflow and reuse of MARC records. Another goal was to illustrate how RDA’s relationships enhanced the value of the data, particularly for users. It looked as if most of the participants understood the goals of the workshop and felt they had gotten value from it.

But there were those who provided frank criticism of the workshop goals and organization (as well as the presenter, of course!). Part of these criticisms involved the limitations of the workshop, wanting more information on how they could put their new knowledge to work, right now. The clearest expression of this desire came in as follows:

“I sort of expected to be given the whole road map for how to take a set of data and use LOD to make it available to users via the web. In rereading the flyer I see that this was not something the presenter wanted to cover. But I think it was apparent in the afternoon discussion that we wanted more information in the big picture … I feel like I have an understanding of what LOD is, but I have no idea how to use it in a meaningful way.”

Aside from the time constraints–which everyone understood–there’s a problem inherent in the fact that very few active LOD projects have moved beyond publishing their data (a good thing, no doubt about it) to using the data published by others. So it wasn’t so much that I didn’t ‘want’ to present more about the ‘bigger picture’; there wasn’t really anything to say, aside from the fact that the answer to that question is still unclear (and I probably wasn’t all that clear about it either). If I had a ‘road map’ to talk about and point them to, I certainly would have shared it, but sadly I have nothing to share at this stage.

But I continue to believe that just as progress in this realm is iterative, it is hugely important that we not wait for the final answers before we talk about the issues. Our learning needs to be iterative too, to move along the path from the abstract to the concrete along with the technical developments. So for Midwinter, we’ll need to be crystal clear about what we’re doing (and why), as well as why there are blank areas in the road-map.

Thanks again to the Hawaii participants, and especially Nancy Sack, for their efforts to make the workshop happen, and the questions and comments that will improve the Jane-athon in Chicago!

For additional information, including a link to register, look here. Although I haven’t seen the latest registration figures, we’re expecting to fill up, so don’t delay!

[these are the workshop slides]

[these are the general presentation slides]

By Diane Hillmann, December 19, 2014, 10:22 am (UTC-5)

I know many of you are puzzled by this event, so do take a look at a rundown of the plans on the RDA Toolkit Blog.

Not so surprisingly, we were inspired by the notion of a hackathon, but it had to be focused on something other than computer code and application building. All of us have heard conflicting opinions about whether RDA can be fully functional, whether FRBR works and will benefit users, or whether it’s just all too complicated. The big gap in addressing these questions has been the challenge of doing something hands-on instead of the usual sage-on-the-stage doling out large piles of handouts. There are still realities that need to be recognized as we take a hands-on look at RDA and build some real RDA data.

First of these realities is that RDA has been in development for a hell of a long time, and the rules (the part that gets the most attention, and some think really IS the whole of RDA) started out as AACR3. As one who’s been watching this space (from the outside and the inside) since the beginning, I can confirm that the notion of AACR3 is a historical artifact, with nothing to do with what RDA has become.

I’ve been ranting and railing for years (too many to count) that RDA must be more than rules. And it is–see the RDA Registry for evidence of that. This leads me to the second reality: all of us are learning as we go. The first iteration of the RDA Vocabularies, developed by the DCMI/RDA Task Group after a famous meeting in London in the Spring of 2007, was never published. The published version, much improved, was released early in 2013 along with the new RDA Registry. The learning-by-doing was happening in a lot of other standards-focused groups: IFLA and W3C, for example. FRBR, an essential part of the RDA model, was evolving along with RDA, and that fact led to a couple of interesting compromises, still working themselves out.

I can promise you that the Jane-athon will reflect all of those realities, and in addition build out a community familiar with the lessons still to be learned. There won’t be any papering over of gaps, downplaying of issues, or anything like that. At the Jane-athon we will demonstrate that building real RDA records in the context of FRBR is not a future dream, it’s happening now. What you will see as a participant is the reality–the ability to work within a FRBR flow, to import MARC records and see the system map them into FRBR constructs, to create links with NAF information, and view the results as a tree that highlights the relationships.
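
For readers who haven’t worked with FRBR, the ‘tree’ view is easy to sketch. A minimal illustration in Python–the field names and the sample editions are my own invention, not RIMMF’s internal model or display:

```python
from dataclasses import dataclass, field

@dataclass
class Manifestation:
    title: str          # a specific published edition

@dataclass
class Expression:
    language: str       # one realization of the work, e.g. a translation
    manifestations: list = field(default_factory=list)

@dataclass
class Work:
    title: str
    expressions: list = field(default_factory=list)

def tree(work):
    """Render a Work as an indented WEM tree, similar in spirit to the
    relationship view described above (the exact display is RIMMF's own)."""
    lines = [f"Work: {work.title}"]
    for e in work.expressions:
        lines.append(f"  Expression [{e.language}]")
        for m in e.manifestations:
            lines.append(f"    Manifestation: {m.title}")
    return "\n".join(lines)

emma = Work("Emma", [
    Expression("eng", [Manifestation("London : John Murray, 1815")]),
    Expression("fre", [Manifestation("Paris : Plon, 1945")]),
])
rendered = tree(emma)
print(rendered)
```

The point of the tree is that each MARC record, once mapped, lands at its proper FRBR level, and the relationships between levels become visible rather than buried in the flat record.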

Perhaps most important, we want to have fun with this. There will be no quizzes, no grades, no transcripts. That’s why we chose to focus on two sets of materials with great potential to benefit from a FRBR-based approach. Early in the day you’ll walk through the business of creating cataloging for Blade Runner resources (original book by Philip K. Dick), translations, film, etc. After that we’ll turn the group loose on Jane Austen (with some made-ahead basic data). After the flurry of data creation, we’ll be looking at the results, highlighting issues that come up, and not incidentally, getting some feedback from the participants about the tools, processes, and the beta-Jane-athon in its entirety.

We know (and welcome the fact) that not everyone attending will be a cataloger, much less all that familiar with RDA. There will be a place and a role for everyone who wants to learn more, and to dig in and get their hands [virtually] dirty. There is no need to cram for this event, or to study the RDA rules or cataloging before you come. If you want to get a bit familiar with RIMMF before you come, by all means take a look at the site, download the software, and play. The only requirement is an open mind and some excitement about the possibilities (some trepidation is okay too).

Once you have registered for ALA Midwinter in Chicago, you can sign up for the Jane-athon, which is available as a paid addition to the full Midwinter registration.

Please feel free to use the comments portion of this post to ask questions, or use the RDA-L list to bring up questions and concerns.

We hope to see you there!

By Diane Hillmann, November 30, 2014, 2:59 pm (UTC-5)

Everyone is getting tired of the sage-on-the-stage style of preconferences, so when Deborah Fritz suggested a hackathon (thank you Deborah!) to the RDA Dev Team, we all climbed aboard and started thinking about what that kind of event might look like, particularly in the ALA Midwinter context. We all agreed: there had to be a significant hands-on aspect to really engage those folks who were eager to learn more about how the RDA data model could work in a linked data environment, and, of course, in their own home environment.

We’re calling it a Jane-athon, which should give you a clue about the model for the event: a hackathon, of course! The Jane Austen corpus is perfect to demonstrate the value of FRBR, and there’s no lack of interesting material to look at–media materials, series, spin-offs of every description–in addition to the well known novels. So the Jane-athon will be partially about creating data, and partially about how that data fits into a larger environment. And did you know there is a Jane Austen bobblehead?

We think there will be a significant number of people who might be interested in attending, and we figured that getting the word out early would help prospective participants make their travel arrangements with attendance in mind. Sponsored by ALA Publishing, the Jane-athon will be on the Friday before the midwinter conference (the traditional pre-conference day), and though we don’t yet have registration set up, we’ll make sure everyone knows when that’s available. If you think, as we do, that this event will be the hit of Midwinter, be sure to watch for that announcement, and register early! If the event is successful, you’ll be seeing others at subsequent ALA conferences.

So, what’s the plan and what will participants get out of it?

The first thing to know is that there will be tables and laptops to enable small groups to work together for the ‘making data’ portion of the event. We’ll be asking folks who have laptops they can bring to Chicago to plan on bringing theirs. We’ll be using the latest version of a new bibliographic metadata editor called RIMMF (“RDA In Many Metadata Formats”)–not yet publicly available, but soon; watch for it on the TMQ website. We encourage interested folks to download the current beta version and play with it–it’s a cool tool and really is a good one to learn about.

In the morning, we’ll form small cataloging groups and use RIMMF to do some FRBRish cataloging, starting from MARC21 and ending up with RDA records exported as RDF Linked Data. In the afternoon we’ll all take a look at what we’ve produced, share our successes and discoveries, and discuss the challenges we faced. In true hackathon tradition we’ll share our conclusions and recommendations with the rest of the library community on a special Jane-athon website set up to support this and subsequent Jane-athons.
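
To give a flavor of the ‘ending up with RDF Linked Data’ step: the exported records are statements about identified things, which serialize naturally into a format like N-Triples. A toy sketch in Python–all of the IRIs below are made-up example.org placeholders, not the actual RDA Registry element IRIs the real export would use:

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) statements as N-Triples.
    Objects starting with 'http' are written as IRIs, others as plain
    literals; a real exporter also handles datatypes and language tags."""
    out = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http") else '"%s"' % o.replace('"', '\\"')
        out.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(out)

# Hypothetical statements about a Work record:
stmts = [
    ("http://example.org/work/emma",
     "http://example.org/el/titleOfWork", "Emma"),
    ("http://example.org/work/emma",
     "http://example.org/el/creatorOfWork",
     "http://example.org/agent/austen"),
]
nt = to_ntriples(stmts)
print(nt)
```

The second statement is the interesting one for the afternoon discussion: it links two identified entities, which is precisely what makes the exported data ‘linked’.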

Who should attend?

We believe that there will be a variety of people who could contribute important skills and ideas to this event. Catalogers, of course, but also every flavor of metadata people, vendors, and IT folks in libraries would be warmly welcomed. But wouldn’t tech services managers find it useful? Oh yes, they’d be welcomed enthusiastically, and I’m sure their participation in the discussion portion of the event in the afternoon will bring out issues of interest to all.

Keep in mind, this is not cataloging training, nor Toolkit training, by any stretch of the imagination. Neither will it be RIMMF training or have a focus on the RDA Registry, although all those tools are relevant to the discussion. For RIMMF, particularly, we will be looking at ways to ensure that there will be a cadre of folks who’ve had enough experience with it to make the hands-on portion of the day run smoothly. For that reason, we encourage as many as possible to play with it beforehand!

Our belief is that the small group work and the discussion will be best with a variety of experience informing the effort. We know that we can’t provide the answers to all the questions that will come up, but the issues that we know about (and that come up during the small group work) will be aired and discussed.

By Diane Hillmann, October 27, 2014, 1:57 pm (UTC-5)

In my post last week, I mentioned a paper that Gordon Dunsire, Jon Phipps and I had written for the IFLA Satellite Meeting in Paris last month, “Linked Data in Libraries: Let’s make it happen!” (note the videos!). I wanted to talk about the paper and why we wrote it, but I’m not just going to summarize it–I wouldn’t want to spoil the paper for anyone!

The paper, “Versioning Vocabularies in a Linked Data World”, was written in part because we’d seen far too many examples of vocabulary management and distribution that paid little or no attention to the necessity of maintaining vocabularies over time and making them available (over and over again, of course) to the data providers using them. It goes without saying that the vocabularies were expected to change over time, but in too many cases, vocabulary owners distributed changes in document form, or as files with new data embedded but no indication of what had changed, or worse: nothing.

We have been thinking about this problem for a long time. Even the earliest instance of the NSDL Registry (precursor of the current Open Metadata Registry, or OMR, as we like to call it) incorporated a ‘history’ view of the data, basically the ‘who, what, when’ of every change made in every vocabulary. Later on, we added the ability to declare ‘versions’ of the vocabularies themselves, taking advantage of that granular history data, for those trying to manage the updating of their ‘product’ in a rational manner. Sadly enough, not very many of our users took advantage of that feature, and we’re not entirely sure why not, but there it was. Jon has always been frustrated with our first passes at this problem, and after Gordon and I discussed the problem with others at DC-2013 last year, and my rant about the lack of version control on id.loc.gov came out, it seemed time to think about the issue again.

At that point we were also planning our own big time versioning event: the unpublished first version of the RDA Element Sets were about to make their re-debut in ‘published’ form, reorganized, and with new URIs. Jon was also working on the GitHub connection with the OMR underlying the new RDA Registry site, working in a more automated mode as planned. He and Gordon and I had been discussing a new approach for some time, based on the way software is versioned and distributed, which is well-supported in Git and GitHub. So, as we drove back from ALA Midwinter in Philadelphia in January of last year, Jon and I blocked out the paper we’d agreed to do with Gordon on how we thought versioning should work in the semantic vocabulary world.

Consider: how do all of us computer nerds update our applications? Do we have to go to all sorts of websites (sometimes, but not always, prompted by an email) to determine which applications have changed and invoke an update? Well, sure, sometimes we do (particularly when they want more money!), but since the advent of the App Store and Google Play, we can do our updates much more easily. For the most part those updates are ‘pushed’ to us: we decide whether we want to update or not, we are told in a general way what has changed, and we click … and it’s done.

This is the way updates should happen in the Semantic Web data world, increasingly dependent on element sets and value vocabularies to provide descriptions of products of all kinds in order to provide access, drive sales or eyeballs, or support effective connections between resources. Now that we’re all reconciled to using URIs instead of text (even if our data hasn’t yet made that transition), shouldn’t we consider an important upside of that change, a simpler and more useful way to update our data?
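
Borrowing from software releases, as the paper proposes, means giving each vocabulary release a comparable version tag so consumers can detect updates and judge their severity. A minimal sketch of that comparison logic, under the assumption of ‘major.minor.patch’ tags (the tagging scheme is the software convention, not something the paper mandates):

```python
def parse_version(v):
    """Turn a 'major.minor.patch' tag into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def needs_update(local, published):
    """True when the published vocabulary is newer than the local copy."""
    return parse_version(published) > parse_version(local)

def breaking_change(local, published):
    """A bumped major version signals changes consumers must review,
    e.g. deleted or re-purposed terms, rather than simple additions."""
    return parse_version(published)[0] > parse_version(local)[0]
```

With tags like these, a data provider’s tooling can poll the registry and apply patch-level updates automatically while flagging major-version jumps for human review–the vocabulary equivalent of the app-store flow described above.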

So, I’ll quit there–go read the paper and let us know what you think. Don’t miss Gordon’s slides from Paris, available on his website. Note especially the last question on his final slide: “Is it time to get serious about linked data management?” We think it’s past time. After all, ‘management’ is our middle name.

Note: As of this week the video of Gordon’s presentation in Paris is now available.

By Diane Hillmann, September 22, 2014, 12:01 pm (UTC-5)

Some of you have probably noted that we’ve been somewhat quiet recently, but as usual, it doesn’t mean nothing is going on, more that we’ve been too busy to come up for air to talk about it.

A few of you might have noticed a tweet from the PBCore folks on a conversation we had with them recently. There’s a fuller note on their blog, with links to other posts describing what they’ve been thinking about as they move forward on upgrading the vocabularies they already have in the OMR.

Shortly after that, a post from Bernard Vatant of the Linked Open Vocabularies project (LOV) came over the W3C discussion list for Linked Open Data. Bernard is a hero to those of us toiling in this vineyard, and LOV is one of the go-to places for those interested in what’s available in the vocabulary world and the relationships between those vocabularies. Bernard was criticizing the recent release of the DBpedia Ontology, having seen the announcement and, as is his habit, going in to try and add the new ontology to LOV. His gripes fell into a couple of important categories:

* the ontology namespace was dereferenceable, but what he found there was basically useless (his word)
* finding the ontology content itself required making a path via the documentation at another site to get to the goods
* the content was available as an archive that needed to be opened to get to the RDF
* there was no versioning available, thus no way to determine when and where changes were made
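
Those gripes amount to a publication checklist that any vocabulary release can be held against. A small illustrative sketch–the dictionary keys here are my own labels for Bernard’s four complaints, not any standard schema:

```python
def release_problems(vocab):
    """Check a vocabulary release description against the complaints
    listed above; returns a list of human-readable problems."""
    problems = []
    if not vocab.get("namespace_dereferences_to_rdf"):
        problems.append("namespace does not dereference to useful RDF")
    if not vocab.get("rdf_findable_without_detour"):
        problems.append("content only reachable via documentation elsewhere")
    if vocab.get("rdf_only_inside_archive"):
        problems.append("RDF only available by unpacking an archive")
    if not vocab.get("version_info"):
        problems.append("no versioning, so changes cannot be traced")
    return problems

# A release matching all four complaints fails all four checks:
bad_release = {
    "namespace_dereferences_to_rdf": False,
    "rdf_findable_without_detour": False,
    "rdf_only_inside_archive": True,
    "version_info": None,
}
problems = release_problems(bad_release)
```

The inverse of the list is, of course, exactly the practice we were describing to the PBCore folks: dereferenceable namespace, directly fetchable RDF, and visible version history.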

I was pretty stunned to see that a big important ontology was released in that way–as was Bernard, apparently, although since that release there has been a meeting of the minds, and the DBpedia Ontology is now resident in LOV. But as I read the post and its critique my mind harkened back to the conversation with PBCore. The issues Bernard brought up were exactly the ones we were discussing with them–how to manage a vocabulary, what tools were available to distribute the vocabulary to ensure easy re-use and understanding, the importance of versioning, providing documentation, etc.

These were all issues we’d been working hard on for RDA, and are still working on behind the RDA Registry. Clearly, there are a lot of folks out there looking for help figuring out how to provide useful access to their vocabularies and to maintain them properly. We’re exploring how we might do similar work for others (so ask us!).

Oh, and if you’re interested in our take on vocabulary versioning, take a look at our recent paper on the subject, presented at the IFLA satellite meeting on LOD in Paris last month.

I plan on posting more about that paper and its ideas later this week.

By Diane Hillmann, September 15, 2014, 2:31 pm (UTC-5)

A few days ago, while catching up with list traffic on RDA-L, I stumbled on a conversation between two librarians that got me thinking. They were talking about the myriad changes in their ILS’s designed to make MARC usable with RDA. It’s a topic I still see a lot of on the lists, and it always makes me grind my teeth.

One reason for the dental destruction is that invariably the changes are small and niggly ones, the sort that make life annoying for those catalogers trying to apply RDA in a world still defined by MARC-based systems. It’s hardly news that those systems are often inflexible, but that’s primarily because they were built to supply services in an environment where data sharing was very centralized, change happened no more frequently than every six months, and everyone was prepared and in sync by the time records with updated structure started flowing. This world is either gone or on its last legs, depending on your perspective.

What I saw underlying that conversation was the assumption that the only way change could happen was if the ILS’s themselves changed; in other words if the ILS vendors decided to lead rather than follow. The situation now is that system vendors say they’ll build RDA compliant systems when their customers ask for them, and libraries say that they’ll use ‘real’ RDA when there are systems that can support it. This is a dance of death, and nobody wins.

For years I’ve been telling librarians that they need to bug their vendors about this state of affairs, but I’m not sure there’s much of a future in that strategy, given that most of the librarians are not yet able to tell the vendors what they want in any detail, and the vendors have been unwilling to build their expertise or invest in any substantive way until they think they have customers ready to buy. This strategy of ‘wait and see’ undoubtedly has its attractions–given that it’s cheap–and the vendors don’t yet see that their current customer base has any alternatives.

But there are alternatives, albeit ones that require some initiative and investment by either libraries or vendors willing to step forward and perhaps gain a competitive advantage. In essence this strategy is based on the notion that if vendors won’t smarten up their systems, a solution that treats the vendor systems like ‘dumb consumers’ might be the best bet for moving forward. If indeed the tight coordination necessary in the past to share data and manage change is no longer optimal, distributed data need not use an ILS as its primary ‘node’ for storage and management of data. It could just sit there, waiting for some other machine that wished to share data (in or out), and still run its functions for the OPAC and pass data to OCLC.

But I suggest it’s no longer necessary for that ILS to be the center of our concern and attention, particularly if we see value in participating in the linked data world. The functions of creating and maintaining data could be accomplished elsewhere, preferably in a ‘system’ (maybe just a cache with services) designed to ingest, manage, and expose statement-based data in a variety of formats–including MARC, for as long as we want. Thus, the export of library data and the serving of it to users via an OPAC could be accomplished without giving up the MARC-based ILS, cutting the cord to OCLC, or upsetting current partners and collaborators.
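
The ‘cache with services’ idea is easier to see with a concrete sketch. A deliberately minimal Python illustration–the class, its method names, and the two output formats are all stand-ins I’ve invented, not a real system design:

```python
class StatementStore:
    """Minimal sketch of a statement-based cache: ingest data once as
    (subject, predicate, object) statements, then expose the same data
    in more than one format for 'dumb consumer' systems downstream."""

    def __init__(self):
        self.statements = []

    def ingest(self, subject, predicate, obj):
        self.statements.append((subject, predicate, obj))

    def as_ntriples(self):
        """RDF-style export for linked data consumers."""
        return "\n".join(f'<{s}> <{p}> "{o}" .'
                         for s, p, o in self.statements)

    def as_key_value(self):
        """Flat export, standing in for a MARC-style feed to the ILS."""
        return "\n".join(f"{p}: {o}" for _, s_p_o in ()) or \
               "\n".join(f"{p}: {o}" for _, p, o in self.statements)

store = StatementStore()
store.ingest("http://example.org/work/emma",
             "http://example.org/el/title", "Emma")
rdf_out = store.as_ntriples()
flat_out = store.as_key_value()
```

The key design point is that neither consumer is the ‘center’: the ILS takes the flat feed, the linked data world takes the RDF, and the statements themselves are the single managed copy.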

Yes, we’d still have some work to do, but not nearly as much as we think. We have long understood that the old ideal of a fully ‘integrated’ system is unnecessarily cumbersome and limits our ability to improve our services. Some libraries have bolted discovery platforms on their OPAC that meet their needs better than their ILS does. Others use ERM systems to manage electronic subscriptions. Maybe it’s time to do the same with some of the other backend services that have passed their sell-by date.

Let the dis-integration begin!

By Diane Hillmann, February 18, 2014, 4:33 pm (UTC-5)

Presentations on innovative ways to gather data outside the library silo are happening all over ALA–generally hosted by committees and interest groups using speakers already planning to be at the conference. A great example was the Sunday session sponsored by the ALCTS CaMMS Cataloging & Classification Research Interest Group and produced by the ProMusicaDB project, with founder Christy Crowl and metadata librarian Kimmy Szeto. They provided a veritable feast of slides and stories, all of them illustrating the new ways that we’ll all be operating in the very near future. Their slides should be available on the ALCTS Cataloging and Classification Research IG site sometime soon. [Full disclosure: I spoke at that session too–see previous blog post for more details.]

On the Saturday of Midwinter, I attended two parts of the CC:DA meeting (I had to leave to do a presentation to another group in the middle), but I dutifully returned for the last part. It was probably a mistake–my return occurred during the last gasp of a perfectly awful discussion. I had a brief chat with Peter Rolla (the current chair) after the meeting, and continued to think about why I was so appalled during the last part of the meeting. Later, when held hostage in a meeting by a conversation in which I had little interest, I wrote up some of my thoughts.

I would describe the discussion as one of the endless number of highly detailed conversations on improving the RDA rules that have been a “feature” of CC:DA meetings for the past few years. To be honest, I have a limited tolerance for such discussions, though I usually enjoy some of the ones at a less excruciating level of detail.

Somehow this discussion struck me as even more circular than most, and seemed to be aimed at “improving” the rules by limiting the choices allowed to catalogers–in a sense by mechanizing the descriptive process to an extreme degree. Now, I’m no foe of using automated means to create descriptive metadata, either as a sole technique or (preferably) for submission to catalogers or editors to complete. I think we ought to know a lot more about what can be done using technology rather than continue to flog any remaining potential for rule changes intended to push catalogers to supply a level of consistency that isn’t really achievable for humans. If you want consistency–particularly in transcription–use machines. Humans are far better utilized for reviewing the product and correcting errors and adding information to improve its usefulness.

But in cataloging circles, discussing the use of automated methods is generally considered off-topic. When the [technological] revolution comes, catalogers will be the first to go, or so it is too often believed. Copy cataloging and other less ‘professional’ means of cutting costs and increasing productivity are not a happy topic of conversation for this group.

But, looking ahead, I see no letup in this trajectory without some changes. Catalogers love rules, and rules are endlessly improvable, no? Maybe, maybe not, but just put a tech services administrator in the room for some of these discussions, and you’re likely to get a reaction pretty close to mine. But to my mind, the total focus on rules rather than a more practical approach to address the inevitability of change in the business of cataloging is doing more towards ensuring that the human role in the process will be limited in ways that make little sense, except monetarily.

What we need here is to change the conversation, and no group is more qualified to do that than CC:DA. To do that it’s absolutely necessary that its membership become more knowledgeable about what is now possible in automating metadata creation. Without that kind of awareness, it’s impossible to start thinking and discussing how to focus less of CC:DA’s efforts on that part of the cataloging process which should be done by machines, and more on what still needs humans to accomplish. There are several ways to do this. One is by dedicating some of CC:DA’s conference time to bringing in those folks who understand the technology issues to demonstrate, discuss, and collaborate.

Catalogers and their roles have been changing greatly over the past few years, and promises of more change must be taken seriously. Then the ultimate question might be asked: if resistance is futile (and it surely is), how can catalogers learn enough to help frame that change?

By Diane Hillmann, February 5, 2014, 4:23 pm (UTC-5)

Prior to Midwinter I posted the list of presentations I was doing over the course of Midwinter. It seemed only fair to report on some of those sessions, and to share my slides. I thought about posting them separately, since my posts tend to balloon fairly significantly once I get writing (those of you who know me are free to point out that I talk like that, too)–but given that I’ve decided to post more often, y’all will have to live with length.

Saturday, January 25, 2014, 3:00-4:00 p.m., “A Consideration of Holdings in the World Beyond MARC” [Slides]

There were two speakers at this session; I spoke second. The first speaker was Rebecca Guenther, who spoke on BibFrame generally as well as the BibFrame approach to holdings. BibFrame currently has a fairly simple approach, for now limited to the simpler holdings needs of non-serials. This is the easy [easier?] part of course, and it will be interesting to see how serial holdings will be integrated with the model.

My presentation briefly surveyed other important holdings work in progress, including a project at the Deutsche Nationalbibliothek (DNB), the ONIX for Serials Coverage Statement, the current proposals for schema.org, and a brief report on a project my group is considering that would do for MARC Holdings (sample) what we’ve already set up for MARC Bibliographic data at marc21rdf.info.

What struck me when I was setting up the presentation was the amazing variety of work going on in this area. I really didn’t expect that, I confess. But by immersing myself in holdings as I hadn’t done for many moons, I found I was looking at an awful lot of very recent work. And it wasn’t just the diversity of approaches that surprised me, but the varied results as well. The efforts ran the gamut from very complex and comprehensive approaches (ONIX and MFHD) to much simpler approaches. The functions anticipated for each colored the diverse outcomes to a great extent. The ONIX XML schema was easily the most complex–with some ideas based on the MFHD work.

The schema.org effort is, like BibFrame, still in the process of jelling. When looking for evidence of schema.org holdings, I found myself on a path that had already been abandoned (though at the time it showed no signs of abandonment). Richard Wallis pointed me at the right place, and the slides have been corrected to fix that problem.

Sunday, January 26, 2014, 8:30-10:00 a.m., “The Other Side of Linked Data: Managing Metadata Aggregation” [Slides]

This session also included two presentations; mine was first this time. My focus was that most people think Linked Open Data (LOD) is about libraries exposing their data to the world, but that’s only half of LOD. The other half is taking advantage of the data others (libraries and non-libraries) are exposing openly. The two fundamental things about the LOD world are both ideas that tend to explode minds. First is the realization that we’re not talking about highly OCLC-curated MARC records, pre-aggregated for easy ingest into traditional library systems. Instead, we are talking about the management of statements (which may indeed be records as originally ingested, but to be useful in this multiple-choice world must be shredded on the way in and re-aggregated on the way out). The second is that there are many new skills we’ll have to learn (and an awful lot of assumptions that we’ll need to examine closely and maybe toss out the window). This is daunting, but hardly rocket surgery, and the sooner we get going, the better off we’ll be.
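The shred-and-re-aggregate idea can be illustrated with a toy sketch. Everything here is hypothetical–the record fields, identifiers, and function names are invented for illustration, not drawn from any real system–but it shows the basic move from record-based to statement-based management:

```python
def shred(record_id, record):
    """Shred a record (a dict of repeatable fields) into individual statements:
    (subject, property, value) triples that can be managed independently."""
    return [(record_id, prop, value)
            for prop, values in record.items()
            for value in values]


def reaggregate(statements):
    """Re-aggregate statements about each subject back into a record view,
    regardless of which source each statement originally came from."""
    records = {}
    for subject, prop, obj in statements:
        records.setdefault(subject, {}).setdefault(prop, []).append(obj)
    return records


# A record arrives whole, is shredded on the way in...
record = {"title": ["Pride and Prejudice"], "creator": ["Austen, Jane"]}
statements = shred("work:1", record)

# ...statements from other open sources can then be merged in
# (identifier shown is illustrative only)...
statements.append(("work:1", "creator_id", "http://example.org/id/austen"))

# ...and everything is re-aggregated on the way out.
merged = reaggregate(statements)
```

The point of the sketch is that once data lives at the statement level, “the record” becomes just one possible output view, assembled on demand from whatever statements an application trusts.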

The second presentation was from a group working at the Digital Public Library of America (DPLA), which is confronting many of these issues. Their announcement stated:

“This talk will introduce and outline the challenges of aggregating disparate metadata flavors from the perspective of both DPLA staff and representative hubs. We will review next steps and emerging frontiers as well, including improvements to normalization at the hub level and wider adoption of controlled vocabularies and formats for geospatial metadata and usage rights statements.”

And this was exactly what they did. They provided a very juicy look at the real world that faces anyone attempting to deal with the current metadata chaos. This is definitely work to follow, because where they are now will change over time and with experience, providing the rest of us with some really useful insights. Their slides are available from the IG site.

Sunday, January 26, 2014, 10:30-11:30 a.m., “Mapmakers” [Slides]

The Mapmakers presentation was designed to highlight some research I’ve been involved in, along with my colleagues Jon Phipps and Gordon Dunsire. This topic has not received a vast following, but it should as experience with new schemas and value vocabularies expands. As usual, there was another presentation just before ours that gave an exciting view of innovative work in expanding our notion of authority, in particular gathering and managing data from a broad variety of sources. That work encounters challenges very similar to the DPLA’s, though in some ways greater, since they often have to develop the sources and bring them into the LOD world themselves.

That presentation focused on work done in the ProMusicaDB project, with founder Christy Crowl and metadata librarian Kimmy Szeto sharing the podium. There was a feast of slides and stories, all of them illustrating the new ways that we’ll all be operating in the very near future. Their slides (including a demo) should be available on the ALCTS Cataloging and Classification Research IG site very shortly (though not yet as of this writing).

By Diane Hillmann, February 3, 2014, 12:14 pm (UTC-5)
Saturday, January 25, 2014, 3:00-4:00 p.m., A Consideration of Holdings in the World Beyond MARC [PCC 203B]

Sunday, January 26, 2014, 8:30-10:00 a.m., The Other Side of Linked Data: Managing Metadata Aggregation [PCC 102A]

Sunday, January 26, 2014, 10:30-11:30 a.m., Mapmakers [PCC 102A]

Most ALA watchers have noticed a shift from ‘invited talks’ at Interest Group and Committee meetings to requests for proposals from the chairs, from which pool the speakers are chosen. This is, of course, in parallel with changes going on with other professional conferences, and it’s an interesting shift for a number of reasons.

There’s a democratization aspect to this change–the chairs are no longer limited in their choices to people they already know about, thereby increasing the possibility that new and different ideas will get an airing. Maybe this Midwinter someone will come up with an absolutely wonderful and unexpected presentation that rockets the speaker from the unknown mob to the smaller roster of interesting known speakers. This is a good thing, I believe, even though the chance of witnessing such a rocket launch is dauntingly small.

For someone who has been around long enough (and noisily, it must be said), this shift means that I don’t need to wait for invitations to do presentations based on some chair’s idea of what might interest their group (but may no longer interest me); I can go ahead and respond to the calls that appeal to me. I’d like to think that the result is something fresh enough to be interesting for me to prepare and an audience to listen to, without being totally divorced from prior talks that represent earlier phases. An odd result of this shift in process is that speakers who submit proposals to various committees generally don’t know who else will be speaking at a particular program until after their proposal has been approved, and maybe not even then. This particular aspect has already led to some very interesting lineups at meetings across the conference.

Because I take seriously the idea of not re-using previous talks to the point of becoming horribly boring, I tend to apply for things that let me explore something not unrelated to what I’ve done before, but that at least requires me to rethink it or try a different approach to expose what I (and the people I work with) are thinking about. I think that’s pretty much what most audiences are looking for, right?

So below are my talks for ALA Midwinter. I may be accompanied by one or another of my colleagues on a couple of these, and will surely have their help building the presentations.

A Consideration of Library Holdings in the World Beyond MARC

Of all the MARC 21 formats, Holdings was the one most clearly designed for machine manipulation. It is granular, flexible, and intended to be used at either a detailed or summary level. It has sometimes frightened potential users because it looks complex (even where it isn’t), and in its ‘native’ form is not particularly human friendly. Some of the complexity arises because there are both display and prediction aspects in the encoding, and not all library systems have developed predictive serial check-in systems supported by MARC Holdings.

Some of the bibliographic metadata efforts now going forward ignore the existing MARC Holdings, sometimes in favor of simpler solutions based on the perception of the waning need for predictive check-in for digital subscriptions. Not much effort has been expended to bring the MARC Holdings format forward into the discussions about changing requirements and re-use of existing standards.

For the ALCTS CRS Committee on Holdings Information, Saturday, January 25, 2014, 3:00-4:00 p.m., PCC 203B.

Holdings has been an interest of mine since I was a law librarian representing the American Association of Law Libraries on MARBI. In the early computer era in libraries, when digital publication was the exception, law publishers demonstrated a great deal of creativity in their publication of updating services, from loose-leaf services to regular republication of standard tools, and law catalogers always had the best examples of holdings problems. These days, most of those materials have been subsumed by various digital tools, which have their own complexities, particularly in the context of versions, republication and compilation.

But the question remains–does what we learned from the pre-digital world of holdings functionality have relevance in the digital era?

The Other Side of Linked Data: Managing Metadata Aggregation

Most of the current activity in the library LOD world has been on publishing library data out of current silos. But part of the point of linked data for libraries is that it opens up data built by others for use within libraries, and has the potential for greater integration of library data within the larger data world. The sticking point for most librarians is that data building and distribution outside the familiar world of MARC seems like a black box, the key held by others.

Traditionally, libraries have relied on specialized system vendors to build the functionality they needed to manage their data. But the discussions I’ve heard too often result in librarians wanting vendors to tell them what they’re planning, and vendors asking librarians what they need and want. In the context of this stalemate, it behooves both library system vendors and librarians to explore the issues around management of more fine-grained metadata so that an informed dialogue around requirements can begin.

For the ALCTS Metadata Interest Group, Sunday, January 26, 2014, 8:30-10:00 a.m., PCC 102A

Transitioning from a rigidly record-based system to a more flexible environment where statement-level information can be aggregated and managed is difficult to envision from the vantage point of our current MARC-based world. This has led to a gap between what we know and the wider world of linked open data we’d like to participate in. One of the critical steps is to understand how such a world might look, and what it requires of us and our systems. The goal is to be able to move some of that improved understanding to the point of innovation and development.


It’s very clear that there will be no single answer to moving bibliographic metadata into the world beyond MARC, no direct ‘replacement’ for the simple walled garden we have all lived in for 40+ years. While it’s certainly true that the emerging global universe of bibliographic description has continued to expand and seems more chaotic than ever, there are still commonalities of understanding with the world beyond our garden walls that we’re only beginning to identify. How then can we begin to expose our understanding to that universe and develop some consensus paths forward? Specifically, what are the possibilities for using semantic mapping to provide us with the flexibility and extensibility we need to build our common future?

For the ALCTS CaMMS Cataloging & Classification Research Interest Group, Sunday, Jan. 26, 10:30-11:30, PCC 102A.

Librarians too often see ‘mapping’ and think ‘crosswalking’, but the reality is that these are quite different strategies. Crosswalking was a natural fit for the MARC environment, where the ‘one, best’ crosswalk would logically be developed centrally and implemented as part of current application needs. But the limitations of crosswalking make much less sense as we transition into a world where the Semantic Web has begun to take hold (of our heads, if not our systems!).

In the Semantic Web world, maps can contain a variety of relationships (not just the crosswalk ‘same as’), and central development and control is neither necessary nor very useful. But this doesn’t mean that we’re all on our own–collaboration is still our best strategy.
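The difference can be sketched in a few lines. In this toy example (the element names are invented for illustration, though the relationship types echo the SKOS mapping properties), a map records *how* two elements relate, and each application decides for itself how loose a match it will accept–something a flat one-best crosswalk cannot express:

```python
# A semantic map as typed relationships between vocabularies, not a flat
# 'same as' crosswalk. Element names here are hypothetical placeholders.
semantic_map = [
    # (source element, relationship, target element)
    ("vocabA:titleProper", "exactMatch",  "dct:title"),
    ("vocabB:mainTitle",   "closeMatch",  "dct:title"),
    ("vocabB:author",      "broadMatch",  "dct:creator"),  # detail lost going this way
    ("vocabC:topic",       "narrowMatch", "dct:subject"),
]


def targets_for(source, allowed=frozenset({"exactMatch", "closeMatch"})):
    """Return target elements for a source element, honoring only the
    relationship types this particular application is willing to accept."""
    return [t for s, rel, t in semantic_map if s == source and rel in allowed]
```

A strict aggregator might accept only `exactMatch`; a discovery interface might happily take `broadMatch` too. Same shared map, different local decisions–which is exactly why a centrally controlled ‘one, best’ crosswalk is no longer the right tool.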

By Diane Hillmann, December 9, 2013, 3:26 pm (UTC-5)