Some of you have already seen the live feed or the recordings for last week’s Code4Lib conference. If you have, you might already know that I was the keynote speaker for that conference. (The archive page is here; my part starts about 90 minutes into session 1, and slides are available too.) The whole story of how I got there is interesting, and beyond that I’d like to talk about what I took away from it. I attended all of Tuesday and Wednesday, and left Thursday morning (after my return from ALA Midwinter in January, I’ve developed a strong aversion to booking the last flights into Ithaca from anywhere), thus missing the Thursday morning events. I’ve since caught up with those recordings.

The invitation came from conference host Robert McDonald, and was totally out of the blue. Code4Lib has an admirable process for choosing keynoters: they have a wiki and a backchannel list (that anyone can join), which keeps the voting off the main discussion list. I’ve never attended Code4Lib before, though I’ve been a lurker and an occasional participant in the discussions on the list for some years, and I know many of the regulars. Since I hadn’t attended the conference before, it never dawned on me to participate in the voting. I didn’t get the most votes, but when the top vote-getter turned the invitation down, I was asked. At first I was pretty intimidated by the whole idea, but that passed fairly quickly, and I started to get excited by the challenge it represented, both for me personally and as a representative of a whole host of librarians who never get a chance to talk to a room full of library programmers. It was clearly not an opportunity to be wasted.

I gave a lot of thought to what I wanted to talk about, and started and abandoned several topics before settling on one. It clicked for me when I participated in a discussion at ALA Midwinter amongst attendees at the organizing meeting for the Linked Library Data IG. The discussion was about the discouraging fact that programmers and librarians (particularly catalogers) don’t seem to be connecting on the important issues facing our libraries; instead we talk past one another. I think the general assumption is that this is a cultural divide, and on a superficial level it is, but a much more important reason is that we almost never gather together to discuss where we’re going. We all work for institutions that we believe are critically important in today’s society, but we’re not working together to solve the problems we can see in front of us.

So my talk for C4L covered a number of areas, including advice to programmers on how to find and connect with librarians/catalogers in their institutions who might be ready to work with them more closely, and what the priorities should be for that work. Despite a fairly rough start to the talk (the IU laptop I was using had a new version of PowerPoint that behaved quite oddly in presenter mode), it went fairly well and the response was wonderful. During the rest of that day and the following one, I had some great conversations with other attendees about the issues I brought up, and there will be some follow-through on several of those. I was particularly pleased that my plea for building demonstration projects showing how the RDA Vocabularies can be used was taken very seriously, and I will be following up on that one.

One question I threw out to the audience was whether anyone had read our article in DLib, ‘RDA Vocabularies: Process, Outcome, Use’. About a half dozen had, but probably twice that many tweeted the URL, so perhaps some more have read it since. I’m not sure why so few have seen the article, but I hope that some who are interested in moving away from the frustrating parsing of MARC data will see the light.

I also talked a bit about how the library world had been ill-served by the narrow marketing of RDA as primarily the guidance text (it’s still happening, unfortunately), as well as the whole RDA testing regime. Because the tests crammed RDA data into MARC, it really doesn’t operate as a test of RDA itself, or of the usefulness of FRBR. What we’ve ended up with is a vast amount of misunderstanding: many traditionalists still believe that RDA is not that different from AACR2, while those who believe that RDA isn’t enough change (or the change we need, to coin a phrase) believe the same thing but come to a different conclusion. As I said to the C4L group: “I get why catalogers like MARC, but I don’t get why you guys aren’t all over the RDA Vocabs.”

After my own time in the spotlight, I became just another participant (the difference was that everybody knew who I was and I had to squint at their badges to see who they were). Thankfully nobody got freaked out that I was knitting socks while listening to other people’s presentations (and at least one pulled out her half-knitted sock to show me). With a laptop in front of me (not to mention IRC and Twitter), I wouldn’t have heard a thing. But listening to the wide variety of presentations, I was very impressed by the amount of creativity and the diversity of projects presented. I understood most of it, at least at a general level (though perhaps not on an operational one), and took some notes about a few insights I wanted to think about as I work on various projects. It was really a great conference, and the organizers did a fabulous job with everything. Do take a look at the video, and think about how you might make some connections with the catalogers or programmers in your life. We are all in this together, and we need to find better ways to converse and collaborate to make our ideas real.

Oh, and lest I forget, thanks to all the folks who shared their wonderful and special beer with me during the after hours social time in the hospitality suite. You just may have turned me from an always-wine to a sometimes-beer broad. (And don’t worry, Declan, the beer washed out of my jacket just fine!)

By Diane Hillmann, February 15, 2011, 8:41 am (UTC-5)

Some of you may already have seen the announcement from the Cornell Legal Information Institute about our new project for the Library of Congress, where we propose to build some new ways to access legislative information. If you stumbled upon the original announcement (I’m betting few of you did, except for the odd law librarian in the bunch), you’ve perhaps been waiting with bated breath for me to spill more of the details of this, which Tom promised I would in his announcement. I’ve been distracted by a few other things in the meantime (like ALA Midwinter) but figured I’d better fulfill Tom’s promise before I get too far into the project and forget what we were thinking about when we wrote the proposal.

As is usual for those kinds of things, we looked around at what other people were building in other jurisdictions and noticed that a lot of people were using FRBR to model legislative information, including the UK. (For more about that project see this blog post). This decision made no sense to me, in particular (I’m not sure how much my LII colleagues know about FRBR, but I’ve been immersed in it for a while now) and I was pretty adamant that we shouldn’t go down that road. It’s unclear to me what the reasons would be to adopt the FRBR model in a legislative context, but I could speculate that part of it is that there’s been a lot of buzz around FRBR in the bibliographic community, and if you’re trying to do something sooner rather than later, reusing something that’s already there seems attractive. The Library of Congress, in its solicitation, was strongly in favor of reusing not only FRBR, but also the standards they have developed over the years for bibliographic data. In the normal course of things, that all makes a lot of sense, but for us it made much more sense not to adopt an explicitly bibliographic model which worked reasonably well for literary works but not so well for the kind of shape-shifting that goes on in the life of legislation.

From the proposal:

“Traditionally, libraries have approached the question of incorporating specialized kinds of materials into their descriptive workflow by focusing on the similarities between the new materials and the materials for which they normally provide descriptive metadata. In the past, this worked well–materials in newer formats and those for use in special communities were able to be incorporated into existing tools with a minimum of fuss. In the area of legal materials, the treatises, standard monographic materials, and standard serial titles were in general easily incorporated, while loose-leaf services and other materials with updating services were not. Primary legal materials were treated either as collections, as serials, or, in the case of most legislative materials, as standard monographs. Now that the digital revolution is well upon us, with full text more available and users’ experience with search engines generating more pressure to look beyond the simple access to printed materials, we’re starting to see more clearly how limiting our traditional approaches have been.

There are several areas where the traditional bibliographic approaches fail:
* the model of ‘stand-alone’ monographs with few (if any) relationships is insufficient to provide the functionality desired for specialized legal materials
* the new bibliographic approaches, such as RDA, are based on a FRBR model of published works, which, while rich in relationships, provides neither element sets nor relationships particularly useful for primary legal materials, legislation in particular
* primary legal materials have traditionally been entered under jurisdiction with collective uniform titles which are often meaningless to users
* insufficient distinction is made between jurisdiction and place”

Once we decided that we needed to start from the beginning on a model that worked specifically with legislation, and we thought about what kinds of materials we needed to cover, it seemed fairly clear that we had to think about an events-based model. Well sure, you say, but isn’t FRBR about events, too? Definitely, but those events have to do with traditional publications, not legislation, where events like ‘House vote’ constitute the kinds of events we need to think about.

One question that came up was whether this approach would end up building a silo from which legislative materials would never emerge to play well with related legal materials. One way we hope to forestall that possibility is to build the descriptions around these legislative events in ways that allow them to be reused in other environments, even bibliographic environments. Not too surprisingly for those of you who follow this blog, we’re talking about using some of the same strategies for building the legislative data that we used to build the RDA Vocabularies.

We’re going to use the process defined by the DCMI Singapore Framework, which expects us to build use cases to figure out what people want to do with the data, derive functional requirements from those use cases with which to test our model, and develop a model that grows from those solid foundations. From there we will define description set profiles for our data, and, we hope, have something useful to talk about. I’m guessing we’ll be talking about it all along the road, if for no other reason than that we’re very excited by this project. We think we’ll learn a lot and enjoy the process, so do wish us well with it!

By Diane Hillmann, January 18, 2011, 9:07 am (UTC-5)

At the Friday ‘Big Heads’ meeting much of the conversation revolved around Incrementalism vs. Revolution, as have so many conversations, about so many things. Someone quoted David Mamet (I can’t find the quote) that what we need is sledge hammers, not chisels, and I thought it was a notion too good to pass up as a jumping off point to discuss that meeting.

There were a lot of interesting topics discussed at the meetings, but as is my habit I’m going to focus only on the topics of interest to me. As usual there were a number of vendors in the audience, and when a few of the ‘heads’ at the main table voiced the expectation that they would be depending on the vendor community for help as they experienced additional staff reductions and resource constraints in general, the vendors came up to the microphones to respond. A couple of vendors expressed their concern that the library community in general has not been able to articulate what they want from vendors, and this has made it difficult for them to develop business plans. I hear a variation of this line when I stroll the exhibit halls and talk to vendors about what their plans are for RDA implementation. Almost always I hear that they have not heard from their customers about what they want, and they’re waiting for that before making plans. As a result, when I’m presenting to librarians about RDA, I tell them that they should be talking to their vendors, asking when and how they will be implementing, etc., etc. The problem with that approach is that a) most of the time the librarians don’t know what to ask, beyond the when and how; and b) when they get an answer they often don’t know how to interpret it. Maybe I’m slow, but I’m coming to the conclusion that I should stop telling people to talk to their vendors about RDA. I’m not sure it matters.

I went up to the microphone for one of my usual rants, after hearing quite enough of this dancing around. Here’s the reality, as I see it:

1. Libraries are unlikely to agree on what they want (this has been true in the past, and will likely be true in the future)
2. Given the generally low level of understanding of RDA data issues across the library spectrum (and certainly the vendors), it’s unlikely that any articulation of needs to vendors would represent something that vendors could rely on to build a business plan
3. Vendors are still talking about the provision (i.e., sale) of bibliographic records as a basis for their services to libraries.

My rant included all three of those points and more. A little over a year ago, the R2 report on the marketplace for MARC records (which I blogged about) assumed that there is a marketplace for MARC records which will continue, and that a direct return on investment is possible (or desirable) for creators of data. I said then, and still believe, that such a viewpoint is both unrealistic and in fact destructive to the task of moving forward into a world where data is not the coin of the realm but freely available (this is the basis for linked open data), and where investment and return on investment center on data services, not data sales. After my rant to Big Heads, one of the vendors came up to talk to me and offered some useful nuggets to support my view: a) they provide records, but don’t make much money on them; b) the realm of digital metadata is vastly more complicated than that of physical metadata. It’s a huge challenge for vendors to operate in this world, but clearly the usual answers are no longer working, even as the data revolution is not yet fully upon us. The inevitable conclusion is that vendors who wait for their customers to tell them what they want may not survive the coming revolution. This is no time for chisels.

In this context it’s good to meditate on Henry Ford’s famous statement: “If I had asked people what they wanted, they would have said faster horses.”

By Diane Hillmann, January 9, 2011, 5:25 pm (UTC-5)

Friday I attended the RDA Update, organized as the “Briefings From RDA Test Participants.” The room was full (overfull, actually), and I ended up sitting in the back on a chair pulled from the main seating area towards the back wall. Beacher Wiggins provided the background and updated the group on the plan and timetable. He suggested that there were three scenarios possible for the decision: one was that the group would agree to adopt RDA, another was that they would decide not to (either for now, or presumably ever), and a third was that they would decide to implement if and when the JSC made some specified changes in the rules. I was a bit taken aback by this last option, since it seemed very heavy handed and somewhat threatening. Of course, there will be options available for any implementing library or group of libraries (national or otherwise), but it seems a bit much to believe that among those options there might not be ways for LC/NLM/NAL to meet their specific or collective needs without holding the US RDA implementation hostage to their desires. If I were representing a non-US constituency (which in a small way I am, as the DCMI liaison to CC:DA) I would certainly take this possibility seriously, if nothing else as a gesture of US-centrism that should be repudiated by the rest of the US and international cataloging and metadata community. By all means, LC/NLM/NAL triumvirate, do what you think best, but don’t throw your considerable weight and credibility around in aid of getting what you think you want, or, just to prove you can. We look bad enough to the international community as it is, please don’t make that worse!

The presentations started out well, with Chris Cronin (U. Chicago) giving a useful summary of his group’s experience. He was followed by Penny Baker (Sterling and Francine Clark Art Institute), who had a very flashy set of slides that did not work well in a room with too much light and too many people. While various people played with the lights, she tried to get through her slides, but had trouble seeing the laptop when the lights were down (and her slides were visible), and lost her place a few times. Her main point (as far as I could tell), that her group was able to show that the RDA relationships worked well in providing ways to link together the very interesting materials they chose to catalog, got lost in the shuffle. Towards the end, someone figured out how to dim the lights sufficiently to see the slides without plunging the room into darkness, and the room burst into cheers. The speaker, misinterpreting the audience response, thought she was being cued to finish up, and did so, apologizing as she left: “Sorry it took so long and was so messy.” The group in the back with me agreed that this was basically the story of RDA, though we should probably not expect a similar apology from JSC.

The remainder of the speakers plodded on with little to say that interested me: they did their testing work, gave their feedback, and determined internally whether they would continue doing ‘RDA Cataloging’ until the big decision comes down from the LC/NLM/NAL triumvirate, presumably on stone tablets for which some poor schlumpf will have to create a preservation strategy.

I have been dubious since the beginning about the usefulness of this testing regime, lately going so far as to compare it with the ‘Security Theater’ we are subjected to at airports these days (I have metal knees, so am always treated to a full, and now even more intrusive, ‘pat down’, something that makes me long for a naked scanner at my local airport). The analogy here is that ‘Security Theater’ is to real security as ‘RDA Testing Theater’ is to real testing, meaning testing that includes the FRBR part of RDA and not just a smattering of rules changes and some token relationships. I still think that it’s hard to justify the time and expense of the testing that has just concluded, which tests RDA only as used in a MARC environment, not RDA itself. From the point of view of the community, the result has been useful insofar as it has provided an avenue for some initial training and participation, but not so useful in providing any real understanding of RDA implementation. Far too many catalogers think (hope?) that RDA can be implemented without much change in what they do, which qualifies in my opinion as a very poor result indeed.

By Diane Hillmann, January 9, 2011, 9:33 am (UTC-5)

One continuing theme of the recently concluded DC-2010 is the perpetual search for consensus on what the hell DCMI should be doing. I know this continual search for identity is a common phenomenon with this sort of organization, as it is for the human adolescent hovering around the age of 15. As with the teenager, it should be seen as a healthy thing, and as most of us older than 15 know, it pretty much goes on for the remainder of life.

For me the conference was preceded by a half day DCMI Advisory Board meeting, where one topic was the revision of the DCMI mission statement as well as the perennial topic of the conferences and how the conference series can be optimally funded and continued, and what exactly is its value for the organization. As usual, there was not much consensus, either at the beginning or the end of the discussion, though it must be said that the conference itself probably shifted some opinions about the value proposition. Generally the AB meeting has been scheduled after the conference, with the idea that this shift in perspective is a good thing for sparking discussion, but for logistical reasons the meeting was held prior to the conference this year.

As it turned out, this change in placement of the AB meeting was unfortunate, given that Mike Bergman’s keynote on Friday morning contributed some important outside opinion to that basic question of mission. (Mike’s post about the keynote is here.) The fact that Mike arrived in time to sample a good chunk of the conference and to talk to a variety of participants gave his opinion the credibility that only exposure to the culture of the organization, and the personalities that affect that culture, can bring to a keynote. It was clear to me, in talking to him during the conference, that his view of DCMI was not the insider view of a contentious, financially strapped and sometimes dysfunctional organization, but one that also recognized the experience, knowledge and potential there. In a nutshell, Mike Bergman was telling us that DCMI’s role in the emerging linked data world was critical, and should be focused primarily on expanding the presence of useful semantics available to the web world, closing the ‘semantic gap’ he sees limiting the growth of linked open data.

Later in the day, my task was to lead a discussion of the work of the DCMI/RDA Task Group, of which I’m co-chair with Gordon Dunsire. Gordon and I had both prepared slides for that meeting—mine covering the history and work of the task group, what we’d learned, and what remained, and Gordon’s covering the related important work he’d been doing in parallel with IFLA. I’ve been frustrated for some time with the lack of attention and traction we’ve received for this important work, both from DCMI and the Joint Steering Committee for the Development of RDA (JSC). We have found ourselves at the stage where DCMI is waiting for JSC to make some statement about the work done (in the form of approval), and the JSC is waiting for DCMI to endorse the Task Group’s assertions made about the work done and its usefulness for the Semantic Web—or at least this is the way it seems from the point of view of the co-chairs. It’s as if both groups are standing opposite one another in a middle school gym, the ‘boys’ and the ‘girls’ waiting for someone to move towards the middle. Nobody seems to want to make that first move, and though in the case of the TG, each conversation with representatives of both parties seems to be positive, resolution of the concrete issues moves at a frustratingly glacial pace.

But as I spoke about this work to various people, continuing to think about the ongoing conversation about what the role of DCMI should be, particularly in the context of Mike’s keynote, it struck me that the DCMI/RDA Task Group was in some sense a model for what DCMI could do to fulfill the role Mike saw for us in the world. In essence, the TG came into being because Don Chatham at ALA Publishing took the initiative to bring DCMI and the JSC together, where the DCMI message was “How can we help?” The rest is history, but we seem not to have learned from that how powerful that simple question is, and where it could lead. DC-2010 brought a number of new communities to the conference, representing a variety of groups interested in moving into the wider web world of information, but lacking in-house knowledge and skills necessary to make progress. Helping them move forward requires much more than attracting them to the conference and talking to them in the hall or after tutorials. We need to offer more concrete help, like we did for the JSC, and move the knowledge and experience that the DCMI Community has assembled into the broader world of information.

By Diane Hillmann, October 23, 2010, 11:26 am (UTC-5)

This morning’s highlight was Stu Weibel’s opening keynote address to the assembled conference attendees (yesterday included primarily workshop sessions and tutorials). Stu was asked to talk about DC’s past and future, and he gave many of us food for thought.

The first thing he did was typical Stu: he took a photo of the assembled group. I hope those will show up on his blog sometime soon (if so I’ll link to them). Will they look that much different from those he took 15 years ago, when Dublin Core began, or those taken at various points along the way? As is also typical, he asked how many people were returning participants in DC, and how many were brand new to the DC conference. To the surprise of some of us, roughly one third of the group were new participants, a nicely healthy proportion of new people.

Stu asked some interesting questions, and gave DCMI some letter grades for performance in a number of areas. His first question, “Why didn’t we just stop after the 15 elements?”, suggests the possibility that nothing done since then (the mid-nineties) was worth the effort. He points out that a number of the assumptions made at that time have since been repudiated by experience: the Web is more than just a collection of document-like objects that can be described in much the same way that we’ve traditionally described library materials. He got a laugh when he reminded us (including the three of us who were actually there in March 1995 when DC was born) that we thought we could solve the syntax problem fairly easily, but of course our notion of syntax in those days was HTML.
Stu gave some personal assessments of the 15 years of DCMI in the form of grades:

For providing an international basis for the effort (A)
For including a diverse group of participants (B+)
For becoming a sustained and solvent organization (D)
Moving work from consensus to completion (C)
Establishing objectives and completing them (C-)
Documentation of decisions (B+) (I have to say I think he was somewhat overgenerous on this one, for reasons I’ll explain later)

Stu also pointed out the places DCMI got stuck along the way, among those were ‘tarpits’ (his word) of our own making—for example, data models (for which he gave DCMI an ‘F’). Those that were not our fault or not entirely of our making were things like the aforementioned syntax confusion (C+), which he believes stemmed from trying to do too much and getting overwhelmed (probably the rapid evolution of syntaxes had an effect, too). The ‘tarpits’ created by others included LOM (learning object metadata, promulgated primarily by IEEE), and INDECS (a now dead effort that was touted some years ago as a business model based approach).

Some other points Stu made which are hard to argue include the notion that cooperation with organizations with different business models is difficult, but such cooperation is critically important, for reasons around convergence of effort, identification of similar models and related technologies—all amplifying the network effect of what DCMI has done. Increasingly, cooperation is seen as an expensive value, particularly in terms of time and travel, and certainly DCMI has seen those issues having a big effect on conference planning and ability to build on past efforts. Stu also gave DCMI an ‘A’ for its standardization efforts, including the work to make DC a recognized standard via IETF, NISO, and ISO. He pointed out that these efforts were essential to allow DC to be adopted by government agencies and others that have requirements for such an endorsement.

On the Singapore Framework, Stu is dubious, in particular about the four levels of interoperability that include the Description Set Profile (DSP) and the Dublin Core Abstract Model (DCAM). He pointed out that lots of metadata is still used primarily at the lowest, human level, and is not yet useful to machines. This issue is particularly relevant given that the DCAM is currently under review, with opinions flying around (in the halls, on Twitter, etc.) in ways they didn’t even when that document was new and pretty much nobody understood it. Understanding is still an issue, to a great extent because specification and user documentation are not the same thing (something that the DCMI technical folks don’t seem to understand). Stu contends that the DCAM has failed, and feels that its authors still don’t agree about its motivations or implications. It will be interesting to see whether that assessment is widely held throughout the conference.

On linked data, Stu was clearly somewhat ambivalent, calling it “An aspirational target of great promise and unproven benefit”. Stu was around and in the fray when the RDF standard was still in diapers, and recalled for the audience how easy we all thought it would be then to bring its promise to fruition. At least a decade later, we’re still trying to do that.

In Stu’s opinion, the Web is the data model, and we shouldn’t deviate from it. He pointed out that, on the question of flexibility vs. constraints, we are drawn in by the Siren Call of Flexibility but would be better off with more constraints.

On the positive side, Stu points out that the linked data bulge has brought us a strong commitment to identifiers, some useful conventions about vocabularies and syntax, some tools to build ontologies and models, and some expectations of utility due to broad adoption (network benefits, in other words). But we still need to worry about data quality, usefulness, and bridging the boundaries between the existing semantic communities.

Stu is skeptical of linked data as the new grail, but still thinks we’re on an exciting threshold: we need metadata more than ever, but we’re drowning in it. He quoted Tom Baker: “Data that cannot speak for itself will be more vulnerable to becoming irrelevant”.

Stu’s last points covered DCMI as an ongoing experiment in social engineering. He cited Malcolm Gladwell’s article in the Oct. 4 New Yorker, where Gladwell asserts that social media are largely broadly disseminated, networked, weak-tie activities, with low barriers, low commitment, and low persistence. Gladwell contends that systemic change requires strong ties, hierarchical social structures, leadership organization, and face-to-face work. Stu believes that DCMI is a strong-tie phenomenon, and that its impact is amplified by this fact.

By Diane Hillmann, October 21, 2010, 4:20 pm (UTC-5)

Starting at the end of the month, ALA TechSource is sponsoring a new webinar series about RDA, this one not entirely about the guidelines. It’s called Using RDA: Moving into the Metadata Future and ‘stars’ Karen Coyle, Chris Oliver and moi (in that order), talking about the full potential that RDA represents for libraries.

Those of you who have seen or listened to a variety of ‘What is RDA And Why Should We Care’ presentations over the past few years may find this different and refreshing. Those of you who’ve paid attention to me and Karen (and our non-mainstream approach to RDA) over the years may still find it interesting, because we keep adding stuff to our usual talks as we struggle through the issues that continue to bedevil us. Do take a look at the announcement here and see if the series might be of use to you and your colleagues.

And certainly, if you’ve got some great ideas for future webinars, let me know!

By Diane Hillmann, October 5, 2010, 1:04 pm (UTC-5)

When I was teaching my RDA course this past Spring, with my virtual students so very present in my thoughts for that time period, I found myself trying to explain to them why I still go to Dublin Core conferences, after all these years. I am one of those people who was around at the birth of DC, back in the dark ages of 1995, and aside from the conferences added retrospectively to the series and the one in Seoul last year, I’ve been to every one. This year, I’m co-chair of the program, something I’ve not done before, and more involved in the internal workings of the conference than ever. I’m sure that I didn’t make much sense to those students, who likely ascribed my long tenure to habit, loyalty, or some other factor.

This year the conference is finally back in the US after many years’ absence, and the program is looking pretty good, if I do say so myself. As usual there’s a mix of tutorials, papers, and working meetings, and this year the conference is a bit shorter than usual, in an attempt to keep costs down in these hard financial times. It’s also being held in conjunction with ASIS&T, which meets in the same hotel right after DC-2010.

So why do I keep going? The other conferences I go to as a participant (as differentiated from the ones where I’m an invited speaker) are really down now to two: ALA Annual and ALA Midwinter. They’re big, sprawling, every-librarian-for-her/himself, with thousands and thousands of librarians taking over some hapless city, every restaurant in town, and just about every hotel. DC is a very different kind of thing—it’s small (usually no more than 200 people), generally not more than 50% librarians, if that, and the rest a mix of researchers, implementers and software folks. Quite a few people straddle more than one category, and the group provides an experience that I’ve never found anyplace else. The conference is very international in flavor, and the focus is metadata, and more metadata, unlike, say, JCDL or other conferences with a computer science or web focus, where metadata is a very small part of the program. It’s a place where the metadata geeks at the edges of other communities can feel at home, even when they feel marginalized in their own home communities. I remember one year when a guy I know who had never attended a DC conference before buttonholed me in a corridor to tell me this was the best metadata conference he’d ever attended. It still is.

I always learn new things at the DC conference, always meet at least one new and interesting person (and usually more than one), and there’s more good talk in the hallways and bars than can possibly be taken in. I’m an extrovert, so conferences jazz me up—I relish the intensity, the ongoing conversations from year to year, and the real sense of community, no matter the strong differences of opinion and approach.

I’ll be in Pittsburgh this year, and am hoping to blog a few times while I’m there, if I can manage to find time for it. I hope I’ll see a few of you there too.

By Diane Hillmann, October 5, 2010, 12:49 pm (UTC-5)

Ten days or so ago I took some time out to listen to a webcast by Jenn Riley, ‘RDF for Librarians’, which was well worth the effort. Jenn has been worth watching for a long time, and she’s done us all a service by putting out her slides and a bibliography as part of this effort. All these, plus a recording of the presentation, can be found at: www.dlib.indiana.edu/education/brownbags/ [I should note here that I’ve had trouble accessing the recording, and Jenn and her IT folks are trying to figure out why, but you may not have the same difficulty I had.]

The topic Jenn chose for this presentation is one that has challenged many of us who have been trying to explain standards developed outside of libraries, like RDF, to traditional librarians. Jenn keeps her eye on the prize throughout, building on what librarians know in order to bring them across the great divide to an understanding of RDF, without getting bogged down in the technical language of the standard itself. She manages to cover a great deal of territory in the presentation, including such difficult topics as blank nodes, graphs, literals, the differences between XML and RDF, and the differences in terminology between the library world and the RDF world. Anyone who is interested in this topic, or has attempted to teach it, should take a look at this presentation and Jenn’s slides.

I was especially appreciative of Jenn’s message about all of us being ‘part of the process’ of the library transition to these new standards—and the fact that she included our RDA work as an example. It’s sometimes difficult to feel recognized as ‘at the table’, representing the interests of libraries, while being largely ignored by the big kahunas of libraryland. Jenn reminds us that each of us has a responsibility to lead, and not to wait around for the usual parties to do that for us, and she doesn’t just talk the talk, she’s walking the walk.

Thanks Jenn, and I’m looking forward to seeing you at DC-2010 and talking with you about those thorny issues that still challenge us!

By Diane Hillmann, October 5, 2010, 12:42 pm (UTC-5)

Note: this is being posted simultaneously on two blogs: Metadata Matters and Coyle’s InFormation

“Why don’t libraries just use FOAF for their Person metadata? Why do they insist on creating their own?”

We don’t know how many times we have heard this on various lists. It often is not really posed as a question; in other words, it isn’t asking for an explanation of why libraries do not choose to use FOAF. It’s more rhetorical, along the lines of “Why can’t we all just get along?” But it is worthy of being asked as a real question, and of getting a real answer.

[Note first that the question of FOAF comes up not so much as we consider the current library standards, but in discussions of upcoming standards that will hopefully be based on the FR** family of standards (FRBR, FRAD, FRSAR).]

A comparison of FOAF Person and the library Person entity (either in MARC authority files, or RDA, or FRAD) shows that there is not one defined element (or “property” as it is called in Semantic Web-ese) that the two have in common. This is not a coincidence; the two vocabularies serve significantly different communities and purposes. This does not mean that they are irreconcilable; the question therefore becomes: What keeps them apart, and can that be overcome?

The key is in the nature of the two communities.

FOAF stands for ‘Friend of a Friend’, which is a clue to its context: the schema is primarily for use in social networking situations. Its focus is on people who are alive and online, and it includes online contact information like email addresses, web sites, work web sites, Facebook IDs, Skype IDs, etc. A person’s name in FOAF is not an identifier; FOAF instead presumes that the name plus one or more of the contact IDs is enough to distinguish most humans from one another.
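FOAF makes the “contact ID as distinguisher” idea explicit: its specification declares foaf:mbox an inverse functional property, so two descriptions that share a mailbox are taken to describe the same person. The sketch below, in plain Python with triples as tuples (no RDF library, and made-up example data), merges descriptions on that basis, a process FOAF tools often call “smushing”. It is an illustration under those assumptions, not the behavior of any particular FOAF tool.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
# foaf:mbox is declared inverse functional in the FOAF spec: resources
# sharing a mailbox value are treated as the same person.
IFPS = {FOAF + "mbox"}


def smush(triples):
    """Map each subject to a canonical representative, merging subjects
    that share a value for an inverse functional property."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller id (arbitrary but deterministic).
            keep, drop = sorted((ra, rb))
            parent[drop] = keep

    first_seen = {}
    for s, p, o in triples:
        find(s)  # register every subject
        if p in IFPS:
            key = (p, o)
            if key in first_seen:
                union(first_seen[key], s)
            else:
                first_seen[key] = s
    return {s: find(s) for s in list(parent)}


# Hypothetical data: _:a and _:b use different name forms but one mailbox;
# _:c has the same name as _:b but a different mailbox.
example = [
    ("_:a", FOAF + "name", "A. Example"),
    ("_:a", FOAF + "mbox", "mailto:alice@example.org"),
    ("_:b", FOAF + "name", "Alice Example"),
    ("_:b", FOAF + "mbox", "mailto:alice@example.org"),
    ("_:c", FOAF + "name", "Alice Example"),
    ("_:c", FOAF + "mbox", "mailto:other@example.org"),
]

if __name__ == "__main__":
    print(smush(example))
```

Here `_:a` and `_:b` collapse into one person despite different name forms, while `_:b` and `_:c` stay separate despite identical names. The name does no identifying work; the mailbox does.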

Library name data (which is a form of controlled vocabulary, called “name authority data” in library terms) is focused on creating a unique identifier that brings together the different forms of a name used in published materials under one form. Library users, therefore, can expect to find all of the works by or about a named person under a single entry regardless of the various forms of the name that exist in real data. Uniqueness of names is enforced by adding information to a non-unique name, usually the year of birth, but when that isn’t known (especially for persons of antiquity) titles or even areas of endeavor (“poet”) can be added.
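The library approach works the other way round: the name itself becomes the identifier, and it is made unique by adding qualifiers. The following sketch is a deliberate simplification, not the actual AACR2 or RDA rules. The `PersonRecord` structure and `build_headings` function are hypothetical. It adds a birth year when names collide, and falls back to a title or field of activity when no date is known.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PersonRecord:
    # Hypothetical, much-simplified record; real authority records carry far more.
    name: str
    birth_year: Optional[int] = None
    qualifier: Optional[str] = None  # e.g. a title or area of endeavor such as "poet"


def build_headings(records: List[PersonRecord]) -> List[str]:
    """Return one heading per record, qualifying only names that collide."""
    counts = Counter(r.name for r in records)
    headings = []
    for r in records:
        if counts[r.name] == 1:
            headings.append(r.name)  # already unique in this batch
        elif r.birth_year is not None:
            headings.append(f"{r.name}, {r.birth_year}-")  # preferred: birth year, open-ended date
        elif r.qualifier:
            headings.append(f"{r.name} ({r.qualifier})")  # fallback: title or field
        else:
            headings.append(r.name)  # still ambiguous; needs a cataloger's judgment
    return headings


if __name__ == "__main__":
    people = [
        PersonRecord("Smith, John", birth_year=1950),
        PersonRecord("Smith, John", qualifier="poet"),
        PersonRecord("Homer"),
    ]
    print(build_headings(people))
```

This sketch only checks for collisions within the batch it is given. A real authority file checks each new heading against everything already established, which is part of why the shared, centrally coordinated files matter so much.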

To accommodate both the FOAF (social) function and the libraries’ identification function, at the very least the libraries would need to define a subclass of FOAF Person, one that has a stricter definition and usage. However, for the library “Person” to be designated as more specific than FOAF:Person does not require that these two be in the same vocabulary. That is one of the important features of Semantic Web vocabularies: like any other resource, their classes and properties can be linked and related to any other resources on the Web.
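Because foaf:Person is defined as a class, one way to state that a library Person is “more specific than FOAF:Person” is a single rdfs:subClassOf triple published in the library’s own vocabulary. The sketch below assumes a hypothetical namespace (`http://example.org/libvocab#`) and a made-up authority URI. It applies two RDFS entailment rules (subclass transitivity and type inheritance) in plain Python to show that data typed only with the library class is also understood as foaf:Person, with nothing added to FOAF itself.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
LIB_PERSON = "http://example.org/libvocab#Person"  # hypothetical library class

# The library vocabulary, maintained separately, links to FOAF with one triple.
lib_vocab = {(LIB_PERSON, SUBCLASS_OF, FOAF_PERSON)}

# Library data uses only the library's own class (hypothetical identifier).
AUTHOR = "http://example.org/authority/n123"
lib_data = {(AUTHOR, RDF_TYPE, LIB_PERSON)}


def rdfs_type_closure(triples):
    """Apply RDFS rules rdfs11 (subClassOf is transitive) and rdfs9
    (instances of a subclass are instances of the superclass) to a fixed point."""
    graph = set(triples)
    while True:
        sub = {(s, o) for s, p, o in graph if p == SUBCLASS_OF}
        new = {(a, SUBCLASS_OF, d) for a, b in sub for c, d in sub if b == c}
        new |= {
            (x, RDF_TYPE, d)
            for x, p, c in graph
            if p == RDF_TYPE
            for c2, d in sub
            if c == c2
        }
        if new <= graph:
            return graph
        graph |= new


if __name__ == "__main__":
    inferred = rdfs_type_closure(lib_vocab | lib_data)
    print((AUTHOR, RDF_TYPE, FOAF_PERSON) in inferred)
```

The FOAF vocabulary is never edited here; the relationship lives entirely on the library side, which is what lets each community keep managing its own standard.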

Why not combine the library and FOAF properties into a single metadata vocabulary? The answer has little to do with technology, but instead relates to the functioning of communities. Metadata standards need to be developed by (and for) actual communities. The FOAF and library communities clearly have different needs, different goals, and are working with fundamentally different use cases. They also are significantly different as communities.

FOAF is being developed by an informal group of developers, and is quite recent in origin. The group is small: the FOAF development email list has about 350 members. Another 350 individuals are listed on the FOAF wiki pages as having a FOAF profile available on the Web. This is obviously not the full extent of FOAF usage, but these numbers reflect the recent development of this kind of metadata.

The library community has hundreds of years of investment in the creation of metadata (even though it was not called that when libraries began to create it). There are at least tens of thousands of libraries in the world, many of which have been in existence for centuries. Library data has its origins in early-19th-century book catalogs but has been created in a machine-readable format since the late 1960s. Library data is created following formal rules governed in part by international agreements, and there are many hundreds of millions of machine-readable bibliographic records in existence that were created based on these library cataloging principles.

Libraries have engaged in wide-spread data sharing for centuries, and with the global networking capabilities of today libraries are actually able to exchange and re-use data on a huge scale. Libraries do not each create metadata for the same book or item, but instead share the metadata created by one library in cooperative efforts oriented towards resource sharing and efficiency.

This sharing is built into the very core of library data management. The ability to use data created by others is supported by standards, and those standards form the basis for library systems. While most users see only the library catalog available to the public, that is only one function of a system that supports purchasing, fund accounting, inventory control, circulation and patron management, and collection analysis. In the Western world these systems are not created and maintained by libraries but by a small number of specialized commercial vendors whose products are built specifically for library customers using agreed-upon library standards. Thus the very same system can be sold to hundreds or thousands of libraries, creating a viable market base for system development.

A number of the 70,000 libraries contributing to OCLC are using a single standard, MARC21, and others are following international standards such as ISBD, which governs standardized bibliographic description. The development of these standards is based on a large-scale community process with international participation. It is not a perfect process by any means, and it clearly must be updated to meet modern needs and the new technologies that have changed the way we work, but the degree of data sharing libraries depend on requires that a formal process be in place to support the standards of this community.

Sharing of data on a large scale is necessitated by the economic reality of the library sector. Libraries face steadily shrinking budgets while coping with an upswing in demand for their services. Realistically, this means that changes to library data must be carefully coordinated in order to minimize disruption to the complex network of data sharing that makes cost-effective management of library services possible. Libraries may appear to be mistrustful of change agents, and in some cases they certainly are, but there is a real need to minimize risk for the community as a whole in order to assure the health of these often financially fragile institutions.

So we come back to the question of libraries and FOAF. In the final analysis, we’re not at all sure that there’s much gain in trying to combine these two approaches, given the differences in their communities and functions. It could be like trying to combine oil and water, requiring compromises that in the end would be less than satisfactory for both communities. One could argue that the difference between the vocabularies and their contexts is a positive, allowing more than one view of the Person entity. With two separately maintained metadata vocabularies, anyone creating metadata can choose from either as needed without sacrificing precision. One can also imagine other views that will arise, such as Persons in medical data or financial data, which would each carry data elements that are in neither FOAF nor library data, from blood type to bank balance. The important thing is to make sure that these vocabularies are properly described and related to each other where possible. That way, each community can manage its own process based on its needs for standards integration, but data can be shared where appropriate.

We could begin with a more detailed discussion between the FOAF and the library communities about their metadata needs. With hundreds of years of experience in representing names in library catalogs, we feel confident that the library community’s knowledge could contribute in general to the use of personal names in the Semantic Web.

By Diane Hillmann, September 10, 2010, 6:20 pm (UTC-5)