A decade ago, when the Open Metadata Registry (OMR) was just being developed as the NSDL Registry, the vocabulary world was a very different place than it is today. At that point we were tightly focused on SKOS (not fully cooked at that point, but Jon was on the WG developing it, so we felt pretty secure diving in).

But we were thinking about versioning in the Open World of RDF even then. The NSDL Registry kept careful track of all changes to a vocabulary (who, what, when), and the only way to get data in was through the user interface. We ran an early experiment in making versions based on dynamic, timestamp-based snapshots (we called them ‘time slices’; Git calls them ‘commit snapshots’) available for value vocabularies, but this failed to gain any traction. That seemed to be partly a matter of timing (it was a decade ago, after all), and partly because, while the approach attempted to solve an Open World problem with versioned URIs, it created a new set of problems for Closed World experimenters. Ultimately, we left the versions issue to sit and stew for a bit (6 years!).

All that started to change in 2008, when we began working with RDA and needed to move past value vocabularies into properties and classes, and beyond that into issues around uploading data into the OMR. More recently, Git and GitHub have taken off, providing a way for us to make some important jumps in functionality that have culminated in the OMR/GitHub-based RDA Registry. Sounds easy and intuitive now, but it sure wasn’t at the time, and what most people don’t know is that the OMR is still where RDA/RDF data originates — it wasn’t supplanted by Git/GitHub, but is chugging along in the background. The OMR’s RDF CMS is still visible and usable by all, but folks managing larger vocabularies now have more options.

One important aspect of the use of Git and GitHub was the ability to rethink versioning.

Just about a year ago our paper on this topic (Versioning Vocabularies in a Linked Data World, by Diane Hillmann, Gordon Dunsire and Jon Phipps) was presented to the IFLA Satellite meeting in Paris. We used as our model the way software on our various devices and systems is updated–more and more these changes happen without much (if any) interaction with us.

In the world of vocabularies defining the properties and values in linked data, most updating is still very manual (if done at all), and the important information about what has changed and when is often hidden behind web pages or downloadable files that provide no machine-understandable connections identifying changes. And just solving the change management issue does little to solve the inevitable ‘vocabulary rot’ that can make published ‘linked data’ less and less meaningful, accurate, and useful over time.
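To make ‘machine-understandable’ concrete: at minimum, a vocabulary publisher can attach a modification date and a change note to the changed term itself, so that consuming systems can detect and evaluate changes without scraping web pages. Here’s a minimal sketch using Python’s rdflib (the vocabulary namespace and term are hypothetical):

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, SKOS, XSD

# Hypothetical vocabulary namespace, for illustration only.
EX = Namespace("http://example.org/vocab/")

g = Graph()
concept = EX.term001

# The term itself, plus machine-readable 'what changed and when',
# published as data rather than buried in a web page.
g.add((concept, SKOS.prefLabel, Literal("Serial", lang="en")))
g.add((concept, DCTERMS.modified, Literal("2015-09-01", datatype=XSD.date)))
g.add((concept, SKOS.changeNote,
       Literal("Label changed from 'Serials'; definition broadened.",
               lang="en")))

print(g.serialize(format="turtle"))
```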

Building stable change management practices is a critical missing piece of the linked data publishing puzzle. The problem will only grow as language versions and inter-vocabulary mappings start to show up as well — and it won’t be too long before that happens.

Please take a look at the paper and join in the conversation!

By Diane Hillmann, September 20, 2015, 6:41 pm (UTC-5)

Most of us in the library and cultural heritage communities interested in metadata are well aware of Tim Berners-Lee’s five star ratings for linked open data (in fact, some of us actually have the mug).

The five star rating for LOD, intended to encourage us to follow five basic rules for linked data, is useful, but as we’ve discussed it over the years, a basic question arises: what good is linked data without (property) vocabularies? Vocabulary manager types like me and my peeps are always thinking like this, and recently we came across solid evidence that we are not alone in the universe.

Check out “Five Stars of Linked Data Vocabulary Use”, published last year in the Semantic Web Journal. The five authors posit that TBL’s five star linked data is just the precondition to what we really need: vocabularies. They point out that the original five star rating says nothing about vocabularies, but that Linked Data without vocabularies is not useful at all:

“Just converting a CSV file to a set of RDF triples and linking them to another set of triples does not necessarily make the data more (re)usable to humans or machines.”

Needless to say, we share this viewpoint!

I’m not going to steal their thunder and list all five star categories here–you really should read the article (it’s short)–but I will note that the lowest level is a zero star rating that covers LD with no vocabularies. The five star rating is reserved for vocabularies that are linked to other vocabularies, which is pretty cool, and not easy for the original publisher to accomplish as a soloist.

These five star ratings are a terrific start to good practices documentation for vocabularies used in LOD, which we’ve had in our minds for some time. Stay tuned.

By Diane Hillmann, August 7, 2015, 1:50 pm (UTC-5)

Over the past weekend I participated in a Twitter conversation on the topic of meaning, data, transformation, and packaging. The conversation is too long to repost here, but searching for @metadata_maven over July 11–12 should pick most of it up. Aside from my usual frustration with the message limitations of Twitter, there seemed to be a lot of confusion about what exactly we mean by ‘meaning’ and how it gets expressed in data. I had a Skype conversation with @jonphipps about it, and thought I could reproduce that here, in a way that could add to the original conversation, perhaps clarifying a few things. [Probably good to read the Twitter conversation ahead of reading the rest of this.]

Jon Phipps: I think the problem that the people in that conversation are trying to address is that MARC has done triple duty: as a local and global serialization (format) for storage, supporting indexing and display; as a global data interchange format; and as a focal point for creating agreement about the rules everyone is expected to follow to populate the data (AACR2, RDA). If you walk away from that, even if you don’t kill it, nothing else is going to be able to serve that particular set of functions. But that’s the way everyone chooses to discuss BibFrame, or schema.org, or any other ‘MARC replacement’.

Diane Hillmann: Yeah, but how does ‘meaning’ merely expressed on a wiki page help in any way? Isn’t the idea to have meaning expressed with the data itself?

Jon Phipps: It depends on whether you see RDF as a meaning transport mechanism or a data transport mechanism. That’s the difference between semantic data and linked data.

Diane Hillmann: It’s both, don’t you think?

Jon Phipps: Semantic data is the smart subset of linked data.

Diane Hillmann: Nice tagline :)

Jon Phipps: Zepheira, and now DC, seem to be increasingly looking at RDF as merely linked data. I should say a transport mechanism for ‘linked’ data.

Diane Hillmann: It’s easier that way.

Jon Phipps: Exactly. Basically what they’re saying is that meaning is up to the receiver’s system to determine. A dc:title of ‘Mr.’ is fine in that world–it even validates according to the ‘new’ AP thinking. It’s all easier for the data producers if they don’t have to care about vocabularies. But the value of RDF is that it’s brilliantly designed to transport knowledge, not just data. RDF data is intended to live in a world where any Thing can be described by any Thing, and all of those descriptions can be aggregated over time to form a more complete description of the Thing Being Described. Knowledge transfer really benefits from Semantic Web concepts like inferences and entailments and even truthiness (in addition to just validation). If you discount or even reject those concepts in a linked data world, then you might as well ship your data around as CSV or even SQL files and be done with it.
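Jon’s dc:title example is easy to demonstrate. In the sketch below (Python with rdflib; the person URI is made up), the first triple is perfectly valid RDF and passes any syntactic check, even though dc:title names the title of a work, not a term of address; foaf:title, by contrast, is defined for honorifics. Only knowledge of the vocabularies, not the linked data plumbing, distinguishes the two:

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC, FOAF

g = Graph()
person = URIRef("http://example.org/person/42")  # hypothetical URI

# Syntactically valid, semantically wrong: dc:title is the title of a work.
g.add((person, DC.title, Literal("Mr.")))

# What was presumably meant: foaf:title is an honorific prefix.
g.add((person, FOAF.title, Literal("Mr.")))

# Both triples parse, validate, and serialize without complaint.
print(g.serialize(format="turtle"))
```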

One of the things about MARC is that it’s incredibly semantically rich (marc21rdf.info) and has also been brilliantly designed by a lot of people over a lot of years to convey an equally rich body of bibliographic knowledge. But throwing away even a small portion of that knowledge in pursuit of a far dumber linked data holy grail is a lot like saying that since most people only use a relatively limited number of words (especially when they’re texting) we have no need for a 50,000 word, or even a 5,000 word, dictionary.

MARC makes knowledge transfer look relatively easy because the knowledge is embedded in a vocabulary every cataloger learns and speaks fairly fluently. It looks like it’s just a (truly limiting) data format, so it’s easy to think that replacing it is just a matter of coming up with a fresh new format, like RDF. But it’s going to be a lot harder than that, which is tacitly acknowledged by the many-faceted effort to permanently dumb down bibliographic metadata, and it’s one of the reasons why I think bibframe.org, bibfra.me, and schema.org might end up being very destructive, given the way they’re being promoted (be sure to Park Your MARC somewhere).

[That’s why we’re so focused on the RDA data model (which can actually be semantically richer than MARC), why we helped create marc21rdf.info, and why we’re working at building out our RDF vocabulary management services.]

Diane Hillmann: This would be a great conversation to record for a podcast ;)

Jon Phipps: I’m not saying proper vocabulary management is easy. Look at us, for instance: we haven’t bothered to publish the OMR vocabs, and only one person has noticed (so far). But they’re in active use in every OMR-generated vocab.

The point I was making was that we were no better, as publishers of theoretically semantic metadata, at making sure the data was ‘meaningful’ by making sure that the vocabs resolved, had definitions, etc.

[P.S. We’re now working on publishing our registry vocabularies.]

By Diane Hillmann, July 16, 2015, 9:35 pm (UTC-5)

In the old days, when I was on MARBI as the liaison for AALL, I used to write a fairly detailed report, and afterwards wrote it up for my Cornell colleagues. The gist of those reports was to describe what happened and to flag any implications of the decisions worth considering. I don’t propose to do that here, but it does feel as if I’m acting in a familiar ‘reporting’ mode.

In an early Saturday presentation sponsored by the Linked Library Data IG, we heard about BibFrame and VIVO. I was very interested to see how VIVO has grown (having seen it as an infant), but was puzzled by the suggestion that it or FOAF could substitute for the functionality embedded in authority records. For one thing, authority records are about disambiguating names, not describing people–much as some believe that’s where authority control should be going. Even when we stop using text strings as identifiers, we’ll still need that function, and we should think carefully about whether adding other functions makes good sense.

Later on Saturday, at the Cataloging Norms IG meeting, Nancy Fallgren spoke on the NLM collaboration with Zepheira, GW, and others on BibFrame Lite. They’re now testing the Kuali OLE cataloging module for use with BF Lite, which will include a triple store. An important quote from Nancy: “Legacy data should not drive development.” So true, but neither should we be starting over, or discarding data, just to simplify data creation–that would mean losing the ability to respond to the more complex needs in cataloging, which aren’t going away (a point usefully demonstrated in the recent Jane-athons).

I was the last speaker on that program, and spoke on the topic of “What Can We Do About Our Legacy Data?” I was primarily asking questions and discussing options, not providing answers. The one thing I am adamant about is that nobody should be throwing away their MARC records. I even came up with a simple rule: “Park the MARC”. After all, storage is cheap, and nobody really knows how the current situation will settle out. Data is easy to dumb down, but not so easy to smarten up, and there may be do-overs in store for some down the road, after the experimentation is done and the tradeoffs clearer.

I also attended the BibFrame Update, and noted that there’s still no open discussion about the ‘classic’ (as in ‘Classic Coke’) BibFrame version used by LC, and the ‘new’ (as in ‘New Coke’) BibFrame Lite version being developed by Zepheira, which is apparently the vocabulary they’re using in their projects and training. It seems like it could be a useful discussion, but somebody’s got to start it. It’s not gonna be me.

The most interesting part of that update, from my point of view, was hearing Sally McCallum talk about the testing of BibFrame by LC’s catalogers. The tool they’re planning to use (in development, I believe) will use RDA labels and include rule numbers from the RDA Toolkit. Now, there’s a test I really want to hear about at Midwinter! But of course all of that RDA ‘testing’ they insisted on several years ago, to determine whether the RDA rules could be applied to MARC21, doesn’t (can’t) apply to BibFrame Classic. So … will there be a new round of much-publicized and eagerly anticipated shared institutional testing of this new tool and its assumptions? Just askin’.

By Diane Hillmann, July 10, 2015, 10:10 am (UTC-5)

The RDA Development Team started talking about developing training for the ‘new’ RDA, with a focus on the vocabularies, in the fall of 2014. We had some notion of what we didn’t want to do: we didn’t want yet another ‘sage on the stage’ event, we wanted to re-purpose the ‘hackathon’ model from a software focus to data creation (including a major hands-on aspect), and we wanted to demonstrate what RDA looked like (and could do) in a native RDA environment, without reference to MARC.

This was a tall order. Using RIMMF for the data creation was a no-brainer: the developers had been using the RDA Registry to feed new vocabulary elements into their software (effectively becoming the RDA Registry’s first client), and were fully committed to FRBR. Deborah Fritz had been training librarians and others on RIMMF for years, gathering feedback and building enthusiasm. It was Deborah who came up with the Jane-athon idea, and the RDA Development group took it and ran with it. Using the Jane Austen theme was a brilliant part of Deborah’s idea. Everybody knows about JA, and the number of spin-offs, rip-offs, and re-tellings of the novels (in many media formats) made her work a natural for examining why RDA and FRBR make sense.

One goal stated everywhere in the marketing materials for our first Jane outing was that we wanted people to have fun. All of us have been part of the audience and on the dais for many information sessions, for RDA and other issues, and neither position has ever been much fun, useful as the sessions might have been. The same goes for webinars, which, as they’ve developed in library-land, tend to be dry, boring, and completely bereft of human interaction. And there was a lot of fun at that first Jane-athon–I venture to say that 90% of the folks in the room left with smiles and thanks. We got an amazing response to our evaluation survey, and most of the responses were expansive, positive, and clearly designed to help the organizers do better the next time. The various folks from ALA Publishing who stood at the back and watched the fun were absolutely amazed at the noise, the laughter, and the collaboration in evidence.

No small part of the success of Jane-athon 1 rested with the team leaders at each table, and the coaches going from table to table helping out with puzzling issues, ensuring that participants were able to create data using RIMMF that could be aggregated for examination later in the day.

From the beginning we thought of Jane 1 as the first of many. In the first flush of success, as participants signed up and enthusiasm built, we talked publicly about making it possible to do local Jane-athons. But we realized that our small group would have difficulty bringing smaller events, with less expertise on site, up to the standard we set at Jane-athon 1. We needed to think through local expansion, and how to ensure that local participants get the same (or similar) value from the experience, before responding to requests.

As a step in that direction, CILIP in the UK is planning an Ag-athon on May 22, 2015. It will add much to the collective experience, as well as to the data store that began with the first Jane-athon–a store that will become an increasingly important factor as we work through the issues of sharing data.

The collection and storage of the Jane-athon data was envisioned prior to the first event, and the R-Balls site was designed as a place to store and share RIMMF-based information. Though a valuable step towards shareable RDA data, rballs have their limits. The data itself can be curated by human experts or made available warts and all, depending on the needs of the user of the data. For the longer term, RIMMF can output RDF statements based on the rball info, and a triple store is in development for experimentation and exploration. There are plans to improve the visualization of this data and demonstrate its use at Jane-athon 2 in San Francisco, which will include more about RDA and linked data, as well as what the created data can be used for–in particular, new and improved services.

So, what are the implications of the first Jane-athon’s success for libraries interested in linked data? One of the biggest misunderstandings floating around libraryland in linked data conversations is that it’s necessary to make one and only one choice of format and eschew all others (kind of like saying that everyone has to speak English to participate in LOD). This is not just incorrect; it’s also dangerous. In the MARC era there was truly no choice for libraries–to participate in record sharing they had to use MARC. But the technology has changed, and rapidly evolving semantic mapping strategies [see: dcpapers.dublincore.org/pubs/article/view/3622] will enable libraries to use the most appropriate schemas and tools for creating data to be used in their local context, and others for distributing that data to partners, collaborators, or the larger world.
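As a sketch of how such a mapping strategy can work in practice (Python with rdflib; the local schema namespace is invented for illustration): describe resources with a rich local property, declare it a subproperty of a coarser Dublin Core term, and materialize the coarser statements only for partners who need the simpler view.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS, DCTERMS

LOCAL = Namespace("http://example.org/schema/")  # invented local schema

g = Graph()
# Rich statement for local use, plus a mapping to the broader property.
g.add((LOCAL.titleProper, RDFS.subPropertyOf, DCTERMS.title))
book = URIRef("http://example.org/item/1")
g.add((book, LOCAL.titleProper, Literal("Sense and Sensibility")))

# Materialize the simpler view for distribution: follow each
# subPropertyOf mapping and emit the broader statement as well.
distribution = Graph()
for prop, broader in g.subject_objects(RDFS.subPropertyOf):
    for s, o in g.subject_objects(prop):
        distribution.add((s, broader, o))

print(distribution.serialize(format="turtle"))
```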

Another widely circulated meme is that RDA/FRBR is ‘too complicated’ for what libraries need; we’re encouraged to ‘simplify, simplify’ and assured that we’ll still be able to do what we need. Hmm. Simplification is an attractive idea, until one remembers that the environment we work in–with evolving carriers, versions, and creative ideas for marketing materials to libraries–is getting more complex than ever. Without the specificity to describe what we have (or have access to), we push the problem out to our users to figure out on their own. Libraries have always tried to be smarter than that, and that requires “smart”, not “dumb”, metadata.

Of course, alongside the ‘too complicated’ argument lies the notion that a) we’re not smart enough to figure out how to do RDA and FRBR right, and b) complex means more expensive. I refuse to give space to a), but b) is an important consideration. I urge you to take a look at the Jane-athon data and consider the fact that Jane Austen wrote very few novels, but they’ve been re-published in various editions, versions, and commentaries for almost two centuries. Once you add the ‘based on’ and ‘inspired by’ works, plus the enormous trail created by those trying to use Jane’s popularity to sell stuff (“Sense and Sensibility and Sea Monsters” is a favorite of mine), you can see the problem. Think of a pyramid with a very expansive base and a very sharp point, and consider that in RDA the works at the point–the ones everything at the bottom wants to link to–don’t need their descriptions repeated every time. Nor do we need to keep adding notes to descriptions based on the outdated assumption that the only consumer of information about the relationship between “Sense and Sensibility and Sea Monsters” and Jane’s “Sense and Sensibility” is a human being who looks far enough into the description to read the note.

One of the big revelations for most Jane-athon participants was to see how well RIMMF translated legacy MARC records into RDA, with links between the WEM levels and out to the named agents in the record. It’s very slick and, most importantly, not lossy. Consider that RIMMF also outputs in both MARC and RDF, and you see something of a missing link (if not the Golden Gate Bridge :-).

Not to say there aren’t issues to be considered with RDA, as with the other options. There certainly are, and they’ll be discussed at the Jane-In in San Francisco as well as at the RDA Forum on the following day, which will focus on current RDA upgrades and the future of RDA and cataloging. (More detailed information on the Forum will be available shortly.)

Don’t miss the fun, take a look at the details and then go ahead and register. And catalogers, try your best to entice your developers to come too. We’ll set up a table for them, and you’ll improve the conversation level at home considerably!

By Diane Hillmann, May 18, 2015, 10:13 am (UTC-5)

I’ve been back from Chicago for just over a week now, but still reflecting on a very successful Jane-athon pre-conference the Friday before Midwinter. And the good news is that our participant survey responses agree with the “successful” part, plus contain a lot of food for thought going forward. More about that later …

There was a lot of buzz in the Jane-athon room that day, primarily from the enthusiastic participants, working together at tables, definitely having the fun we promised. Afterwards, the buzz came from those who wished they’d been there (many on Twitter #Janeathon) and others who wanted us to promise to do it again. Rest assured–we’re planning on another one in San Francisco at ALA Annual, but it will probably be somewhat different, because by then we’ll have a better support infrastructure and will be able to be more concrete about the question of ‘what do you do with the data once you have it?’ If you’re particularly interested in that question, keep an eye on the rballs.info site, where new resources and improvements will be announced.

Rballs? What the heck are those? Originally they were meant to be ‘RIMMF-balls’, but then we started talking about ‘resource-balls’, and other such wanderings. The ‘ball’ part was suggested by ‘tar-balls’ and ‘mudballs’ (mudball was a term of derision in the old MARBI days, but Jon and I started using it more generally when we were working on aggregated records in NSDL).

So, how did we come up with such a crazy idea as a Jane-athon anyway? The idea came from Deborah Fritz, who’d been teaching about RDA for some time, plus working with her husband Richard on the RIMMF (RDA In Many Metadata Formats) tool, which is designed to allow creation of RDA data and export to RDF. The tool was upgraded to version 3 for the Jane-athon, and Deborah added some tutorials so that Jane-athon participants could get some practice with RIMMF beforehand (she also did online sessions for team leaders and coaches).

Deborah and I had discussed many times the frustration we shared with the ‘sage on the stage’ model of training, which left attendees to such events unhappy with the limitations of that model. They wanted something concrete–they usually said–something they could get their teeth into. Something that would help them visualize RDA out of the context of MARC. The Jane-athon idea promised to do just that.

I had done a prototype session of the Jane-athon with some librarians from the University of Hawaii (Nancy Sack did a great job organizing everything, even though a dodgy plane made me a day late to the party!). We got some very useful evaluations from that group, and those contributed to the success of the official Chicago debut.

So a crazy idea, bolstered by a lot of work and a whole lot of organizational effort, actually happened, and was even better than we’d dared to hope. There was a certain chaos on the day, which most people accepted with equanimity, and an awful lot of learning of the best kind. The event couldn’t have happened without Deborah and Richard Fritz, Gordon Dunsire, and Jon Phipps, each of whom had a part to play. Jamie Hennelly from ALA Publishing was instrumental in making the event happen, despite his reservations about herding the organizer cats.

And, as the cherry on top: after the five organizers finished their celebratory dinner later that evening, we were all out on the sidewalk looking for cabs. A long black limousine pulled up, and the driver asked us if we wanted a ride. Needless to say, we did, and soon pulled up in style in front of the Hyatt Regency on Wacker. Sadly, there was no one we knew at the front of the hotel, but many looked askance at the somewhat scruffy mob who piled out of the limo, no doubt wondering who the heck we were.

What’s up next? We think we’re on the path of a new data sharing paradigm, and we’ll run with that for the next few months, and maybe riff on that in San Francisco. Stay tuned! And do download a copy of RIMMF and play–there are rballs to look at and use for your purposes.

P.S. A report of the evaluation survey will be on RDA-L sometime next week.

By Diane Hillmann, February 14, 2015, 2:43 pm (UTC-5)

The planning for the Midwinter Jane-athon pre-conference has been taking up a lot of my attention lately. It’s a really cool idea (credit to Deborah Fritz) to address the desire we’ve been hearing for some time for a participatory, hands-on session on RDA. And let’s be clear: we’re not talking about the RDA instructions–this is about the RDA data model, vocabularies, and RDA’s availability for linked data. We’ll be using RIMMF (RDA in Many Metadata Formats) as our visualization and data creation tool, setting up small teams with leaders who’ve been prepared to support them and a wandering phalanx of coaches to give help on the fly.

Part of the planning has to do with building a set of RIMMF ‘records’ to start with, to which participants can add their own resources as they explore the rich relationships in RDA. We’re calling these ‘r-balls’ (a cross between RIMMF and tarballs). These zipped-up r-balls will be available for others to use for their own homegrown sessions, along with instructions for using RIMMF and setting up a Jane-athon (or other themed -athon), and also for contributing their own r-balls for the use of others. In case you’ve not picked it up, this is a radically different training model, and we’d like to make it possible for others to play, too.

That’s the plan for the morning. After lunch we’ll take a look at what we’ve done, and prise out the issues we’ve encountered, and others we know about. The hope is that the participants will walk out the door with both an understanding of what RDA is (more than the instructions) and how it fits into the emerging linked data world.

I recently returned from a trip to Honolulu, where I did a prototype Jane-athon workshop for the Hawaii Library Association. I have to admit that I didn’t give much thought to how difficult it would be to do solo, but I did have the presence of mind to give the organizer of the workshop some preliminary setup instructions (based on what we’ll be doing in Chicago) to ensure that there would be access to laptops with software and records pre-loaded, and a small cadre of folks who had been working with RIMMF to help out with data creation on the day.

The original plan included a day before the workshop with a general presentation on linked data and some smaller meetings with administrators and others in specialized areas. It’s a format I’ve used before and the smaller meetings after the presentation generally bring out questions that are unlikely to be asked in a larger group.

What I didn’t plan for was that I wouldn’t be able to get out of Ithaca on the appointed day (the day before the presentation) thanks not to bad weather, but instead to a non-functioning plane which couldn’t be repaired. So after a phone discussion with Hawaii, I tried again the next day, and everything went smoothly. On the receiving end there was lots of effort expended to make it all work in the time available, with some meetings dribbling into the next day. But we did it, thanks to organizer Nancy Sack’s prodigious skills and the flexibility of all concerned.

Nancy asked the Jane-athon participants to fill out an evaluation, and sent me the anonymized results. I really appreciated that the respondents added many useful (and frank) comments to the usual range of questions. Those comments in particular were very helpful to me, and were passed on to the other MW Jane-athon organizers. One of the goals of the workshop was to help participants visualize, using RIMMF, how familiar MARC records could be automatically mapped into the FRBR structure of RDA, and how that process might begin to address concerns about future workflow and reuse of MARC records. Another goal was to illustrate how RDA’s relationships enhanced the value of the data, particularly for users. For the most part, it looked as if most of the participants understood the goals of the workshop and felt they had gotten value from it.

But there were those who provided frank criticism of the workshop goals and organization (as well as the presenter, of course!). Some of these criticisms involved the limitations of the workshop: participants wanted more information on how they could put their new knowledge to work, right now. The clearest expression of this desire came in as follows:

“I sort of expected to be given the whole road map for how to take a set of data and use LOD to make it available to users via the web. In rereading the flyer I see that this was not something the presenter wanted to cover. But I think it was apparent in the afternoon discussion that we wanted more information in the big picture … I feel like I have an understanding of what LOD is, but I have no idea how to use it in a meaningful way.”

Aside from the time constraints–which everyone understood–there’s a problem inherent in the fact that very few active LOD projects have moved beyond publishing their data (a good thing, no doubt about it) to using the data published by others. So it wasn’t so much that I didn’t ‘want’ to present more about the ‘bigger picture’; there wasn’t really anything to say, aside from the fact that the answer to that question is still unclear (and I probably wasn’t all that clear about it either). If I had a ‘road map’ to talk about and point them to, I certainly would have shared it, but sadly I have nothing to share at this stage.

But I continue to believe that just as progress in this realm is iterative, it is hugely important that we not wait for the final answers before we talk about the issues. Our learning needs to be iterative too, moving along the path from the abstract to the concrete along with the technical developments. So for Midwinter, we’ll need to be crystal clear about what we’re doing (and why), as well as why there are blank areas in the road map.

Thanks again to the Hawaii participants, and especially Nancy Sack, for their efforts to make the workshop happen, and the questions and comments that will improve the Jane-athon in Chicago!

For additional information, including a link to register, look here. Although I haven’t seen the latest registration figures, we’re expecting to fill up, so don’t delay!

[these are the workshop slides]

[these are the general presentation slides]

By Diane Hillmann, December 19, 2014, 10:22 am (UTC-5)

I know many of you are puzzled by this event, so do take a look at a rundown of the plans on the RDA Toolkit Blog.

Not so surprisingly, we were inspired by the notion of a hackathon, but it had to be focused on something other than computer code and application building. All of us have heard conflicting opinions about whether RDA can be fully functional, whether FRBR works and will benefit users, or whether it’s just all too complicated. The big gap in addressing these questions has been the challenge in doing something hands-on instead of the usual sage-on-the-stage doling out large piles of handouts. There are still realities that need to be recognized, as we take a hands-on look at RDA and build some real RDA data.

First of these realities is that RDA has been in development for a hell of a long time, and the rules (the part that gets the most attention, and that some think really IS the whole of RDA) started out as AACR3. As one who’s been watching this space (from the outside and the inside) since the beginning, I can confirm that the notion of AACR3 is a historical artifact, with nothing to do with what RDA has become.

I’ve been ranting and railing for years (too many to count) that RDA must be more than rules. And it is–see the RDA Registry for evidence of that. This leads me to the second reality: all of us are learning as we go. The first iteration of the RDA Vocabularies, developed by the DCMI/RDA Task Group after a famous meeting in London in the spring of 2007, was never published. The published version, much improved, was released early in 2013 along with the new RDA Registry. The learning-by-doing was happening in a lot of other standards-focused groups as well: IFLA and W3C, for example. FRBR, an essential part of the RDA model, was evolving along with RDA, and that fact led to a couple of interesting compromises, still working themselves out.

I can promise you that the Jane-athon will reflect all of those realities, and in addition will build out the community of people familiar with the lessons yet to be learned. There won’t be any papering over of gaps, downplaying of issues, or anything like that. At the Jane-athon we will demonstrate that building real RDA records in the context of FRBR is not a future dream–it’s happening now. What you will see as a participant is the reality: the ability to work within a FRBR flow, to import MARC records and see the system map them into FRBR constructs, to create links with NAF information, and to view the results as a tree that highlights the relationships.

Perhaps most important, we want to have fun with this. There will be no quizzes, no grades, no transcripts. That’s why we chose to focus on two sets of materials with great potential to benefit from a FRBR-based approach. Early in the day you’ll walk through the business of creating cataloging for Blade Runner resources (original book by Philip K. Dick), translations, film, etc. After that we’ll turn the group loose on Jane Austen (with some made-ahead basic data). After the flurry of data creation, we’ll be looking at the results, highlighting issues that come up, and not incidentally, getting some feedback from the participants about the tools, processes, and the beta-Jane-athon in its entirety.

We know (and welcome the fact) that not everyone attending will be a cataloger, much less all that familiar with RDA. There will be a place and a role for everyone who wants to learn more, and to dig in and get their hands [virtually] dirty. There is no need to cram for this event, or to study the RDA rules or cataloging before you come. If you want to get a bit familiar with RIMMF before you come, by all means take a look at the site, download the software, and play. The only requirement is an open mind and some excitement about the possibilities (some trepidation is okay too).

Once you have registered for ALA Midwinter in Chicago, you can sign up for the Jane-athon. The Jane-athon is already available as a paid addition to the full registration for ALA Midwinter in Chicago.

Please feel free to use the comments portion of this post to ask questions, or use the RDA-L list to bring up questions and concerns.

We hope to see you there!

By Diane Hillmann, November 30, 2014, 2:59 pm (UTC-5)

Everyone is getting tired of the sage-on-the-stage style of preconferences, so when Deborah Fritz suggested a hackathon (thank you Deborah!) to the RDA Dev Team, we all climbed aboard and started thinking about what that kind of event might look like, particularly in the ALA Midwinter context. We all agreed: there had to be a significant hands-on aspect to really engage those folks who were eager to learn more about how the RDA data model could work in a linked data environment, and, of course, in their own home environment.

We’re calling it a Jane-athon, which should give you a clue about the model for the event: a hackathon, of course! The Jane Austen corpus is perfect for demonstrating the value of FRBR, and there’s no lack of interesting material to look at–media materials, series, spin-offs of every description–in addition to the well-known novels. So the Jane-athon will be partially about creating data, and partially about how that data fits into a larger environment. And did you know there is a Jane Austen bobblehead?

We think there will be a significant number of people who might be interested in attending, and we figured that getting the word out early would help prospective participants make their travel arrangements with attendance in mind. Sponsored by ALA Publishing, the Jane-athon will be on the Friday before the Midwinter conference (the traditional pre-conference day), and though we don’t yet have registration set up, we’ll make sure everyone knows when it’s available. If you think, as we do, that this event will be the hit of Midwinter, be sure to watch for that announcement, and register early! If the event is successful, you’ll be seeing others at subsequent ALA conferences.

So, what’s the plan and what will participants get out of it?

The first thing to know is that there will be tables and laptops to enable small groups to work together for the ‘making data’ portion of the event. We’ll be asking folks who have laptops to plan on bringing them to Chicago. We’ll be using the latest version of a new bibliographic metadata editor called RIMMF (“RDA In Many Metadata Formats”)–not yet publicly available, but soon; watch for it on the TMQ website. We encourage interested folks to download the current beta version and play with it–it’s a cool tool and really is a good one to learn about.

In the morning, we’ll form small cataloging groups and use RIMMF to do some FRBRish cataloging, starting from MARC21 and ending up with RDA records exported as RDF Linked Data. In the afternoon we’ll all take a look at what we’ve produced, share our successes and discoveries, and discuss the challenges we faced. In true hackathon tradition we’ll share our conclusions and recommendations with the rest of the library community on a special Jane-athon website set up to support this and subsequent Jane-athons.

Who should attend?

We believe that there will be a variety of people who could contribute important skills and ideas to this event. Catalogers, of course, but also every flavor of metadata people, vendors, and IT folks in libraries would be warmly welcomed. But wouldn’t tech services managers find it useful? Oh yes, they’d be welcomed enthusiastically, and I’m sure their participation in the discussion portion of the event in the afternoon will bring out issues of interest to all.

Keep in mind, this is not cataloging training, nor Toolkit training, by any stretch of the imagination. Neither will it be RIMMF training or have a focus on the RDA Registry, although all those tools are relevant to the discussion. For RIMMF, particularly, we will be looking at ways to ensure that there will be a cadre of folks who’ve had enough experience with it to make the hands-on portion of the day run smoothly. For that reason, we encourage as many as possible to play with it beforehand!

Our belief is that the small group work and the discussion will be best with a variety of experience informing the effort. We know that we can’t provide the answers to all the questions that will come up, but the issues that we know about (and that come up during the small group work) will be aired and discussed.

By Diane Hillmann, October 27, 2014, 1:57 pm (UTC-5)

In my post last week, I mentioned a paper that Gordon Dunsire, Jon Phipps, and I had written for the IFLA Satellite Meeting in Paris last month, “Linked Data in Libraries: Let’s make it happen!” (note the videos!). I wanted to talk about the paper and why we wrote it, but I’m not just going to summarize it–I wouldn’t want to spoil the paper for anyone!

The paper, “Versioning Vocabularies in a Linked Data World”, was written in part because we’d seen far too many examples of vocabulary management and distribution that paid little or no attention to the necessity to maintain vocabularies over time and to make them available (over and over again, of course) to the data providers using them. It goes without saying that the vocabularies were expected to change over time, but in too many cases, vocabulary owners distributed changes in document form, or as files with new data embedded but no indication of what had changed, or worse: nothing.

We have been thinking about this problem for a long time. Even the earliest instance of the NSDL Registry (precursor of the current Open Metadata Registry, or OMR, as we like to call it) incorporated a ‘history’ view of the data: basically the ‘who, what, when’ of every change made in every vocabulary. Later on, we added the ability to declare ‘versions’ of the vocabularies themselves, taking advantage of that granular history data, for those trying to manage the updating of their ‘product’ in a rational manner. Sadly, not very many of our users took advantage of that feature, and we’re not entirely sure why, but there it was. Jon has always been frustrated with our first passes at this problem, and after Gordon and I discussed it with others at DC-2013 last year, and I aired my rant about the lack of version control on id.loc.gov, it seemed time to think about the issue again.
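That ‘who, what, when’ bookkeeping is conceptually simple. Here’s a sketch of what one entry in such a change history might hold (the field names are hypothetical, not the OMR’s actual schema):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class VocabularyChange:
    """One entry in a vocabulary's change history: who, what, when."""
    who: str        # the user who made the change
    when: datetime  # when the change was made
    term: str       # URI of the affected term
    predicate: str  # URI of the affected property (label, definition, ...)
    old_value: str  # previous value ("" if newly added)
    new_value: str  # new value ("" if deleted)

# A declared 'version' is then just a label on a point in this history;
# replaying entries up to that point reconstructs a 'time slice'.
example = VocabularyChange(
    who="dhillmann",
    when=datetime(2013, 1, 15, 9, 30),
    term="http://example.org/vocab/term001",
    predicate="http://www.w3.org/2004/02/skos/core#prefLabel",
    old_value="Serials",
    new_value="Serial",
)
```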

At that point we were also planning our own big-time versioning event: the unpublished first version of the RDA Element Sets was about to make its re-debut in ‘published’ form, reorganized and with new URIs. Jon was also building the GitHub connection with the OMR that underlies the new RDA Registry site, operating in a more automated mode as planned. He and Gordon and I had been discussing a new approach for some time, based on the way software is versioned and distributed, which is well supported in Git and GitHub. So, as we drove back from ALA Midwinter in Philadelphia in January of last year, Jon and I blocked out the paper we’d agreed to do with Gordon on how we thought versioning should work in the semantic vocabulary world.

Consider: how do all of us computer nerds update our applications? Do we have to go to all sorts of websites (sometimes, but not always, prompted by an email) to determine which applications have changed and invoke an update? Well, sure, sometimes we do (particularly when they want more money!), but since the advent of the App Store and Google Play we can do our updates much more easily. For the most part those updates are ‘pushed’ to us: we decide whether we want to update, we are told in a general way what has changed, and we click … and it’s done.

This is the way updates should happen in the Semantic Web data world, which is increasingly dependent on element sets and value vocabularies to describe products of all kinds in order to provide access, drive sales or eyeballs, or support effective connections between resources. Now that we’re all reconciled to using URIs instead of text (even if our data hasn’t yet made that transition), shouldn’t we consider an important upside of that change: a simpler and more useful way to update our data?
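A concrete (if simplified) picture of that app-store model for vocabularies, again in Python with rdflib: dereference the vocabulary URI, read its declared version, and compare it to the local cached copy. The vocabulary URI here is hypothetical, and the pattern assumes the publisher declares something like owl:versionInfo–which many do not, which is rather the point of the paper:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

VOCAB_URI = "http://example.org/vocab/"  # hypothetical vocabulary
local_version = "2.1.0"                  # version of our cached copy

# Dereference the vocabulary and read its declared version.
g = Graph()
g.parse(VOCAB_URI, format="turtle")
published = g.value(URIRef(VOCAB_URI), OWL.versionInfo)

if published is not None and str(published) != local_version:
    print(f"Update available: {local_version} -> {published}")
    # A real client would now fetch the new release (or just the
    # changes) and decide, app-store style, whether to apply them.
else:
    print("Vocabulary is up to date.")
```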

So, I’ll quit there–go read the paper and let us know what you think. Don’t miss Gordon’s slides from Paris, available on his website. Note especially the last question on his final slide: “Is it time to get serious about linked data management?” We think it’s past time. After all, ‘management’ is our middle name.

Note: As of this week the video of Gordon’s presentation in Paris is now available.

By Diane Hillmann, September 22, 2014, 12:01 pm (UTC-5)