Metadata standards are a huge topic, and evaluation is a difficult task, one I’ve been involved in for quite a while. So I was pretty excited when I saw the link for “DRAFT Principles for Evaluating Metadata Standards”, but after reading it? Not so much. If we’re talking about “principles” in the sense of ‘stating-the-obvious-as-a-first-step’, well, okay—but I’m still not very excited. I do note that the earlier version link uses the title ‘draft checklist’, and I certainly think that’s a bit more real than ‘draft principles’ for this effort. But even taken as a draft, the text manages to use lots of terms without defining them—not a good thing in an environment where semantics is so important. Let’s start with a review of the document itself, then maybe I can suggest some alternative paths forward.

First off, I have a problem with the preamble: “These principles are intended for use by libraries, archives and museum (LAM) communities for the development, maintenance, governance, selection, use and assessment of metadata standards. They apply to metadata structures (field lists, property definitions, etc.), but can also be used with content standards and value vocabularies”. Those tasks (“development, maintenance, governance, selection, use and assessment”) are pretty all-encompassing, yet the connection between those tasks and the overall “evaluation” is unclear. And, of course, without definitions, it’s difficult to understand how ‘evaluation’ relates to ‘assessment’ in this context—are they the same thing?

Moving on to the second part, about what kinds of metadata standards might be evaluated, we have a very general term, ‘metadata structures’, with what look to be examples of such structures (field lists, property definitions, etc.). Some would argue (including me) that a field list is not a structure without a notion of connections between the fields; and although property definitions may be part of a ‘structure’ (as I understand it, at least), they are not a structure, per se. And what is meant by the term ‘content standards’, and how is that different from ‘metadata structures’? The concept behind ‘value vocabularies’ goes by many names and is not something that can be left undefined. I say this as an author/co-author of a lot of papers that use this term, and we always define it within the context of the paper for just that reason.

There are many more places in the text where fuzziness in terminology is a problem (maybe not a problem for a checklist, but certainly for principles). Some examples:

1. What is meant by ’network’? There are many different kinds, and if you mean to refer to the Internet, for goodness’ sake say so. ‘Things’ rather than ‘strings’ is good (there’s a small sketch of the difference after these numbered comments), but it will take a while to make it happen in legacy data, which we’ll be dealing with for some time, most likely forever. Prospectively created data is a bit easier, but still not a cakewalk — if the ‘network’ is the global Internet, then “leveraging ‘by-reference’ models” presents yet-to-be-solved problems of network latency, caching, provenance, security, persistence, and most importantly: stability. Metadata models for both properties and controlled values are an essential part of LAM systems, and simply saying that metadata is “most efficient when connected with the broader network” doesn’t necessarily make it so.

2. ‘Open’ can mean many things. Are we talking specific kinds of licenses, or the lack of a license? What kind of re-use are you talking about? Extension? Wholesale adoption with namespace substitution? How does semantic mapping fit into this? (In lieu of a definition, see the paper at (1) below)

3. This principle seems to imply that “metadata creation” is the sole province of human practitioners, and it seriously muddies the meaning of the word ‘creation’ by drawing a distinction between passive system-created metadata and human-created metadata. Metadata is metadata, and standards apply regardless. What do you mean by ‘benefit user communities’? Whose communities? What is meant by ‘value’ in this context? How would metadata practitioners ‘dictate the level of description provided based on the situation at hand’?

4. As an evaluative ‘principle’ this seems overly vague. How would you evaluate a metadata standard’s ability to ‘easily’ support ‘emerging’ research? What is meant by ‘exchange/access methods’ and what do they have to do with metadata standards for new kinds of research?

5. I agree totally with the sentence “Metadata standards are only as valuable and current as their communities of practice,” but the one following makes little sense to me. “ … metadata in LAM institutions have been very stable over the last 40 years …” Really? It could easily be argued that the reason for that perceived stability is the continual inability of implementers to “be a driving force for change” within a governance model that has at the same time been resistant to change. The existence of the DCMI usage board, MARBI, the various boards advising the RDA Steering Committee, all speak to the involvement of ‘implementers’. Yet there’s an implication in this ‘principle’ that stability is liable to no longer be the case and that implementers ‘driving’ will somehow make that inevitable lack of stability palatable. I would submit that stability of the standard should be the guiding principle rather than the democracy of its governance.

6. “Extensible, embeddable, and interoperable” sounds good, but each is more complex than this triumvirate seems. Interoperability in particular is something that we should all keep in mind, but although admirable, interoperability rarely succeeds in practice because of the practical incompatibility of different models. DC, MARC21, BibFrame, RDA, and Schema.org are examples of this — despite their ‘modularity’ they generally can’t simply be used as ‘modules’ because of differences in the thinking behind the model and their respective audiences.

I would also argue that ‘lite style implementations’ make sense only if ‘lite’ means a dumbed-down core that can be mapped to by more detailed metadata. But stressing the ‘lite implementations’ as a specified part of an overall standard gives too much power to the creator of the standard, rather than the creator of the data. Instead we should encourage the use of application profiles, so that the particular choices and usages of the creating entity are well documented, and others can use the data in full or in part according to their needs. I predict that lossy data transfer will be less acceptable in reality than it is in the abstract, and would suggest that dumb data is more expensive over the longer term (and certainly doesn’t support ‘new research methods’ at all). “Incorporation into local systems” really can only be accomplished by building local systems that adhere to their own local metadata model and are able to map that model in/out to more global models. Extensible and embeddable are very different from interoperable in that context.

7. The last section, after the inarguable first sentence, describes what the DCMI ‘dumb-down’ principle defined nearly twenty years ago, and that strategy still makes sense in a lot of situations. But ‘graceful degradation’ and ‘supporting new and unexpected uses’ requires smart data to start with. ‘Lite’ implementation choices (as in #6 above) preclude either of those options, IMO, and ‘adding value’ of any kind (much less by using ‘ontological inferencing’) is in no way easily achievable.
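To make the ‘things’ vs ‘strings’ point in #1 a bit more concrete, here’s a minimal sketch (in Python, using the rdflib library; the identifiers are invented for illustration) of the difference between a literal string value and a ‘by-reference’ URI that a consumer has to dereference, which is exactly where the latency, caching, provenance, and stability questions come in:

```python
# A minimal sketch of 'strings' vs 'things' using the rdflib library.
# The book and creator identifiers below are invented for illustration.
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import DCTERMS

g = Graph()
book = URIRef("http://example.org/book/1")

# 'String' style: the creator is just a text value; nothing more can be looked up.
g.add((book, DCTERMS.creator, Literal("Austen, Jane, 1775-1817")))

# 'Thing' style: the creator is a URI a consumer may dereference for more data,
# which is where the by-reference questions (latency, caching, provenance,
# persistence, stability) arise.
g.add((book, DCTERMS.creator, URIRef("http://example.org/agent/jane-austen")))

print(g.serialize(format="turtle"))
```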

I intend to be present at the session in Boston [9:00-10:00 Boston Conference and Exhibition Center, 107AB] and since I’ve asked most of my questions here I intend not to talk much. Let’s see how successful I can be at that!

It may well be that a document this short and generalized isn’t yet ready to be a useful tool for metadata practitioners (especially without definitions!). That doesn’t mean that the topics that it’s trying to address aren’t important, just that the comprehensive goals in the preamble are not yet being met in this document.

There are efforts going on in other arenas (the NISO Bibliography Roadmap work, for instance) that should have an important impact on many of these issues, which suggests that it might be wise for the Committee to pause and take another look around. Maybe a good glossary would be an important step?

(1) Dunsire, Gordon, et al. “A Reconsideration of Mapping in a Semantic World.” Paper presented at the International Conference on Dublin Core and Metadata Applications, The Hague, 2011. Available at: http://dcpapers.dublincore.org/pubs/article/view/3622/1848

By Diane Hillmann, December 14, 2015, 4:59 pm (UTC-5)

The Jane-athon series is alive, well, and expanding its original vision. I wrote about the first ‘official’ Jane-athon earlier this year, after the first event at Midwinter 2015.

Since then the excitement generated at the first one has spawned others:

  • the Ag-athon in the UK in May 2015, sponsored by CILIP
  • the Maurice Dance in New Zealand (October 16, 2015 at the National Library of New Zealand in Wellington, focused on Maurice Gee)
  • the Jane-in (at ALA San Francisco at Annual 2015)
  • the RLS-athon (November 9, 2015, Edinburgh, Scotland), following the JSC meeting there and focused on Robert Louis Stevenson
Like good librarians, we have an archive of the Jane-athon materials for use by anyone who wants to look at or use the presentations or the data created at the Jane-athons.

We’re still at it: the next Jane-athon in the series will be the Boston Thing-athon at Harvard University on January 7, 2016. Looking at the list of topics gives a good idea of how the Jane-athons are morphing toward a broader focus than that of a single creator, while still training folks to create data with RIMMF. The first three topics (which may change–watch this space) focus not on specific creators, but on moving forward on topics identified as being of interest to a broader community.

    * Strings vs things. A focus on replacing strings in metadata with URIs for things.
    * Institutional repositories, archives and scholarly communication. A focus on issues in relating and linking data in institutional repositories and archives with library catalogs.
    * Rare materials and RDA. A continuing discussion on the development of RDA and DCRM2 begun at the JSC meeting and the international seminar on RDA and rare materials held in November 2015.

    For beginners new to RDA and RIMMF but with an interest in creating data, we offer:
    * Digitization. A focus on how RDA relates metadata for digitized resources to the metadata for original resources, and how RIMMF can be used to improve the quality of MARC 21 records during digitization projects.
    * Undergraduate editions. A focus on issues of multiple editions that have little or no change in content vs. significant changes in content, and how RDA accommodates the different scenarios.

    Further on the horizon is a recently approved Jane-athon for the AALL conference in July 2016, focusing on Hugo Grotius (inevitably, a Hugo-athon, but there’s no link yet).

NOTE: The Thing-athon coming up at ALA Midwinter is being held on Thursday rather than the traditional Friday, to open the attendance to those who have other commitments on Friday. Another new wrinkle is the venue–an actual library away from the conference center! Whether you’re a cataloger or not-a-cataloger, there will be plenty of activities and discussions that should pique your interest. Do yourself a favor and register for a fun and informative day at the Thing-athon to begin your Midwinter experience!

    Instructions for registering (whether or not you plan to register for MW) can be found on the Toolkit Blog.

    By Diane Hillmann, December 7, 2015, 11:19 am (UTC-5)

    Those of you who pay attention to politics (no matter where you are) are very likely to be shaking your head over candidates, results or policy. It’s a never ending source of frustration and/or entertainment here in the U.S., and I’ve noticed that the commentators seem to be focusing in on issues of ideology and faith, particularly where it bumps up against politics. The visit of Pope Francis seemed to be taking everyone’s attention while he was here, but though this event has added some ‘green’ to the discussion, it hasn’t pushed much off the political plate.

Politics and faith bump up against each other in the metadata world, too. With traditionalists still thinking in MARC tags and AACR2, and technical types rolling their eyes at any mention of MARC and trying to push the conversation towards RDA, RDF, BibFrame, schema.org, etc., there are plenty of metadata politics available to flavor the discussion.

The good news for us is that the conflicts and differences we confront in the metadata world are much more amenable to useful solutions than the politics crowding our news feeds. I remember well the days when the choice of metadata schema was critical to projects and libraries. Unfortunately, we’re all still behaving as if the proliferation of ‘new’ schemas makes the whole business more complicated–that’s because we’re still thinking we need to choose one or another, ignoring the commonality at the core of the new metadata effort.

    But times have changed, and we don’t all need to use the same schema to be interoperable (just like we don’t all need to speak English or Esperanto to communicate). But what we do need to think about is what the needs of our organization are at all stages of the workflow: from creating, publishing, consuming, through integrating our metadata to make it useful in the various efforts in which we engage.

    One thing we do need to consider as we talk about creating new metadata is whether it will need to work with other data that already exists in our institution. If MARC is what we have, then one requirement may be to be able to maintain the level of richness we’ve built up in the past and still move that rich data forward with us. This suggests to me that RDA, which RIMMF has demonstrated can be losslessly mapped to and from MARC, might be the best choice for the creation of new metadata.

Back in the day, when Dublin Core was the shiny new thing, the notion of ‘dumb-down’ was hatched, and though not an elegantly named principle, it still works. It says that rich metadata can be mapped fairly easily into a less-rich schema (‘dumbed down’), but once transformed in a lossy way, it can’t easily be ‘smartened up’. But in a world of many publishers of linked data, and many consumers of that data, the notion of transforming rich metadata into any number of other schemas and letting the consumer choose what they want is fairly straightforward, and does not require firm knowledge (or correct assumptions) of what the consumers actually need. This is a strategy well tested with OAI-PMH, which established a floor of Simple Dublin Core but encouraged the provision of any number of other formats as well, including MARC.
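For anyone who hasn’t seen ‘dumb-down’ in the wild, here is a rough sketch (plain Python; the field names and values are invented for illustration) of how easily a richer record collapses into Simple Dublin Core, and why the distinctions it carried can’t be recovered from the DC version alone:

```python
# A rough sketch of 'dumb-down': a richer record (field names are invented)
# maps easily into Simple Dublin Core, but the distinctions it carried are gone
# and cannot be recovered from the DC record alone.
rich_record = {
    "title_proper":        "Sense and Sensibility",
    "author_person":       "Austen, Jane",
    "illustrator_person":  "Thomson, Hugh",
    "publisher":           "Macmillan",
    "date_of_publication": "1896",
}

# Lossy mapping: several specific roles collapse into the single DC 'contributor'.
DUMB_DOWN_MAP = {
    "title_proper":        "title",
    "author_person":       "contributor",
    "illustrator_person":  "contributor",
    "publisher":           "publisher",
    "date_of_publication": "date",
}

simple_dc = {}
for field, value in rich_record.items():
    simple_dc.setdefault(DUMB_DOWN_MAP[field], []).append(value)

print(simple_dc)
# {'title': ['Sense and Sensibility'],
#  'contributor': ['Austen, Jane', 'Thomson, Hugh'],  # which one was the author? lost.
#  'publisher': ['Macmillan'], 'date': ['1896']}
```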

    As consumers, libraries and other cultural institutions are also better served by choices. Depending on the services they’re trying to support, they can choose what flavor of data meets their needs best, instead of being offered only what the provider assumes they want. This strategy leaves open the possibility of serving MARC as one of the choices, allowing those institutions still nursing an aged ILS to continue to participate.

    Of course, the consumers of data need to think about how they aggregate and integrate the data they consume, how to improve that data, and how to make their data services coherent. That’s the part of the new create, publish, consume, integrate cycle that scares many librarians, but it shouldn’t–really!

    So, it’s not about choosing the ‘right’ metadata format, it’s about having a fuller and more expansive notion about sharing data and learning some new skills. Let’s kiss the politics goodbye, and get on with it.

    By Diane Hillmann, October 12, 2015, 10:08 am (UTC-5)

A decade ago, when the Open Metadata Registry (OMR) was just being developed as the NSDL Registry, the vocabulary world was a very different place than it is today. At that point we were tightly focused on SKOS (not fully cooked at that point, but Jon was on the WG that was developing it, so we felt pretty secure diving in).

    But we were thinking about versioning in the Open World of RDF even then. The NSDL Registry kept careful track of all changes to a vocabulary (who, what, when) and the only way to get data in was through the user interface. We ran an early experiment in making versions based on dynamic, timestamp-based snapshots (we called them ‘time slices’, Git calls them ‘commit snapshots’) available for value vocabularies, but this failed to gain any traction. This seemed to be partly because, well, it was a decade ago for one, and while it attempted to solve an Open World problem with versioned URIs, it created a new set of problems for Closed World experimenters. Ultimately, we left the versions issue to sit and stew for a bit (6 years!).

All that started to change in 2008 as we started working with RDA, and needed to move past value vocabularies into properties and classes, and beyond that into issues around uploading data into the OMR. Lately, Git and GitHub have taken off and provided a way for us to make some important jumps in functionality, culminating in the OMR/GitHub-based RDA Registry. Sounds easy and intuitive now, but it sure wasn’t at the time, and what most people don’t know is that the OMR is still where RDA/RDF data originates — it wasn’t supplanted by Git/GitHub, but is chugging along in the background. The OMR’s RDF CMS is still visible and usable by all, but folks managing larger vocabularies now have more options.

    One important aspect of the use of Git and GitHub was the ability to rethink versioning.

    Just about a year ago our paper on this topic (Versioning Vocabularies in a Linked Data World, by Diane Hillmann, Gordon Dunsire and Jon Phipps) was presented to the IFLA Satellite meeting in Paris. We used as our model the way software on our various devices and systems is updated–more and more these changes happen without much (if any) interaction with us.

    In the world of vocabularies defining the properties and values in linked data, most updating is still very manual (if done at all), and the important information about what has changed and when is often hidden behind web pages or downloadable files that provide no machine-understandable connections identifying changes. And just solving the change management issue does little to solve the inevitable ‘vocabulary rot’ that can make published ‘linked data’ less and less meaningful, accurate, and useful over time.
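As a rough sketch of the kind of machine-actionable change information I mean (plain Python; the term URIs and labels are hypothetical), a consumer ought to be able to get something like a diff between two published snapshots of a vocabulary, rather than digging through web pages or downloadable files:

```python
# A rough sketch of machine-readable vocabulary change information: diff two
# snapshots of a vocabulary (term URIs and labels are hypothetical) and emit
# change statements a consumer could act on.
v1 = {
    "http://example.org/vocab/term/1": "Periodical",
    "http://example.org/vocab/term/2": "Monograph",
}
v2 = {
    "http://example.org/vocab/term/1": "Serial",                 # label changed
    "http://example.org/vocab/term/3": "Integrating resource",   # term added
}

changes = []
for uri in sorted(set(v1) | set(v2)):
    if uri not in v2:
        changes.append(("deprecated", uri, v1[uri]))
    elif uri not in v1:
        changes.append(("added", uri, v2[uri]))
    elif v1[uri] != v2[uri]:
        changes.append(("label-changed", uri, f"{v1[uri]} -> {v2[uri]}"))

for change in changes:
    print(change)
```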

    Building stable change management practices is a very critical missing piece of the linked data publishing puzzle. The problem will grow exponentially as language versions and inter-vocabulary mappings start to show up as well — and it won’t be too long before that happens.

    Please take a look at the paper and join in the conversation!

    By Diane Hillmann, September 20, 2015, 6:41 pm (UTC-5)

    Most of us in the library and cultural heritage communities interested in metadata are well aware of Tim Berners-Lee’s five star ratings for linked open data (in fact, some of us actually have the mug).

The five star rating for LOD, intended to encourage us to follow five basic rules for linked data, is useful, but, as we’ve discussed over the years, a basic question rises up: What good is linked data without (property) vocabularies? Vocabulary manager types like me and my peeps are always thinking like this, and recently we came across solid evidence that we are not alone in the universe.

Check out “Five Stars of Linked Data Vocabulary Use”, published last year in the Semantic Web Journal. The five authors posit that TBL’s five star linked data is just the precondition to what we really need: vocabularies. They point out that the original 5 star rating says nothing about vocabularies, but that Linked Data without vocabularies is not useful at all:

    “Just converting a CSV file to a set of RDF triples and linking them to another set of triples does not necessarily make the data more (re)usable to humans or machines.”

    Needless to say, we share this viewpoint!

    I’m not going to steal their thunder and list here all five star categories–you really should read the article (it’s short), but only note that the lowest level is a zero star rating that covers LD with no vocabularies. The five star rating is reserved for vocabularies that are linked to other vocabularies, which is pretty cool, and not easy to accomplish by the original publisher as a soloist.
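To see why the zero-star level matters, here’s a small sketch (Python with rdflib; the data and the ‘undocumented’ namespace are invented) of the same CSV row converted to RDF twice: once with predicates minted on the spot and defined nowhere, and once with predicates from a published, documented vocabulary (Dublin Core Terms). Only the second gives humans and machines something they can actually interpret and link to:

```python
# A small illustration of the zero-star problem: the same CSV row as RDF triples
# with undocumented, made-up predicates, and again with predicates from a
# published vocabulary (Dublin Core Terms). Example data is invented.
import csv, io
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DCTERMS

CSV_DATA = "id,title,creator\n1,Sense and Sensibility,Jane Austen\n"
row = next(csv.DictReader(io.StringIO(CSV_DATA)))
subject = URIRef(f"http://example.org/item/{row['id']}")

# Zero-star style: predicates minted on the spot, defined nowhere.
MADEUP = Namespace("http://example.org/undocumented/")
g0 = Graph()
g0.add((subject, MADEUP.col_title, Literal(row["title"])))
g0.add((subject, MADEUP.col_creator, Literal(row["creator"])))

# With a documented vocabulary: predicates others already understand and can link to.
g1 = Graph()
g1.add((subject, DCTERMS.title, Literal(row["title"])))
g1.add((subject, DCTERMS.creator, Literal(row["creator"])))

print(g0.serialize(format="turtle"))
print(g1.serialize(format="turtle"))
```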

    These five star ratings are a terrific start to good practices documentation for vocabularies used in LOD, which we’ve had in our minds for some time. Stay tuned.

    By Diane Hillmann, August 7, 2015, 1:50 pm (UTC-5)

Over the past weekend I participated in a Twitter conversation on the topic of meaning, data, transformation and packaging. The conversation is too long to repost here, but looking from July 11-12 for @metadata_maven should pick most of it up. Aside from my usual frustration at the message limitations in Twitter, there seemed to be a lot of confusion about what exactly we mean by ‘meaning’ and how it gets expressed in data. I had a Skype conversation with @jonphipps about it, and thought I could reproduce that here, in a way that could add to the original conversation, perhaps clarifying a few things. [Probably good to read the Twitter conversation ahead of reading the rest of this.]

    Jon Phipps: I think the problem that the people in that conversation are trying to address is that MARC has done triple duty as a local and global serialization (format) for storage, supporting indexing and display; a global data interchange format; and a focal point for creating agreement about the rules everyone is expected to follow to populate the data (AACR2, RDA). If you walk away from that, even if you don’t kill it, nothing else is going to be able to serve that particular set of functions. But that’s the way everyone chooses to discuss bibframe, or schema.org, or any other ‘marc replacement’.

    Diane Hillmann: Yeah, but how does ‘meaning’ merely expressed on a wiki page help in any way? Isn’t the idea to have meaning expressed with the data itself?

    Jon Phipps: It depends on whether you see RDF as a meaning transport mechanism or a data transport mechanism. That’s the difference between semantic data and linked data.

    Diane Hillmann: It’s both, don’t you think?

    Jon Phipps: Semantic data is the smart subset of linked data.

    Diane Hillmann: Nice tagline 🙂

    Jon Phipps: Zepheira, and now DC, seem to be increasingly looking at RDF as merely linked data. I should say a transport mechanism for ‘linked’ data.

    Diane Hillmann: It’s easier that way.

Jon Phipps: Exactly. Basically what they’re saying is that meaning is up to the receiver’s system to determine. A dc:title of ‘Mr.’ is fine in that world–it even validates according to the ‘new’ AP thinking. It’s all easier for the data producers if they don’t have to care about vocabularies. But the value of RDF is that it’s brilliantly designed to transport knowledge, not just data. RDF data is intended to live in a world where any Thing can be described by any Thing, and all of those descriptions can be aggregated over time to form a more complete description of the Thing Being Described. Knowledge transfer really benefits from Semantic Web concepts like inferences and entailments and even truthiness (in addition to just validation). If you discount and even reject those concepts in a linked data world, then you might as well ship your data around as CSV or even SQL files and be done with it.

    One of the things about MARC is that it’s incredibly semantically rich (http://marc21rdf.info) and has also been brilliantly designed by a lot of people over a lot of years to convey an equally rich body of bibliographic knowledge. But throwing away even a small portion of that knowledge in pursuit of a far dumber linked data holy grail is a lot like saying that since most people only use a relatively limited number of words (especially when they’re texting) we have no need for a 50,000 word, or even a 5,000 word, dictionary.

    MARC makes knowledge transfer look relatively easy because the knowledge is embedded in a vocabulary every cataloger learns and speaks fairly fluently. It looks like it’s just a (truly limiting) data format so it’s easy to think that replacing it is just a matter of coming up with a fresh new format, like RDF. But it’s going to be a lot harder than that, which is tacitly acknowledged by the many-faceted effort to permanently dumb-down bibliographic metadata, and it’s one of the reasons why I think bibframe.org, bibfra.me, and schema.org might end up being very destructive, given the way they’re being promoted (be sure to Park Your MARC somewhere).

    [That’s why we’re so focused on the RDA data model (which can actually be semantically richer than MARC), why we helped create http://marc21rdf.info, and why we’re working at building out our RDF vocabulary management services.]

    Diane Hillmann: This would be a great conversation to record for a podcast 😉

    Jon Phipps: I’m not saying proper vocabulary management is easy. Look at us for instance, we haven’t bothered to publish the OMR vocabs and only one person has noticed (so far). But they’re in active use in every OMR-generated vocab.

The point I was making was that we were no better, as publishers of theoretically semantic metadata, at making sure the data was ‘meaningful’ by making sure that the vocabs resolved, had definitions, etc.

    [P.S. We’re now working on publishing our registry vocabularies.]
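To make Jon’s point about knowledge transfer a little more concrete, here’s a tiny sketch (Python with rdflib; the property URIs are invented for illustration, and the loop is only a hand-rolled approximation of RDFS entailment) of how a vocabulary assertion lets a consumer infer statements it was never explicitly given:

```python
# A tiny illustration of 'knowledge transport': if a vocabulary declares that
# 'illustrator' is a subproperty of 'contributor', a consumer can infer
# contributor statements it was never explicitly given.
# (rdflib; the example property URIs are invented for illustration.)
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/vocab/")
g = Graph()
g.add((EX.illustrator, RDFS.subPropertyOf, EX.contributor))  # vocabulary knowledge
g.add((URIRef("http://example.org/book/1"), EX.illustrator, Literal("Thomson, Hugh")))

# Naive RDFS-style entailment for subPropertyOf, computed by hand.
for s, p, o in list(g):
    for _, _, super_p in g.triples((p, RDFS.subPropertyOf, None)):
        g.add((s, super_p, o))

print((URIRef("http://example.org/book/1"), EX.contributor,
       Literal("Thomson, Hugh")) in g)  # True: inferred, not asserted
```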

    By Diane Hillmann, July 16, 2015, 9:35 pm (UTC-5)

    In the old days, when I was on MARBI as liaison for AALL, I used to write a fairly detailed report, and after that wrote it up for my Cornell colleagues. The gist of those reports was to describe what happened, and if there might be implications to consider from the decisions. I don’t propose to do that here, but it does feel as if I’m acting in a familiar ‘reporting’ mode.

    In an early Saturday presentation sponsored by the Linked Library Data IG, we heard about BibFrame and VIVO. I was very interested to see how VIVO has grown (having seen it as an infant), but was puzzled by the suggestion that it or FOAF could substitute for the functionality embedded in authority records. For one thing, auth records are about disambiguating names, and not describing people–much as some believe that’s where authority control should be going. Even when we stop using text strings as identifiers, we’ll still need that function and should be thinking carefully whether adding other functions makes good sense.

Later on Saturday, at the Cataloging Norms IG meeting, Nancy Fallgren spoke on the NLM collaboration with Zepheira, GW, and others on BibFrame Lite. They’re now testing the Kuali OLE cataloging module for use with BF Lite, which will include a triple store. An important quote from Nancy: “Legacy data should not drive development.” So true, but neither should we be starting over, or discarding data, just to simplify data creation, thus losing the ability to respond to the more complex needs in cataloging, which aren’t going away (a point demonstrated usefully in the recent Jane-athons).

    I was the last speaker on that program, and spoke on the topic of “What Can We Do About Our Legacy Data?” I was primarily asking questions and discussing options, not providing answers. The one thing I am adamant about is that nobody should be throwing away their MARC records. I even came up with a simple rule: “Park the MARC”. After all, storage is cheap, and nobody really knows how the current situation will settle out. Data is easy to dumb down, but not so easy to smarten up, and there may be do-overs in store for some down the road, after the experimentation is done and the tradeoffs clearer.

    I also attended the BibFrame Update, and noted that there’s still no open discussion about the ‘classic’ (as in ‘Classic Coke’) BibFrame version used by LC, and the ‘new’ (as in ‘New Coke’) BibFrame Lite version being developed by Zepheira, which is apparently the vocabulary they’re using in their projects and training. It seems like it could be a useful discussion, but somebody’s got to start it. It’s not gonna be me.

    The most interesting part of that update from my point of view was hearing Sally McCallum talk about the testing of BibFrame by LC’s catalogers. The tool they’re planning on using (in development, I believe) will use RDA labels and include rule numbers from the RDA Toolkit. Now, there’s a test I really want to hear about at Midwinter! But of course all of that RDA ‘testing’ they insisted on several years ago to determine if the RDA rules could be applied to MARC21 doesn’t (can’t) apply to BibFrame Classic so … Will there be a new round of much publicized and eagerly anticipated shared institutional testing of this new tool and its assumptions? Just askin’.

    By Diane Hillmann, July 10, 2015, 10:10 am (UTC-5)

    The RDA Development Team started talking about developing training for the ‘new’ RDA, with a focus on the vocabularies, in the fall of 2014. We had some notion of what we didn’t want to do: we didn’t want yet another ‘sage on the stage’ event, we wanted to re-purpose the ‘hackathon’ model from a software focus to data creation (including a major hands-on aspect), and we wanted to demonstrate what RDA looked like (and could do) in a native RDA environment, without reference to MARC.

This was a tall order. Using RIMMF for the data creation was a no-brainer: the developers had been using the RDA Registry to feed new vocabulary elements into their software (effectively becoming the RDA Registry’s first client), and were fully committed to FRBR. Deborah Fritz had been training librarians and others on RIMMF for years, gathering feedback and building enthusiasm. It was Deborah who came up with the Jane-athon idea, and the RDA Development group took it and ran with it. Using the Jane Austen theme was a brilliant part of Deborah’s idea. Everybody knows about JA, and the number of spin-offs, rip-offs and re-tellings of the novels (in many media formats) made her work a natural for examining why RDA and FRBR make sense.

    One goal stated everywhere in the marketing materials for our first Jane outing was that we wanted people to have fun. All of us have been part of the audience and on the dais for many information sessions, for RDA and other issues, and neither position has ever been much fun, useful as the sessions might have been. The same goes for webinars, which, as they’ve developed in library-land tend to be dry, boring, and completely bereft of human interaction. And there was a lot of fun at that first Jane-athon–I venture to say that 90% of the folks in the room left with smiles and thanks. We got an amazing response to our evaluation survey, and the preponderance of responses were expansive, positive, and clearly designed to help the organizers to do better the next time. The various folks from ALA Publishing who stood at the back and watched the fun were absolutely amazed at the noise, the laughter, and the collaboration in evidence.

    No small part of the success of Jane-athon 1 rested with the team leaders at each table, and the coaches going from table to table helping out with puzzling issues, ensuring that participants were able to create data using RIMMF that could be aggregated for examination later in the day.

    From the beginning we thought of Jane 1 as the first of many. In the first flush of success as participants signed up and enthusiasm built, we talked publicly about making it possible to do local Jane-athons, but we realized that our small group would have difficulty doing smaller events with less expertise on site to the same standard we set at Jane-athon 1. We had to do a better job in thinking through the local expansion and how to ensure that local participants get the same (or similar) value from the experience before responding to requests.

As a step in that direction, CILIP in the UK is planning an Ag-athon on May 22, 2015, which will add much to the collective experience as well as to the data store that began with the first Jane-athon, a store that will become an increasingly important factor as we work through the issues of sharing data.

The collection and storage of the Jane-athon data was envisioned prior to the first event, and the R-Balls site was designed as a place to store and share RIMMF-based information. Though a valuable step towards shareable RDA data, rballs have their limits. The data itself can be curated by human experts or made available warts and all, depending on the needs of the user of the data. For the longer term, RIMMF can output RDF statements based on the rball info, and a triple store is in development for experimentation and exploration. There are plans to improve the visualization of this data and demonstrate its use at Jane-athon 2 in San Francisco, which will include more about RDA and linked data, as well as what the created data can be used for, in particular for new and improved services.

    So, what are the implications of the first Jane-athon’s success for libraries interested in linked data? One of the biggest misunderstandings floating around libraryland in linked data conversations is that it’s necessary to make one and only one choice of format, and eschew all others (kind of like saying that everyone has to speak English to participate in LOD). This is not just incorrect, it’s also dangerous. In the MARC era, there was truly no choice for libraries–to participate in record sharing they had to use MARC. But the technology has changed, and rapidly evolving semantic mapping strategies [see: http://dcpapers.dublincore.org/pubs/article/view/3622] will enable libraries to use the most appropriate schemas and tools for creating data to be used in their local context, and others for distributing that data to partners, collaborators, or the larger world.

Another widely circulated meme is that RDA/FRBR is ‘too complicated’ for what libraries need; we’re encouraged to ‘simplify, simplify’ and assured that we’ll still be able to do what we need. Hmm, well, simplification is an attractive idea, until one remembers that the environment we work in, with evolving carriers, versions, and creative ideas for marketing materials to libraries, is getting more complex than ever. Without the specificity to describe what we have (or have access to), we push the problem out to our users to figure out on their own. Libraries have always tried to be smarter than that, and that requires “smart”, not “dumb”, metadata.

Of course the corollary to the ‘too complicated’ argument is the notion that a) we’re not smart enough to figure out how to do RDA and FRBR right, and b) complex means more expensive. I refuse to give space to a), but b) is an important consideration. I urge you to take a look at the Jane-athon data and consider the fact that Jane Austen wrote very few novels, but they’ve been re-published with various editions, versions and commentaries for almost two centuries. Once you add the ‘based on’, ‘inspired by’ and the enormous trail created by those trying to use Jane’s popularity to sell stuff (“Sense and Sensibility and Sea Monsters” is a favorite of mine), you can see the problem. Think of a pyramid with a very expansive base and a very sharp point, and consider that, in RDA, linking everything at the bottom to the works at the top doesn’t require repeating the description of each novel every time. And we’re not adding notes to descriptions based on the outdated notion that the only use for information about the relationship between “Sense and Sensibility and Sea Monsters” and Jane’s “Sense and Sensibility” is a human being who looks far enough into the description to read the note.
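As a rough sketch of that pyramid (Python with rdflib; the URIs and the ‘basedOn’ property are invented for illustration, not taken from the actual RDA element set), a derivative only needs to carry its own description plus a link to the Work it builds on:

```python
# A rough sketch of the pyramid: derivatives link to the original Work by URI
# instead of repeating its description or relying on free-text notes.
# (rdflib; the URIs and 'basedOn' property are invented for illustration.)
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS

EX = Namespace("http://example.org/")
g = Graph()

work = EX["work/sense-and-sensibility"]
g.add((work, DCTERMS.title, Literal("Sense and sensibility")))
g.add((work, DCTERMS.creator, Literal("Austen, Jane")))

# The wide base of the pyramid: each derivative carries only its own description
# plus a link; nothing about the original Work needs restating.
parody = EX["work/sense-and-sensibility-and-sea-monsters"]
g.add((parody, DCTERMS.title, Literal("Sense and sensibility and sea monsters")))
g.add((parody, EX.basedOn, work))

print(g.serialize(format="turtle"))
```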

    One of the big revelations for most Jane-athon participants was to see how well RIMMF translated legacy MARC records into RDA, with links between the WEM levels and others to the named agents in the record. It’s very slick, and most importantly, not lossy. Consider that RIMMF also outputs in both MARC and RDF–and you see something of a missing link (if not the Golden Gate Bridge :-).

    Not to say there aren’t issues to be considered with RDA as with other options. There are certainly those, and they’ll be discussed at the Jane-In in San Francisco as well as at the RDA Forum on the following day, which will focus on current RDA upgrades and the future of RDA and cataloging. (More detailed information on the Forum will be available shortly).

    Don’t miss the fun, take a look at the details and then go ahead and register. And catalogers, try your best to entice your developers to come too. We’ll set up a table for them, and you’ll improve the conversation level at home considerably!

    By Diane Hillmann, May 18, 2015, 10:13 am (UTC-5)

    I’ve been back from Chicago for just over a week now, but still reflecting on a very successful Jane-athon pre-conference the Friday before Midwinter. And the good news is that our participant survey responses agree with the “successful” part, plus contain a lot of food for thought going forward. More about that later …

    There was a lot of buzz in the Jane-athon room that day, primarily from the enthusiastic participants, working together at tables, definitely having the fun we promised. Afterwards, the buzz came from those who wished they’d been there (many on Twitter #Janeathon) and others that wanted us to promise to do it again. Rest assured–we’re planning on another one in San Francisco at ALA Annual, but it will probably be somewhat different because by then we’ll have a better support infrastructure and will be able to be more concrete about the question of ‘what do you do with the data once you have it?’ If you’re particularly interested in that question, keep an eye on the rballs.info site, where new resources and improvements will be announced.

    Rballs? What the heck are those? Originally they were meant to be ‘RIMMF-balls’, but then we started talking about ‘resource-balls’, and other such wanderings. The ‘ball’ part was suggested by ‘tar-balls’ and ‘mudballs’ (mudball was a term of derision in the old MARBI days, but Jon and I started using it more generally when we were working on aggregated records in NSDL).

    So, how did we come up with such a crazy idea as a Jane-athon anyway? The idea came from Deborah Fritz, who’d been teaching about RDA for some time, plus working with her husband Richard on the RIMMF (RDA In Many Metadata Formats) tool, which is designed to allow creation of RDA data and export to RDF. The tool was upgraded to version 3 for the Jane-athon, and Deborah added some tutorials so that Jane-athon participants could get some practice with RIMMF beforehand (she also did online sessions for team leaders and coaches).

    Deborah and I had discussed many times the frustration we shared with the ‘sage on the stage’ model of training, which left attendees to such events unhappy with the limitations of that model. They wanted something concrete–they usually said–something they could get their teeth into. Something that would help them visualize RDA out of the context of MARC. The Jane-athon idea promised to do just that.

I had done a prototype session of the Jane-athon with some librarians from the University of Hawaii (Nancy Sack did a great job organizing everything, even though a dodgy plane made me a day late to the party!). We got some very useful evaluations from that group, and those contributed to the success of the official Chicago debut.

    So a crazy idea, bolstered by a lot of work and a whole lot of organizational effort, actually happened, and was even better than we’d dared to hope. There was a certain chaos on the day, which most people accepted with equanimity, and an awful lot of learning of the best kind. The event couldn’t have happened without Deborah and Richard Fritz, Gordon Dunsire, and Jon Phipps, each of whom had a part to play. Jamie Hennelly from ALA Publishing was instrumental in making the event happen, despite his reservations about herding the organizer cats.

And, as the cherry on top: After the five organizers finished their celebratory dinner later in the evening after the Jane-athon, we were all out on the sidewalk looking for cabs. A long black limousine pulled up, and the driver asked us if we wanted a ride. Needless to say, we did, and soon pulled up in style in front of the Hyatt Regency on Wacker. Sadly, there was no one we knew at the front of the hotel, but many looked askance at the somewhat scruffy mob who piled out of the limo, no doubt wondering who the heck we were.

    What’s up next? We think we’re on the path of a new data sharing paradigm, and we’ll run with that for the next few months, and maybe riff on that in San Francisco. Stay tuned! And do download a copy of RIMMF and play–there are rballs to look at and use for your purposes.

    P.S. A report of the evaluation survey will be on RDA-L sometime next week.

    By Diane Hillmann, February 14, 2015, 2:43 pm (UTC-5)

The planning for the Midwinter Jane-athon pre-conference has been taking up a lot of my attention lately. It’s a really cool idea (credit to Deborah Fritz) to address the desire we’ve been hearing for some time for a participatory, hands-on session on RDA. And let’s be clear: we’re not talking about the RDA instructions–this is about the RDA data model, vocabularies, and RDA’s availability for linked data. We’ll be using RIMMF (RDA in Many Metadata Formats) as our visualization and data creation tool, setting up small teams with leaders who’ve been prepared to support the teams and a wandering phalanx of coaches to give help on the fly.

    Part of the planning has to do with building a set of RIMMF ‘records’ to start with, for participants to add on their own resources and explore the rich relationships in RDA. We’re calling these ‘r-balls’ (a cross between RIMMF and tarballs). These zipped-up r-balls will be available for others to use for their own homegrown sessions, along with instructions for using RIMMF and setting up a Jane-athon (or other themed -athon), and also how to contribute their own r-balls for the use of others. In case you’ve not picked it up, this is a radically different training model, and we’d like to make it possible for others to play, too.

    That’s the plan for the morning. After lunch we’ll take a look at what we’ve done, and prise out the issues we’ve encountered, and others we know about. The hope is that the participants will walk out the door with both an understanding of what RDA is (more than the instructions) and how it fits into the emerging linked data world.

    I recently returned from a trip to Honolulu, where I did a prototype Jane-athon workshop for the Hawaii Library Association. I have to admit that I didn’t give much thought to how difficult it would be to do solo, but I did have the presence of mind to give the organizer of the workshop some preliminary setup instructions (based on what we’ll be doing in Chicago) to ensure that there would be access to laptops with software and records pre-loaded, and a small cadre of folks who had been working with RIMMF to help out with data creation on the day.

    The original plan included a day before the workshop with a general presentation on linked data and some smaller meetings with administrators and others in specialized areas. It’s a format I’ve used before and the smaller meetings after the presentation generally bring out questions that are unlikely to be asked in a larger group.

    What I didn’t plan for was that I wouldn’t be able to get out of Ithaca on the appointed day (the day before the presentation) thanks not to bad weather, but instead to a non-functioning plane which couldn’t be repaired. So after a phone discussion with Hawaii, I tried again the next day, and everything went smoothly. On the receiving end there was lots of effort expended to make it all work in the time available, with some meetings dribbling into the next day. But we did it, thanks to organizer Nancy Sack’s prodigious skills and the flexibility of all concerned.

    Nancy asked the Jane-athon participants to fill out an evaluation, and sent me the anonymized results. I really appreciated that the respondents added many useful (and frank) comments to the usual range of questions. Those comments in particular were very helpful to me, and were passed on to the other MW Jane-athon organizers. One of the goals of the workshop was to help participants visualize, using RIMMF, how familiar MARC records could be automatically mapped into the FRBR structure of RDA, and how that process might begin to address concerns about future workflow and reuse of MARC records. Another goal was to illustrate how RDA’s relationships enhanced the value of the data, particularly for users. For the most part, it looked as if most of the participants understood the goals of the workshop and felt they had gotten value from it.

But there were those who provided frank criticism of the workshop goals and organization (as well as the presenter, of course!). Some of these criticisms involved the limitations of the workshop: respondents wanted more information on how they could put their new knowledge to work, right now. The clearest expression of this desire came in as follows:

    “I sort of expected to be given the whole road map for how to take a set of data and use LOD to make it available to users via the web. In rereading the flyer I see that this was not something the presenter wanted to cover. But I think it was apparent in the afternoon discussion that we wanted more information in the big picture … I feel like I have an understanding of what LOD is, but I have no idea how to use it in a meaningful way.”

Aside from the time constraints–which everyone understood–there’s a problem inherent in the fact that very few active LOD projects have moved beyond publishing their data (a good thing, no doubt about it) to using the data published by others. So it wasn’t so much that I didn’t ‘want’ to present more about the ‘bigger picture’; rather, there wasn’t really anything to say aside from the fact that the answer to that question is still unclear (and I probably wasn’t all that clear about it either). If I had a ‘road map’ to talk about and point them to, I certainly would have shared it, but sadly I have nothing to share at this stage.

    But I continue to believe that just as progress in this realm is iterative, it is hugely important that we not wait for the final answers before we talk about the issues. Our learning needs to be iterative too, to move along the path from the abstract to the concrete along with the technical developments. So for MidWinter, we’ll need to be crystal clear about what we’re doing (and why), as well as why there are blank areas in the road-map.

    Thanks again to the Hawaii participants, and especially Nancy Sack, for their efforts to make the workshop happen, and the questions and comments that will improve the Jane-athon in Chicago!

    For additional information, including a link to register, look here. Although I haven’t seen the latest registration figures, we’re expecting to fill up, so don’t delay!

    [these are the workshop slides]

    [these are the general presentation slides]

    By Diane Hillmann, December 19, 2014, 10:22 am (UTC-5)