It’s been six months since Diane Hillmann, Jon Phipps, and I published a “level 0” set of RDF elements (all properties) and value vocabularies based on the MARC21 format. That was largely a mechanical process, as we created a separate property for every combination of tag, two indicators, and subfield for most (but not yet all) tags in MARC Bibliographic. I hear this was well-received at the MARC Formats Interest Group session at ALA Midwinter in Dallas in January, with questions like “When are you going to do the same for MARC Holdings or MARC Authority?” Well, we’re thinking about it …

Meanwhile, I’ve been looking to see what low-hanging fruit can be plucked from the MARC tree of knowledge. The obvious place to start was the fixed-length data fields tagged as 006, 007, and 008. These have the least complicated mix of syntax and semantics, with no indicators or repeatable subfields to worry about. And there are value vocabularies associated with the coded content, so linked data triples with object URIs are a possibility.

The main complication is the dependency of the codes on the category of material, recorded in the MARC21 record Leader. This introduces a decision node in the process of recasting legacy metadata as RDF triples; it determines which code sets, and therefore value vocabularies, are being used. For all level 0 elements, the property URI is constructed mechanically from the MARC21 coding, so the decision only affects the choice of value vocabulary to be used as the object of the data triple.
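The decision node can be sketched in a few lines of Python. This is a minimal sketch, not the process actually used to build the level 0 elements: the Leader/06 (type of record) and Leader/07 (bibliographic level) tests follow the 008 material configurations of MARC 21 Bibliographic as summarised here, and the property-naming rule (a material code such as BK or MU inserted only for the material-specific positions 18-34) is inferred from the two examples below rather than taken from the element set's documentation. The function names and the Leader strings in the usage lines are hypothetical.

```python
from typing import Optional

# Base namespace of the level 0 00X elements, as declared in the examples.
BASE = "http://marc21rdf.info/elements/00X/"


def material_type(leader: str) -> Optional[str]:
    """Map Leader/06 (type of record) and Leader/07 (bibliographic level)
    to a 008 material configuration code."""
    rec_type, bib_level = leader[6], leader[7]
    if rec_type in "at" and bib_level in "acdm":
        return "BK"  # books
    if rec_type == "a" and bib_level in "bis":
        return "CR"  # continuing resources
    if rec_type in "cdij":
        return "MU"  # music (scores and sound recordings)
    if rec_type in "ef":
        return "MP"  # maps
    if rec_type == "m":
        return "CF"  # computer files
    if rec_type in "gkor":
        return "VM"  # visual materials
    if rec_type == "p":
        return "MX"  # mixed materials
    return None  # combination not covered by this sketch


def property_uri(start: int, end: int, mtype: Optional[str]) -> str:
    """Build a level 0 property URI for 008 positions start..end (inclusive)."""
    span = f"{start:02d}" if start == end else f"{start:02d}-{end:02d}"
    if 18 <= start <= 34:
        # Material-specific positions: their meaning depends on the Leader.
        if mtype is None:
            raise ValueError("positions 18-34 need a material type")
        return f"{BASE}M008{mtype}{span}"
    # Positions 00-17 and 35-39 mean the same thing for every material.
    return f"{BASE}M008{span}"


# Hypothetical Leaders: a book (a/m) and a musical sound recording (j/m).
book = material_type("00000nam a2200000 a 4500")
cd = material_type("00000njm a2200000 a 4500")
print(property_uri(29, 29, book))  # .../M008BK29
print(property_uri(18, 19, cd))    # .../M008MU18-19
print(property_uri(35, 37, cd))    # .../M00835-37
```

The Leader is consulted once per record; every later choice of property name and value vocabulary for positions 18-34 hangs off that single result.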

The results are encouraging. Here’s a simple example from the record for “Legacy” by Roderick Buchanan, taken from the main catalogue of the National Library of Scotland. It has just a single 008 tag:

@prefix ex: <http://example.org/> .  # placeholder namespace for the example records
@prefix m2100x: <http://marc21rdf.info/elements/00X/>.
@prefix m21terms: <http://marc21rdf.info/terms/>.
ex:1
m2100x:M00806 m21terms:alltyp\#s ;
m2100x:M00807-10 "2011" ;
m2100x:M00815-17 <http://id.loc.gov/vocabulary/countries/enk> ;
m2100x:M008BK29 "0" ;
m2100x:M008BK30 "0" ;
m2100x:M008BK31 "0" ;
m2100x:M008BK33 m21terms:booklit\#0 ;
m2100x:M00835-37 <http://id.loc.gov/vocabulary/languages/eng> .
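Going the other way, triples like these can be generated mechanically from the raw 008 string. The sketch below assumes a books (BK) record and covers only the positions that appear in ex:1; the 008 string in the usage lines is a hypothetical reconstruction from the triples (including a made-up "date entered"), not the actual National Library of Scotland record, and the output assumes the ex: and m2100x: prefixes are declared as in the example. Objects are written as full IRIs, because a bare `#` after a prefixed name starts a comment in Turtle.

```python
from typing import Optional

TERMS = "http://marc21rdf.info/terms/"
LOC = "http://id.loc.gov/vocabulary/"

# Positions used in ex:1: (local property name, start, end, value vocabulary).
# A vocabulary of None means the value is kept as a plain literal.
BK_FIELDS = [
    ("M00806", 6, 6, "alltyp"),          # type of date/publication status
    ("M00807-10", 7, 10, None),          # date 1
    ("M00815-17", 15, 17, "countries"),  # place of publication
    ("M008BK29", 29, 29, None),          # conference publication
    ("M008BK30", 30, 30, None),          # festschrift
    ("M008BK31", 31, 31, None),          # index
    ("M008BK33", 33, 33, "booklit"),     # literary form
    ("M00835-37", 35, 37, "languages"),  # language
]


def object_term(vocab: Optional[str], value: str) -> str:
    """Render one 008 value as a Turtle object."""
    if vocab is None:
        return f'"{value}"'
    if vocab in ("countries", "languages"):
        # LC code lists; two-letter country codes are padded with a blank.
        return f"<{LOC}{vocab}/{value.strip()}>"
    return f"<{TERMS}{vocab}#{value}>"


def bk_008_to_turtle(subject: str, f008: str) -> str:
    """Emit one Turtle statement (subject plus predicate list) for a BK 008."""
    pairs = [
        f"    m2100x:{name} {object_term(vocab, f008[start:end + 1])}"
        for name, start, end, vocab in BK_FIELDS
    ]
    return subject + "\n" + " ;\n".join(pairs) + " ."


# Hypothetical 008 rebuilt from the ex:1 triples (date entered is made up).
chars = [" "] * 40
chars[0:6] = "120101"
chars[6] = "s"
chars[7:11] = "2011"
chars[15:18] = "enk"
chars[29:32] = "000"
chars[33] = "0"
chars[35:38] = "eng"
print(bk_008_to_turtle("ex:1", "".join(chars)))
```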

Extending the object URIs to their labels gives the RDF graph:
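Extending an object URI to its label amounts to adding one label triple for each URI whose label is known. A minimal sketch, assuming a small hand-made lookup table instead of dereferencing id.loc.gov, and assuming rdfs:label as the labelling property (the vocabularies themselves may use a different one); literal objects such as "2011" are simply left alone.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hand-made excerpt; a real tool would fetch labels from the vocabularies.
LABELS = {
    "http://id.loc.gov/vocabulary/countries/enk": "England",
    "http://id.loc.gov/vocabulary/languages/eng": "English",
}


def extend_with_labels(triples, labels):
    """Return the triples plus one label triple per object URI with a known label."""
    extended = list(triples)
    done = set()
    for _subject, _prop, obj in triples:
        if obj in labels and obj not in done:
            extended.append((obj, RDFS_LABEL, labels[obj]))
            done.add(obj)
    return extended


# A subset of the ex:1 triples, as plain (subject, property, object) tuples.
ex1 = [
    ("ex:1", "m2100x:M00807-10", "2011"),
    ("ex:1", "m2100x:M00815-17", "http://id.loc.gov/vocabulary/countries/enk"),
    ("ex:1", "m2100x:M00835-37", "http://id.loc.gov/vocabulary/languages/eng"),
]
for triple in extend_with_labels(ex1, LABELS)[len(ex1):]:
    print(triple)
```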

A more complicated example is the CD version of Abbey Road by The Beatles, with one 006 field, two 007 fields, and one 008 field in a record from OCLC WorldCat:

@prefix ex: <http://example.org/> .  # placeholder namespace for the example records
@prefix m2100x: <http://marc21rdf.info/elements/00X/>.
@prefix m21terms: <http://marc21rdf.info/terms/>.
ex:5
m2100x:M00600 m21terms:formofmaterial\#m ;
m2100x:M006m09 m21terms:computertyp\#u ;
m2100x:M00700 m21terms:cat\#s ;
m2100x:M00700 m21terms:cat\#c ;
m2100x:M007c01 m21terms:electrosmd\#o ;
m2100x:M007c03 m21terms:electrocol\#c ;
m2100x:M007c04 m21terms:electrodim\#g ;
m2100x:M007c05 m21terms:electrodsnd\#a ;
m2100x:M007s01 m21terms:soundrecordingsmd\#d ;
m2100x:M007s03 m21terms:soundrecordingspd\#f ;
m2100x:M007s04 m21terms:soundrecordingcpc\#s ;
m2100x:M007s05 m21terms:soundrecordinggro\#n ;
m2100x:M007s06 m21terms:soundrecordingdim\#g ;
m2100x:M007s07 m21terms:soundrecordingwid\#n ;
m2100x:M007s08 m21terms:soundrecordingtap\#n ;
m2100x:M007s09 m21terms:soundrecordingkin\#m ;
m2100x:M007s10 m21terms:soundrecordingmat\#m ;
m2100x:M007s11 m21terms:soundrecordingcut\#n ;
m2100x:M007s12 m21terms:soundrecordingspc\#e ;
m2100x:M007s13 m21terms:soundrecordingcap\#e ;
m2100x:M00806 m21terms:alltyp\#r ;
m2100x:M00807-10 "2009" ;
m2100x:M00811-14 "1969" ;
m2100x:M00815-17 <http://id.loc.gov/vocabulary/countries/cau> ;
m2100x:M008MU18-19 m21terms:musicfoc\#rc ;
m2100x:M008MU20 m21terms:musicfom\#n ;
m2100x:M008MU21 m21terms:musicpar\#n ;
m2100x:M008MU33 m21terms:musictra\#n ;
m2100x:M00835-37 <http://id.loc.gov/vocabulary/languages/eng> .

This yields the (partial) extended RDF graph:
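In ex:5 the same kind of decision node appears inside each 007 field: position 00 (category of material, here s for sound recording and c for electronic resource) determines which vocabulary applies at every later position. The sketch below covers only the positions used in ex:5; the vocabulary names are copied from the example, while the function name and the two 007 strings in the usage lines are hypothetical (the strings are reconstructed from the triples, not taken from the WorldCat record).

```python
from typing import List, Tuple

TERMS = "http://marc21rdf.info/terms/"

# 007 category (position 00) -> {position: value vocabulary}, covering only
# the positions that appear in ex:5.
POSITIONS_007 = {
    "c": {1: "electrosmd", 3: "electrocol", 4: "electrodim", 5: "electrodsnd"},
    "s": {
        1: "soundrecordingsmd", 3: "soundrecordingspd", 4: "soundrecordingcpc",
        5: "soundrecordinggro", 6: "soundrecordingdim", 7: "soundrecordingwid",
        8: "soundrecordingtap", 9: "soundrecordingkin", 10: "soundrecordingmat",
        11: "soundrecordingcut", 12: "soundrecordingspc", 13: "soundrecordingcap",
    },
}


def decode_007(field: str) -> List[Tuple[str, str]]:
    """Return (local property name, object URI) pairs for one 007 field."""
    category = field[0]
    pairs = [("M00700", f"{TERMS}cat#{category}")]
    # The category decides which vocabulary applies at each later position.
    for pos, vocab in POSITIONS_007.get(category, {}).items():
        if pos < len(field):
            pairs.append((f"M007{category}{pos:02d}", f"{TERMS}{vocab}#{field[pos]}"))
    return pairs


# Hypothetical 007 strings rebuilt from the ex:5 triples.
for f007 in ("sd fsngnnmmnee", "co cga"):
    for name, obj in decode_007(f007):
        print(name, obj)
```

Because two 007 fields share the property M00700, the record ends up with two values for it, one per physical aspect of the CD.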

The MARC Bibliographic manual says “Coded data elements are potentially useful for retrieval and data management purposes”. Any graph connecting to these examples can use them for retrieval, provided the resource URIs ex:1 and ex:5 are linked to the location of one or more copies. For open online resources this might be sufficient, because the resource is a “link” away. For physical resources, more information is required to get access (retrieve), including basic human identification attributes such as title, author, and edition. That additional data is usually present in the MARC21 variable data fields, and I’ll discuss it in a future blog post, but it doesn’t have to come from there. So what we need are stable URIs for the resources ex:1 and ex:5, and some triples containing location information for copies. OCLC has a lot of that data in one place. Next most helpful are standard identifiers such as ISBN and ISSN, because they will help to link to graphs from the publishing, bookselling, and reading communities. Then some title, author, and edition information would be nice …
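Retrieval in the information-retrieval sense, a search that narrows by language, country, or date, needs nothing more than matching on property-value pairs. A toy sketch over plain (subject, property, object) tuples, using a small subset of the ex:1 and ex:5 triples; the function name is hypothetical, and a real application would more likely load the triples into a triple store and query them with SPARQL.

```python
def find_resources(triples, criteria):
    """Return the subjects that carry every (property, object) pair in criteria."""
    index = {}
    for subject, prop, obj in triples:
        index.setdefault(subject, set()).add((prop, obj))
    # A subject matches when the criteria set is a subset of its pairs.
    return sorted(s for s, pairs in index.items() if criteria <= pairs)


# Subset of the coded-data triples from ex:1 and ex:5.
ENG = "http://id.loc.gov/vocabulary/languages/eng"
triples = [
    ("ex:1", "m2100x:M00807-10", "2011"),
    ("ex:1", "m2100x:M00835-37", ENG),
    ("ex:5", "m2100x:M00807-10", "2009"),
    ("ex:5", "m2100x:M00835-37", ENG),
]
print(find_resources(triples, {("m2100x:M00835-37", ENG)}))     # ['ex:1', 'ex:5']
print(find_resources(triples, {("m2100x:M00807-10", "2011")}))  # ['ex:1']
```

Matching only tells you which resources fit; getting hold of a copy still depends on the location triples discussed above.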

It would be very useful if national, regional, or international cataloguing agencies could get it together to put this on their agendas, soon.

Finally, notice the “0” values in ex:1 and the “not applicable” values in ex:5. The MARC21 fixed-length data fields support the Open World Assumption, unlike the MARC/AACR record as a whole, which definitely uses the Closed World Assumption, for example by not recording a first-edition statement.
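The practical difference shows up when a query asks about a value that is not there. A minimal sketch, treating a graph as a set of (subject, property, object) tuples and assuming each 008 property is single-valued within a record (the function names are hypothetical): because ex:1 states “0” (not a festschrift) at 008/30, an open-world reader can still answer “no” when asked whether it is a festschrift; drop that triple and the honest answer becomes “unknown”, whereas a closed-world reader answers “no” either way.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def holds_closed_world(graph: Set[Triple], s: str, p: str, o: str) -> bool:
    """Closed world: whatever is not stated is taken to be false."""
    return (s, p, o) in graph


def holds_open_world(graph: Set[Triple], s: str, p: str, o: str) -> Optional[bool]:
    """Open world: silence means 'unknown' (None).

    Assumes p is single-valued, as each 008 position is within one record, so a
    different stated value for the same property counts as a definite 'no'.
    """
    if (s, p, o) in graph:
        return True
    if any(ts == s and tp == p for ts, tp, _ in graph):
        return False
    return None


FESTSCHRIFT = "m2100x:M008BK30"        # 008/30 in a books record
stated = {("ex:1", FESTSCHRIFT, "0")}  # "0" = not a festschrift, stated explicitly
silent: Set[Triple] = set()            # the same record with the position left out

print(holds_open_world(stated, "ex:1", FESTSCHRIFT, "1"))    # False
print(holds_open_world(silent, "ex:1", FESTSCHRIFT, "1"))    # None
print(holds_closed_world(silent, "ex:1", FESTSCHRIFT, "1"))  # False
```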

By Gordon Dunsire, March 26, 2012, 11:48 am (UTC-5)


  1. Comment by Jonathan Rochkind

    I suspect when the MARC bibliographic manual says “useful for retrieval”, they mean “information retrieval”, i.e. a search interface, not actual retrieval of the actual ultimate document/item.
