The technique described in “Using the sub-property ladder” works well to “dumb-up” raw, level 0 data from MARC21 fixed-length data fields so that it can interoperate with metadata from other schemas. Unfortunately, it cannot be used with most MARC21 variable data fields (tags) and subfields. We cannot simply dumb-up a subfield to the level of its parent tag because most tags have more than one subfield: the meaning of a tag is a combination of the meanings of its subfields, and tag-level data is a composite of subfield-level data.

There is another technique we can use to bridge the semantic gap between a subfield and its tag: tags generally can be treated as “aggregated statements”, where the value of a tag is a literal string, or statement, which is composed of the values of subfields.

For example, a record may contain a tag 260 (Publication, Distribution, etc.) with subfield a (Place of publication, distribution, etc.) = “Edinburgh :”, subfield b (Name of publisher, distributor, etc.) = “Castle Press,”, and subfield c (Date of publication, distribution, etc.) = “2012.”. The contents of the tag, “$aEdinburgh :$bCastle Press,$c2012.”, can be turned into a tag-level value, “Edinburgh : Castle Press, 2012.”, by substituting a space for each subfield delimiter ($) and code pair and dropping the resulting leading space. We can then use a tag-level property with the label “Publication, Distribution, etc. (Imprint)” and URI “m21plus:T260” to publish the metadata statement “This resource – has Publication, Distribution, etc. – ‘Edinburgh : Castle Press, 2012.’” as an RDF triple.
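The substitution described above can be sketched in a few lines of Python. This is only an illustration: the function name `aggregate` and the subject identifier `ex:resource1` are hypothetical, and the pattern assumes subfield codes are a single digit or lowercase letter.

```python
import re

# A subfield delimiter ($) followed by its one-character code.
# Assumption: codes are a single digit or lowercase letter.
SUBFIELD_MARKER = re.compile(r"\$[0-9a-z]")

def aggregate(tag_content: str) -> str:
    """Replace each delimiter-and-code pair with a space, then trim the ends."""
    return SUBFIELD_MARKER.sub(" ", tag_content).strip()

statement = aggregate("$aEdinburgh :$bCastle Press,$c2012.")

# The tag-level triple; "ex:resource1" is a hypothetical subject identifier.
triple = ("ex:resource1", "m21plus:T260", statement)
print(triple)
```

The result depends entirely on the punctuation already embedded in the subfields; the function itself adds none.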

The instructions for deriving the tag-level value or aggregated statement from the subfield values are known as a syntax encoding scheme (SES). This is part of the Dublin Core abstract model, allowing specific SESs to be used in an application profile. There can be many different ways of deriving the value; the example above works because MARC21 subfields contain embedded punctuation that delineates the component parts when the subfield encoding is removed. This simple SES allows a MARC21 record to conform to the syntax prescribed by the International Standard Bibliographic Description (ISBD) for compound statements. Unfortunately, this makes it difficult to apply any other SES to the subfields without first removing the punctuation.

It would be much better if the instructions for adding ISBD punctuation to MARC21 data were embedded in an SES. Then a different SES could produce “Published in 2012 by Castle Press in Edinburgh” rather than “Published in 2012. by Castle Press, in Edinburgh :”. This is the approach taken by ISBD itself, and there is clearly an opportunity here for collaboration between the MARC21 and ISBD communities. The same approach is envisaged for RDA.
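To make the difference concrete, here is a rough sketch that treats each SES as a small function over subfield values. The function names are hypothetical, and the ISBD pattern is reduced to the single place, publisher and date case used in the example above.

```python
def isbd_ses(place: str, publisher: str, date: str) -> str:
    """Add ISBD-style punctuation to unpunctuated values (simplified)."""
    return f"{place} : {publisher}, {date}."

def sentence_ses(place: str, publisher: str, date: str) -> str:
    """A different SES: a plain-language sentence."""
    return f"Published in {date} by {publisher} in {place}"

# Clean values: either SES can be applied.
clean = ("Edinburgh", "Castle Press", "2012")
print(isbd_ses(*clean))
print(sentence_ses(*clean))

# Values with embedded MARC21 punctuation: the sentence SES goes wrong.
punctuated = ("Edinburgh :", "Castle Press,", "2012.")
print(sentence_ses(*punctuated))
```

With clean values the punctuation lives in the scheme rather than in the data, so swapping one scheme for another needs no clean-up step.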

The aggregated statement technique is also very useful when a MARC tag is repeated. Using tag 260 again as an example, a record may contain multiple publication statements for intervening publishers, where the tag’s first indicator has value “2”. If there are two such tags, then there may be two or more publication places and two or more publisher names, for example “$32001-2005$aEdinburgh :$bMudhut Publishing” and “$32006-$aEdinburgh :$bCastle Press” (subfield 3 is for Materials specified). A linked data representation of the record needs to keep the places, names, and dates correctly associated so that they don’t get mixed up, for example “Mudhut Publishing” with “2006-”. The tag-level RDF property (m21plus:T260) can be used with an aggregated statement to keep the level 0 data associated with the correct repeat of the tag, avoiding the use of blank nodes in the RDF graph of a specific record.

RDF graph of MARC21 Publication statement data


As the graph shows, the two Publication statements must have URIs so that they can link to the correct subfield values. The URIs identify the literal strings of the aggregated statements, and are instances of an SES; all SESs are sub-classes of the class of literal strings. A blank node, on the other hand, has no URI and uses a local identifier to make the links; such links appear broken in a non-local environment.
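The shape of that graph can be approximated with plain Python tuples standing in for triples. Everything except m21plus:T260 is an assumption made for illustration: the `ex:` subfield property names, the statement URI pattern and the function names are all hypothetical, and the parser assumes “$” never occurs inside a subfield value.

```python
def parse_subfields(tag_content: str) -> list[tuple[str, str]]:
    """Split '$aFoo$bBar' into [('a', 'Foo'), ('b', 'Bar')].

    Assumption: '$' never appears inside a subfield value.
    """
    return [(chunk[0], chunk[1:]) for chunk in tag_content.split("$") if chunk]

def triples_for_tags(resource: str, tag_contents: list[str]) -> list[tuple[str, str, str]]:
    """Give each repeat of tag 260 its own URI and hang its subfields off it."""
    triples = []
    for n, content in enumerate(tag_contents, start=1):
        statement_uri = f"{resource}/T260/{n}"  # hypothetical URI pattern
        triples.append((resource, "m21plus:T260", statement_uri))
        for code, value in parse_subfields(content):
            # "ex:T260<code>" is a hypothetical subfield-level property name.
            triples.append((statement_uri, f"ex:T260{code}", value))
    return triples

graph = triples_for_tags(
    "ex:resource1",
    ["$32001-2005$aEdinburgh :$bMudhut Publishing",
     "$32006-$aEdinburgh :$bCastle Press"],
)
for triple in graph:
    print(triple)
```

Because each repeat has its own URI, “Mudhut Publishing” stays attached to “2001-2005” and “Castle Press” to “2006-”, and the links still resolve when the triples are merged with data from elsewhere.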

To sum up, it seems useful to represent MARC21 tags as RDF properties associated with a syntax encoding scheme. We intend to add these properties to the Open Metadata Registry. Specific encoding schemes can then be assigned using an application profile. There must be many examples of instructions for processing tag subfields for output and display which can form the basis of suitable encoding schemes.

By Gordon Dunsire, May 20, 2012, 5:08 pm (UTC-5)
