In the previous post I discussed Low-hanging MARC fruit in the MARC21 fixed-length data fields 006, 007, and 008. These fields also contain useful data that hangs slightly higher up, but can be reached with a short ladder. The ladder rungs are constructed using the RDF Schema subPropertyOf property. This is an ontological property which takes the RDF class Property as both its domain and its range; in other words, it links two properties, each of which is an instance of that class:
P1 rdfs:subPropertyOf P2 – where P1 and P2 are specific properties.
The subPropertyOf property carries the following inference rule, or entailment:
If P1 rdfs:subPropertyOf P2, and X P1 Y, then X P2 Y
That is, if property P1 is a sub-property of property P2, then a machine can entail a triple using property P2 with the same subject and object as any triple using property P1.
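The rule can be illustrated with a minimal sketch in plain Python, assuming triples are held as (subject, predicate, object) tuples of strings. The function name entail_subproperty and the tuple representation are illustrative choices, not part of any RDF library; a real application would more likely use an RDFS reasoner.

```python
RDFS_SUBPROPERTY_OF = "rdfs:subPropertyOf"

def entail_subproperty(triples):
    """Return the new triples entailed by one pass of the sub-property rule.

    If (P1, rdfs:subPropertyOf, P2) and (X, P1, Y) are both present,
    then (X, P2, Y) is entailed.
    """
    triples = set(triples)
    # Collect every declared (P1, P2) sub-property pair.
    sub_of = {(s, o) for s, p, o in triples if p == RDFS_SUBPROPERTY_OF}
    entailed = set()
    for s, p, o in triples:
        for p1, p2 in sub_of:
            if p == p1:
                entailed.add((s, p2, o))
    # Report only triples that were not already stated.
    return entailed - triples

graph = {
    ("P1", RDFS_SUBPROPERTY_OF, "P2"),
    ("X", "P1", "Y"),
}
new_triples = entail_subproperty(graph)
```

Here new_triples holds the single entailed triple X P2 Y; the original triple using P1 is left untouched.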
There is a lot of semantic overlap in the MARC21 fields. For example, field 006 positions 01-17 relate to positions 18-34 in one of the field 008 configurations; they use the same values. 006 is used in cases when an item has multiple characteristics that cannot be coded in field 008. There is no semantic difference between the 006 and 008 data – a multi-component item may be catalogued as a whole using 008 for the main component and 006 for other components, or each component may be catalogued separately with its own 008 field.
We can aggregate this data by declaring sub-property relationships between corresponding 006 and 008 “level 0” properties and a new common super-property:
For example, create a new property M00Aud with label “Target audience”, and declare M006a05 (“Target audience of Language material”), M006t05 (“Target audience of Manuscript language material”) and M008BK22 (“Target audience of Books”) as its sub-properties:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix m2100x: <http://marc21rdf.info/elements/00X/> .
@prefix m21plus: <http://marc21rdf.info/elements/.../> .
m21plus:M00Aud rdfs:label "Target audience" .
m2100x:M006a05 rdfs:subPropertyOf m21plus:M00Aud .
m2100x:M006t05 rdfs:subPropertyOf m21plus:M00Aud .
m2100x:M008BK22 rdfs:subPropertyOf m21plus:M00Aud .
A machine can use this RDF graph to entail new triples from existing data:
ex:1 m2100x:M006a05 m21terms:commonaud#j .
=> ex:1 m21plus:M00Aud m21terms:commonaud#j .
ex:2 m2100x:M006t05 m21terms:commonaud#e .
=> ex:2 m21plus:M00Aud m21terms:commonaud#e .
ex:3 m2100x:M008BK22 m21terms:commonaud#g .
=> ex:3 m21plus:M00Aud m21terms:commonaud#g .
Here, three different resources (ex:1, ex:2, ex:3) have target audience data stored in three different MARC21 fixed-length fields. The entailed triples store the data using a common property that encompasses the semantics of the level 0 properties by discarding the one thing that distinguishes them, the material category. Each entailed triple states “This resource has target audience …”, dropping a distinction that is unnecessary for this metadata attribute.
Using the entailed triples, we only need to process the higher-level property to create, for example, a “Target audience” index for a set of MARC21 records, rather than having to gather the data from the level 0 properties every time.
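As a rough sketch of such an index, assuming the triples are available as plain (subject, predicate, object) tuples of prefixed names: the variable names are illustrative, and the ex:3 source triple is assumed to use M008BK22, the third sub-property declared earlier.

```python
from collections import defaultdict

SUPER_PROPERTY = "m21plus:M00Aud"
SUB_PROPERTIES = {"m2100x:M006a05", "m2100x:M006t05", "m2100x:M008BK22"}

# Level 0 data, one triple per resource (ex:3 is an assumed example).
data = [
    ("ex:1", "m2100x:M006a05", "m21terms:commonaud#j"),
    ("ex:2", "m2100x:M006t05", "m21terms:commonaud#e"),
    ("ex:3", "m2100x:M008BK22", "m21terms:commonaud#g"),
]

# Entail one higher-level triple for each level 0 triple.
entailed = [(s, SUPER_PROPERTY, o) for s, p, o in data if p in SUB_PROPERTIES]

# Build the index from the higher-level property alone.
index = defaultdict(list)
for s, p, o in entailed:
    if p == SUPER_PROPERTY:
        index[o].append(s)
audience_index = dict(index)
```

The indexing loop only ever tests for one property, however many level 0 properties feed into it.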
We can go further. The same value vocabulary for Target audience is used for other categories of material:
- M006c05 (“Target audience of Notated music”)
- M006d05 (“Target audience of Manuscript notated music”)
- M006g05 (“Target audience of Projected medium”)
- M006i05 (“Target audience of Nonmusical sound recording”)
- M006j05 (“Target audience of Musical sound recording”)
- M006k05 (“Target audience of Two-dimensional nonprojectable graphic”)
- M006m05 (“Target audience of Computer file or Electronic resource”)
- M006o05 (“Target audience of Kit”)
- M006r05 (“Target audience of Three-dimensional artifact or naturally occurring object”)
- M008CF22 (“Target audience of Computer Files”)
- M008MU22 (“Target audience of Music”)
- M008VM22 (“Target audience of Visual Materials”)
So we can declare sub-property relationships between each of these level 0 properties and the higher-level “Target audience” property, and generate the entailed triples.
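Writing these declarations by hand is repetitive, so they could be generated. The following is only a sketch: the helper name declare is hypothetical, and the output reuses the m2100x and m21plus prefixes exactly as given above without expanding them.

```python
# Level 0 properties that share the Target audience value vocabulary.
LEVEL0_PROPERTIES = [
    "M006c05", "M006d05", "M006g05", "M006i05", "M006j05", "M006k05",
    "M006m05", "M006o05", "M006r05", "M008CF22", "M008MU22", "M008VM22",
]

def declare(properties, super_property="m21plus:M00Aud"):
    """Return one sub-property statement per level 0 property."""
    return [
        f"m2100x:{name} rdfs:subPropertyOf {super_property} ."
        for name in properties
    ]

declarations = declare(LEVEL0_PROPERTIES)
print("\n".join(declarations))
```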
Note that we could create an intermediary rung on our ladder, say M00BKAud “Target audience (Language material)”, to aggregate data at the material category level, and then declare a sub-property relationship with M00Aud to aggregate to the category-free level. There is no specific use-case for this at the moment. If the need arises, this can be done without affecting the existing sub-property relationships and entailments, because the subPropertyOf property is transitive: P1 rdfs:subPropertyOf P2 and P2 rdfs:subPropertyOf P3 entails P1 rdfs:subPropertyOf P3.
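To make the transitivity concrete, here is a small sketch that follows sub-property links upwards. It assumes the declarations are held in a dictionary mapping each property to its direct super-properties; super_properties is an illustrative helper, and M00BKAud is the hypothetical intermediary rung mentioned above.

```python
def super_properties(prop, sub_of):
    """Return every property reachable from prop via rdfs:subPropertyOf."""
    seen = set()
    stack = [prop]
    while stack:
        current = stack.pop()
        for parent in sub_of.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen

# Direct declaration, as used so far.
direct = {"M006a05": {"M00Aud"}}

# The same property routed through the hypothetical intermediary rung.
with_rung = {"M006a05": {"M00BKAud"}, "M00BKAud": {"M00Aud"}}

supers_direct = super_properties("M006a05", direct)
supers_with_rung = super_properties("M006a05", with_rung)
```

In both cases M00Aud is among the super-properties of M006a05, so the entailed “Target audience” triples are the same with or without the extra rung.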
Our ladder “dumbs-up” the level 0 data; each sub-property entailment uses a higher-level property that is broader in semantics than the last. The ladders merge at each stage and are just one rung in length, so what we get is more like a climbing net to get to the higher-hanging fruit.
Applications can now deal with just one attribute property for Target audience and avoid the messiness at level 0. And there is just one property to align and map to corresponding properties from other bibliographic metadata schemas …