Note: this is being posted simultaneously on two blogs: Metadata Matters and Coyle’s InFormation

“Why don’t libraries just use FOAF for their Person metadata? Why do they insist on creating their own?”

We have lost count of how many times we have heard this on various lists. It is often not really posed as a question; in other words, it isn’t asking for an explanation of why libraries do not choose to use FOAF. It’s more rhetorical, along the lines of “Why can’t we all just get along?” But it deserves to be asked as a real question, and to get a real answer.

[Note first that the question of FOAF comes up not so much in relation to current library standards as in discussions of upcoming standards that will hopefully be based on the FR** family of standards (FRBR, FRAD, FRSAR).]

A comparison of FOAF Person and the library Person entity (whether in MARC authority files, RDA, or FRAD) shows that the two do not have a single defined element (or “property”, as it is called in Semantic Web-ese) in common. This is not a coincidence; the two vocabularies serve significantly different communities and purposes. That does not mean they are irreconcilable, so the question becomes: what keeps them apart, and can that be overcome?

The key is in the nature of the two communities.

FOAF stands for ‘Friend of a Friend’, which is a clue to its context: the schema is primarily for use in social networking. Its focus is on people who are alive and online, and it includes online contact information such as email addresses, websites, workplace websites, Facebook IDs, Skype IDs, and so on. FOAF does not treat a person’s name as an identifier; instead, it presumes that the name plus one or more of the contact IDs is enough to distinguish most people from one another.
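The FOAF assumption that contact details, not names, do the identifying can be sketched in a few lines of Python. FOAF declares foaf:mbox (a mailbox URI) to be an inverse functional property, so tools commonly merge (“smush”) descriptions that share one. This is a minimal sketch: the records and blank-node IDs are hypothetical, and plain dictionaries with `foaf:`-prefixed keys stand in for real RDF.

```python
from collections import defaultdict

# Hypothetical FOAF-style descriptions; keys are shorthand for FOAF properties.
records = [
    {"id": "_:a", "foaf:name": "Jane Smith", "foaf:mbox": "mailto:jane@example.org"},
    {"id": "_:b", "foaf:name": "J. Smith", "foaf:mbox": "mailto:jane@example.org",
     "foaf:skypeID": "jsmith42"},
    {"id": "_:c", "foaf:name": "Jane Smith", "foaf:mbox": "mailto:jsmith@example.com"},
]


def smush_by_mbox(records):
    """Group descriptions that share a foaf:mbox value.

    A shared mailbox is taken to mean "same person"; a shared foaf:name
    on its own never causes a merge.
    """
    groups = defaultdict(list)
    for rec in records:
        mbox = rec.get("foaf:mbox")
        if mbox is None:
            continue  # nothing identifying to merge on
        groups[mbox].append(rec["id"])
    return list(groups.values())


print(smush_by_mbox(records))
# [['_:a', '_:b'], ['_:c']]
```

Note that the two “Jane Smith” descriptions stay apart while “J. Smith” merges with one of them: in FOAF the name is descriptive, and the contact ID carries the identity.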

Library name data (a form of controlled vocabulary, called “name authority data” in library terms) is focused on creating a unique identifier that brings together, under one form, the different forms of a name used in published materials. Library users can therefore expect to find all of the works by or about a named person under a single entry, regardless of the various forms of the name that appear on those works. Uniqueness of names is enforced by adding information to a non-unique name, usually the year of birth; when that isn’t known (especially for persons of antiquity), a title or even an area of endeavor (“poet”) can be added instead.
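The two jobs described above, disambiguating a name and gathering variant forms under it, can be sketched in Python. This is a hypothetical simplification, not actual cataloging rules: the `Person` fields, the order in which additions are tried, and the heading punctuation are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str                        # preferred form, e.g. "Smith, John"
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    qualifier: Optional[str] = None  # title or area of endeavor, e.g. "poet"


def authorized_heading(p: Person, existing: set) -> str:
    """Build a heading that does not collide with headings already in use.

    Illustrative rule only: keep the bare name if it is free, otherwise add
    dates, and fall back to a qualifier when no dates are known.
    """
    if p.name not in existing:
        return p.name
    if p.birth_year is not None:
        death = "" if p.death_year is None else str(p.death_year)
        candidate = f"{p.name}, {p.birth_year}-{death}"
    elif p.qualifier is not None:
        candidate = f"{p.name} ({p.qualifier})"
    else:
        raise ValueError(f"need dates or a qualifier to distinguish {p.name!r}")
    if candidate in existing:
        raise ValueError(f"heading {candidate!r} is still not unique")
    return candidate


def collocate(works, see_from):
    """Group works under one heading, whatever name form was printed on them.

    `works` is a list of (title, printed_name) pairs; `see_from` maps each
    variant form to its authorized heading.
    """
    index = defaultdict(list)
    for title, printed_name in works:
        index[see_from.get(printed_name, printed_name)].append(title)
    return dict(index)


# An earlier John Smith already holds the bare name.
h = authorized_heading(Person("Smith, John", birth_year=1972), {"Smith, John"})
print(h)  # Smith, John, 1972-

print(authorized_heading(Person("Sappho", qualifier="poet"), {"Sappho"}))
# Sappho (poet)

see_from = {"J. Smith": h, "John Smith": h}
works = [("Work A", "J. Smith"), ("Work B", "John Smith"), ("Work C", h)]
print(collocate(works, see_from))
# {'Smith, John, 1972-': ['Work A', 'Work B', 'Work C']}
```

The contrast with the FOAF sketch is the point: here the name string itself is engineered into an identifier, and variant forms are pointers to it.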

To accommodate both the FOAF (social) function and the libraries’ identification function, libraries would at the very least need to define a subclass of foaf:Person, one with a stricter definition and usage. However, designating the library “Person” as more specific than foaf:Person does not require that the two be in the same vocabulary. That is one of the important features of Semantic Web vocabularies: like any other resource, their classes and properties can be linked and related to other resources on the Web.
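To make the cross-vocabulary point concrete, here is a minimal Python sketch using plain tuples as RDF triples instead of an RDF library. The library namespace `http://example.org/library#` and the authority record URI are hypothetical; only the RDF, RDFS, and FOAF URIs are real. The subclass statement lives in the library’s own data, yet a consumer applying the standard RDFS subclass rule still concludes that the library’s Person is a foaf:Person.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
LIB_PERSON = "http://example.org/library#Person"      # hypothetical class
RECORD = "http://example.org/authority/123"           # hypothetical record

graph = {
    # Asserted in the (hypothetical) library vocabulary, not in FOAF itself.
    (LIB_PERSON, SUBCLASS_OF, FOAF_PERSON),
    # An authority record typed only with the library class.
    (RECORD, RDF_TYPE, LIB_PERSON),
}


def infer_types(triples):
    """Apply the RDFS subclass rule until no new rdf:type triples appear.

    Repeating the rule also follows chains of subclasses (A -> B -> C).
    """
    inferred = set(triples)
    while True:
        new = {
            (s, RDF_TYPE, sup)
            for (s, p, c) in inferred if p == RDF_TYPE
            for (c2, p2, sup) in inferred if p2 == SUBCLASS_OF and c2 == c
        }
        if new <= inferred:
            return inferred
        inferred |= new


print((RECORD, RDF_TYPE, FOAF_PERSON) in infer_types(graph))  # True
```

Neither vocabulary had to be edited or merged; one triple in the library’s data is enough to relate the two.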

Why not combine the library and FOAF properties into a single metadata vocabulary? The answer has little to do with technology and much to do with how communities function. Metadata standards need to be developed by (and for) actual communities. The FOAF and library communities clearly have different needs and goals, and they work with fundamentally different use cases. They also differ significantly as communities.

FOAF is being developed by an informal group of developers, and is quite recent in origin. The group is small: the FOAF development email list has about 350 members. Another 350 individuals are listed on the FOAF wiki pages as having a FOAF profile available on the Web. This is obviously not the full extent of FOAF usage, but these numbers reflect the recent development of this kind of metadata.

The library community has hundreds of years of investment in the creation of metadata (even though it was not called that when libraries began to create it). There are at least tens of thousands of libraries in the world, many of which have existed for centuries. Library data has its origins in early 19th-century book catalogs and has been created in machine-readable form since the late 1960s. It is created following formal rules governed in part by international agreements, and many hundreds of millions of machine-readable bibliographic records based on these cataloging principles exist today.

Libraries have engaged in widespread data sharing for centuries, and with today’s global networks they are able to exchange and re-use data on a huge scale. Rather than each library creating metadata for the same book or item, libraries share the metadata created by one library through cooperative efforts oriented toward resource sharing and efficiency.

This sharing is built into the very core of library data management. The ability to use data created by others is supported by standards, and those standards form the basis of library systems. While most users see only the public library catalog, that is just one function of a system that also supports purchasing, fund accounting, inventory control, circulation and patron management, and collection analysis. In the Western world, these systems are created and maintained not by libraries but by a small number of specialized commercial vendors, whose products are built for library customers using agreed-upon library standards. The very same system can therefore be sold to hundreds or thousands of libraries, creating a viable market for system development.

A number of the 70,000 libraries contributing to OCLC use a single standard, MARC 21, and others follow international standards such as ISBD (International Standard Bibliographic Description), which produces standardized bibliographic descriptions. The development of these standards is based on a large-scale community process with international participation. It is not a perfect process by any means, and it clearly must be updated to meet modern needs and the new technologies that have changed the way we work, but the degree of data sharing libraries depend on requires a formal process to support the standards of this community.

Large-scale data sharing is necessitated by the economic reality of the library sector. Libraries face shrinking budgets while coping with rising demand for their services. Realistically, this means that changes to library data must be carefully coordinated to minimize disruption to the complex network of data sharing that makes cost-effective, data-based library service management possible. Libraries may appear to be mistrustful of change agents, and in some cases they certainly are, but there is a real need to minimize risk for the community as a whole in order to ensure the health of these often financially fragile institutions.

So we come back to the question of libraries and FOAF. In the final analysis, we’re not at all sure there is much to gain from trying to combine these two approaches, given the differences in their communities and functions. It could be like trying to mix oil and water, requiring compromises that in the end would satisfy neither community. One could argue that the difference between the vocabularies and their contexts is a positive, allowing more than one view of the Person entity. With two separately maintained metadata vocabularies, anyone creating metadata can choose from either as needed without sacrificing precision. One can also imagine other views arising, such as Persons in medical or financial data, each carrying data elements that are in neither FOAF nor library data, from blood type to bank balance. The important thing is to make sure that these vocabularies are properly described and related to each other where possible. That way, each community can manage its own process according to its own needs for standards integration, while data can still be shared where appropriate.

We could begin with a more detailed discussion between the FOAF and library communities about their metadata needs. With hundreds of years of experience in representing names in library catalogs, we are confident that the library community’s knowledge could contribute more generally to the use of personal names in the Semantic Web.

By Diane Hillmann, September 10, 2010, 6:20 pm (UTC-5)

Currently 4 comments

  1. Comment by Bruce D'Arcus

    Karen (Coyle):

    “yes, the goals of libraries are library-centric. That should not surprise anyone, and will be the case as long as libraries exist. Try this one on for size: The goals of banks are banking-centric. Makes sense, doesn’t it?”

    Maybe I was just being loose with language, but my point is that there are different communities of practice around finding bibliographic stuff (articles, videos, books, etc.), and that some of the traditions that have developed within the library world to achieve this task may require a rethink in 2010. The privileging of authoritative names as distinct from people is an obvious example.

    I agree with you that it might make sense to link, say, names or personas to some FOAF descriptions.

    On:

    “As we say in our post, in the case of FOAF there are NO properties that are the same as library Person properties.”

    So you’re really saying that a foaf:Person is completely orthogonal to a library:Person: they’re different things? And you need no property that describes the person’s name?

  2. Comment by Karen Weaver

    In the first comment, KCoyle writes that the “goals of libraries are library-centric.” In the context of the post above, I’m not sure I completely agree: it implies that libraries are not taking broader access into consideration, and I would disagree with that narrow view of libraries’ goals as purely library-centric. Enabling change should not equate with library-centric tunnel vision either.
    Just something that caught my eye while reading this post. Cheers, KW

  3. Comment by Karen Coyle

    Bruce,

    1) yes, the goals of libraries are library-centric. That should not surprise anyone, and will be the case as long as libraries exist. Try this one on for size: The goals of banks are banking-centric. Makes sense, doesn’t it?

    2) Re-use of vocabularies is a GOOD THING, but only works when there are properties that you can use as defined. As we say in our post, in the case of FOAF there are NO properties that are the same as library Person properties. We probably should have said it here, but I have often advocated that library Person entities for living persons could and should link to a person’s FOAF profile to connect the library user to the social networking information provided by the person. But that is very different from saying that libraries should use foaf:Person to describe persons, or foaf:name or foaf:title for library cataloging data. One should only use FOAF properties in a way that is fully compatible with the definition of those properties.

  4. Comment by Bruce D'Arcus

    Your description of the problem you’re trying to solve reflects a peculiarly library-centric formulation, and even the way you describe the problem assumes a particular kind of solution.

    It seems to me a more neutral way to put it is to say that you (and in particular your users) need to be able to relate people to their publications (etc.). For example, a user needs to be able to find all of the publications by a person whose printed name may vary, or to distinguish items by people with the same name.

    I don’t know what practical difference the distinction I am drawing here about problem formulation makes, but it may be quite profound.

    Second, I believe your characterization of the social and technical issues involved in merging, integrating or adapting FOAF to library needs is a bit off. RDF pretty much encourages vocabulary reuse, which by definition means you can mix and match terms without any coordination between communities. For example, with the BIBO vocabulary I helped design, we deferred to DC and FOAF as much as possible. In some cases, that wasn’t enough. But that didn’t hold us back.

    Practically speaking, then, if you want to add biographical information to a FOAF description, that’s trivial to do (there’s a vocabulary for that: bio). Dealing with different name forms may be a little more difficult, but it is possible. Pseudonyms are a particularly interesting case that I’ve thought about, and I came to the conclusion (IIRC; it’s been a while) that it could probably be done with a simple property and maybe a new class.
