Persistent Object Identifiers (POIDs)

Through an email from Andrew Treloar @atreloar (Australian National Data Service – ANDS), I learned about a workshop on persistent identifiers held in the Netherlands last month. Organised by the Knowledge Exchange, the seminar set out to pay:

“attention to the usage of PIDs for publications, and increasingly for data, and for combinations of text, media and data. Also the relation with Author Identifiers was discussed. Standardisation and specifications for transparency between systems was addressed.  In break out sessions participants discussed the benefits and challenges in operating multiple persistent identifier systems and the relation of persistent identifiers to Linked Data.”

Numbered | howtodesign | CC BY-NC-ND

This grabbed my attention because of some of the semantic and technical discussions at #lodlam back in May, and because of the architectural conundrums facing linked open data enthusiasts.

“more than 40 experts involved in various Persistent Object Identifier (POID) communities met for a Knowledge Exchange seminar to discuss the challenges and opportunities involved in interoperability between multiple PID-systems.  Three major systems – Handle, URN:NBN and DOI – presented their current state of affairs and examples of their systems in practice….”

The presentations from this seminar are online and provide some food for thought for techies thinking about how to set up IDs in linked open data systems.

So I figure this community, if it isn’t already aware of this discussion, might like to be. I know that many people undertaking ANDS-funded projects are trying to get their heads around which identifier systems to use, and a heap of documentation has been made available on the ANDS website to support them. There is information to guide those new to system identifiers; there are several pages on persistent identifiers pitched at the newbie, the familiar and the expert; and there is a focused page on DOIs (Digital Object Identifiers).

If you’re interested in knowing more about the Party Infrastructure soon to be launched in Australia through the National Library of Australia, keep an eye on the NLA Party Infrastructure project wiki.

I hope some of this information comes in handy!

Ingrid @1n9r1d

Report on the LOD-LAM Summit at ‘Linked Data and Libraries 2011’

Earlier today I gave a short report on the LOD-LAM Summit at the Talis ‘Linked Data and Libraries 2011’ event held at the British Library in London. I’ve embedded the slides here:

There are quite a few photos from the event in the slides, as well as some from the Internet Archive visit on the last slide. I drew attention to the breakout group notes on the Pirate Pad as the place to go to find out more, as well as the Summit blog of course. There was clearly interest in the #lodlam London event (#lodlamlon ?) that Mia Ridge and I have been talking about. Mia has taken the lead on pulling this together via the meet-up page.

I also noted Antoine Isaac’s suggestion that the #lodlam community might be able to pick up the activities of the W3C Library Linked Data Incubator Group, which is about to finish.

There’s a write-up of my report and the other talks from the event on Owen Stephens’ ‘Overdue Ideas’ blog.

Cheers, Ade

Proposed: a 4-star classification-scheme for linked open cultural metadata

One of the outcomes of last week’s LOD-LAM Summit was a draft document proposing a new way to assess the openness/usefulness of linked data for the LAM community. This is a work in progress, but is already provoking interesting debate on our options as we try to create a shared strategy. Here’s what the document looks like today, and we welcome your comments, questions and feedback as we work towards version 1.0.

*******************************************************************

DRAFT

A 4 star classification-scheme for linked open cultural metadata

Publishing openly licensed data on the Web and contributing to the Linked Open Data ecosystem can have a number of benefits for libraries, archives and museums:

  1. Driving users to your online content (e.g., by improved search engine optimization);
  2. Enabling new scholarship that can only be done with open data;
  3. Allowing the creation of new services for discovery;
  4. Stimulating collaboration in the library, archives and museums world and beyond.

To achieve these benefits, libraries, museums and archives face decisions about releasing their metadata under various open terms. Making metadata open and useful as linked data requires deliberate design choices, and systems must be built from the beginning with openness and utility in mind. To be useful to third parties, all metadata made available online must be published under a clear rights statement.

This 4-star classification system arranges those rights statements (e.g. licenses or waivers) that comply with the relevant conditions (2-11) of the open knowledge definition (version 1.1) in order of openness and usefulness: the more stars, the more open the metadata is and the easier it is to use in a linked data context. Libraries, archives and museums wanting to contribute to the Linked Open Data ecosystem should strive to make their metadata available under the most open instrument they are comfortable with, so as to maximize the data’s usefulness to the community.

Note: This system assumes that libraries, archives and museums have the required rights over the metadata to make it available under the waivers and licenses listed below. If the metadata you want to make available includes external data (for example vocabularies), you may be constrained by contract or copyright from releasing the data under one of the licenses below.

★★★★ Public Domain (CC0 / ODC PDDL / Public Domain Mark)

as a user:

  • metadata can be used by anyone for any purpose
  • permission to use the metadata is not contingent on anything
  • metadata can be combined with any other metadata set (including closed metadata sets)

as a provider:

  • you are waiving all rights over your metadata so it can be most easily reused
  • you can specify whether and how you would like acknowledgement (attribution or citation, and by what mechanism) from users of your metadata, but it will not be legally binding

This option is considered best since it requires the least action by the user to reuse the data, and to link or integrate the data with other data. It supports the creation of new services by both non-commercial and commercial parties (e.g. search engines), encourages innovation, and maximizes the value of the library, archive or museum’s investment in creating the metadata.

★★★ Attribution License (CC-BY / ODC-BY) when the licensor considers linkbacks to meet the attribution requirement

as a user:

  • metadata can be used by anyone for any purpose
  • permission to use the metadata is contingent on providing attribution by linkback to the data source
  • metadata can be combined with any other metadata set, including closed metadata sets, as long as the attribution link is retained

as a provider:

  • you get attribution whenever your data is used

This option meets the definition of openness, but constrains the user of the data by requiring them to provide attribution (in the legal sense, which is not the same as citation in the scholarly sense). Here, attribution is satisfied by a simple, standard Web mechanism from the new data product or service. By using standard practice such as a linkback, attribution is satisfied without requiring the user to discover which attribution method is required and how to implement it for each dataset reused. Note that there are other methods of satisfying a legal attribution requirement (see below) but here we propose a specific mechanism that would minimize the effort needed to use the data if the LAM community collectively agrees to it. Also note that even this simple (ideally shared) attribution method could prevent some applications of linked data if linkbacks are required by many datasets from many sources.

★★ Attribution License (CC-BY / ODC-BY) with another form of attribution

as a user:

  • metadata can be used by anyone for any purpose
  • permission to use the metadata is contingent on providing attribution in a way specified by the provider
  • metadata can be combined with any other metadata set (including closed metadata sets)

as a data provider:

  • you get attribution whenever your data is used by the method you specify

This option meets the definition of openness in the same way as the linkback attribution option, but requires the user to provide attribution in some way other than a linkback, as specified by the data provider. The provider could specify an equally simple mechanism (e.g. retention of another field, such as ‘creator’, from the original metadata record) or a more complex mechanism (e.g. a scholarly citation in a Web page connected to the new data product or service). The disadvantage of this option is that the user must discover what mechanism the particular data provider wants and how to comply with it, potentially needing a different mechanism for each dataset reused. For large-scale open data integration (e.g. mashups) this option is difficult to implement.

★ Attribution Share-Alike License (CC-BY-SA/ODC-ODbL)

as a user:

  • metadata can be used by anyone for any purpose
  • permission to use the metadata is contingent on providing attribution in a way specified by the provider
  • metadata can only be combined with data that allows redistribution under the terms of this license

as a provider:

  • you get attribution whenever your data is used
  • you only allow use of your data by entities that also make their data available for open reuse under exactly the same license

This option meets the definition of openness but potentially limits reuse of data when more than one dataset is reused and each dataset carries a Share-Alike license. Under a Share-Alike license, the only way to legally combine two datasets is if they share exactly the same SA license, since most SA licenses require that reused data be redistributed under exactly the same license. If the source datasets had different Share-Alike licenses originally (e.g. CC-BY-SA and ODC-ODbL), there is no way for the user to comply with the requirements of both source data licenses, so this option only allows users to link or integrate data distributed under one particular SA license (or one SA license and any of the other license or waiver options above). In the LAM domain, where significant value is created by combining datasets, the Share-Alike requirement severely reduces the utility of a dataset.
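To make the combination rule above concrete, here is a minimal Python sketch. It assumes licenses are identified by simple text labels taken from the scheme; the label strings and the helper name `combined_license` are hypothetical, not part of any official registry. It reduces compatibility to the single rule stated above: any number of public domain or attribution-only inputs, but at most one distinct Share-Alike license. Real-world compatibility also depends on license versions and other terms, which this sketch ignores.

```python
# Hypothetical labels for the licenses and waivers named in the draft scheme.
SHARE_ALIKE = {"CC-BY-SA", "ODC-ODbL"}
PERMISSIVE = {"CC0", "ODC-PDDL", "PDM", "CC-BY", "ODC-BY"}


def combined_license(licenses):
    """Return the Share-Alike license a combined dataset must carry,
    None if no Share-Alike input is involved, or raise ValueError if
    the inputs carry different Share-Alike licenses (the conflict case)."""
    labels = set(licenses)
    unknown = labels - SHARE_ALIKE - PERMISSIVE
    if unknown:
        raise ValueError(f"unknown license(s): {sorted(unknown)}")
    # Only the distinct Share-Alike licenses constrain the combination.
    sa = labels & SHARE_ALIKE
    if len(sa) > 1:
        raise ValueError(f"cannot satisfy more than one SA license: {sorted(sa)}")
    # Zero SA inputs -> None; exactly one -> that license governs the result.
    return next(iter(sa), None)


if __name__ == "__main__":
    print(combined_license(["CC0", "CC-BY"]))     # None: no Share-Alike input
    print(combined_license(["CC-BY-SA", "CC0"]))  # CC-BY-SA governs the result
```

Calling `combined_license(["CC-BY-SA", "ODC-ODbL"])` raises a `ValueError`, which mirrors the case described above where no user can comply with both source licenses at once.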

Related Material

LODLAM Reading Lists

We’ve got a wide variety of participants coming to the LOD-LAM Summit, so suggesting a reading list is kind of tough. Keep in mind that participants include technology staff, policy makers, developers, librarians, digital humanists, hackers and everyone in between. I’m going to throw out some of my favorite books and articles, but please add more in the comments as this is by no means exhaustive. And if a lot of these names look familiar, it’s because you’ve seen them on the participant list for the Summit.

LODLAM Guides
Open Bibliographic Data Guide. This guide from JISC focuses more on open than on linked data, but it’s a critical first step toward Linked Open Data.

Linked Data primers (books)
Programming the Semantic Web, Toby Segaran, Colin Evans, & Jamie Taylor. 2009. Great primer on graphs and plenty of example code.

Linked Data: Evolving the Web into a Global Data Space, Tom Heath and Christian Bizer. 2011. This is a great book, recently released, that provides a concise, in-depth exploration of Linked Data, from conceptual overview to recipes for publishing data.

Licensing and Copyright
Rights and Licensing from JISC Open Bibliographic Data Guide. Recommendations for publishing Open Data for Libraries.

Digital Cultural Collections in an Age of Reuse and Remixing, Kristin R. Eschenfelder and Michelle Caswell. Nov. 2010. This study examines the various views and considerations of cultural institutions in allowing reuse of digital cultural works. It’s based on a 2008 survey conducted, in my opinion, right at the turning point of a rather radical cultural shift toward opening metadata for reuse and sharing.

Recommendations for independent scholarly publication of data sets, Jonathan Rees. March 2010. These recommendations come from the perspective of the sciences but can equally be applied to the humanities, and they embody the shift toward sharing data for future use and portability.

I’m sure there are more articles to add to this list, so please feel free to add them in the comments.