Notes from the Preserving Linked Data session

Only four participants! Antoine Isaac (Europeana), Romain Wenz (BnF), Ryan Donahue (Met), Cate O’Neill (Find&Connect)

Apparently there are more urgent issues for LODLAM to solve.
In fact, the issues are similar to those raised about the WWW long ago. As the WWW has survived them, maybe LD can survive them too. Preservation does, however, seem tricky for ‘reference’ datasets. And what happens when you re-use others’ data?

Some (only slightly curated) bullet points:

– Basic issue: allowing decentralized data access and use means preservation has to go beyond the basic requirement of persistent URIs. Data/links can change!

– Handling updates is similar to what happens for historical place names in catalogues (the scope of “The Netherlands” as of 1821, as opposed to later).

– Preserving context: keeping different levels of truth, different parts of the provenance (time and data producers)

– RDF triples make time and data provenance tricky to represent, unless we go for quadruples or versioned URIs (which have their disadvantages). BnF tracks provenance more or less manually (on demand).
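
To make the quadruple option concrete, here is a minimal sketch in plain Python (no RDF library). It assumes that each version of the data lives in its own named graph and that time and producer are recorded once per graph. All URIs, names and dates are hypothetical and were not discussed in the session.

```python
from collections import namedtuple
from datetime import date

# A quad is a triple plus a fourth element naming the graph it belongs to.
Quad = namedtuple("Quad", "subject predicate obj graph")

# Hypothetical URIs, for illustration only.
PERSON = "http://example.org/person/1"
NAME = "http://example.org/vocab/preferredName"
GRAPH_2011 = "http://example.org/graph/2011"
GRAPH_2013 = "http://example.org/graph/2013"

quads = [
    Quad(PERSON, NAME, "Old Name", GRAPH_2011),
    Quad(PERSON, NAME, "New Name", GRAPH_2013),
]

# Provenance (producer, time) is attached to the graph, not to each triple.
graph_info = {
    GRAPH_2011: {"producer": "Library A", "issued": date(2011, 6, 1)},
    GRAPH_2013: {"producer": "Library A", "issued": date(2013, 6, 1)},
}

def statements_as_of(quads, graph_info, when):
    """Return the quads of the most recent graph issued on or before `when`."""
    eligible = [g for g, info in graph_info.items() if info["issued"] <= when]
    if not eligible:
        return []
    latest = max(eligible, key=lambda g: graph_info[g]["issued"])
    return [q for q in quads if q.graph == latest]
```

Calling `statements_as_of(quads, graph_info, date(2012, 1, 1))` returns only the 2011 statement. The cost is that every consumer has to know about the graph layer, which is one of the disadvantages mentioned above.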

– Serve representations (data) for which “versions” of a resource (URI)? A “historical GET”, comparable to Memento (www.mementoweb.org), would be of interest.
Basic solution: no versioned URIs for the resource, but keep track of the different versions of its representations (RDF data, HTML page). data.bnf.fr uses the Internet Archive to archive its representations (just one canonical representation – RDF/XML – for each URI).
Creating a dataset of datasets to find their archives again?
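
As a rough illustration of what a “historical GET” looks like in Memento: the client sends an ordinary GET to a TimeGate and adds an `Accept-Datetime` header carrying the desired date in HTTP date format. The sketch below only builds that header and sends nothing. The function name is made up for this example, and the choice of TimeGate is left open.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def memento_headers(when):
    """Build the request headers for a Memento-style datetime negotiation.

    `when` must be a timezone-aware datetime; it is converted to GMT,
    as HTTP dates require.
    """
    if when.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": http_date}

# Ask for the state of a resource as of 1 January 2012.
headers = memento_headers(datetime(2012, 1, 1, tzinfo=timezone.utc))
```

These headers would then be sent with a GET to a TimeGate for the resource, which redirects to the archived representation closest to that date.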

– How to decide what to preserve/give access to? Everything/every version? What linked data users probably want to get is the “best” data for the identifier. And it may change! E.g., demoting some names in authorities from preferred to alternative.
BnF has some cases where people ask to remove data (birth dates, attributions that are not good for their reputation). In such cases, it’s not really desirable to even keep track of historical data in the authoritative service.
Should we mint/re-use URIs or HTTP codes for saying that data was removed?
Note: cf. OAIS: preservation success is success *for humans*!
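
On the HTTP side, one existing candidate is status 410 (Gone), which HTTP defines for resources that are intentionally no longer available. Whether it is the right signal for withdrawn linked data was left open in the session. The sketch below assumes a hypothetical registry of minted identifiers and only shows how a service could tell “never existed” apart from “deliberately removed”.

```python
from http import HTTPStatus

# Hypothetical registry of identifiers the service has minted.
records = {
    "/person/1": {"withdrawn": False},
    "/person/2": {"withdrawn": True},  # data removed on request
}

def status_for(path):
    """Pick an HTTP status for a request to `path`."""
    record = records.get(path)
    if record is None:
        return HTTPStatus.NOT_FOUND  # 404: identifier was never minted
    if record["withdrawn"]:
        return HTTPStatus.GONE       # 410: identifier existed, data removed
    return HTTPStatus.OK             # 200: data is served as usual
```

Here the identifier itself stays registered even though nothing is served for it, so it is never re-used for something else.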

– Examples of linked data that was not preserved?
Probably some Talis datasets.

– Misc. remarks on persistent identifiers.
A trick to preserve identifiers is to embed identifiers inside other identifiers. But this needs some resolver service!
URI design: problem of meaning attached to the URI. We need to separate the description function from the identification function.
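
A small sketch of the embedding trick, assuming a hypothetical resolver at `resolver.example.org` and an opaque local identifier scheme. The resolver's lookup table is the only place that knows where the data currently lives, so the location can change without the identifier changing.

```python
# Hypothetical resolver base URI; not a real service.
RESOLVER_BASE = "http://resolver.example.org/"

# Lookup table maintained by the resolver: opaque identifier -> current location.
locations = {
    "id:12345/abc": "http://data.example.org/current/abc",
}

def embed(identifier):
    """Wrap an opaque identifier inside a resolvable HTTP URI."""
    return RESOLVER_BASE + identifier

def resolve(uri):
    """Extract the embedded identifier and look up its current location."""
    if not uri.startswith(RESOLVER_BASE):
        raise ValueError("not a URI minted by this resolver")
    identifier = uri[len(RESOLVER_BASE):]
    return locations.get(identifier)  # None if the identifier is unknown
```

Because the embedded identifier carries no descriptive meaning, this also keeps identification apart from description.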
