What is Europeana doing with semantic web and linked open data?

We’ve received this question several times, especially since we’ve been claiming for a while that linked data is important to us, for example in this whitepaper, and have tried to encourage its adoption in our domain, for example with this video.

The coming LODLAM summit provides a good reason to blog about it. Perhaps this can help the community discuss, understand and guide how organisations like ours can use linked data…

First, of course, we’ve experimented with a “real” Linked Open Data service, data.europeana.eu. But it’s still in a pilot phase, not yet strictly aligned with our production system; that will happen in the coming months. What I want to discuss here, however, are two other features, which are in the production service but more hidden, and which show that there’s more to the LOD vision than applying the entire technical stack at once, as it is often presented in books.

The foundation for adopting the LOD vision at Europeana is in fact quite deep: a new data model, EDM. As opposed to what Europeana had been based on before (plain flat records), the new model encourages providers to send richer, networked metadata. A good thing is that even before the model was implemented, it had a certain effect on the way we interact with data providers. The design phase, especially, was a collaborative effort trying to accommodate fundamental requirements from our library, archive and museum stakeholders. And they rather liked that we would try to handle their data better.

Currently we’re rolling the model out slowly, with the first providers sending data (for example this one), one enhancement at a time (for example, hierarchical objects are due soon).

For our production service, we still ingest data as XML (even though our XML schema actually specifies a form of RDF/XML). We store the data in a non-RDF database (MongoDB), and we’ll probably keep doing so for a while. But the storage layer has been designed to follow the principles of the new model. In the same fashion, our search portal and API still use Lucene/Solr. Still, basic tweaks allow us to emulate some semantic search functions, such as query expansion using semantic hierarchies or concepts with translated labels.
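To give a flavour of what such query expansion means, here is a miniature sketch in Python. The concepts, labels and hierarchy are entirely made up for illustration; the real system works against its Lucene/Solr index, not in application code like this.

```python
# A tiny in-memory "thesaurus" (illustrative data, not Europeana's):
# concept URI -> multilingual labels and broader concepts.
CONCEPTS = {
    "http://example.org/concepts/violin": {
        "labels": {"en": "violin", "fr": "violon", "de": "Geige"},
        "broader": ["http://example.org/concepts/string-instrument"],
    },
    "http://example.org/concepts/string-instrument": {
        "labels": {"en": "string instrument", "fr": "instrument à cordes"},
        "broader": [],
    },
}

def expand_query(term: str) -> list[str]:
    """Expand a query term into the other labels of the matching concept
    (translations) plus the labels of its broader concepts (hierarchy)."""
    expanded = [term]
    for concept in CONCEPTS.values():
        if term in concept["labels"].values():
            expanded.extend(l for l in concept["labels"].values() if l != term)
            for broader_uri in concept["broader"]:
                expanded.extend(CONCEPTS[broader_uri]["labels"].values())
    # de-duplicate while keeping order
    seen, result = set(), []
    for label in expanded:
        if label not in seen:
            seen.add(label)
            result.append(label)
    return result

print(expand_query("violin"))
```

A search for “violin” thus also matches records described with “violon”, “Geige” or the broader “string instrument”, which is the effect the tweaks above emulate.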

This is in fact useful even for descriptions that are not provided to us as rich metadata with contextual resources (from thesauri, gazetteers, etc.): Europeana has started to do simple metadata enrichment using linked open data sources, especially GeoNames and GEMET.

Note that when a provider’s metadata refers to a contextual source published as Linked Open Data, such as this concept, we run a script to harvest it. This requires a manual mapping to fit the source’s model to the one we can ingest, but if the source comes in a standard model like SKOS, then we’re covered. Our providers may thus skip sending us data that is already available elsewhere. It also gives them more motivation to link their object metadata to external reference datasets themselves, which they can do better than we could.
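For a concept published in SKOS, the mapping step is quite mechanical. Here is a sketch, assuming a made-up concept and a simplified flat target structure (not our actual ingestion format); a real harvester would of course fetch the RDF/XML over HTTP from the concept’s URI.

```python
# Map a harvested SKOS concept (RDF/XML) into a flat structure for ingestion.
import xml.etree.ElementTree as ET

# Made-up example concept, in the standard SKOS vocabulary.
SKOS_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                       xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/concepts/etching">
    <skos:prefLabel xml:lang="en">etching</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">eau-forte</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/concepts/printmaking"/>
  </skos:Concept>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def map_concept(xml_text: str) -> dict:
    """Extract URI, multilingual labels and broader links from a SKOS concept."""
    root = ET.fromstring(xml_text)
    concept = root.find("skos:Concept", NS)
    return {
        "about": concept.get(RDF + "about"),
        "prefLabel": {
            el.get(XML_LANG): el.text
            for el in concept.findall("skos:prefLabel", NS)
        },
        "broader": [
            el.get(RDF + "resource")
            for el in concept.findall("skos:broader", NS)
        ],
    }

print(map_concept(SKOS_XML))
```

Because the source uses standard SKOS properties, the same mapping works for any SKOS vocabulary; only non-standard sources need a hand-written mapping first.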

One last work in progress is data publishing. For the enthusiasts, our API will also soon output JSON-LD, doing better justice to the new data model. And since this year, we also publish RDFa schema.org mark-up on all our europeana.eu portal pages, resulting for example in this data. This is still experimental, though. We are involved in the W3C Schema Bib Extend Group and hope this will help us keep our mark-up aligned with community expectations, and maybe, in fact, to learn better what those expectations are. Perhaps this can also be a good one to work on at LODLAM!
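To make the JSON-LD point concrete, here is a sketch of what a JSON-LD object description could look like. The keys, the identifier and the context are purely illustrative, not the actual output of our API.

```python
# Build and serialise a hypothetical JSON-LD description of an object.
import json

doc = {
    "@context": {
        "dc": "http://purl.org/dc/elements/1.1/",
        "edm": "http://www.europeana.eu/schemas/edm/",
    },
    "@id": "http://example.org/item/12345",  # placeholder identifier
    "@type": "edm:ProvidedCHO",
    "dc:title": "View of the Seine",
    "dc:creator": "Unknown",
}

print(json.dumps(doc, indent=2))
```

The point of JSON-LD is that this is plain JSON for an ordinary API client, while the @context makes every key resolvable to an RDF property, so the same response is also a linked data graph.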

Now, perhaps all this is not the full semantic technology stack, but it is bringing us somewhere, step by step…