June 2013 – LODLAM

One of the bottlenecks in getting museum data into the Linked Data Cloud is that the mapping step is hard. The Europeana and CRM ontologies are large and complicated, and it is difficult to map data from museum databases to them. For the last few years we’ve been working on tools that help people map their data to ontologies without programming and without writing scripts in languages such as XPath and XSLT. The tool is called Karma, and you can download it from http://isi.edu/integration/karma.

We would like to propose a session to show Karma. We have used it with datasets from several museums, and would like to show how we mapped the data from the Smithsonian American Art Museum to the Europeana ontology (41,000 objects and 8,000 artists) and how we linked it to DBpedia, the NY Times and several other datasets. We think that Karma makes the process much easier than other tools do, and we’d love to hear what you think and, we hope, give you tools that help you.
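To make concrete what "mapping" means here, this is a minimal Python sketch of the kind of output a mapping produces: one flat database row turned into a few RDF triples. It is not how Karma works internally (Karma builds the mapping interactively), and the record fields, the `example.org` URI patterns and the choice of `dc:title`, `dc:creator` and `foaf:name` are assumptions made for illustration.

```python
# Hypothetical URI base and record layout, for illustration only.
EX = "http://example.org/museum/"
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(record):
    """Turn one flat database row into a list of N-Triples lines."""
    obj = f"<{EX}object/{record['id']}>"
    artist = f"<{EX}artist/{record['artist_id']}>"
    return [
        f'{obj} <{DC}title> "{escape_literal(record["title"])}" .',
        f"{obj} <{DC}creator> {artist} .",
        f'{artist} <{FOAF}name> "{escape_literal(record["artist_name"])}" .',
    ]


if __name__ == "__main__":
    row = {"id": "1", "artist_id": "7", "title": "Untitled", "artist_name": "Jane Doe"}
    print("\n".join(record_to_ntriples(row)))
```

The hard part in practice is not emitting triples but deciding which ontology class and property each column corresponds to, which is the step a mapping tool is meant to help with.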

We presented a paper about this last month at the Extended Semantic Web Conference (ESWC) in Montpellier. You can get the paper at http://bit.ly/11X5YPo and the slides at http://slidesha.re/18vxMnn. I am very proud to say that we received the best in-use paper award for this work, and it makes me very happy that our work with the Smithsonian museum was recognized at the conference.

You can also browse the data on the SPARQL endpoint. We are using Pubby (the same thing DBpedia uses), but we are looking forward to getting better tools from you. So check it out: here is the page for John Singer Sargent.

Pedro Szekely

We’ve received this question several times, especially since we’ve been claiming for a while that linked data is important for us (for example in this whitepaper) and have tried to encourage its promotion in our domain (for example with this video).

The coming LODLAM summit provides a good reason to blog about it. Perhaps this can help discuss, understand and guide how organisations like ours can use it…

First, of course, we’ve played with a “real” Linked Open Data service, data.europeana.eu. But it’s still in its pilot phase, not yet strictly aligned with our production system. That will happen in the coming months. However, what I want to discuss here are other features, which are in the production service but more hidden, and which show that there’s more to the LOD vision than applying an entire technical stack at once, as it is often presented in the books.

The foundation for adopting the LOD vision at Europeana is in fact quite deep: a new data model, EDM. As opposed to what Europeana was based on before (plain flat records), the new model encourages providers to send richer, networked metadata. A good thing is that even before the model was implemented, it had an effect on the way we interact with data providers. The design phase, especially, was a collaborative effort trying to accommodate fundamental requirements from our library, archive and museum stakeholders. And they rather liked that we would try to handle their data better.

Currently we’re rolling the model out slowly, with the first providers sending data (for example this one), one enhancement at a time (for example, hierarchical objects are due soon).

For our production service, we still ingest data as XML (even though our XML schema actually specifies a form of RDF/XML). We store the data in a non-RDF database (MongoDB) and we’ll probably keep doing so for a while. But the storage layer has been designed to follow the principles of the new model. In the same fashion, our search portal and API still use Lucene/Solr. Even so, basic tweaks allow us to emulate some semantic search functions, such as query expansion using semantic hierarchies or concepts with translated labels.
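As a rough idea of what such a tweak can look like, here is a small Python sketch of query expansion over a concept hierarchy with multilingual labels. It is an illustration under assumptions, not our production code: the tiny thesaurus, the concept identifiers and the function names are all made up, and the result is just a Lucene-style `OR` query string.

```python
# Hypothetical mini-thesaurus: narrower-concept links and labels per language.
NARROWER = {
    "instrument": ["string_instrument"],
    "string_instrument": ["violin", "harp"],
}
LABELS = {
    "instrument": {"en": "musical instrument", "fr": "instrument de musique"},
    "string_instrument": {"en": "string instrument", "fr": "instrument à cordes"},
    "violin": {"en": "violin", "fr": "violon"},
    "harp": {"en": "harp", "fr": "harpe"},
}


def descendants(concept):
    """Return the concept followed by all transitively narrower concepts."""
    seen, stack = [], [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(NARROWER.get(current, []))
    return seen


def expand_query(term):
    """Expand a term into an OR query over the labels of matching concepts
    and of everything narrower than them, in every available language."""
    wanted = term.lower()
    matches = [
        concept
        for concept, labels in LABELS.items()
        if wanted in (label.lower() for label in labels.values())
    ]
    expanded = set()
    for concept in matches:
        for narrower in descendants(concept):
            expanded.update(LABELS[narrower].values())
    if not expanded:
        # No concept matched: fall back to the user's original term.
        return f'"{term}"'
    return " OR ".join(f'"{label}"' for label in sorted(expanded))


if __name__ == "__main__":
    print(expand_query("string instrument"))
```

A search for "string instrument" then also retrieves records that only mention "violon" or "harpe", without the index itself knowing anything about RDF.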

This is in fact useful even for descriptions that are not provided to us as rich metadata with contextual resources (from thesauri, gazetteers, etc.): Europeana has started to do simple metadata enrichment using linked open data sources, especially GeoNames and GEMET.

Note that when the provider’s metadata refers to a contextual source published as Linked Open Data, such as this concept, we run a script to harvest it. It requires a manual mapping to fit the source’s model to the one we can ingest, but if the source comes in a standard model like SKOS, then we’re covered. Our providers may thus skip sending us data that is already available elsewhere. It also gives them more motivation to link their object metadata to external reference datasets themselves, which they can do better than we would.
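The parsing half of such a harvesting step can be sketched in a few lines of Python. This is only an illustration, not the script we run: the sample concept and its `example.org` URIs are invented, the RDF/XML is inlined instead of fetched so that the sketch runs offline, and the parser only handles the simple case where each concept is written as a `skos:Concept` element with `rdf:about`.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical SKOS concept, inlined so the sketch needs no network access.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/concept/42">
    <skos:prefLabel xml:lang="en">harp</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">harpe</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/concept/7"/>
  </skos:Concept>
</rdf:RDF>"""


def parse_skos_concepts(rdf_xml):
    """Extract preferred labels and broader links for each skos:Concept."""
    concepts = {}
    root = ET.fromstring(rdf_xml)
    for node in root.iter(f"{{{SKOS}}}Concept"):
        uri = node.get(f"{{{RDF}}}about")
        labels = {
            el.get(XML_LANG): el.text
            for el in node.findall(f"{{{SKOS}}}prefLabel")
        }
        broader = [
            el.get(f"{{{RDF}}}resource")
            for el in node.findall(f"{{{SKOS}}}broader")
        ]
        concepts[uri] = {"prefLabel": labels, "broader": broader}
    return concepts


if __name__ == "__main__":
    print(parse_skos_concepts(SAMPLE))
```

Because SKOS fixes the property names, the same few lines work for any source that publishes its concepts this way, which is why no per-source mapping is needed in that case. A real harvester would use a proper RDF parser, since RDF/XML allows many equivalent serializations of the same graph.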

One last work in progress is data publishing. For those interested, our API will also soon spit out JSON-LD, doing better justice to the new data model. And since this year we also publish RDFa schema.org mark-up on all our europeana.eu portal pages, resulting for example in this data. This is still experimental though. We are involved in the W3C Schema Bib Extend Group and hope this will help us keep our mark-up aligned with community expectations, and maybe in fact get to know better what these expectations are. Perhaps this can also be a good one to work on for LODLAM!
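For readers who have not met JSON-LD yet, here is a minimal Python sketch of what serializing one object description could look like. It does not show the actual output of our API: the record fields, the `example.org` URIs and the function name are assumptions, and only the `dc:` and `edm:` namespaces and the `edm:ProvidedCHO` class come from the published vocabularies.

```python
import json


def to_jsonld(record):
    """Serialize a flat record (hypothetical field names) as a JSON-LD string."""
    doc = {
        "@context": {
            "dc": "http://purl.org/dc/elements/1.1/",
            "edm": "http://www.europeana.eu/schemas/edm/",
        },
        "@id": record["uri"],
        "@type": "edm:ProvidedCHO",
        "dc:title": record["title"],
        # The creator is a link to another resource, not a plain string.
        "dc:creator": {"@id": record["creator_uri"]},
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(to_jsonld({
        "uri": "http://example.org/item/1",
        "title": "Untitled",
        "creator_uri": "http://example.org/agent/7",
    }))
```

The point is that the result is still ordinary JSON for developers who do not care about RDF, while the `@context` lets those who do read it as linked data.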

Now, perhaps all this is not the full semantic technology stack, but it is bringing us somewhere, step by step…

Connecting the Smithsonian American Art Museum to the Linked Data Cloud

What is Europeana doing with semantic web and linked open data?