We’ve received this question several times, especially since we’ve been claiming for a while that linked data is important for us (for example in this whitepaper) and have tried to encourage its promotion in our domain (for example with this video)…
The coming LODLAM summit provides a good reason to blog about it. Perhaps this can help discuss, understand and guide how organisations like ours can use it…
First, of course, we’ve played with a “real” Linked Open Data service, data.europeana.eu. But it’s still in the pilot phase and not yet strictly aligned with our production system. That will happen in the coming months. However, what I want to discuss here are other features, which are in the production service but more hidden, and which show that there’s more to the LOD vision than applying an entire technical stack at once, as is often presented in the books.
The foundation for adopting the LOD vision at Europeana is in fact quite deep: a new data model, EDM. As opposed to what Europeana had been based on before (plain flat records), the new model encourages providers to send richer, networked metadata. A good thing is that even before the model was implemented, it had a certain effect on the way we interact with data providers. The design phase, especially, was a collaborative effort trying to accommodate fundamental requirements from our library, archive and museum stakeholders. And they rather liked that we would try to handle their data better.
Currently we’re rolling the model out slowly, with the first providers sending data (for example this one), one enhancement at a time (for example, hierarchical objects are due soon).
For our production service, we still ingest data as XML (even though our XML schema actually specifies a form of RDF/XML). We store the data in a non-RDF database (MongoDB) and we’ll probably do so for a while. But the storage layer has been designed to follow the principles of the new model. In the same fashion, our search portal and API still use Lucene/Solr. And still, basic tweaks allow us to emulate some semantic search functions, such as query expansion using semantic hierarchies or concepts with translated labels.
This is in fact useful even for descriptions that are not provided to us as rich metadata with contextual resources (from thesauri, gazetteers, etc): Europeana has started to do simple metadata enrichment using linked open data sources, especially Geonames and GEMET.
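To make the query expansion idea a bit more concrete, here is a minimal sketch in Python. It is not Europeana's actual code: the tiny thesaurus, the concept identifiers and the function names are all made up for illustration, and the final OR clause only stands in for whatever syntax the search engine really expects.

```python
# Hypothetical mini-thesaurus (illustrative data only):
# concept id -> labels per language, plus narrower concepts.
CONCEPTS = {
    "c:instrument": {
        "labels": {"en": "musical instrument", "fr": "instrument de musique"},
        "narrower": ["c:violin", "c:flute"],
    },
    "c:violin": {
        "labels": {"en": "violin", "fr": "violon", "de": "Geige"},
        "narrower": [],
    },
    "c:flute": {
        "labels": {"en": "flute", "fr": "flûte"},
        "narrower": [],
    },
}


def find_concepts(term):
    """Return ids of concepts with a label equal to the term (case-insensitive)."""
    wanted = term.casefold()
    return [
        cid
        for cid, concept in CONCEPTS.items()
        if any(label.casefold() == wanted for label in concept["labels"].values())
    ]


def expand_query(term):
    """Add the translated labels of matching concepts and of all their descendants."""
    expanded = [term]
    seen = set()
    stack = find_concepts(term)
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue  # guard against cycles in the hierarchy
        seen.add(cid)
        concept = CONCEPTS[cid]
        for label in concept["labels"].values():
            if label not in expanded:
                expanded.append(label)
        stack.extend(concept["narrower"])
    return expanded


def to_or_query(terms):
    """Join terms into a quoted OR clause (generic full-text style, an assumption)."""
    return " OR ".join('"{}"'.format(t) for t in terms)


print(to_or_query(expand_query("violon")))
```

A search for “violon” would then also match records described with “violin” or “Geige”, and a search for “musical instrument” would pull in the narrower concepts too, all without the index itself being an RDF store.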
Note that when the provider’s metadata refers to a contextual source published as Linked Open Data, such as this concept, we run a script to harvest it. It requires a manual mapping to fit the source’s model to the one we can ingest, but if the source comes in a standard model like SKOS, then we’re covered. Our providers may thus skip sending us data that is already available elsewhere. It also gives them more motivation to link their object metadata to external reference datasets by themselves, in a better way than what we’d do ourselves.
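As a rough illustration of why a standard model like SKOS keeps the mapping simple, the sketch below folds already-harvested triples into a flat concept record. This is an assumption-laden toy, not our harvesting script: the example.org URIs, the triple representation (plain tuples, with literals as text/language pairs) and the shape of the output record are all hypothetical. Only the SKOS namespace and property names are real.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical harvested triples. URIs are plain strings,
# literals are (text, language) pairs.
TRIPLES = [
    ("http://example.org/concept/violin", SKOS + "prefLabel", ("violin", "en")),
    ("http://example.org/concept/violin", SKOS + "prefLabel", ("violon", "fr")),
    ("http://example.org/concept/violin", SKOS + "altLabel", ("fiddle", "en")),
    ("http://example.org/concept/violin", SKOS + "broader",
     "http://example.org/concept/string-instrument"),
    ("http://example.org/concept/flute", SKOS + "prefLabel", ("flute", "en")),
]


def concept_record(uri, triples):
    """Collect the SKOS labels and broader links of one concept into a dict."""
    record = {"id": uri, "prefLabel": {}, "altLabel": {}, "broader": []}
    for subject, predicate, obj in triples:
        if subject != uri:
            continue
        if predicate == SKOS + "prefLabel":
            text, lang = obj
            record["prefLabel"][lang] = text
        elif predicate == SKOS + "altLabel":
            text, lang = obj
            record["altLabel"].setdefault(lang, []).append(text)
        elif predicate == SKOS + "broader":
            record["broader"].append(obj)
        # Any other property is ignored in this sketch.
    return record


violin = concept_record("http://example.org/concept/violin", TRIPLES)
print(violin["prefLabel"])
```

Because every SKOS source uses the same handful of properties, one mapping like this can serve many vocabularies; a source with its own ad hoc model would need its own branch of rules.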
One last work in progress is data publishing. For the enthusiasts, our API will also soon spit out JSON-LD, doing better justice to the new data model. And since this year we have also been publishing RDFa schema.org mark-up on all our europeana.eu portal pages, resulting for example in this data. This is still experimental though. We are involved in the W3C Schema Bib Extend Group and hope this will help us to keep our mark-up aligned with community expectations, and maybe in fact to know better what these expectations are. Perhaps this can also be a good one to work on for LODLAM!
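For readers who haven’t met JSON-LD, here is a small sketch of what serialising one object description could look like. It is only an illustration: the input record, the choice of properties and the function name are assumptions, and the real API output may well be shaped differently.

```python
import json


def to_json_ld(record):
    """Serialise a simple object record as a JSON-LD string (illustrative shape)."""
    doc = {
        "@context": {
            "dc": "http://purl.org/dc/elements/1.1/",
            "edm": "http://www.europeana.eu/schemas/edm/",
        },
        "@id": record["id"],
        "@type": "edm:ProvidedCHO",
        "dc:title": record["title"],
    }
    # Subjects are emitted as links to contextual resources, not as plain strings.
    if record.get("subject_uris"):
        doc["dc:subject"] = [{"@id": uri} for uri in record["subject_uris"]]
    return json.dumps(doc, indent=2, ensure_ascii=False)


# Hypothetical record with made-up identifiers.
example = {
    "id": "http://example.org/item/123",
    "title": "Violon et archet",
    "subject_uris": ["http://example.org/concept/violin"],
}
print(to_json_ld(example))
```

The point of the `@context` is that the same JSON stays readable for ordinary API clients while remaining interpretable as RDF by anyone who cares about the links.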
Now, perhaps all this is not the full semantic technology stack, but it is bringing us somewhere, step by step…
Connections, it is all about connections… that’s all I could think about whilst swimming this morning. A gorgeous Sunday in Sydney, Australia. Blue dome sky and a stupendously bright tree radiating autumn. The reason I have connections on the brain is that my sister Cara and I exchanged images via email last week. She sent me a beautiful picture of golden leaves and I returned with this flaming tree. We were connecting our sense of the season change across the Tasman sea.
Linked Open Data Designers
Where is this post going, you ask? Well, it is going in the direction of something that always bothers me: who are the beneficiaries of the efforts made to provide linked open data? What impact are we going to have as a community of archivists, curators, developers, information managers, librarians, architects (as designers) on our communities, people like my sister, or the researchers I work with? This is a HUGE season change within the GLAM sector. I finally found the time to read a post by Marshall Breeding: The Systems Librarian, 26 May 2013, “Linked Data: the Next Big Wave or Another Tech Fad?”. I recommend it: there is a potted history of library automation, and straight-shooting comment on the rise of MARC and the changes ahead. More on that front, for the curious, by William Y Arms as part of a special issue of Library Hi Tech, August 2012 – The 1990s: The Formative Years of Digital Libraries.
Family History Research and Scale
So I ask myself, what is going to help my sister, our family historian, do her incidental genealogical searching about our family in New Zealand, Australia, Scotland, Ireland and England? Well, I’d really like every local history database, index, register or set of cards turned into linked open data, thanks very much. Many of these precious local history treasures are nurtured without much wider awareness of their existence, except by information professionals like us, and those eager researchers doing local or family history. You just have to look at the credits on the proliferating television shows derived from the BBC series Who do you think you are?, where a stream of GLAM names big and small flutters across the screen. Who is going to do that? Well, there’s a question: most small GLAMs do not have the developers on hand to turn their treasures into linked open data. Listen to an interview on Museopunks by Jeffrey Inscho and Susie Cairns with Michael Edson from the Smithsonian and Paul Rowe from Vernon Systems and you’ll get the picture pretty quickly.
In this, the inaugural episode of the Museopunks podcast, the Punks chat to Michael Edson, Director of Web and New Media Strategy at the Smithsonian Institution, and Paul Rowe, CEO of Vernon Systems, about museums in the Age of Scale. How can museums rethink their practices to work at web scale, from the smallest institutions up to the biggest?
This is a perennial problem and, more importantly now, not one to be shirked while the discussion on linked open data is hot. Family history research is huge, and if you trace the fortunes of large providers of user-pays family history information it might offer a clue that this is a V-A-S-T area for social value to be generated (commercial value to some). Digitisation has been driven by heavy usage demands and at the front of that queue are family history researchers. If I were a gambling woman, I’d punt that if these sources of local and family history were prioritised for transformation, enormous amounts of social value would be provided. The next question, I guess, might be how to coordinate this. Whose role might it be to provide such an online service… ahem… (*whispers* a large cultural institution or a consortium of them?). Perhaps we can have a good old fashioned debate about this at the 2013 summit.
Humanities, Arts and Social Science (HASS) Researchers
The next big question I ask myself as one of the multitude of designers in the realms of digital development is: What about the HASS researchers? Part of my job at Intersect Australia is to support HASS researchers wherever I can, to get their “eResearch” needs met. That can mean being a project manager, a metadata nerd, an analyst, a product owner, whatever it takes. Last week I dug out a paper for a colleague about big data challenges for biological science. On reflection, I have thought about how far that discipline has come and about my efforts to support the notion of humanities informatics. Secretly, I hope that the deluge of linked open data that pours out from the GLAMs is going to permit AMAZING research that hasn’t been possible before. What I see on the horizon is a level of discovery of cultural flow, social phenomena or cultural history that will benefit general researchers of culture and social history and, at the same time, benefit HASS research in totally new ways. Also… watch too what comes out of digital humanities linked open data projects, that’s all I can say (hint: HuNI)!
GLAM Linked Open Data Ecosystems
I’ve mentioned a discussion on linked open data ecosystems I’d like to have at the 2013 summit. At the same time I’m thinking WT? – ecosystems – really? Ages ago, at the LODLAM-NZ meetup in Wellington in 2011, I spoke with a chap whose thoughts I respect immensely (Jamie Norrish, co-author of EATS, the Entity Authoring Tool) and he said in so many words: why linked open data, Ingrid? It isn’t scalable. Without knowing a darn thing about “scale” in computing terms, except that computer scientists tell me that there is a problem with scaling up linked open data, I said in so many words: GLAMs will establish their own ecosystems. Not all the web may be semantic, but maybe some parts of it, i.e. GLAMs, really will deliver value by being so. He seemed to think that was an ok answer. My colleague on the humanities virtual lab project HuNI (Humanities Networked Infrastructure), Conal Tuohy @conal_tuohy, and I haven’t had this discussion yet. We’ve both been too busy working, him hurling code around like a concert conductor, me writing RDF weaving patterns.
Reliance on the Ecosystem
Speaking of which, Con is the brains behind the information design (amongst other parts of the lab design) on HuNI – see Corbicula, the linked data gateway tool he has developed and put into the LODLAM 2013 summit challenge – and right now he’s working on faceted search. I’m attempting to mind meld with him when I can and rely on his knowledge of RDF and ontology development (he’s an old hand at TEI). At some point we will need to pop our heads above the build and the data transformation process and answer a question about the services and the aggregate we create from a range of Australian scholarly cultural datasets (for the lab), and where that sits in the context of Australian GLAM datasets (providing linked open data services). Hence the minor obsession with looking at linked open data ecosystems. Conal has harvested party identifiers from the National Library of Australia. More recently I’ve learned that Griffith University have identifiers for the ANZSRC codes (that’s standard research codes for Australian and New Zealand research). Perhaps reliance is another useful topic for the summit?
The 2013 LODLAM Summit
I’m seriously looking forward to being part of it – and not just because my Kiwi twang will duet with that of Chris McDowell and the Australian high notes accumulated by years in Sydney will merge into a chorus with those of Rowan Brownlee, Kerry Kilner, Eleanor Whitworth and Cate O’Neill. But because linked open data is about connections across boundaries – cultural flow and all – I’m really looking forward to hearing about the connections the folks from other parts of the globe want to make too.
For more details see the THATCamp Brisbane website and, if you’re keen to be part of the LODLAM pop-up, ideally make sure you register. The good bit about registering is you also get lunch catered for.
The THATCamp Brisbane organisers invite digital arts and humanities researchers and professionals in the GLAM sector to join in and participate in the unconference. The pop-up LODLAM at the THATCamp follows on from the two LODLAM events run in Melbourne with the support of Eleanor Whitworth (Culture Victoria) and Ely Wallis (Museum Victoria).
Conal Tuohy will be at the THATCamp Brisbane, and leading the pop-up LODLAM. Another linked open data enthusiast is Anna Gerber, and she’s one of the THATCamp Brisbane organisers.
So all that’s needed is participants!
For people who want to know more about linked open data, and who have an interest in how linked open data could be a useful means of conveying and connecting research and collection data across the scholarly, gallery, library, archive and museum domains for online search, this is a great chance to ask questions and share information or ideas.