Karma: tools for mapping data to ontologies

One of the bottlenecks to getting museum data into the Linked Data Cloud is that it is hard to do. The Europeana and CRM ontologies are large and complicated, and it is difficult to map data from museum databases to them. For the last few years we’ve been working on tools that help people map their data to ontologies without programming or writing scripts in languages such as XPath and XSLT. The tool is called Karma, and you can download it from http://isi.edu/integration/karma.

We would like to propose a session to show Karma. We have used it with datasets from several museums, and would like to show how we mapped data from the Smithsonian American Art Museum (41,000 objects and 8,000 artists) to the Europeana ontology, and how we linked it to DBpedia, the New York Times, and several other datasets. We think Karma makes the process much easier than other tools, and we’d love to hear what you think and, hopefully, provide tools that help you.
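To give a flavor of what a mapping tool produces: mapping tabular museum data to an ontology ultimately means emitting RDF triples for each record. The sketch below shows the idea in plain Python; the record fields, URIs, and use of Dublin Core properties are invented for illustration (Karma itself lets you define such mappings visually, without writing code, and the real target here is the Europeana ontology).

```python
# Sketch: turn one tabular museum record into N-Triples.
# All identifiers and property choices are illustrative only.

record = {"id": "1995.25.1", "title": "Portrait of a Lady", "creator": "Jane Doe"}

BASE = "http://example.org/object/"      # hypothetical namespace for the collection
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core, standing in for the real ontology

def to_ntriples(rec):
    subject = f"<{BASE}{rec['id']}>"
    triples = [
        f'{subject} <{DC}title> "{rec["title"]}" .',
        f'{subject} <{DC}creator> "{rec["creator"]}" .',
    ]
    return "\n".join(triples)

print(to_ntriples(record))
```

The point of a tool like Karma is that curators specify the correspondence between columns and ontology classes/properties interactively, and the triples fall out automatically for all 41,000 records.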

We presented a paper about this last month at the Extended Semantic Web Conference (ESWC) in Montpellier. You can get the paper at http://bit.ly/11X5YPo and the slides at http://slidesha.re/18vxMnn. I am very proud to say that we received the best in-use paper award for this work, and it makes me very happy that our work with the Smithsonian was recognized at the conference.

You can also browse the data on the SPARQL endpoint. We are using Pubby (the same linked data front end that DBpedia uses), but we look forward to getting better tools from you. So check it out; here is the page for John Singer Sargent.
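If you would rather query the endpoint programmatically than browse through Pubby, a SPARQL request is just an HTTP GET with a URL-encoded query string. A minimal sketch follows; the endpoint URL and the properties in the query are placeholders, not the museum's actual ones.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the museum's actual SPARQL URL.
ENDPOINT = "http://example.org/sparql"

# Illustrative query: titles of works whose creator mentions "Sargent".
query = """
SELECT ?artwork ?title WHERE {
  ?artwork <http://purl.org/dc/elements/1.1/creator> ?creator ;
           <http://purl.org/dc/elements/1.1/title> ?title .
  FILTER(CONTAINS(STR(?creator), "Sargent"))
}
LIMIT 10
"""

def sparql_url(endpoint, q):
    # Most SPARQL endpoints accept ?query=... on GET (SPARQL 1.1 Protocol).
    return endpoint + "?" + urlencode({"query": q, "format": "json"})

url = sparql_url(ENDPOINT, query)
print(url)
```

Fetching that URL (e.g. with `urllib.request.urlopen`) would return the result bindings as JSON from most endpoint implementations.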


The Great War, Linked Open Data and Chinese Food

If you have flown in a day early for the Summit and have an interest in the upcoming centenary of the Great War, join us for dinner at Kam Fung in Chinatown tonight at 7PM for Linked Data, the Great War and … Chinese food. (Map here)

(Note: just a few spaces left, so do get in touch with me if you are planning on coming.)

A few linked open data projects address the topic, including the New Zealand WW1 Linked Open Data Project, the Muninn WW1 Project, Out of the Trenches, Europeana, and the SeCo group’s Timeline and History of World War I.

If we are serious about linked open data, we should look at ways of automating the linking of our datasets in order to benefit from each other’s connections. My own topics of interest are distributed name authorities and GIS information about the war. Given the widespread availability of British trench map coordinates and new sources of historical GIS information such as the Open Historical Map, there is plenty of low-hanging fruit for us to work on.

If you can’t make the dinner, I’m proposing a session on the First World War during the morning session on Thursday the 20th, as well as one on naval/ocean and historical mapping data.


CIDOC CRM and its role in semantic CH projects

The CIDOC CRM is a compact top-level (conceptual) ontology appropriate for cultural heritage, historical discourse, and archaeology. It supports generic description of cultural artifacts, people, places, sites, related events (e.g. creation, acquisition, finding, curation, conservation), and cultural periods. It is standardized as ISO 21127:2006, but undergoes continuing development.

CRM is at the heart of the ResearchSpace project (http://www.researchspace.org/), a web-based collaborative system for art research based on LOD and CRM. ResearchSpace is funded by the Andrew W. Mellon Foundation and run by the British Museum, with software development by Ontotext. The relevance of CRM to CH research discourse is described by Martin Doerr here: http://www.researchspace.org/researchspace-concepts/technological-choices-of-the-researchspace-project. A recent blog post by Dominic Oldman and Martin Doerr compares CRM to other aggregation ontologies: http://www.oldman.me.uk/blog/costsofculturalheritage/

Ontotext helped the British Museum develop its mapping to CIDOC CRM, along with best-practice guidelines that other museums can use. Ontotext has gained strong experience with CRM and is active in the CRM Special Interest Group (CRM SIG). We promote CRM extensions and corrections that facilitate real interoperability and federation between the collections of different institutions.
Two relevant papers by Vladimir Alexiev: http://www.ontotext.com/publications/2012#CRM-FR-search and http://www.ontotext.com/publications/2012#CRM-Properties. A brief presentation about Ontotext and ResearchSpace: http://www.slideshare.net/valexiev1/research-space-vre-based-on-cidoc-crm.
More info about Ontotext’s CH projects: http://lodlam.net/members/vladimiralexiev/profile and http://www.ontotext.com/Libraries_and_Archives

Ontotext organizes the workshop Practical Experiences with CIDOC CRM and its Extensions (CRMEX 2013, http://www.ontotext.com/CRMEX) in conjunction with the Theory and Practice of Digital Libraries (TPDL 2013) conference on 26 September 2013 in Malta.

This session can start with an intro to CIDOC CRM, followed by discussion of its relevance for CH projects.

Introducing a project to publish the Getty Vocabularies as LOD

I am currently leading a project to publish all four Getty Vocabularies as LOD. The four Getty Vocabularies are: the Art & Architecture Thesaurus (AAT)®, the Union List of Artist Names (ULAN)®, the Getty Thesaurus of Geographic Names (TGN)®, and the Cultural Objects Name Authority (CONA)™. We are on track to start with AAT in July of this year, and will then move on to TGN, ULAN, and finally CONA. Here is a PDF version of the most current flier – vocab_lod_flier

I am also interested in advice from the LODLAM community on what it takes to build and maintain a successful community of consumers of LOD versions of AAT, TGN, ULAN and CONA. Some of the discussions that would be helpful to us are:

  • Best ways to host and encourage open communication threads regarding things like issues, comments about our ontologies, offers of help, examples of complex SPARQL queries to share, etc.
  • Creating a road-map for community-built, open-source tools for access, contributing, matching, etc.
  • Use cases from the community
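Since the Getty Vocabularies are thesauri, most consuming applications will need to walk broader/narrower hierarchies of the kind AAT will expose (e.g. via skos:broader). Here is a toy sketch of that operation; the terms, prefixes, and hierarchy below are invented for illustration, not actual AAT identifiers.

```python
# Toy in-memory fragment of a thesaurus hierarchy (invented terms,
# standing in for skos:broader links a real AAT consumer would traverse).
broader = {
    "aat:oil_paintings": "aat:paintings",
    "aat:paintings": "aat:visual_works",
    "aat:visual_works": "aat:objects",
}

def ancestors(term):
    """Return the chain of broader terms, most specific first."""
    chain = []
    while term in broader:
        term = broader[term]
        chain.append(term)
    return chain

print(ancestors("aat:oil_paintings"))
# → ['aat:paintings', 'aat:visual_works', 'aat:objects']
```

Queries like "find all objects tagged with any narrower term of X" are exactly the kind of complex SPARQL examples it would be useful for the community to collect and share.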


Introducing LODLAM Patterns

Linked Data provides us with an incredible opportunity to rethink how we approach sharing information about LAM collections. However, these opportunities are also fraught with important challenges that we must face. Translating existing standards into compliant Linked Data will take more than just cross-walking terms with similar meanings; it also means mapping between conceptual models and ontologies. Linked Data also gives us new opportunities to mix models and vocabularies in ways that we haven’t been able to before. How can we take better advantage of these opportunities?

Ultimately, creating Linked Data standards and practices is a set of design problems that we are all engaged in. Elizabeth Churchill has called for “Data Aware Design” and the need to bring human-computer interaction methods to bear on these problems. At the Summit I will be presenting a Dork Short about a new site that I’m launching to do just this. LODLAM Patterns will identify Linked Data design patterns (which I’m calling representation patterns) for cultural heritage resources. The idea is to identify common problems that we are trying to solve and link them to the solutions available across the many, many standards for describing LAM resources. My goal is to create a resource that will spur discussions focused on problems and solutions, provide newcomers a way to navigate the LOD standards universe, and serve as a pedagogical tool for teaching “design thinking” for Linked Data.

Participate by signing up at http://lodlampatterns.org or follow along @lodlamp or #lodlamp.

Evaluating (and Enhancing) the Draft MODS RDF Ontology

The Library of Congress’ MODS/MADS Editorial Committee recently released a draft MODS RDF ontology. Because Columbia University Libraries / Information Services uses MODS as the primary schema for our digital collections (particularly those in Academic Commons, our institutional repository), we decided to actively experiment with this ontology. We hope that by improving it we will have an easy path to migrate our existing metadata into a triple store such as 4store, enrich it, use it as an authority system, and make it available for consumption by others.

Our initial testing has shown some promise, particularly as the Editorial Committee has already worked to address some of our initial concerns (notably how the draft favored literals over URIs); that said, it clearly has a ways to go. I would therefore like to propose a session at LODLAM where we could discuss how MODS RDF could be further improved to provide a robust and functional ontology for the LODLAM community.
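To make the literal-versus-URI concern concrete: the difference is whether a value such as a subject heading is an opaque string or a linkable, reconcilable resource. The sketch below contrasts the two in N-Triples; the property and value URIs are invented for illustration and are not actual MODS RDF terms.

```python
# Two ways to state the same fact in N-Triples.
# All URIs here are invented; they are not real MODS RDF vocabulary.

subject = "<http://example.org/item/42>"

# 1. Literal value: human-readable, but nothing to dereference or merge on.
as_literal = f'{subject} <http://example.org/subjectHeading> "Painting, American" .'

# 2. URI value: the same assertion, but now the value can be dereferenced,
#    linked to, and reconciled against other datasets' identifiers.
as_uri = (f'{subject} <http://example.org/subjectHeading> '
          f'<http://example.org/heading/painting-american> .')

print(as_literal)
print(as_uri)
```

A URI-valued statement is what lets a triple store act as an authority system, since two collections pointing at the same heading URI are automatically linked.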

Using automation and user feedback for interlinking archives

I work in BBC R&D, where we’re investigating new ways of publishing archive content using a combination of automatic interlinking and user feedback. The use case we’re focusing on to validate (or invalidate!) this approach is the BBC World Service archive, and you can see the results of our experiments in this prototype (registration is currently open – please sign up and let us know what you think). We wrote about the prototype in a bit more detail here, here and here.

From various events and conferences I’ve attended over the last year, there seems to be increasing interest in getting the best of both humans and machines to help annotate and interlink large archives, so I would like to propose a session on that topic. The session should probably focus on real use cases and lessons learned, as well as on generating new ideas.

Very much looking forward to attending my first LODLAM!

Curation and Linked Data

In April I had an exchange with a couple of colleagues, Sue (@suelibrarian) and Molly (@madradish), about a conference we have here in Melbourne, Australia, on a biennial basis: VALA (legacy acronym: Victorian Association for Library Automation). VALA is a digital library and, increasingly, GLAM conference in Australia where we get to hear about some of the digital development work being done.

I’m not sure what the equivalent of an earworm is when it comes to ideas, but Molly’s question strikes me as a good session topic, and this thought has been niggling away in my mind for quite — some — time. When I first started tinkering with the idea of what linked open data was going to DO, and why any data-collecting institution (such as a GLAM) might DO linked open data, I wrote a paper to get some thoughts down. Are the GLAMs going to bring a steampunk/neo-Victorian sensibility and aesthetic to Linked Open Data?

More recently I have had the chance to talk to another colleague Rowan @usyd_dpa about what could be done with some of the “special” collections at University of Sydney Library.

So… I’d like to propose a session on curation and linked data: a kind of “why are we doing this, and who for?” session. Many professionals will need to make the case to funders and decision-makers to commit resources to transforming data into linked open data, and ideally there are useful principles or methods we can talk through to help make those arguments for support, and outcomes that are going to spin people’s wheels.

  • What linked open data project to do and why?
  • What ontologies to use and why?
  • What datasets to integrate and why?
  • Who will benefit from this and why?

I’ve another conundrum to share with brighter minds than mine, and it involves the role of large library catalogues (e.g. national union catalogues or national bibliographies) versus smaller special or research library catalogues. Without going into too much detail, I am happy to take a punt and say there are roles for both (of course!), but… that may morph into a broader discussion about linked open data ecosystems (is there such a thing, and is this another session?).