Curation of LOD

These are the session notes (sketchy I’m afraid) for the discussion on curation of linked open data on day 1 of the 2013 LODLAM summit in Montreal.  There are multiple ways to look at curation, and that can be seen in the different slants brought into the mix: curation of the data that the agency or person has (its state and fitness for reuse and supply) and curation of the data that it is desirable to link to (why link, and what does that mean?).  It is no surprise that questions of control and authority emerged, along with questions around reliance and co-contribution.  What is the perfect combination, and how long will those combinations of data complement each other?

Moules, frites, bière
CC-BY Ingrid Mason

The wording in (brackets) is mine from recall.  Please feel free to comment and correct me if I’ve misinterpreted the notes.

  • Who to link to? (whose data to link the data you have to)
  • Why link to them? (is there a working relationship, how much prior collaboration, does this matter?)
  • How good are they? (what is the quality of the LOD you want to use and its relevance to your data?)
  • Who to trust may change over time?
  • Multiple suppliers of data (what to choose?)
  • Ecosystem (developing and changing)
  • Engagement of the curator in the ecosystem
  • Mediator, editor and value add through curation (to the use of LOD)
  • Mappings between different ontologies not just controlled vocabularies
  • Identity – automated linking (issues?)
  • Is VIAF a big enough grid? c/- IFLA hosted by OCLC
  • Wide reliance in (North American) libraries, e.g. OCLC example (Australia has the NLA People Australia service and there is ORCID too)
  • Linking is curation!
  • Is shared curation possible?
  • Institutional support – local, national and global linkages (follow culture, history, economics, language, trade routes and politics and there will be links?)
  • Whose requirements are being met?
  • Who pays for curation?
  • Who or what is a curator (of LOD)?
  • Curating what? (is it the data and the meaning or the interfaces too and the user experience of search and discovery too?)
  • Persistent URIs exist as long as the web exists
  • Quid pro quo – get it (LOD) out quick to get it improved (co-contribution of correction or uptake for testing?)
  • !! Editorial decisions of the consuming organisation !! (of LOD) (this is curation?)
  • “publishing (LOD) with the authority of the institution” (surely this is curation?)
  • Some access is better than no access (is that always true?)
  • Data always links with a person (?) (multiple links to data sources provides diversity and useful redundancy?)
  • Open curation to the masses
  • Curation ups the quality but needs good processes to help with cleaning or correction
  • Pressure on public institutions to participate in the commons
  • There is a social dimension between the curator, the community and the LOD ecosystem
  • Can use redundancy (see as an opportunity) to track errors, support consensus, and self-helping
  • Unattributed assertions (how to manage these, whether to integrate these, or not to allow them?)
  • Bidirectional (is this always the case, you link to me, I link to you?)
  • Embrace messiness and get over control issues (provide notices where the data hasn’t been checked or gone through curation process?)
  • (Use LOD) to provide supplementary information (see BBC Music)
  • Encode linking and curation as LOD, use W3C PROV-O ontology for provenance
  • Social quality – link Geodata – use: ID, City, Picture, Depiction
  • Example: OpenStreetMap
  • Buddy up with citizen curator (akin to citizen scientists)
  • BBC Wildlife: trust of Wikipedia content, which filled in the gaps
  • See: Connecting the Smithsonian American Art Museum to Linked Data Cloud (US artists)
  • Flavours of LOD from well maintained and quality controlled provenance data to anonymous
  • Issues around how you present your LOD
  • Consumers may trust organisations and may not always want to trace it (the LOD)
  • Attribution and usage (don’t conflate these two concepts for dealing with rights)
  • CC0 is “no rights reserved”, effectively releasing the work into the public domain, whereas CC-BY-NC is an acknowledgement of copyright and defines the nature of the use (as a licence), requiring attribution and non-commercial use
  • Note CC0 likely does not apply under Australian law and possibly also not New Zealand
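
One of the notes above suggests encoding linking and curation as LOD, using the W3C PROV-O ontology for provenance. A minimal sketch of what that might look like follows. It uses only the Python standard library and writes N-Triples by hand. The choice of owl:sameAs for the link, the use of RDF reification to hang provenance on the link, the function name and every example.org URI are my assumptions for illustration, not something agreed in the session.

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def iri(value):
    """Wrap a URI in angle brackets, as N-Triples expects."""
    return f"<{value}>"


def link_with_provenance(link_id, subject, target, curator, when):
    """Return N-Triples lines for one curated link plus who asserted it and when.

    `when` should be a timezone-aware datetime; it is converted to UTC.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    triples = [
        # The link itself (owl:sameAs is an assumed choice of predicate).
        (subject, OWL_SAME_AS, iri(target)),
        # A reified copy of the link, so provenance can be attached to it.
        (link_id, RDF + "type", iri(RDF + "Statement")),
        (link_id, RDF + "type", iri(PROV + "Entity")),
        (link_id, RDF + "subject", iri(subject)),
        (link_id, RDF + "predicate", iri(OWL_SAME_AS)),
        (link_id, RDF + "object", iri(target)),
        # PROV-O: who is responsible for the link and when it was made.
        (link_id, PROV + "wasAttributedTo", iri(curator)),
        (link_id, PROV + "generatedAtTime", f'"{stamp}"^^{iri(XSD + "dateTime")}'),
        (curator, RDF + "type", iri(PROV + "Agent")),
    ]
    return [f"{iri(s)} {iri(p)} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    # All URIs and the timestamp below are hypothetical placeholders.
    for line in link_with_provenance(
        "http://example.org/links/1",
        "http://example.org/people/42",
        "http://example.org/authority/42",
        "http://example.org/curators/im",
        datetime(2013, 6, 19, 10, 0, tzinfo=timezone.utc),
    ):
        print(line)
```

Recording the curator and the date alongside each link is one way a consuming organisation could tell attributed assertions from unattributed ones. Named graphs would be another option.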

Making the Case for LOD

These are the session notes (rough I’m afraid) for the discussion on making the case for linked open data on day 1 of the 2013 LODLAM summit in Montreal.  At some point I’d really like to summarise these ideas better or maybe get to a point where it is possible to tell success stories and cautionary tales so that those interested in making or reusing LOD can pick up and expand on the precious work done thus far.

Gold leaf floating caught on the wind
CC-BY Ingrid Mason

The wording in (brackets) is mine from recall.  Please feel free to comment and correct me if I’ve misinterpreted the notes.

  • What are the pain points? (also who feels the pain)
  • Should the O in LOD be K for knowledge and have it rebadged? (perhaps LOD isn’t the terminology for everyone to understand what LOD can do)
  • Explain LOD so people understand it (keep it simple smarty-pants)
  • Different elevator pitches to stakeholders to get support (headlines for execs perhaps and technical speak for techs?)
  • Internal use case (who will invest and put their support behind you in a LOD project in your organisation)
  • Public use case (who are the public stakeholders and are there any general or specific needs that could be filled with LOD)
  • Listening (to stakeholders, to others’ experience, etc)
  • Benefits? (work out what these are and who will value what you do)
  • Responsibility? (who leads this work and/or needs to be involved to make it a success)
  • Demystifying LOD for stakeholders (non-tech speak and maybe outcomes in lay terms)
  • Keep LOD ‘under the hood’ (see slide 80, ALIAOnline Practical Linked (Open) Data for Libraries, Archives and Museums, to see how the web view and the underlying linked data are presented)
  • Who for? (make sure it is clear who the audience is for LOD project)
  • Why? (be clear about the goals for a LOD project)
  • What? (have a good think about what data to generate and integration and why)
  • Issues? Backlogs of wobbly data (this is very common and often underestimated, so perhaps including this in a LOD project outline ensures this doesn’t turn into a SNAFU)
  • Type of project – demo or BAU? (depends on how much traction with key supporters and how experimental a LOD project is)
  • Creative Commons (0), revenue risk (something to do with pressure around capacity to generate income if data isn’t CC0 (which is valid in US but not Australia or NZ btw))
  • Focus on your own data – less risk and less cost
  • Example, BBC Music – point out (use other LOD)
  • Users – what are their drivers?
  • Find ways to communicate to them (the users) e.g. via discovery
  • Scale – take care with this – ecosystem grows
  • Metrics e.g. AustLit.edu.au  (to justify investment and uptake)
  • What legal or funding requirements need to be surmounted to enable the data to be released as LOD?
  • Upfront deal with rights and costs (sic and offer value or benefits)
  • Attribution – how to deal with this or ask for it
  • Galaxy Zoo and gamification of the classification of galaxies
  • Work acknowledgement (perhaps rather than at triple level, which seems quite insane)
  • Figshare as an example (of the strength of openness in support of scholarly communication)
  • Scholarly practice and new practices of tagging (as part of a LOD project?)
  • Some ideas based on experience with e-artexte by artexte (small non-profit)
  • Problem: (how to get moving and get support)
  • Agree to be a guinea pig (this is a perfect idea)
  • Find advocates in the community
  • Publishing and visibility (catalogues online via website) (LOD apparent in search interface too?)
  • Work with a partner (Concordia), extension of library service (piggy back)
  • Solution: (what they did)
  • Open access repository (see news release)
  • Lots of outreach (getting buy-in and engagement by long term partners and supporters)
  • Next steps: (building on success)
  • Research projects (taking on new ideas)
  • Success stories (these are needed for LOD projects that hit the spot!)
  • Ways to work with technophobes “helps me do something I already do” (solve a problem with LOD?)
  • Works for open data (Wikimedia), can work with linked open data
  • Who to convince? (what do you need: money, permission, technical partners, registrar time?)
  • Who to trust? (what and who are you relying on and have you relied on them before?)
  • How to manage the question of authority? (publish your own LOD because you created it and monitor that which you integrate or ingest externally)
  • Deliver to core user stories (don’t go off into the wilds unless you’ve been funded to)
  • Prototype stage (is this Agile, i.e. make sure if you have key stakeholders they’re fully engaged)
  • Keep (iterating and checking?)
  • Talk about enhancement of services (competition?)
  • Kickbacks, and feedback loops (look at how to make the most from what you have?)
  • Need to be able to demonstrate (keep the focus and make the scope small)
  • Social – embedding your knowledge (into the LOD?)
  • Embed LOD in the tools people are already using
  • Attach LOD and allow it to emerge by stealth (trickery)
  • We need to consolidate stories for each to use (write these up)
  • Use the design pattern library

Notes from Normalizing Licensing (and Data) Models

This was originally Normalizing Licensing and Data Models, but we decided that was too much to take on in one session. We had about 15 participants. I did my best to lead this session though was admittedly a bit exhausted! And now I’ve let too much time go by before getting my notes in here.

I started by describing some of the work we’re doing at Historypin to create metadata crowdsourcing and annotation tools for the public and in particular cultural heritage institutions. We talked briefly about our current efforts to consider the data models of Europeana and DPLA, as well as Open Annotation, and how we might incorporate some of this in as simple a way as possible, as we don’t want to differentiate between individuals and institutional contributors. I threw out this worksheet for comparing licensing across various platforms and would welcome anyone to add other examples to it (thanks Antoine Isaac for adding a bit to this already).

I think we agreed that we’ve come a long way from where we were 2 years ago at the last summit, when the 4 star scheme of open licensing of metadata was launched. Jerry Persons talked about Stanford policy and also about the week-long workshop they held in July of 2011 recommending CC0 for all bibliographic metadata.

We talked a bit about international issues of copyright and licensing, with Chris of Digital New Zealand weighing in with the very good point that CC0 is not an option in New Zealand, or at least not respected by New Zealand law. Romain from French National Library echoed this issue for France.

Romain also talked about what is copyrightable at all: courts in France have tested the difference between intellectual or creative content and plain fact. We agreed there is international precedent for this distinction, and I pointed out that we (at Historypin) are following the lead of the DPLA on this front.

From here we ventured a bit into creating and encouraging a culture of sharing in which institutions/individuals that share with open licensing could get some recognition, as well as some potential centralized site for tracking changes. We discussed the Cooper Hewitt release on to GitHub, though it was pointed out that GitHub was putting a 15 MB limit on files. The OpenGLAM Data Hub could be a great shared source for us to list content. We talked about the importance and potential of combining forces across GLAMs internationally, and agreed that this would be a good place to share and, as importantly, to show uses of and improvements to metadata.

We touched briefly on burnout among content providers who work very hard to release datasets and then find that no one uses them, or never hear about reuses of the datasets, so encouraging this kind of community and circling back is critical.

I’m sure I missed a ton, please feel free to make additions/corrections/etc in the comments or in the notes doc directly.

Notes on the World War 1 Session

The World War 1 session ran a little over time and spilled out over lunch outside with a lot of talk about the war, literature and linking across data sets. I’ve copied here the people who listed their information on the sheet.

Country | Project | Org | Contact | URL
EU | Australia War Literature Europeana WWI | | Isaac, A.H.J.C.A. | http://www.europeana1914-1918.eu/ http://www.europeana-collections-1914-1918.eu/
UK | Trenches to Triples | King’s College | Geoffrey Browell | http://openmetadatapathway.blogspot.co.uk/ http://www.jiscww1discovery.net/
Australia | Australia War Literature | | | http://www.austlit.edu.au/
Canada | Out of the Trenches / Au-delà des tranchées | Pan Canadian Documentary Heritage | Pat Riva | http://www.ghamari.net:8080/canada/
New Zealand | Remembering WW1 | ? | ? | ww100.govt.nz
UK | Open Metadata gateway | King’s College London Archives | Geoff Browell |
Finland / US | WW1LOD Project | Semantic Computing Research Group / Aalto | Thea Lindquist, Hyvönen Eero et al. | http://purl.org/ww1lod
France | Awesome rdf-enabled online library | French National Library | Romain Wenz | data.bnf.fr
Canada | Muninn WW1 Project | | Rob Warren | rdf.muninn-project.org/sparql

Where do we go from here?

Suggest that you look at the lodlam group and sign up to the ww1-lod mailing lists. We have had some very good discussion about integrating GIS information and integrating data across multiple SPARQL servers.
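
For anyone wondering what querying over multiple SPARQL servers can look like, here is a rough sketch that builds a SPARQL 1.1 federated query with one SERVICE clause per endpoint. It only assembles the query text and a GET request URL, and sends nothing over the network. The function names, the coordinating endpoint, the second endpoint and the `http://` scheme on the Muninn address are assumptions for illustration. Whether any given server accepts federated queries is not something these notes can confirm.

```python
from urllib.parse import urlencode


def federated_query(endpoints, pattern, limit=10):
    """Build a SPARQL 1.1 query that runs `pattern` against each endpoint.

    Each endpoint gets its own SERVICE block; the blocks are combined with UNION.
    """
    services = "\n  UNION\n".join(
        f"  {{ SERVICE <{endpoint}> {{ {pattern} }} }}" for endpoint in endpoints
    )
    return f"SELECT * WHERE {{\n{services}\n}}\nLIMIT {limit}"


def request_url(coordinator, query):
    """Return the GET URL that would submit `query` to a coordinating endpoint."""
    return coordinator + "?" + urlencode({"query": query})


if __name__ == "__main__":
    query = federated_query(
        [
            "http://rdf.muninn-project.org/sparql",  # from the table above; scheme assumed
            "http://example.org/sparql",  # hypothetical second server
        ],
        "?s <http://www.w3.org/2000/01/rdf-schema#label> ?label",
    )
    print(query)
    # Hypothetical coordinating endpoint that would evaluate the SERVICE clauses.
    print(request_url("http://example.org/federation/sparql", query))
```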

Keep in touch and keep doing great work!