Dealing with user-contributed LOD: issues, opportunities and applications

Empowering users (other researchers, citizen scholars, or the “crowd”) to annotate or create data is a low-cost way to expand datasets. Applications range from transcription, translation, and metadata and OCR correction to the generation of a wide array of user-contributed content.

The ways of engaging users in different types of content creation vary widely, however, and they raise a number of concerns:

Quality control: Are there effective workflows for evaluating user-contributed content? Can we crowd-source quality control? How do we even keep spam out of our crowdsourcing tools?

Ethics: What are the ethical considerations associated with sourcing free labour in this way? What kind of disclosure regarding data use and reuse is required? What kind of credit is appropriate? Are there risks related to exposure and recirculation of data beyond the context of contribution?

Management: What are examples of effective interfaces? What workflows are required? How should provenance be tracked and made evident?

I’m hoping we can talk about the practicalities of this: use cases; examples of projects engaging citizen scholars or the public in LOD data production; and tools and interfaces for managing the process.

Some sites to consider:

Transcribe Bentham

Old Weather

Linked Jazz 52nd Street

HistoryPin

Zooniverse

What others are relevant? What are the best models for certain kinds of user contributions?

Cross-domain browsable Linked Data

More and more LAM institutions are publishing their collection data as linked data. It is common practice to link to well-known entities for places, periods, people, and concepts. This creates interesting opportunities for integrating data from different sources and building relevant cross-domain services for users. But to do so, it still seems necessary to aggregate linked data dumps into central repositories.

For the Dutch Digital Heritage Network we are investigating the possibilities for building cross-domain user services by registering only backlinks to the source data instead of aggregating the full datasets. Using lightweight protocols (e.g. Linked Data Fragments, Webmention, Linked Data Notifications) we hope to realize a distributed digital heritage network where the major part of the data lives as linked data at the institutions. By registering backlinks instead of aggregating linked data, we think truly browsable linked data should be possible. I look forward to exchanging ideas with you about this approach.
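To make the backlink idea concrete, here is a minimal sketch of a registry that stores only (source, target) pairs, the same two pieces of information a Webmention carries. The class name, method names, and all URIs are hypothetical and chosen for illustration; this is not the network's actual design.

```python
from collections import defaultdict


class BacklinkRegistry:
    """In-memory index from a shared term URI to the source records that link to it."""

    def __init__(self):
        self._backlinks = defaultdict(set)

    def register(self, source_uri, target_uri):
        # Store only the link itself; the description of the source record
        # stays at the institution that published it.
        self._backlinks[target_uri].add(source_uri)

    def sources_for(self, target_uri):
        # Return a sorted list so the result is deterministic.
        return sorted(self._backlinks.get(target_uri, ()))


registry = BacklinkRegistry()
term = "https://example.org/terms/rembrandt"  # hypothetical shared term URI
registry.register("https://museum.example/object/1", term)
registry.register("https://archive.example/record/42", term)
registry.register("https://museum.example/object/1", term)  # duplicate, stored once

print(registry.sources_for(term))
```

A service built on such a registry would dereference each returned source URI at the owning institution rather than keeping its own copy of the data.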

Read our whitepaper for a high-level overview of our approach.

Sharing strategies for ontology alignment and data sharing

Ontology alignment and seamless data sharing across data silos: these are the promises of the semantic web. To date, we still do not have an operational theory of using linked open data tools for practical linking beyond abusing owl:sameAs statements (Halpin et al. 2010).
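One way to see why loose owl:sameAs links are risky: if a consumer treats them as symmetric and transitive, a single imprecise link merges everything connected to it. The sketch below groups URIs into identity clusters with a small union-find; the function name and the example URIs are hypothetical.

```python
def same_as_clusters(pairs):
    """Group URIs into identity clusters, treating sameAs as symmetric and transitive."""
    parent = {}

    def find(uri):
        # Each URI starts as its own cluster representative.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps chains short
            uri = parent[uri]
        return uri

    for left, right in pairs:
        parent[find(left)] = find(right)

    clusters = {}
    for uri in parent:
        clusters.setdefault(find(uri), set()).add(uri)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical links: a city, a place record, and a local government body.
pairs = [
    ("https://a.example/paris-city", "https://b.example/paris"),
    ("https://b.example/paris", "https://c.example/paris-local-government"),
]
print(same_as_clusters(pairs))
```

Here the city and its local government end up in one cluster even though nobody asserted that they are the same thing.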

Are people really reusing other people’s ontologies and vocabularies or (re)creating their own? Do top-level ontologies really give you anything or are they table-top toys that do not scale to describing real world problems?

A review of the LOV (Linked Open Vocabularies) website clearly shows that, beyond the core W3C vocabularies, few links are being made across vocabularies. Yet we know that this and more will be required if we expect data to scale. Ontologies are being implemented with OWL, but is anyone actually using reasoning in production, or are their ontologies just simple annotation databases?

This session is meant to swap stories about implementing vocabularies and sharing data in the field. Bring your coffee and tea and share your stories around the virtual triples campfire.

Halpin, Harry, et al. “When owl:sameAs isn’t the same: An analysis of identity in linked data.” The Semantic Web–ISWC 2010 (2010): 305-320.

Using schema.org for simple LOD representations within ORCID

ORCID would like to improve their LOD implementation and at the same time move to a technology that is easier for their core tech team to maintain. We would like to propose a session that will examine the pros and cons of using a schema.org and JSON-LD based approach to representing researchers and their activities in the ORCID registry. A set of straw man representations will form the basis of a discussion on the various ways this could be achieved. We’re hoping that this session will help iron out the issues and result in something that is more useful to the community.

The existing ORCID RDF implementation was generously donated to our open source stack by Stian Soiland-Reyes and contains biographic information. However, it does not reference other entities, and as the technologies used are not within our core competencies, they are difficult for us to maintain and extend. We are hoping it’s possible to produce an enhanced schema.org representation that can be embedded within the registry pages and utilised by a more diverse set of users across multiple use cases. Examples include consumption by simple web applications and search engines, as well as navigation by automated LOD agents.
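As one possible straw man, the sketch below builds a schema.org Person description in JSON-LD that also references other entities (an affiliation and works identified by DOI). The function name, the choice of properties, the use of `@reverse`, the affiliation, and the DOI are assumptions for illustration, not the ORCID registry's actual output. The iD shown is the example identifier ORCID uses for the fictitious researcher Josiah Carberry.

```python
import json


def person_jsonld(orcid_id, name, affiliation_name, work_dois):
    """Build a schema.org Person description keyed on the ORCID iD URI."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "@id": f"https://orcid.org/{orcid_id}",
        "name": name,
        "affiliation": {"@type": "Organization", "name": affiliation_name},
        # "@reverse" states that each work has this person as its creator.
        "@reverse": {
            "creator": [
                {"@type": "CreativeWork", "@id": f"https://doi.org/{doi}"}
                for doi in work_dois
            ]
        },
    }


doc = person_jsonld(
    "0000-0002-1825-0097",
    "Josiah Carberry",
    "Example University",   # hypothetical affiliation
    ["10.1234/example"],    # hypothetical DOI
)
print(json.dumps(doc, indent=2))
```

Because the result is plain JSON, it can be consumed by simple web applications without an RDF toolchain, while LOD agents can still interpret it as linked data.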

The recent schema.org implementation from the DataCite DOI registrar will form a point of reference and will be examined to ensure that crosswalking between the two systems is as simple as possible. Outcomes from other LODLAM sessions discussing schema.org will also be included to ensure we’re on the same page.

This discussion will be fed back to the ORCID tech team for consideration.

Cool Tools

The richness of Linked Open Data has yet to be exploited. What tools can assist with consumption besides faceted search and basic graph visualizations? As we move from a “create more triples” market to one of processing and proving the value of LOD for consumption, how can we maximize the benefits of the technology to do real work?

For instance, the Auckland Museum has implemented a tabletop interface for exploring its linked collection data to promote a unique interactive museum experience and engage the public in data curation.

We’re interested in similarly innovative tools and concepts for mobilizing linked data, and the session would try to answer the following questions:

  • What tools are robust and usable outside their home environment?
  • What examples are there of natural language tools for SPARQL queries?
  • What tools work with haptic or VR interfaces?
  • How can we give users a rich prospect on ontologies and datasets so that their scope, strengths, and gaps become more apparent?
  • How can we really bring LOD alive?

We’re interested in seeing what is out there and brainstorming what is needed.

Bring your tools for informal demos, sketches or wireframes and get the creative juices flowing!

Look forward to seeing you all there!

Exploring persistent identifiers for academic institutions, publishers, funders and more. Can LOD help?

Persistent organisation identifiers continue to be a much needed but ‘not-quite-there’ piece of identifier infrastructure. They are required at many stages of research production to credit institutions for their contributions, including funding, authorship and publishing.

There are multiple identifier providers with differing representations and coverage. Some but not all of these providers link to other identifiers, but each does so in a different and proprietary manner, which makes mapping between them difficult. In addition, the identified organisations themselves need to be able to assert information about themselves without needing to manage multiple relationships. This results in a very complex landscape that makes organisation identifiers difficult to utilise to their full potential.

In this session we will discuss how organisation identifiers and their metadata could be represented using LOD by the organisations themselves. We will also consider whether LOD can help link the disparate organisation identifier providers so that they can be crosswalked and used interchangeably. If a solution to these problems already exists, then we will examine the barriers that have prevented its adoption and how they might be overcome.
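To illustrate what crosswalking could look like once equivalence links between providers are published openly, the sketch below follows such links from one identifier until it reaches an identifier in the wanted scheme. The scheme names, identifier values, and function names are hypothetical, and no real provider's data or API is implied.

```python
from collections import defaultdict, deque


def build_links(equivalences):
    """Index pairwise equivalences so they can be followed in both directions."""
    links = defaultdict(set)
    for a, b in equivalences:
        links[a].add(b)
        links[b].add(a)
    return links


def crosswalk(links, start, target_scheme):
    """Breadth-first search from `start` to an identifier in `target_scheme`."""
    seen = {start}
    queue = deque([start])
    while queue:
        scheme, value = queue.popleft()
        if scheme == target_scheme:
            return value
        for neighbour in sorted(links.get((scheme, value), ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return None  # no chain of links reaches the target scheme


# Hypothetical (scheme, value) identifiers for one organisation.
links = build_links([
    (("registry-a", "A-001"), ("registry-b", "B-17")),
    (("registry-b", "B-17"), ("registry-c", "C-9")),
])
print(crosswalk(links, ("registry-a", "A-001"), "registry-c"))
```

Registry A never links to registry C directly here; the crosswalk works only because registry B links to both, which is the kind of indirect mapping open links would make possible.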

Background reading and current work on organisation identifiers can be found through the Organisation Identifier Working Group.  The group was formed in early 2017 to refine the structure, principles, and technology specifications for an open, independent, non-profit organization identifier registry to facilitate the disambiguation of researcher affiliations.

Using LOD to integrate 3D models into the cultural heritage cloud

In recent years, the lowered cost of 3D capture (photogrammetric software, processing power, availability of drones, etc.) has led to an explosion of 3D models within the cultural heritage sector. These models range from artifacts to architecture to archaeological excavations (trenches and full sites), and are produced by GLAM organizations themselves or often by individuals visiting museums. Some content producers rely on commercial entities such as Sketchfab for publication, and others are attempting to build 3D dissemination into their institutional repositories. I have been critical of these efforts, not because I disagree with their value, but because so many entities have moved forward with mass production before considering long-term preservation and access. Presently, 3D data integration into the wider CH cloud suffers from the following:

  • No agreed-upon standard for the model and texture files themselves (obj is the closest thing)
  • No standard for technical metadata
  • No standard for annotation of features in three dimensions
  • No standard APIs to rely on for getting the files or analyzing them in some capacity

I have begun to experiment with a proof of concept integration of 3D models from Sketchfab into a few Linked Open Data projects, namely extending the Nomisma.org data model with a proposed Europeana Data Model extension for 3D that was presented at ALA. As part of the proof of concept, two models of coins were incorporated into Online Coins of the Roman Empire (see http://numismatics.org/ocre/id/ric.4.sa.455 for example), and you can read more about the entire process here.

I myself am not a content producer, but a middleman developer attempting to build a bridge between content producers and the scholars (and general public) that will ultimately make use of these materials. A scholar will not go to every possible Sketchfab profile or institutional repository to dig around for 3D models of relevant artifacts–they should expect to find them in portals for specific materials (like Roman imperial coins or Greek pottery) or through broader aggregations like Europeana and DPLA.

Following the success of the IIIF spec and the community behind it, I hope that we can take some time at LODLAM to discuss laying the foundation for a similar community that might bring order to the chaos of 3D cultural heritage models.

Linked Open Data – Open for Discovery?

The ‘Open’ in Linked Open Data has been key to its successful spread as a data resource enabler for those wishing to build upon and extend the value of others’ data, for the benefit of their own projects in particular and the whole web in general. The 1,130+ datasets referenced in the Linking Open Data Cloud are evidence of this success.

Openly licensed – Openly made available for access – Openly described with Open ontologies – often Open for query.  To those within the LOD & LODLAM communities this all makes perfect sense, but what about those in the wider web world who do not share our understanding and enthusiasm?

As a result of encouragement from the major search engines and others, tens of millions of websites have deployed open structured data, based upon linked data principles, on billions of pages, using generic vocabularies. Their objective is to increase the discoverability of the resources those pages describe.

Assuming that one of the reasons for describing LAM resources using LOD is to help people find them, what should we do?

  • Nothing – let the search engines worry about discovery
  • Share structured data on our resource web pages
    1. Using our LOD ontologies
    2. Using Schema.org
    3. Using a mixture of both
  • Use generic vocabularies such as Schema.org instead of our domain specific ontologies
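The “mixture of both” option above can be sketched as a single JSON-LD block, embedded in a resource page, that describes the resource with Schema.org terms alongside a domain vocabulary. Dublin Core Terms stands in for “our LOD ontologies” here; the function name, property choices, and URIs are assumptions for illustration.

```python
import json


def structured_data_script(resource_uri, title, creator_name):
    """Return an HTML script element carrying JSON-LD in two vocabularies."""
    doc = {
        "@context": {
            "schema": "https://schema.org/",
            "dcterms": "http://purl.org/dc/terms/",
        },
        "@id": resource_uri,
        "@type": "schema:CreativeWork",
        "schema:name": title,    # generic term for search engines
        "dcterms:title": title,  # domain term for LOD consumers
        "schema:creator": {"@type": "schema:Person", "schema:name": creator_name},
    }
    # Escape "</" so a value can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


html = structured_data_script(
    "https://library.example/item/7",  # hypothetical resource URI
    "Ship's log",
    "Jane Doe",
)
print(html)
```

The trade-off is duplication: the same fact is stated once per vocabulary, which is part of what the session would weigh against using Schema.org alone.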

This session would provide an opportunity to explore and discuss these issues.

Schema.org for Archives

I would like to propose a session to review and update an initiative that came out of the LODLAM Summit 2015 in Sydney.

Interest was expressed in Sydney in exploring how data about resources held within archives could be widely shared on the web, to aid discovery, using the Schema.org vocabulary. The intention was to emulate the successful efforts in the bibliographic domain by the Schema Bib Extend W3C Community Group, which resulted in the bib.schema.org extension to the main schema.org vocabulary and its supporting documentation.

A W3C Community Group – Schema Architypes – was set up, which I chair. Recent activity in the group has centred around a ‘straw man’ proposal for a small number of new terms to extend Schema.org that would enable the description of archive holding organisations, archives/fonds, and the resources they contain.

The session would provide the opportunity to review the proposal, the intention that underpins it, and contribute to and engage with the discussion to move it forward.

Ontology for preservation metadata

A working group of the PREMIS Editorial Committee is revising version 1 of the PREMIS OWL ontology, which is based on version 2.2 of the PREMIS Data Dictionary for Preservation Metadata. The new ontology reflects PREMIS version 3, a major revision of the Data Dictionary, and is itself a substantial remodeling. The goal has been to reflect current Linked Data best practices and reuse other well-known ontologies where possible. There have been several requests to be able to incorporate it into Fedora and Hydra.

We would like to present the ontology in its current state (it is almost complete and nearly ready to go out as a draft for community review) and show some of our modeling choices to get feedback from LOD experts. One area that has resulted in a lot of discussion is Rights information, especially in trying to bring in what seemed to be the most appropriate rights ontology, ODRL, and the relationship between the PREMIS ontology and rightsstatements.org. Another area in which we’d like to get feedback is the relationship between the ontology and the preservation controlled vocabularies at http://id.loc.gov/preservationdescriptions/ and specifically the practice of reusing controlled vocabulary terms as subclasses or subproperties of the ontology.

Rebecca Guenther and Angela DiIorio