Proposed: a 4-star classification-scheme for linked open cultural metadata

One of the outcomes of last week’s LOD-LAM Summit was a draft document proposing a new way to assess the openness/usefulness of linked data for the LAM community. This is a work in progress, but is already provoking interesting debate on our options as we try to create a shared strategy. Here’s what the document looks like today, and we welcome your comments, questions and feedback as we work towards version 1.0.

*******************************************************************

DRAFT

A 4-star classification-scheme for linked open cultural metadata

Publishing openly licensed data on the Web and contributing to the Linked Open Data ecosystem can have a number of benefits for libraries, archives and museums:

Driving users to your online content (e.g., by improved search engine optimization);
Enabling new scholarship that can only be done with open data;
Allowing the creation of new services for discovery;
Stimulating collaboration in the library, archives and museums world and beyond.

In order to achieve these benefits, libraries, museums and archives are faced with decisions about releasing their metadata under various open terms. To be open and useful as linked data requires deliberate design choices, and systems must be built from the beginning with openness and utility in mind. To be useful for third parties, all metadata made available online must be published under a clear rights statement.

This 4-star classification system arranges those rights statements (e.g. licenses or waivers) that comply with the relevant conditions (2-11) of the open knowledge definition (version 1.1) by order of openness and usefulness: the more stars, the more open the metadata is and the easier it is to use in a linked data context. Libraries, archives and museums wanting to contribute to the Linked Open Data ecosystem should strive to make their metadata available under the most open instrument that they are comfortable with and that maximizes the data’s usefulness to the community.

Note: This system assumes that libraries, archives and museums have the required rights over the metadata to make it available under the waivers and licenses listed below. If the metadata you want to make available includes external data (for example vocabularies) you may be constrained by contract or copyright to release the data under one of the licenses below.

★★★★ Public Domain (CC0 / ODC PDDL / Public Domain Mark)

as a user:

metadata can be used by anyone for any purpose
permission to use the metadata is not contingent on anything
metadata can be combined with any other metadata set (including closed metadata sets)

as a provider:

you are waiving all rights over your metadata so it can be most easily reused
you can specify whether and how you would like acknowledgement (attribution or citation, and by what mechanism) from users of your metadata, but it will not be legally binding

This option is considered best since it requires the least action by the user to reuse the data, and to link or integrate the data with other data. It supports the creation of new services by both non-commercial and commercial parties (e.g. search engines), encourages innovation, and maximizes the value of the library, archive or museum’s investment in creating the metadata.

★★★ Attribution License (CC-BY / ODC-BY) when the licensor considers linkbacks to meet the attribution requirement

as a user:

metadata can be used by anyone for any purpose
permission to use the metadata is contingent on providing attribution by linkback to the data source
metadata can be combined with any other metadata set, including closed metadata sets, as long as the attribution link is retained

as a provider:

you get attribution whenever your data is used

This option meets the definition of openness, but constrains the user of the data by requiring them to provide attribution (in the legal sense, which is not the same as citation in the scholarly sense). Here, attribution is satisfied by a simple, standard Web mechanism from the new data product or service. By using standard practice such as a linkback, attribution is satisfied without requiring the user to discover which attribution method is required and how to implement it for each dataset reused. Note that there are other methods of satisfying a legal attribution requirement (see below) but here we propose a specific mechanism that would minimize the effort needed to use the data if the LAM community collectively agrees to it. Also note that even this simple (ideally shared) attribution method could prevent some applications of linked data if linkbacks are required by many datasets from many sources.

★★ Attribution License (CC-BY / ODC-BY) with another form of attribution

as a user:

metadata can be used by anyone for any purpose
permission to use the metadata is contingent on providing attribution in a way specified by the provider
metadata can be combined with any other metadata set (including closed metadata sets)

as a data provider:

you get attribution whenever your data is used by the method you specify

This option meets the definition of openness in the same way as the linkback attribution option, but requires the user to provide attribution in some way other than a linkback, as specified by the data provider. The provider could specify an equally simple mechanism (e.g. retention of another field, such as ‘creator’, from the original metadata record) or a more complex mechanism (e.g. a scholarly citation in a Web page connected to the new data product or service). The disadvantage of this option is that the user must discover what mechanism is wanted by the particular data provider and how to comply with it, potentially needing a different mechanism for each dataset reused. For large-scale open data integration (e.g. mashups) this option is difficult to implement.

★ Attribution Share-Alike License (CC-BY-SA/ODC-ODbL)

as a user:

metadata can be used by anyone for any purpose
permission to use the metadata is contingent on providing attribution in a way specified by the provider
metadata can only be combined with data that allows re-distributions under the terms of this license

as a provider:

you get attribution whenever your data is used
you only allow use of your data by entities that also make their data available for open reuse under exactly the same license

This option meets the definition of openness but potentially limits reuse of data if more than one dataset is reused and each dataset has an associated Share-Alike license. Under a Share-Alike license, the only way to legally combine two datasets is if they share exactly the same SA license, since most SA licenses require that reused data be redistributed under exactly the same license. If the source datasets had different Share-Alike licenses originally (e.g. CC-BY-SA and ODC-ODbL), then there is no way for the user to comply with the requirements of both source data licenses, so this option only allows users to link or integrate data distributed under one particular SA license (or one SA license and any of the other license or waiver options above). In the LAM domain, where significant value is created by combining datasets, the Share-Alike license requirement severely reduces the utility of a dataset.
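The Share-Alike combination rule described above can be illustrated with a small Python sketch. This is a simplified model and not part of the draft: the function name `combined_license` and the short license labels are made up for illustration, and the model ignores license versions, jurisdictions and any formal compatibility between licenses.

```python
from typing import Iterable, Optional

# Share-Alike licenses named in the 1-star tier of the draft.
# The string labels are illustrative, not official identifiers.
SHARE_ALIKE = {"CC-BY-SA", "ODC-ODbL"}


def combined_license(licenses: Iterable[str]) -> Optional[str]:
    """Return the Share-Alike license a combined dataset must carry.

    Returns None when no source dataset imposes a Share-Alike condition.
    Raises ValueError when two different Share-Alike licenses are mixed,
    because, under this simplified model, the combined data cannot be
    redistributed under both at the same time.
    """
    share_alike_used = {lic for lic in licenses if lic in SHARE_ALIKE}
    if len(share_alike_used) > 1:
        raise ValueError(
            "conflicting Share-Alike licenses: "
            + ", ".join(sorted(share_alike_used))
        )
    # Either exactly one Share-Alike license is present, or none at all.
    return next(iter(share_alike_used), None)
```

In this model, mixing CC0, CC-BY and CC-BY-SA sources yields a result that must carry CC-BY-SA, while mixing CC-BY-SA with ODC-ODbL raises an error. That is the case the draft describes as having no way to comply with both source licenses.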

Related Material

Linked Open Data star scheme by example

35 thoughts on “Proposed: a 4-star classification-scheme for linked open cultural metadata”

Ed Summers says:

June 7, 2011 at 5:06 am

This is a *very* useful outcome from LOD-LAM. I wonder though about the focus on “metadata”. For example, let’s say an Archive makes a collection of historic photographs available on the web that are in the public domain. The Archive publishes some feeds with embedded metadata, which reference some high resolution images for download, as well as the splash page for the photographs. Wouldn’t a user of the data want to know about their ability to reuse the photographs themselves, and not just the metadata (the feed)? Or do you see the digital photographs as metadata as well? Wouldn’t it be easier to just talk about “data” and avoid this perennial question? 🙂

Jane Stevenson says:

June 7, 2011 at 5:24 am

Hi Ed,
But you have to distinguish between the two. Most archives want the metadata to be open, but there are questions around the digital content. Surely if you just talk about data you end up treating the description and the object as the same type of thing?

Ed Summers says:

June 7, 2011 at 7:46 am

Hi Jane. Yes, the terms of a license should be clear about what (e.g. archival descriptions vs the objects being described) is being made available under the license. I just am not sure these principles should be limited right out of the gate to just descriptive metadata. But perhaps that’s a scoping decision you all made? I’m certainly no expert in this area…

1. Paul Keller says:
  
  June 7, 2011 at 8:18 am
  
  hi Ed,
  this is indeed a conscious decision that we made when drafting the scheme. this is supposed to be about descriptive metadata only. the reason for this is relatively simple. by default cultural heritage institutions do not have the rights to the content that they have in their collection. in many cases the rights to the content rest with the creators or other rights holders. if this is the case the institutions cannot freely choose which license to apply.
  
  with regards to metadata the institutions usually have the rights to the metadata (because there is no copyright in factual data and/or because they will have produced the metadata themselves). this means that with regard to the metadata the organisations are free to choose whatever licensing terms they want to apply.
  
  Given the above it makes sense to separate these two discussions and that is why we have limited this to metadata. best, paul
  
  1. Ed Summers says:
    
    June 7, 2011 at 8:44 am
    
    Thanks for the clarification Paul. I understand what you are saying, but want you to imagine what Wikipedia would be like if there were no images, or if there were not clear rules about how to use the images that are there… Shouldn’t this be a real model for our licensing concerns going forward, instead of a stunted one that only talks about our metadata?
    
    1. Paul Keller says:
      
      June 7, 2011 at 8:50 am
      
      yes of course we should also care about the rights to the content, but these are two completely separate ‘battles’. with regards to metadata it is about convincing cultural heritage institutions to do the right thing (which they can do by themselves). with regards to content you have an extra stakeholder (the authors) and get all the complexities of rights clearance (orphan works, etc) on top of this. i have been working on both issues for a couple of years now, and have found that it makes things much easier to keep them separated. if you don’t you will introduce complexities from the content side that do not exist on the metadata level into the discussion about metadata, and at best that creates extra confusion but usually it leads to a standstill…
      
      1. Ed Summers says:
        
        June 7, 2011 at 9:02 am
        
        Fair enough. Thanks for the additional context!
Kristin Eschenfelder says:

June 7, 2011 at 9:19 am

I’d like to vouch for Paul’s comment – I initially pushed for discussion of open content/content rights issues at the workshop; but given very limited time, we decided just to focus on the metadata half of the issue.

But, the four star scheme is a nice starting point for continuing a conversation about content.

Adrian Pohl says:

June 13, 2011 at 4:33 pm

I really like this simple four star scheme. But I still have a problem with it that I already brought up at the LOD-LAM summit: The text is about openly licensing metadata but doesn’t answer the question what “licensing metadata” actually means.

Does “licensing metadata” mean adding license information to single metadata records or does it mean attaching a license to a collection of metadata records, i.e. a catalog?

The underlying problem (at least in the EU) and the reason for my insistence on this question is that if someone attaches an open license to every individual record in a collection, the collection as a whole as well as significant parts of it could nonetheless be subject to intellectual property rights. That is because a dataset and the separate items which constitute the dataset are legally distinct entities and record licenses won’t automatically also apply to the collection level. The result of only openly licensing individual records is open records but closed collections: Not much is gained.

Thus, I advocate clarifying in the document that when we talk about “licensing metadata” we talk about licensing on the collection level (as well as about licensing individual records which might be copyrighted).

Paul Keller says:

June 13, 2011 at 11:50 pm

@adrian: let me try to explain why i think that your proposal does not make sense in this particular context (Linked Open Data/Libraries, Archives & Museums) and why i think it is harmful.

collection is a completely arbitrary term. apart from the curator of the collection nobody really knows for sure what belongs to a collection and what does not. also data records can be part of multiple collections. finally there is no guarantee that a collection maps one-to-one onto a particular database that in the europeana context might be vested with sui-generis database rights.

secondly – and more importantly – i do not think we are making a point here that LAMs should publish and license entire collections. of course that would be great, but there are many scenarios where publishing a part of a collection as LOD will make a lot of sense or where omitting certain fields from the descriptive metadata that is published as LOD makes a lot of sense for the organisations (think of internal data, data that is licensed from external sources or data that is of insufficient quality or such commercial value that they are uncomfortable with publishing it openly). With your approach we would be basically signaling them that this kind of behavior is unwanted, and i do not think that will lead to a productive working relationship with LAMs.

In short we really have to work with LAMs in making this happen and that means providing them with a certain amount of flexibility with regards to what they want to publish. Given this i think the meaning of ‘licensing metadata’ in this context is nothing more (or less) than ‘applying clear open rights statements to what you are publishing’ (that can be individual records or fields in some cases and entire collections and databases in others).

1. Adrian Pohl says:
  
  June 14, 2011 at 12:37 am
  
  @Paul It’s probably a good thing that we have this discussion again, this time publicly in the written medium.
  
  1. “collection is a completely arbitrary term…”
  Obviously, at least I have the impression that we both understand this term differently. For me a collection is an aggregation of individual items and it doesn’t imply being a _complete_ collection of a memory institution. We might also use the more concrete term “database” which may be the data as a whole stored in an actual database, triple store or named graph.
  
  2. To be clear: I am not making a point that LAMs should _publish_ entire datasets but attach the license to entire datasets! We already conflated these things at the summit. You can license a database as a whole without publishing it as a database dump.
  
  3. I don’t want and never wanted the document to state that only publishing parts of one collection is a bad thing. When you write “but there are many scenarios where publishing a part of a collection as LOD will make a lot of sense or where omitting certain fields from the descriptive metadata that is published as LOD makes a lot of sense for the organisations” I completely agree with you. (There certainly always is an information loss if data is converted to RDF which means publishing LOD is always about publishing only parts of a dataset.) Everybody should be free to publish only part of a collection, but I would like the 4 star scheme to make clear that the published sub-collection (which also is a collection of records) be openly licensed and not only individual records. It is not correct when you write, with my “approach we would be basically signaling them that this kind of behavior is unwanted”. I think this approach provides a certain amount of flexibility while making clear what open licensing is about in the first place.
  
Paul Keller says:

June 14, 2011 at 5:01 am

@adrian:

@1: i agree, just be aware that for many people in the LAM community the term collection has a very specific meaning, namely their collection of artefacts (stuffed penguins, 17th-century manuscripts, whatever) and you do not want them to get the impression that they either have to publish/license the complete collection or nothing. with regard to the suggestion to use database i do not agree at all. we do not want to imply that you need to publish/license a complete database. it is perfectly fine to just publish/license parts of a database. how much is really to be decided by the metadata providers and not by us.

@2: that does not make sense. if you do not publish something there is absolutely no point in licensing it. licenses only make sense with published material. also in many cases they will not be able to license full datasets openly (for example if some fields contain external data that is licensed under different terms)

@3: i think we are saying the same here. the data that is published is the data that must be licensed.

1. Adrian Pohl says:
  
  June 15, 2011 at 4:56 am
  
  @2: I will try to make this thought clearer. You can publish a dataset in different ways:
  a) make every single part of the dataset accessible on the web, but only give access to small parts of the dataset at a time by providing a search interface, an API, a SPARQL endpoint or whatsoever
  b) publish a dump of the whole dataset.
  
  I referenced in my above comment option a) where you publish a dataset as a whole but don’t make it accessible as a whole. I think it is not only in case b) crucial to license the whole dataset but also in case a), as someone may – over time – query the database and may use or reuse significant parts or the whole database.
  
  1. Paul Keller says:
    
    June 15, 2011 at 5:22 am
    
    i do not think that this discussion falls within the scope of this scheme. this scheme simply says: if you publish something (regardless of *how* you publish it), license x is better than license y. nothing more nothing less.
    
PatrickD says:

June 19, 2011 at 1:40 pm

@Paul But there is a danger if Adrian’s point is not clarified in the document. Libraries, Archives and Museums can get the impression that it is possible to license single data-sets (which are just facts). At minimum in Europe there is no copyright for facts, and so from the point of view of the open data movement it will cause harm if the LAM community gets the impression it will be possible to license single datasets which are part of the public domain. A license can only be added to a database (or an important part of it).

1. MacKenzie Smith says:
  
  June 20, 2011 at 4:02 pm
  
  another point we debated was that *sometimes* metadata isn’t just factual, or is a mix of fact and creativity (e.g. archival finding aids, museum exhibit catalogs, analytics in bibliographic records). So *parts* of the dataset are copyrightable. the 2-4 options cover that case, if the data owner is unwilling to go 100% public domain.
  
  1. PatrickD says:
    
    June 21, 2011 at 12:06 am
    
    In these very few cases it should also be clear which part is protected and which part is not.
    
Paul Keller says:

June 20, 2011 at 12:10 am

@patrickd: yes they can get this impression, but they can get this impression pretty much anywhere. i do not think that it makes sense to work with the assumption that LAMs have bad intentions here, for me it is more about helping them do the right thing.

also it is important that datasets that are published carry a clear rights statement, so that re-users have clear guidelines with regard to what they can or cannot do with them. in practical terms this is probably much more important than the question of whether something is a non-copyrightable fact or a copyrightable expression of some fact (this is clear in many cases, but there are a lot of cases where this is not a distinction that is easy to make). Finally in the case where someone ‘licenses’ facts the licenses mentioned in the scheme simply do not stick. they only work if there is some right that they can attach to, so in the case you mention their restrictions are not applicable anyway (but the license statement provides you with the added bonus that you have an idea what the licensor would like you to do, which you can still follow out of goodwill)

1. PatrickD says:
  
  June 21, 2011 at 12:04 am
  
  Hmmm when I think what OCLC tried a year before (put a license on every record by contract) I’m not so sure that LAMs will always do the best thing. Especially when it comes to the point of openness. E.g. there is a BIG discussion whether classification is a creative act, i totally doubt that. Only if metadata contains really long text information like a newly written abstract can there be a copyright. I still think there should be a one or two sentence clarification where licenses should/can be used. Else it can end up in discussions like in the digitalization sector, where libraries and museums still say digitalization is a creative act.
  
  1. Paul Keller says:
    
    June 21, 2011 at 12:22 am
    
    @patrick: this is really not something this scheme is supposed to address. you raise a couple of valid points (which i totally agree with) but this scheme is NOT about defining what is original and what is not. the main (only?) purpose of this scheme is to help organisations that have come to the conclusion that they want to publish (some of) their metadata as linked open data to choose a license or waiver. What we want to do here is to encourage them to publish under truly open licenses, not lecture them about copyright
    
PatrickD says:

June 21, 2011 at 7:50 am

I just mean these issues should be mentioned in the introduction to the scheme in a few words, because i wouldn’t like the idea that people can say we have a 2* openness, if they start claiming rights on public domain.

Peter Hirtle says:

July 4, 2011 at 1:54 am

Sorry to be late coming to this very interesting discussion, but I am confused as to why CC licenses are part of the proposed classification. Most CC licenses are grants from the copyright owner; if there is no copyright, you can’t use most CC licenses. Data doesn’t have a copyright. Are CC BY, CC BY-SA really applicable to your proposed use?

1. Paul Keller says:
  
  July 4, 2011 at 5:25 am
  
  @peter: first of all you are right to note that most cc-licenses only apply to copyright. when a cc-license is applied to something that is not copyrighted the license does not stick (so the restrictions do not apply to the work) but since it is not copyrighted it can be used freely anyway.
  
  secondly, most metadata by cultural heritage organisations will contain at least some copyrightable subject matter (think about descriptions and provenance data) which is why the CC licenses are useful tools here.
  
  you can probably best think about CC licenses as expressions of the intention of the organisation that makes data available. With regards to non-copyrighted parts of the data they are not more than that, with regard to the copyrighted parts they are binding licenses.
  
2. Adrian Pohl says:
  
  July 4, 2011 at 6:26 am
  
  Peter, I agree with what Paul writes but find it important to point out the problematic situation in the European context.
  
  In Europe, we have the sui generis database right which isn’t licensed by Creative Commons licenses. Thus, I currently wouldn’t recommend using CC-BY or CC-BY-SA licenses for publishing data and databases in Europe. (In upcoming versions Creative Commons fortunately wants to also include database rights in these licenses.) Only CC0 is appropriate for licensing data in Europe as it covers not only copyright law but also related and neighbouring rights.
  
  Thus, if you want to license data under an attribution or share-alike license in Europe you should probably stick to the CC0 or the Open Data Commons licenses.
  
  For more discussion about this topic see this thread on the lod-lam mailing list.
  
  Adrian
  
Paul Keller says:

July 4, 2011 at 6:33 am

@adrian & @peter: small correction to the last post by adrian. the treatment of the database right differs according to jurisdiction and version. in general the 3.0 CC licenses from European jurisdictions do cover the sui generis database right as well as other related and neighboring rights and they are perfectly good to be used with the types of data we are discussing here. with regards to the database right, this right is waived by the CC licenses, so the restrictions (such as attribution or share alike) only attach to the copyrighted elements of the published data.

1. Adrian Pohl says:
  
  July 4, 2011 at 6:34 am
  
  Thanks for the correction, Paul.
  
Peter Hirtle says:

July 4, 2011 at 7:19 am

Thanks for the explanations. Perhaps you are right that most cultural metadata will include some copyrightable content, but it would still seem safer to me to only recommend the ODC licenses, which are intended to address data content, if you want to enforce a BY or some other limitation on the reuse of the data.

Whether it is a good idea to foster problems with attribution-stacking, etc. is a different issue…

Marco Streefkerk says:

July 5, 2011 at 4:30 am

Interesting discussion trying to merge two quite different domains (LAM and LOD) but I think essential for both of them.

I do find myself confused about the stars because LOD is also rated with stars. I don’t see the connection between them. This becomes especially confusing when I follow the link: Linked Open Data star scheme by example. I was expecting to find examples of open to closed licencing but instead these are examples of different levels of good to best LOD as regards interoperability.

Wouldn’t it be better to use different rating systems or, even better, to align both?

