Discussion: #47
Accepted
Use SPDX license URIs to unambiguously specify the license for data and metadata use.
Link a Dataset to its license with the schema:license property to document the legal constraints on its use. The guide recommends providing a URL that unambiguously identifies the specific version of the license, but for many licenses it is hard to determine what that URL should be. We therefore recommend drawing the license URL from the SPDX license list, a curated, well-maintained list of licenses and their properties. For each entry, SPDX provides a canonical URL for the license (e.g., http://spdx.org/licenses/CC0-1.0), a unique licenseId (e.g., CC0-1.0), and other metadata about the license. Here's an example using the SPDX license URI for the Creative Commons CC0 license:
{
  "@context": {
    "@vocab": "https://schema.org/"
  },
  "@id": "http://www.sample-data-repository.org/dataset/123",
  "@type": "Dataset",
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  "license": "http://spdx.org/licenses/CC0-1.0"
  ...
}
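The pattern in the example above can be sketched in Python as follows. This is a minimal illustration, not part of the guide: the helper names `spdx_license_uri` and `with_license` are hypothetical, and the sketch assumes the SPDX URI is simply the `http://spdx.org/licenses/` prefix followed by the licenseId, as in the CC0-1.0 example.

```python
import json

# URI prefix taken from the example above; treating it as a fixed base is an assumption.
SPDX_BASE = "http://spdx.org/licenses/"


def spdx_license_uri(license_id: str) -> str:
    """Build an SPDX license URI from an SPDX licenseId (hypothetical helper)."""
    license_id = license_id.strip()
    # Reject values that clearly cannot be a licenseId.
    if not license_id or "/" in license_id or " " in license_id:
        raise ValueError(f"not a plausible SPDX licenseId: {license_id!r}")
    return SPDX_BASE + license_id


def with_license(dataset: dict, license_id: str) -> dict:
    """Return a copy of a schema.org Dataset record with its license set."""
    updated = dict(dataset)
    updated["license"] = spdx_license_uri(license_id)
    return updated


dataset = {
    "@context": {"@vocab": "https://schema.org/"},
    "@id": "http://www.sample-data-repository.org/dataset/123",
    "@type": "Dataset",
}

print(json.dumps(with_license(dataset, "CC0-1.0"), indent=2))
```

Returning a copy rather than mutating the input keeps the original record available for comparison; a real harvester might prefer to update in place.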
For many licenses it is unclear which URI identifies the license, but the Creative Commons licenses are an exception: they provide consistent URIs for each license, and these are in widespread use. While we recommend using the SPDX URI, it is acceptable to use the CC license URIs directly if preferred. Here's an example using the traditional CC URI for the license:
{
  "@context": {
    "@vocab": "https://schema.org/"
  },
  "@id": "http://www.sample-data-repository.org/dataset/123",
  "@type": "Dataset",
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  "license": "https://creativecommons.org/publicdomain/zero/1.0"
  ...
}
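Because both URI forms are acceptable, a consumer that aggregates records may want to normalize them to a single form. The sketch below shows one way this could work. The `CC_TO_SPDX` table and the `normalize_license` function are hypothetical, and the table contains only the one pair of URIs that appears in the examples above; a real mapping would have to be compiled separately.

```python
# Hypothetical lookup table; it holds only the CC0 pair shown in the examples above.
CC_TO_SPDX = {
    "https://creativecommons.org/publicdomain/zero/1.0": "http://spdx.org/licenses/CC0-1.0",
}


def normalize_license(uri: str) -> str:
    """Map a known Creative Commons URI to its SPDX URI; return other URIs unchanged."""
    key = uri.rstrip("/")  # tolerate a trailing slash on the CC URI
    return CC_TO_SPDX.get(key, uri)


print(normalize_license("https://creativecommons.org/publicdomain/zero/1.0/"))
print(normalize_license("http://spdx.org/licenses/Apache-2.0"))
```

URIs that are not in the table pass through untouched, so the function is safe to apply to records that already use SPDX URIs.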
One issue is that SPDX URIs currently resolve to an HTML landing page describing the license, rather than to a machine-readable representation of the license obtained through content negotiation. However, the returned web page does contain structured markup in RDFa format describing the license properties. For example, the HTML page for the Apache-2.0 license contains property attributes for spdx:License, spdx:deprecated, spdx:name, spdx:licenseId, rdfs:seeAlso, spdx:isOsiApproved, and spdx:licenseText, among others. Here is a snippet of the web page for the Apache-2.0 license:
<h1 property="dc:title">Apache License 2.0</h1>
<div style="display:none;"><code property="spdx:deprecated">false</code></div>
<h2>Full name</h2>
<p style="margin-left: 20px;"><code property="spdx:name">Apache License 2.0</code></p>
<h2>Short identifier</h2>
<p style="margin-left: 20px;"><code property="spdx:licenseId">Apache-2.0</code></p>
<h2>Other web pages for this license</h2>
<div style="margin-left: 20px;">
  <ul>
    <li><a href="http://www.apache.org/licenses/LICENSE-2.0" rel="rdfs:seeAlso">http://www.apache.org/licenses/LICENSE-2.0</a></li>
    <li><a href="https://opensource.org/licenses/Apache-2.0" rel="rdfs:seeAlso">https://opensource.org/licenses/Apache-2.0</a></li>
  </ul>
</div>
<div property="spdx:isOsiApproved" style="display: none;">true</div>
<h2 id="notes">Notes</h2>
<p style="margin-left: 20px;">This license was released January 2004</p>
<h2 id="licenseText">Text</h2>
<div property="spdx:licenseText" class="license-text">
  <div class="optional-license-text">
    <p>Apache License <br /> Version 2.0, January 2004 <br />
    ...
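To illustrate how the embedded markup could be read by a client, here is a rough sketch that pulls `property` values and `rel` links out of a fragment like the one above using only Python's standard `html.parser` module. It is not a conforming RDFa processor: it ignores prefixes, vocabularies, and nested properties, and the class name `RDFaPropertyParser` is hypothetical. A dedicated RDFa library would be the more robust choice in practice.

```python
from html.parser import HTMLParser


class RDFaPropertyParser(HTMLParser):
    """Collect the text of elements carrying a `property` attribute and the
    hrefs of links carrying a `rel` attribute (simplified, not full RDFa)."""

    def __init__(self):
        super().__init__()
        self.properties = {}  # property name -> text content
        self.links = {}       # rel value -> list of hrefs
        self._prop = None     # property currently being captured, if any
        self._tag = None      # tag name of the element that opened the capture
        self._depth = 0       # nesting depth of same-named tags inside the capture
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "rel" in attrs and "href" in attrs:
            self.links.setdefault(attrs["rel"], []).append(attrs["href"])
        if self._prop is None and "property" in attrs:
            self._prop, self._tag = attrs["property"], tag
            self._depth, self._text = 0, []
        elif self._prop is not None and tag == self._tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._prop is None or tag != self._tag:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        # The element that opened the capture has closed: store its text.
        self.properties[self._prop] = "".join(self._text).strip()
        self._prop = None

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)


# Shortened copy of the Apache-2.0 fragment shown above.
snippet = """
<h1 property="dc:title">Apache License 2.0</h1>
<div style="display:none;"><code property="spdx:deprecated">false</code></div>
<p style="margin-left: 20px;"><code property="spdx:licenseId">Apache-2.0</code></p>
<ul>
<li><a href="http://www.apache.org/licenses/LICENSE-2.0" rel="rdfs:seeAlso">http://www.apache.org/licenses/LICENSE-2.0</a></li>
</ul>
<div property="spdx:isOsiApproved" style="display: none;">true</div>
"""

parser = RDFaPropertyParser()
parser.feed(snippet)
parser.close()
print(parser.properties)
print(parser.links)
```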
Finally, the SPDX project provides the SPDX license data as structured files in machine-readable formats, including Turtle and JSON-LD. These could be imported into COR or other vocabulary servers to provide a queryable graph of the license data.
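As a lightweight alternative to a full vocabulary server, a repository could load the license data into an in-memory index and check harvested license URIs against it. The sketch below assumes a JSON document with a top-level `licenses` array whose entries carry `licenseId`, `name`, `isOsiApproved`, and `seeAlso` fields; that layout is an assumption about the published data files, and the two sample entries are hand-written for illustration (the Apache-2.0 values are copied from the snippet above). The function names are hypothetical.

```python
import json

# Hand-written sample; the field layout is an assumption about the SPDX data files.
SAMPLE = json.loads("""
{
  "licenses": [
    {
      "licenseId": "Apache-2.0",
      "name": "Apache License 2.0",
      "isOsiApproved": true,
      "seeAlso": [
        "http://www.apache.org/licenses/LICENSE-2.0",
        "https://opensource.org/licenses/Apache-2.0"
      ]
    },
    {
      "licenseId": "CC0-1.0",
      "name": "Creative Commons Zero v1.0 Universal"
    }
  ]
}
""")


def index_by_license_id(data: dict) -> dict:
    """Build a licenseId -> license record lookup table."""
    return {entry["licenseId"]: entry for entry in data["licenses"]}


def lookup(uri: str, index: dict):
    """Return the record whose licenseId is the last path segment of `uri`, or None."""
    license_id = uri.rstrip("/").rsplit("/", 1)[-1]
    return index.get(license_id)


index = index_by_license_id(SAMPLE)
print(lookup("http://spdx.org/licenses/Apache-2.0", index)["name"])
print(lookup("http://spdx.org/licenses/Not-A-License", index))
```

A lookup that returns `None` signals a license URI that is not in the loaded list, which a validator could report to the data provider.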
- We gain a comprehensive, maintained, unambiguous vocabulary for licenses, increasing consistency across repositories
- We gain compatibility with software packaging ecosystems such as Debian and Python
- Licenses that have well-known URIs (e.g., Creative Commons) may be less recognizable by their SPDX URI
- SPDX license URIs resolve only to HTML pages with embedded RDFa; machine-readable representations in other formats do not seem to be available through content negotiation