License is not being detected #3997

Open
cgi-ricardo opened this issue Nov 22, 2024 · 1 comment

@cgi-ricardo

Describe the bug
When running ScanCode on the following GitHub package (https://github.com/stleary/JSON-java/tree/20230227), it does not detect the license declared inside the pom.xml file (https://github.com/stleary/JSON-java/blob/20230227/pom.xml).

System configuration

  • Which version of ScanCode.io are you running? 32.2.1
  • Are you running the app using Docker? yes

Expected behavior
ScanCode should detect the Public Domain license.

@AyanSinhaMahapatra
Member

@cgi-ricardo thanks for the report, this is a bug indeed.

Note that we are able to detect the license correctly in the context of a package, where we extract license statements in the specific context of the package manifest and then scan that statement for licenses.
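
To illustrate what "extracting license statements in the context of the package manifest" means, here is a minimal sketch using only the Python standard library. It is not the scancode-toolkit implementation; the function name `extract_declared_licenses` and the sample POM are assumptions made for illustration. It pulls the `<license>` entries out of a pom.xml so that only the short statement (here "Public Domain") needs to be scanned, rather than the whole file.

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    # Strip an XML namespace such as "{http://maven.apache.org/POM/4.0.0}name".
    return tag.rsplit("}", 1)[-1]


def extract_declared_licenses(pom_text):
    """Return one dict per <license> element found in a POM (hypothetical helper)."""
    root = ET.fromstring(pom_text)
    declared = []
    for element in root.iter():
        if _local_name(element.tag) != "license":
            continue
        # Collect the child fields (name, url, distribution, ...) as plain text.
        fields = {
            _local_name(child.tag): (child.text or "").strip()
            for child in element
        }
        declared.append(fields)
    return declared


# Sample manifest reduced to the block quoted in this issue (assumed wrapper).
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <licenses>
        <license>
            <name>Public Domain</name>
            <url>https://github.com/stleary/JSON-java/blob/master/LICENSE</url>
            <distribution>repo</distribution>
        </license>
    </licenses>
</project>
"""

if __name__ == "__main__":
    for entry in extract_declared_licenses(SAMPLE_POM):
        print(entry.get("name"), entry.get("url"))
```

The extracted `name` value is the kind of short statement that can then be matched against license rules on its own, without the surrounding XML tokens.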

Screenshot from 2024-11-22 18-04-47

But we also scan the package manifest files (here the pom.xml) as whole files with the license scanner, without using that context, and there we are getting false positives:

    "license_clues": [
      {
        "license_expression": "cc0-1.0",
        "license_expression_spdx": "CC0-1.0",
        "from_file": "JSON-java-20230227/pom.xml",
        "start_line": 36,
        "end_line": 42,
        "matcher": "3-seq",
        "score": 59.09,
        "matched_length": 13,
        "match_coverage": 59.09,
        "rule_relevance": 100,
        "rule_identifier": "cc0-1.0_200.RULE",
        "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/cc0-1.0_200.RULE",
        "matched_text": "    <licenses>\n        <license>\n            <name>Public Domain</name>\n            <url>https://github.com/stleary/JSON-java/blob/master/LICENSE</url>\n            <distribution>repo</distribution>\n        </license>\n    </licenses>",
        "matched_text_diagnostics": "licenses>\n        <license>\n            <name>Public Domain</name>\n            <url>[https]://[github].[com]/[stleary]/[JSON]-[java]/[blob]/[master]/[LICENSE]</url>\n            <distribution>repo</distribution>\n        </license>\n    </licenses>"
      }
    ],

Note that we also detect that something is wrong with this match, so we treat it as a clue instead of a detection and do not report it in the resource license_expression either.
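
For anyone who wants to check their own scan results for such clues, here is a small sketch. It assumes the scan JSON has a top-level `files` list whose entries carry a `path` and the `license_clues` list shown above; the helper name `collect_license_clues` and the trimmed sample data are hypothetical.

```python
def collect_license_clues(scan):
    """List (path, expression, coverage, rule) for every license clue in a scan.

    Assumes a top-level "files" list; adjust if your results are shaped differently.
    """
    clues = []
    for resource in scan.get("files", []):
        for clue in resource.get("license_clues", []):
            clues.append(
                (
                    resource.get("path"),
                    clue.get("license_expression"),
                    clue.get("match_coverage"),
                    clue.get("rule_identifier"),
                )
            )
    return clues


# Trimmed sample built from the clue quoted in this issue.
SAMPLE_SCAN = {
    "files": [
        {
            "path": "JSON-java-20230227/pom.xml",
            "license_clues": [
                {
                    "license_expression": "cc0-1.0",
                    "matcher": "3-seq",
                    "score": 59.09,
                    "match_coverage": 59.09,
                    "rule_identifier": "cc0-1.0_200.RULE",
                }
            ],
        }
    ]
}

if __name__ == "__main__":
    for path, expression, coverage, rule in collect_license_clues(SAMPLE_SCAN):
        print(f"{path}: {expression} ({coverage}% coverage, {rule})")
```

With real results you would load the file with `json.load` first and pass the resulting dict to the helper.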

But we can do better:

  1. Add specific rules to detect this as a public-domain license
  2. Do more to combine file and package license detections to remove ambiguity in these cases.
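
As a rough illustration of the second point, the sketch below shows one possible policy for a manifest file: keep a proper file-level detection if there is one, otherwise fall back to the license declared in the package data and record the file-level clues that were set aside. This is only an assumption about how such a combination could look, not how scancode-toolkit behaves today; the function name, arguments, and return shape are all hypothetical.

```python
def resolve_manifest_license(package_expression, file_expression, file_clues):
    """Choose a license expression for a manifest file (hypothetical policy)."""
    discarded = [clue.get("license_expression") for clue in file_clues]
    if file_expression:
        # A real file-level detection exists, so keep it and drop nothing.
        return {
            "expression": file_expression,
            "source": "file-detection",
            "discarded_clues": [],
        }
    if package_expression:
        # Only low-confidence clues at file level: trust the package declaration.
        return {
            "expression": package_expression,
            "source": "package-manifest",
            "discarded_clues": discarded,
        }
    # Nothing reliable on either side; leave the clues as clues.
    return {"expression": None, "source": None, "discarded_clues": []}


if __name__ == "__main__":
    # Values mirror the pom.xml case from this issue.
    print(
        resolve_manifest_license(
            package_expression="public-domain",
            file_expression=None,
            file_clues=[{"license_expression": "cc0-1.0"}],
        )
    )
```

In the pom.xml case from this issue, such a policy would report `public-domain` from the package data and set the `cc0-1.0` clue aside.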

This detection issue is actually present in scancode-toolkit, so moving it there. Also attaching the scancode-toolkit scan results.

JSON-java.json

@AyanSinhaMahapatra AyanSinhaMahapatra transferred this issue from aboutcode-org/scancode.io Nov 22, 2024