MIT license not detected in package.json #3843

Open
vw-anton opened this issue Jul 2, 2024 · 4 comments
vw-anton commented Jul 2, 2024

Description

From the following file ScanCode does not extract "MIT" license when running ScanCode without --package option:
https://github.com/components/font-awesome/blob/f4f114c4ab37d101e6a15370769bc0af681792fa/package.json

    scanner:
      name: "ScanCode"
      version: "32.1.0"
      configuration: "--copyright --license --license-text --info --strip-root --timeout\
        \ 600 --json-pp"
    summary:
      start_time: "2024-06-28T10:59:46.000199521Z"
      end_time: "2024-06-28T11:01:51.000822060Z"
      licenses:
      - license: "CC-BY-4.0"
        location:
          path: "package.json"
          start_line: 10
          end_line: 11
      - license: "OFL-1.1"
        location:
          path: "package.json"
          start_line: 13
          end_line: 13
        score: 50.0

This is also reflected by the result of scancode.io which reports:

      "path": "codebase/font-awesome-f4f114c4ab37d101e6a15370769bc0af681792fa/package.json",
      "type": "file",
      "name": "package.json",
       
       ...
      
      "detected_license_expression": "cc-by-4.0 AND ofl-1.1",
      "detected_license_expression_spdx": "CC-BY-4.0 AND OFL-1.1",
      "license_detections": [
        {
          "license_expression": "cc-by-4.0 AND ofl-1.1",
          "license_expression_spdx": "CC-BY-4.0 AND OFL-1.1",
          "matches": [
            {
              "license_expression": "cc-by-4.0",
              "spdx_license_expression": "CC-BY-4.0",
              "from_file": "codebase/font-awesome-f4f114c4ab37d101e6a15370769bc0af681792fa/package.json",
              "start_line": 10,
              "end_line": 11,
              "matcher": "2-aho",
              "score": 100.0,
              "matched_length": 5,
              "match_coverage": 100.0,
              "rule_relevance": 100,
              "rule_identifier": "cc-by-4.0_103.RULE",
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/cc-by-4.0_103.RULE",
              "matched_text": "  \"license\": [\n    \"CC-BY-4.0\","
            },
            {
              "license_expression": "ofl-1.1",
              "spdx_license_expression": "OFL-1.1",
              "from_file": "codebase/font-awesome-f4f114c4ab37d101e6a15370769bc0af681792fa/package.json",
              "start_line": 13,
              "end_line": 13,
              "matcher": "2-aho",
              "score": 50.0,
              "matched_length": 3,
              "match_coverage": 100.0,
              "rule_relevance": 50,
              "rule_identifier": "spdx_license_id_ofl-1.1_for_ofl-1.1.RULE",
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/spdx_license_id_ofl-1.1_for_ofl-1.1.RULE",
              "matched_text": "    \"OFL-1.1\""
            }
          ],
          "identifier": "cc_by_4_0_and_ofl_1_1-bbdb0005-3895-360f-06e7-55f139405d2f"
        }
      ],
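To compare the two outputs more easily, the matches in a file entry like the one above can be flattened into one row per match. This is only a minimal sketch, assuming the file entry has already been loaded into a Python dict; `summarize_matches` is a hypothetical helper (not part of ScanCode or ORT), and it relies only on keys visible in the excerpt above.

```python
def summarize_matches(file_entry):
    """Return (spdx_expression, start_line, end_line, score) for each match."""
    rows = []
    for detection in file_entry.get("license_detections", []):
        for match in detection.get("matches", []):
            rows.append((
                match["spdx_license_expression"],
                match["start_line"],
                match["end_line"],
                match["score"],
            ))
    return rows


# Trimmed copy of the package.json entry quoted above (only the keys used here).
entry = {
    "license_detections": [{
        "matches": [
            {"spdx_license_expression": "CC-BY-4.0",
             "start_line": 10, "end_line": 11, "score": 100.0},
            {"spdx_license_expression": "OFL-1.1",
             "start_line": 13, "end_line": 13, "score": 50.0},
        ]
    }]
}

for row in summarize_matches(entry):
    print(row)
```

Listing the rows this way makes it visible that nothing is reported for line 12, which is where the MIT entry would be expected given the matches on lines 10-11 and 13.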

How To Reproduce

Run ScanCode 32.1.0 via ORT 22.5.0

System configuration

  • What OS are you running on? (Windows/MacOS/Linux)
    Linux
  • What version of scancode-toolkit was used to generate the scan file?
    32.1.0
  • What installation method was used to install/run scancode? (pip/source download/other)
    PIP in ORT
@vw-anton vw-anton added the bug label Jul 2, 2024
@pombredanne
Member

@vw-anton I doubt we can detect this correctly at scale in a plain JSON file without the --package option, especially for MIT. "mit" is a very common German word (it means "with"), so on its own it is not discriminating enough to be detected as-is.

Why not use the --package option? It is designed for this purpose. IMHO we cannot detect licenses correctly by treating a package.json as a blob of text.
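To illustrate the difference between scanning the text and reading the manifest, here is a minimal sketch that parses a package.json and reads its `license` field as structured data. It is not ScanCode's implementation: `declared_licenses` is a hypothetical helper, and the sample manifest is an abbreviated stand-in for the font-awesome file, assuming its `license` list holds the three identifiers discussed in this issue.

```python
import json


def declared_licenses(manifest_text):
    """Return the license strings declared in a package.json document."""
    manifest = json.loads(manifest_text)
    # "license" is the usual key; "licenses" is an older plural form.
    value = manifest.get("license") or manifest.get("licenses") or []
    if isinstance(value, (str, dict)):
        value = [value]
    found = []
    for item in value:
        # Legacy entries may be objects such as {"type": "MIT", "url": "..."}.
        name = item.get("type") if isinstance(item, dict) else item
        if name:
            found.append(name)
    return found


# Abbreviated, hand-written stand-in for the manifest (an assumption).
sample = """
{
  "name": "example",
  "license": [
    "CC-BY-4.0",
    "MIT",
    "OFL-1.1"
  ]
}
"""

print(declared_licenses(sample))
```

Because the value is read from a known key, a short identifier such as MIT does not have to be told apart from ordinary words in free text.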

Some related issues:

@vw-anton
Author

vw-anton commented Jul 3, 2024

We are not using it in ORT due to: https://oss-review-toolkit.slack.com/archives/C9NNJ54B1/p1719903918648839

@pombredanne
Member

We are not using it in ORT due to: https://oss-review-toolkit.slack.com/archives/C9NNJ54B1/p1719903918648839

Let me paste this thread here for reference:

Anton (VW)
1 day ago
Morning guys, I have a very strange case of a missing license finding: We ran ScanCode via ORT (22.5.0) on https://github.com/components/font-awesome/blob/f4f114c4ab37d101e6a15370769bc0af681792fa/package.json and would expect three licenses (CC-BY-4.0, MIT, OFL-1.1). However in the ORT result MIT is missing. When I run ScanCode via scancode.io all licenses are found. In ORT and in scancode.io the same ScanCode version (32.1.0) is used. Does anybody have an idea where the gap might come from?
ScanCode.io result:
    "license_detections": [
      {
        "license_expression": "cc-by-4.0",
        "license_expression_spdx": "CC-BY-4.0",
        "matches": [
          {
            "license_expression": "cc-by-4.0",
            "spdx_license_expression": "CC-BY-4.0",
            "from_file": "codebase/font-awesome-f4f114c4ab37d101e6a15370769bc0af681792fa/package.json",
            "start_line": 1,
            "end_line": 1,
            "matcher": "1-hash",
            "score": 50.0,
            "matched_length": 4,
            "match_coverage": 100.0,
            "rule_relevance": 50,
            "rule_identifier": "spdx_license_id_cc-by-4.0_for_cc-by-4.0.RULE",
            "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/spdx_license_id_cc-by-4.0_for_cc-by-4.0.RULE",
            "matched_text": "CC-BY-4.0"
          }
        ],
        "identifier": "cc_by_4_0-415c083c-ccd1-233c-986e-75bb1ddc3fdc"
      },
      {
        "license_expression": "mit",
        "license_expression_spdx": "MIT",
        "matches": [
          {
            "license_expression": "mit",
            "spdx_license_expression": "MIT",
            "from_file": "codebase/font-awesome-f4f114c4ab37d101e6a15370769bc0af681792fa/package.json",
            "start_line": 1,
            "end_line": 1,
            "matcher": "1-spdx-id",
            "score": 100.0,
            "matched_length": 1,
            "match_coverage": 100.0,
            "rule_relevance": 100,
            "rule_identifier": "spdx-license-identifier-mit-5da48780aba670b0860c46d899ed42a0f243ff06",
            "rule_url": null,
            "matched_text": "MIT"
          }
        ],
        "identifier": "mit-a822f434-d61f-f2b1-c792-8b8cb9e7b9bf"
      },
      {
        "license_expression": "ofl-1.1",
        "license_expression_spdx": "OFL-1.1",
        "matches": [
          {
            "license_expression": "ofl-1.1",
            "spdx_license_expression": "OFL-1.1",
            "from_file": "codebase/font-awesome-f4f114c4ab37d101e6a15370769bc0af681792fa/package.json",
            "start_line": 1,
            "end_line": 1,
            "matcher": "1-hash",
            "score": 50.0,
            "matched_length": 3,
            "match_coverage": 100.0,
            "rule_relevance": 50,
            "rule_identifier": "spdx_license_id_ofl-1.1_for_ofl-1.1.RULE",
            "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/spdx_license_id_ofl-1.1_for_ofl-1.1.RULE",
            "matched_text": "OFL-1.1"
          }
        ],
        "identifier": "ofl_1_1-52c45f4c-8cce-acf4-9ef3-6682faf0c586"
      }
vs ORT result:
    scanner:
      name: "ScanCode"
      version: "32.1.0"
      configuration: "--copyright --license --license-text --info --strip-root --timeout\
        \ 600 --json-pp"
    summary:
      start_time: "2024-06-28T10:59:46.000199521Z"
      end_time: "2024-06-28T11:01:51.000822060Z"
      licenses:
      - license: "CC-BY-4.0"
        location:
          path: "package.json"
          start_line: 10
          end_line: 11
      - license: "OFL-1.1"
        location:
          path: "package.json"
          start_line: 13
          end_line: 13
        score: 50.0

sschuberth
1 day ago
ORT (deliberately) does not run ScanCode with the --package option. Is the MIT finding maybe only present in the raw result with that option?

sschuberth
1 day ago
Because the start / end line of 1 is also a bit suspicious / clearly wrong.

sschuberth
1 day ago
In any case, ORT should report a declared license of MIT for that package, so in total no license information is lost.

Anton (VW)
1 day ago
Thanks for the hint, will check that. Would you recommend always enabling the --package option?

sschuberth
1 day ago
I would recommend always disabling it when using ORT, which is why that's the default 😉 One of the reasons is that enabling it breaks ORT's semantics, which clearly distinguish between "detected" and "declared" licenses: --package causes ScanCode to report declared licenses as detected licenses.
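The distinction described here can be pictured with a small data-structure sketch. This is not ORT's data model: `LicenseInfo` is a hypothetical class, and the declared set is an assumption (the three identifiers expected from the manifest), while the detected set mirrors the two findings in the ORT result above.

```python
from dataclasses import dataclass, field


@dataclass
class LicenseInfo:
    # Licenses stated in package metadata (assumed values below).
    declared: set = field(default_factory=set)
    # Licenses found by scanning file contents.
    detected: set = field(default_factory=set)

    def all_licenses(self):
        """Union of both sets, kept separate from the two sources."""
        return self.declared | self.detected


info = LicenseInfo(
    declared={"CC-BY-4.0", "MIT", "OFL-1.1"},
    detected={"CC-BY-4.0", "OFL-1.1"},
)

print(sorted(info.all_licenses()))
print("MIT" in info.declared, "MIT" in info.detected)
```

Keeping the two sets apart preserves where each finding came from, while the union still contains MIT even though the text scan did not report it.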

@pombredanne
Member

@vw-anton re:

clearly distinguish between "detected" and "declared" licenses, as --package causes ScanCode to report declared licenses as detected licenses.

We track these licenses at the package level

  • declared_license_expression: The license expression for this package typically derived from its extracted_license_statement or from some other type-specific routine or convention.
  • other_license_expression: The license expression for this package which is different from the declared_license_expression (i.e. not the primary license), also derived from some type-specific routine or convention.

Both are normalized licenses on which we ran ScanCode license detection, possibly using package-type-specific conventions.

We also track:

  • extracted_license_statement: The license statement mention, tag or text as found in a package manifest and extracted. This can be a string, a list or dict of strings possibly nested, as found originally in the manifest.

  • notice_text: A notice text for this package.

declared_license_expression is generally consistent with the SPDX definition. There is no such thing as a "detected license" in SPDX, and we do not track a concluded license in ScanCode toolkit since, as a tool, it does not conclude anything.

So please consider using the --package option, which is the way we implemented correct detection of these licenses. I am open to refinements, improvements and enhancements, but you have a designed, tested and correct way to detect all these licenses right now without making any changes.
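As a rough sketch of how a consumer could read these package-level fields from a JSON scan made with --package: the field names below are the ones listed above, while the top-level "packages" key, the helper `package_license_fields`, and the sample values are assumptions for illustration and not real scan output.

```python
LICENSE_KEYS = (
    "declared_license_expression",
    "other_license_expression",
    "extracted_license_statement",
)


def package_license_fields(scan_result):
    """Return the license-related fields of each package in a scan result."""
    return [
        {key: package.get(key) for key in LICENSE_KEYS}
        for package in scan_result.get("packages", [])
    ]


# Hand-written sample with placeholder values (not produced by ScanCode).
sample_result = {
    "packages": [
        {
            "name": "example",
            "declared_license_expression": "mit",
            "other_license_expression": None,
            "extracted_license_statement": "MIT",
        }
    ]
}

for fields in package_license_fields(sample_result):
    print(fields)
```

Reading these fields separately from the per-file license_detections would let a downstream tool keep package-declared licenses apart from file-level findings.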
