Add clarification on what a list of licenses means #349

Open
lfrancke opened this issue Nov 29, 2023 · 15 comments

Comments

@lfrancke

Component.licenses has this text "EITHER (list of SPDX licenses and/or named licenses) OR (tuple of one SPDX License Expression)"

It is not made clear what a list of licenses means.
There are at least two options:

  • AND (all licenses need to be complied with)
  • OR (pick one)

This ambiguity can be avoided using SPDX license expressions but if we get an SBOM with just a list we need to make a decision without any further information.

To be safe I would probably interpret it as AND in that case.

At least a comment should be added that this is undefined.
I would probably even go as far as saying that only a single license is allowed and if there are more an expression needs to be used.

I understand that almost all changes except a clarifying comment would be backwards breaking changes.
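
A minimal sketch of the ambiguity, assuming a consumer has to turn a plain license list into a single expression. The component data is made up and `list_to_expression` is a hypothetical helper, not part of any CycloneDX tooling; the same list yields two different legal statements depending on which operator the consumer assumes.

```python
# Two shapes the `licenses` field of a component can take.
# The license ids are made-up sample data.
licenses_as_list = [
    {"license": {"id": "MIT"}},
    {"license": {"id": "Apache-2.0"}},
]
licenses_as_expression = [
    {"expression": "MIT OR Apache-2.0"},
]


def list_to_expression(licenses, operator):
    """Join the ids/names of a license list with 'AND' or 'OR'.

    Hypothetical helper: the spec does not say which operator applies,
    so the caller has to pick one.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    labels = []
    for entry in licenses:
        lic = entry["license"]
        # A list entry carries either an SPDX id or a free-form name.
        labels.append(lic.get("id") or lic.get("name"))
    return f" {operator} ".join(labels)


print(list_to_expression(licenses_as_list, "AND"))  # MIT AND Apache-2.0
print(list_to_expression(licenses_as_list, "OR"))   # MIT OR Apache-2.0
```

Note that free-form names are not valid SPDX expression operands, so the joined string is only a human-readable summary when names are involved.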

@lfrancke
Author

I think there is a case to be made to even deprecate the list of licenses entirely.

This started as a discussion on Slack
https://cyclonedx.slack.com/archives/CVA0G10FN/p1701087945143049?thread_ts=1701087945.143049&cid=CVA0G10FN

@stevespringett
Member

Support for only SPDX license expression is not an option. There are over 2500 open source licenses and SPDX only supports about 500 or so of them. The SPDX project also does not support any commercial licenses, so having support for license names along with attaching the full text of the license is a requirement for any commercial BOM use case.

Maven, IMO, has the most ambiguity of any modern build system. For a license list, this is what the Maven POM XSD states:

This element describes all of the licenses for this project. Each license is described by a license element, which is then described by additional elements. Projects should only list the license(s) that applies to the project and not the licenses that apply to dependencies. If multiple licenses are listed, it is assumed that the user can select any of them, not that they must accept all.

An interesting fact is that the last sentence does not appear in their documentation.
https://maven.apache.org/pom.html#Licenses

We could strengthen the documentation to read AND or OR - whichever we choose. We could also add a conjunction field to the spec so that the BOM author can specify.
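
A rough sketch of how a consumer might read such a conjunction field. CycloneDX defines no such field today: the property name `x-license-conjunction` and the function `license_list_semantics` are invented for illustration only, and the components are made up. BOMs that do not carry the field are reported as undecided rather than guessed.

```python
# HYPOTHETICAL: CycloneDX defines no conjunction field today. The property
# name below is invented for this sketch only.
HYPOTHETICAL_FIELD = "x-license-conjunction"


def license_list_semantics(component):
    """Return 'AND', 'OR', or 'UNDECIDED' for a component's license list."""
    value = component.get(HYPOTHETICAL_FIELD)
    if value in ("AND", "OR"):
        return value
    # No (valid) statement from the BOM author: stay undecided, do not guess.
    return "UNDECIDED"


dual_licensed = {
    "name": "example-lib",  # made-up component
    "licenses": [{"license": {"id": "MIT"}}, {"license": {"id": "Apache-2.0"}}],
    HYPOTHETICAL_FIELD: "OR",
}
legacy = {
    "name": "old-lib",  # made-up component without the hypothetical field
    "licenses": [{"license": {"id": "MIT"}}, {"license": {"id": "BSD-3-Clause"}}],
}

print(license_list_semantics(dual_licensed))  # OR
print(license_list_semantics(legacy))         # UNDECIDED
```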

@jkowalleck
Member

jkowalleck commented Dec 31, 2023

We could strengthen the documentation to read AND or OR - whichever we choose. We could also add a conjunction field to the spec so that the BOM author can specify.

I always saw this as an "OR", as in "Here are some licenses that suit our project. Choose the one that applies in your area."
Most multi-license projects I recall had such a sentence in their README and License.txt.


collection of examples:

@lfrancke
Author

I talked to someone who thought it is AND.

I don't think either interpretation is more correct than the other, which makes it important to document what is meant.

@tlandschoff-scale

I always saw this as an "OR", as in "Here are some licenses that suit our project. Choose the one that applies in your area." Most multi-license projects I recall had such a sentence in their README and License.txt.

collection of examples:

* https://pypi.org/project/text-unidecode/ -- Artistic License, or GPL or GPLv2+

This is one case I have often seen. On the other side of the spectrum there are collections of code that were bundled together. To stay on PyPI:

I'd like to use an SPDX expression in the SBOM, but we are required to include the full license text for each license. That is why I use the license list to enumerate all the different licenses.

@Joerki

Joerki commented Dec 13, 2024

The changed definition of allowed licenses from v1.4 to v1.5 raises a big problem for me (and, I assume, for everyone who somehow processes Debian packages and other sources).

(Debian) Packages may have content from different authors, and each contribution to a package may come with a license or license expression. v1.4 is fine for that.

But the limitation to have a list of single license items on the one hand OR a (compound?) SPDX expression on the other hand is not compatible with the real world.

Moreover, SPDX expressions are not only "compound expressions" (as many people might think); there are also "simple expressions" (LicenseRef-*) that identify a license.

My impression is that with v1.5 we have a significant design flaw.
Or can anybody explain where I'm wrong and how I can convert a Debian copyright file listing single and compound licenses into CycloneDX?

https://spdx.github.io/spdx-spec/v2.3/SPDX-license-expressions/
https://spdx.github.io/spdx-spec/v3.0.1/annexes/spdx-license-expressions/
https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/

In theory, a Debian package could be split by its stanzas, so that I would end up with "file" type components instead of "libraries". But this would require enormous effort and would be neither practical nor helpful. And it does not resolve the use of "simple expression" SPDX expressions.

@jkowalleck
Member

My impression is that with v1.5 we have a significant design flaw.

@Joerki , could you give a practical example for something that is not possible with today's design?

@Joerki

Joerki commented Dec 14, 2024

Hi @jkowalleck ,

My impression is that with v1.5 we have a significant design flaw.

@Joerki , could you give a practical example for something that is not possible with today's design?

With the separation between a list and a single expression, I see the following issues:

  • With an expression, I don't see how to include a license text for a particular item of the expression. Licenses from the SPDX license list whose text has no placeholders are not a problem. A standard license can be a problem if its text has placeholders, e.g. for authors or a company. My colleague who deals with legal aspects says that such a "template" is not sufficient as a reference; we need a "verbatim" copy of the license text (which is stored in the public repo of the component). In an attribution report (which we generate from the SBOM) we must have texts that satisfy these legal requirements, so the text must be contained in the SBOM. This could already be a problem with 1.4 if such a license is referenced in an expression.
  • SPDX allows creating a custom ID (LicenseRef-*). This also counts as an "expression", just like the compound expression given as an example in the CycloneDX spec. And again: where can I specify the license text that belongs to the non-standard ID?

Example: https://metadata.ftp-master.debian.org/changelogs//main/o/openssl/openssl_3.0.15-1~deb12u1_copyright
Please note that in these files you do not find standardized IDs. Therefore you have both IDs and texts. Texts might appear in a "Files" stanza or a dedicated "License" stanza (which makes sense when licenses appear multiple times). So I don't need a reference to content outside the copyright file.
I convert the IDs to SPDX IDs of standard licenses; in the past this allowed me to end up with a proper SPDX expression. I use the aboutcode.org license list repo and extend it for our needs.

My conclusion:
With the license list I have the chance to provide (almost) full information when several licenses need to be considered at the same time including license texts (X AND Y).

CycloneDX limits the use of SPDX expressions to cases where the creator has to make a conclusion for a multi-licensed component where he can choose between licenses (X OR Y) that have a known, standardized text that can be taken 1:1 from its original definition.
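
A small sketch of the list-based approach described above: one license list entry that carries a name together with a verbatim text. The field names (`name`, `text`, `contentType`, `encoding`, `content`) follow my reading of the CycloneDX JSON schema for license entries and attachments and should be checked against the schema version in use; the helper name, the license name, and the text are made up.

```python
import base64


def named_license_with_text(name, verbatim_text):
    """Build one entry of a `licenses` list with an attached, base64-encoded text."""
    content = base64.b64encode(verbatim_text.encode("utf-8")).decode("ascii")
    return {
        "license": {
            "name": name,
            "text": {
                "contentType": "text/plain",
                "encoding": "base64",
                "content": content,
            },
        }
    }


# Made-up name and text, standing in for one stanza of a Debian copyright file.
entry = named_license_with_text(
    "LicenseRef-example-vendor",
    "Copyright (c) 2024 Example Vendor. All rights reserved.\n",
)

# Round trip: the verbatim text can be recovered for an attribution report.
decoded = base64.b64decode(entry["license"]["text"]["content"]).decode("utf-8")
print(decoded)
```

An entry of the `expression` kind has no comparable place for a per-license text, which is the gap this comment points at.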

@jkowalleck
Member

I guess we can cut all of your concerns if you have a look at #454.
This ticket does not address your issue that you want to add MULTIPLE license texts to a single license expression.

@jkowalleck
Member

jkowalleck commented Dec 14, 2024

Example: https://metadata.ftp-master.debian.org/changelogs//main/o/openssl/openssl_3.0.15-1~deb12u1_copyright

So the software itself is Apache-2, and some of its components are licensed under different licenses? Or what does this file actually mean?
Anyway, if you had a complex SPDX expression, you would be unable to add even a single license attachment. In this case, today, I would use license evidence to store the actual texts.
But for future improvement, I would create a feature request, which I just did: #554

@Joerki

Joerki commented Jan 7, 2025

Hi @jkowalleck , if you read Debian's copyright specification (the link is given in my comment above), you can see what the stanzas (contents of a package) contain and what important meaning the ordering of the stanzas has.

Dear @stevespringett, you said in your comment:

Support for only SPDX license expression is not an option. There are over 2500 open source licenses and SPDX only supports about 500 or so of them. The SPDX project also does not support any commercial licenses, so having support for license names along with attaching the full text of the license is a requirement for any commercial BOM use case.

This is wrong (at this time), and it makes me desperate when I refer to the SPDX documentation that explains what an "SPDX expression" is, and the information behind the links I gave above is not taken into account.

So, please read and understand the SPDX documentation (at least the annex about SPDX expressions) and see that SPDX covers all kinds of licenses!
The list of official SPDX IDs is only a PART of the whole SPDX (license expression) definition.

SBOMs that contain proper licensing information are becoming highly relevant, because there are legal requirements set by the European Union's Cyber Resilience Act (CRA). Proper license attribution is a part of it.

The German Federal Office for Information Security is working on that topic:

https://www.bsi.bund.de/EN/Themen/Unternehmen-und-Organisationen/Standards-und-Zertifizierung/Technische-Richtlinien/TR-nach-Thema-sortiert/tr03183/TR-03183_node.html (English)

The referenced SBOM document (in English) clearly describes also how licensing shall be handled. They refer to the SPDX documentation as well.

At this time it is a recommendation, but it might be possible that companies try to implement SBOM creation based on the BSI document.

The software ecosystems of the world have established many different flavors of licensing in their components, and the SBOM specifications need to support them and prove their suitability.

@Joerki

Joerki commented Jan 7, 2025

I talked to someone who thought it is AND.

I don't think either interpretation is more correct than the other, which makes it important to document what is meant.

@lfrancke

We already have implemented software tools, software packages from many different software ecosystems, and specifications in the world, such as the Syft tool, the Debian copyright format, and many other ecosystems with their metadata.

My experience with different software ecosystems is that, for lists, scanners emit items to which "AND" applies.
List items may be a simple expression or a compound expression (OR in most cases), where the programmer of the software is required to specify a concluded license when the author(s) offer a choice (with "OR").
This is also human-readable and manageable.

In a multi-licensed component we also need to distinguish between software that has different contributors (like in Linux distributions, source file/directory based, see Debian) and packages that come as a bundle (with the primary work of the authors plus software that is coming e.g. in binary form from other sources).
Here, too, only an interpretation as "AND" makes sense to me.

@stevespringett
Member

stevespringett commented Jan 12, 2025

I think we're getting off topic, however @Joerki I'm fully aware of the capabilities of SPDX license expressions. The reality is that the SAM and ITAM systems that virtually every enterprise relies on do not have hard requirements on commercial license identifiers being prefixed with LicenseRef-, which is what SPDX expressions require. Without aligning every CMDB vendor on SPDX license expression support, relying solely on SPDX expressions is a non-starter. CycloneDX needs to be able to work across the entire supply chain, not just part of it.

I'm not aware of any commercial or open source license use case that CycloneDX does not support. At the present time, I believe it has the most comprehensive license support available. Please create a separate ticket for each issue if there are gaps you're seeing.

However, this GitHub issue specifically was about the use of AND / OR when it comes to a list of licenses. With the limitations of SPDX expressions in mind, it may make sense to provide an option for the user to specify AND or OR thus creating their own verbose version of an expression. This would have the benefit of being able to be compatible with SPDX expressions as well as the SAM and ITAM systems that enterprises already use.

@jkowalleck
Member

jkowalleck commented Jan 16, 2025

it may make sense to provide an option for the user to specify AND or OR

👍 from me

this would probably be the most backwards-compatible solution, spec wise.
I mean, this is a question of the spec (not schema/implementation).

From a legal perspective, we don't have a decision on "AND versus OR" in the current specs. And I would consider adding such a decision a breaking change.
All existing specs are undecided. If I took an existing spec and bumped the version, I would expect that the legal situation has not changed, being "still undecided".

From a tool-builder's perspective, I do not want to code a process that makes irrefutable/unassailable automated license conclusions (that's what well-educated lawyers are there for). The reason is that there are ecosystems that have not (yet) decided the "AND versus OR" question for multi-license packages. They require the user to read the README or other documents/evidence and come to a conclusion. (CycloneDX has all it needs to ship/document these things, too.)

But giving the option to make a decision is great.

What do others think?


Schema-implementation-wise, I already envision this to be challenging, since an "AND versus OR" thing would need a "concluded/declared" property, too. But that might be a problem for future me. :)

@Joerki

Joerki commented Jan 28, 2025

Hi @stevespringett , @jkowalleck

S: I think we're getting off topic, however @Joerki I'm fully aware of the capabilities of SPDX license expressions. The reality is that the SAM and ITAM systems that virtually every enterprise relies on do not have hard requirements on commercial license identifiers being prefixed with LicenseRef-, which is what SPDX expressions require. Without aligning every CMDB vendor on SPDX license expression support, relying solely on SPDX expressions is a non-starter.
CycloneDX needs to be able to work across the entire supply chain, not just part of it.

Yes, I have the entire supply chain in mind, this is what I find in my company with very diverse ecosystems, from device firmware to cloud applications, software that is created by us or by suppliers.
I don't want to apply limitations on the capabilities for license definition. It is the opposite.

S: I'm not aware of any commercial or open source license use case that CycloneDX does not support. At the present time, I believe it has the most comprehensive license support available. Please create a separate ticket for each issue if there are gaps you're seeing.

An example: machine-readable Debian copyright files list the contributors of components together with the licenses (names and texts) and source code file sets. With V1.5 and V1.6 I don't always see a way to map this information to an SBOM without loss.

We already have issues that discuss some limitations: #582, #454.

Another ticket is #554, which for me targets the availability of license information in the context of an expression, like we have for license id/name items. It focuses on the license text, but ultimately it makes sense to have text, url, licensing and properties for expression item(s), like we already have for id/name items. At this time, acknowledgement is the only one we have for both name/id and expression items.

S: However, this GitHub issue specifically was about the use of AND / OR when it comes to a list of licenses. With the limitations of SPDX expressions in mind, it may make sense to provide an option for the user to specify AND or OR thus creating their own verbose version of an expression. This would have the benefit of being able to be compatible with SPDX expressions as well as the SAM and ITAM systems that enterprises already use.

The SPDX SBOM specification (I take v3 as reference) has other means to express sets of simple/composite license expressions and their relation to each other: the ConjunctiveLicenseSet and DisjunctiveLicenseSet classes.

My experience with 3rd party scanners (e.g. "syft" with "syft-json" format) is that they generate SBOMs with licenses as a name/id list that can be interpreted as conjunctive. Disjunctive licenses have an explicit " OR " in an expression item. This is what I see in "real life".
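
The reading described here can be sketched as follows, assuming the list items are already valid SPDX identifiers or expressions. `conjunctive_expression` is a hypothetical helper and the sample items are made up; it is not how syft or any other scanner is implemented. Compound items are wrapped in parentheses because AND binds more tightly than OR in SPDX expressions.

```python
def conjunctive_expression(items):
    """Combine list items with AND, keeping an explicit OR inside an item intact."""
    parts = []
    for item in items:
        item = item.strip()
        # Parenthesize compound items so the inner operator is evaluated
        # before the outer AND.
        if " OR " in item or " AND " in item:
            item = f"({item})"
        parts.append(item)
    return " AND ".join(parts)


# Made-up scanner output: two single licenses and one choice offered by an author.
scanner_items = ["MIT", "GPL-2.0-only OR Artistic-1.0", "BSD-3-Clause"]
print(conjunctive_expression(scanner_items))
# MIT AND (GPL-2.0-only OR Artistic-1.0) AND BSD-3-Clause
```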

J: From a legal perspective, we don't have a decision on "AND versus OR" in the current specs. And I would consider adding such a decision a breaking change.

A colleague and I were a bit shocked when we read this in the additional (non-JSON) documentation, because especially legal information needs to be precise and clear.

J: From a tool-builder's perspective, I do not want to code a process that makes irrefutable/unassailable automated license conclusions (that's what well-educated lawyers are there for). The reason is that there are ecosystems that have not (yet) decided the "AND versus OR" question for multi-license packages. They require the user to read the README or other documents/evidence and come to a conclusion. (CycloneDX has all it needs to ship/document these things, too.)

It is a significant fallacy that license conclusion is put into the hands of lawyers. I attended several demos of SCA systems, including FOSSA, together with people from their company. They explained to us how to deal with multi-licensing, and no word was spoken about a lawyer in that context. I also had a short conversation with Philippe Ombredanne about multi-licensing and its difficulties.

I do not expect that I can rely solely on software that fully automates proper recognition and conclusion of licenses, but I want to be able to document the (manually) identified concluded license information in a target CycloneDX SBOM, based on information from different kinds of sources (SBOMs, copyright files, etc.), without loss.
