[BUG] Schema-invalid serialized result when multiple licenses #365

Closed
madpah opened this issue Mar 23, 2023 · 15 comments · Fixed by #466 or #440
Labels: breaking change, bug

@madpah
Collaborator

madpah commented Mar 23, 2023

As of cyclonedx-python-lib 4.0.0 there appears to be a serialization error when the provided (valid) Model has more than one license added.

Example:

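# Imports as of cyclonedx-python-lib 4.0.0
from cyclonedx.model import AttachedText, Encoding, License, LicenseChoice, XsUri
from cyclonedx.model.bom import Bom

bom = Bom()
...
# Two license entries on the same metadata object: one by SPDX id, one by name
bom.metadata.licenses = [LicenseChoice(license=License(
    id='Apache-2.0', text=AttachedText(
        content='VGVzdCBjb250ZW50IC0gdGhpcyBpcyBub3QgdGhlIEFwYWNoZSAyLjAgbGljZW5zZSE=', encoding=Encoding.BASE_64
    ), url=XsUri('https://www.apache.org/licenses/LICENSE-2.0.txt')
)), LicenseChoice(license=License(name='OSI_APACHE'))]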

produces

...
<ns0:licenses>
    <ns0:license>
        <ns0:id>Apache-2.0</ns0:id>
        <ns0:text content-type="text/plain" encoding="base64">
            VGVzdCBjb250ZW50IC0gdGhpcyBpcyBub3QgdGhlIEFwYWNoZSAyLjAgbGljZW5zZSE=
        </ns0:text>
        <ns0:url>https://www.apache.org/licenses/LICENSE-2.0.txt</ns0:url>
    </ns0:license>
</ns0:licenses>
<ns0:licenses>
    <ns0:license>
        <ns0:name>OSI_APACHE</ns0:name>
    </ns0:license>
</ns0:licenses>
...

This is invalid according to the CycloneDX schema, which allows at most one licenses element per parent.
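A quick way to spot output like this is to count the licenses elements under each parent. The following is only a minimal sketch using the standard library: the helper name is hypothetical, the namespace URI is assumed to be the CycloneDX 1.4 one, and the sample wraps an abbreviated version of the output above in bom/metadata elements, which the original snippet elides.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: the document uses the CycloneDX 1.4 XML namespace.
NS = "http://cyclonedx.org/schema/bom/1.4"


def parents_with_repeated_licenses(xml_text: str) -> List[str]:
    """Return local names of elements that hold more than one <licenses> child."""
    root = ET.fromstring(xml_text)
    licenses_tag = f"{{{NS}}}licenses"
    offenders = []
    for parent in root.iter():
        count = sum(1 for child in parent if child.tag == licenses_tag)
        if count > 1:
            # Strip the "{namespace}" prefix for readability.
            offenders.append(parent.tag.split("}")[-1])
    return offenders


# Abbreviated sample; the surrounding bom/metadata elements are assumed.
SAMPLE = """<bom xmlns="http://cyclonedx.org/schema/bom/1.4"><metadata>
<licenses><license><id>Apache-2.0</id></license></licenses>
<licenses><license><name>OSI_APACHE</name></license></licenses>
</metadata></bom>"""

print(parents_with_repeated_licenses(SAMPLE))
```

This is not a substitute for validating against the official schema files; it only flags the specific symptom reported here.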

This CycloneDX schema discrepancy was fixed via CycloneDX/specification#204.

Important for the fix:
#365 (comment)

@madpah
Collaborator Author

madpah commented Mar 23, 2023

Confirmed this impacts JSON and XML serialization.

@madpah
Collaborator Author

madpah commented Mar 31, 2023

After some [internal] discussions with CycloneDX folks, we agree that the current Pythonic Model (use of LicenseChoice) is likely a poor representation of the CDX Specification.

Definitions such as:

def licenses(self) -> "SortedSet[LicenseChoice]":

are incorrect, as they allow users of the model to supply a set with more than one license expression, which is invalid according to CycloneDX Spec v1.4.

The definition above would more accurately be written as:

def licenses(self) -> "Optional[Union[LicenseExpression, List[License]]]":

FYI @jkowalleck
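To make the proposed type concrete, here is a small sketch under stated assumptions: License and LicenseExpression below are simplified stand-in dataclasses, not the library's classes, and describe() is a hypothetical helper that only shows how a consumer would branch on the three possible shapes.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class License:
    """Stand-in for a license identified by SPDX id or by name."""
    id: Optional[str] = None
    name: Optional[str] = None


@dataclass(frozen=True)
class LicenseExpression:
    """Stand-in for a single SPDX license expression."""
    value: str


# Either nothing, exactly one expression, or a list of licenses.
Licenses = Optional[Union[LicenseExpression, List[License]]]


def describe(licenses: Licenses) -> str:
    """Show how each of the three shapes would be handled."""
    if licenses is None:
        return "no license information"
    if isinstance(licenses, LicenseExpression):
        return f"expression: {licenses.value}"
    return f"{len(licenses)} license(s)"


print(describe(LicenseExpression("MIT OR Apache-2.0")))
print(describe([License(id="Apache-2.0"), License(name="OSI_APACHE")]))
```

With this shape, a mix of an expression and other licenses cannot be expressed at all, which is the property the comment above is after.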

madpah added a commit that referenced this issue Apr 3, 2023
… serialization

BREAKING CHANGE: Models changed to resolve #365

Signed-off-by: Paul Horton <[email protected]>
@madpah
Collaborator Author

madpah commented Apr 3, 2023

After further investigation: serialization of licenses is also challenging because the structure/schema differs between JSON and XML under v1.4:

JSON
licenses is an array in which each item is either an expression or a license, i.e. 0 or more expression or 0 or more license (see here)

XML
licenses is a complex type defined as either 0 or more license OR 0 or 1 expression (see here)
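The difference can be illustrated with two small checks. This is a sketch that encodes the rules only as they are described in the comment above, not as read from the schema files; the function names and the "expression"/"license" kind labels are made up for illustration.

```python
from typing import List

KINDS = ("expression", "license")


def valid_for_json_1_4(kinds: List[str]) -> bool:
    """JSON (as described above): an array whose items are each an expression or a license."""
    return all(kind in KINDS for kind in kinds)


def valid_for_xml_1_4(kinds: List[str]) -> bool:
    """XML (as described above): 0 or more license, OR 0 or 1 expression."""
    if not all(kind in KINDS for kind in kinds):
        return False
    expressions = kinds.count("expression")
    licenses = kinds.count("license")
    return expressions == 0 or (expressions == 1 and licenses == 0)


mixed = ["expression", "license"]
print(valid_for_json_1_4(mixed), valid_for_xml_1_4(mixed))
```

A mixed list passes the JSON rule but fails the XML rule, which is why a single generic (de)serializer cannot treat both formats identically.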

@jkowalleck
Member

jkowalleck commented Apr 3, 2023

re: #365 (comment)
This is bad news in terms of a deserialization conflict, because we cannot control input data. A generic parser/deserializer would not work here; we might need to write one manually.
We can control output data via our own serializers, so this path has no problems.

Anyway, @madpah, could you open an issue/discussion in https://github.com/CycloneDX/specification/ ?

@jkowalleck
Member

jkowalleck commented Apr 3, 2023

@madpah, I think the following allows forgiving deserialization (since a mix of expressions and licenses was possible in spec <= 1.4)
and is also promising for serialization:

  • Do not change the models. Let them hold a set of mixed expressions and other licenses.
  • Do not change the deserialization, as mixing was possible in JSON for CDX <= 1.4.
  • Add a preprocessor for serialization:
    if there is any expression in the list of licenses, return a list that contains exactly one expression.
    This way the usual serialization will work with a list of exactly one item (an expression).
    Result: expressions are preferred; all other items are silently dropped (or a warning is added).

Result:

  • you can still deserialize all existing (valid) documents
  • the serializer is forced (fixed) to produce valid documents

PS: here is an example implementation I drafted for TypeScript - https://github.com/CycloneDX/cyclonedx-javascript-library/blob/a43f5f1c61945b4479f3ead79d05de1db36a63f1/src/serialize/xml/normalize.ts#L488-L503
PPS: here is a draft implementation in Python: #371
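The serialization preprocessor proposed in this comment could look roughly like the following. It is a sketch, not the code from the linked drafts: NamedLicense and Expression are stand-in classes, normalize_for_output is a hypothetical name, and keeping the first expression encountered is an assumption, since the proposal does not say which expression wins when there are several.

```python
import warnings
from dataclasses import dataclass
from typing import Iterable, List, Union


@dataclass(frozen=True)
class NamedLicense:
    """Stand-in for a license given by name or SPDX id."""
    name: str


@dataclass(frozen=True)
class Expression:
    """Stand-in for an SPDX license expression."""
    value: str


LicenseItem = Union[NamedLicense, Expression]


def normalize_for_output(licenses: Iterable[LicenseItem]) -> List[LicenseItem]:
    """Return a list that is safe to serialize: one expression, or only licenses."""
    items = list(licenses)
    expressions = [item for item in items if isinstance(item, Expression)]
    if not expressions:
        return items  # only plain licenses: nothing to do
    if len(items) > 1:
        # Assumption: warn instead of dropping silently.
        warnings.warn(f"dropping {len(items) - 1} license item(s) in favor of one expression")
    return [expressions[0]]  # assumption: the first expression wins


mixed = [NamedLicense("OSI_APACHE"), Expression("MIT OR Apache-2.0")]
print(normalize_for_output(mixed))
```

Because the models and the deserializer stay untouched, this step only affects what is written out.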

@gruebel
Contributor

gruebel commented Jun 15, 2023

Hey @jkowalleck and @madpah, any update on this? I wanted to upgrade our project to use the latest version, but this is blocking me a bit. Any insights into potential fixes, or recommendations on what to do with the multiple-license case?

@jkowalleck
Member

I am looking into the proposed fixes,
trying to find a solution.

@jkowalleck
Member

jkowalleck commented Jun 20, 2023

@madpah It is impossible to create a fix with the existing serialization solution.
The library is just not capable of modeling these things.

We need custom (de)serialization here, as we had previously.
I am not certain whether this can be hooked into the currently used serialization framework at all.
Please help.

@jkowalleck jkowalleck added this to the 5.0.0 milestone Sep 16, 2023
@jkowalleck
Member

Fixing this bug might break things downstream,
so let's put it into the next major version.

@jkowalleck jkowalleck removed this from the 5.0.0 milestone Sep 22, 2023
@jkowalleck jkowalleck changed the title from "[BUG] Serialization fails when a component or metadata has more than one license" to "[BUG] Schema-invalid serialized result when multiple licenses" Sep 24, 2023
jkowalleck added a commit that referenced this issue Sep 25, 2023
Signed-off-by: Jan Kowalleck <[email protected]>
jkowalleck added a commit that referenced this issue Sep 25, 2023
regression for issue #365

Signed-off-by: Jan Kowalleck <[email protected]>
@jkowalleck
Member

jkowalleck commented Oct 1, 2023

With https://github.com/madpah/serializable/releases/tag/v0.13.0
I should be able to solve the issue soon.

@jkowalleck
Member

jkowalleck commented Oct 5, 2023

implementation details:

  • models must be backwards-compatible: accept a mixture of expressions and others
  • deserializer must be backwards-compatible: accept a mixture of expressions and others
  • serializer: can print just the one appropriate value ...
  • there MUST be tests to check whether "old"/invalid data can be imported
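A regression test in the spirit of these points might be shaped like this. It is only an illustrative sketch working on plain JSON with the standard library: load_licenses and dump_licenses are hypothetical helpers, not the library's (de)serializers, and the rule that the first expression is the "one appropriate value" is an assumption.

```python
import json
from typing import Any, List, Tuple


def load_licenses(document: str) -> List[Tuple[str, Any]]:
    """Tolerant reader: accepts any mix of `expression` and `license` items."""
    items = []
    for entry in json.loads(document).get("licenses", []):
        if "expression" in entry:
            items.append(("expression", entry["expression"]))
        elif "license" in entry:
            items.append(("license", entry["license"]))
    return items


def dump_licenses(items: List[Tuple[str, Any]]) -> str:
    """Strict writer: emits one expression if any is present, else all licenses."""
    expressions = [value for kind, value in items if kind == "expression"]
    if expressions:
        payload = [{"expression": expressions[0]}]  # assumption: first one wins
    else:
        payload = [{"license": value} for _, value in items]
    return json.dumps({"licenses": payload}, sort_keys=True)


# "Old" data that mixes a license with an expression.
OLD_MIXED_INPUT = '{"licenses": [{"license": {"id": "MIT"}}, {"expression": "MIT OR Apache-2.0"}]}'


def test_old_mixed_data_round_trip() -> None:
    loaded = load_licenses(OLD_MIXED_INPUT)
    assert len(loaded) == 2  # import keeps both items
    written = json.loads(dump_licenses(loaded))
    assert written == {"licenses": [{"expression": "MIT OR Apache-2.0"}]}


test_old_mixed_data_round_trip()
```

The point of such a test is that import stays permissive while export is checked to contain a single, schema-compatible shape.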

@jkowalleck jkowalleck self-assigned this Oct 8, 2023
@jkowalleck jkowalleck pinned this issue Oct 8, 2023
@jkowalleck
Member

jkowalleck commented Oct 8, 2023


will tackle this issue soon.

@jkowalleck jkowalleck linked a pull request Oct 10, 2023 that will close this issue
jkowalleck added a commit that referenced this issue Oct 10, 2023
breaking changes
------------------
* Reworked license related models and collections
* API
  * Removed class `factory.license.LicenseChoiceFactory`  
    The old functionality was integrated into `factory.license.LicenseFactory`.
  * Method `factory.license.LicenseFactory.make_from_string()`'s parameter `name_or_spdx` was renamed to `value`
  * Method `factory.license.LicenseFactory.make_from_string()`'s return value can also be a `LicenseExpression`
    The behavior imitates the old `factory.license.LicenseChoiceFactory.make_from_string()`
  * Renamed class `model.License` to `model.license.DisjunctiveLicense`
  * Removed class `model.LicenseChoice`
    Use dedicated classes `model.license.DisjunctiveLicense` and `model.license.LicenseExpression` instead
  * All occurrences of `model.LicenseChoice` were replaced by `model.license.License`
  * All occurrences of `SortedSet[LicenseChoice]` were specialized to `model.license.LicenseRepository`


fixes
------------------
* serialization of multiple licenses #365

added
------------------
* API
  * Method `factory.license.LicenseFactory.make_with_expression()`
  * Class `model.license.DisjunctiveLicense`
  * Class `model.license.LicenseExpression`
  * Class `model.license.LicenseRepository`
  * Class `serialization.LicenseRepositoryHelper`

tests
------------------
* added regression test for bug #365

misc
------------------
* raised dependency `py-serializable@^0.15`



----

fixes #365

~~BLOCKED by a feature request to serializer: <https://github.com/madpah/serializable/pull/32>~~


---------

Signed-off-by: Jan Kowalleck <[email protected]>
@jkowalleck jkowalleck linked a pull request Oct 10, 2023 that will close this issue
@jkowalleck
Member

The solution is implemented and will be part of the upcoming major version.
A regression test and a schema validator are in place to ensure a complete fix. 🎈

jkowalleck added a commit that referenced this issue Oct 11, 2023
jkowalleck added a commit that referenced this issue Oct 11, 2023
@jkowalleck
Member

For everybody willing to test the fix: see 5.0.0-rc.1.
This is also available via PyPI.

jkowalleck added a commit that referenced this issue Oct 24, 2023
BREAKING CHANGES
----------------
* Dropped support for python<3.8 ([#436] via [#441]; enable [#433])
* Reworked license related models, collections, and factories ([#365] via [#466])
* Behavior
  * Method `model.bom.Bom.validate()` will throw `exception.LicenseExpressionAlongWithOthersException`, if detecting invalid license constellation ([#453] via [#452])
  * Fixed tuple comparison when unequal lengths (via [#461])
* API
  * Enum `schema.SchemaVersion` is no longer string-like ([#442] via [#447])
  * Enum `schema.OutputVersion` is no longer string-like ([#442] via [#447])
  * Abstract class `output.BaseOutput` requires implementation of new method `output_format` ([#446] via [#447])
  * Abstract method `output.BaseOutput.output_as_string()` got new optional parameter `indent` ([#437] via [#458])
  * Abstract method `output.BaseOutput.output_as_string()` accepts arbitrary kwargs (via [#458], [#462])
  * Removed class `factory.license.LicenseChoiceFactory` (via [#466])  
    The old functionality was integrated into `factory.license.LicenseFactory`.
  * Method `factory.license.LicenseFactory.make_from_string()`'s parameter `name_or_spdx` was renamed to `value` (via [#466])
  * Method `factory.license.LicenseFactory.make_from_string()`'s return value can also be a `LicenseExpression` ([#365] via [#466])  
    The behavior imitates the old `factory.license.LicenseChoiceFactory.make_from_string()`
  * Renamed class `model.License` to `model.license.DisjunctiveLicense` ([#365] via [#466])
  * Removed class `model.LicenseChoice` ([#365] via [#466])  
    Use dedicated classes `model.license.DisjunctiveLicense` and `model.license.LicenseExpression` instead
  * All occurrences of `model.LicenseChoice` were replaced by `model.license.License` ([#365] via [#466])
  * All occurrences of `SortedSet[LicenseChoice]` were specialized to `model.license.LicenseRepository` ([#365] via [#466])


Fixed
----------------
* Serialization of multiple licenses ([#365] via [#466])
* Detect unused "dependent" components in `model.bom.validate()` (via [#464])


Changed 
----------------
* Updated the list of supported SPDX license identifiers to the latest version (via [#433])
* Shipped schema files are moved to a protected space (via [#433])  
  These files were never intended for public use.
* XML output uses a default namespace, which makes results smaller. ([#438] via [#458])


Added
----------------
* Support for Python 3.12 (via [#460])
* JSON- & XML-Validators ([#432], [#446] via [#433], [#448])  
  The functionality might require additional dependencies, that can be installed with the extra "validation".  
  See the docs in section "Installation" for details.
* JSON & XML can be generated in a more human-friendly form ([#437], [#438] via [#458])
* Type hints, typings & overloads for better integration downstream (via [#463])
* API
  * New function `output.make_outputter()` (via [#469])  
    This replaces the deprecated function `output.get_instance()`.
  * New sub-package `validation` ([#432], [#446] via [#433], [#448], [#469], [#468], [#469])
  * New class `exception.MissingOptionalDependencyException` ([#432] via [#433])
  * New class `exception.LicenseExpressionAlongWithOthersException` ([#453] via [#452])
  * New dictionaries `output.{json,xml}.BY_SCHEMA_VERSION` ([#446] via [#447])
  * Existing implementations of class `output.BaseOutput` now have a new method `output_format` ([#446] via [#447])
  * Existing implementations of method `output.BaseOutput.output_as_string()` got new optional parameter `indent` ([#437] via [#458])
  * Existing implementations of method `output.BaseOutput.output_to_file()` got new optional parameter `indent` ([#437] via [#458])
  * New method `factory.license.LicenseFactory.make_with_expression()` (via [#466])
  * New class `model.license.DisjunctiveLicense` ([#365] via [#466])
  * New class `model.license.LicenseExpression` ([#365] via [#466])
  * New class `model.license.LicenseRepository` ([#365] via [#466])
  * New class `serialization.LicenseRepositoryHelper` ([#365] via [#466])


Deprecated
----------------
* Function `output.get_instance()` might be removed, use `output.make_outputter()` instead (via [#469])


Tests
----------------
* Added validation tests with official CycloneDX schema test data ([#432] via [#433])
* Use proper snapshots, instead of pseudo comparison ([#437] via [#464])
* Added regression test for bug [#365] (via [#466], [#467])


Misc
----------------
* Dependencies: bumped `py-serializable@^0.15.0`, was `@^0.11.1` (via [#458], [#463], [#464], [#466])
* Style: streamlined quotes and strings (via [#472])
* Chore: bumped internal dev- and QA-tools ([#436] via [#441], [#472])
* Chore: added more QA tools to prevent common security issues (via [#473])


[#432]: #432
[#433]: #433
[#436]: #436
[#437]: #437
[#365]: #365
[#438]: #438
[#440]: #440
[#441]: #441
[#442]: #442
[#446]: #446
[#447]: #447
[#448]: #448
[#452]: #452
[#453]: #453
[#458]: #458
[#460]: #460
[#461]: #461
[#462]: #462
[#463]: #463
[#464]: #464
[#466]: #466
[#467]: #467
[#468]: #468
[#469]: #469
[#472]: #472
[#473]: #473

---------

Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: semantic-release <semantic-release>
Co-authored-by: semantic-release <semantic-release>
@jkowalleck
Member

jkowalleck commented Oct 24, 2023
