Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spdx-utils: Very poor performance within ResolvedLicenseInfo.effectiveLicense() #9902

Closed
fviernau opened this issue Feb 7, 2025 · 0 comments · Fixed by #9909
Closed

spdx-utils: Very poor performance within ResolvedLicenseInfo.effectiveLicense() #9902

fviernau opened this issue Feb 7, 2025 · 0 comments · Fixed by #9909
Assignees
Labels
evaluator About the evaluator tool reporter About the reporter tool spdx-utils About the SPDX utility library

Comments

@fviernau
Copy link
Member

fviernau commented Feb 7, 2025

Scenario

  • pipeline runs analyzer, scanner, evaluator, reporter
  • project contains a package with:
    • large amount of detected licenses
    • in particular including ~25 distinct expressions with at least one OR
      operator and 200 distinct expressions in total
  • when upgrading from ORT version 28.0.0 to 45.0.0 the performance renders ORT unusable:
    • The evaluator runs for 3 days without terminating
    • The reporter runs for 3+ hours

observation

@fviernau fviernau added bug evaluator About the evaluator tool reporter About the reporter tool spdx-utils About the SPDX utility library to triage Issues that need triaging labels Feb 7, 2025
@fviernau fviernau changed the title spdx-utils: Massiv performance within ResolvedLicenseInfo.effectiveLicense() spdx-utils: Very poor performance within ResolvedLicenseInfo.effectiveLicense() Feb 7, 2025
fviernau added a commit that referenced this issue Feb 7, 2025
31b9be8 introduced an `equals()` into `SpdxCompoundExpression::and`. For
packages with large amounts of detected licenses including ones with
`OR` operators, the performance of the `and` function becomes so poor
that the evaluator runs for 3 days (with the open source `ort-config`
rules, without terminating. Also the reporter performance degrades from
less than 30 seconds to 3 hours.

Unfortunately the call tree involved is so complicated that there is no
simple fix for this. `equals()` involves recursive `validChoices()`
calls which in turn call `equals()` again when computing distinctness or
inserting further `SpdxExpressions` into sets.

Revert the change to fix the problem in a timely manner. Note that it
makes sense to bring back the now reverted functionaliy, but the code
should be refactored first to produce more managable call trees.

Fixes: #9902.

This reverts commit 31b9be8.

Signed-off-by: Frank Viernau <[email protected]>
fviernau added a commit that referenced this issue Feb 7, 2025
31b9be8 introduced an `equals()` into `SpdxCompoundExpression::and`. For
packages with large amounts of detected licenses including ones with
`OR` operators, the performance of the `and` function becomes so poor
that the evaluator runs for 3 days (with the open source `ort-config`
rules, without terminating. Also the reporter performance degrades from
less than 30 seconds to 3 hours.

Unfortunately the call tree involved is so complicated that there is no
simple fix for this. `equals()` involves recursive `validChoices()`
calls which in turn call `equals()` again when computing distinctness or
inserting further `SpdxExpressions` into sets.

Revert the change to fix the problem in a timely manner. Note that it
makes sense to bring back the now reverted functionality, but the code
should be refactored first to produce more managable call trees.

Fixes: #9902.

This reverts commit 31b9be8.

Signed-off-by: Frank Viernau <[email protected]>
@fviernau fviernau removed the to triage Issues that need triaging label Feb 7, 2025
@fviernau fviernau self-assigned this Feb 10, 2025
fviernau added a commit that referenced this issue Feb 10, 2025
There are multiple code locations which use `reduce` together with
`and()` to concatenate a given collection of expressions to a compound
SPDX expression. If `n` expressions are given, then `n - 1` SPDX
compound expressions instances get constructed via `n - 1` `and()`
calls.

As of [1] each `and()` execution started to invoke `equals()` at the
very beginning. Dependening on the expression this can be very
expensive. For example if the expression is contains OR operator(s)
and is bit larger, then the call tree becomes quite heavy-weight.
It's comprised of recursive `validChoices()` calls which insert
expressions into sets which in turn leads to further `equals()` calls
and so forth.

When upgrading ORT from version 28.0.0 to 45.0.0 a performance
the evaluator hadn't finished after running for 3 days, while the
reporter took about 3 hours to finish for some real world scan.
This issue has been introduced by [1], because reverting that (on top of
`main`) does fix the performance problem.

Reduce the mentioned `n - 1` `and()` calls to just a single one to relax
the issue. This makes the evaluator finish in 1.8 seconds and reporter
in 3 seconds again, for the real world scan mentioned above.

Fixes: #9902.

[1]: 31b9be8

Signed-off-by: Frank Viernau <[email protected]>
fviernau added a commit that referenced this issue Feb 10, 2025
There are multiple code locations which use `reduce` together with
`and()` to concatenate a given collection of expressions to a compound
SPDX expression. If `n` expressions are given, then `n - 1` SPDX
compound expressions instances get constructed via `n - 1` `and()`
calls.

As of [1] each `and()` execution started to invoke `equals()` at the
very beginning. Dependening on the expression this can be very
expensive. For example if the expression contains OR operators and is
bit larger, then the call tree becomes quite heavy-weight. It's
comprised of recursive `validChoices()` calls which insert expressions
into sets which in turn leads to further `equals()` calls and so forth.

When upgrading ORT from version 28.0.0 to 45.0.0 a performance
the evaluator hadn't finished after running for 3 days, while the
reporter took about 3 hours to finish for some real world scan. This
issue has been introduced by [1], because reverting that (on top of
`main`) does fix the performance problem.

Reduce the mentioned `n - 1` `and()` calls to just a single one to
relax the issue. This makes the evaluator finish in 1.8 seconds and
reporter in 3 seconds again, for the real world scan mentioned above.

Fixes: #9902.

[1]: 31b9be8

Signed-off-by: Frank Viernau <[email protected]>
fviernau added a commit that referenced this issue Feb 10, 2025
There are multiple code locations which use `reduce` together with
`and()` to concatenate a given collection of expressions to a compound
SPDX expression. If `n` expressions are given, then `n - 1` SPDX
compound expressions instances get constructed via `n - 1` `and()`
calls.

As of [1] each `and()` execution started to invoke `equals()` at the
very beginning. Dependening on the expression this can be very
expensive. For example if the expression contains OR operators and is
bit larger, then the call tree becomes quite heavy-weight. It's
comprised of recursive `validChoices()` calls which insert expressions
into sets which in turn leads to further `equals()` calls and so forth.

After upgrading ORT from version 28.0.0 to 45.0.0 the execution of
the evaluator took more than 3 days finishing, and the reporter took
about 3 hours to finish for some real world scan. That issue has been
introduced by [1], because reverting [1] (on top of `main`) does fix
the performance problem.

Reduce the mentioned `n - 1` `and()` calls to just a single one to
relax the issue. This makes the evaluator finish in 1.8 seconds and
the reporter in 3 seconds again, for the real world scan mentioned
above.

Fixes: #9902.

[1]: 31b9be8

Signed-off-by: Frank Viernau <[email protected]>
fviernau added a commit that referenced this issue Feb 10, 2025
There are multiple code locations which use `reduce` together with
`and()` to concatenate a given collection of expressions to a compound
SPDX expression. If `n` expressions are given, then `n - 1` SPDX
compound expressions instances get constructed via `n - 1` `and()`
calls.

As of [1] each `and()` execution started to invoke `equals()` at the
very beginning. Dependening on the expression this can be very
expensive. For example, if the expression contains OR operators and is
a bit larger, then the call tree becomes quite heavy-weight. It's
comprised of recursive `validChoices()` calls which insert expressions
into sets which in turn leads to further `equals()` calls and so forth.

After upgrading ORT from version 28.0.0 to 45.0.0 the execution of
the evaluator took more than 3 days without finishing, and the reporter
took about 3 hours to finish for some real world scan. That issue has
been introduced by [1], because reverting [1] (on top of `main`) does
fix the performance problem.

Reduce the mentioned `n - 1` `and()` calls to just a single one to
relax the issue. This makes the evaluator finish in 1.8 seconds and
the reporter in 3 seconds again, for the real world scan mentioned
above.

Fixes: #9902.

[1]: 31b9be8

Signed-off-by: Frank Viernau <[email protected]>
fviernau added a commit that referenced this issue Feb 10, 2025
There are multiple code locations which use `reduce` together with
`and()` to concatenate a given collection of expressions to a compound
SPDX expression. If `n` expressions are given, then `n - 1` SPDX
compound expressions instances get constructed via `n - 1` `and()`
calls.

As of [1] each `and()` execution started to invoke `equals()` at the
very beginning. Dependening on the expression this can be very
expensive. For example, if the expression contains OR operators and is
a bit larger, then the call tree becomes quite heavy-weight. It's
comprised of recursive `validChoices()` calls which insert expressions
into sets which in turn leads to further `equals()` calls and so forth.

After upgrading ORT from version 28.0.0 to 45.0.0 the execution of
the evaluator took more than 3 days without finishing, and the reporter
took about 3 hours to finish for some real world scan. That issue has
been introduced by [1], because reverting [1] (on top of `main`) does
fix the performance problem.

Reduce the mentioned `n - 1` `and()` calls to just a single one to
relax the issue. This makes the evaluator finish in 1.8 seconds and
the reporter in 3 seconds again, for the real world scan mentioned
above.

Fixes: #9902.

[1]: 31b9be8

Signed-off-by: Frank Viernau <[email protected]>
fviernau added a commit that referenced this issue Feb 10, 2025
There are multiple code locations which use `reduce` together with
`and()` to concatenate a given collection of expressions to a compound
SPDX expression. If `n` expressions are given, then `n - 1` SPDX
compound expressions instances get constructed via `n - 1` `and()`
calls.

As of [1] each `and()` execution started to invoke `equals()` at the
very beginning. Dependening on the expression this can be very
expensive. For example, if the expression contains OR operators and is
a bit larger, then the call tree becomes quite heavy-weight. It's
comprised of recursive `validChoices()` calls which insert expressions
into sets which in turn leads to further `equals()` calls and so forth.

After upgrading ORT from version 28.0.0 to 45.0.0 the execution of
the evaluator took more than 3 days without finishing, and the reporter
took about 3 hours to finish for some real world scan. That issue has
been introduced by [1], because reverting [1] (on top of `main`) does
fix the performance problem.

Reduce the mentioned `n - 1` `and()` calls to just a single one to
relax the issue. This makes the evaluator finish in 1.8 seconds and
the reporter in 3 seconds again, for the real world scan mentioned
above.

Fixes: #9902.

[1]: 31b9be8

Signed-off-by: Frank Viernau <[email protected]>
@sschuberth sschuberth removed the bug label Feb 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
evaluator About the evaluator tool reporter About the reporter tool spdx-utils About the SPDX utility library
Projects
None yet
2 participants