spdx-utils: Very poor performance within `ResolvedLicenseInfo.effectiveLicense()` #9902

fviernau · 2025-02-07T14:18:21Z

Scenario

pipeline runs analyzer, scanner, evaluator, reporter
project contains a package with:
- large amount of detected licenses
- in particular including ~25 distinct expressions with at least one OR
  operator and 200 distinct expressions in total
when upgrading from ORT version 28.0.0 to 45.0.0 the performance renders ORT unusable:
- The evaluator runs for 3 days without terminating
- The reporter runs for 3+ hours

observation

effectiveLicense() joins all expression using the and function here:
https://github.com/oss-review-toolkit/ort/blame/c96f2e8707c74b5929ab870de4149c41460b23d5/model/src/main/kotlin/licenses/ResolvedLicenseInfo.kt#L70-L73
The and function as a duplicate elimination which calls equals().
Without the duplicate elimination the code is fast.
Duplicate elimination was introduced by 31b9be8
...is likely the root cause.

The text was updated successfully, but these errors were encountered:

31b9be8 introduced an `equals()` into `SpdxCompoundExpression::and`. For packages with large amounts of detected licenses including ones with `OR` operators, the performance of the `and` function becomes so poor that the evaluator runs for 3 days (with the open source `ort-config` rules, without terminating. Also the reporter performance degrades from less than 30 seconds to 3 hours. Unfortunately the call tree involved is so complicated that there is no simple fix for this. `equals()` involves recursive `validChoices()` calls which in turn call `equals()` again when computing distinctness or inserting further `SpdxExpressions` into sets. Revert the change to fix the problem in a timely manner. Note that it makes sense to bring back the now reverted functionaliy, but the code should be refactored first to produce more managable call trees. Fixes: #9902. This reverts commit 31b9be8. Signed-off-by: Frank Viernau <[email protected]>

31b9be8 introduced an `equals()` into `SpdxCompoundExpression::and`. For packages with large amounts of detected licenses including ones with `OR` operators, the performance of the `and` function becomes so poor that the evaluator runs for 3 days (with the open source `ort-config` rules, without terminating. Also the reporter performance degrades from less than 30 seconds to 3 hours. Unfortunately the call tree involved is so complicated that there is no simple fix for this. `equals()` involves recursive `validChoices()` calls which in turn call `equals()` again when computing distinctness or inserting further `SpdxExpressions` into sets. Revert the change to fix the problem in a timely manner. Note that it makes sense to bring back the now reverted functionality, but the code should be refactored first to produce more managable call trees. Fixes: #9902. This reverts commit 31b9be8. Signed-off-by: Frank Viernau <[email protected]>

There are multiple code locations which use `reduce` together with `and()` to concatenate a given collection of expressions to a compound SPDX expression. If `n` expressions are given, then `n - 1` SPDX compound expressions instances get constructed via `n - 1` `and()` calls. As of [1] each `and()` execution started to invoke `equals()` at the very beginning. Dependening on the expression this can be very expensive. For example if the expression is contains OR operator(s) and is bit larger, then the call tree becomes quite heavy-weight. It's comprised of recursive `validChoices()` calls which insert expressions into sets which in turn leads to further `equals()` calls and so forth. When upgrading ORT from version 28.0.0 to 45.0.0 a performance the evaluator hadn't finished after running for 3 days, while the reporter took about 3 hours to finish for some real world scan. This issue has been introduced by [1], because reverting that (on top of `main`) does fix the performance problem. Reduce the mentioned `n - 1` `and()` calls to just a single one to relax the issue. This makes the evaluator finish in 1.8 seconds and reporter in 3 seconds again, for the real world scan mentioned above. Fixes: #9902. [1]: 31b9be8 Signed-off-by: Frank Viernau <[email protected]>

There are multiple code locations which use `reduce` together with `and()` to concatenate a given collection of expressions to a compound SPDX expression. If `n` expressions are given, then `n - 1` SPDX compound expressions instances get constructed via `n - 1` `and()` calls. As of [1] each `and()` execution started to invoke `equals()` at the very beginning. Dependening on the expression this can be very expensive. For example if the expression contains OR operators and is bit larger, then the call tree becomes quite heavy-weight. It's comprised of recursive `validChoices()` calls which insert expressions into sets which in turn leads to further `equals()` calls and so forth. When upgrading ORT from version 28.0.0 to 45.0.0 a performance the evaluator hadn't finished after running for 3 days, while the reporter took about 3 hours to finish for some real world scan. This issue has been introduced by [1], because reverting that (on top of `main`) does fix the performance problem. Reduce the mentioned `n - 1` `and()` calls to just a single one to relax the issue. This makes the evaluator finish in 1.8 seconds and reporter in 3 seconds again, for the real world scan mentioned above. Fixes: #9902. [1]: 31b9be8 Signed-off-by: Frank Viernau <[email protected]>

There are multiple code locations which use `reduce` together with `and()` to concatenate a given collection of expressions to a compound SPDX expression. If `n` expressions are given, then `n - 1` SPDX compound expressions instances get constructed via `n - 1` `and()` calls. As of [1] each `and()` execution started to invoke `equals()` at the very beginning. Dependening on the expression this can be very expensive. For example if the expression contains OR operators and is bit larger, then the call tree becomes quite heavy-weight. It's comprised of recursive `validChoices()` calls which insert expressions into sets which in turn leads to further `equals()` calls and so forth. After upgrading ORT from version 28.0.0 to 45.0.0 the execution of the evaluator took more than 3 days finishing, and the reporter took about 3 hours to finish for some real world scan. That issue has been introduced by [1], because reverting [1] (on top of `main`) does fix the performance problem. Reduce the mentioned `n - 1` `and()` calls to just a single one to relax the issue. This makes the evaluator finish in 1.8 seconds and the reporter in 3 seconds again, for the real world scan mentioned above. Fixes: #9902. [1]: 31b9be8 Signed-off-by: Frank Viernau <[email protected]>

There are multiple code locations which use `reduce` together with `and()` to concatenate a given collection of expressions to a compound SPDX expression. If `n` expressions are given, then `n - 1` SPDX compound expressions instances get constructed via `n - 1` `and()` calls. As of [1] each `and()` execution started to invoke `equals()` at the very beginning. Dependening on the expression this can be very expensive. For example, if the expression contains OR operators and is a bit larger, then the call tree becomes quite heavy-weight. It's comprised of recursive `validChoices()` calls which insert expressions into sets which in turn leads to further `equals()` calls and so forth. After upgrading ORT from version 28.0.0 to 45.0.0 the execution of the evaluator took more than 3 days without finishing, and the reporter took about 3 hours to finish for some real world scan. That issue has been introduced by [1], because reverting [1] (on top of `main`) does fix the performance problem. Reduce the mentioned `n - 1` `and()` calls to just a single one to relax the issue. This makes the evaluator finish in 1.8 seconds and the reporter in 3 seconds again, for the real world scan mentioned above. Fixes: #9902. [1]: 31b9be8 Signed-off-by: Frank Viernau <[email protected]>

fviernau added bug evaluator About the evaluator tool reporter About the reporter tool spdx-utils About the SPDX utility library to triage Issues that need triaging labels Feb 7, 2025

fviernau changed the title ~~spdx-utils: Massiv performance within ResolvedLicenseInfo.effectiveLicense()~~ spdx-utils: Very poor performance within ResolvedLicenseInfo.effectiveLicense() Feb 7, 2025

fviernau mentioned this issue Feb 7, 2025

fix(spdx-utils): Fix a performance issue in SpdxExpression::and() #9903

Closed

fviernau removed the to triage Issues that need triaging label Feb 7, 2025

fviernau self-assigned this Feb 10, 2025

fviernau mentioned this issue Feb 10, 2025

fix(spdx-utils): Fix performance issue in callers of and() #9909

Merged

sschuberth removed the bug label Feb 10, 2025

fviernau closed this as completed in #9909 Feb 10, 2025

fviernau closed this as completed in 7243c75 Feb 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

spdx-utils: Very poor performance within `ResolvedLicenseInfo.effectiveLicense()` #9902

spdx-utils: Very poor performance within `ResolvedLicenseInfo.effectiveLicense()` #9902

fviernau commented Feb 7, 2025 •

edited

Loading

spdx-utils: Very poor performance within ResolvedLicenseInfo.effectiveLicense() #9902

spdx-utils: Very poor performance within ResolvedLicenseInfo.effectiveLicense() #9902

Comments

fviernau commented Feb 7, 2025 • edited Loading

spdx-utils: Very poor performance within `ResolvedLicenseInfo.effectiveLicense()` #9902

spdx-utils: Very poor performance within `ResolvedLicenseInfo.effectiveLicense()` #9902

fviernau commented Feb 7, 2025 •

edited

Loading