[QD-7638] Support multiple (& possibly different) fingerprint versions #13

jckoenen · 2023-12-08T13:04:33Z

Completely rewrote the algorithm for baseline calculation, as it had two major problems when dealing with fingerprints:

1. The calculation always picked the latest version of a fingerprint. So if a baseline contains only v1 prints and a new report contains both v1 and v2 prints, issues are no longer compared correctly. This means that introducing v2 would immediately break all existing baselines. (The opposite is also true: a baseline generated with v1 and v2 could not be compared successfully to a report containing only v1.)
2. Changing the behaviour to check all prints breaks the old algorithm, as it requires a one-to-one connection between result and print. It would start to find far more results than expected.
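
To make the first problem concrete, here is a minimal sketch. It is not the code from this PR: `Finding`, `latestOnly` and `sharesFingerprint` are hypothetical names, and modelling fingerprints as a map from version number to hash is an assumption made for illustration.

```kotlin
// Hypothetical model: a result carries one hash per fingerprint version.
data class Finding(val fingerprints: Map<Int, String>)

// Old behaviour as described above: only the latest available version is used.
fun latestOnly(f: Finding): String? = f.fingerprints.maxByOrNull { it.key }?.value

// Version-aware comparison: two results match if any shared version has the same hash.
fun sharesFingerprint(a: Finding, b: Finding): Boolean =
    a.fingerprints.any { (version, hash) -> b.fingerprints[version] == hash }

fun main() {
    val inBaseline = Finding(mapOf(1 to "aaa"))            // baseline knows only v1
    val inReport = Finding(mapOf(1 to "aaa", 2 to "bbb"))  // report has v1 and v2

    println(latestOnly(inBaseline) == latestOnly(inReport)) // false: v1 hash vs v2 hash
    println(sharesFingerprint(inBaseline, inReport))        // true: the v1 hashes agree
}
```

The same issue is treated as different by the latest-only comparison, which is why introducing a new version would invalidate existing baselines.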

The commit chain reflects the path I took and can be reviewed commit by commit, but I will squash the PR in the end.

github-actions · 2023-12-08T13:10:24Z

Qodana for JVM

It seems all right 👌

No new problems were found according to the checks applied

☁️ View the detailed Qodana report

Dependencies licenses

Third-party software list

This page lists the third-party software dependencies used in qodana-sarif

Dependency	Version	Licenses
annotations	13.0	Apache-2.0
gson	2.8.9	Apache-2.0
kotlin-stdlib	1.9.21	Apache-2.0

Contact Qodana team

Contact us at [email protected]

Or via our issue tracker: https://jb.gg/qodana-issue
Or share your feedback: https://jb.gg/qodana-discussions

jckoenen · 2023-12-11T12:14:25Z

@avafanasiev / @hybloid PTAL. This also passes with all test-data in IJ

github-actions · 2023-12-11T17:43:26Z

Qodana for JVM

It seems all right 👌

No new problems were found according to the checks applied

View the detailed Qodana report

To be able to view the detailed Qodana report, you can either:

Register at Qodana Cloud and configure the action
Use GitHub Code Scanning with Qodana
Host Qodana report at GitHub Pages
Inspect and use qodana.sarif.json (see the Qodana SARIF format for details)

To get *.log files or any other Qodana artifacts, run the action with upload-result option set to true,
so that the action will upload the files as the job artifacts:

      - name: 'Qodana Scan'
        uses: JetBrains/[email protected]
        with:
          upload-result: true

avafanasiev · 2023-12-12T21:54:48Z

sarif/src/main/java/com/jetbrains/qodana/sarif/baseline/Baseline.kt

+    undecidedFromBaseline.each { result ->
+        val foundInReport = result.equalIndicators
+            .flatMap(reportIndex::getOrEmpty)
+            .filter(undecidedFromReport::remove)


This is very inefficient.

jckoenen · 2023-12-18

Yes, it is, but it still outperforms a Set-based calculation because of the heavy hashCode, at least for the reports I checked (100/1000/10_000 results). The new algorithm also outperforms the old implementation on these.

Do you have a suggestion for improving it?

avafanasiev · 2023-12-18T21:18:11Z

So running through an ArrayList with equals and then removing an element from the middle of it is OK?

avafanasiev · 2023-12-18T21:19:08Z

100k results is also quite possible.
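
One way to avoid both the heavy hashCode and the linear ArrayList removal discussed in this thread is a set that compares elements by reference. The sketch below builds such a set from JDK classes only; `HeavyResult` and `identitySetOf` are hypothetical names, and the IdentitySet that a later commit in this PR mentions may be implemented differently.

```kotlin
import java.util.Collections
import java.util.IdentityHashMap

// Hypothetical stand-in for a SARIF result with expensive equals/hashCode.
data class HeavyResult(val ruleId: String, val message: String)

// A set backed by IdentityHashMap compares elements by reference (===),
// so the element's own equals() and hashCode() are never called.
fun <T> identitySetOf(items: Iterable<T>): MutableSet<T> {
    val set: MutableSet<T> = Collections.newSetFromMap(IdentityHashMap<T, Boolean>())
    items.forEach { set.add(it) }
    return set
}

fun main() {
    val report = List(5) { HeavyResult("rule", "message $it") }
    val undecided = identitySetOf(report)

    // remove() is a hash lookup here, not a linear scan followed by shifting elements.
    println(undecided.remove(report[2])) // true
    println(undecided.remove(report[2])) // false, already removed
    println(undecided.size)              // 4
}
```

The trade-off is that two distinct instances with equal content count as different elements, so this only works when the same object references flow through the whole calculation.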

avafanasiev · 2023-12-12T21:59:06Z

sarif/src/main/java/com/jetbrains/qodana/sarif/baseline/Baseline.kt

+    val undecidedFromBaseline = baseline.results.noNulls()
+        .filterNot { it.baselineState == BaselineState.ABSENT }
+        .onEach { result -> baselineIndex.add(ResultKey(result), result) }
+        .toMutableSet()


So now you have 4 different hashes? Result has a very heavy hashCode and equals calculation.

avafanasiev · 2023-12-12T22:32:58Z

sarif/src/main/java/com/jetbrains/qodana/sarif/baseline/Baseline.kt

+
+private typealias Fingerprint = String
+private typealias StrictIndex = MultiMap<Fingerprint, Result>
+private typealias LaxIndex = MultiMap<ResultKey, Result>


We probably don't need the Result here; a counter would be enough.

jckoenen · 2023-12-18

Very nice. I just found that when using a counter I can drop the Set conversion from undecidedFromBaseline.
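
The counter idea can be sketched as follows. This is an illustration, not the merged implementation: `Key` is a hypothetical stand-in for ResultKey, and the `Counter` class and the state names in `main` are assumptions.

```kotlin
// Hypothetical stand-in for ResultKey: the fields that identify a result
// when no fingerprint match is available.
data class Key(val ruleId: String, val file: String, val message: String)

// Counts occurrences per key instead of storing the results themselves.
class Counter<K> {
    private val counts = HashMap<K, Int>()

    fun increment(key: K) {
        counts[key] = (counts[key] ?: 0) + 1
    }

    // Consumes one occurrence of the key; returns false if none is left.
    fun decrement(key: K): Boolean {
        val current = counts[key] ?: return false
        if (current == 1) counts.remove(key) else counts[key] = current - 1
        return true
    }
}

fun main() {
    val key = Key("UnusedImport", "Foo.kt", "Unused import")

    // The baseline contains the same issue twice.
    val baseline = Counter<Key>()
    repeat(2) { baseline.increment(key) }

    // The report contains it three times: two are matched, the third is new.
    val states = List(3) { if (baseline.decrement(key)) "UNCHANGED" else "NEW" }
    println(states) // [UNCHANGED, UNCHANGED, NEW]
}
```

Because only the small key is hashed, the heavy hashCode of Result never comes into play for this index.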

avafanasiev · 2024-01-09T23:25:42Z

sarif/src/main/java/com/jetbrains/qodana/sarif/baseline/Baseline.kt

-            foundInReport -> remove()
-            !options.wasChecked.apply(result) -> {
-                remove()
+        .filter { result ->


I don't like the style where a filter lambda produces side effects; it is quite counterintuitive and leads to poor code readability.

jckoenen · 2024-01-10

I agree in principle, but I think it's also common for filter() calls to do things like set.add or counter checks.

Anyway, I refactored out all but two very simple side effects, where refactoring would make the code less clear, imho.
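
For readers unfamiliar with the pattern under discussion, the following generic sketch contrasts a filter predicate that mutates state with a version that keeps the mutation explicit. Both functions are hypothetical illustrations and are unrelated to the actual code in Baseline.kt.

```kotlin
// Predicate with a side effect: seen.add() mutates the set and returns
// true only for the first occurrence, so filter() doubles as de-duplication.
fun dedupWithSideEffect(items: List<String>): List<String> {
    val seen = HashSet<String>()
    return items.filter { seen.add(it) }
}

// Same result with the mutation spelled out in a plain loop.
// LinkedHashSet keeps first-insertion order.
fun dedupExplicit(items: List<String>): List<String> {
    val seen = LinkedHashSet<String>()
    for (item in items) seen.add(item)
    return seen.toList()
}

fun main() {
    val input = listOf("a", "b", "a", "c", "b")
    println(dedupWithSideEffect(input)) // [a, b, c]
    println(dedupExplicit(input))       // [a, b, c]
}
```

The first form is compact but relies on the reader knowing that the predicate changes state on every call; the second is longer but has no hidden behaviour.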

avafanasiev · 2024-01-09T23:28:01Z

sarif/src/main/java/com/jetbrains/qodana/sarif/baseline/Baseline.kt

-            !options.wasChecked.apply(result) -> {
-                remove()
+        .filter { result ->
+            val foundInReport = result.equalIndicators


It means we are comparing v1 hashes against v2 hashes. It shouldn't be really bad for performance, but it could be important for correctness.

jckoenen · 2024-01-10

Changed that. For now it doesn't affect correctness, because the hash algorithms are not compatible with each other. An alternative approach could be to first determine which fingerprint version to use, but I don't think that's necessary at the moment.
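
One way to guarantee that hashes are only compared within the same version is to make the version part of the index key. The sketch below assumes fingerprints are a map from version number to hash; `VersionedPrint` and `PrintIndex` are hypothetical names and not the types used in this PR.

```kotlin
// The version is part of the key, so a v1 hash can never match a v2 hash,
// even if the two strings happen to be identical.
data class VersionedPrint(val version: Int, val hash: String)

class PrintIndex<R> {
    private val byPrint = HashMap<VersionedPrint, MutableList<R>>()

    // Registers a result under every (version, hash) pair it carries.
    fun add(prints: Map<Int, String>, result: R) {
        for ((version, hash) in prints) {
            byPrint.getOrPut(VersionedPrint(version, hash)) { mutableListOf() }.add(result)
        }
    }

    // Returns results sharing at least one fingerprint of the same version.
    fun find(prints: Map<Int, String>): List<R> =
        prints.flatMap { (version, hash) -> byPrint[VersionedPrint(version, hash)].orEmpty() }
            .distinct()
}

fun main() {
    val index = PrintIndex<String>()
    index.add(mapOf(2 to "abc"), "report result")

    println(index.find(mapOf(1 to "abc")))             // [] because v1 is never compared to v2
    println(index.find(mapOf(1 to "zzz", 2 to "abc"))) // [report result]
}
```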

The RunResultGroup class has been simplified, reducing the complexity of baseline calculation. The changes include renaming FingerprintIndex and KeyIndex to StrictIndex and LaxIndex, respectively, and streamlining processes to efficiently manage report results with state. Several baseline calculation steps have been modified, enhancing the diff creation process and making the code more concise. In addition, a test has been temporarily disabled due to outdated test data.

hybloid

LGTM

jckoenen requested review from avafanasiev and hybloid December 8, 2023 13:04

jckoenen marked this pull request as ready for review December 8, 2023 13:12

avafanasiev requested changes Dec 12, 2023

avafanasiev requested changes Dec 18, 2023

jckoenen force-pushed the QD-7638/multi-equality branch 2 times, most recently from c8f3493 to 7afc841 Compare January 5, 2024 13:07

avafanasiev requested changes Jan 9, 2024

jckoenen added 15 commits January 10, 2024 13:28

11e015e  Convert building of baseline lookups to kotlin
a9e0ef9  Extract RunResultGroup class to separate file
67eebaa  Rename .java to .kt
055c5da  Convert RunResultGroup to kotlin, merge with Baseline.kt
11ef843  Comments
f92e4d7  Remove RunResultGroup class, convert to single function
76bbf2e  Add test for same results with different message
45203f2  Optimize BaselineCalculation, removed unused method
         - Keep only single result set
         - Reduce number of iterations
08481b8  Better handling for multiple results with same fingerprint
         Also convert most `Set` to list, as hashCode comparison turned out to be useless.
aacf7f7  Review: Change lax index to counter, remove Set conversion
3ffa548  Review: Replace ArrayList removal with IdentitySet, iterate baseline only once
e47f283  Review: Remove always false second wasChecked invocation
86e6ab6  Review: Remove side-effects from calls
f15c836  Review: Only compare equal version in fingerprints

jckoenen force-pushed the QD-7638/multi-equality branch from 7afc841 to f15c836 Compare January 10, 2024 15:07

avafanasiev approved these changes Jan 12, 2024

jckoenen merged commit ed27354 into main Jan 12, 2024
4 checks passed

jckoenen deleted the QD-7638/multi-equality branch January 12, 2024 11:06

hybloid reviewed Jan 12, 2024
