Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QD-7638] Support multiple (& possibly different) fingerprint versions #13

Merged
merged 15 commits into from
Jan 12, 2024

Conversation

jckoenen
Copy link
Contributor

@jckoenen jckoenen commented Dec 8, 2023

Completely rewrote the algorithm for baseline calculation, as it had two major problems when dealing with fingerprints:

  • The calculation always picked the latest version of a fingerprint. So, if we had a baseline which contains only v1 prints, and now a new report which contains v1 and v2 prints, issues would no longer be compared correctly. This means, introducing a v2 would break all existing baselines immediately. (The opposite is also true: a baseline generated with v1 and v2 could not be successfully compared to a report containing only v1)
  • Changing the behaviour to check all prints breaks the algorithm, as it requires a 1-1 connection between result and print. It would start to find way more results than expected.

The commit chain reflects the path I took and can be reviewed individually. But I will squash the the PR in the end.

Copy link

github-actions bot commented Dec 8, 2023

Qodana for JVM

It seems all right 👌

No new problems were found according to the checks applied

☁️ View the detailed Qodana report

Dependencies licenses

Third-party software list

This page lists the third-party software dependencies used in qodana-sarif

Dependency Version Licenses
annotations 13.0 Apache-2.0
gson 2.8.9 Apache-2.0
kotlin-stdlib 1.9.21 Apache-2.0
Contact Qodana team

Contact us at [email protected]

@jckoenen jckoenen marked this pull request as ready for review December 8, 2023 13:12
@jckoenen
Copy link
Contributor Author

@avafanasiev / @hybloid PTAL. This also passes with all test-data in IJ

Copy link

github-actions bot commented Dec 11, 2023

Qodana for JVM

It seems all right 👌

No new problems were found according to the checks applied

View the detailed Qodana report

To be able to view the detailed Qodana report, you can either:

  1. Register at Qodana Cloud and configure the action
  2. Use GitHub Code Scanning with Qodana
  3. Host Qodana report at GitHub Pages
  4. Inspect and use qodana.sarif.json (see the Qodana SARIF format for details)

To get *.log files or any other Qodana artifacts, run the action with upload-result option set to true,
so that the action will upload the files as the job artifacts:

      - name: 'Qodana Scan'
        uses: JetBrains/[email protected]
        with:
          upload-result: true
Dependencies licenses

Third-party software list

This page lists the third-party software dependencies used in qodana-sarif

Dependency Version Licenses
annotations 13.0 Apache-2.0
gson 2.8.9 Apache-2.0
kotlin-stdlib 1.9.21 Apache-2.0
Contact Qodana team

Contact us at [email protected]

undecidedFromBaseline.each { result ->
val foundInReport = result.equalIndicators
.flatMap(reportIndex::getOrEmpty)
.filter(undecidedFromReport::remove)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very ineffective.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is, but it still outperforms a Set based calculation because of the heavy hashCode, at least for the reports I checked (100/1000/10_000 results). The new algorithm also outperforms the old implementation on these.

Do you have a suggestion to improve here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So running through ArrayList with equals and then removing element from the middle of it is ok?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100k is also quite possible.

val undecidedFromBaseline = baseline.results.noNulls()
.filterNot { it.baselineState == BaselineState.ABSENT }
.onEach { result -> baselineIndex.add(ResultKey(result), result) }
.toMutableSet()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, now you have 4 different hashes? Result has very heavy hashcode and equals calculation.


private typealias Fingerprint = String
private typealias StrictIndex = MultiMap<Fingerprint, Result>
private typealias LaxIndex = MultiMap<ResultKey, Result>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably dont need here result, counter would be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, just found that when using a counter I can drop the Set conversion from undecidedFromBaseline

undecidedFromBaseline.each { result ->
val foundInReport = result.equalIndicators
.flatMap(reportIndex::getOrEmpty)
.filter(undecidedFromReport::remove)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So running through ArrayList with equals and then removing element from the middle of it is ok?

undecidedFromBaseline.each { result ->
val foundInReport = result.equalIndicators
.flatMap(reportIndex::getOrEmpty)
.filter(undecidedFromReport::remove)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100k is also quite possible.

@jckoenen jckoenen force-pushed the QD-7638/multi-equality branch 2 times, most recently from c8f3493 to 7afc841 Compare January 5, 2024 13:07
foundInReport -> remove()
!options.wasChecked.apply(result) -> {
remove()
.filter { result ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didn't like the style than filter lambda produces side effects, it's quite counterintuitive and leads to poor code readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in principal, but I think it's also common that filter() calls do things like set.add or counter checks.

Anyway, refactored out all but two very simple side-effects, where refactoring would make it less clear imho

!options.wasChecked.apply(result) -> {
remove()
.filter { result ->
val foundInReport = result.equalIndicators
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it means we are comparing hashes v1 against v2. It shouldnt be really bad for performance. But it could be important for correctness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed that. For now it doesn't affect correctness, because the hash algorithms are not compatible with each other. An alternative approach could also be to first determine which version of fingerprint to use, but I don't think it's necessary atm.

The RunResultGroup class has been simplified, reducing the complexity of baseline calculation. The changes include renaming FingerprintIndex and KeyIndex to StrictIndex and LaxIndex, respectively, and streamlining processes to efficiently manage report results with state. Several baseline calculation steps have been modified, enhancing the diff creation process and making the code more concise. In addition, a test has been temporarily disabled due to outdated test data.
- Keep only single result set
- Reduce number of iterations
Also convert most `Set` to list, as hashCode comparison turned out to be useless.
@jckoenen jckoenen force-pushed the QD-7638/multi-equality branch from 7afc841 to f15c836 Compare January 10, 2024 15:07
@jckoenen jckoenen merged commit ed27354 into main Jan 12, 2024
4 checks passed
@jckoenen jckoenen deleted the QD-7638/multi-equality branch January 12, 2024 11:06
Copy link
Contributor

@hybloid hybloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants