Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements for referential integrity checking #980

Merged
merged 4 commits into from
Aug 28, 2024

Conversation

al-niessner
Copy link
Contributor

@al-niessner al-niessner commented Aug 23, 2024

🗒️ Summary

Multi step process of cleaning up validate to improve performance:

  1. Fixed ValidateTarget to be compliant with Java: hashCode() returned different values when equals() returned true.
  2. Fixed ValidateTarget.cachedTargets was implemented in a Utility class so very few creators of ValidateTarget used it. Moved cache to ValidateTarget, privatized constructors, used factory methods (build) for consistent use of cache.
  3. Fixing Identifier to be compliant with Java: hashCode() returned different values when equals() returned true.
  4. Fix all of the loops to use contains() instead. Those loops are a consequence of (3).

⚙️ Test Data and/or Report

Automated unit/regression tests below should pass.

♻️ Related Issues

Closes #969

Step one: removed the tallest tent pole by fixing the ValidationTarget cache. Not everything was being cached. In fact, little to none of it was. Implementing this sped up the final step that counts on this by 10x. The 11000 strong bundle went from 73 seconds to 7.3 seconds. The overall time remains fixed at 1500 seconds. Not sure if the 73 seconds were simply redistributed or if that is just normal variation of an uncertain process (depends on network download times).
@al-niessner al-niessner self-assigned this Aug 23, 2024
@al-niessner al-niessner requested a review from a team as a code owner August 23, 2024 21:20
@al-niessner al-niessner marked this pull request as draft August 23, 2024 21:20
Copy link
Member

@nutjob4life nutjob4life left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Al Niessner added 3 commits August 26, 2024 09:22
TargetExaminer does not need to read the same file over and over. Do it once, collecting the information needed and then store for later. Yes, another cache.

The look in referential integrity was overly complicated. Reduced it to a HashSet.contains() which really speeds things up.
@al-niessner al-niessner marked this pull request as ready for review August 27, 2024 23:37
@al-niessner
Copy link
Contributor Author

Unfortunately, it did not improve the unit testing time. Must be dominated by something else.

@jordanpadams jordanpadams changed the title 969: performance improvement Performance improvements for referential integrity checking Aug 28, 2024
@jordanpadams jordanpadams merged commit be8e40e into main Aug 28, 2024
3 checks passed
@jordanpadams jordanpadams deleted the issue_969 branch August 28, 2024 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow performance with all content and product validation turned off
3 participants