[YUNIKORN-2910] Fix data corruption due to insufficient shim context locking #924
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #924 +/- ##
==========================================
+ Coverage 68.21% 68.69% +0.47%
==========================================
  Files 70 70
  Lines 7621 7752 +131
==========================================
+ Hits 5199 5325 +126
- Misses 2213 2217 +4
- Partials 209 210 +1
…locking
Restore context locking that was removed as part of YUNIKORN-2629. The locks are necessary to prevent logical data corruption due to concurrent processing of both pod and node events.
Force-pushed from 2360602 to b7c83c0.
Do we have a good way to mock or reproduce these data race/corruption issues?
It would be very helpful if we could do that easily, for both e2e tests and unit tests.
+1
I don't think so. It's very difficult to repro things like that. Sometimes it's entirely workload dependent.
OK, I see.
I am working on making issues more reproducible. Will share updates more widely.
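As a hedged illustration of one common way to make such bugs more likely to surface in a unit test, the sketch below starts many goroutines behind a shared start barrier so they hit the shared state at nearly the same instant, then checks an invariant afterwards. The `stress` helper and the `allocations` type are hypothetical and are not part of the YuniKorn shim; running the test under `go test -race` additionally reports unsynchronized accesses even when the final state happens to look correct.

```go
package main

import (
	"fmt"
	"sync"
)

// stress runs fn from `workers` goroutines, `iterations` times each.
// All goroutines block on a shared start channel and are released together,
// which increases the chance of interleavings that expose missing locks.
func stress(workers, iterations int, fn func(worker, iter int)) {
	start := make(chan struct{})
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			<-start // wait until every worker has been started
			for i := 0; i < iterations; i++ {
				fn(w, i)
			}
		}(w)
	}
	close(start) // release all workers at once
	wg.Wait()
}

// allocations is a hypothetical shared structure that both pod and node
// event handlers might update. Invariant: total == sum(perNode).
type allocations struct {
	mu      sync.Mutex
	perNode map[string]int // allocated pod count per node
	total   int            // must always equal the sum of perNode values
}

// add updates both fields in one critical section so the invariant holds.
func (a *allocations) add(node string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.perNode[node]++
	a.total++
}

// sumMatchesTotal checks the invariant under the same lock.
func (a *allocations) sumMatchesTotal() bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	sum := 0
	for _, n := range a.perNode {
		sum += n
	}
	return sum == a.total
}

func main() {
	a := &allocations{perNode: make(map[string]int)}
	stress(8, 1000, func(w, i int) {
		a.add(fmt.Sprintf("node-%d", (w+i)%3))
	})
	fmt.Println(a.total, a.sumMatchesTotal())
}
```

Removing the `Lock`/`Unlock` calls from `add` typically makes the race detector fail the run, and may also produce a wrong `total`, although, as noted in the thread, whether a given run actually corrupts state depends on timing.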
What is this PR for?
Restore context locking that was removed as part of YUNIKORN-2629. The locks are necessary to prevent logical data corruption due to concurrent processing of both pod and node events.
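To illustrate why pod and node event handlers need a shared lock, here is a minimal sketch assuming a heavily simplified, hypothetical `Context` that tracks nodes and the pods bound to them. The names `Context`, `AddNode`, `RemoveNode`, `BindPod` and `Consistent` are illustrative only and are not the actual YuniKorn shim API. Without the lock, a node deletion could run between a pod handler's "does the node exist?" check and its insert, leaving a pod pointing at a node that no longer exists: a logical corruption rather than a crash.

```go
package main

import (
	"fmt"
	"sync"
)

// Context is a hypothetical stand-in for the shim's scheduler context.
type Context struct {
	lock  sync.RWMutex
	nodes map[string]map[string]bool // node name -> set of pod UIDs bound to it
	pods  map[string]string          // pod UID -> node name
}

func NewContext() *Context {
	return &Context{
		nodes: make(map[string]map[string]bool),
		pods:  make(map[string]string),
	}
}

// AddNode handles a node-add event; the write lock makes check-then-insert atomic.
func (c *Context) AddNode(name string) {
	c.lock.Lock()
	defer c.lock.Unlock()
	if _, ok := c.nodes[name]; !ok {
		c.nodes[name] = make(map[string]bool)
	}
}

// RemoveNode handles a node-delete event and drops the node's pods
// in the same critical section, so no pod is left referencing it.
func (c *Context) RemoveNode(name string) {
	c.lock.Lock()
	defer c.lock.Unlock()
	for uid := range c.nodes[name] {
		delete(c.pods, uid)
	}
	delete(c.nodes, name)
}

// BindPod handles a pod event. Holding the lock across the existence check
// and both map updates prevents a concurrent RemoveNode from interleaving.
func (c *Context) BindPod(uid, node string) bool {
	c.lock.Lock()
	defer c.lock.Unlock()
	podsOnNode, ok := c.nodes[node]
	if !ok {
		return false
	}
	podsOnNode[uid] = true
	c.pods[uid] = node
	return true
}

// Consistent reports whether every pod references a node that also lists it.
// Read-only, so it only needs the read lock.
func (c *Context) Consistent() bool {
	c.lock.RLock()
	defer c.lock.RUnlock()
	for uid, node := range c.pods {
		if !c.nodes[node][uid] { // indexing a missing (nil) inner map yields false
			return false
		}
	}
	return true
}

func main() {
	ctx := NewContext()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		node := fmt.Sprintf("node-%d", i%4)
		wg.Add(2)
		go func(i int) { // simulated pod event source
			defer wg.Done()
			ctx.AddNode(node)
			ctx.BindPod(fmt.Sprintf("pod-%d", i), node)
		}(i)
		go func() { // simulated node event source
			defer wg.Done()
			ctx.RemoveNode(node)
		}()
	}
	wg.Wait()
	fmt.Println("consistent:", ctx.Consistent())
}
```

A single `sync.RWMutex` serializes all mutating handlers, which is the simplest way to keep multi-field updates atomic; read-only paths can still proceed concurrently via `RLock`.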
What type of PR is it?
Todos
What is the Jira issue?
https://issues.apache.org/jira/browse/YUNIKORN-2910
How should this be tested?
Ensure existing tests still pass and new deadlocks are not introduced.
Screenshots (if appropriate)
Questions: