-
Notifications
You must be signed in to change notification settings - Fork 231
Annotation removal using locking #3015
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v5.3
Are you sure you want to change the base?
Conversation
Signed-off-by: Steve Hawkins <[email protected]>
Signed-off-by: Steve Hawkins <[email protected]>
d650f86
to
8558c1a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally like this approarch. What we should do is also document it on high level, like javadoc on TemporaryResourceCache
. Like the general idea; using lock striping for IO operations, etc.
Not sure if all the scenarios are covered with unit tests, probably those could be also added.
Will have to spend more time to think through all the scenarion, overall LGTM, added some comments, and questions to clarify.
} | ||
try { | ||
temporaryResourceCache.startModifying(id); | ||
var updated = updateMethod.apply(resourceToUpdate); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to iterate over this, the start modifyn and recent resource update should be called from an utils, but fine by me now as it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had initially started down the path of a common method with the primary cache utils - but for clarity this seemed like a better place to start.
Are you expecting that we'll want to push the event skipping logic down into ManagedInformerEventSource, and/or are you thinking that we'll effectively be able to use a Context rather than a direct reference to the event source?
.equals(previousResourceVersion)) | ||
|| isLaterResourceVersion(resourceId, newResource, cachedResource))) { | ||
// first check against the source in general - this also prevents resurrecting resources when | ||
// we've already seen the deletion event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned on other PR , would be nice to describe the resurrection scenario more in detial
boolean[] known = new boolean[1]; | ||
synchronized (this) { | ||
if (!unknownState) { | ||
latestResourceVersion = resource.getMetadata().getResourceVersion(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While is this better this way as using the latest sync version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are effectively using the TemporaryResourceCache to reason about things the reconciler was responsible for creating / updating.
If we skip putting something in the cache, then we may not skip that event as intended.
Let's say the informer has seen:
- create v1
- modify v2 - the reconciler did this
- modify v3 - something else did this
But has only delivered to the TemporaryResourceCache
- create v1
And we're now attempting a put with v2. If we use the informer latest version, we won't actually put v2 into the TemporaryResourceCache as it looks out-of-date. But then on onAddOrUpdateEvent we'll answer that the event wasn't known and trigger a reconciliation.
So by using a local notion of latestResourceVersion we're reasoning over what the TemporaryResourceCache actually knows.
Also because the get logic has changed to check both the TemporaryResourceCache and the informer cache, we won't actually return an out-of-date resource due to the put.
private final GroupVersionKind groupVersionKind; | ||
private final InformerConfiguration<R> informerConfig; | ||
private final KubernetesClient kubernetesClient; | ||
private final boolean comparableResourceVersions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add this also to @Informer
annotation
|
||
@Override | ||
public void onDelete(R resource, boolean b) { | ||
public synchronized void onDelete(R resource, boolean b) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we have this synchronized, could you elaborate pls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic has start and the onAddOrUpdate methods synchronized. I'm assuming that the intention there is to effectively block event handling until the primaryToSecondaryIndex has been built from the initial listing. I don't think delete should be handled differently.
That could also be done with a more granular lock around the usages of primaryToSecondaryIndex.
In general there's effectively a single threaded model of event delivery, so unless users are calling the onXXX methods manually, those calls will be serialized.
Replacement / alternative to #2999.
Layered on the tombstone removal changes / default enablement of parsing resource versions. It turns out that some of the tombstone changes needed refinement anyway. For example, we need to compare to a localized latest version, not to the cache.
This seems like the most straight-forward way to do this with locking - but of course that means we have to be very careful with what locks are already held. If this seems too risky, then queuing managed in the TemporaryResourceCache could be used instead, but that will need more code restructuring.
These changes also assume we only want to deal with this skipping problem when using the InformerEventSource. That could be generalized if needed for #3011
Much like the PrimaryUpdateAndCacheUtils, users will need to call InformerEventSource.PrimaryUpdateAndCacheUtils, to properly use this via custom logic.
This does not do any cleanup the extraneous puts into the cache because of onCreated / onUpdated being called after the updateAndCacheResource method.
This also does not try to reason about deletes.
cc @csviri @metacosm