-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Zookeeper: Make Create+Put on a new znode atomic #148
base: master
Are you sure you want to change the base?
Conversation
Writes to a new Zookeepr znode should take advantage of Zookeeper's atomic create + write primitive. If not, it is possible that a read that was triggered by a watch will return an empty string. The previous workaround for this does not always work (e.g., when an empty string is returnedi due to a race) and can potentially cause call-stack overflow. This change-set fixes this race and removes the workaround. It also adds a call to Zookeepeer's Sync() on a Get operation, only when an empty string (or SOH) is returned to guard against an older version of libkv doing create+write in a non atomic fashion. This change-set addresses github.com/docker-archive/classicswarm/issues/1915 Signed-off-by: Amir Malekpour <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, we should add test to validate the changes.
// SOH control character instead of the actual value | ||
if string(resp) == SOH { | ||
return s.Get(store.Normalize(key)) | ||
if (string(resp) == SOH || string(resp) == "") && i < syncRetryLimit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel i < syncRetryLimit
here is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongluochen The i < syncRetryLimit is to prevent a Sync in the very last iteration of the for loop. Without this, we would unnecessarily call sync and then exit the loop. (So effectively we call sync not more than syncRetryLimit number of times).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
I'm not familiar with this repo. CI is failing for last few PRs.
|
// SOH control character instead of the actual value | ||
if string(resp) == SOH { | ||
return s.Get(store.Normalize(key)) | ||
if (string(resp) == SOH || string(resp) == "") && i < syncRetryLimit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
defaultTimeout = 10 * time.Second | ||
defaultTimeout = 10 * time.Second | ||
|
||
syncRetryLimit = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to choose 5? I don't know but it seems high. How much delay this may introduce?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongluochen No specific reason, except in my test environment 3 retries would always be needed to get the correct value, and I rounded this up to 5.
Note that the sync currently happens on SOH or empty string getting returned, and not on all reads. Also this is to work around older versions of libkv which have this bug. The versions that have this patch will do create+write atomically and should not return SOH or empty so the sync should never be needed.
Ping @docker/core-libkv-maintainers. CI is also failing. |
@dongluochen my PR fixes it #154 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amirma are you still working on this?
@nishanttotla We've merged this in our production repository and I can confirm it has fixed the issue. I think the patch is ready to merge as is. |
@amirma thanks for the prompt reply! Seems like your build is failing. Could you take a look at that? It would be nice to merge this. |
@nishanttotla I just pushed another build and it failed due to a new (I believe infrastructure-related) reason. Any insights? |
It seems like some package paths/locations may have changed:
Ping @docker/core-libkv-maintainers, the CI might need fixing. |
The fix is included to my fork: abronan/libkv along with other patches, thanks @amirma. |
@abronan should we merge this PR, or might it need more changes? I think this is useful and fixes a long standing issue. |
@nishanttotla I think it is safe to merge it yes, just merge the other PRs that are fixing the build first (the commit history on my fork could be of help). |
Writes to a new Zookeepr znode should take advantage of Zookeeper's
atomic create + write primitive. If not, it is possible that a read
that was triggered by a watch will return an empty string. The previous
workaround for this does not always work (e.g., when an empty string
is returned due to a race) and can potentially cause call-stack overflow.
This change-set fixes this race and removes the workaround.
It also adds a call to Zookeepeer's Sync() on a Get operation,
only when an empty string (or SOH) is returned to guard against an older
version of libkv doing create+write in a non atomic fashion.
This change-set addresses github.com/docker-archive/classicswarm/issues/1915
Signed-off-by: Amir Malekpour [email protected]