Handle context cancellation properly #428
A previous fix, #391, attempted to address a lock that occurred on context cancel. However, in doing so, it introduced a new lock. Essentially, if a message was not sent to the requestmanager/responsemanager goroutine, waiting for a response to that message could last indefinitely. The previous fix therefore stopped waiting when the calling context cancelled. However, once a message reaches the goroutine of the requestmanager/responsemanager, it's important that it's processed to completion, so that the message loop doesn't lock. The proper fix is to detect when the message is sent successfully, and if so, wait for it to be processed. If it isn't sent, we can safely abort the goroutine immediately.
This seems fine to me, the removal of the `case <-ctx.Done():` in the public API (client) functions being the key.
However, the failure of `TestBlockHooks/responding_to_extensions` in one of the CIs does ring a few bells because it seems close to this code.
Think I found it; unrelated, so double fix!
* fix(cancellation): handle message cancellation properly

  A previous fix, ipfs#391, attempted to address a lock that occurred on context cancel. However, in doing so, it introduced a new lock. Essentially, if a message was not sent to the requestmanager/responsemanager goroutine, waiting for a response to that message could last indefinitely. The previous fix therefore stopped waiting when the calling context cancelled. However, once a message reaches the goroutine of the requestmanager/responsemanager, it's important that it's processed to completion, so that the message loop doesn't lock. The proper fix is to detect when the message is sent successfully, and if so, wait for it to be processed. If it isn't sent, we can safely abort the goroutine immediately.

* fix(race): resolve race condition with test responses
Handle context cancellation properly (ipfs#428)
A previous fix, #391, attempted to address a lock that occurred when a client-facing function was called with a context that was cancelled. However, in doing so, that fix introduced a new, potentially more critical lock in the request manager/response manager message loop.
The original issue is as follows: if a message was not sent to the requestmanager/responsemanager goroutine because the calling context was cancelled, then subsequent code waiting for a response to that message being processed could block indefinitely.
The previous fix therefore stopped waiting for a response when the calling context cancelled.
However, once a message reaches the goroutine of the requestmanager/responsemanager, it's important that it's processed to completion, so that the message loop doesn't lock. If we stop waiting for a response, the message loop itself can lock trying to send a response to the message.
The proper fix is to detect whether the message was sent to the message loop successfully or aborted due to context cancellation. If it is sent successfully before the calling context cancels, then we need to wait for it to be processed, even if the calling context cancels while it's being processed (this should be a minuscule amount of time). If it isn't sent before the context cancels, we can safely abort the goroutine immediately.