Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go: add deadlock detector, improve locking #618

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

djs55
Copy link
Collaborator

@djs55 djs55 commented Feb 17, 2023

We weren't fully compliant with the net.Conn interface as concurrent Write calls had problems:

  1. concurrent Write calls could exceed the allowed window size: one would see space available, drop the lock to write, allow a second to see the same space and write
  2. concurrent Write calls could block: when the window is closed they would block in Wait() but the window update message only used Signal() and would only wake up one call, leaving the other blocked. It should use Broadcast().

The Write() timeout handling was strange because it offloaded the Wait() to a goroutine. It's simpler to keep the Wait() on the main goroutine and do a Broadcast() from an AfterFunc. It's also good to stop the timer in the common case where you don't need it, rather than leave them to build up.

Also

  • add a deadlock detector
  • update the logging library
  • ensure the tests run on Windows

Signed-off-by: David Scott <[email protected]>
The code is clearer if each object only acquires its own mutex, and
doesn't reach into the mutexes of others (if possible)

Signed-off-by: David Scott <[email protected]>
The Write window is used to keep track of how much buffer space is
free in the remote to avoid one connection blocking the rest.
Previously we checked the window and decided how much to write, then
dropped the metadata mutex before performing the write. In theory
another Write call on the same connection could see that buffer
size is free, send too much and block the connection.

Therefore Write should take ownership of the space by bumping the
`current` window before dropping the lock.

Signed-off-by: David Scott <[email protected]>
Previously we had a complicated set of channels and a condition
variable, and the condition variable Wait() was in a goroutine.

Instead the Wait() is now in the main goroutine and an optional
timeout is in an optional goroutine.

Signed-off-by: David Scott <[email protected]>
If multiple calls are blocked waiting for window space, we need to
use Broadcast() to wake them all up on a window update.

Signed-off-by: David Scott <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant