Progress halt with a certain future composition on the grammers-client API user side #282
Comments
The root cause has been boiled down to how locking inside of the Note that if There are no issues with I'd argue that
@Lonami how are you willing to proceed? I can work on a fix for this when I get back to it - but at the moment I'm busy elsewhere.
I still need enough brain power to read through and truly understand the problem. I don't know when that'll be. I'm not under any time pressure to see this fixed. So if you have a clear path forward on your mind, that'd be best.
This is a very peculiar issue, but fortunately I have managed to come up with a relatively small example that reproduces this bug: https://github.com/MOZGIII/grammers-halt
The issue manifests with this:
If the code was working as intended, the log line `main::reconciler: before iter next` should've been followed by something else - either an error, or `main::reconciler: after iter next`; however, the code stalls and does not progress.

My investigation showed:
There are no blocking locks held by any threads that would block further execution, and there are no async locks held besides the one lock on the `grammers_mtsender::Sender` taken by the `grammers_client::Client` call of `next_raw_update` under the hood - which is intended; in other words, this is not a deadlock in the general sense.

Tested this via tokio console for async locks and lldb for sync mutexes.
The issue is not with `tokio::sync::Mutex` specifically, since I have tried swapping the mutex implementation in question with `async_mutex::Mutex` and got the same outcome - the app stalled.

The `select` chain at grammers/lib/grammers-mtsender/src/lib.rs (lines 333 to 338 in 724188f) is the issue here, because it seems that the reason the futures do not progress is that the whole task doesn't get scheduled - i.e. the futures (or, rather, the combined `select` future) do not progress because they are not polled, meaning the task is simply not scheduled - not because the futures are polled but are stuck.

I have managed to test this by trying this code here:
Screenshot of the code here.
Sorry for the screenshot, didn't commit it, and this is all I got now.
I wrapped the code at grammers/lib/grammers-mtsender/src/lib.rs (line 321 in 724188f) with `tokio::spawn` and added logging inside that `spawn`ed future after the `sleep` returns; then made the `select` wait on the `JoinHandle` of that spawned task. After a minute, I saw the log message - which means that the timer worked - but the `select` that was polling the `JoinHandle` didn't return - which means that the task that executes the `select` was not scheduled upon the `JoinHandle` returning.

Rather odd, yet maybe I'm missing something.
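
For readers less familiar with the polling model, the distinction drawn above (futures that are polled but stuck versus a task that is never scheduled again) can be illustrated with a toy, std-only sketch. This is not tokio's scheduler and does not claim to reproduce or explain the bug; `CountingWaker`, `ReadyOnSecondPoll` and `run_until_stalled` are hypothetical names made up for this illustration. The only assumption is the general `Future` contract: after returning `Poll::Pending`, a task is polled again only if its waker has been woken.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that only counts how many times it was woken.
struct CountingWaker {
    wakes: AtomicUsize,
}

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.wakes.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical future that becomes ready on its second poll.
/// `notify` controls whether it wakes the task before returning `Pending`.
struct ReadyOnSecondPoll {
    polls: usize,
    notify: bool,
}

impl Future for ReadyOnSecondPoll {
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<usize> {
        self.polls += 1;
        if self.polls >= 2 {
            return Poll::Ready(self.polls);
        }
        if self.notify {
            // Ask the executor to schedule this task again.
            cx.waker().wake_by_ref();
        }
        Poll::Pending
    }
}

/// Toy scheduler: polls once, then polls again only if a wake-up arrived
/// since the previous poll. Returns the output (if any) and the poll count.
fn run_until_stalled<F: Future + Unpin>(mut fut: F) -> (Option<F::Output>, usize) {
    let state = Arc::new(CountingWaker { wakes: AtomicUsize::new(0) });
    let waker = Waker::from(state.clone());
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    let mut seen_wakes = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (Some(out), polls);
        }
        let wakes = state.wakes.load(Ordering::SeqCst);
        if wakes == seen_wakes {
            // No wake-up arrived: the task would never be scheduled again,
            // even though the future itself is perfectly able to finish.
            return (None, polls);
        }
        seen_wakes = wakes;
    }
}
```

With `notify: true` the toy scheduler polls twice and gets `Some(2)`; with `notify: false` it stops after a single poll with `None`, even though a second poll would have completed the future. The second case is the shape of the symptom described here: the work is done, but nothing schedules the task that is waiting on it.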
I don't know what's going on there, but I suspect a `tokio` bug.

I have managed to create a workaround - essentially the same logic but executed with two independent spawned tasks instead of structured concurrency; it can be found at https://github.com/MOZGIII/grammers-halt/tree/working1.
See MOZGIII/grammers-halt@master...working1 for the changes that make the code work - this does not solve the issue, but at least it might be useful for someone who needs to work around this bug in the meantime.