syncRoot barrier does not block for unfulfilled pledges #97
Comments
In particular, in this finite state automaton, either adding a new event and transition or changing the MyTreeIsIdle event to also check for delayed tasks should be sufficient (lines 107 to 136 in 3bfb641).
* Try to isolate the recurrent nil pledge bug on Travis MacOS
* Remove some weird delaying logic
* Try to sense if Travis bug is linked to #97
* more verbosity
* nextDep wasn't properly zero-initialized
* remove a repr
* missed another "nil"
* Cleanup investigative changes
Actually this is not so simple. If a task is delayed, its owner is the That said:
So where is the bug? A sanity check would be for each worker to count the delayed tasks it has created and processed. On idle, workers send those counts to their parent. We can then assert in the syncRoot barrier that, when the runtime is quiescent, the created and processed counts of delayed tasks are the same.
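The sanity check described above could be modelled roughly as follows. This is a minimal Python sketch, not Weave's Nim code: the `Worker` class, its counters, and `check_quiescent` are hypothetical names chosen for illustration, and the "send counts to parent" step is simplified into a recursive sum over the worker tree.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Worker:
    """Hypothetical stand-in for a runtime worker (not Weave's actual type)."""
    delayed_created: int = 0     # delayed tasks this worker created
    delayed_processed: int = 0   # delayed tasks this worker ran
    children: List["Worker"] = field(default_factory=list)

    def delay_task(self) -> None:
        """Record that a task was delayed on an unfulfilled pledge."""
        self.delayed_created += 1

    def run_delayed_task(self) -> None:
        """Record that a previously delayed task was processed."""
        self.delayed_processed += 1

    def report_on_idle(self) -> Tuple[int, int]:
        """Return (created, processed) totals for this worker's subtree."""
        created, processed = self.delayed_created, self.delayed_processed
        for child in self.children:
            child_created, child_processed = child.report_on_idle()
            created += child_created
            processed += child_processed
        return created, processed


def check_quiescent(root: Worker) -> None:
    """Assumed barrier-side check: totals must match once the runtime is idle."""
    created, processed = root.report_on_idle()
    if created != processed:
        raise AssertionError(
            f"{created - processed} delayed task(s) still pending at the barrier"
        )
```

Because the totals are summed over the whole tree, the check does not require a delayed task to be created and processed by the same worker.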
* model checking - 1st try to fix MPSC queue (the model checker crashes with not enough memory :/)
* Give the thread the opportunity to not deadlock on sleep on Mac/with Clang
* whoopsie
* Add impl of Weave MPSC channel in C++ for CDSChecker model checking + comment out fences
* Comment out GEMM tests for syncRoot + Pledges: #97
* don't use sleep, it's can deadlock in the CI ...
* Try get epoch time to avoid mac bugs
* use `getTime` and hope that it's properly implemented on Mac
* State-machine, return to CheckTask to avoid leaving task spawning multitasks in queue (followup #121)
* Don't spinlock for testing, deadlocks ARM and OSX
* Could it be non-mono clock jitter?
* Add some log for MacOS debug
* Race condition between spawning the thread and entering the spinlock in the `isReady` test
* a,d obviously I mess up the function call
syncRoot does not take into account unfulfilled pledges.
This is related to workers being able to send termination even though they have a task delayed by a pledge.
This can be avoided by workers keeping a local count of the delayed tasks they have pending.
See: https://dev.azure.com/numforge/Weave/_build/results?buildId=288&view=logs&j=6b0d97ed-2246-525a-6e2b-9532bade852d&t=90276eb0-5b44-500a-95e7-4e793047d715
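The proposed fix, a per-worker count of pending delayed tasks that gates the termination signal, might look like the sketch below. It is a Python model under stated assumptions rather than Weave's implementation: `PledgeAwareWorker`, `pending_delayed`, `queue_empty`, and the method names are all hypothetical.

```python
class PledgeAwareWorker:
    """Hypothetical worker that tracks tasks delayed by unfulfilled pledges."""

    def __init__(self) -> None:
        self.pending_delayed = 0   # local count of tasks waiting on a pledge
        self.queue_empty = True    # assumed flag: no runnable task is queued

    def on_task_delayed(self) -> None:
        """Call when a task is parked because its pledge is not yet fulfilled."""
        self.pending_delayed += 1

    def on_delayed_task_released(self) -> None:
        """Call when a pledge is fulfilled and the parked task becomes runnable."""
        if self.pending_delayed == 0:
            raise RuntimeError("no delayed task is pending")
        self.pending_delayed -= 1

    def may_signal_termination(self) -> bool:
        """Idle is only reported when nothing is queued and nothing is delayed."""
        return self.queue_empty and self.pending_delayed == 0
```

With this gate, a worker whose queue is empty but that still holds a task delayed by a pledge would not report itself as idle, so the barrier would keep waiting.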