Cycles are possible in the dependency graph #83
@insertinterestingnamehere This is something we may need to talk about and revisit in the system semantics. (I'm a bit embarrassed I didn't see this issue. I had a weird bad feeling about splitting tasks last year, but it was a useful feature and I didn't follow through on the feeling.)
Honestly, I'm not 100% sure what the right fix for this is. Part of me says it's okay to let people write a deadlock as long as we provide some way to detect that failure case easily, probably by finding a way to raise a meaningful error if there are tasks queued with none of them ready. Being able to await tasks you just spawned before doing other stuff is a win for usability, even if it complicates the tasking model a bit. Our async/await stuff is a very reasonable answer to some "how do I write this easily" kinds of questions.

If I had to guess at a solution, I'd say we should just disallow child tasks depending on parent tasks. I can't think of any use cases for it that aren't more naturally covered by other idioms we do support. For example, it might be useful if you want a bunch of tasks to run after some setup code, but the way we designed that to be handled is to literally just launch the tasks after the setup has run. I don't think any of our current benchmarks/examples have a launched task depend on a parent.

OTOH, detecting this might be nontrivial. We'd have to have each task track which task was running when it was originally launched and then crawl that implied stack for each task at launch time to raise an appropriate error. Tracking info about which task launched what seems appropriate for debugging; crawling that info at launch time sounds expensive.
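A minimal sketch of that launch-time check, assuming each task records which task was running when it was spawned. The `Task` class, the `spawner` field, and `check_no_ancestor_dependency` are all illustrative names, not the project's actual API:

```python
class Task:
    """Illustrative stand-in for a task object; the real task type and its
    fields (especially `spawner`) are assumptions made for this sketch."""

    def __init__(self, name, dependencies=(), spawner=None):
        self.name = name
        self.dependencies = list(dependencies)
        # Hypothetical field: the task that was running when this one was spawned.
        self.spawner = spawner


def check_no_ancestor_dependency(new_task):
    """Raise if any dependency of new_task is an ancestor in its spawn chain.

    This is the "crawl the implied stack at launch time" idea: walk the
    spawner links from new_task up to the root and reject any declared
    dependency that appears among those ancestors.
    """
    ancestors = set()
    current = new_task.spawner
    while current is not None:
        ancestors.add(current)
        current = current.spawner
    offending = [dep for dep in new_task.dependencies if dep in ancestors]
    if offending:
        names = ", ".join(t.name for t in offending)
        raise RuntimeError(
            f"Task {new_task.name!r} depends on task(s) that spawned it: {names}. "
            "A child depending on one of its parents can never become runnable."
        )


# A parent spawning a child that (incorrectly) lists the parent as a dependency:
parent = Task("parent")
child = Task("child", dependencies=[parent], spawner=parent)
try:
    check_no_ancestor_dependency(child)
except RuntimeError as err:
    print(err)
```

The cost is proportional to the depth of the spawn chain on every launch, which is the expense the comment above is worried about.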
I think we might actually already track that, or at least I thought about doing it. The reason was that I wanted to provide fake stacks for exceptions so that you could see the "spawn" stack instead of the real stack used in the worker thread. That spawn stack is important for debugging, and if it's available we could use it for the checking you want. We could also have a fallback cycle checker that runs if the scheduler runs out of work while tasks still exist, triggering a warning that advises turning on the stack-tracking debugging feature.
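A sketch of that fallback checker, under the assumption that the scheduler can hand it the set of unfinished tasks and that each task exposes a `dependencies` list (the `Task` shape below is illustrative, as above):

```python
class Task:
    """Minimal illustrative task: just a name and a dependency list."""

    def __init__(self, name, dependencies=()):
        self.name = name
        self.dependencies = list(dependencies)


def find_dependency_cycle(unfinished):
    """Return one cycle among the unfinished tasks, or None.

    Intended to run only when the scheduler still has tasks but none are
    runnable, so the O(tasks + edges) walk happens once at deadlock time
    rather than on every spawn.
    """
    remaining = set(unfinished)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in unfinished}
    came_from = {}

    def visit(task):
        color[task] = GRAY
        for dep in task.dependencies:
            if dep not in remaining:
                continue  # dependency already finished; not part of the problem
            if color[dep] == GRAY:
                # Back edge found: walk came_from links to recover the cycle.
                cycle = [task]
                node = task
                while node is not dep:
                    node = came_from[node]
                    cycle.append(node)
                return cycle
            if color[dep] == WHITE:
                came_from[dep] = task
                found = visit(dep)
                if found is not None:
                    return found
        color[task] = BLACK
        return None

    for task in unfinished:
        if color[task] == WHITE:
            found = visit(task)
            if found is not None:
                return found
    return None


# Demo: a two-task cycle, like the parent/child case in this issue.
a = Task("parent-continuation")
b = Task("child", dependencies=[a])
a.dependencies.append(b)
cycle = find_dependency_cycle([a, b])
print(" -> ".join(t.name for t in cycle))
```

If a cycle is found, the warning could name the tasks involved and point the user at the spawn-stack tracking feature for more context.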
Yah, +1 to both of these ideas. More readable stack traces would be a big win. Falling back to cycle detection to raise a meaningful error when a deadlock occurs makes sense, since we won't pay the cost during normal execution.
Another thought on this: this would be very easy to detect in the sequential execution case. We need to add that execution mode for better debugging anyway. During sequential execution we could just raise an error at task creation if any of the specified dependencies haven't already completed. Given that this kind of deadlock is the result of a user doing something weird, handling it that way should be good enough for this specific issue (though I'm still in favor of better error stacks and better error messages from the scheduler).
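In a sequential mode the check collapses to almost nothing, since every previously created task has already run to completion by the time a new one is spawned. A sketch, with `SequentialScheduler` and its `spawn` signature being hypothetical names for illustration:

```python
class SequentialScheduler:
    """Hypothetical sequential debugging mode: each task runs to completion
    at spawn time, so any dependency that is not already finished can never
    be satisfied and is reported immediately."""

    def __init__(self):
        self.completed = set()

    def spawn(self, name, body, dependencies=()):
        unmet = [d for d in dependencies if d not in self.completed]
        if unmet:
            raise RuntimeError(
                f"Task {name!r} depends on unfinished task(s) {unmet}; under "
                "sequential execution that dependency can never be satisfied "
                "(e.g. a child task depending on its parent)."
            )
        body()  # run the task immediately
        self.completed.add(name)


# Usage: the deadlocking pattern is rejected at task-creation time.
sched = SequentialScheduler()
sched.spawn("setup", lambda: print("setup done"))
sched.spawn("work", lambda: print("work done"), dependencies=["setup"])  # fine
# sched.spawn("bad", lambda: None, dependencies=["work-in-flight"])      # would raise
```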
This happens because of the trick of allowing a task to be split, where any task that depends on the original task is made to depend on the final segment split away from it. The issue is that since we are "pausing" (really splitting) a task after it creates new tasks and blocks on tasks (possibly including the new ones), the new tasks can potentially have their parent as a dependency. The new task depending on its creator is reasonable. The creator depending on the new task is also reasonable. But if we do both, we have a cycle.
This breaks some of our guarantees and it's not at all clear to me how to fix it.
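To make the failure mode concrete, here is a small illustrative model (not the project's API) of the edges that result when a task T spawns a child C that depends on T and then blocks on C, which splits T into a finished prefix and a continuation T':

```python
class Node:
    """Bare node in the dependency graph, for illustration only."""

    def __init__(self, name):
        self.name = name
        self.dependencies = []


t_final = Node("T' (final segment of T after the split)")
c = Node("C (child spawned from inside T)")

# C declared a dependency on its parent T; after the split, dependencies on T
# are remapped to T's final segment T'.
c.dependencies.append(t_final)

# T blocked on C before splitting, so the continuation T' depends on C.
t_final.dependencies.append(c)

# Each node is waiting for the other: a two-node cycle, i.e. a deadlock.
assert t_final in c.dependencies and c in t_final.dependencies
print("cycle:", c.name, "<->", t_final.name)
```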