-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revise deadlock detection #22
Comments
Inner tasks will be disallowed to start new tasks as their duplication feature introduces a transient state in which deadlock is hard (or impossible?) to detect. If we'd allow inner tasks to start new tasks, then as the future duplicated inner task threads are not yet present in our representation, therefore incorrectly detecting a deadlock although new task would be started. So to summarize, task manipulation (starting and waiting for) can only be done on the main thread of the build task. It can also be done on build clusters so that is allowed now. Inner tasks are not allowed to any of these. However, they can start new inner tasks and wait for their results. The retrieval of inner task results are limited to task threads only, however, this may be lifted in the future. (As well with other inner task restrictions as stated above.) |
Unexpected deadlock during testing:
|
Make it clearer what the code does when waiting for task ancestors Related issue #22
Another related error in CI build, logs below. Seems like the issue is that sometimes a waiting thread doesn't get unparked when the condition gets satisfied. At least in the below case, the stack trace shows that the thread was waiting for the ancestors (the task with "plus" identifier) to be finished. After the timeout interruption, the condition check returned successful, and the task was further executed.
|
A thread handle might've gotten added multiple times to the waitingThreads collections of a task future. These changes probably doesnt fix the issues observed in #22, but simplifies the deadlock checking in some aspects.
The threads that wait on a state of a task future are stored in a collection. This collection got cleared when the task finished, and its state was set to RESULT_READY. This operation was incorrect, as if some threads wait on a notification about a new ancestor being added to a task, then it will never get notified, as its thread handle got purged from the collection. This gets fixed by not clearing the waiting thread collection in case the task finishes, so threads can get notified after finish. The thread handles that are only interested in the RESULT_READY event will still get removed not to keep them unnecessarily in the collection. AddAncestorBlockingWaitTaskTest was extended to test this scenario. This reliably happened when the str task finishes before the child starter task (plus) can start it. Related issue: #22
The deadlock detection mechanism of saker.build depends on the
ThreadGroup
API. The code itself is not really robust as lingering threads may cause the deadlock not to be noticed. Additionally, the issue #2 is also a side effect of the current mechanism.As per https://mail.openjdk.java.net/pipermail/loom-dev/2020-July/001471.html, the intetion for the
ThreadGroup
API is to be deprecated over time. It also includes heavy synchronization and may not be reliable enough for proper use for deadlock detection. The solution for this is to revise the current implementation and come up with some different mechanism for deadlock detection. This may include placing additional restrictions on the build tasks.One solution might be is to only allow waiting for other tasks on the main thread of a task. The main thread is the one that
Task.run()
is invoked on. This could make it easier to detect the deadlocks as only a single thread needs to be kept in check for the detection.In order for this to work additional APIs may need to be added to allow waiting for multiple tasks at once. Existing task implementations also need to be checked and tested to be conforming.
With this solution the build will be considered as deadlocked if all running main threads are in waiting state.
This would also make the deadlock detection faster as no polling would be required, the deadlock would be detected instantaneously as the last thread enters the waiting state.
The solution would still allow retrieving task results without waiting on worker threads. E.g.
getFinished
should work.Inner task threads also need to be checked for deadlock. Starting new tasks should also be constrained to the main task threads or main inner task threads.
The text was updated successfully, but these errors were encountered: