Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BackgroundTask binding : Release GIL when destroying BackgroundTask #5579

Merged
merged 1 commit into from
Dec 8, 2023

Conversation

johnhaddon
Copy link
Member

The BackgroundTask constructor calls cancelAndWait(), and a Python function can't be cancelled unless it is able to aquire the GIL. Thus if we don't release the GIL before calling wait(), we can deadlock. We were already releasing the GIL appropriately for tasks returned by ParallelAlgo::callOnBackgroundThread() but not for those constructed with BackgroundTask() directly.

Fixing this revealed another problem that was dealt with correctly by callOnBackgroundThread() but not by the BackgroundTask() constructor. We need to hold the GIL when destroying the Python function owned by the task, and we were doing this by manually deleting it after it ran, at a point when we knew we had the GIL. But this wasn't sufficient if the BackgroundTask was cancelled before the function even started. In this case, the function would be destroyed from C++ and we would be deleting Python objects without holding the GIL. The solution is again to adopt the GIL management used by callOnBackgroundThread().

I'm just making this PR for main as the problem only showed up in the LocalDispatcher work I'm doing for 1.4. All current usage of BackgroundTasks goes through callOnBackgroundThread(), so shouldn't suffer from this issue.

@danieldresser-ie
Copy link
Contributor

Looks good. A minor stylistic thing: I'm not sure I like having both functions named gilManagingPtr just with different signatures - this code is technical enough that anyone looking at it needs to keep track of exactly what's happening, and while both "lock the gil before deleting this" and "acquire the gil before deleting this" are related to the gil, they are different enough in function that I think different names would be clearer - but these names aren't publicly visible, so it's not particularly important.

@johnhaddon
Copy link
Member Author

I'm not sure I like having both functions named gilManagingPtr just with different signatures

I've taken a stab at improving this in 3469422 - let me know if you think any other names would be better.

@danieldresser-ie
Copy link
Contributor

Looks good to me to squash and merge.

The BackgroundTask constructor calls `cancelAndWait()`, and a Python function can't be cancelled unless it is able to aquire the GIL. Thus if we don't release the GIL before calling `wait()`, we can deadlock. We were already releasing the GIL appropriately for tasks returned by `ParallelAlgo::callOnBackgroundThread()` but not for those constructed with `BackgroundTask()` directly.

Fixing this revealed another problem that was dealt with correctly by `callOnBackgroundThread()` but not by the `BackgroundTask()` constructor. We need to hold the GIL when destroying the Python function owned by the task, and we were doing this by manually deleting it after it ran, at a point when we knew we had the GIL. But this wasn't sufficient if the BackgroundTask was cancelled before the function even started. In this case, the function would be destroyed from C++ and we would be deleting Python objects without holding the GIL. The solution is again to adopt the GIL management used by `callOnBackgroundThread()`.
@johnhaddon johnhaddon merged commit 943b5f9 into GafferHQ:main Dec 8, 2023
3 of 4 checks passed
johnhaddon added a commit to johnhaddon/gaffer that referenced this pull request Dec 8, 2023
This was made necessary by changes in GafferHQ#5579
which meant that the functor wasn't destroyed immediately after running. Using the
WeakMethod is better though, because the old method wouldn't have broken the circular
reference if the background task was cancelled before it was started.
johnhaddon added a commit to johnhaddon/gaffer that referenced this pull request Dec 12, 2023
This was made necessary by changes in GafferHQ#5579
which meant that the functor wasn't destroyed immediately after running. Using the
WeakMethod is better though, because the old method wouldn't have broken the circular
reference if the background task was cancelled before it was started.
@johnhaddon johnhaddon deleted the backgroundTaskDeadlockFix branch March 15, 2024 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants