Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Python module thread model to allow for reusing interpreters #26857

Merged
merged 11 commits into from
Mar 6, 2025

Conversation

jabraham17
Copy link
Member

@jabraham17 jabraham17 commented Mar 6, 2025

Fixes an issue where using a Python interpreter in disjoint on blocks would segfault.

The use case fixed by this PR is captured in the new test, test/library/packages/Python/correctness/reuseInterpreter.chpl. This test creates a block distributed array of interpreters in one function and then uses them in another function. Prior to this PR, this would segfault due to the following scenario. I have simplified to a single locale

  1. initPythonInterpreters creates a new comm task and initializes the Python interpreter in that thread.
    • Under the hood, Python is creating global state and thread local state
  2. distributedApply creates a separate comm task and tries to use the Python interpreter.
    • Python doesn't yet know about the new thread and has no thread local state, so it crashes

This problem can be fixed by following https://docs.python.org/3/c-api/init.html#non-python-created-threads. At interpreter initialization, we must call PyGILState_Ensure and PyEval_SaveThread. At interpreter finalization, we must call PyEval_RestoreThread and PyGILState_Release. Then, all Python operations in between must be wrapped in matching calls to PyGILState_[Ensure|Release].

However, implementing this then introduces a new problem, the Python interpreter must be started and stopped from the same thread (see https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx). This is solved in this PR by only starting and stopping the Python interpreter a daemon thread inside the Interpreter's init. Note that due to #26855, this is implemented as a pthread, not a begin.

Since all Python operations as now wrapped in calls to the GIL, for the most part user code no longer needs to manually handle the GIL. This PR updates the examples and docs accordingly. Future work should investigate if we should still expose the GIL and threadState objects.

  • start_test test/library/packages/Python on Mac with Python 3.13
    • with gasnet
    • without gasnet
  • start_test test/library/packages/Python on Mac with Python 3.9
    • with gasnet
    • without gasnet
  • start_test test/library/packages/Python on Linux with Python 3.10
    • with gasnet
    • without gasnet
  • Check that original motivating code in Arkouda works in true multilocale
    • with gasnet oversubscribed
    • with libfabric

[reviewed by @lydia-duncan and @mppf]

Signed-off-by: Jade Abraham <[email protected]>
Signed-off-by: Jade Abraham <[email protected]>
Signed-off-by: Jade Abraham <[email protected]>
Signed-off-by: Jade Abraham <[email protected]>
Signed-off-by: Jade Abraham <[email protected]>
Signed-off-by: Jade Abraham <[email protected]>
Signed-off-by: Jade Abraham <[email protected]>
@jabraham17 jabraham17 marked this pull request as ready for review March 6, 2025 20:03
@jabraham17 jabraham17 requested review from lydia-duncan and mppf March 6, 2025 20:03
Copy link
Member

@mppf mppf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach for fixing GIL handling looks good to me

Signed-off-by: Jade Abraham <[email protected]>
Signed-off-by: Jade Abraham <[email protected]>
Copy link
Member

@lydia-duncan lydia-duncan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be much cleaner in the case of parallelism, nicely done! I've got some comments inline, some of which are nits, some of which we already discussed offline, and a couple that were somewhere in between.

entirety of each task, these examples will be no faster than running the tasks
serially.
Although these examples use Chapel's task parallelism constructs,
they will be no faster than running the tasks serially due to the GIL.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does acquiring the GIL slow down non-Python code? My expectation would be that interleaving Python calls with regular Chapel code will be okay until the Python code segments are reached, at which point they will behave like a barrier was used. Do we know if that's the case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it should only effect Python code. So this statement of "no faster than running serially" is mostly about these specific examples, where the only work done in the loop is the Python function call

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, cool

PySequence_SetItem(this.getPyObject(), idx.safeCast(Py_ssize_t), pyItem);
this.check();
}
}


// TODO: now that we handle this on every call to the python interpreter,
// is this still needed as a user facing type?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue no. But I've of two minds whether I think it should be handled in this PR:

  • Pros:
    • All the related changes are in one place, which makes it easier to see the link between actions. If we do another full overhaul, we'll know to look at this type again if we start by looking at this PR
  • Cons:
    • If we do end up backing away from this change, we might not remember to revisit whether this type should be public again, so users might not have documentation for any strategy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing that fromPickle still needs the GIL makes me lean more towards "don't hide it yet". But that probably means we should update the documentation for it to point back to the earlier docs about the more recommended strategy

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fromPickle only needs it because its an internal function that is meant to be called with the GIL already held. Those tests are testing undocumented/private API details.

But I take your point that we should not hide it yet, which is also my inclination.

I'll note that GIL can be used freely by users if they so choose, threadState is likely to crash their program

Signed-off-by: Jade Abraham <[email protected]>
@jabraham17 jabraham17 merged commit 613a530 into chapel-lang:main Mar 6, 2025
9 checks passed
@jabraham17 jabraham17 deleted the fix-py-threading branch March 6, 2025 23:32
@bradcray
Copy link
Member

bradcray commented Mar 6, 2025

Woohoo! That was a tough nut to crack, congrats on landing it (and by feature freeze as well!)

@bradcray
Copy link
Member

bradcray commented Mar 6, 2025

(And thanks to @mppf for the assist yesterday!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants