Thread safety #385

Open
oanegros opened this issue Apr 27, 2023 · 20 comments

@oanegros

Currently the SHTOOLS SHExpandGLQ function, which I use regularly, fails in multithreaded applications. I think this comes from the modification of FFTW 3.3.3 plans, since only the fftw_execute calls are thread-safe: https://www.fftw.org/fftw3_doc/Thread-safety.html
This probably affects (almost) all FFTW calls in SHTOOLS.

Would it be an idea to add the brute-force fftw_make_planner_thread_safe call and force the FFT-based expansions to be thread-safe?
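For reference, a minimal sketch of driving that brute-force call from Python via ctypes (illustrative only; it assumes a system libfftw3 of version 3.3.5 or newer, which is where fftw_make_planner_thread_safe first appeared, and that SHTOOLS actually links against that same shared library, neither of which is guaranteed):

import ctypes
import ctypes.util

# Make the FFTW planner thread-safe process-wide. This only helps if
# SHTOOLS uses this same shared libfftw3 (>= 3.3.5).
path = ctypes.util.find_library("fftw3")
if path is not None:
    fftw = ctypes.CDLL(path)
    try:
        fftw.fftw_make_planner_thread_safe()
    except AttributeError:
        print("libfftw3 is too old to provide fftw_make_planner_thread_safe")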

@MarkWieczorek
Member

I have used the Fortran routines with OpenMP in the past without any problems. Could you give me more info about when this fails and when it doesn't? Or does it fail for even the simplest tasks?

@mreineck
Contributor

mreineck commented May 2, 2023

If the problem is indeed FFTW-related, it's most likely not connected with ducc, since ducc doesn't use FFTW internally.

@oanegros
Author

oanegros commented May 2, 2023

Oh, interesting! It failed for me when multiple threads were executing SHExpandGLQ calls on different datasets (with the same dimensions) at the same time. The failure was a segmentation fault without a clear error message. I found the FFTW thread-safety documentation and saw that the variables it mentions are modified in the SHExpandGLQ code.
I have now tried changing the backend to ducc as @mreineck suggested, but my segmentation faults remain.

The threading implementation I'm using is a Python rewrite of Java threading that is implemented in the larger software I'm writing in, so it may have its own specific vulnerabilities.

I managed to work around it in my code by wrapping the SHExpandGLQ call in an RLock (see the sketch below), so it's not very pressing for me, but it's strange that it happens at all, especially if the routines are stable with OpenMP.
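A minimal sketch of that workaround, assuming all expansions in the process go through this one wrapper:

import threading

import pyshtools as pysh

# One process-wide lock that serializes all SHExpandGLQ calls. An RLock is
# used so the same thread can re-enter the wrapper without deadlocking.
_expand_lock = threading.RLock()

def locked_expand_glq(grid, w, zero):
    with _expand_lock:
        return pysh.expand.SHExpandGLQ(grid, w, zero)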

@MarkWieczorek
Member

I don't use OpenMP very often, so I'll need to investigate...

@mreineck
Contributor

mreineck commented May 2, 2023

OK, I have an idea why the Fortran code cannot be called concurrently: in the function you linked, there are arrays with the "save" attribute, which makes them equivalent to static variables in C/C++, i.e. global state. Calling into such a function concurrently will almost certainly lead to undesired effects.

Not sure why you encounter this with the ducc back-end as well, though ... it shouldn't have any global state.

Semi-related question: how do you manage to execute concurrent calls from Python? Doesn't the GIL prevent this?
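To make the "save" hazard concrete, here is a minimal Python analogy (not the actual SHTOOLS code) of a saved work array; in Fortran or C the same interleaving can corrupt memory and segfault rather than merely produce wrong values:

import numpy as np

_work = None  # module-level buffer, the analogue of a Fortran "save" array

def expand_like(grid):
    global _work
    # Reallocate only when the input size changes, as a saved work array would.
    if _work is None or _work.shape != grid.shape:
        _work = np.empty_like(grid)
    # Two threads interleaving here share one buffer: thread A's intermediate
    # results can be overwritten by thread B before A has copied them out.
    _work[:] = grid * 2.0
    return _work.copy()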

@mreineck
Contributor

mreineck commented May 2, 2023

Sorry, I take that back: all variables marked save are also threadprivate, so this could in principle work.

On the other hand, threadprivate is an OpenMP-specific keyword, so I have no idea what happens if this is called from multiple non-OpenMP threads...

@oanegros
Author

oanegros commented May 2, 2023

I had to look it up to figure out how the software I'm using does this: I never release the GIL myself. I call SHExpandGLQ from multiple concurrent threads (which exist for other, non-GIL-related reasons outside of my code), and it seems that entering SHExpandGLQ releases the GIL somewhere, allowing another of my threads to progress until it collides inside pyshtools.

@oanegros
Author

oanegros commented May 2, 2023

The GIL unlocking was proposed in #304, and it seems it is done here.

@mreineck
Contributor

mreineck commented May 2, 2023

Sure! I was just wondering how you can call the SHTOOLS Python interface from two concurrently running Python threads. I most likely have an oversimplified picture in my mind of what you are actually doing.

@mreineck
Contributor

mreineck commented May 2, 2023

To get a clearer picture: which function from the SHTOOLS Python interface are you calling exactly?

@oanegros
Author

oanegros commented May 2, 2023

The only function I have issues with is SHExpandGLQ; if I lock this function so that threads wait for each other, my code works fine.

@oanegros
Author

oanegros commented May 2, 2023

And I have to say I don't fully understand your question about having multiple concurrent Python threads; creating them does not require releasing the GIL (see, e.g., the multiprocessing.ThreadPool implementation).

@oanegros
Author

oanegros commented May 2, 2023

I'll try to make a minimal working example soon 😄

@mreineck
Contributor

mreineck commented May 2, 2023

Thanks, that will certainly be very helpful!

@oanegros
Author

oanegros commented May 2, 2023

from concurrent import futures

import numpy as np
import pyshtools as pysh

lmax = 251
# A GLQ grid for degree lmax has (lmax + 1) x (2 * lmax + 1) samples.
your_patches = np.random.randint(1, 10**6, size=(100, lmax + 1, 2 * lmax + 1)).astype(float)

# Compute the Gauss-Legendre quadrature nodes and weights once, then expand
# all patches concurrently on ten worker threads.
zero, w = pysh.expand.SHGLQ(lmax)
with futures.ThreadPoolExecutor(max_workers=10) as executor:
    jobs = [executor.submit(pysh.expand.SHExpandGLQ, patch, w, zero)
            for patch in your_patches]
    results = [fut.result() for fut in futures.as_completed(jobs)]

This breaks with a segmentation fault for me, but works with max_workers=1. The lmax of 251 is just because I use it as a default grid size (~80π), but it also breaks with the other lmax values I tested.

@mreineck
Contributor

mreineck commented May 2, 2023

OK, I can reproduce the segmentation fault when the standard shtools backend is used, but not with ducc. Do you have the ducc0 Python package installed? If not, I think the backend will be silently reverted to shtools.
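A quick way to check (this only verifies that the ducc0 package is importable, which is what pyshtools needs for the ducc backend):

# Without an importable ducc0, pyshtools silently falls back to the
# shtools backend.
try:
    import ducc0  # noqa: F401
    print("ducc0 is available")
except ImportError:
    print("ducc0 is not installed -- the shtools backend will be used")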

@oanegros
Author

oanegros commented May 2, 2023

Ah okay, I did not notice because print(pysh.backends.preferred_backend()) seemed to return ducc even when it was not available. With the actual ducc backend it does work 😄. I might switch to this backend for my project.
But I would still say that either the calls should be thread-safe, or this behavior needs to be documented for the shtools backend.

@mreineck
Contributor

mreineck commented May 2, 2023

OK, it might be good to add a function like actual_backend(), so that the backend actually in use can be identified.
If you switch to the ducc backend, please make sure to select an appropriate value for nthreads; otherwise you'll be overcommitting your hardware (since you are already running in parallel on the caller side).
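For example, a sketch assuming the pyshtools backend-selection API (select_preferred_backend), with nthreads=1 because the ThreadPoolExecutor on the caller side already provides the parallelism:

import pyshtools as pysh

# Select the ducc backend explicitly and keep each transform single-threaded,
# since the worker threads already run in parallel.
pysh.backends.select_preferred_backend(backend='ducc', nthreads=1)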

@oanegros
Author

oanegros commented May 3, 2023

I have now tried to implement this in my main project, but there I am (currently) limited to pyshtools v4.9.1 and ducc0 v0.26.0 due to other dependencies, and the segmentation faults still happen with the preferred backend set to ducc. Is the backend not being set correctly, or is the thread safety of ducc a newer feature?

@mreineck
Contributor

mreineck commented May 3, 2023

As far as I know, there haven't been any relevant ducc changes since 0.26, but we did a lot of tweaking to the backend-selection code inside pyshtools after v4.9.1. I'm not sure that this is what causes the difference, but I'd suspect it.
