Improve support for parallel fitting #3102

Open
pkienzle opened this issue Sep 4, 2024 · 0 comments
Labels
Enhancement Feature requests and/or general improvements

pkienzle commented Sep 4, 2024

Looking at the code, SasView should already support parallel fitting:

omp_threads = int(os.environ.get('OMP_NUM_THREADS', '0'))
mapper = MPMapper if omp_threads == 1 else SerialMapper

This support needs to be improved:

  • Use SAS_NUM_THREADS rather than OMP_NUM_THREADS.
  • Configure SAS_NUM_THREADS from the OpenCL config dialog.
  • Set the cpu count (third parameter to start_mapper) using the value of SAS_NUM_THREADS.

The OMP_NUM_THREADS variable affects programs using OpenMP, including sasmodels compiled with OpenMP support. I guess my thinking at the time (2014) was that parallel fitting only made sense when running sasmodels with single-threaded models. With OpenCL off by default(?) and tinycc not supporting OpenMP, I suspect most environments are running SasView with only one core. Using a different config variable allows us to untangle these concepts and give the user control.

With no GPU and no OpenMP compiler we should be using SAS_NUM_THREADS=0. This will use one thread per core.

Users can force single-threaded fitting by setting SAS_NUM_THREADS=1. This should select SerialMapper rather than MPMapper.
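A rough sketch of the selection rule described above, in case it helps the discussion. `SAS_NUM_THREADS` is the variable proposed in this issue, not something that exists today, and `choose_mapper` is a hypothetical helper. It returns the mapper name from the snippet above (`SerialMapper` or `MPMapper`) plus a worker count, rather than the bumps classes themselves, so it runs without bumps installed.

```python
import os


def choose_mapper(environ=None):
    """Return (mapper_name, cpu_count) from the proposed SAS_NUM_THREADS.

    Hypothetical helper: SAS_NUM_THREADS is the variable proposed in this
    issue. 0 (the default) means one worker per core, 1 forces serial.
    """
    environ = os.environ if environ is None else environ
    requested = int(environ.get("SAS_NUM_THREADS", "0"))
    if requested == 1:
        # Explicit request for single-threaded fitting.
        return "SerialMapper", 1
    if requested <= 0:
        # 0 (or unset) means one worker per available core.
        requested = os.cpu_count() or 1
    return "MPMapper", requested
```

The second value is what would be passed as the cpu count when starting the mapper.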

If we have a GPU enabled (or OpenCL on the CPU), it is trickier. For small cards with few cores (< 100) we should run single-threaded. For large cards with thousands of cores we can use cpu count = num gpu cores / num q points. Assuming 200 q points per curve on average, an NVIDIA 4090 with 16000 cores should support SAS_NUM_THREADS=80 (16000 / 200 = 80). This will still leave a lot of unused compute for the lm and amoeba fitters, but should give a big speed increase for dream. Fits to 2-D data should still use SerialMapper.
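The arithmetic above could look something like the following. This is only a sketch of the heuristic as stated: `gpu_worker_count` and its parameters are hypothetical names, and the 100-core cutoff for "small" cards is the rough figure from the paragraph, not a tuned value.

```python
def gpu_worker_count(gpu_cores, q_points, is_2d=False, small_card_cores=100):
    """Estimate how many parallel fit workers a single GPU can feed.

    Hypothetical helper illustrating the heuristic:
    workers = gpu cores / q points, with serial fallbacks.
    """
    if is_2d or gpu_cores < small_card_cores:
        # 2-D data and small cards: one evaluation already fills the device.
        return 1
    # Integer division, but never fewer than one worker.
    return max(1, gpu_cores // q_points)
```

For the example in the text, `gpu_worker_count(16000, 200)` gives 80, while any 2-D fit or a card with fewer than 100 cores falls back to 1.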

Multiple GPUs will require that different workers use different GPU device IDs. Since this is done with multiprocessing, we should be able to modify the SAS_OPENCL environment variable in each worker process to indicate which device to use for that GPU context. We will want two different MPMapper instances, one for 1-D data with many workers per device, and another for 2-D data with a single worker per device. Either a separate environment variable (SAS_GPUS=4?) or a string like SAS_NUM_THREADS=4x80 could indicate that there are multiple devices. Setting the device context for multiple GPUs may require some changes to bumps.
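To make the `4x80` idea concrete, here is a minimal sketch of parsing that string and spreading workers across devices round-robin. Both function names are hypothetical, and the `devices x workers-per-device` reading of the string is my assumption. The sketch deliberately stops at a device index: the exact SAS_OPENCL value to set for each worker would need to follow whatever format sasmodels accepts.

```python
def parse_thread_spec(spec):
    """Parse '80' or '4x80' into (num_devices, workers_per_device).

    Assumes the proposed format is '<devices>x<workers per device>';
    a bare number means a single device.
    """
    spec = spec.strip().lower()
    if "x" in spec:
        devices, workers = spec.split("x", 1)
        return int(devices), int(workers)
    return 1, int(spec)


def device_for_worker(worker_index, num_devices):
    """Assign a worker to a device index, round-robin."""
    return worker_index % num_devices
```

Each worker process would then translate its device index into a SAS_OPENCL setting before creating its GPU context. For 2-D data the same assignment applies with one worker per device.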

@pkienzle pkienzle added the Enhancement Feature requests and/or general improvements label Sep 4, 2024