sim_objects sometimes gets stuck #296
Comments
Hm.. I should look at this. In the meantime, how many threads are you using? Can you test with some different numbers? This is controlled by OMP_NUM_THREADS. That said, the symptoms you report sound most like a memory leak. If so, it should be relatively simple to fix.
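As a quick way to rule out threading, the OpenMP thread count can be pinned from the shell before starting Python; a minimal sketch (the variable must be set before the OpenMP runtime starts):

```shell
# Pin the OpenMP runtime to a single thread for subsequent runs.
export OMP_NUM_THREADS=1

# Verify the setting is visible to a Python child process.
python3 -c 'import os; print(os.environ.get("OMP_NUM_THREADS"))'
```

Setting it inline per run (`OMP_NUM_THREADS=1 python script.py`) works too and avoids changing the whole shell session.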
Same issue with 1 thread.
I tested this on two different clusters and could not reproduce your error. For example, on NERSC, using 30 cores, it takes around 12 seconds and consistently uses 2.88 GB of RAM. I am using pixell 0.28.0 and numpy 1.26.4. I will try with another environment. It might be an issue with numpy >= 2.0, the latest pixell 0.28.3 version, or another package update.
Thanks @cpvargas! I did not think about trying with a different numpy. Possibly it is happening less often, but it is still happening on Popeye:
I also tried checking out
I've tested in a clean environment (Python 3.10.16, NumPy 2.0.2, pixell 0.28.0) and observed intermittent high memory usage. Out of 20 runs, 2 consumed approximately 70 GB. I ran a second set of 20 runs and observed the same issue in one of those runs. This strongly points to an intermittent memory leak. I'll investigate whether the issue is related to map geometry and source position, and also test whether the problem occurs with other Python and NumPy versions.
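One way to catch these intermittent blow-ups systematically is a small harness that runs the script repeatedly and records the peak RSS of child processes after each run. A stdlib-only sketch (the `python -c "pass"` child is a placeholder for the real reproduction script; note that `getrusage(RUSAGE_CHILDREN)` reports a running maximum over all children, so one leaky run dominates later readings):

```python
import resource
import subprocess
import sys

def run_and_measure(cmd):
    """Run cmd as a child process and return the peak RSS (in MB) seen
    across all children so far, via getrusage(RUSAGE_CHILDREN)."""
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is in kilobytes on Linux, bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return peak * scale / 2**20

# Placeholder child; replace with the actual simulation script.
cmd = [sys.executable, "-c", "pass"]
for i in range(3):
    print(f"run {i}: peak child RSS so far {run_and_measure(cmd):.1f} MB")
```

Runs whose reported peak jumps far above the typical ~3 GB would flag the leaky executions without having to watch each one by hand.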
I am using `sim_objects` in the implementation of the point source catalog in PySM3 to simulate maps of point sources from a catalog, with a Gaussian beam applied directly to them. I have been affected by this bug: an execution for 100k sources that should run in a few seconds instead gets stuck and keeps increasing memory usage.
What is disconcerting is that this happens once every 3 or 4 runs.
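For scale, a mock catalog of the size involved here is cheap to generate; a hypothetical sketch (the array layout below is an illustration, not pixell's documented `sim_objects` signature; see the linked gist for the real call):

```python
import numpy as np

# Hypothetical 100k-source catalog: sky positions in radians and
# per-source flux amplitudes, of the kind a sim_objects call consumes.
rng = np.random.default_rng(0)
nsrc = 100_000
dec = rng.uniform(-np.pi / 2, np.pi / 2, nsrc)
ra = rng.uniform(0.0, 2 * np.pi, nsrc)
amps = rng.lognormal(mean=0.0, sigma=1.0, size=nsrc)
poss = np.array([dec, ra])  # shape (2, nsrc)
print(poss.shape, amps.shape)
```

Generating the inputs takes milliseconds, which is consistent with the expectation that the full simulation should finish in seconds rather than minutes.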
@amaurea this is blocking point source simulations for Simons Observatory; could you please take a look?
The input dataset is in this Google Drive folder: https://drive.google.com/drive/folders/1BtbahZ8rkXswzBn1zJBZP3cdOvxRdTUR?usp=drive_link
Run script on Popeye
When I run the exact same script on a Popeye computing node, sometimes it finishes in 7 s using less than 3 GB of RAM, and sometimes I have to kill it after 5 minutes (and 14.5 GB of RAM):
The 2 scripts are here: https://gist.github.com/zonca/7b33648c21235f833aa3315099b65146
Run on Colab
I tried to reproduce this on Colab here:
https://colab.research.google.com/drive/19AxJROAGYU-PrKbxHFNb9sTl9PtDIE8L?usp=sharing
It seems like it always works on the first execution of the notebook, but if executed again without restarting the kernel, most of the time it keeps increasing memory until it crashes.
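That pattern (fine on a fresh kernel, growing on re-execution) is what you would expect if some module-level buffer or cache survives between calls. A stdlib-only sketch of how to watch the process's peak RSS grow across repeated calls in the same interpreter (`leaky` here is a deliberate stand-in for the real call, not pixell code):

```python
import resource
import sys

def peak_rss_mb():
    """Peak RSS of this process; ru_maxrss is KB on Linux, bytes on macOS."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / 2**10 if sys.platform != "darwin" else peak / 2**20

_cache = []
def leaky():
    # Stand-in for a call that accidentally retains memory between runs:
    # each invocation keeps another 10 MB alive at module level.
    _cache.append(bytearray(10 * 2**20))

for i in range(3):
    leaky()
    print(f"call {i}: peak RSS {peak_rss_mb():.0f} MB")
```

If `sim_objects` showed the same monotone growth across calls in one kernel, that would localize the leak to state retained between calls rather than to a single pathological run.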