Save all files in a task at the same time to avoid recomputing intermediate results #2522
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #2522 +/- ##
==========================================
+ Coverage 94.77% 94.79% +0.02%
==========================================
Files 251 251
Lines 14266 14293 +27
==========================================
+ Hits 13520 13549 +29
+ Misses 746 744 -2
@bouweandela this is very nice, but I have serious concerns related to `store` and locking: there are many known issues with HDF5 and threads, and even the Dask folks are looking at this type of IO issue (see e.g. dask/distributed#780). What Iris is doing is inherently not thread-safe either, so I am doubly concerned.
The linked issue is not related to writing to HDF5 files.
If you have concerns about Iris's capability, I would recommend playing around with it to see if you can get it to crash, and reporting any issues you find on the Iris GitHub page. The code that handles saving to NetCDF with distributed lives in `iris.fileformats.netcdf`, in case you would like to have a look.
Could you elaborate on those and provide an example of a case where it does not work?
I need to hatch me a few use cases and test for stress points. Not on the priority list though, so let's see if any issues pop up naturally. Am not gonna block this PR, just wanted to see if you have any concerns too 🍺
It seems that a Lock object is indeed in dask.distributed, so the first hurdle I was afraid of is alleviated: dask/dask#1892 (comment)
This, though, is a bit scary: dask/dask#2488
We're not using the
It would be great if you could give it a try. I've tested this with the recipe in #2300 and there it seems to work well.
Thanks Bouwe, looks good to me! I will test this with some real recipes and merge if successful 🚀
I successfully tested this with many recipes with the default scheduler, a distributed The only remaining comment I have is regarding the progress bar. This messes up the log files a lot, e.g.
In addition, it also sometimes produces overlapping lines in the terminal when the tool is run interactively (I guess caused by multiple parallel processes?):
I don't think there is an easy fix for that, so my suggestion would be to just remove the progress bar. Although I quite like that actually 😢
Can the progress bar not be piped to file?
Thanks a lot for testing and reviewing! As far as I can see, the progress bar does not end up in the log files, only in the stdout output that SLURM records. Would it help to disable the progress bar if |
That's right! Sorry, I only checked the SLURM log and was just assuming that the I guess disabling it for |
The SLURM log still looks OK-ish with |
Indeed that does not work. So how do you suggest we proceed?
I am not sure. The easiest solution would probably be to drop it entirely, and I cannot think of another solution at the moment... |
Description
Save all files in a task at the same time to avoid recomputing intermediate results.
This change is not backward compatible because it changes the return value of `esmvalcore.preprocessor.save`, which is part of the public API. Previously this function returned the filename; now it returns `None` on immediate saves and a `dask.delayed.Delayed` for delayed saves, which can be requested with the `compute=False` argument.
Closes #2521
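The following is a minimal sketch of why computing all delayed saves of a task together avoids recomputing intermediate results. It uses only `dask.delayed` and `dask.compute`; the functions `expensive_intermediate` and `fake_save` and the file names are hypothetical stand-ins for illustration and are not part of ESMValCore or of this pull request.

```python
import dask
from dask import delayed

# Counts how often the shared intermediate result is actually computed.
calls = {"intermediate": 0}


def expensive_intermediate(x):
    """Hypothetical stand-in for a costly preprocessing step."""
    calls["intermediate"] += 1
    return x * 2


def fake_save(data, filename):
    """Hypothetical stand-in for writing `data` to `filename`."""
    return None


# Two delayed "saves" that both depend on the same intermediate result.
shared = delayed(expensive_intermediate)(21)
save_a = delayed(fake_save)(shared + 1, "a.nc")
save_b = delayed(fake_save)(shared + 2, "b.nc")

# Computing each save on its own evaluates the shared step once per save.
save_a.compute(scheduler="synchronous")
save_b.compute(scheduler="synchronous")
separate_calls = calls["intermediate"]

# Computing both saves in one call merges their task graphs,
# so the shared step is evaluated only once.
calls["intermediate"] = 0
dask.compute(save_a, save_b, scheduler="synchronous")
joint_calls = calls["intermediate"]

print(separate_calls, joint_calls)
```

The synchronous scheduler is used here only to keep the call counter deterministic; the same graph sharing applies to the other Dask schedulers.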
Closes #2042
Link to documentation: https://esmvaltool--2522.org.readthedocs.build/projects/ESMValCore/en/2522/api/esmvalcore.preprocessor.html#esmvalcore.preprocessor.save
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
Changes are backward compatible
To help with the number of pull requests: