-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Save all files in a task at the same time to avoid recomputing intermediate results #2522
base: main
Are you sure you want to change the base?
Conversation
614ed25
to
4599bf6
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2522 +/- ##
==========================================
+ Coverage 94.67% 94.72% +0.05%
==========================================
Files 251 252 +1
Lines 14302 14436 +134
==========================================
+ Hits 13540 13675 +135
+ Misses 762 761 -1 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bouweandela this is very nice, but I have serious concerns related to store
and locking, there are many known issues with HDF5 and threads, and even the Dask folk are looking at this type of IO issue, see eg dask/distributed#780 - inherently what iris are doing is also not thread-safe, so I am doubly concerned
The linked issue is not related to writing to HDF5 files.
If you have concerns about Iris's capability, I would recommend playing around with it and see if you can get it to crash and report any issues you find on the Iris GitHub page. The code that handles saving to NetCDF with distributed lives in `iris.fileformats.netcdf in case you would like to have a look.
Could you elaborate on those and provide an example of a case where it does not work? |
I need to hatch me a few used cases and test for stress points. Not on the priority list though, so let's see if any issues pop up naturally, am not gonna block this PR, just wanted to see if you have any concerns too 🍺 |
it seems that a Lock object is indeed in dask.distributed, so first hurdle I was afraid of is alleviated dask/dask#1892 (comment) |
this, though, is a bit scary dask/dask#2488 |
We're not using the
It would be great if you could give it a try. I've tested this with the recipe in #2300 and there is seems to work well. |
That sounds like a very reasonable solution to me! @bouweandela what do you think? |
It is a nice idea and would be better than not having a progress bar at all, but it requires running the tool in interactive mode to see progress. Running recipes interactively on an HPC machine without the program crashing if your internet connection is temporarily lost requires setting up something like a We do occasionally get questions from users if the tool is actually doing something, see e.g. ESMValGroup/ESMValTool#3738, so having progress bars (perhaps also for downloading data), seems desirable. |
For a non interactive use case like that, it may be desirable to do something that doesn't overwrite old output, as that doesn't always work in non-interactive systems. For example, when attached to a non-interactive terminal you could print the percentage completion as
Or do what many data transfer tools do and just print dots to show progress is happening. |
Also a nice idea, but that would require implementing our own progress bars instead of the ones provided by Dask, which is more work than I would be willing to put into this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am now happy with the IO handling - had doubts about thread-safety, but should be fine since there is an internal Lock()
, so I'll approve - the progress bar saga still needs resolution, but I think Manu put a Request Changes in for that. Thanks a lot, @bouweandela 🍺
esmvalcore/preprocessor/__init__.py
Outdated
dask.compute(futures) | ||
else: | ||
with dask.diagnostics.ProgressBar(): | ||
dask.compute(delayeds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe tweak it via tqdm
which is a lot more progress-y bar-y? See tqdm/tqdm#278 @schlunma have a look at that too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW no need for notebooks, see https://stackoverflow.com/questions/64982455/format-dask-diagnostics-progress-bar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tqdm
s progress bars look nice, but didn't work for me when I tried to use them to track progress for the ESGF downloads a few years ago. The maintainers of that package are not very responsive to the issue either: tqdm/tqdm#1272 (comment). I would be happy to try again though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's see if it does the trick, can't be any worse than that picture Manu posted in #2522 (comment) 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented some fancy progress bars for you using rich
let's see if this hits us, after we merge - no point holding up a performance imrpovement for something that's putative 👍 |
The conclusion from community consultation at the workshop is that we would like to display progress bars (also for downloads) and will provide a configuration option to disable them. |
Fix other tests
@jfrost-mo I implemented this solution for the case where a progress bar is not desired. |
@schlunma It is now possible to disable the progress bar by setting a value other than 0 for
If Does it look better now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Bouwe, that look great! I really love the new progress bar ❤️ . It is also nicely rendered in the SLURM log now (I tried VSCode, vim
, and cat
):
The only "issue" I found is that with a distributed scheduler, the progress bar sometimes does not finish with "100%" (I didn't see this with the default scheduler):
One "issue" with the progress logger is that with the default scheduler, there is a new line character after the bar, this does not happen with a distributed scheduler:
I wonder if it would make sense to add an option to disable the progress logging completely? Maybe with log_progress_interval=-1
?
Apart from that, I only have minor comments on the doc. Thank you!!
Description
Save all files in a task at the same time to avoid recomputing intermediate results.
This change is not backward compatible because it changes the return value of
esmvalcore.preprocessor.save
, which is part of the public API. Previously this function returned the filename, not it returnsNone
on immediate saves and adask.delayed.Delayed
for delayed saves that can be requested with thecompute=False
argument.Closes #2521
Closes #2042
Links to documentation:
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
Changes are backward compatibleTo help with the number pull requests: