-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CMIP6 climate patterns #2785
CMIP6 climate patterns #2785
Conversation
hi @mo-gregmunday many thanks for opening this PR! Could I please ask you to set a descriptive title (it's not very clear to what the PR does from the current title), and also look for two reviewers - Pull Requests to ESMValTool usually need both a scientific and a technical reviewer - for the scientific bit you will have to ask someone who's got some experience with the implementations related to the sciencey bits, and the tech reviewer is usually someone who's just gonna go through the code and reviewe its technical/programming/deployment etc bits. Cheers 🍺 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, @mo-gregmunday. This diagnostic is looking good, documentation builds and reads well, and the metric runs as expected.
I think there are still a few Codacy warnings that we can easily address, so I’ve made a few comments on this throughout the review above (edit: below 😄). Reducing the number of local variables in some of your calculation functions could be a bit more challenging, but if we address enough of the low hanging fruit elsewhere then maybe people will look kindly on their complexity.
One general note is that I think it would be useful to add a few more debug logs throughout the diagnostic, e.g. logger.debug('Processing model: {}'.format(model))
at the start of the patterns function to make the logs a bit more useful.
@ESMValGroup/esmvaltool-coreteam, I've got a few questions about parallelisation within a diagnostic. Given the number of datasets being processed by this metric, @mo-gregmunday has successfully used a multiprocessing pool to help bring the execution time down significantly by processing each dataset concurrently. Do you think this is an appropriate way to do this within ESMValTool, or is there a more efficient way that we haven’t considered?
If it is appropriate, then I’ve another question. The number of cores to be divvied up by the pool is currently defined in the recipe file. My instinct was to suggest that either all available cores or ‘max_parallel_tasks’ in the config-user.yml
file could be used instead, but both could be problematic when the second diagnostic (coming in a future PR) is introduced and attempts to use the same number when ESMValTool itself runs them concurrently. Is there a way to get the number of diagnostics from a recipe file within a diagnostic so that the pool can be given ‘max_parallel_tasks
/ number of diagnostics
’?
Hi @valeriupredoi, sure - I'll add one now. @Jon-Lillis is leading the technical review on this one, and I'm still on the hunt for an 'ESMValTool-certified' scientific reviewer, although scientifically speaking this code has been verified by experts internally. |
cheers @mo-gregmunday 🍺 Maybe you can add one of those internal reviewers as sci reviewer here? If they're on GH, that is |
Hi @valeriupredoi, apologies for the slow response, I've been on annual leave for the last few weeks! I've got their GH username: eleanorgb, however I think they may need to be added to the ESMValTool repo team on here before I can add them? |
Anyone with a GitHub account can review any pull request, but if you would like to add her to the organization you can send an email to @axel-lauer. |
It works, but it does have its problems. For example:
A better solution would be to make use of the features provided by dask. However, this may first need better support in ESMValCore. We're currently experimenting with this in ESMValGroup/ESMValCore#1714. In the future, I think we may automatically add a bit of code to every Python diagnostic so it uses the dask scheduler that is configured for ESMValCore. Make sure that you do not needlessly realize the data in the diagnostic script (e.g. use My advice would be to use multiprocessing for now if you have to, but make sure you add it in such a way that it can easily be removed in the future (perhaps add a switch to disable it from the recipe?). |
…Tool into climate_patterns_only
…Tool into climate_patterns_only
@bouweandela @ehogan I've gone ahead and made the changes, so should all be ready! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mo-gregmunday I did a final sanity check on the latest version of the changes and have a few minor comments 👍
There is a switch in the recipe file which allows the script to be parallelised or not (using multiprocessing). I've not found any need for Dask in terms of optimisation of the script itself - I've vectorised the linear regression operations on cube data which is very fast, and elsewhere in the script I've used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good, Greg! I think there may still be some dask optimisation to be done in future but for now I'm happy that my comments have been discussed and addressed, and I think this looks ready to be included in the next release.
Thanks so much @Jon-Lillis! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @mo-gregmunday, many thanks for addressing all my review comments! @bouweandela and / or @valeriupredoi, would it be possible for one of you to merge this asap, please? We are planning to start the process of finalising the release tomorrow! 😁
Thanks so much @ehogan, @Jon-Lillis and @eleanorgb for your time and hard work reviewing this!! :) |
One last comment to add that I did run the recipe at the MO:
|
good work, folks! Afraid this merge broke the GA tests https://github.com/ESMValGroup/ESMValTool/actions/runs/9606425046/job/26495922820 - but @ehogan is fixing it by ontroducing package-level importing in #3672 |
Description
This diagnostic generates climate patterns for CMIP6 models.
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
New or updated recipe/diagnostic