CMIP6 climate patterns #2785

mo-gregmunday · 2022-09-01T10:34:14Z

Description

This diagnostic generates climate patterns for CMIP6 models.

Closes Generating Climate Patterns from CMIP6 Models #2701
Link to documentation: https://esmvaltool--2785.org.readthedocs.build/en/2785/recipes/recipe_climate_patterns.html

☝ Create an issue to discuss what you are going to do

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

🛠 This pull request has a descriptive title
🛠 Code is written according to the code quality guidelines
🛠 Documentation is available
🛠 Tests run successfully
🛠 The list of authors is up to date
🛠 Any changed dependencies have been added or removed correctly
🛠 All checks below this pull request were successful

New or updated recipe/diagnostic

🧪 Recipe runs successfully
🧪 Recipe is well documented
🧪 Figure(s) and data look as expected from literature
🛠 Provenance information has been added

valeriupredoi · 2022-09-06T11:54:10Z

hi @mo-gregmunday many thanks for opening this PR! Could I please ask you to set a descriptive title (it's not very clear what the PR does from the current title), and also look for two reviewers - Pull Requests to ESMValTool usually need both a scientific and a technical reviewer - for the scientific bit you will have to ask someone who's got some experience with the implementations related to the sciencey bits, and the tech reviewer is usually someone who's just gonna go through the code and review its technical/programming/deployment etc bits. Cheers 🍺

Jon-Lillis

Thanks for this, @mo-gregmunday. This diagnostic is looking good, documentation builds and reads well, and the metric runs as expected.

I think there are still a few Codacy warnings that we can easily address, so I’ve made a few comments on this throughout the review above (edit: below 😄). Reducing the number of local variables in some of your calculation functions could be a bit more challenging, but if we address enough of the low hanging fruit elsewhere then maybe people will look kindly on their complexity.

One general note is that I think it would be useful to add a few more debug logs throughout the diagnostic, e.g. logger.debug('Processing model: {}'.format(model)) at the start of the patterns function to make the logs a bit more useful.
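A minimal sketch of the kind of debug logging suggested above. The function name `patterns` comes from the comment; its signature, its body and the model name used in the demo are assumptions for illustration, not the diagnostic's real interface. Passing the value as an argument to `logger.debug`, rather than pre-formatting with `str.format`, defers building the message until the record is actually emitted.

```python
import logging

logger = logging.getLogger(__name__)


def patterns(model):
    """Hypothetical stand-in for the per-model pattern calculation."""
    # Lazy %-style formatting: the message is only built if DEBUG is enabled.
    logger.debug("Processing model: %s", model)
    # ... the real calculation would go here ...
    return model


if __name__ == "__main__":
    # Show debug records on the console for this demo run.
    logging.basicConfig(level=logging.DEBUG)
    patterns("UKESM1-0-LL")  # example model name, chosen for illustration
```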

@ESMValGroup/esmvaltool-coreteam, I've got a few questions about parallelisation within a diagnostic. Given the number of datasets being processed by this metric, @mo-gregmunday has successfully used a multiprocessing pool to help bring the execution time down significantly by processing each dataset concurrently. Do you think this is an appropriate way to do this within ESMValTool, or is there a more efficient way that we haven’t considered?

If it is appropriate, then I’ve another question. The number of cores to be divvied up by the pool is currently defined in the recipe file. My instinct was to suggest that either all available cores or ‘max_parallel_tasks’ in the config-user.yml file could be used instead, but both could be problematic when the second diagnostic (coming in a future PR) is introduced and attempts to use the same number when ESMValTool itself runs them concurrently. Is there a way to get the number of diagnostics from a recipe file within a diagnostic so that the pool can be given ‘max_parallel_tasks / number of diagnostics’?
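The division proposed above could look something like the sketch below. It assumes the diagnostic has somehow been handed `max_parallel_tasks` and the number of diagnostics in the recipe; whether and how a diagnostic can obtain either value is exactly the open question here, so both arguments are hypothetical inputs. Falling back to the OS core count when `max_parallel_tasks` is `None` is also an assumption of this sketch.

```python
import os


def pool_size(max_parallel_tasks, n_diagnostics):
    """Share the available task slots evenly between diagnostics.

    Both arguments are hypothetical inputs: this sketch does not show how
    a diagnostic would obtain them from ESMValTool.
    """
    if max_parallel_tasks is None:
        # Assumed fallback: use the number of cores reported by the OS.
        max_parallel_tasks = os.cpu_count() or 1
    # Integer division, but never fewer than one worker per diagnostic.
    return max(1, max_parallel_tasks // max(1, n_diagnostics))
```

With 8 task slots and 2 diagnostics this gives each pool 4 workers; with fewer slots than diagnostics each pool still gets 1.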

esmvaltool/recipes/recipe_climate_patterns.yml

esmvaltool/diag_scripts/climate_patterns/climate_patterns.py

esmvaltool/diag_scripts/climate_patterns/sub_functions.py

esmvaltool/diag_scripts/climate_patterns/rename_variables.py

esmvaltool/diag_scripts/climate_patterns/climate_patterns.py

mo-gregmunday · 2022-09-12T12:12:57Z

> hi @mo-gregmunday many thanks for opening this PR! Could I please ask you to set a descriptive title (it's not very clear what the PR does from the current title), and also look for two reviewers - Pull Requests to ESMValTool usually need both a scientific and a technical reviewer - for the scientific bit you will have to ask someone who's got some experience with the implementations related to the sciencey bits, and the tech reviewer is usually someone who's just gonna go through the code and review its technical/programming/deployment etc bits. Cheers 🍺

Hi @valeriupredoi, sure - I'll add one now. @Jon-Lillis is leading the technical review on this one, and I'm still on the hunt for an 'ESMValTool-certified' scientific reviewer, although scientifically speaking this code has been verified by experts internally.

valeriupredoi · 2022-09-13T11:24:53Z

cheers @mo-gregmunday 🍺 Maybe you can add one of those internal reviewers as sci reviewer here? If they're on GH, that is

mo-gregmunday · 2022-10-06T13:03:50Z

> cheers @mo-gregmunday 🍺 Maybe you can add one of those internal reviewers as sci reviewer here? If they're on GH, that is

Hi @valeriupredoi, apologies for the slow response, I've been on annual leave for the last few weeks!

I've got their GH username: eleanorgb, however I think they may need to be added to the ESMValTool repo team on here before I can add them?

bouweandela · 2022-10-07T09:39:01Z

> I've got their GH username: @eleanorgb, however I think they may need to be added to the ESMValTool repo team on here before I can add them?

Anyone with a GitHub account can review any pull request, but if you would like to add her to the organization you can send an email to @axel-lauer.

bouweandela · 2022-10-07T09:51:39Z

> @ESMValGroup/esmvaltool-coreteam, I've got a few questions about parallelisation within a diagnostic. Given the number of datasets being processed by this metric, @mo-gregmunday has successfully used a multiprocessing pool to help bring the execution time down significantly by processing each dataset concurrently. Do you think this is an appropriate way to do this within ESMValTool, or is there a more efficient way that we haven’t considered?

It works, but it does have its problems. For example:

- each process will start its own dask scheduler which will try to run things in parallel. This may lead to needless context switching, slowing down the application. Also, dask will need to be configured to spill to disk as soon as the memory reaches something like (80 / the number of schedulers you're running) percent of the memory, or you may run out of memory.
- dask is not designed to be used like this, so it will hang if you open a file from the parent process and then try to open it again in the child process (see New preprocessor to clip values to a certain range. ESMValCore#403).

A better solution would be to make use of the features provided by dask. However, this may first need better support in ESMValCore. We're currently experimenting with this in ESMValGroup/ESMValCore#1714. In the future, I think we may automatically add a bit of code to every Python diagnostic so it uses the dask scheduler that is configured for ESMValCore. Make sure that you do not needlessly realize the data in the diagnostic script (e.g. use cube.core_data() instead of cube.data, where cube is an iris.cube.Cube, wherever appropriate).

My advice would be to use multiprocessing for now if you have to, but make sure you add it in such a way that it can easily be removed in the future (perhaps add a switch to disable it from the recipe?).
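One way to keep the multiprocessing path easy to remove is to hide it behind a single boolean, so the serial and parallel routes call the same per-dataset function. The sketch below assumes a flag named `parallelise` that would be read from the recipe; `process_dataset`, `run` and `n_workers` are hypothetical names for illustration and are not taken from the diagnostic.

```python
from multiprocessing import Pool


def process_dataset(name):
    """Hypothetical stand-in for the per-dataset work."""
    return name.upper()


def run(datasets, parallelise=False, n_workers=2):
    """Process datasets in a pool or serially, depending on one flag."""
    if parallelise:
        # Parallel path: one task per dataset, results in input order.
        with Pool(processes=n_workers) as pool:
            return pool.map(process_dataset, datasets)
    # Serial path: same function, same order of results, no pool involved.
    return [process_dataset(name) for name in datasets]
```

Because both branches return results in input order, dropping multiprocessing later would only mean deleting the `if parallelise:` branch and the flag.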

mo-gregmunday · 2024-06-04T09:19:56Z

> > I would like to propose that suggestions that would take a long time are postponed to a follow-up issue
>
> My experience from past pull requests is that this kind of thing ends up never happening, so I'm not too keen. What kind of changes are these that are so time consuming?

@bouweandela @ehogan I've gone ahead and made the changes, so should all be ready!

ehogan

@mo-gregmunday I did a final sanity check on the latest version of the changes and have a few minor comments 👍

esmvaltool/diag_scripts/climate_patterns/climate_patterns.py

esmvaltool/diag_scripts/climate_patterns/sub_functions.py

mo-gregmunday · 2024-06-19T11:25:20Z

> > @ESMValGroup/esmvaltool-coreteam, I've got a few questions about parallelisation within a diagnostic. Given the number of datasets being processed by this metric, @mo-gregmunday has successfully used a multiprocessing pool to help bring the execution time down significantly by processing each dataset concurrently. Do you think this is an appropriate way to do this within ESMValTool, or is there a more efficient way that we haven’t considered?
>
> It works, but it does have its problems. For example:
>
> - each process will start its own dask scheduler which will try to run things in parallel. This may lead to needless context switching, slowing down the application. Also, dask will need to be configured to spill to disk as soon as the memory reaches something like (80 / the number of schedulers you're running) percent of the memory, or you may run out of memory.
> - dask is not designed to be used like this, so it will hang if you open a file from the parent process and then try to open it again in the child process (see New preprocessor to clip values to a certain range. ESMValCore#403).
>
> A better solution would be to make use of the features provided by dask. However, this may first need better support in ESMValCore. We're currently experimenting with this in ESMValGroup/ESMValCore#1714. In the future, I think we may automatically add a bit of code to every Python diagnostic so it uses the dask scheduler that is configured for ESMValCore. Make sure that you do not needlessly realize the data in the diagnostic script (e.g. use cube.core_data() instead of cube.data, where cube is an iris.cube.Cube, wherever appropriate).
>
> My advice would be to use multiprocessing for now if you have to, but make sure you add it in such a way that it can easily be removed in the future (perhaps add a switch to disable it from the recipe?).

There is a switch in the recipe file which allows the script to be parallelised or not (using multiprocessing). I've not found any need for Dask in terms of optimisation of the script itself - I've vectorised the linear regression operations on cube data which is very fast, and elsewhere in the script I've used cube.core_data() where I can to optimise memory usage.

Jon-Lillis

Changes look good, Greg! I think there may still be some dask optimisation to be done in future but for now I'm happy that my comments have been discussed and addressed, and I think this looks ready to be included in the next release.

mo-gregmunday · 2024-06-19T11:36:58Z

> Changes look good, Greg! I think there may still be some dask optimisation to be done in future but for now I'm happy that my comments have been discussed and addressed, and I think this looks ready to be included in the next release.

Thanks so much @Jon-Lillis!

ehogan

Great work @mo-gregmunday, many thanks for addressing all my review comments! @bouweandela and / or @valeriupredoi, would it be possible for one of you to merge this asap, please? We are planning to start the process of finalising the release tomorrow! 😁

mo-gregmunday · 2024-06-19T14:43:11Z

> Great work @mo-gregmunday, many thanks for addressing all my review comments! @bouweandela and / or @valeriupredoi, would it be possible for one of you to merge this asap, please? We are planning to start the process of finalising the release tomorrow! 😁

Thanks so much @ehogan, @Jon-Lillis and @eleanorgb for your time and hard work reviewing this!! :)

ehogan · 2024-06-19T15:32:21Z

One last comment to add that I did run the recipe at the MO:

with parallelise: true:

INFO    [27875] Time for running the recipe was: 0:16:28.707423
INFO    [27875] Maximum memory used (estimate): 97.7 GB
[...]
INFO    [27875] Run was successful

with parallelise: false:

INFO    [44881] Time for running the recipe was: 0:26:49.341768
INFO    [44881] Maximum memory used (estimate): 43.3 GB
[...]
INFO    [44881] Run was successful
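For a rough sense of the trade-off, the two runs above can be compared directly. The sketch below only does arithmetic on the figures copied from the log excerpts; it says nothing about how the numbers would change on other machines or with other dataset selections.

```python
from datetime import timedelta

# Figures copied from the two log excerpts above.
parallel_time = timedelta(minutes=16, seconds=28.707423)
serial_time = timedelta(minutes=26, seconds=49.341768)
parallel_mem_gb = 97.7
serial_mem_gb = 43.3

# Dividing one timedelta by another yields a plain float ratio.
speedup = serial_time / parallel_time
memory_ratio = parallel_mem_gb / serial_mem_gb

print(f"Speed-up from parallelising: {speedup:.2f}x")
print(f"Memory increase:             {memory_ratio:.2f}x")
```

On these two runs, parallelising was roughly 1.6 times faster at the cost of roughly 2.3 times the estimated peak memory.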

valeriupredoi · 2024-06-21T12:07:59Z

good work, folks! Afraid this merge broke the GA tests https://github.com/ESMValGroup/ESMValTool/actions/runs/9606425046/job/26495922820 - but @ehogan is fixing it by introducing package-level importing in #3672

mo-gregmunday added 5 commits September 1, 2022 10:25

Added recipe file, climate patterns only, no EBM parameter script

e242a9c

adding rest of files

fa45e83

Brought everything from only branch

ca5cc12

Quick fix of sub_functions.py

378b91e

blacked all my files

51c9ac5

mo-gregmunday requested a review from Jon-Lillis September 1, 2022 10:34

mo-gregmunday added 9 commits September 1, 2022 11:42

testing documentation fix

1e422f6

another docstring test

da9c7c2

climate_patterns.py docstring fixes

a8ead28

more docstring fixes

b9dd91d

cp_plotting.py docstring fixes

3219b20

rest of docstring fixes

151d1ab

more Codacy induced fixes

4449a7d

more Codacy induced fixes

435409f

more Codacy fixes

1c7b621

Jon-Lillis added 4 commits September 8, 2022 10:54

Add .yml extension to recipe filename

4f24e68

Enable metric to run with parallelise flag set to False

2fe7db1

Add maintainer to recipe file

b501286

Correct use of whitespace

c765763

Jon-Lillis requested changes Sep 9, 2022

mo-gregmunday added 2 commits October 7, 2022 11:10

Some fixes/improvements, and recipe fix

eccc492

Merge branch 'climate_patterns_only' of github.com:ESMValGroup/ESMVal…

5624018

…Tool into climate_patterns_only

mo-gregmunday mentioned this pull request Oct 7, 2022

Generating Climate Patterns from CMIP6 Models #2702

Draft

15 tasks

mo-gregmunday requested a review from eleanorgb November 21, 2022 15:35

mo-gregmunday and others added 11 commits May 24, 2024 17:39

Refactored rename_variables.py

4d0b9ba

Quick fix

0b3d877

Update docstring

f5bb5b6

Codacy fix

16b03e9

Fixed memory bug

3b1ab49

Codacy fix

26f337a

Codacy fix

8df4001

Merge branch 'main' into climate_patterns_only

f38a925

Improved plotting.py

78f409b

Merge branch 'climate_patterns_only' of github.com:ESMValGroup/ESMVal…

2db111f

…Tool into climate_patterns_only

Merge branch 'main' into climate_patterns_only

be3d521

Merge branch 'main' into climate_patterns_only

30bf0d7

ehogan requested changes Jun 19, 2024

Jon-Lillis approved these changes Jun 19, 2024

mo-gregmunday and others added 2 commits June 19, 2024 12:26

Update esmvaltool/diag_scripts/climate_patterns/sub_functions.py

3a8dda5

Co-authored-by: Emma Hogan

Docstring corrections and recipe correction

8eb09dd

mo-gregmunday added 2 commits June 19, 2024 12:39

Line too long fix

dfb6e70

Updated recipe default

f4bab69

ehogan approved these changes Jun 19, 2024

bouweandela merged commit e7c5cd5 into main Jun 20, 2024
8 checks passed

bouweandela deleted the climate_patterns_only branch June 20, 2024 08:43

ehogan mentioned this pull request Jun 20, 2024

Fix failing tests after CMIP6 climate patterns merge #3670

Closed

ehogan added approved by technical reviewer and removed in technical review labels Jun 27, 2024
