Recipe test results for ESMValCore v2.11.0rc1 #2421
Comments
Hi @ESMValGroup/technical-lead-development-team @bouweandela @valeriupredoi, any comments on the following evaluation, please? (The original output from running the recipes for the first time is above.)
1. R diagnostic failures
The following are R recipes with various errors. Would anyone with R knowledge please take a look?
The errors were one of the following:
Error in (models_dataset == reference_dataset) && (models_exp == reference_exp) :
'length = 2' in coercion to 'logical(1)'
^ Operator >remapcon2< not found!
2. Python diagnostic failures
We have the capacity to address these errors - should we? Or does anyone already know how to solve these?
KeyError: 'Provenance record for /scratch/b/b382148/esmvaltool_output/recipe_martin18grl_20240515_142625/plots/spi_collect/spi_collect/SPI_time_series_Bremen_Observations.png already exists.'
iris.exceptions.ConcatenateError: failed to concatenate into a single cube.
Cube metadata differs for phenomenon: precipitation_flux
TypeError: unhashable type: 'CubeAttrsDict'
3. NCL diagnostic failures
There is one NCL recipe with an error. Would anyone with NCL knowledge please take a look?
INFO fatal: in uajet_sh850, cannot read plev and latrange
4. Recipes that failed because of missing data
We recognise
5. Recipes that failed because the run took too long
We've increased the time on all of these except for
We also had to increase time on these from the "Recipes that failed of other reasons or are still running" section.
6. Recipes that failed because model data couldn't be downloaded
7. Recipes that failed because of an HDF5 error
These three are all the same as in the v2.10 recipe test results
This is a new entry.
8. Recipes that fail because of - we think! - an ESMValCore issue
ValueError: Chunks and shape must be of the same length/dimension. Got chunks=(), shape=(1,) |
Great summary and work, @chrisbillowsMO and @ehogan 🍺 Here is the issue with those three HDF5-related failures, as posted by @bouweandela back in December last year, when they were working on the 2.10 release: ESMValGroup/ESMValTool#3463 (comment) This is an HDF5 thread-safety issue; it is flaky, but it appears to be mostly reproducible (positive flakiness, or was it negative? don't matter). This has to be fixed, most probably by adding a file |
Did you install the Julia dependencies? |
fairly sure no is the answer to that q, bud 😁 |
No, I had missed the |
Successfully tested them 👍 I'll update the comment above to reflect this. |
The following recipes are now running successfully, so I will update the comments above:
Should I update the time for these recipes in SPECIAL_RECIPES in generate.py? What should we do with the recipes that don't run within 8 hours? |
The following recipe is now running successfully, so I will update the comments above:
This is a new recipe since ESMValTool v2.10.0, so it will need adding to SPECIAL_RECIPES in generate.py. |
@bouweandela, @valeriupredoi, would it be possible to get some guidance on what to do now, please? How many of the failures above must we fix before moving on to the ESMValTool freeze and testing stages? Can all the diagnostic and data issues wait until ESMValTool testing? 🤔 |
Super work, guys! Here's me 3 cents (2 cents adjusted for inflation):
|
A possible reason for some of these failures could be iris' new attribute handling: since version 3.8, iris now distinguishes between local and global attributes. We adopted this new behavior in #2398. This was the reason for the errors in |
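One way a `TypeError: unhashable type` error like the one reported above can arise is when a dict-like attributes object ends up where a hashable value is expected, for example as a set member or dictionary key. The sketch below uses a hypothetical stand-in class, `AttrsDictStandIn`, rather than iris' own `CubeAttrsDict`, and the `hashable_key` helper is likewise an assumption for illustration, not a fix taken from this thread.

```python
from collections.abc import MutableMapping


class AttrsDictStandIn(MutableMapping):
    """Hypothetical dict-like container; NOT iris' CubeAttrsDict."""

    def __init__(self, **attrs):
        self._data = dict(attrs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def hashable_key(attrs):
    """Hypothetical helper: freeze a mapping into a sorted tuple of items."""
    return tuple(sorted((str(k), str(v)) for k, v in attrs.items()))


attrs = AttrsDictStandIn(source="model A", institution="X")

try:
    {attrs}  # mutable mappings compare by value and provide no hash
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
seen = {hashable_key(attrs)}  # works: tuples of strings are hashable
```

Freezing the attributes into a tuple is only one possible approach; whether it suits a given diagnostic depends on how that diagnostic compares attributes.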
Apologies @valeriupredoi, you did say this previously, and I promptly forgot! I will update the comment above appropriately 👍 |
Not a worry, Emma, release time is a very busy one 🙂 |
If you suspect it is an ESMValCore issue, I would recommend fixing it before moving on to testing ESMValTool, but otherwise you should be fine to move on.
Yes, that would be helpful for the next release manager.
Are these recipes still running after 8 hours? In my experience, sometimes processes get killed without SLURM telling you. If there are no more log messages in the debug log or diagnostic script logs long before the 8 hours are over, it seems likely that the process has silently crashed. If this is the case, you could try reducing the number of workers used by Dask. This can be done by configuring the distributed scheduler, or, if there are non-lazy preprocessor functions (#674) in the recipe, you can use the default scheduler and create a file called
in it. That will use just 16 threads instead of the default 128 on a default levante compute node, leaving 256GB/16 = 16GB of RAM per thread instead of just 2GB. |
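The memory arithmetic in the comment above can be checked with a short sketch. The function name `memory_per_thread_gb` is hypothetical, and the 256 GB and 128-thread figures are simply the ones quoted for a default Levante compute node; nothing here configures Dask itself.

```python
def memory_per_thread_gb(node_memory_gb: float, n_threads: int) -> float:
    """Return the RAM available per thread if memory is split evenly."""
    if n_threads <= 0:
        raise ValueError("n_threads must be a positive integer")
    return node_memory_gb / n_threads


# Figures quoted in the comment above (assumed, not measured here).
NODE_MEMORY_GB = 256
default_share = memory_per_thread_gb(NODE_MEMORY_GB, 128)
reduced_share = memory_per_thread_gb(NODE_MEMORY_GB, 16)

print(f"128 threads: {default_share:g} GB per thread")
print(f"16 threads: {reduced_share:g} GB per thread")
```

An even split is a simplification: real tasks do not share memory equally, so the per-thread figure is a rough budget, not a guarantee.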
Closing this issue in favour of #2468 😊 |
Recipe test results for v2.11.0rc1
This is the initial output from testing done for releasing ESMValCore v2.11.0rc1. Please see the following comment for our evaluation of the failures.
Recipe running session 2024-05-15
Setup
mamba version
ESMValTool version
Recipes that ran successfully (132 out of 160)
Recipes that failed because the diagnostic script failed (11 out of 160)
Recipes that failed because of missing data (3 out of 160)
Recipes that failed because the run took too long (6 out of 160)
Recipes that failed of other reasons or are still running (7 out of 160)
Recipes that are known to be broken (1 out of 160)