Enable parallel metp jobs and fix race condition with the gfscleanup job #2907

Merged

Conversation

DavidHuber-NOAA
Contributor

@DavidHuber-NOAA DavidHuber-NOAA commented Sep 12, 2024

Description

This brings in a change to EMC_verif-global that offsets the start times of parallel instances when running metp jobs. The offset prevents multiple Python instances from attempting to create the same directory at the same time.
This PR also fixes an issue where the gfscleanup job could run before the metp jobs.
Resolves #2906
Resolves #2899
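
For readers unfamiliar with this failure mode, here is a minimal Python sketch of the directory-creation race described above. It is not the code from EMC_verif-global: the function names, the directory name, and the use of threads as stand-ins for separate job instances are all assumptions made for illustration. `start_offset` shows the idea of staggering instance start times, and `tolerant_mkdir` shows a different, commonly used technique (`os.makedirs` with `exist_ok=True`) that tolerates the directory already existing.

```python
import os
import tempfile
import threading


def start_offset(instance_index, step_seconds=1.0):
    """Hypothetical helper: delay (in seconds) before instance N starts its work."""
    return instance_index * step_seconds


def racy_mkdir(path):
    """Check-then-create. Two instances can both see 'missing' and both call mkdir."""
    if not os.path.exists(path):
        os.mkdir(path)  # raises FileExistsError if another instance created it first


def tolerant_mkdir(path):
    """Succeeds whether or not another instance has already created the directory."""
    os.makedirs(path, exist_ok=True)


def run_parallel(make_dir, path, count):
    """Call make_dir(path) from `count` threads and return the errors that were raised."""
    errors = []

    def worker():
        try:
            make_dir(path)
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        shared = os.path.join(base, "shared_output")  # hypothetical directory name
        errors = run_parallel(tolerant_mkdir, shared, 40)
        print("errors with tolerant_mkdir:", len(errors))
        print("start offsets:", [start_offset(i, 2.0) for i in range(3)])
```

Passing `racy_mkdir` to `run_parallel` instead may or may not raise `FileExistsError` on a given run, which is typical of timing-dependent failures like this one.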

Type of change

  • New feature (adds functionality)

Change characteristics

How has this been tested?

Standalone test on Orion with 40 parallel instances over a 30-cycle span.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes

@DavidHuber-NOAA
Contributor Author

I added a one-line fix to enkf.yaml.j2 for a problem that @AntonMFernando-NOAA found. This code is currently unused because the eomg jobs are not runnable at this time, but the fix will help when others try to use it as a code sample.

@DavidHuber-NOAA DavidHuber-NOAA changed the title from "Enable parallel metp jobs" to "Enable parallel metp jobs and fix race condition with the gfscleanup job" Sep 13, 2024
* origin/develop:
  Update config.resources for bufr sounding job postsnd (NOAA-EMC#2917)
  Cleanup job for GEFS (NOAA-EMC#2919)
  Build GDASApp and unset memory in Gaea-C5 xml files (NOAA-EMC#2912)
  add 1 deg ocean/ice info to parm/config/gfs/config.resources (NOAA-EMC#2922)
  Support gefs C48 on Azure (NOAA-EMC#2881)
  Disable native grid writes for non-JEDI experiments; update C384 compression options (NOAA-EMC#2914)
@DavidHuber-NOAA DavidHuber-NOAA marked this pull request as ready for review September 16, 2024 12:59
@DavidHuber-NOAA
Contributor Author

I finished running tests on Hera. All metp jobs passed. Marking this PR as ready for review.

@WalterKolczynski-NOAA WalterKolczynski-NOAA added the CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS label Sep 23, 2024
@emcbot emcbot added CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS and removed CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS labels Sep 23, 2024
@emcbot

emcbot commented Sep 23, 2024

CI Update on Wcoss2 at 09/23/24 01:50:13 PM
============================================
Cloning and Building global-workflow PR: 2907
with PID: 116296 on host: clogin03

@emcbot emcbot added CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS labels Sep 23, 2024
@emcbot

emcbot commented Sep 23, 2024

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Mon Sep 23 13:56:25 UTC 2024 on clogin03
---------------------------------------------------
Build: Completed at 09/23/24 02:34:19 PM
Case setup: Completed for experiment C48_ATM_fde2acb4
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_fde2acb4
Case setup: Skipped for experiment C48_S2SWA_gefs_fde2acb4
Case setup: Completed for experiment C48_S2SW_fde2acb4
Case setup: Completed for experiment C96_atm3DVar_extended_fde2acb4
Case setup: Skipped for experiment C96_atm3DVar_fde2acb4
Case setup: Completed for experiment C96C48_hybatmaerosnowDA_fde2acb4
Case setup: Completed for experiment C96C48_hybatmDA_fde2acb4
Case setup: Completed for experiment C96C48_ufs_hybatmDA_fde2acb4

@emcbot emcbot added the CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully label Sep 24, 2024
@emcbot emcbot removed the CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress label Sep 24, 2024
@emcbot

emcbot commented Sep 24, 2024

All CI Test Cases Passed on Wcoss2:

Experiment C48_ATM_fde2acb4 *** SUCCESS *** at 09/23/24 03:56:07 PM
Experiment C48_S2SW_fde2acb4 *** SUCCESS *** at 09/23/24 04:14:13 PM
Experiment C96C48_hybatmDA_fde2acb4 *** SUCCESS *** at 09/23/24 05:07:22 PM
Experiment C96C48_hybatmaerosnowDA_fde2acb4 *** SUCCESS *** at 09/23/24 05:42:27 PM
Experiment C96C48_ufs_hybatmDA_fde2acb4 *** SUCCESS *** at 09/23/24 07:14:22 PM
Experiment C96_atm3DVar_extended_fde2acb4 *** SUCCESS *** at 09/24/24 02:35:40 AM

@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit a443fd1 into NOAA-EMC:develop Sep 27, 2024
5 checks passed
@DavidHuber-NOAA DavidHuber-NOAA deleted the feature/parallel_metp branch November 4, 2024 16:35
Development

Successfully merging this pull request may close these issues.

  • Re-enable parallel metp jobs
  • Incorrect gfs job dependencies caused gfsmetpg2g1 failure
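
The job-ordering problem named in the linked issue above can be illustrated with a small Python sketch that uses the standard-library `graphlib` module (Python 3.9+). It does not reproduce the workflow's actual dependency definitions: `upstream_job` is a hypothetical placeholder, and the dependency tables are assumptions chosen only to show why a cleanup job must list the metp job as a prerequisite.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it must wait for.
# "upstream_job" is a hypothetical placeholder, not a real workflow job name.

# Assumed "before" state: cleanup only waits for the upstream job, so a
# scheduler is free to start it before or alongside the metp job.
RACY_DEPS = {
    "gfsmetpg2g1": {"upstream_job"},
    "gfscleanup": {"upstream_job"},
}

# Assumed "after" state: cleanup also waits for the metp job.
FIXED_DEPS = {
    "gfsmetpg2g1": {"upstream_job"},
    "gfscleanup": {"upstream_job", "gfsmetpg2g1"},
}


def run_order(deps):
    """Return one valid execution order for the given dependency table."""
    return list(TopologicalSorter(deps).static_order())


def must_wait_for(deps, job, prerequisite):
    """True if `job` directly lists `prerequisite` as a dependency."""
    return prerequisite in deps.get(job, set())


if __name__ == "__main__":
    print("fixed order:", run_order(FIXED_DEPS))
    print("racy table forces metp first:",
          must_wait_for(RACY_DEPS, "gfscleanup", "gfsmetpg2g1"))
    print("fixed table forces metp first:",
          must_wait_for(FIXED_DEPS, "gfscleanup", "gfsmetpg2g1"))
```

With the fixed table, every valid order places `gfsmetpg2g1` before `gfscleanup`. With the racy table, either order satisfies the declared dependencies, so the outcome depends on scheduling.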
4 participants