Enable reduced ensemble size for early cycle in marine DA #3628

Open: wants to merge 18 commits into base: develop

Conversation

AndrewEichmann-NOAA
Contributor

@AndrewEichmann-NOAA AndrewEichmann-NOAA commented May 1, 2025

Description

Enables a reduced ensemble size for the early cycle in marine DA by adding arithmetic to the marine DA job scripts that reassigns the ensemble size and allows rotating sampling of background members from the full ensemble.

Partially resolves NOAA-EMC/GDASApp#1588 - once NOAA-EMC/GDASApp#1650 is merged, the marine DA will use NMEM_ENS_GFS instead of NMEM_ENS for the number of ensemble members for the early cycle.
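
As a rough illustration of what "rotating sampling" of the background ensemble could look like, here is a minimal Python sketch. It is not the code in this PR or in GDASApp: the function name `rotating_members`, the `cycle_index` argument, and the wrap-around window scheme are all assumptions made for the example. The 80 and 30 member counts are taken from the discussion in this thread.

```python
def rotating_members(n_full: int, n_reduced: int, cycle_index: int) -> list[int]:
    """Return 1-based member numbers to use for one cycle.

    Hypothetical helper: the real job scripts may select members differently.
    """
    if not 0 < n_reduced <= n_full:
        raise ValueError("need 0 < n_reduced <= n_full")
    # Assumed scheme: slide a window of n_reduced members forward by
    # n_reduced each cycle, wrapping around the full ensemble.
    start = (cycle_index * n_reduced) % n_full
    return [(start + i) % n_full + 1 for i in range(n_reduced)]


if __name__ == "__main__":
    # Print the first and last selected member for a few consecutive cycles.
    for cycle in range(3):
        members = rotating_members(80, 30, cycle)
        print(cycle, members[0], members[-1])
```

With a scheme like this, every member of the full ensemble gets sampled over successive cycles even though each early cycle only reads the reduced number of backgrounds.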

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

Multi-model day cycling + forecast test on Hera, expdir here

/scratch1/NCEPDEV/da/Andrew.Eichmann/fv3gfs/reduced-ens/global-workflow/sorc/gdas.cd/build/gdas/test/gw-ci/C48mx500_hybAOWCDA/EXPDIR/C48mx500_hybAOWCDA

COMROOT here

/scratch1/NCEPDEV/stmp4/Andrew.Eichmann/COMROOT/C48mx500_hybAOWCDA

with examination of enkfgfs ensemble and gfs_marinebmat.log to check for proper behavior.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@aerorahul
Contributor

@AndrewEichmann-NOAA
The description indicates there is a change in GDASapp, but it is not in the PR.
Is this PR ready or should it be a draft?

@AndrewEichmann-NOAA
Contributor Author

AndrewEichmann-NOAA commented May 2, 2025

@aerorahul The GDASApp PR is NOAA-EMC/GDASApp#1650 - If you think we're ready to test I can merge it - it will break bmat in GDASApp so there should be some coordination

@AndrewEichmann-NOAA
Contributor Author

@aerorahul I should clarify - If the g-w PR is merged before the GDASApp PR, I expect it will not break, but the marine bmat won't rotate the background ensemble members that it copies from COMROOT. Merging the GDASApp PR without the g-w PR will make bmat fail. The GDASApp PR will be ready to test when the g-w PR is merged.

@aerorahul added the CI-Hera-Ready and CI-Orion-Ready labels May 5, 2025
@emcbot added the CI-Orion-Building, CI-Hera-Building, and CI-Hera-Running labels and removed the CI-Orion-Ready, CI-Hera-Ready, and CI-Hera-Building labels May 5, 2025
@emcbot

emcbot commented May 5, 2025

Experiment C48mx500_3DVarAOWCDA FAILED on Hera in Build# 2 with error logs:

/scratch1/NCEPDEV/global/glopara/CI/3628/RUNTESTS/COMROOT/C48mx500_3DVarAOWCDA_6f00b820/logs/2021032500/gfs_marinebmat.log

Follow link here to view the contents of the above file(s): (gfs_marinebmat.log)

@emcbot added the CI-Hera-Failed label and removed the CI-Hera-Running label May 5, 2025
@emcbot

emcbot commented May 5, 2025

Experiment C48mx500_3DVarAOWCDA FAILED on Hera in Build# 2 in
/scratch1/NCEPDEV/global/glopara/CI/3628/RUNTESTS/EXPDIR/C48mx500_3DVarAOWCDA_6f00b820

@emcbot added the CI-Orion-Running label and removed the CI-Orion-Building label May 5, 2025
@emcbot

emcbot commented May 5, 2025

Experiment C96C48mx500_S2SW_cyc_gfs FAILED on Hera in Build# 2 with error logs:

/scratch1/NCEPDEV/global/glopara/CI/3628/RUNTESTS/COMROOT/C96C48mx500_S2SW_cyc_gfs_6f00b820/logs/2021122100/enkfgfs_earc_tars_01.log

Follow link here to view the contents of the above file(s): (enkfgfs_earc_tars_01.log)

@emcbot

emcbot commented May 5, 2025

Experiment C96C48mx500_S2SW_cyc_gfs FAILED on Hera in Build# 2 in
/scratch1/NCEPDEV/global/glopara/CI/3628/RUNTESTS/EXPDIR/C96C48mx500_S2SW_cyc_gfs_6f00b820

@emcbot added the CI-Hera-Failed label and removed the CI-Hera-Failed label May 5, 2025
@CatherineThomas-NOAA
Contributor

Got it @AndrewEichmann-NOAA. Did you run a test at some point with both PRs together?

@AndrewEichmann-NOAA
Contributor Author

AndrewEichmann-NOAA commented May 14, 2025

@CatherineThomas-NOAA Yes - it produced an enkfgdas tree in COMROOT with 30 members (since overwritten). I can re-run it if you want. EDIT: it produced an enkfgfs tree in COMROOT with 30 members.

@AndrewEichmann-NOAA
Contributor Author

I started an experiment to confirm things behave as expected

@AndrewEichmann-NOAA
Contributor Author

@CatherineThomas-NOAA I ran it again with the updated branches in global-workflow and GDASApp and it produces 30 ensemble members, visible here: /scratch1/NCEPDEV/stmp4/Andrew.Eichmann/COMROOT/C48mx500_hybAOWCDA

@CatherineThomas-NOAA
Contributor

Thanks for rerunning that test @AndrewEichmann-NOAA. Looks right to me as well.

@CatherineThomas-NOAA added the JEDI label May 15, 2025
@AndrewEichmann-NOAA
Contributor Author

Rerunning experiment on Hera:
EXPDIR: /scratch1/NCEPDEV/da/Andrew.Eichmann/fv3gfs/ens-test/global-workflow/sorc/gdas.cd/build/gdas/test/gw-ci/C48mx500_hybAOWCDA/EXPDIR/C48mx500_hybAOWCDA
COMROOT: /scratch1/NCEPDEV/stmp4/Andrew.Eichmann/COMROOT/C48mx500_hybAOWCDA

@AndrewEichmann-NOAA
Contributor Author

Experiment ran several cycles, then gfs_metppcp1 failed on the last full cycle:
** ERROR: /scratch1/NCEPDEV/stmp2/Andrew.Eichmann/RUNDIRS/C48mx500_hybAOWCDA/gfs.2021032600/metppcp1.815116/precip_step1/metplus_output/gather_by_VSDB/stat_analysis/ccpa_accum24hr/C48mx500_hybAOWCDA/C48mx500_hybAOWCDA_2021032612_2021032612_00.stat was not generated or zero size
Is this a concern?

@JessicaMeixner-NOAA
Contributor

I see it failed twice, in both 202103260000 and 202103261800. I see that PR #3651 passed on Hera, but recent PRs tend to be tested on Gaea and WCOSS2.

I can try this on gaea since I think you are still getting access there -- let me know if this would be helpful. I would usually ask @CatherineThomas-NOAA about a met job as a first pass but she's on leave today. @KateFriedman-NOAA would you be familiar enough to point us in the right direction?

@KateFriedman-NOAA
Member

@JessicaMeixner-NOAA @AndrewEichmann-NOAA taking a look at the metplus job failures...

@KateFriedman-NOAA
Member

@AndrewEichmann-NOAA @JessicaMeixner-NOAA I see that @AndrewEichmann-NOAA's test is a modified configuration of the C48mx500_hybAOWCDA CI test case with the gfs suite turned on, among other changes. This test case does not run with the gfs suite in CI testing, so this is a new problem. This modified configuration is not currently tested in develop. Is there a desire to turn on the gfs suite for this test case? If so, we'll need to test it separately from this PR and iron out the metp job issues.

@AndrewEichmann-NOAA
Contributor Author

This experiment was a C48mx500_hybAOWCDA case modified to run the early cycle marine DA. If that is separate from the gfs suite, I didn't mean to run that. Would there be a better existing case to test this with?

@JessicaMeixner-NOAA
Contributor

@AndrewEichmann-NOAA - while the offset is not tested, a CI test with early cycle turned on (only 2 ensemble members) is https://github.com/NOAA-EMC/global-workflow/blob/develop/dev/ci/cases/pr/C96C48mx500_S2SW_cyc_gfs.yaml

@AndrewEichmann-NOAA
Contributor Author

@JessicaMeixner-NOAA Would it be ok to run this with 80/30 members?

@aerorahul removed the CI-Hera-Failed and CI-Orion-Failed labels May 22, 2025
@JessicaMeixner-NOAA
Contributor

@aerorahul @KateFriedman-NOAA - @AndrewEichmann-NOAA has updated the PR description to clear up the confusion.

For next steps, does @AndrewEichmann-NOAA need another CI test without modifications? Otherwise, I think this is ready for CI. If we'd like another review from @CatherineThomas-NOAA, we'll need to wait for her return tomorrow.

@aerorahul
Contributor

Please let me know if this is ready for CI. We just merged a number of PRs; the most impactful for this one is #3642.

@AndrewEichmann-NOAA
Contributor Author

@aerorahul I have a C96C48mx500_S2SW_cyc_gfs run that failed overnight due to quota overruns. I restarted and then ran into problems with the repo not updating right. I'll rebuild with the current merge and start again.

@AndrewEichmann-NOAA
Contributor Author

AndrewEichmann-NOAA commented May 24, 2025

4eb24dd successfully completed C96C48mx500_S2SW_cyc_gfs on Hera:
EXPDIR: /scratch1/NCEPDEV/da/Andrew.Eichmann/fv3gfs/ens-ci/test/EXPDIR
COMROOT: /scratch1/NCEPDEV/stmp4/Andrew.Eichmann/COMROOT/early-ens-test

@AndrewEichmann-NOAA
Contributor Author

AndrewEichmann-NOAA commented May 27, 2025

@aerorahul Ready to roll for CI testing

Labels
JEDI Feature development to support JEDI-based DA
Development

Successfully merging this pull request may close these issues.

Only 30 members should be used for the early cycle
7 participants