
Update WW3 for fix for MPI reproducibility #911

Merged

JessicaMeixner-NOAA
Collaborator

@JessicaMeixner-NOAA commented Nov 12, 2021

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.

  • Results for one or more of the regression tests change and the reasons for the changes are understood and explained below.

  • New or updated input data is required by this PR. If checked, please work with the code managers to update input data sets on all platforms.

Instructions: All subsequent sections of text should be filled in as appropriate.

The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.

Description

This PR updates WW3 with a fix for the MPI reproducibility issue (#427, NOAA-EMC/WW3#518). When ice is updated during the run and IC0 is used (as it is in UFS), points that become ice-covered beyond a threshold are "deactivated"; however, when running with MPI, the values at these points were not being zeroed out as expected. This has been fixed, along with two other bug fixes (NOAA-EMC/WW3#512, NOAA-EMC/WW3#519) needed to run WW3 in debug mode (debug-mode support is not complete or included in this PR, but these two required bug fixes are). Answers will therefore change for the wave tests affected by this bug (the S2SW tests; the HAFS tests with waves did not change answers). The "cpld_bmark_mpi_p7" test was added to show that this bug is fixed, since the bug is not apparent in the "cpld_control_wave_p7" test. In addition, to continue testing that WW3 is thread-safe and passes some of the other tests, cpld_decomp_wave_p7, cpld_mpi_wave_p7, and cpld_2threads_wave_p7 were added.

This PR also uses a new WW3 input directory that reduces the number of spectral points used in the WW3 low-resolution 1 deg grid (#822), which in turn reduces the computational requirements for tests using the 1 deg grid. Tests using this grid were profiled using ESMF on hera, and their resources were adjusted. Additionally, the ATMW tests were set up to have machine-dependent resources. Any test using the 1 deg wave grid will have answer changes.

A 5 deg grid and its documentation are included in the new WW3 input and can be used for future low-resolution tests.

Lastly, the 1 deg, 2 deg, and 5 deg tests have been updated to work with MOM6 coupling (issue #913) by making sure the namelist option:
&OUTS USSP = 1, IUSSP = 3, STK_WN = 0.04, 0.110, 0.3305 /
was included in the appropriate ww3_grid.inp files.

The new input that is required is currently staged on hera at: /scratch1/NCEPDEV/nems/emc.nemspara/RT/NEMSfv3gfs/input-data-20210930/WW3_input_data_20211113 (updated from the original PR).

Co-authors: @aliabdolali @MatthewMasarik-NOAA

Issue(s) addressed

Link the issues to be closed with this PR, whether in this repository, or in another repository.

Testing

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

Earlier versions were tested on multiple platforms (dell, orion, gaea, hera.intel). With the most recent code, a baseline was created and shown to match on hera.intel.

  • hera.intel
  • hera.gnu
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss_cray
  • wcoss_dell_p3
  • opnReqTest for newly added/changed feature: skip until oRTs can be updated to use S2SW
  • CI: skip due to changes in this PR

Dependencies

JessicaMeixner-NOAA and others added 30 commits October 1, 2021 19:44
…/ufs-weather-model into feature/addwavetests
* cpld_bmark_mpi_p7 was run separately
@JessicaMeixner-NOAA
Collaborator Author

@DeniseWorthen I think the Cheyenne and gaea tests failed for the same reason:
TPN_cpl_bmrk_mpi=40, not TPN_cpl_bmrk_mpi=36, for each.

Although I just saw that you pushed the Cheyenne log, should I make this change or not?

@DeniseWorthen
Collaborator

@JessicaMeixner-NOAA Please make the change for gaea only while I think about what to do w/ Cheyenne.

@JessicaMeixner-NOAA
Collaborator Author

@DeniseWorthen I made the update for gaea, sorry about the issue on both machines for this test.

@DeniseWorthen
Collaborator

@JessicaMeixner-NOAA This sort of error is all too easy w/ the number of platforms and number of tasking options w/ the coupled model.

Please go ahead and update the Cheyenne tasks. I will re-run that test and make an empty commit noting that the test passes a second time.

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: BL
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/779412862/20211116163015/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_168928
Baseline creation and move successful
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/779412862/20211116182248/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_224474
Test cpld_bmark_mpi_p7 005 failed in run_test failed
Please make changes and add the following label back:
jet-intel-BL

* cpld_bmark_mpi_p7 was re-run after fixing tasking setting for gaea
* single test was rerun on cheyenne.intel to confirm test passes w/ tasking fix
* cpld_bmark_mpi_p7 test was repeated after fixing tasking
@JessicaMeixner-NOAA
Collaborator Author

Are we waiting on the CI or should we go ahead and start the process of merging the WW3 submodule?

@DeniseWorthen
Collaborator

@MinsukJi-NOAA CI will need an update in the next PR, is that right?

@MinsukJi-NOAA
Contributor

@MinsukJi-NOAA CI will need an update in the next PR, is that right?

I suggest we skip CI for this PR; it will be fixed in the next PR.

@junwang-noaa
Collaborator

junwang-noaa commented Nov 17, 2021 via email

@DeniseWorthen merged commit 546cdc0 into ufs-community:develop Nov 17, 2021
@aliabdolali
Collaborator

thank you all for doing this.

@JessicaMeixner-NOAA deleted the feature/addwavetests branch November 17, 2021 16:31
Labels
Baseline Updates (Current baselines will be updated.)
New Input Data Req'd (This PR requires new data to be synced across platforms.)
7 participants