Clean up input-data directories; Fix MERRA2 input; Add tests on Jet; Update CICE cap & fix time manager (was PR#664) #639

Merged
merged 136 commits into from
Jul 20, 2021

Conversation

@DeniseWorthen DeniseWorthen commented Jun 14, 2021

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.

  • If new or updated input data is required by this PR, it is clearly stated in the text of the PR.

Instructions: All subsequent sections of text should be filled in as appropriate.

The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.

Description

Creates new input-data and BM_IC directories by removing un-used coupled inputs from FV3_input_frac and BM_IC-20210212. Moves BM_IC and BM7_IC from FV3_input_frac to new BM_IC-YYYYMMDD directory.

  1. Two new input directories are currently staged in /scratch1/NCEPDEV/stmp4/Denise.Worthen/input-data-20210630 and /scratch1/NCEPDEV/stmp4/Denise.Worthen/BM_IC-20210630. These were initially created by rsync-ing the input-data-20210614 and BM_IC-20210212 directories from the baseline area on 20210614 and then making changes as required to remove un-used inputs or reorganize current BM inputs. The date of 0630 is arbitrary.

  2. Copies of nems.configure and model_configure in all input directories were removed, and the copy-in of configure files in the fv3_conf/*_run.IN scripts for the standalone tests was removed. All standalone tests for both intel and gnu passed.

  3. Copies of data_table in all input directories were removed.

  4. New WW3 inputs are required for the BM_IC-YYYYMMDD directory for use in the 35d tests; these have been added from /scratch2/NCEPDEV/climate/Jessica.Meixner/WW3ICGEFS/RestartFiles.

  5. The P7 surface ICs have been corrected and verified to be the same as those in
    prototype7-input-data-20210608/FV3_input_frac/BM7_IC

  6. The correct MERRA2 ICs have been added to FV3_input_data_INCCN_aeroclim/MERRA2.

  7. The file mom6_increment.nc has been added to MOM6_IC/100/2011100100. No new input data should be needed for PR MOM6 IAU and atmos stochy restart test #668

  8. The c384 ugwd fix files have been copied from /scratch1/NCEPDEV/nems/emc.nemspara/RT/NEMSfv3gfs/prototype7-input-data-20210608/FV3_input_data384/INPUT_L127 to FV3_input_data384/INPUT_L127

The current input-data-20210614 is 311G; the current BM_IC-20210212 is 195G
The new input-data-20210630 is 121G; the new BM_IC-20210630 is 199G.
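
The net change in disk usage follows directly from the four sizes quoted above. A minimal sketch of that arithmetic, using only those figures (the variable names are illustrative and not part of the regression-test scripts):

```shell
#!/bin/bash
# Directory sizes in G, copied from the figures quoted in this description.
old_input=311   # input-data-20210614
old_bmic=195    # BM_IC-20210212
new_input=121   # input-data-20210630
new_bmic=199    # BM_IC-20210630

old_total=$(( old_input + old_bmic ))
new_total=$(( new_input + new_bmic ))
saved=$(( old_total - new_total ))

echo "old total: ${old_total}G, new total: ${new_total}G, saved: ${saved}G"
```

The input-data directory shrinks by 190G while BM_IC grows by 4G, so the combined footprint drops from 506G to 320G.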

Issue(s) addressed

Fixes #638
Fixes #675
Fixes #680
Fixes #681

Fixes CICE #30
Fixes #647

Testing

Testing at commit f16dcb4 against develop-20210712 shows the following results:

  1. Hera.intel

The following fail because of the correction to the MERRA2 input data and the addition of AOD variables to the forecast files:

control_merra2_debug 061 failed in check_result
control_merra2 030 failed in check_result

The following fail because of the correction to the MERRA2 input data:

control_csawmgt_debug 064 failed in check_result
control_csawmg_debug 063 failed in check_result
control_csawmg 032 failed in check_result
control_csawmgt 033 failed in check_result

The following fails because of the fix to the global_ca variable:

cpld_ca 005 failed in check_result

The following fails because of the fix to the surface ICs:

cpld_bmark_wave_v16_noahmp 014 failed in check_result

  2. Hera.gnu: all tests pass

Tests were repeated after merging the CICE update PR and the same results were obtained.

NOTE: commit 330bb0e changed dt_atmos from 225s to 300s for all bmark_v16 tests. This will now change all bmark_v16 baselines.

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

Dependencies

PR #654
CICE PR #32
Icepack PR #5
NEMS PR #106

DeniseWorthen and others added 30 commits March 27, 2021 12:30
This reverts commit 7b826d4.

@climbfuji climbfuji left a comment

Nice cleanup work! Good to see that all these tests that didn't work on Cheyenne or Jet in the past are now ok. We fixed a lot of bugs in the past few months, it seems.

@BrianCurtis-NOAA

Machine: jet
Compiler: intel
Job: BL
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/669873788/20210719171516/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_166076
Test control_csawmg 024 failed failed
Test control_csawmg 024 failed in run_test failed
Please make changes and add the following label back:
jet-intel-BL

@BrianCurtis-NOAA

Machine: cheyenne
Compiler: intel
Job: BL
Repo location: /glade/scratch/dtcufsrt/autort/tests/auto/pr/669873788/20210719113021/ufs-weather-model
Please manually delete: /glade/scratch/dtcufsrt/FV3_RT/rt_68302
Baseline creation and move successful
Repo location: /glade/scratch/dtcufsrt/autort/tests/auto/pr/669873788/20210719122352/ufs-weather-model
Please manually delete: /glade/scratch/dtcufsrt/FV3_RT/rt_44039
Test cpld_bmark_wave_v16_p7b 014 failed in run_test failed
Please make changes and add the following label back:
cheyenne-intel-BL

@@ -36,7 +36,7 @@ export JNPES=6
export WARM_START=.T.
export NGGPS_IC=.F.
export EXTERNAL_IC=.F.
# DH* The correct setting would be .F.? However the official
# DH* The correct setting would be .F.? However the official
# regression test baseline uses MAKE_NH=.T.
#export MAKE_NH=.F.
export MAKE_NH=.T.
Collaborator

So you don't change the MAKE_NH to be .F.? Maybe we need to change it in next PR when baseline will be updated.

Collaborator Author

I tried to set it to F then I created a new baseline and tried to compare against it using the tests as-is (changing only this value). It didn't reproduce and I didn't understand how the test was being set up so I left it.

Collaborator

The regional control is using FMS, so the history files contain fields at previous output time which do not exist in the restart run history files. Do you compare the restart files from those two runs?

Collaborator Author

I think all I did was to change the make_nh, create a new baseline and then ran the regional_control and regional_restart tests against that baseline.

@junwang-noaa junwang-noaa left a comment

Thanks for the clean up!

@DeniseWorthen

DeniseWorthen commented Jul 19, 2021

The jet baselines were all created except for control_csawmg, which is failing after 5 hours with

 FATAL from PE   140: NaN in input field of mpp_reproducing_sum(_2d), this indicates numerical instability

This test was previously running on Jet, albeit with the incorrect Merra input.

@junwang-noaa

junwang-noaa commented Jul 19, 2021 via email

@DeniseWorthen

In my earlier testing, all the csawmg tests changed because the merra2 input changed. I don't understand the test name, but it has USE_MERRA2=.T..

@junwang-noaa

I guess I found the issue. The two tests control_csawmg and control_csawmgt have different IAER (1111 and 111) and they have USE_MERRA2=.T. and are using the merra2 data /scratch1/NCEPDEV/nems/emc.nemspara/RT/NEMSfv3gfs/input-data-20210614/FV3_input_data_INCCN_aeroclim/MERRA2. Now only IAER=1011 (control_merra2 related tests) is using MERRA2 data. I think we still need the USE_MERRA2=.T. logic to set up aerosol data under MERRA2 directory for all those tests.

@DeniseWorthen

DeniseWorthen commented Jul 20, 2021

For the cheyenne.intel wave_p7b test failure, I tried multiple times but got the same MPT shepherd terminated message.

I then made a non-wave p7b test in order to put it in debug mode. That test also fails slightly earlier (in err file) but with no additional information. That run directory is /glade/scratch/worthen/FV3_RT/rt_22461/cpld_bmark_v16_p7b

@DeniseWorthen

> I guess I found the issue. The two tests control_csawmg and control_csawmgt have different IAER (1111 and 111) and they have USE_MERRA2=.T. and are using the merra2 data /scratch1/NCEPDEV/nems/emc.nemspara/RT/NEMSfv3gfs/input-data-20210614/FV3_input_data_INCCN_aeroclim/MERRA2. Now only IAER=1011 (control_merra2 related tests) is using MERRA2 data. I think we still need the USE_MERRA2=.T. logic to set up aerosol data under MERRA2 directory for all those tests.

I didn't change anything in the csawmg tests. I only added them to cheyenne.

@junwang-noaa

> > I guess I found the issue. The two tests control_csawmg and control_csawmgt have different IAER (1111 and 111) and they have USE_MERRA2=.T. and are using the merra2 data /scratch1/NCEPDEV/nems/emc.nemspara/RT/NEMSfv3gfs/input-data-20210614/FV3_input_data_INCCN_aeroclim/MERRA2. Now only IAER=1011 (control_merra2 related tests) is using MERRA2 data. I think we still need the USE_MERRA2=.T. logic to set up aerosol data under MERRA2 directory for all those tests.
>
> I didn't change anything in the csawmg tests. I only added them to cheyenne.

Sorry, I thought the USE_MERRA2 was removed from control_run.IN. It is OK to use if IAER=1011 in cpld_bmark_tiled_run.IN for now, but in general other IAER options are using MERRA2 data too.
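
The distinction raised in this thread, keying the aerosol-data setup on USE_MERRA2 rather than on a single IAER value, can be sketched roughly as below. This is only an illustration: `needs_merra2` is a hypothetical helper, not a function in control_run.IN or cpld_bmark_tiled_run.IN, and the loop assumes USE_MERRA2=.T. for each listed IAER value.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when a test should have MERRA2 aerosol data
# staged. It looks only at USE_MERRA2, so no IAER option is excluded.
needs_merra2() {
  local use_merra2="$1"
  [ "${use_merra2}" = ".T." ]
}

# IAER values mentioned in the thread; USE_MERRA2=.T. is assumed for each.
for iaer in 1111 111 1011; do
  if needs_merra2 ".T."; then
    echo "IAER=${iaer}: stage MERRA2 aerosol data"
  fi
done
```

A check written as `[ "${IAER}" = "1011" ]` would skip the 1111 and 111 cases, which is the gap described above.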

* remove csawmg test on jet in rt.conf; this test fails with
  FATAL from PE   140: NaN in input field of mpp_reproducing_sum(_2d),
  this indicates numerical instability

* repeat of control_2threads test, which timed out on first verification
  run
* job fails at startup with message MPT: shepherd terminated:
  r5i4n4.ib0.cheyenne.ucar.edu - job aborting
@DeniseWorthen

All platforms are now complete and I think we can start updating the submodules.

@DeniseWorthen DeniseWorthen merged commit 22613e8 into ufs-community:develop Jul 20, 2021
@DeniseWorthen DeniseWorthen deleted the feature/updateBMIC branch June 15, 2022 11:57
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
## DESCRIPTION OF CHANGES: 
1. Add a new experiment configuration variable named `DEBUG` to enable more in-depth debugging output from workflow scripts.  Set default value of `DEBUG` in `config_defaults.sh` to `"FALSE"`.
2. In experiment generation scripts, change circumstances under which different messages are printed to screen (e.g. when `VERBOSE` is `"TRUE"`, when `DEBUG` is `"TRUE"`, or always).
3. In experiment generation scripts, for clarity add new informational messages and modify some existing ones.
4. In various scripts, change "set -x" to "set +x" to reduce output clutter.  This can be changed back as necessary (e.g. for debugging).

Note that if `DEBUG` is set to `"TRUE"`, `VERBOSE` will get reset to `"TRUE"` if necessary in order to also print out all the `VERBOSE` messages.
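
The reset rule in the note above can be sketched as a small shell function. This is a hedged illustration only: `effective_verbose` is a hypothetical name, and the actual experiment generation scripts may implement the reset differently.

```shell
#!/bin/bash
# Hypothetical helper: prints the VERBOSE value that takes effect for a
# requested VERBOSE ($1) and DEBUG ($2) setting.
effective_verbose() {
  local verbose="$1" debug="$2"
  if [ "${debug}" = "TRUE" ]; then
    verbose="TRUE"   # DEBUG implies VERBOSE, so all VERBOSE messages print too
  fi
  echo "${verbose}"
}

effective_verbose "FALSE" "TRUE"    # DEBUG on: VERBOSE is reset to TRUE
effective_verbose "FALSE" "FALSE"   # DEBUG off: VERBOSE is left as requested
```

Only the combination `DEBUG="TRUE"` with `VERBOSE="FALSE"` changes the requested value; every other combination passes through unchanged.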

## TESTS CONDUCTED: 
Ran the WE2E test `grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2` as-is as well as with modifications to the default values of `VERBOSE` and `DEBUG`, as follows:
1. `grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2` as-is, i.e. using default values `VERBOSE="TRUE"` and `DEBUG="FALSE"`.
2. `grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2` modified with `VERBOSE="FALSE"` (and with default of `DEBUG="FALSE"`).
3. `grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2` modified with `DEBUG="TRUE"` (and with default of `VERBOSE="TRUE"`).
4. `grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2` modified with `DEBUG="TRUE"` and `VERBOSE="FALSE"` (which should get reset to `"TRUE"`).

All tests were successful.  The experiment generation log files (`log.generate_FV3LAM_wflow.sh`) were compared and differed in the expected ways.

## DOCUMENTATION:
Necessary documentation of `DEBUG` is in `config_defaults.sh`.  Created Issue [#640](https://github.com/NOAA-EMC/regional_workflow/issues/640) to also update rst documentation.
Labels
Baseline Updates: Current baselines will be updated.
New Input Data Req'd: This PR requires new data to be sync across platforms
Waiting for Reviews: The PR is waiting for reviews from associated component PR's.