Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[develop] Updates to devclean.sh script and plotting scripts and tasks #1100

Merged

Conversation

natalie-perlin
Copy link
Collaborator

@natalie-perlin natalie-perlin commented Jul 3, 2024

DESCRIPTION OF CHANGES:

  • ./devclean.sh script that cleans SRW builds is updated, all the cleaning tasks are done for the directories under the main SRW tree

  • Documentation updated for the devclean.sh script changes

  • plotting scripts updated to have geographical data visible over the colored fields

  • plotting task updated to allow graphics output for individual ensemble members

  • use python3 to checkout external sub-modules in a checkout_externals script; python3 is a default for other scripts; some systems such as MacOS no longer come with python2

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update (UPDATED)

TESTS CONDUCTED:

Tested various combinations of cleaning up tasks on MacOS

  • hera.intel
  • orion.intel
  • hercules.intel
  • cheyenne.intel
  • cheyenne.gnu
  • derecho.intel
  • gaea.intel
  • gaeac5.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

DEPENDENCIES:

natalie-perlin#12 (Merged)

DOCUMENTATION:

ISSUE:

devclean.sh script update addresses the issue in #1073 (devclean.sh demonstrates very unsafe directory removal behavior)

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

@mkavulich

@MichaelLueken MichaelLueken changed the title Updates to devclean.sh script and plotting scripts and tasks [develop] Updates to devclean.sh script and plotting scripts and tasks Jul 8, 2024
@MichaelLueken MichaelLueken linked an issue Jul 8, 2024 that may be closed by this pull request
@MichaelLueken MichaelLueken added documentation Improvements or additions to documentation enhancement New feature or request labels Jul 8, 2024
Copy link
Collaborator

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@natalie-perlin -

I've noted a small grammatical issue in one of the devclean.sh comments.

Please make sure to remove the changes to devclean.sh, parm/wflow_plot.yaml, scripts/exregional_plot_allvars.py, and scripts/exregional_plot_allvars_diff.py from PR #1089.

You can use the following command to back out the changes to these files in the rrfs_ics_lbcs branch:

git checkout 28cbbc8 devclean.sh parm/wflow/plot.yaml scripts/exregional_plot_allvars.py scripts/exregional_plot_allvars_diff.py

This command will checkout the version of the four files at the 28cbbc8 hash of your branch, the hash before you began applying changes related to the PR.

devclean.sh Outdated Show resolved Hide resolved
Co-authored-by: Michael Lueken <[email protected]>
EdwardSnyder-NOAA and others added 11 commits August 15, 2024 19:00
* Adds logic to handle GCP's default conda env, which conflicts with the SRW App's conda env. Fixes a Parallel Works naming convention bug in the script.
* It also addresses a known issue with a Ruby warning on PW instances that prevents the run_WE2E_tests.py from exiting gracefully. The solution we use in our bootstrap for /contrib doesn't seem to work for the /lustre directory, which is why the warning is hardcoded into the monitor_jobs.py script.
* The new spack-stack build on Azure is missing a gnu library, so added the path to this missing library to the proper run scripts and cleaned up the wflow noaacloud lua file.
* Removed log and error files from the qsub wrapper script so that qsub can generate these files with the job id in the files name. Also, fixed typo in the wrapper script.
As part of the data governance initiative, all s3 buckets will need some sort of versioning control. To meet these needs the AWS S3 Bucket was reorganized with the develop data stored under a 'develop-date' folder and the verification sample case and the document case (current_release_data) moved under a new folder called 'experiment-user-cases'.

---------

Co-authored-by: Michael Lueken <[email protected]>
…s tested in PULL_REQUEST_TEMPLATE (ufs-community#1096)

* Update ufs-weather-model hash to b5a1976 (July 30)
* Add hera.gnu, remove cheyenne.intel, cheyenne.gnu, and gaeac5.intel, and alphabetize the machines in the TESTS CONDUCTED section of the PULL_REQUEST_TEMPLATE
* Correct behavior of Jenkins Functional WorkflowTaskTests. Currently, TASK_DEPTH is set to null, resulting in no tests being run during the Functional WorkflowTaskTests stage. Replaced env with params in Jenkinsfile for setting TASK_DEPTH. Testing shows that this will correctly set TASK_DEPTH to the default value of 9 and allow the tests to run
* Removed extraneous entries from the verification scripts to remove KeyError messages in the associated verification log files
* Reapplied necessary modification to modulefiles/tasks/noaacloud/plot_allvars.local.lua to allow plotting tasks to run on NOAA cloud platforms
 - Format, remove old comment
 - Only remove conda if the location in conda_loc is the same as the
default conda install
 - Default case (no flags provided) now does nothing except print usage
and exit
 - Include "--build" flag to remove build artifacts
 - Include short aliases for all the removal options
Updates and simplifications of the devclean.sh script
Safety and simplification updates for devclean.sh
@natalie-perlin
Copy link
Collaborator Author

All pull request by @mkavulich has been merged natalie-perlin#12, and all the questions resolved.

Copy link
Collaborator

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@natalie-perlin -

While running the fundamental WE2E tests on Hera, the grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot test failed. In PR #1098, the default LOAD_MODULES_RUN_TASK_FP was changed to LOAD_MODULES_RUN_TASK. When this change was applied to parm/wflow/plot.yaml in your scripts_plots_updates branch, the test successfully ran. Please make this modification in your parm/wflow/plot.yaml file.

Additionally, please convert one of the currently existing ensemble WE2E tests to exercise the plotting capability that is being added in this PR. Adding a new feature without adding the ability to test it is bad practice.

parm/wflow/plot.yaml Outdated Show resolved Hide resolved
@natalie-perlin
Copy link
Collaborator Author

@mkavulich @MichaelLueken -
All updated. I've added plotting fields for individual ensemble members to the test grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16

It is however stuck in the queue, and shown as "PENDING" for few hours:

[Natalie.Perlin@Hera:/scratch2/NCEPDEV/stmp1/Natalie.Perlin/SRW/ufs-srweather-app]$ squeue -u Natalie.Perlin
JOBID PARTITION NAME USER STATE TIME TIME_LIMIT NODES NODELIST(REASON)
65257017 hera make_grid Natalie.Perlin PENDING 0:00 20:00 1 (Priority)
65257038 hera run_MET_PcpCombine_obs_A Natalie.Perlin PENDING 0:00 30:00 1 (Priority)
65257039 hera run_MET_PcpCombine_obs_A Natalie.Perlin PENDING 0:00 30:00 1 (Priority)
65257040 hera run_MET_PcpCombine_obs_A Natalie.Perlin PENDING 0:00 30:00 1 (Priority)
65257045 hera run_MET_Pb2nc_obs Natalie.Perlin PENDING 0:00 30:00 1 (Priority)

Not sure if there is an issue of running jobs on Hera. I could try using a different platform that may have a lighter compute load and may complete test faster... let me know!

Attached is a snapshot of the built documentation showing the update made for devclean.sh script.
devclean_doc_updates

@natalie-perlin
Copy link
Collaborator Author

@MichaelLueken - the test grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 with plots for each ensemble has completed successfully on Hera, in /scratch2/NCEPDEV/stmp1/Natalie.Perlin/SRW/expt_dirs/grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16

Copy link
Collaborator

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@natalie-perlin -

Thanks for making the requested changes! I have successfully run the fundamental WE2E tests as well:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              11.70
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               8.55
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              26.05
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024081  COMPLETE              46.78
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240819195  COMPLETE              27.05
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024081919530  COMPLETE              48.16
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             168.29

and the plots for the ensemble test successfully completed as well:

plot_allvars_mem001_f000_202105121200     SUCCEEDED         205.0           1.37
plot_allvars_mem001_f001_202105121200     SUCCEEDED         204.0           1.36
plot_allvars_mem001_f002_202105121200     SUCCEEDED         204.0           1.36
plot_allvars_mem001_f003_202105121200     SUCCEEDED         205.0           1.37
plot_allvars_mem001_f004_202105121200     SUCCEEDED         204.0           1.36
plot_allvars_mem001_f005_202105121200     SUCCEEDED         205.0           1.37
plot_allvars_mem001_f006_202105121200     SUCCEEDED         205.0           1.37
plot_allvars_mem002_f000_202105121200     SUCCEEDED         205.0           1.37
plot_allvars_mem002_f001_202105121200     SUCCEEDED         204.0           1.36
plot_allvars_mem002_f002_202105121200     SUCCEEDED         204.0           1.36
plot_allvars_mem002_f003_202105121200     SUCCEEDED         204.0           1.36
plot_allvars_mem002_f004_202105121200     SUCCEEDED         206.0           1.37
plot_allvars_mem002_f005_202105121200     SUCCEEDED         205.0           1.37
plot_allvars_mem002_f006_202105121200     SUCCEEDED         205.0           1.37

It would be a good idea to modify the description for the config.grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16.yaml configuration to include that it also tests the ability to produce graphics for each ensemble member.

Since the fundamental WE2E tests are all running successfully (with the addition of adding plotting to the ensemble test in the fundamental suite), I will re-approve this work now.

@natalie-perlin
Copy link
Collaborator Author

@natalie-perlin -

Thanks for making the requested changes! I have successfully run the fundamental WE2E tests as well:


It would be a good idea to modify the description for the `config.grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16.yaml` configuration to include that it also tests the ability to produce graphics for each ensemble member.

Since the fundamental WE2E tests are all running successfully (with the addition of adding plotting to the ensemble test in the fundamental suite), I will re-approve this work now.

@MichaelLueken - where the description of the test config.grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16.yaml
could be changed? Is it in the documentation or any comments in the code?

@MichaelLueken
Copy link
Collaborator

@MichaelLueken - where the description of the test config.grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16.yaml could be changed? Is it in the documentation or any comments in the code?

@natalie-perlin - If you go directly to tests/WE2E/test_configs/grids_extrn_mdls_suites_community/config.grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16.yaml, the description is at the top of the file:

metadata:
  description: |-
    This test is to ensure that the workflow running in community mode
    completes successfully on the RRFS_CONUS_25km grid using the GFS_v16
    physics suite with ICs and LBCs derived from the NAM.
    This test also runs with two ensemble members, and ensures the MET
    ensemble-specific tasks run successfully.

It would be a good idea to include that either plots are produced or graphics are created for each ensemble member in the description section above.

@natalie-perlin
Copy link
Collaborator Author

@MichaelLueken @mkavulich - updated the test description to the following:

This test also runs with two ensemble members, runs ploting tasks for each
    ensemble member, and ensures the MET ensemble-specific tasks run successfully.

Copy link
Collaborator

@mkavulich mkavulich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@natalie-perlin Thanks for the productive discussion and back-and-forth, I'm glad we were able to reach a compromise

@MichaelLueken MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label Aug 22, 2024
@MichaelLueken
Copy link
Collaborator

The Jenkins tests successfully passed on all machines, but the test stage was ultimately terminated for running past the 8 hour limit on Jet.

The WE2E coverage tests were manually ran on Jet and all tests successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
community_20240822191353                                           COMPLETE              22.88
custom_ESGgrid_20240822191354                                      COMPLETE              29.55
custom_ESGgrid_Great_Lakes_snow_8km_20240822191355                 COMPLETE              26.54
custom_GFDLgrid_20240822191357                                     COMPLETE              14.60
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202408  COMPLETE              14.85
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20  COMPLETE             109.96
get_from_HPSS_ics_RAP_lbcs_RAP_20240822191400                      COMPLETE              20.22
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240822191400  COMPLETE             638.33
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              84.71
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               8.62
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024  COMPLETE             962.58
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1932.84

Moving forward with merging this PR now.

@MichaelLueken MichaelLueken merged commit 83f173c into ufs-community:develop Aug 23, 2024
3 of 5 checks passed
@natalie-perlin natalie-perlin deleted the scripts_plots_updates branch September 5, 2024 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

devclean.sh demonstrates very unsafe directory removal behavior
7 participants