[develop]: RUN_FCST failures when using Jinja-templated values in predef_grid_params.yaml #1006

Open
gspetro-NOAA opened this issue Jan 22, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@gspetro-NOAA
Collaborator

Expected behavior

See GitHub Discussion #1000 for full context.
Since ush/predef_grid_params.yaml is expecting hard-coded values for its grid parameters, not Jinja-templated YAML {{...}} entries, experiment generation should fail with an appropriate error message when grid parameters are set to Jinja-templated values (e.g., WRTCMP_write_tasks_per_group: '{{ LAYOUT_Y }}'). Alternatively, the code should be refactored so that ush/predef_grid_params.yaml accepts Jinja-templated values.
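One way the "fail with an appropriate error message" option could look is sketched below. This is only an illustration, not code from the SRW App: the function names `find_templated` and `check_grid_params` are hypothetical, and the set of keys that must be literal is an assumption (the parameters that appear to feed the node-count arithmetic). Other entries, such as the templated `WRTCMP_cen_lon`, are deliberately left alone.

```python
import re

# Matches a Jinja expression such as '{{ LAYOUT_Y }}'.
JINJA_EXPR = re.compile(r"\{\{.*?\}\}")

# Assumption: these are the keys that must hold literal numbers.
MUST_BE_LITERAL = {
    "LAYOUT_X",
    "LAYOUT_Y",
    "WRTCMP_write_groups",
    "WRTCMP_write_tasks_per_group",
}


def find_templated(params, required=MUST_BE_LITERAL):
    """Return the required keys whose values are Jinja-templated strings."""
    hits = []
    for key, value in params.items():
        if isinstance(value, dict):
            # Descend into nested sections such as QUILTING.
            hits.extend(find_templated(value, required))
        elif key in required and isinstance(value, str) and JINJA_EXPR.search(value):
            hits.append(key)
    return hits


def check_grid_params(grid_name, params):
    """Hypothetical check: raise early instead of failing later at run_fcst."""
    bad = find_templated(params)
    if bad:
        raise ValueError(
            f"Grid '{grid_name}': these parameters must be hard-coded, "
            f"not Jinja-templated: {', '.join(sorted(bad))}"
        )


# Trimmed-down version of the grid from this issue.
grid = {
    "LAYOUT_X": 5,
    "LAYOUT_Y": 2,
    "QUILTING": {
        "WRTCMP_write_groups": 1,
        "WRTCMP_write_tasks_per_group": "{{ LAYOUT_Y }}",
        "WRTCMP_cen_lon": "{{ task_make_grid.ESGgrid_LON_CTR }}",
    },
}

try:
    check_grid_params("RRFS_CONUS_25km", grid)
except ValueError as err:
    print(err)
```

A check like this would run during experiment generation, so the user sees the problem before any job is submitted.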

Current behavior

If the user sets WRTCMP_write_tasks_per_group: '{{ LAYOUT_Y }}', the experiment is generated, but the value of NNODES_RUN_FCST cannot be properly calculated, and the experiment fails at run_fcst with an error message similar to the following:

Submission of run_fcst_mem000 failed! qsub: directive error: -l select={{ task_run_fcst.NNODES_RUN_FCST // 1 }}:mpiprocs=128:ncpus=128

var_defns.sh in the failed SRW run shows NNODES_RUN_FCST='{{ (PE_MEMBER01 + PPN_RUN_FCST - 1) // PPN_RUN_FCST }}'
Hardcoding WRTCMP_write_tasks_per_group allows the experiment to run.
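For reference, the unrendered expression is a ceiling division. The sketch below works it through with the values from this issue. It rests on two assumptions that are not stated above: that `PE_MEMBER01` is the compute tasks (`LAYOUT_X * LAYOUT_Y`) plus the write-component tasks (`WRTCMP_write_groups * WRTCMP_write_tasks_per_group`), and that `PPN_RUN_FCST` is 128, inferred from `mpiprocs=128` in the error message. The function names are illustrative only.

```python
def pe_member01(layout_x, layout_y, write_groups, write_tasks_per_group):
    """Assumed formula: compute tasks plus write-component tasks."""
    return layout_x * layout_y + write_groups * write_tasks_per_group


def nnodes_run_fcst(pe, ppn):
    """Ceiling division, as in (PE_MEMBER01 + PPN_RUN_FCST - 1) // PPN_RUN_FCST."""
    return (pe + ppn - 1) // ppn


# Values from the RRFS_CONUS_25km grid, with the write tasks hard-coded to 2.
pe = pe_member01(layout_x=5, layout_y=2, write_groups=1, write_tasks_per_group=2)
nodes = nnodes_run_fcst(pe, ppn=128)
print(pe, nodes)  # 12 1
```

Under these assumptions, a hard-coded value yields a single node, which is the number the `-l select=` directive needs in place of the unrendered template.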

Machines affected

Probably all, but certainly Derecho. See GitHub Discussion #1000 for full context.

Steps To Reproduce

Set the following grid definition in ush/predef_grid_params.yaml:

"RRFS_CONUS_25km":
  GRID_GEN_METHOD: "ESGgrid"
  ESGgrid_LON_CTR: -97.5
  ESGgrid_LAT_CTR: 38.5
  ESGgrid_DELX: 25000.0
  ESGgrid_DELY: 25000.0
  ESGgrid_NX: 219
  ESGgrid_NY: 131
  ESGgrid_PAZI: 0.0
  ESGgrid_WIDE_HALO_WIDTH: 6
  DT_ATMOS: 150
  LAYOUT_X: 5
  LAYOUT_Y: 2
  BLOCKSIZE: 40
  QUILTING:
    WRTCMP_write_groups: 1
    WRTCMP_write_tasks_per_group: '{{ LAYOUT_Y }}'
    WRTCMP_output_grid: "lambert_conformal"
    WRTCMP_cen_lon: '{{ task_make_grid.ESGgrid_LON_CTR }}'
    WRTCMP_cen_lat: '{{ task_make_grid.ESGgrid_LAT_CTR }}'
    WRTCMP_stdlat1: '{{ task_make_grid.ESGgrid_LAT_CTR }}'
    WRTCMP_stdlat2: '{{ task_make_grid.ESGgrid_LAT_CTR }}'
    WRTCMP_nx: 217
    WRTCMP_ny: 128
    WRTCMP_lon_lwr_left: -122.719528
    WRTCMP_lat_lwr_left: 21.138123
    WRTCMP_dx: 25000.0
    WRTCMP_dy: 25000.0

After generating the experiment, the var_defns.sh file shows:

NNODES_RUN_FCST='{{ (PE_MEMBER01 + PPN_RUN_FCST - 1) // PPN_RUN_FCST }}'

and the test fails with qsub: directive error: -l select={{ task_run_fcst.NNODES_RUN_FCST // 1 }}:mpiprocs=128:ncpus=128.
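The symptom can be caught before submission by scanning var_defns.sh for values that still contain `{{`. The following is a minimal sketch of such a check; the function name `unrendered_variables` is hypothetical, and the sample text is illustrative rather than a real var_defns.sh (only the `NNODES_RUN_FCST` line is taken from this issue).

```python
import re

# Matches simple shell assignments of the form NAME=value.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def unrendered_variables(text):
    """Map variable names to values that still contain a Jinja expression."""
    found = {}
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and "{{" in match.group(2):
            # Drop surrounding whitespace and shell quotes from the value.
            found[match.group(1)] = match.group(2).strip().strip("'\"")
    return found


# Illustrative excerpt; the PPN_RUN_FCST line is an assumed example.
sample = """\
PPN_RUN_FCST='128'
NNODES_RUN_FCST='{{ (PE_MEMBER01 + PPN_RUN_FCST - 1) // PPN_RUN_FCST }}'
"""

for name, value in unrendered_variables(sample).items():
    print(f"{name} was not rendered: {value}")
```

To check a real experiment, pass the contents of its var_defns.sh to the function instead of `sample`.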

Until this is fixed, the workaround is to hard-code WRTCMP_write_tasks_per_group to a numeric value in ush/predef_grid_params.yaml.

@gspetro-NOAA gspetro-NOAA added the bug Something isn't working label Jan 22, 2024