NNODES_RUN_FCST in task_run_fcst #1000
-
Hi, UFS/SRW experts -

I created a new grid (12 km over the NE US) on Derecho with the SRW v2.2 public release. The new grid is specified in the task_make_grid section of config.yaml, and I also specified the WRTCMP-related parameters in the task_run_fcst section. The SRW experiments with the new grid went well.

I then added this new grid to PREDEF_GRID_NAME by updating valid_param_vals.yaml and predef_grid_params.yaml. The parameters removed from config.yaml and added to predef_grid_params.yaml are listed at the end of this message. The experiment now halts at the fcst step. FV3LAM_wflow.log shows:

Submission of run_fcst_mem000 failed!
qsub: directive error: -l select={{ task_run_fcst.NNODES_RUN_FCST // 1 }}:mpiprocs=128:ncpus=128

var_defns.sh in the failed SRW run shows

My quick fix is to hardcode NNODES_RUN_FCST to 1 in config.yaml, and the SRW experiment now runs OK.

Thanks
-Sarah

GRID_GEN_METHOD: ESGgrid
-
Hi @SarahLu-NOAA,

I just wanted to let you know that I am working on getting a subject matter expert (SME) to assist with your question. Hopefully we can get you an answer by early next week, but in the meantime, I didn't want to leave you hanging.

Best,
-
Hello @SarahLu-NOAA - I was able to replicate what you are seeing with your new grid, but using the
Upon doing so, the
and the test failed with
I was able to correct this behavior by setting
In order to correct this, you will need to set
Please let me know if you continue to encounter this issue even with this modification.
-
@mkavulich @MichaelLueken FYI, I added Issue #1006 to document the lack of failure on experiment generation.
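To illustrate the kind of generation-time check that would have caught this before qsub did, here is a minimal Python sketch. It is not taken from the SRW code base: the function name `find_unrendered`, the `resources` key, and the sample dictionary are all hypothetical, and the only real data is the directive string from the log above. It walks a nested configuration and reports every string that still contains a `{{ ... }}` expression.

```python
import re
from typing import Any, Iterator, Tuple

# Matches a leftover template expression such as "{{ task_run_fcst.NNODES_RUN_FCST // 1 }}".
UNRENDERED = re.compile(r"\{\{.*?\}\}")


def find_unrendered(node: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_path, value) for every string that still contains {{ ... }}."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            yield from find_unrendered(value, child)
    elif isinstance(node, (list, tuple)):
        for index, value in enumerate(node):
            yield from find_unrendered(value, f"{path}[{index}]")
    elif isinstance(node, str) and UNRENDERED.search(node):
        yield path, node


# Hypothetical stand-in for a generated experiment configuration; only the
# directive string is copied from the error message in this thread.
experiment = {
    "task_make_grid": {"GRID_GEN_METHOD": "ESGgrid"},
    "task_run_fcst": {
        "resources": "select={{ task_run_fcst.NNODES_RUN_FCST // 1 }}:mpiprocs=128:ncpus=128",
        "PPN_RUN_FCST": 128,
    },
}

leftovers = list(find_unrendered(experiment))
for dotted_path, value in leftovers:
    print(f"unrendered template at {dotted_path}: {value}")
```

A check like this could raise an error at experiment generation instead of letting the workflow reach the scheduler with an invalid directive.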
Hi @SarahLu-NOAA,
Yes, you will need to hard code WRTCMP_write_tasks_per_group to 4 in ush/predef_grid_params.yaml. Doing so should let your experiment run.
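For context on why a write-component setting can affect the node request, here is a rough Python sketch of the arithmetic as I understand it. The two formulas are assumptions on my part, not something confirmed in this thread: the MPI task count is taken as the compute layout plus the write-component tasks, and the node count as a ceiling division by tasks per node. The layout values (5 x 4) and the single write group are hypothetical; 128 comes from the mpiprocs value in the error message and 4 from the suggested setting.

```python
def pe_member01(layout_x: int, layout_y: int,
                write_groups: int, write_tasks_per_group: int,
                quilting: bool = True) -> int:
    """Total MPI tasks: compute tasks plus write-component tasks (assumed formula)."""
    compute_tasks = layout_x * layout_y
    write_tasks = write_groups * write_tasks_per_group if quilting else 0
    return compute_tasks + write_tasks


def nnodes_run_fcst(total_tasks: int, tasks_per_node: int) -> int:
    """Ceiling division of total tasks by tasks per node (assumed formula)."""
    return (total_tasks + tasks_per_node - 1) // tasks_per_node


# Hypothetical layout and write-group count; 4 and 128 come from this thread.
tasks = pe_member01(layout_x=5, layout_y=4, write_groups=1, write_tasks_per_group=4)
nodes = nnodes_run_fcst(tasks, tasks_per_node=128)
print(tasks, nodes)
```

Under these assumptions, every input has to be a plain integer before the node count can be evaluated, so a templated value left in any one of them would leave the whole chain unresolved.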
I've been seeing long queue times on Derecho this week as well. It could be due to final updates to AMS presentations, so hopefully the queue times will ease at the end of the month (though that doesn't help in the interim).
The issue is that the ush/predef_grid_params.yaml file expects hard-coded values for the listed parameters, not YAML {{...}} entries. Now, I can't speak to why this is the case, but NNODES_RUN_FCST is affected by PE_MEMBER01, which itself is affected by WRTCMP_write_tasks_per_group, and if this paramete…