Geogrid and metgrid fail with 0 exit status #252

Peter9192 opened this issue May 24, 2024 · 4 comments

@Peter9192

We're trying to run WPS and WRF in an automated setting. We want the workflow to fail as soon as one of the steps fails. For this, we rely on the exit status of each of the programs.
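To illustrate the pattern, here is a minimal sketch of a fail-fast driver in POSIX shell. It is not the actual workflow: `run_steps` and the `step_*` functions are hypothetical stand-ins, and the real steps would be the WPS and WRF executables.

```shell
# Hypothetical stand-ins for workflow steps (not the real WPS binaries).
step_ok()     { echo "step_ok: done"; return 0; }
step_fail()   { echo "ERROR: step_fail broke"; return 3; }
# Prints an error message but still reports success via exit status 0.
step_silent() { echo "ERROR: step_silent broke"; return 0; }

# Run each step in order and stop at the first non-zero exit status.
run_steps() {
    for step in "$@"; do
        echo "+ $step"
        "$step" || {
            rc=$?
            echo "workflow stopped: $step exited with status $rc" >&2
            return "$rc"
        }
    done
    echo "workflow finished"
}
```

With these stand-ins, `run_steps step_ok step_fail step_ok` stops at the second step and returns 3, whereas `run_steps step_ok step_silent step_ok` runs to the end and returns 0 even though an error was printed. A driver of this kind can only be as reliable as the exit statuses it receives.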

We've noticed that under certain conditions, geogrid and metgrid fail with a 0 exit status. This happened when we ran geogrid with the wrong vtable, and again when metgrid didn't find the ./geo_em.d01.nc files.

For example:

+ /home/pkalverla1/wrf-model/WPS/geogrid.exe
Parsed 50 entries in GEOGRID.TBL
Processing domain 1 of 1
ERROR: Could not open /projects/0/prjs0914/wrf-data/default/static/WPS_GEOG/landfire_data/index

Despite this error message, the exit status was 0. Similarly for metgrid later on:

+ /home/pkalverla1/wrf-model/WPS/metgrid.exe
Processing domain 1 of 1
ERROR: Couldn't open file ./geo_em.d01.nc for input.

It seems the error originates here:

if (iostatus /= 0) then
   if (is_optional(idx)) then
      is_not_found(idx) = .true.
      call mprintf(.true.,INFORM,'Could not read ''index'' file %s for field %s', s1=trim(test_fname), &
                   s2=trim(source_fieldname(idx)))
      call mprintf(.true.,INFORM,'This field is optional and will not be processed.')
   else
      call mprintf(.true.,ERROR,'Could not open %s', s1=trim(test_fname))
   end if
   cycle ENTRY_LOOP
end if

The ERROR level is handled in mprintf:

if (level == ERROR) then
#ifdef _GEOGRID
   call parallel_abort()
#endif
#ifdef _METGRID
   call parallel_abort()
#endif
   stop
end if

I'm not very experienced in writing Fortran code, but I wonder if this could be solved by adding an integer status code to the stop command, or by using error stop instead, as suggested here. If so, I'm happy to open a PR.
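Until the exit status is changed in the source, a workflow could guard against this case on the shell side. The sketch below is one possible workaround, not part of WPS: `run_checked` and `fake_geogrid` are hypothetical names, and the check assumes that fatal messages appear on a line beginning with `ERROR:` (optionally preceded by whitespace), as in the logs above.

```shell
# Hypothetical wrapper: run a command and report failure if it exits
# non-zero OR if its output contains a line starting with "ERROR:".
run_checked() {
    _rc_log=$(mktemp)
    _rc_status=0
    # Capture stdout and stderr; output is replayed after the command ends.
    "$@" >"$_rc_log" 2>&1 || _rc_status=$?
    cat "$_rc_log"
    # Assumption: a line beginning with "ERROR:" means the run failed.
    if [ "$_rc_status" -eq 0 ] && grep -q '^[[:space:]]*ERROR:' "$_rc_log"; then
        _rc_status=1
    fi
    rm -f "$_rc_log"
    return "$_rc_status"
}

# Stand-in that mimics the reported behaviour: error text, exit status 0.
fake_geogrid() {
    echo "Processing domain 1 of 1"
    echo "ERROR: Could not open index"
    return 0
}
```

Calling `run_checked fake_geogrid` prints the captured output and returns 1, while a command that already exits non-zero keeps its own status. In a real workflow the stand-in would be replaced by the path to `geogrid.exe` or `metgrid.exe`. Matching on log text is fragile compared with a proper non-zero exit status, which is why a fix in the programs themselves would be preferable.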

@weiwangncar
Collaborator

@Peter9192 First of all, you're not using the standard dataset we have in the release. The standard release does not include landfire_data. Second, if this dataset is optional for your run, you should set 'optional = yes' in the landfire section of geogrid/GEOGRID.TBL (see other sections for examples), or remove it from the table. For future reports like this, please post in the support forum at https://forum.mmm.ucar.edu/.

@Peter9192
Author

Dear @weiwangncar. Thanks for the reply.

I know this is not the standard dataset, and I know I can fix the issue by using the standard. This is not a troubleshooting request. The example is simply to illustrate a failing use case.

My point is about the behaviour of geogrid and metgrid. They print an error, but return a zero exit status. I think they should return a non-zero exit status, since they did not complete successfully. Do you agree?

@weiwangncar
Collaborator

@Peter9192 I'll let others comment.

@Peter9192
Author

Thanks; could you re-open the issue and/or notify others in that case?
