Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine_diag_table_yamls.py opens incorrectly formatted diag yamls, returns hard to understand error message when it fails to parse them #54

Closed
uwagura opened this issue Jul 19, 2024 · 3 comments

Comments

@uwagura
Copy link

uwagura commented Jul 19, 2024

This issue is just formally laying out the problem I described at the 07/12 workflow group meeting. When you pass the a yaml file to combine_diag_table_yamls.py that is formatted as "key:value" instead of "key: value", the script fails with the following error: TypeError: string indices must be integers. When run within fre/bronx, this error is quite difficult to debug, and users may not think to check the formatting of their yaml file.

It looks like combine_diag_table_yamls is outputting this error because the yaml.safe_load() function called on line 121 reads the incorrect key value pairs as one long string, rather than as a dictionary. When it finds the key base_datewithin the string on line 123, it tries to index the string using another string on the next line, leading to the string indices must be integerserror.

Note that this error only occurs when you pass in a yaml file that is just a series of incorrect key value pairs. If the file contains a correct key value pair or a list after the incorrect ones, the error will instead say yaml.scanner.ScannerError: mapping values are not allowed here. I guess this is because it is trying to interpret the incorrect key value pairs as a single string that serves as a key to the next key in the yaml file.

If combine_diag_table_yamls.py is not called, frerun seems to read the yaml file without any errors, even if it contains an incorrect key-value pair. It may be good to ensure that any other script which reads yaml files outputs clearer error messages, in case a user gets one of the above error messages while running the experiment. Alternatively, it may be a good idea to add error checking so that frerun throws an error before writing the run script.

I think the simplest solution to this issue would be just adding a print statement that just tells the user to recheck the key value pairs on the relevant lines in their yaml file if a ScannerError or TypeError occurs. I can make this change or any other solution you think would be better if you need me to. Let me know what you think!

@uwagura
Copy link
Author

uwagura commented Jul 24, 2024

If combine_diag_table_yamls.py is not called, frerun seems to read the yaml file without any errors, even if it contains an incorrect key-value pair. It may be good to ensure that any other script which reads yaml files outputs clearer error messages, in case a user gets one of the above error messages while running the experiment. Alternatively, it may be a good idea to add error checking so that frerun throws an error before writing the run script.

@ceblanton , @uramirez8707 , regarding this issue: I'm currently running regional mom6 with an incorrectly formatted yaml, and the run hasn't had any issues even though I'm almost two hours in. I took a look at the output from the model so far, and it appears to be stuck on the following step: Starting to initialize diag_manager at 20240724 112529.966.

I'm not sure if this is because of the incorrect yaml, but it's strange to me that frerun was able to run this long without throwing any errors. Do either of you know what may be going on here?

@uramirez8707
Copy link
Contributor

Yes, libyaml hangs when using an incorrect yaml.

Frerun should be validating the yaml file before attempting to run the model.

@uramirez8707
Copy link
Contributor

Fixed in #55

@ceblanton , @uramirez8707 , regarding this issue: I'm currently running regional mom6 with an incorrectly formatted yaml, and the run hasn't had any issues even though I'm almost two hours in. I took a look at the output from the model so far, and it appears to be stuck on the following step: Starting to initialize diag_manager at 20240724 112529.966.

Fixed in NOAA-GFDL/FMS#1563

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants