prepexp returned non-zero exit status 1 #1261

Open
JanStreffing opened this issue Dec 27, 2024 · 5 comments · May be fixed by #1262
Labels
error handling better error output required

Comments

@JanStreffing
Contributor

JanStreffing commented Dec 27, 2024

I'm on branch: https://github.com/esm-tools/esm_tools/tree/feat/awiesm3-v3.4, currently on commit: 37d3b08.

After creating temporary workarounds for issues #1259 and #1260, I am stuck with this error:

nproc: 768
tasks: 768
nproc: 384
tasks: 1152
nproc: 1
tasks: 1153
nproc: 4
tasks: 1157
resubmit tasks: 1157
Traceback (most recent call last):
  File "/home/a/a270092/.local/bin/esm_runscripts", line 33, in <module>
    sys.exit(load_entry_point('esm-tools', 'console_scripts', 'esm_runscripts')())
  File "/home/a/a270092/esm_tools/src/esm_runscripts/cli.py", line 289, in main
    setup()
  File "/home/a/a270092/esm_tools/src/esm_runscripts/sim_objects.py", line 154, in __call__
    resubmit.maybe_resubmit(self.config)
  File "/home/a/a270092/esm_tools/src/esm_runscripts/resubmit.py", line 141, in maybe_resubmit
    nextrun = resubmit_recursively(config, jobtype=jobtype)
  File "/home/a/a270092/esm_tools/src/esm_runscripts/resubmit.py", line 183, in resubmit_recursively
    resubmit_batch_or_shell(config, submission_type, cluster)
  File "/home/a/a270092/esm_tools/src/esm_runscripts/resubmit.py", line 21, in resubmit_batch_or_shell
    config = config["general"]["batch"].write_simple_runscript(
  File "/home/a/a270092/esm_tools/src/esm_runscripts/batch_system.py", line 465, in write_simple_runscript
    config = config["general"]["batch"].write_het_par_wrappers(config)
  File "/home/a/a270092/esm_tools/src/esm_runscripts/batch_system.py", line 53, in write_het_par_wrappers
    return self.bs.write_het_par_wrappers(config)
  File "/home/a/a270092/esm_tools/src/esm_runscripts/slurm.py", line 263, in write_het_par_wrappers
    start_core = config[model]["start_core"]
KeyError: 'start_core'
Traceback (most recent call last):
  File "/home/a/a270092/.local/bin/esm_runscripts", line 33, in <module>
    sys.exit(load_entry_point('esm-tools', 'console_scripts', 'esm_runscripts')())
  File "/home/a/a270092/esm_tools/src/esm_runscripts/cli.py", line 289, in main
    setup()
  File "/home/a/a270092/esm_tools/src/esm_runscripts/sim_objects.py", line 119, in __call__
    self.config = prepexp.run_job(self.config)
  File "/home/a/a270092/esm_tools/src/esm_runscripts/prepexp.py", line 27, in run_job
    evaluate(config, "prepexp", "prepexp_recipe")
  File "/home/a/a270092/esm_tools/src/esm_runscripts/helpers.py", line 71, in evaluate
    config = esm_plugin_manager.work_through_recipe(
  File "/home/a/a270092/esm_tools/src/esm_plugin_manager/esm_plugin_manager.py", line 159, in work_through_recipe
    config = getattr(submodule, workitem)(config)
  File "/home/a/a270092/esm_tools/src/esm_runscripts/prepexp.py", line 260, in call_esm_runscripts_from_prepexp
    _call_esm_runscripts_internally(config, new_command, scriptsdir)
  File "/home/a/a270092/esm_tools/src/esm_runscripts/prepexp.py", line 193, in _call_esm_runscripts_internally
    subprocess.check_call(command.split(), cwd=exedir)
  File "/sw/spack-levante/mambaforge-4.11.0-0-Linux-x86_64-sobz6z/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['esm_runscripts', 'awiesm3-v3.4-levante-TCO95L91-CORE2_1y.yaml', '-e', 'test9', '--open-run', '--no-motd', '--last-jobtype', 'prepcompute', '-t', 'prepcompute']' returned non-zero exit status 1.

Any idea what the underlying fault could be?

@JanStreffing
Contributor Author

adding @mandresm, @pgierz, @hajlaci

@pgierz
Member

pgierz commented Dec 28, 2024

You have a key error: start_core.

I think it might be worthwhile to implement some sanity checks. At submit time we already know all of the info you would need, so it should be easy to throw some friendlier errors...
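
For readers following the log: the first traceback appears to come from the inner `esm_runscripts` process (the `KeyError`), while the second is the outer process reporting only that the child exited with status 1. The sketch below reproduces that pattern with the standard library alone. It does not use any esm_tools code; `CHILD_CODE` and `run_child` are hypothetical names made up for this illustration.

```python
import subprocess
import sys

# Hypothetical child program: it fails the way the inner call did, with a
# missing dictionary key. An uncaught exception makes Python exit with status 1.
CHILD_CODE = "config = {'model': {}}; config['model']['start_core']"


def run_child():
    """Run the child and return (exit status, last line of its stderr)."""
    try:
        subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            check=True,           # raise CalledProcessError on non-zero exit
            capture_output=True,  # keep the child's stderr for inspection
            text=True,
        )
    except subprocess.CalledProcessError as err:
        # The exception itself only carries the exit status and the command;
        # the actual cause is the final line of the child's own traceback.
        return err.returncode, err.stderr.strip().splitlines()[-1]
    return 0, ""


status, cause = run_child()
print(status, cause)
```

The parent-side `CalledProcessError` is a symptom; the line worth reading is the last line of the child's traceback.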

@JanStreffing
Contributor Author

This is a non-issue. The error occurs when no default nproc value is given and the runscript does not set one either.

@mandresm
Contributor

mandresm commented Jan 6, 2025

I say we label this as "better error handling" and add an esm_parser.user_error for it. @pgierz, should I take care of this or would you rather do it yourself?
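
A minimal sketch of what such a check could look like, assuming the configuration is a plain nested dict in which each model section may carry an `nproc` key. This is not the change in #1262. `MissingNprocError`, `check_nproc`, and the model names are hypothetical, and a local exception stands in for `esm_parser.user_error`, whose signature is not shown in this thread.

```python
class MissingNprocError(Exception):
    """Hypothetical stand-in for a friendlier, user-facing error."""


def check_nproc(config, models):
    """Raise a readable error if any listed model has no 'nproc' entry."""
    missing = [name for name in models if "nproc" not in config.get(name, {})]
    if missing:
        raise MissingNprocError(
            "No 'nproc' value found for: "
            + ", ".join(missing)
            + ". Set a default in the model configuration or in the runscript."
        )
    return config


# Toy configuration: model_b has no processor count at all.
config = {"model_a": {"nproc": 4}, "model_b": {}}

try:
    check_nproc(config, ["model_a", "model_b"])
except MissingNprocError as err:
    message = str(err)
    print(message)
```

Running a check like this before the batch wrappers are written would name the offending model up front instead of failing later with `KeyError: 'start_core'`.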

@mandresm mandresm reopened this Jan 6, 2025
@mandresm mandresm added the error handling better error output required label Jan 6, 2025
@pgierz
Member

pgierz commented Jan 6, 2025

See #1262

@pgierz pgierz linked a pull request Jan 6, 2025 that will close this issue