[Bug]: Configure hooks are busted now that context manager is used to generate slurm.conf ondemand #47

Open
2 tasks
NucciTheBoss opened this issue Jul 27, 2024 · 0 comments
Labels
bug Something isn't working

The part of the Slurm snap configure hook that modifies slurm.conf has been broken since #41. The hook fails while slurmutils writes the file back: the conditional len(config.nodes) != 0 never evaluates, because the nodes accessor raises KeyError: 'nodes' before NodeMap is constructed (see the logs below). This suggests to me that the default SlurmConfig object generated by the slurmutils context manager doesn't preseed its configuration fields. Further triage on slurmutils is needed to confirm that the issue is caused by slurmutils and not by the Slurm snap itself.
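
The failure mode can be reproduced outside the snap with a small stand-in. The classes below are hypothetical and are not the slurmutils implementation; they only mirror the pattern visible in the logs, where a `nodes` property indexes an internal register that was never seeded with a `nodes` key. `SeededConfig` shows one possible way to avoid the error, assuming an empty mapping is an acceptable default.

```python
class UnseededConfig:
    """Hypothetical stand-in: the property indexes the register directly."""

    def __init__(self, **fields):
        self._register = dict(fields)

    @property
    def nodes(self):
        # Raises KeyError when "nodes" was never placed in the register.
        return self._register["nodes"]


class SeededConfig(UnseededConfig):
    """Hypothetical variant that preseeds the register with an empty mapping."""

    def __init__(self, **fields):
        super().__init__(**fields)
        self._register.setdefault("nodes", {})


def has_nodes(config):
    """Mirror the conditional from the logs: len(config.nodes) != 0."""
    return len(config.nodes) != 0


try:
    has_nodes(UnseededConfig(cluster_name="test-cluster"))
    unseeded_result = "no error"
except KeyError as exc:
    # The accessor fails before len() is ever called.
    unseeded_result = f"KeyError: {exc}"

seeded_result = has_nodes(SeededConfig(cluster_name="test-cluster"))
```

With the unseeded stand-in the conditional is never reached, which matches the traceback; with the seeded one it evaluates to `False` and the node section would simply be skipped.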

The current workaround is to edit the slurm.conf file directly rather than through the Slurm snap configuration hooks.

TODOs

  • Improve integration test coverage of the Slurm snap to also cover the configuration hooks used to modify the slurm.conf file.
  • Fix issue in upstream slurmutils.
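
A regression test for the first TODO might take roughly the following shape. This is only a sketch: it does not call slurmutils or the snap, and `edit`, `_load`, and `_dump` are hypothetical stand-ins that imitate a load, yield, write-back context manager so the test runs on its own. A real test would swap in the actual editor or the configure hook entry point, and the mapping from snap options to the `SlurmctldHost` and `ClusterName` keys is assumed here.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


def _load(path):
    """Hypothetical parser: read 'Key=Value' lines into a dict."""
    config = {}
    for line in path.read_text(encoding="ascii").splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            config[key.strip()] = value.strip()
    return config


def _dump(config, path):
    """Hypothetical writer: emit one 'Key=Value' line per entry."""
    lines = [f"{key}={value}" for key, value in config.items()]
    path.write_text("\n".join(lines) + "\n", encoding="ascii")


@contextmanager
def edit(path):
    """Load on entry and write back on exit, so dump errors surface at exit."""
    path = Path(path)
    config = _load(path) if path.exists() else {}
    yield config
    _dump(config, path)


def test_edit_config_without_nodes():
    """Editing a config that defines no nodes must not raise on exit."""
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "slurm.conf"
        with edit(conf) as sconf:
            sconf["SlurmctldHost"] = "test-slurm-ops"
            sconf["ClusterName"] = "test-cluster"
        written = conf.read_text(encoding="ascii")
    assert "SlurmctldHost=test-slurm-ops" in written
    assert "ClusterName=test-cluster" in written
```

The case worth covering is a configuration with no node entries at all, since that is the state in which the write-back step failed.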

Relevant logs

From hpc-libs integration tests

Traceback (most recent call last):
  File "/root/lib/charms/hpc_libs/v0/slurm_ops.py", line 150, in _call
    return subprocess.check_output(cmd, input=stdin, stderr=subprocess.PIPE, text=True).strip()
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['snap', 'set', 'slurm', 'slurm.slurmctld-host="test-slurm-ops"', 'slurm.cluster-name="test-cluster"']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/venv/lib/python3.10/site-packages/_pytest/runner.py", line 341, in from_call
    result: Optional[TResult] = func()
  File "/root/venv/lib/python3.10/site-packages/_pytest/runner.py", line 262, in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
  File "/root/venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_callers.py", line 182, in _multicall
    return outcome.get_result()
  File "/root/venv/lib/python3.10/site-packages/pluggy/_result.py", line 100, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/root/venv/lib/python3.10/site-packages/_pytest/runner.py", line 177, in pytest_runtest_call
    raise e
  File "/root/venv/lib/python3.10/site-packages/_pytest/runner.py", line 169, in pytest_runtest_call
    item.runtest()
  File "/root/venv/lib/python3.10/site-packages/_pytest/python.py", line 1792, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/root/venv/lib/python3.10/site-packages/_pytest/python.py", line 194, in pytest_pyfunc_call
    result = testfunction(**testargs)
  File "/root/slurm_ops/test_manager.py", line 42, in test_slurm_config
    slurmctld.config.set({"slurmctld-host": "test-slurm-ops", "cluster-name": "test-cluster"})
  File "/root/lib/charms/hpc_libs/v0/slurm_ops.py", line 252, in set
    _snap("set", "slurm", *args)
  File "/root/lib/charms/hpc_libs/v0/slurm_ops.py", line 163, in _snap
    return _call("snap", *args)
  File "/root/lib/charms/hpc_libs/v0/slurm_ops.py", line 154, in _call
    raise SlurmOpsError(f"command {cmd[0]} failed. Reason:\n{e.stderr}")
lib.charms.hpc_libs.v0.slurm_ops.SlurmOpsError: command snap failed. Reason:
error: cannot perform the following tasks:
- Run configure hook of "slurm" snap (run hook "configure": 
-----
Traceback (most recent call last):
  File "/snap/slurm/503/snap/hooks/configure", line 9, in <module>
    sys.exit(configure(Snap()))
  File "/snap/slurm/503/lib/python3.10/site-packages/slurmhelpers/hooks.py", line 125, in configure
    slurm.update_config(options["slurm"])
  File "/snap/slurm/503/lib/python3.10/site-packages/slurmhelpers/models.py", line 406, in update_config
    with slurmconfig.edit(self.config_file) as sconf:
  File "/snap/slurm/503/usr/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/snap/slurm/503/lib/python3.10/site-packages/slurmutils/editors/slurmconfig.py", line 185, in edit
    dump(content=config, file=file)
  File "/snap/slurm/503/lib/python3.10/site-packages/slurmutils/editors/_editor.py", line 39, in dump_base
    return loc.write_text(marshaller(content), encoding="ascii")
  File "/snap/slurm/503/lib/python3.10/site-packages/slurmutils/editors/slurmconfig.py", line 73, in _marshaller
    if len(config.nodes) != 0:
  File "/snap/slurm/503/lib/python3.10/site-packages/slurmutils/models/slurm.py", line 730, in nodes
    return NodeMap(self._register["nodes"])
KeyError: 'nodes'
-----)