The component of the Slurm snap configuration hook responsible for modifying slurm.conf is broken as of #41. It seems that evaluating the conditional len(config.nodes) != 0 in the slurmutils marshaller fails: per the traceback below, accessing config.nodes raises KeyError: 'nodes' before the length of NodeMap is ever computed. This indicates to me that the default SlurmConfig object generated by the context manager in slurmutils doesn't preseed the configuration fields correctly. This needs further triage on slurmutils to validate that the issue is in fact caused by slurmutils and not the Slurm snap itself.
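To make the suspected failure mode concrete, here is a minimal sketch of the pattern visible in the traceback. The classes below are hypothetical stand-ins, not the actual slurmutils code, and the `.get` fallback is only one possible fix that I am assuming for illustration.

```python
class NodeMap(dict):
    """Hypothetical stand-in for slurmutils' NodeMap; just a dict here."""


class UnseededConfig:
    """Mimics the failing pattern: the register starts without a 'nodes' key."""

    def __init__(self):
        self._register = {}

    @property
    def nodes(self):
        # Same shape as the line in the traceback: a direct key lookup.
        return NodeMap(self._register["nodes"])


class SeededConfig(UnseededConfig):
    """One possible fix (an assumption): fall back to an empty mapping."""

    @property
    def nodes(self):
        return NodeMap(self._register.get("nodes", {}))


def has_nodes(config):
    # The conditional evaluated by the marshaller in the traceback.
    return len(config.nodes) != 0


try:
    has_nodes(UnseededConfig())
    outcome = "no error"
except KeyError as e:
    outcome = f"KeyError: {e}"

print(outcome)                    # KeyError: 'nodes'
print(has_nodes(SeededConfig()))  # False
```

If the real register is indeed created empty, the lookup fails before len() is called, which would match the KeyError in the logs.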
The current workaround is to write the slurm.conf file directly rather than going through the Slurm snap configuration hooks.
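A rough sketch of that workaround, assuming you render the file yourself. The helper names are made up for this example, and the target path is a placeholder since the location of slurm.conf inside the snap isn't stated in this issue. SlurmctldHost and ClusterName are the slurm.conf parameters that I assume correspond to the slurmctld-host and cluster-name snap options; a working slurm.conf needs more than these two entries.

```python
import tempfile
from pathlib import Path


def render_slurm_conf(options):
    """Render key/value pairs as slurm.conf lines (Key=Value, one per line)."""
    return "".join(f"{key}={value}\n" for key, value in options.items())


def write_slurm_conf(path, options):
    """Write the rendered configuration to `path`, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_slurm_conf(options), encoding="ascii")
    return path


# Demo against a temporary directory; replace with the real slurm.conf path.
demo_dir = tempfile.mkdtemp()
conf = write_slurm_conf(
    Path(demo_dir) / "slurm.conf",
    {"SlurmctldHost": "test-slurm-ops", "ClusterName": "test-cluster"},
)
print(conf.read_text())
```

This bypasses the configure hook entirely, so nothing validates the values before slurmctld reads them.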
TODOs
Improve integration test coverage of the Slurm snap to also cover the configuration hooks used to modify the slurm.conf file.
Fix issue in upstream slurmutils.
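For the first TODO, an integration test could drive the configure hook the same way the hpc-libs test in the logs does, by running snap set and checking the exit status. The sketch below is only an outline: the function names are hypothetical, and using json.dumps for the quoting is my assumption about how the values in the logged command were encoded.

```python
import json
import subprocess


def build_snap_set_cmd(options):
    """Build a `snap set slurm ...` command from a dict of slurm options."""
    args = [f"slurm.{key}={json.dumps(value)}" for key, value in options.items()]
    return ["snap", "set", "slurm", *args]


def run_configure_hook(options, runner=subprocess.run):
    """Run `snap set` and return (succeeded, stderr).

    `runner` is injectable so the helper can be exercised without snapd.
    """
    result = runner(build_snap_set_cmd(options), capture_output=True, text=True)
    return result.returncode == 0, result.stderr


cmd = build_snap_set_cmd(
    {"slurmctld-host": "test-slurm-ops", "cluster-name": "test-cluster"}
)
print(cmd)
```

A test on a machine with the snap installed could then assert that run_configure_hook(...) returns True as its first element, which is the check that currently fails.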
Relevant logs
From hpc-libs integration tests
Traceback (most recent call last):
  File "/root/lib/charms/hpc_libs/v0/slurm_ops.py", line 150, in _call
    return subprocess.check_output(cmd, input=stdin, stderr=subprocess.PIPE, text=True).strip()
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['snap', 'set', 'slurm', 'slurm.slurmctld-host="test-slurm-ops"', 'slurm.cluster-name="test-cluster"']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/venv/lib/python3.10/site-packages/_pytest/runner.py", line 341, in from_call
    result: Optional[TResult] = func()
  File "/root/venv/lib/python3.10/site-packages/_pytest/runner.py", line 262, in <lambda>
    lambda: ihook(item=item, **kwds), when=when, reraise=reraise
  File "/root/venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_callers.py", line 182, in _multicall
    return outcome.get_result()
  File "/root/venv/lib/python3.10/site-packages/pluggy/_result.py", line 100, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/root/venv/lib/python3.10/site-packages/_pytest/runner.py", line 177, in pytest_runtest_call
    raise e
  File "/root/venv/lib/python3.10/site-packages/_pytest/runner.py", line 169, in pytest_runtest_call
    item.runtest()
  File "/root/venv/lib/python3.10/site-packages/_pytest/python.py", line 1792, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_callers.py", line 139, in _multicall
    raise exception.with_traceback(exception.__traceback__)
  File "/root/venv/lib/python3.10/site-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
  File "/root/venv/lib/python3.10/site-packages/_pytest/python.py", line 194, in pytest_pyfunc_call
    result = testfunction(**testargs)
  File "/root/slurm_ops/test_manager.py", line 42, in test_slurm_config
    slurmctld.config.set({"slurmctld-host": "test-slurm-ops", "cluster-name": "test-cluster"})
  File "/root/lib/charms/hpc_libs/v0/slurm_ops.py", line 252, in set
    _snap("set", "slurm", *args)
  File "/root/lib/charms/hpc_libs/v0/slurm_ops.py", line 163, in _snap
    return _call("snap", *args)
  File "/root/lib/charms/hpc_libs/v0/slurm_ops.py", line 154, in _call
    raise SlurmOpsError(f"command {cmd[0]} failed. Reason:\n{e.stderr}")
lib.charms.hpc_libs.v0.slurm_ops.SlurmOpsError: command snap failed. Reason:
error: cannot perform the following tasks:
- Run configure hook of "slurm" snap (run hook "configure":
-----
Traceback (most recent call last):
  File "/snap/slurm/503/snap/hooks/configure", line 9, in <module>
    sys.exit(configure(Snap()))
  File "/snap/slurm/503/lib/python3.10/site-packages/slurmhelpers/hooks.py", line 125, in configure
    slurm.update_config(options["slurm"])
  File "/snap/slurm/503/lib/python3.10/site-packages/slurmhelpers/models.py", line 406, in update_config
    with slurmconfig.edit(self.config_file) as sconf:
  File "/snap/slurm/503/usr/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/snap/slurm/503/lib/python3.10/site-packages/slurmutils/editors/slurmconfig.py", line 185, in edit
    dump(content=config, file=file)
  File "/snap/slurm/503/lib/python3.10/site-packages/slurmutils/editors/_editor.py", line 39, in dump_base
    return loc.write_text(marshaller(content), encoding="ascii")
  File "/snap/slurm/503/lib/python3.10/site-packages/slurmutils/editors/slurmconfig.py", line 73, in _marshaller
    if len(config.nodes) != 0:
  File "/snap/slurm/503/lib/python3.10/site-packages/slurmutils/models/slurm.py", line 730, in nodes
    return NodeMap(self._register["nodes"])
KeyError: 'nodes'
-----)