Conversation
Hmm... I'm having some trouble with getting NHC to install properly 😢
unit-compute-2: 16:07:37 INFO unit.compute/2.juju-log Running legacy hooks/install.
unit-compute-2: 16:07:57 INFO unit.compute/2.juju-log #### Installing NHC
unit-compute-2: 16:07:57 WARNING unit.compute/2.install tar: lbnl-nhc-1.4.3.tar.gz: Cannot open: No such file or directory
unit-compute-2: 16:07:57 WARNING unit.compute/2.install tar: Error is not recoverable: exiting now
unit-compute-2: 16:07:57 ERROR unit.compute/2.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-compute-2/charm/./src/charm.py", line 340, in <module>
main.main(SlurmdCharm)
File "/var/lib/juju/agents/unit-compute-2/charm/venv/ops/main.py", line 548, in main
manager.run()
File "/var/lib/juju/agents/unit-compute-2/charm/venv/ops/main.py", line 527, in run
self._emit()
File "/var/lib/juju/agents/unit-compute-2/charm/venv/ops/main.py", line 516, in _emit
_emit_charm_event(self.charm, self.dispatcher.event_name)
File "/var/lib/juju/agents/unit-compute-2/charm/venv/ops/main.py", line 147, in _emit_charm_event
event_to_emit.emit(*args, **kwargs)
File "/var/lib/juju/agents/unit-compute-2/charm/venv/ops/framework.py", line 348, in emit
framework._emit(event)
File "/var/lib/juju/agents/unit-compute-2/charm/venv/ops/framework.py", line 860, in _emit
self._reemit(event_path)
File "/var/lib/juju/agents/unit-compute-2/charm/venv/ops/framework.py", line 950, in _reemit
custom_handler(event)
File "/var/lib/juju/agents/unit-compute-2/charm/./src/charm.py", line 83, in _on_install
if self._slurmd_manager.install():
File "/var/lib/juju/agents/unit-compute-2/charm/src/slurmd_ops.py", line 58, in install
if not self._install_nhc_from_tarball():
File "/var/lib/juju/agents/unit-compute-2/charm/src/slurmd_ops.py", line 93, in _install_nhc_from_tarball
full_path = base_path / os.listdir(base_path)[0]
IndexError: list index out of range
Looks like the NHC tarball is not being properly staged, which is causing tar to fail.
Solved the NHC issue I was seeing in #37. Looks like the NHC tarball currently isn't being properly primed within the charm 😅
Needed to add the `charm: {}` part to make the charm pack correctly. jedel1043 and I found that this part declaration must be included in `charmcraft.yaml`, or the charm won't pack correctly: NHC will be there, but the charm won't :'(

Signed-off-by: Jason C. Nucciarone <[email protected]>
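For anyone hitting the same packing issue, here is roughly what the resulting `parts` stanza could look like. Only the `charm: {}` entry is the piece described above; the `nhc` part name, `dump` plugin, and file list are illustrative assumptions, not the charm's actual configuration:

```yaml
# charmcraft.yaml (excerpt) -- illustrative sketch, not the real file
parts:
  # Once any extra part is declared, the default charm part must be
  # listed explicitly or the charm code itself never gets packed.
  charm: {}
  # Hypothetical part that stages the NHC tarball alongside the charm.
  nhc:
    plugin: dump
    source: .
    prime:
      - lbnl-nhc-1.4.3.tar.gz
```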
Previously, if tar failed to extract the contents of the NHC tarball to `/tmp/nhc`, `_install_nhc_from_tarball` would throw an unhandled exception that would cause the charm to bork. Now we catch the case where tar fails to extract the tarball, log the error output for the administrator to read, and return False so that the slurmd operator can properly handle the install failure.

Signed-off-by: Jason C. Nucciarone <[email protected]>
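For reference, a minimal sketch of the kind of guard described here, assuming the tarball name and extraction path from the traceback above; the helper structure is an illustration, not the charm's actual code:

```python
import logging
import subprocess
from pathlib import Path

logger = logging.getLogger(__name__)

# Hypothetical paths for illustration only.
NHC_TARBALL = Path("lbnl-nhc-1.4.3.tar.gz")
NHC_EXTRACT_DIR = Path("/tmp/nhc")


def _install_nhc_from_tarball() -> bool:
    """Extract the NHC tarball, returning False instead of raising on failure."""
    NHC_EXTRACT_DIR.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        ["tar", "--extract", "--directory", str(NHC_EXTRACT_DIR), "--file", str(NHC_TARBALL)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface tar's stderr so the administrator can see why extraction failed.
        logger.error("Failed to extract %s: %s", NHC_TARBALL, result.stderr)
        return False

    extracted = list(NHC_EXTRACT_DIR.iterdir())
    if not extracted:
        # Guards against the IndexError seen when /tmp/nhc ends up empty.
        logger.error("%s is empty after extraction", NHC_EXTRACT_DIR)
        return False

    full_path = extracted[0]
    # ... continue with the rest of the NHC build/install steps here ...
    return True
```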
Looks good! Just two small comments and I'm ready to merge. I already tested your branch locally against the changes I made to slurmctld experimental, so we'll worry about further polishing as we continue integrating COS into the charms.
Let me know if you have any questions 🔎
This also transitions our systemd notices to a new version, since the old version doesn't support services with dots in their names.
LGTM - we can land as is and start focusing on integrating COS into the slurmctld operator.
canonical/operator-libs-linux#128 is close to landing, so we won't have to carry an out-of-tree version of juju-systemd-notices for too long.
Description
Integrates the Slurm snap into the codebase.
How was the code tested?
WIP
Related issues and/or tasks
Checklist