-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(slurm_ops)!: implement SackdManager for sackd
service
#55
feat(slurm_ops)!: implement SackdManager for sackd
service
#55
Conversation
…ines Currently the Slurm prometheus exporter is only activated on slurmctld machines, so it doesn't make sense to install the exporter on all nodes. If other nodes need the Slurm exporter, then we can add directly to the package list for that specific service type. Signed-off-by: Jason C. Nucciarone <[email protected]>
BREAKING CHANGES: No longer uses the PPA `ubuntu-hpc/slurm-wlm-23.02` as it only supports Jammy, and it does not provide a `sackd` package. Instead, `_AptManager` now pulls the relevant Slurm packages from either `ubuntu-hpc/experimental` or `universe`. . Eventually we'll need to identify how we want to enable the latest and greatest Slurm version in deb format, but we'll boil that pot later. Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
Signed-off-by: Jason C. Nucciarone <[email protected]>
New version contains fixes for adding third-party repositories on Noble Numbat. Signed-off-by: Jason C. Nucciarone <[email protected]>
Variable substitution in the environment file prevents us from needing to set the service type to `Type=forking` in systemd. `sackd` is started the way it should be with the `--systemd` rather than run as a child process in the shell. `slurmd` needs some further work around `juju-systemd-notices` before it can use this approach as well. Other changes: * Clean up docstrings to be accurate and reflect the current implementation of the library. Signed-off-by: Jason C. Nucciarone <[email protected]>
Slurm 23.11 requires permissions to be properly set on the `StateSaveLocation`. The default is `/var/spool` which isn't set up the way Slurm wants. Instead provide `/var/lib/slurm/checkpoint` which is set up the way Slurm wants by the Debian package. Signed-off-by: Jason C. Nucciarone <[email protected]>
Environment variable getters can reuturn str or None, not just str. None is returned if the environment variable is not defined in the corresponding environment file. Signed-off-by: Jason C. Nucciarone <[email protected]>
Also bump dependency versions. `cryptography` is now pinned to the v44 API version rather than v43. Signed-off-by: Jason C. Nucciarone <[email protected]>
Require the `--conf-server` to be provided for start-up. Variable substitution cannot be used in environment files. Declared environment variables are treated as independent key-value pairs. Signed-off-by: Jason C. Nucciarone <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks great!
+1 🔥 |
Epic! One thing I'd like to do in the future is rework how our integration tests are run. While the unit tests help us cover a lot of ground, it would still be nice to ensure that everything is working as expected with the daemons. It'll take time, and probably a separate PR, but we could style the |
This PR introduces the
SackdManager
class which can be used in a downstream sackd-operator to manage thesackd
service on machines.BREAKING CHANGES: This PR drops the PPA
ubuntu-hpc/slurm-wlm-23.02
as it doesn't provide asackd
package, and the Slurm version that we can pull from Universe on Noble is newer than what is provided by the PPA. Theubuntu-hpc/slurm-wlm-23.02
PPA also doesn't provide any packages that work on Noble.Dropping this PPA means that you will now get an older version of Slurm on Jammy machines, but that isn't a huge issue as we're planning to push the Slurm charms to Noble anyway.
Misc.
Also this PR makes it such that
prometheus-slurm-exporter
is only installed onslurmctld
machines since the slurmctld operator is the only Slurm charm that currently integrates with COS via thecos-agent
relation. Helps minimize our install size just a little bit.