Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(slurm_ops)!: implement SackdManager for sackd service #55

Merged

Conversation

NucciTheBoss
Copy link
Member

@NucciTheBoss NucciTheBoss commented Nov 20, 2024

This PR introduces the SackdManager class which can be used in a downstream sackd-operator to manage the sackd service on machines.

BREAKING CHANGES: This PR drops the PPA ubuntu-hpc/slurm-wlm-23.02 as it doesn't provide a sackd package, and the Slurm version that we can pull from Universe on Noble is newer than what is provided by the PPA. The ubuntu-hpc/slurm-wlm-23.02 PPA also doesn't provide any packages that work on Noble.

Dropping this PPA means that you will now get an older version of Slurm on Jammy machines, but that isn't a huge issue as we're planning to push the Slurm charms to Noble anyway.

Misc.

Also this PR makes it such that prometheus-slurm-exporter is only installed on slurmctld machines since the slurmctld operator is the only Slurm charm that currently integrates with COS via the cos-agent relation. Helps minimize our install size just a little bit.

…ines

Currently the Slurm prometheus exporter is only activated on slurmctld machines,
so it doesn't make sense to install the exporter on all nodes. If other nodes need the
Slurm exporter, then we can add directly to the package list for that specific service type.

Signed-off-by: Jason C. Nucciarone <[email protected]>
BREAKING CHANGES: No longer uses the PPA `ubuntu-hpc/slurm-wlm-23.02` as it
 only supports Jammy, and it does not provide a `sackd` package. Instead,
 `_AptManager` now pulls the relevant Slurm packages from either
 `ubuntu-hpc/experimental` or `universe`.
 .
 Eventually we'll need to identify how we want to enable the latest and
 greatest Slurm version in deb format, but we'll boil that pot later.

Signed-off-by: Jason C. Nucciarone <[email protected]>
@NucciTheBoss NucciTheBoss added the enhancement New feature or request label Nov 20, 2024
NucciTheBoss and others added 6 commits December 6, 2024 09:37
New version contains fixes for adding third-party repositories
on Noble Numbat.

Signed-off-by: Jason C. Nucciarone <[email protected]>
Variable substitution in the environment file prevents us from
needing to set the service type to `Type=forking` in systemd.
`sackd` is started the way it should be with the `--systemd` rather
than run as a child process in the shell.

`slurmd` needs some further work around `juju-systemd-notices` before it
can use this approach as well.

Other changes:

* Clean up docstrings to be accurate and reflect the current implementation of the library.

Signed-off-by: Jason C. Nucciarone <[email protected]>
Slurm 23.11 requires permissions to be properly set on the `StateSaveLocation`.
The default is `/var/spool` which isn't set up the way Slurm wants. Instead
provide `/var/lib/slurm/checkpoint` which is set up the way Slurm wants
by the Debian package.

Signed-off-by: Jason C. Nucciarone <[email protected]>
Environment variable getters can reuturn str or None, not just str.
None is returned if the environment variable is not defined in
the corresponding environment file.

Signed-off-by: Jason C. Nucciarone <[email protected]>
Also bump dependency versions. `cryptography` is now pinned to the v44
API version rather than v43.

Signed-off-by: Jason C. Nucciarone <[email protected]>
@NucciTheBoss NucciTheBoss marked this pull request as ready for review December 6, 2024 17:29
@NucciTheBoss NucciTheBoss requested a review from dsloanm December 6, 2024 17:33
Require the `--conf-server` to be provided for start-up. Variable
substitution cannot be used in environment files. Declared environment
variables are treated as independent key-value pairs.

Signed-off-by: Jason C. Nucciarone <[email protected]>
Copy link
Contributor

@jedel1043 jedel1043 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks great!

@jamesbeedy
Copy link

+1 🔥

@NucciTheBoss
Copy link
Member Author

Epic! :shipit:

One thing I'd like to do in the future is rework how our integration tests are run. While the unit tests help us cover a lot of ground, it would still be nice to ensure that everything is working as expected with the daemons. It'll take time, and probably a separate PR, but we could style the slurm_ops tests similar to the Slurm snap gambol tests, but use slurm_ops to drive Slurm's operations.

@NucciTheBoss NucciTheBoss merged commit 18db3f6 into charmed-hpc:main Dec 6, 2024
4 checks passed
@NucciTheBoss NucciTheBoss deleted the nuccitheboss/impl-sackd-manager branch December 11, 2024 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants