[Bug/_AptManager]: StateSaveLocation file permissions are too restrictive #49

Open
NucciTheBoss opened this issue Nov 13, 2024 · 0 comments · May be fixed by #53
Labels
bug Something isn't working

Comments

@NucciTheBoss
Member

The slurmctld service gets all busted and disgusted when starting up after install because the permissions set on the StateSaveLocation are too restrictive. The current mode is 0o600, which grants the slurm user read/write on the directory but no execute (search) bit, and grants nothing to anyone else:

Path("/var/lib/slurm/slurm.state").mkdir(mode=0o600, exist_ok=True)
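For reference, the octal modes discussed in this issue can be decoded with the standard library. This is only an illustrative sketch; `describe_dir_mode` is a hypothetical helper name and is not part of the charm or of Slurm.

```python
import stat


def describe_dir_mode(mode: int) -> str:
    """Render permission bits the way `ls -ld` would show them for a directory."""
    # S_IFDIR marks the entry as a directory so filemode() prints the leading "d".
    return stat.filemode(stat.S_IFDIR | mode)


if __name__ == "__main__":
    for mode in (0o600, 0o700, 0o755):
        print(oct(mode), describe_dir_mode(mode))
```

Note that 0o600 has no `x` for the owner; on a directory, that bit is what allows entries inside it to be accessed.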

However, when we try to start the slurmctld service via systemctl, it fails with the error message below:

× slurmctld.service - Slurm controller daemon
     Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/slurmctld.service.d
             └─10-slurmctld-nofile.conf
     Active: failed (Result: exit-code) since Wed 2024-11-13 14:44:41 UTC; 2s ago
       Docs: man:slurmctld(8)
    Process: 266530 ExecStart=/usr/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
   Main PID: 266530 (code=exited, status=1/FAILURE)
        CPU: 6ms
Nov 13 14:44:41 juju-058453-2 systemd[1]: Started Slurm controller daemon.
Nov 13 14:44:41 juju-058453-2 slurmctld[266530]: slurmctld: fatal: Incorrect permissions on state save loc: /var/lib/slurm/checkp>
Nov 13 14:44:41 juju-058453-2 systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Nov 13 14:44:41 juju-058453-2 systemd[1]: slurmctld.service: Failed with result 'exit-code'.

It looks like slightly "loosening" the permissions to at least 0o700, so that the slurm user can read, write, and execute (traverse) StateSaveLocation, is what's needed. Looking at the slurmd package, it creates a checkpoint directory with the permissions set to 0o755, so it looks like 0o755 is the preferred file permission mode for StateSaveLocation.
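A minimal sketch of what the fix could look like, assuming 0o755 is the target mode as observed above. `ensure_state_dir` is a hypothetical helper name, not the charm's actual code, and it does not handle ownership (making sure the directory belongs to the slurm user would be a separate step). The explicit `chmod` is there because `Path.mkdir` filters its `mode` through the process umask and leaves the mode of an already existing directory untouched when `exist_ok=True`.

```python
import stat
import tempfile
from pathlib import Path


def ensure_state_dir(path: Path, mode: int = 0o755) -> Path:
    """Create `path` if it is missing and force its permission bits to `mode`."""
    # mkdir's mode is masked by the umask and ignored if the directory exists,
    # so treat it as a first approximation only.
    path.mkdir(mode=mode, parents=True, exist_ok=True)
    # chmod sets the exact bits regardless of umask or the previous mode.
    path.chmod(mode)
    return path


if __name__ == "__main__":
    # Demo against a throwaway directory instead of /var/lib/slurm.
    with tempfile.TemporaryDirectory() as tmp:
        state_dir = Path(tmp) / "slurm.state"
        state_dir.mkdir(mode=0o600)  # reproduce the restrictive mode from this issue
        ensure_state_dir(state_dir)
        print(stat.filemode(state_dir.stat().st_mode))
```

Because the helper always calls `chmod`, it would also repair a StateSaveLocation that an earlier install already created with 0o600.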

@NucciTheBoss NucciTheBoss added the bug Something isn't working label Nov 13, 2024