Slurm

Install and configure a Slurm cluster on RHEL/CentOS or Debian/Ubuntu servers

Role Variables

All variables are optional. If nothing is set, the role will install the Slurm client programs, munge, and create a slurm.conf with a single localhost node and a debug partition. See the defaults and example playbooks for examples.
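
For example, applying the role with no variables at all (a minimal sketch of that default behavior) looks like this:

- name: Slurm with role defaults
  hosts: all
  roles:
    - role: galaxyproject.slurm
      become: True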

For the various roles a Slurm node can play, you can either place hosts in the corresponding inventory groups or add values to the slurm_roles list (see the inventory sketch after this list):

  • group slurmservers or slurm_roles: ['controller']
  • group slurmexechosts or slurm_roles: ['exec']
  • group slurmdbdservers or slurm_roles: ['dbd']
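
For example, an Ansible inventory assigning hosts to these groups might look like the following minimal sketch (the host names head01, node01 and node02 are placeholders):

[slurmservers]
head01

[slurmexechosts]
node01
node02

[slurmdbdservers]
head01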

General config options for slurm.conf go in slurm_config, a hash. Keys are Slurm config option names.

Partitions and nodes go in slurm_partitions and slurm_nodes, lists of hashes. The only required key in each hash is name, which becomes the PartitionName or NodeName for that line. All other keys/values are placed onto the line for that partition or node, as in the sketch below.
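
For illustration (the values here are placeholders, not recommendations), variables like these:

slurm_config:
  ClusterName: cluster
slurm_nodes:
  - name: node01
    CPUs: 4
slurm_partitions:
  - name: debug
    Default: YES
    Nodes: node01

would produce slurm.conf entries along the lines of ClusterName=cluster, NodeName=node01 CPUs=4 and PartitionName=debug Default=YES Nodes=node01.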

Options for the additional configuration files acct_gather.conf, cgroup.conf and gres.conf may be specified in slurm_acct_gather_config, slurm_cgroup_config (both hashes) and slurm_gres_config (a list of hashes), respectively.
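
As a small sketch of slurm_acct_gather_config (ProfileHDF5Dir is one acct_gather.conf option; the path shown is a placeholder):

slurm_acct_gather_config:
  ProfileHDF5Dir: "/var/log/slurm/profile"

See the extensive example playbook below for slurm_cgroup_config and slurm_gres_config.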

Set slurm_upgrade to true to upgrade the installed Slurm packages.

You can use slurm_user (a hash) and slurm_create_user (a bool) to pre-create a Slurm user so that UIDs match.

Note that this role requires root access, so enable become either globally in your playbook / on the command line, or just for the role as shown below.

Dependencies

None.

Example Playbooks

Minimal setup, all services on one node:

- name: Slurm all in One
  hosts: all
  vars:
    slurm_roles: ['controller', 'exec', 'dbd']
  roles:
    - role: galaxyproject.slurm
      become: True

More extensive example:

- name: Slurm execution hosts
  hosts: all
  roles:
    - role: galaxyproject.slurm
      become: True
  vars:
    slurm_cgroup_config:
      CgroupMountpoint: "/sys/fs/cgroup"
      CgroupAutomount: yes
      ConstrainCores: yes
      TaskAffinity: no
      ConstrainRAMSpace: yes
      ConstrainSwapSpace: no
      ConstrainDevices: no
      AllowedRamSpace: 100
      AllowedSwapSpace: 0
      MaxRAMPercent: 100
      MaxSwapPercent: 100
      MinRAMSpace: 30
    slurm_config:
      AccountingStorageType: "accounting_storage/none"
      ClusterName: cluster
      GresTypes: gpu
      JobAcctGatherType: "jobacct_gather/none"
      MpiDefault: none
      ProctrackType: "proctrack/cgroup"
      ReturnToService: 1
      SchedulerType: "sched/backfill"
      SelectType: "select/cons_res"
      SelectTypeParameters: "CR_Core"
      SlurmctldHost: "slurmctl"
      SlurmctldLogFile: "/var/log/slurm/slurmctld.log"
      SlurmctldPidFile: "/var/run/slurmctld.pid"
      SlurmdLogFile: "/var/log/slurm/slurmd.log"
      SlurmdPidFile: "/var/run/slurmd.pid"
      SlurmdSpoolDir: "/var/spool/slurmd"
      StateSaveLocation: "/var/spool/slurmctld"
      SwitchType: "switch/none"
      TaskPlugin: "task/affinity,task/cgroup"
      TaskPluginParam: Sched
    slurm_create_user: yes
    slurm_gres_config:
      - File: /dev/nvidia[0-3]
        Name: gpu
        NodeName: gpu[01-10]
        Type: tesla
    slurm_munge_key: "../../../munge.key"
    slurm_nodes:
      - name: "gpu[01-10]"
        CoresPerSocket: 18
        Gres: "gpu:tesla:4"
        Sockets: 2
        ThreadsPerCore: 2
    slurm_partitions:
      - name: gpu
        Default: YES
        MaxTime: UNLIMITED
        Nodes: "gpu[01-10]"
    slurm_roles: ['exec']
    slurm_user:
      comment: "Slurm Workload Manager"
      gid: 888
      group: slurm
      home: "/var/lib/slurm"
      name: slurm
      shell: "/usr/sbin/nologin"
      uid: 888

License

MIT

Author Information

View contributors on GitHub
