This repository has been archived by the owner on Aug 9, 2024. It is now read-only.

feat!: overhaul slurmd charm API #34

Merged
20 changes: 16 additions & 4 deletions .github/workflows/ci.yaml
@@ -23,7 +23,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: woke
uses: get-woke/woke-action@v0
with:
@@ -35,18 +35,29 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Install dependencies
run: python3 -m pip install tox
- name: Run linters
run: tox -e lint

type:
name: Type check with pyright
runs-on: ubuntu-22.04
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install dependencies
run: python3 -m pip install tox
- name: Run pyright
run: tox -e type

unit-test:
name: Unit tests
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Install dependencies
run: python3 -m pip install tox
- name: Run tests
@@ -63,10 +74,11 @@ jobs:
needs:
- inclusive-naming-check
- lint
- type
- unit-test
steps:
- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Setup operator environment
uses: charmed-kubernetes/actions-operator@main
with:
4 changes: 3 additions & 1 deletion .gitignore
@@ -7,4 +7,6 @@ __pycache__/
*.py[cod]
.idea
.vscode/
version

# Disable woke checking for nhc.conf.tmpl
src/templates/nhc.conf.tmpl
34 changes: 27 additions & 7 deletions README.md
@@ -24,13 +24,33 @@ This operator should be used with Juju 3.x or greater.
```shell
$ juju deploy slurmctld --channel edge
$ juju deploy slurmd --channel edge
$ juju deploy slurmdbd --channel edge
$ juju deploy mysql --channel 8.0/edge
$ juju deploy mysql-router slurmdbd-mysql-router --channel dpe/edge
$ juju integrate slurmctld:slurmd slurmd:slurmd
$ juju integrate slurmdbd-mysql-router:backend-database mysql:database
$ juju integrate slurmdbd:database slurmdbd-mysql-router:database
$ juju integrate slurmctld:slurmdbd slurmdbd:slurmdbd
$ juju integrate slurmctld:slurmd slurmd:slurmctld
```

### Operations
This charm simplifies and hardens cluster administration by codifying common tasks as charm actions.

#### Partition Configuration
Specify partition parameters using the `partition-config` charm configuration option.

##### Use the `partition-config` option to set custom partition parameters.
```bash
$ juju config slurmd partition-config="State=INACTIVE"
```
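
As a rough illustration of the `key=value` format that `partition-config` accepts, the following Python sketch splits such a string into a dictionary. The function name `parse_partition_config` is hypothetical and is not taken from the charm's source; how the charm itself parses the value is not shown in this change.

```python
def parse_partition_config(raw: str) -> dict[str, str]:
    """Split a space-separated `key=value` string into a dictionary.

    Hypothetical helper for illustration only; not the charm's implementation.
    """
    params: dict[str, str] = {}
    for token in raw.split():
        # Split on the first "=" only, so a value may itself contain "=".
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        params[key] = value
    return params


print(parse_partition_config("State=INACTIVE"))
# prints {'State': 'INACTIVE'}
```

The sketch assumes that values contain no spaces, which holds for the examples in this README.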

#### Node Configuration Parameters
You can get and set the node configuration using the `node-config` action.

##### Use the `node-config` action to get the node configuration for the unit.
```bash
$ juju run --quiet slurmd/0 node-config --format json | jq ".[].results.node.config"
"NodeName=juju-462521-4 NodeAddr=10.240.222.28 State=UNKNOWN RealMemory=64012 CPUs=12 ThreadsPerCore=2 CoresPerSocket=6 SocketsPerBoard=1"
```

##### Use the `node-config` action to set a custom weight value for the node.
```bash
$ juju run --quiet slurmd/0 node-config parameters="Weight=5000" --format json | jq ".[].results.node.config"
"NodeName=juju-462521-4 NodeAddr=10.240.222.28 State=UNKNOWN RealMemory=64012 CPUs=12 ThreadsPerCore=2 CoresPerSocket=6 SocketsPerBoard=1 Weight=5000"
```
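
The two outputs above differ only in the trailing `Weight=5000`. One way to picture that behavior is as a merge of `key=value` pairs, sketched below in Python. The helper `merge_node_config` is hypothetical and only mimics the observable result; it assumes values contain no spaces and is not the charm's actual code.

```python
def merge_node_config(current: str, overrides: str) -> str:
    """Return `current` with the `key=value` pairs from `overrides` applied.

    Hypothetical helper for illustration only; not the charm's implementation.
    """
    merged: dict[str, str] = {}
    for token in current.split() + overrides.split():
        key, _, value = token.partition("=")
        # A later occurrence of a key replaces the earlier value but keeps
        # the key's original position; new keys are appended at the end.
        merged[key] = value
    return " ".join(f"{key}={value}" for key, value in merged.items())


current = "NodeName=juju-462521-4 NodeAddr=10.240.222.28 State=UNKNOWN"
print(merge_node_config(current, "Weight=5000"))
# prints NodeName=juju-462521-4 NodeAddr=10.240.222.28 State=UNKNOWN Weight=5000
```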

## Project & Community
15 changes: 0 additions & 15 deletions actions.yaml

This file was deleted.

108 changes: 88 additions & 20 deletions charmcraft.yaml
@@ -1,7 +1,29 @@
# Copyright 2020 Omnivector, LLC
# See LICENSE file for licensing details.

name: slurmd
type: charm

summary: |
Slurmd, the compute node daemon of Slurm.

description: |
This charm provides slurmd, munged, and the bindings to other utilities
that make lifecycle operations a breeze.

slurmd is the compute node daemon of Slurm. It monitors all tasks running
on the compute node, accepts work (tasks), launches tasks, and kills
running tasks upon request.

links:
contact: https://matrix.to/#/#hpc:ubuntu.com

issues:
- https://github.com/charmed-hpc/slurmd-operator/issues

source:
- https://github.com/charmed-hpc/slurmd-operator

assumes:
- juju

bases:
- build-on:
- name: ubuntu
@@ -10,25 +32,71 @@ bases:
- name: ubuntu
channel: "22.04"
architectures: [amd64]
- name: centos
channel: "7"
architectures: [amd64]

parts:
charm:
build-packages: [git]
charm-python-packages: [setuptools]

# Create a version file and pack it into the charm. This is dynamically generated
# as part of the build process for a charm to ensure that the git revision of the
# charm is always recorded in this version file.
version-file:
plugin: nil
build-packages:
- git
- wget
override-build: |
VERSION=$(git -C $CRAFT_PART_SRC/../../charm/src describe --dirty --always)
echo "Setting version to $VERSION"
echo $VERSION > $CRAFT_PART_INSTALL/version
stage:
- version
wget https://github.com/mej/nhc/releases/download/1.4.3/lbnl-nhc-1.4.3.tar.gz
craftctl default

provides:
slurmctld:
interface: slurmd
limit: 1

config:
options:
partition-config:
type: string
default: ""
description: >
Additional partition configuration parameters, specified as space-separated `key=value`
pairs on a single line. Find a list of all available partition configuration parameters
[here](https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION).


Example usage:
```bash
$ juju config slurmd partition-config="DefaultTime=45:00 MaxTime=1:00:00"
```

nhc-conf:
default: ""
type: string
description: >
Multiline string.
These lines are appended to the `nhc.conf` maintained by the charm.

Example usage:
```bash
$ juju config slurmd nhc-conf="$(cat extra-nhc.conf)"
```

actions:
node-configured:
description: Remove a node from DownNodes when the reason is `New node`.

node-config:
description: >
Set or return node configuration parameters.

To get the current node configuration for this unit:
```bash
$ juju run slurmd/0 node-config
```

To set node-level configuration parameters for the unit `slurmd/0`:
```bash
$ juju run slurmd/0 node-config parameters="Weight=200 Gres=gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_consume:4G"
```

params:
parameters:
type: string
description: >
Node configuration parameters, as defined [here](https://slurm.schedmd.com/slurm.conf.html#SECTION_NODE-CONFIGURATION).

show-nhc-config:
description: Display `nhc.conf`.
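
The `nhc-conf` description above says that the supplied lines are appended to the `nhc.conf` maintained by the charm. A minimal Python sketch of that append step follows. The function `render_nhc_conf` and the sample check lines are illustrative assumptions; the charm's real template (`src/templates/nhc.conf.tmpl`) and rendering code are not shown in this change.

```python
def render_nhc_conf(base: str, extra: str) -> str:
    """Append user-supplied lines to a base `nhc.conf` body.

    Hypothetical helper for illustration only; not the charm's implementation.
    """
    lines = base.splitlines()
    # Skip blank lines in the user-supplied text so the result stays compact.
    lines.extend(line for line in extra.splitlines() if line.strip())
    return "\n".join(lines) + "\n"


# Both check lines below are illustrative placeholders.
base = "* || check_fs_mount_rw -f /\n"
extra = "* || check_ps_service -u root -S sshd\n"
print(render_nhc_conf(base, extra), end="")
```

Appending rather than replacing means that the charm's own checks stay in place and the `nhc-conf` option only adds to them.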
40 changes: 0 additions & 40 deletions config.yaml

This file was deleted.

41 changes: 4 additions & 37 deletions dispatch
@@ -1,44 +1,11 @@
#!/bin/bash
# This hook installs the dependencies needed to run the charm,
# creates the dispatch executable, regenerates the symlinks for start and
# upgrade-charm, and kicks off the operator framework.

set -e

# Source the os-release information into the env
. /etc/os-release

if ! [[ -f '.installed' ]]
then
if [[ $ID == 'centos' ]]
then
# Install dependencies and build custom python
yum -y install epel-release
yum -y install wget gcc make tar bzip2-devel zlib-devel xz-devel openssl-devel libffi-devel sqlite-devel ncurses-devel

export PYTHON_VERSION=3.8.16
wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tar.xz -P /tmp
tar xvf /tmp/Python-${PYTHON_VERSION}.tar.xz -C /tmp
cd /tmp/Python-${PYTHON_VERSION}
./configure --enable-optimizations
make -C /tmp/Python-${PYTHON_VERSION} -j $(nproc) altinstall
cd $OLDPWD
rm -rf /tmp/Python*

elif [[ $ID == 'ubuntu' ]]
then
# Necessary to compile and install NHC
apt-get install --assume-yes make
fi
touch .installed
fi

# set the correct python bin path
if [[ $ID == "centos" ]]
then
PYTHON_BIN="/usr/bin/env python3.8"
else
PYTHON_BIN="/usr/bin/env python3"
# Necessary to compile and install NHC
apt-get install --assume-yes make
touch .installed
fi

JUJU_DISPATCH_PATH="${JUJU_DISPATCH_PATH:-$0}" PYTHONPATH=lib:venv $PYTHON_BIN ./src/charm.py
JUJU_DISPATCH_PATH="${JUJU_DISPATCH_PATH:-$0}" PYTHONPATH=lib:venv /usr/bin/env python3 ./src/charm.py