milex-scheduler is a package that simplifies scheduling and running jobs on a
SLURM cluster. It provides an abstraction layer over SLURM shell scripts and
offers the following features:
- Reproducibility of job configurations
- User-agnostic job scheduling
- Job configurations saved in a human-readable format (JSON)
- Automated job scheduling and submission
- Bundling and submitting multiple jobs together with a single command
- Dependencies between jobs managed using names instead of SLURM-specific job IDs
- Submitting jobs remotely across SSH connections
You can find the full documentation here.
git clone git@github.com:Ciela-Institute/milex_scheduler.git
cd milex_scheduler
pip install -e .
milex-configuration
This command will configure the user-specific details required to connect to remote SLURM machines and activate virtual environments. More details can be found in the Milex Configuration section.
[project.scripts]
my-script = "my_package.module:main"
my-script-cli = "my_package.module:cli"
More details can be found in the Register a script section.
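The two entry points above refer to callables in my_package/module.py. As a minimal sketch, assuming (this README does not confirm it) that cli parses the application-specific arguments and main runs the job with them, such a module might look like the following. The argument name my_job_arg1 is borrowed from the scheduling example in this README; everything else is hypothetical, and the actual contract is the one described in the Register a script section.

```python
import argparse


def cli(argv=None):
    """Parse the application-specific arguments (hypothetical contract)."""
    parser = argparse.ArgumentParser(prog="my-script")
    parser.add_argument("--my_job_arg1", default="arg1")
    return parser.parse_args(argv)


def main(args=None):
    """Run the job; parse the command line when no arguments are given."""
    if args is None:
        args = cli()
    message = f"running my-script with my_job_arg1={args.my_job_arg1}"
    print(message)
    return message
```

Keeping argument parsing in its own function means the same argument definitions can be reused without running the job itself.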
# Application args: --my_job_arg1. SLURM args: --time, --cpus_per_task, --gres, --mem.
milex-schedule my-script --name=my-script \
  --my_job_arg1=arg1 \
  --time=00-01:00 \
  --cpus_per_task=1 \
  --gres=gpu:1 \
  --mem=16G
This command schedules my-script to run for 1 hour, using 1 CPU, 1 GPU, and
16 GB of memory. Note that both the application-specific arguments and SLURM
arguments are passed in the same command.
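The value --time=00-01:00 uses SLURM's days-hours:minutes notation, which is why it reads as 1 hour. As an illustration only, the following Python sketch converts that one notation to a timedelta. The helper parse_days_hours_minutes is hypothetical and not part of milex-scheduler, and SLURM accepts other time formats that are not handled here.

```python
from datetime import timedelta


def parse_days_hours_minutes(value: str) -> timedelta:
    """Convert a 'days-hours:minutes' string such as '00-01:00' to a timedelta."""
    days, _, clock = value.partition("-")
    hours, _, minutes = clock.partition(":")
    if not (days and hours and minutes):
        raise ValueError(f"expected days-hours:minutes, got {value!r}")
    return timedelta(days=int(days), hours=int(hours), minutes=int(minutes))


print(parse_days_hours_minutes("00-01:00"))  # 1:00:00
```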
The --name argument is optional and specifies the name of the job. If not
provided, the name of the script is used.
Once the job is scheduled, you can submit it at any time just by using the name of the application.
milex-submit my-script --machine=machine
This command submits my-script on the machine whose name, machine, is specified
in your configuration (see Milex Configuration). You can also schedule and
submit a job at the same time to skip a step.
# Application and SLURM args follow the --machine flag
milex-schedule my-script --submit --machine=machine \
  ...
Use the --append keyword to include additional jobs in a bundle. Use the --name
keyword to specify the name of the bundle.
milex-schedule job1 --name=my-bundle
milex-schedule job2 --append --name=my-bundle
You can then submit the bundle using the milex-submit command.
milex-submit my-bundle --machine=machine
If --append is not used, two separate bundles are created instead, each with a
unique timestamp.
$MILEX
└─ jobs
   ├─ my-bundle_210901120000
   └─ my-bundle_210901120001
Furthermore, milex-submit my-bundle would then submit only the most recently created bundle, not both, because the default behavior is to submit the last bundle created.
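The "last bundle created" rule can be pictured as picking the directory with the largest timestamp suffix. The sketch below is an illustration under assumptions, not the package's implementation: it assumes bundle directories are named name_timestamp as in the listing above, with a 12-digit timestamp ordered from most to least significant field, and latest_bundle is a hypothetical helper.

```python
import re


def latest_bundle(directories, name):
    """Return the directory for bundle `name` with the largest timestamp, or None."""
    pattern = re.compile(rf"{re.escape(name)}_(\d{{12}})")
    matches = []
    for directory in directories:
        match = pattern.fullmatch(directory)
        if match:
            matches.append((match.group(1), directory))
    # Fixed-width digit strings sort chronologically when compared as text
    # (assuming the most significant field comes first).
    return max(matches)[1] if matches else None


jobs = ["my-bundle_210901120000", "my-bundle_210901120001", "other_210901130000"]
print(latest_bundle(jobs, "my-bundle"))  # my-bundle_210901120001
```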
Dependencies can be set by specifying the name of the job using the
--dependencies argument.
milex-schedule job2 --append --name=my-bundle --dependencies job1
In this example, job2 will only be submitted once job1 has completed.
Multiple dependencies can be added as follows:
milex-schedule job3 --append --name=my-bundle --dependencies job1 job2
Notes:
- Any dependency loop is detected and raises an error (e.g. if job1 depends on job2 and vice versa).
- The order in which jobs are appended does not matter. Jobs are sorted in topological order before submission.
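
To make the two notes concrete, here is a small Python sketch of topological ordering with cycle detection, using the standard library's graphlib module (Python 3.9+). It illustrates the general technique only; how milex-scheduler implements it internally is not shown in this README, and submission_order is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def submission_order(dependencies):
    """Order jobs so that every job comes after the jobs it depends on.

    `dependencies` maps a job name to the names it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Jobs listed out of order, mirroring the job1/job2/job3 example.
bundle = {"job3": ["job1", "job2"], "job2": ["job1"], "job1": []}
print(submission_order(bundle))  # ['job1', 'job2', 'job3']

# A dependency loop is reported as an error instead of producing an order.
try:
    submission_order({"job1": ["job2"], "job2": ["job1"]})
except CycleError as error:
    print("dependency loop:", error.args[1])
```

Because the ordering is computed from the dependency mapping alone, the sequence in which jobs were appended has no effect on the result.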