diff --git a/.gitignore b/.gitignore
index bd87454..56b89bb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -49,3 +49,4 @@ docs/
 # translation temp files
 po/*~
+*~
diff --git a/episodes/01-introduction.md b/episodes/01-introduction.md
index bfefa6a..2b7eb47 100644
--- a/episodes/01-introduction.md
+++ b/episodes/01-introduction.md
@@ -5,44 +5,50 @@ exercises: 30
 ---

 ::: questions
+
 - "How do I run a simple command with Maestro?"
+
 :::

 ::: objectives
+
 - "Create a Maestro YAML file"
-:::
+:::

 ## What is the workflow I'm interested in?

-In this lesson we will make an experiment that takes an application which runs
-in parallel and investigate it's scalability. To do that we will need to gather
-data, in this case that means running the application multiple times with
-different numbers of CPU cores and recording the execution time. Once we've
-done that we need to create a visualisation of the data to see how it compares
-against the ideal case.
+In this lesson we will make an experiment that takes an application
+which runs in parallel and investigates its scalability. To do that we
+will need to gather data; in this case that means running the
+application multiple times with different numbers of CPU cores and
+recording the execution time. Once we've done that we need to create a
+visualization of the data to see how it compares against the ideal
+case.

-From the visualisation we can then decide at what scale it
-makes most sense to run the application at in production to maximise the use of
+From the visualization we can then decide at what scale it makes most
+sense to run the application in production to maximize the use of
 our CPU allocation on the system.

-We could do all of this manually, but there are useful tools to help us manage
-data analysis pipelines like we have in our experiment. Today we'll learn about
-one of those: Maestro.
+We could do all of this manually, but there are useful tools to help
+us manage data analysis pipelines like the one in our
+experiment. Today we'll learn about one of those: Maestro.

-In order to get started with Maestro, let's begin by taking a simple command
-and see how we can run that via Maestro. Let's choose the command `hostname`
-which prints out the name of the host where the command is executed:
+To get started with Maestro, let's begin by taking a simple command
+and seeing how we can run it via Maestro. Let's choose the command
+`hostname`, which prints out the name of the host where the command is
+executed:

 ```bash
-janeh@pascal83:~$ hostname
+hostname
 ```
+
 ```output
 pascal83
 ```

-That prints out the result but Maestro relies on files to know the status of
-your workflow, so let's redirect the output to a file:
+That prints out the result, but Maestro relies on files to know the
+status of your workflow, so let's redirect the output to a file:

 ```bash
 janeh@pascal83:~$ hostname > hostname_login.txt
@@ -50,95 +56,110 @@ janeh@pascal83:~$ hostname > hostname_login.txt

 ## Writing a Maestro YAML

-Edit a new text file named `hostname.yaml`.
+Edit a new text file named `hostname.yaml`. The file extension is a
+recursive acronym for ["YAML Ain't Markup Language"][yaml-lang], a
+popular format for configuration files and key-value data
+serialization. For more, see the Wikipedia page, especially [YAML
+Syntax](https://en.wikipedia.org/wiki/YAML#Syntax).

-Contents of `hostname.yaml`:
+
+[yaml-lang]: https://yaml.org
+
+Contents of `hostname.yaml` (spaces matter!):

 ```yml
 description:
-    name: Hostnames
-    description: Report a node's hostname.
+  name: Hostnames
+  description: Report a node's hostname.

 study:
-    - name: hostname-login
-      description: Write the login node's hostname to a file
-      run:
-          cmd: |
-              hostname > hostname_login.txt
+  - name: hostname-login
+    description: Write the login node's hostname to a file.
+    run:
+      cmd: |
+        hostname > hostname_login.txt
 ```

 ::: callout

 ## Key points about this file

-1. The name of `hostname.yaml` is not very important; it gives us information
-   about file contents and type, but maestro will behave the same if you rename
-   it to `hostname` or `foo.txt`.
-1. The file specifies fields in a hierarchy. For example, `name`, `description`,
-   and `run` are all passed to `study` and are at the same level in the hierarchy.
-   `description` and `study` are both at the top level in the hierarchy.
-1. Indentation indicates the hierarchy and should be consistent. For example, all
-   the fields passed directly to `study` are indented relative to `study` and
-   their indentation is all the same.
-1. The commands executed during the study are given under `cmd`. Starting this
-   entry with `|` and a newline character allows us to specify multiple commands.
-1. The example YAML file above is pretty minimal; all fields shown are required.
-1. The names given to `study` can include letters, numbers, and special characters.
-
+1. The name of `hostname.yaml` is not very important; it gives us
+   information about file contents and type, but Maestro will behave
+   the same if you rename it to `hostname` or `foo.txt`.
+2. The file specifies fields in a hierarchy. For example, `name`,
+   `description`, and `run` are all passed to `study` and are at the
+   same level in the hierarchy. `description` and `study` are both at
+   the top level in the hierarchy.
+3. Indentation indicates the hierarchy and should be consistent. For
+   example, all the fields passed directly to `study` are indented
+   relative to `study` and their indentation is all the same.
+4. The commands executed during the study are given under
+   `cmd`. Starting this entry with `|` and a newline character allows
+   us to specify multiple commands.
+5. The example YAML file above is pretty minimal; all fields shown are
+   required.
+6. The names given to `study` can include letters, numbers, and
+   special characters.

 :::

-Back in the shell we'll run our new rule. At this point, we may see an error if
-a required field is missing or if our indentation is inconsistent.
+Back in the shell we'll run our new step. At this point, we may see an
+error if a required field is missing or if our indentation is
+inconsistent.

 ```bash
-$ maestro run hostname.yaml
+janeh@pascal83:~$ maestro run hostname.yaml
 ```

 ::: callout

 ## `bash: maestro: command not found...`

-If your shell tells you that it cannot find the command `maestro` then we need
-to make the software available somehow. In our case, this means activating the
-python virtual environment where maestro is installed.
+If your shell tells you that it cannot find the command `maestro`, then
+we need to make the software available somehow. In our case, this
+means activating the Python virtual environment where Maestro is
+installed.
+
 ```bash
 source /usr/global/docs/training/janeh/maestro_venv/bin/activate
 ```

-You can tell this command has already been run when `(maestro_venv)` appears
-before your command prompt:
-
+You can tell this command has already been run when `(maestro_venv)`
+appears before your command prompt:

 ```bash
 janeh@pascal83:~$ source /usr/global/docs/training/janeh/maestro_venv/bin/activate
 (maestro_venv) janeh@pascal83:~$
 ```

-Now that the `maestro_venv` virtual environment has been activated, the `maestro`
-command should be available, but let's double check
+Now that the `maestro_venv` virtual environment has been activated,
+the `maestro` command should be available, but let's double-check:

 ```bash
 (maestro_venv) janeh@pascal83:~$ which maestro
 ```
+
 ```output
 /usr/global/docs/training/janeh/maestro_venv/bin/maestro
 ```

-:::
+:::

 ## Running maestro

-Once you have `maestro` available to you, run `maestro run hostname.yaml`
-and enter `y` when prompted
+Once you have `maestro` available to you,
+run `maestro run hostname.yaml` and enter `y` when prompted:

 ```bash
 (maestro_venv) janeh@pascal83:~$ maestro run hostname.yaml
+```
+
+```output
 [2024-03-20 15:39:34: INFO] INFO Logging Level -- Enabled
 [2024-03-20 15:39:34: WARNING] WARNING Logging Level -- Enabled
 [2024-03-20 15:39:34: CRITICAL] CRITICAL Logging Level -- Enabled
 [2024-03-20 15:39:34: INFO] Loading specification -- path = hostname.yaml
-[2024-03-20 15:39:34: INFO] Directory does not exist. Creating directories to /g/g0/janeh/Hostnames_20240320-153934/logs
+[2024-03-20 15:39:34: INFO] Directory does not exist. Creating directories to ~/Hostnames_20240320-153934/logs
 [2024-03-20 15:39:34: INFO] Adding step 'hostname-login' to study 'Hostnames'...
 [2024-03-20 15:39:34: INFO]
 ------------------------------------------
@@ -148,32 +169,36 @@ Submission throttle limit = 0
 Use temporary directory = False
 Hash workspaces         = False
 Dry run enabled         = False
-Output path             = /g/g0/janeh/Hostnames_20240320-153934
+Output path             = ~/Hostnames_20240320-153934
 ------------------------------------------
 Would you like to launch the study? [yn] y
 Study launched successfully.
 ```

-and look at the outputs. You should have a new directory whose name includes a
-date and timestamp and that starts with the `name` given under `description`
-at the top of `hostname.yaml`.
+Then look at the outputs. You should have a new directory whose name
+includes a date and timestamp and that starts with the `name` given
+under `description` at the top of `hostname.yaml`.

 In that directory will be a subdirectory for every `study` run from
-`hostname.yaml`. The subdirectories for each study include all output files
-for that study
+`hostname.yaml`. The subdirectories for each study include all output
+files for that study.

 ```bash
 (maestro_venv) janeh@pascal83:~$ cd Hostnames_20240320-153934/
 (maestro_venv) janeh@pascal83:~/Hostnames_20240320-153934$ ls
 ```
+
 ```output
 batch.info      Hostnames.pkl        Hostnames.txt  logs  status.csv
 hostname-login  Hostnames.study.pkl  hostname.yaml  meta
 ```
+
 ```bash
 (maestro_venv) janeh@pascal83:~/Hostnames_20240320-153934$ cd hostname-login/
 (maestro_venv) janeh@pascal83:~/Hostnames_20240320-153934/hostname-login$ ls
-```output
+```
+
+```output
 hostname-login.2284862.err  hostname-login.2284862.out  hostname-login.sh  hostname_login.txt
 ```

@@ -181,10 +206,10 @@

 To which file will the login node's hostname, `pascal83`, be written?

-1. hostname-login.2284862.err
-2. hostname-login.2284862.out
-3. hostname-login.sh
-4. hostname_login.txt
+1. `hostname-login.2284862.err`
+2. `hostname-login.2284862.out`
+3. `hostname-login.sh`
+4. `hostname_login.txt`

 :::::: solution

 (4) hostname_login.txt

@@ -198,44 +223,47 @@ we'll see that output, if the run worked!

 ::: challenge

 This one is tricky! In the example above, `pascal83` was written to
-`.../Hostnames_{date}_{time}/hostname-login/hostname_login.txt`.
+`~/Hostnames_{date}_{time}/hostname-login/hostname_login.txt`.

 Where would `hello` be written for the following YAML?

 ```yml
 description:
-    name: MyHello
-    description: Report a node's hostname.
+  name: MyHello
+  description: Report a greeting.

 study:
-    - name: give-salutation
-      description: Write the login node's hostname to a file
-      run:
-          cmd: |
-              echo "hello" > greeting.txt
+  - name: give-salutation
+    description: Write a greeting to a file
+    run:
+      cmd: |
+        echo "hello" > greeting.txt
 ```
-
-1. `.../give-salutation_{date}_{time}/greeting/greeting.txt`
-2. `.../greeting_{date}_{time}/give_salutation/greeting.txt`
-3. `.../MyHello_{date}_{time}/give-salutation/greeting.txt`
-4. `.../MyHello_{date}_{time}/greeting/greeting.txt`
+1. `~/give-salutation_{date}_{time}/greeting/greeting.txt`
+2. `~/greeting_{date}_{time}/give_salutation/greeting.txt`
+3. `~/MyHello_{date}_{time}/give-salutation/greeting.txt`
+4. `~/MyHello_{date}_{time}/greeting/greeting.txt`

 :::::: solution

 (3) `~/MyHello_{date}_{time}/give-salutation/greeting.txt`

-The toplevel folder created starts with the `name` field under `description`; here, that's `MyHello`.
-Its subdirectory is named after the `study`; here, that's `give-salutation`.
-The file created is `greeting.txt` and this stores the output of `echo "hello"`.
+The top-level folder created starts with the `name` field under
+`description`; here, that's `MyHello`. Its subdirectory is named after
+the `study` step; here, that's `give-salutation`. The file created is
+`greeting.txt`, and this stores the output of `echo "hello"`.

 ::::::
 :::

 ::: keypoints

-- "You execute `maestro run` with a YAML file including information about your run."
-- "Your run includes a description and at least one study (a step in your run)."
-- "Your maestro run creates a directory with subdirectories and outputs for each study."
+- You execute `maestro run` with a YAML file including information
+  about your run.
+- Your run includes a description and at least one study (a step in
+  your run).
+- Your Maestro run creates a directory with subdirectories and
+  outputs for each study.

 :::
diff --git a/episodes/02-maestro_on_the_cluster.md b/episodes/02-maestro_on_the_cluster.md
index 19bdfa2..d006ed4 100644
--- a/episodes/02-maestro_on_the_cluster.md
+++ b/episodes/02-maestro_on_the_cluster.md
@@ -10,77 +10,97 @@ exercises: 20

 :::

-# How do I run Maestro on the cluster?
+:::::::::::: callout

-What happens when we want to run on the cluster ("to run a batch job")rather
-than the login node? The cluster we are using uses Slurm, and Maestro has
-built in support for Slurm. We just need to tell Maestro which resources we
-need Slurm to grab for our run.
+If you've opened a new terminal, make sure Maestro is available.

-First, we need to add a `batch` block to our YAML file, where we'll provide the
-names of the machine, bank, and queue in which your jobs should run.
+```bash
+source /usr/global/docs/training/janeh/maestro_venv/bin/activate
+```
+
+:::::::::::::::::
+
+## How do I run Maestro on the cluster?
+
+What happens when we want to run on the cluster ("to run a batch job")
+rather than the login node? The cluster we are using runs Slurm, and
+Maestro has built-in support for Slurm. We just need to tell Maestro
+which resources we need Slurm to grab for our run.
+
+First, we need to add a `batch` block to our YAML file, where we'll
+provide the names of the machine, bank, and queue in which your jobs
+should run.

 ```yml
 batch:
-    type: slurm
-    host: quartz  # enter the machine you'll run on
-    bank: guests  # enter the bank to charge
-    queue: pdebug # enter the partition in which your job should run
+  type: slurm
+  host: quartz  # enter the machine you'll run on
+  bank: guests  # enter the bank to charge
+  queue: pdebug # enter the partition in which your job should run
 ```

-Second, we need to specify the number of nodes, number of processes, and walltime
-for *each step* in our YAML file. This information goes under each
-step's `run` field. Here we specify 1 node, 1 process, and a time limit of 30 seconds:
+Second, we need to specify the number of nodes, number of processes,
+and walltime for _each step_ in our YAML file. This information goes
+under each step's `run` field. Here we specify 1 node, 1 process, and
+a time limit of 30 seconds:

 ```yml
 (...)
-    run:
-        cmd: |
-            hostname >> hostname.txt
-        nodes: 1
-        procs: 1
-        walltime: "00:00:30"
+  run:
+    cmd: |
+      hostname >> hostname.txt
+    nodes: 1
+    procs: 1
+    walltime: "00:00:30"
 ```

-Whereas `run` previously held only info about the command we wanted to execute, steps run on the cluster include a specification of the resources needed in order to execute. **Note** that the format of the walltime includes quotation marks -- "{Hours}:{Minutes}:{Seconds}".
+Whereas `run` previously held only info about the command we wanted to
+execute, steps run on the cluster include a specification of the
+resources needed in order to execute. __Note__ that the format of the
+walltime includes quotation marks -- `"{Hours}:{Minutes}:{Seconds}"`.

 With these changes, our updated YAML file might look like

 ```yml
 description:
-    name: Hostnames
-    description: Report a node's hostname.
+  name: Hostnames
+  description: Report a node's hostname.

 batch:
-    type: slurm
-    host: quartz # machine to run on
-    bank: guests # bank
-    queue: pdebug # partition
+  type: slurm
+  host: quartz  # machine to run on
+  bank: guests  # bank
+  queue: pdebug # partition

 study:
-    - name: hostname-login
-      description: Write the login node's hostname to a file
-      run:
-          cmd: |
-              hostname > hostname_login.txt
-    - name: hostname_batch
-      description: Write the node's hostname to a file
-      run:
-          cmd: |
-              hostname >> hostname.txt
-          nodes: 1
-          procs: 1
-          walltime: "00:00:30"
+  - name: hostname-login
+    description: Write the login node's hostname to a file
+    run:
+      cmd: |
+        hostname > hostname_login.txt
+  - name: hostname_batch
+    description: Write the node's hostname to a file
+    run:
+      cmd: |
+        hostname >> hostname.txt
+      nodes: 1
+      procs: 1
+      walltime: "00:00:30"
 ```

-Note that we left the rule `hostname-login` as is. Because we do not specify any info for slurm under this original step's `run` field -- like nodes, processes, or walltime -- this step will continue running on the login node and only `hostname_batch` will be handed off to slurm.
+Note that we left the step `hostname-login` as is. Because we do not
+specify any info for Slurm under this original step's `run` field --
+like nodes, processes, or walltime -- this step will continue running
+on the login node and only `hostname_batch` will be handed off to
+Slurm.

 ::: challenge
+
 ## Running on the cluster

-Modify your YAML file, `hostname.yaml` to execute `hostname` on the _cluster_.
-Run with 1 node and 1 process using the bank `guest` on the partition
-`psummer` on `quartz`.
+Modify your YAML file, `hostname.yaml`, to execute `hostname` on the
+_cluster_. Run with 1 node and 1 process using the bank `guest` on
+the partition `psummer` on `quartz`.

 If you run this multiple times, do you always run on the same node?
 (Is the hostname printed always the same?)

@@ -117,7 +137,20 @@ study:

 ```

-When you run this job, a directory called `Hostname_...` will be created. If you look in the subdirectory `hostname_batch`, you'll find a file called `hostname.txt` with info about the compute node where the `hostname` command ran. If you run the job multiple times, you will probably land on different nodes; this means you'll see different node numbers in different hostname.txt files. If you see the same number more than once, don't worry! If you get an answer other than `pascal83`, you're doing it correctly. :)
+Go ahead and run the job:
+
+```bash
+maestro run batch-hostname.yaml
+```
+
+A directory called `Hostnames_...` will be created. If you look in the
+subdirectory `hostname_batch`, you'll find a file called
+`hostname.txt` with info about the compute node where the `hostname`
+command ran. If you run the job multiple times, you will probably land
+on different nodes; this means you'll see different node numbers in
+different `hostname.txt` files. If you see the same number more than
+once, don't worry! If you get any answer other than `pascal83`, you're
+doing it correctly. :)

 ::::::

@@ -125,16 +158,26 @@

 ## Outputs from a batch job

-When running in batch, `maestro run...` will create a new directory with the
-same naming scheme as seen in episode 1, and that directory will contain
-subdirectories for all studies. The `hostname_batch` subdirectory has four
-output files, but this time the file ending with extension `.sh` is a slurm
-submission script
+When running in batch, `maestro run ...` will create a new directory
+with the same naming scheme as seen in episode 1, and that directory
+will contain subdirectories for all studies. The `hostname_batch`
+subdirectory has four output files, but this time the file ending with
+the extension `.sh` is a Slurm submission script:

 ```bash
-(maestro_venv) janeh@pascal83:~/Hostnames_20240320-170150/hostname_batch$ ls
+cd Hostnames_20240320-170150/hostname_batch
+ls
+```
+
+```output
 hostname.err  hostname.out  hostname.slurm.sh  hostname.txt
-(maestro_venv) janeh@pascal83:~/Hostnames_20240320-170150/hostname_batch$ cat hostname.slurm.sh
+```
+
+```bash
+cat hostname.slurm.sh
+```
+
+```bash
 #!/bin/bash
 #SBATCH --nodes=1
 #SBATCH --partition=pvis
@@ -148,19 +191,23 @@ hostname > hostname.txt
 ```

-Maestro uses the info from your YAML file to write this script and then
-submits it to the scheduler for you.
-Soon after you run on the cluster via `maestro run hostname.yaml`, you
-should be able to see the job running or finishing up in the queue
-with the command `squeue -u <username>`.

 ```bash
 amdahl >> amdahl.out
 ```
+
+and in `amdahl.out`, you probably see something like
+
+```output
 Doing 30.000000 seconds of 'work' on 1 processor,
-    which should take 30.000000 seconds with 0.800000 parallel proportion of the workload.
+which should take 30.000000 seconds with 0.800000
+parallel proportion of the workload.

-    Hello, World! I am process 0 of 1 on pascal17. I will do all the serial 'work' for 5.324555 seconds.
-    Hello, World! I am process 0 of 1 on pascal17. I will do parallel 'work' for 22.349517 seconds.
+    Hello, World! I am process 0 of 1 on pascal17.
+    I will do all the serial 'work' for 5.324555 seconds.
+
+    Hello, World! I am process 0 of 1 on pascal17.
+    I will do parallel 'work' for 22.349517 seconds.

 Total execution time (according to rank 0): 27.755552 seconds
 ```

 Notice that this output refers to only "1 processor" and mentions
 only one process. We requested two processes, but only a single one
-reports back! Additionally, we requested two *nodes*, but only one
+reports back! Additionally, we requested two _nodes_, but only one
 is mentioned in the above output (`pascal17`).

-So what's going on? If your job were really *using* both tasks
-and nodes that were assigned to it, then both would have written
-to `amdahl.out`.
+So what's going on?
+
+If your job were really _using_ both tasks and nodes that were
+assigned to it, then both would have written to `amdahl.out`.

 The `amdahl` binary is enabled to run in parallel but it's also able
 to run in serial. If we want it to run in parallel, we'll have to tell
 it so more directly.

@@ -206,43 +248,44 @@ it so more directly.
 :::

 Here's the takeaway from the challenges above: It's not enough to have
-both parallel resources and a binary/executable/program that is enabled to
-run in parallel. We actually need to invoke MPI in order to force
-our parallel program to use parallel resources.
+both parallel resources and a binary/executable/program that is
+enabled to run in parallel. We actually need to invoke MPI in order to
+force our parallel program to use parallel resources.

 ## Maestro and MPI

-We didn't really run an MPI application in the last section as we only ran on
-one processor. How do we request to run using multiple processes for a single
-step?
+We didn't really run an MPI application in the last section as we only
+ran on one processor. How do we request to run using multiple
+processes for a single step?

-The answer is that we have to tell Slurm that we want to use MPI. In the Intro
-to HPC lesson, the episodes introducing Slurm and running parallel jobs showed
-that commands to run in parallel need to use `srun`. `srun` talks to MPI and
-allows multiple processors to coordinate work. A call to `srun` might look
-something like
+The answer is that we have to tell Slurm that we want to use MPI. In
+the Intro to HPC lesson, the episodes introducing Slurm and running
+parallel jobs showed that commands to run in parallel need to use
+`srun`. `srun` talks to MPI and allows multiple processors to
+coordinate work. A call to `srun` might look something like

 ```bash
 srun -N {# of nodes} -n {number of processes} amdahl >> amdahl.out
 ```

-To make this easier, Maestro offers the shorthand `$(LAUNCHER)`. Maestro
-will replace instances of `$(LAUNCHER)` with a call to `srun`, specifying
-as many nodes and processes we've already told Slurm we want to use.
+To make this easier, Maestro offers the shorthand
+`$(LAUNCHER)`. Maestro will replace instances of `$(LAUNCHER)` with a
+call to `srun`, specifying as many nodes and processes as we've
+already told Slurm we want to use.
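+
+For example, for a hypothetical step that requests `nodes: 1` and
+`procs: 4`, the substitution would work something like this (a sketch
+-- the exact text Maestro generates may differ):
+
+```bash
+# what we write in the step's cmd field:
+$(LAUNCHER) amdahl >> amdahl.out
+# roughly what ends up in the generated Slurm submission script:
+srun -n 4 -N 1 amdahl >> amdahl.out
+```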
+To make this easier, Maestro offers the shorthand +`$(LAUNCHER)`. Maestro will replace instances of `$(LAUNCHER)` with a +call to `srun`, specifying as many nodes and processes we've already +told Slurm we want to use. ::: challenge Update `amdahl.yaml` to include `$(LAUNCHER)` in the call to `amdahl` so that your study's `cmd` field includes -``` +```bash $(LAUNCHER) amdahl >> amdahl.out ``` -Run maestro with the updated YAML and explore the outputs. How many tasks -are mentioned in `amdahl.out`? In the Slurm submission script created by -Maestro (included in the same subdirectory as `amdahl.out`), what text -was used to replace `$(LAUNCHER)`? +Run maestro with the updated YAML and explore the outputs. How many +tasks are mentioned in `amdahl.out`? In the Slurm submission script +created by Maestro (included in the same subdirectory as +`amdahl.out`), what text was used to replace `$(LAUNCHER)`? :::::: solution @@ -271,36 +314,38 @@ study: walltime: "00:00:30" ``` -Your output file `Amdahl_.../amdahl/amdahl.out` should include -"Doing 30.000000 seconds of 'work' on 2 processors" and the submission -script `Amdahl_.../amdahl/amdahl.slurm.sh` should include the line -"srun -n 2 -N 2 amdahl >> amdahl.out". Maestro substituted -`srun -n 2 -N 2` for `$(LAUNCHER)`! +Your output file `Amdahl_.../amdahl/amdahl.out` should include "Doing +30.000000 seconds of 'work' on 2 processors" and the submission script +`Amdahl_.../amdahl/amdahl.slurm.sh` should include the line +`srun -n 2 -N 2 amdahl >> amdahl.out`. +Maestro substituted `srun -n 2 -N 2` for `$(LAUNCHER)`! :::::: ::: ::: callout + ## Commenting Maestro YAML files -In the solution from the last challenge, the line beginning `#` is a comment -line. Hopefully you are already in the habit of adding comments to your own -scripts. Good comments make any script more readable, and this is just as -true with our YAML files. +In the solution from the last challenge, the line beginning `#` is a +comment line. Hopefully you are already in the habit of adding +comments to your own scripts. Good comments make any script more +readable, and this is just as true with our YAML files. ::: - ## Customizing amdahl output -Another thing about our application `amdahl` is that we ultimately want to -process the output to generate our scaling plot. The output right now is useful -for reading but makes processing harder. `amdahl` has an option that actually -makes this easier for us. To see the `amdahl` options we can use +Another thing about our application `amdahl` is that we ultimately +want to process the output to generate our scaling plot. The output +right now is useful for reading but makes processing harder. `amdahl` +has an option that actually makes this easier for us. To see the +`amdahl` options we can use ```bash -(maestro_venv) janeh@pascal83:~$ amdahl --help +amdahl --help ``` + ```output usage: amdahl [-h] [-p [PARALLEL_PROPORTION]] [-w [WORK_SECONDS]] [-t] [-e] @@ -309,15 +354,16 @@ options: -p [PARALLEL_PROPORTION], --parallel-proportion [PARALLEL_PROPORTION] Parallel proportion should be a float between 0 and 1 -w [WORK_SECONDS], --work-seconds [WORK_SECONDS] - Total seconds of workload, should be an integer greater than 0 + Total seconds of workload, should be an integer > 0 -t, --terse Enable terse output -e, --exact Disable random jitter ``` -The option we are looking for is `--terse`, and that will make `amdahl` print -output in a format that is much easier to process, JSON. 
+JSON output typically uses the file extension `.json`, so let's add
+the `--terse` option to our step's `cmd` _and_ change the extension of
+the output file to match our new command:

 ```yml
 description:
   name: Amdahl
   description: Run a parallel program

 batch:
   type: slurm
-  host: quartz # machine to run on
-  bank: guests # bank
+  host: quartz  # machine to run on
+  bank: guests  # bank
   queue: pdebug # partition

 study:
@@ -344,7 +390,7 @@ study:

 There was another parameter for `amdahl` that caught my eye. `amdahl`
 has an option `--parallel-proportion` (or `-p`) which we might be interested in
-changing as it changes the behaviour of the code, and therefore has an impact on
+changing as it changes the behavior of the code, and therefore has an impact on
 the values we get in our results. Let's try specifying a parallel
 proportion of 90%:

 ```yml
 description:
   name: Amdahl
   description: Run a parallel program

 batch:
   type: slurm
-  host: quartz # machine to run on
-  bank: guests # bank
+  host: quartz  # machine to run on
+  bank: guests  # bank
   queue: pdebug # partition

 study:
@@ -419,10 +465,10 @@ env:
   OUTPUT_PATH: ./Episode3
 ```

-This `env` block goes above our `study` block; `env` is at the same level
-of indentation as `study`. In this case, directories created by runs using
-this `OUTPUT_PATH` will all be grouped inside the directory `Episode3`, to help us group runs by where we are in the lesson.
-
+This `env` block goes above our `study` block; `env` is at the same level of
+indentation as `study`. In this case, directories created by runs using this
+`OUTPUT_PATH` will all be grouped inside the directory `Episode3`, to help us
+group runs by where we are in the lesson.

 ::: challenge

@@ -463,20 +509,22 @@ study:
 ```

 ## Dry-run (`--dry`) mode

-It's often useful to run Maestro in `--dry` mode, which causes Maestro to create scripts
-and the directory structure without actually running jobs. You will see this parameter
-if you run `maestro run --help`.
+It's often useful to run Maestro in `--dry` mode, which causes Maestro to
+create scripts and the directory structure without actually running jobs.
+You will see this parameter if you run `maestro run --help`.

 ::: challenge

-Do a couple `dry-run`s using the script created in the last challenge. This should help you
-verify that a new directory "Episode3" gets created for runs from this episode.
+Do a couple `dry` runs using the script created in the last challenge. This
+should help you verify that a new directory "Episode3" gets created for runs
+from this episode.

-**Note**: `--dry-run` is an input for `maestro run`, **not** for `amdahl`. To do a dry
-run, you shouldn't need to update your YAML file at all. Instead, you just run
+__Note__: `--dry` is an input for `maestro run`, __not__ for `amdahl`. To
+do a dry run, you shouldn't need to update your YAML file at all. Instead, you
+just run

-```
-maestro run --dry-run
+```bash
+maestro run --dry «YAML filename»
 ```

 :::::: solution

 After running

 ```bash
 maestro run --dry amdahl.yaml
 ```
+
 a directory path of the form
 `Episode3/Amdahl_{DATE}_{TIME}/amdahl` should be created.

 ::::::

 :::

-
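+After the dry run, the layout on disk would look something like this
+(timestamp illustrative):
+
+```output
+Episode3/
+└── Amdahl_20240320-153934/
+    └── amdahl/
+```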
-- "New Maestro runs can be grouped within a new directory specified by the environment -variable `OUTPUT_PATH`" -- You can use `--dry` to verify that the expected directory structure and scripts -are created by a given Maestro YAML file. +- "New Maestro runs can be grouped within a new directory specified by the + environment variable `OUTPUT_PATH`" +- You can use `--dry` to verify that the expected directory structure and + scripts are created by a given Maestro YAML file. ::: diff --git a/episodes/04-placeholders.md b/episodes/04-placeholders.md index 02cf2ab..94b60fe 100644 --- a/episodes/04-placeholders.md +++ b/episodes/04-placeholders.md @@ -5,25 +5,41 @@ exercises: 30 --- ::: questions + - "How do I make a generic rule?" + ::: ::: objectives + - "Learn to use variables as placeholders" - "Learn to run many similar Maestro runs at once" + ::: -::: callout +:::::::::::: callout + +If you've opened a new terminal, make sure Maestro is available. + +```bash +source /usr/global/docs/training/janeh/maestro_venv/bin/activate +``` + +::::::::::::::::: + ## D.R.Y. (Don't Repeat Yourself) +::: callout + In many programming languages, the bulk of the language features are there to allow the programmer to describe long-winded computational -routines as short, expressive, beautiful code. Features in Python, -R, or Java, such as user-defined variables and functions are useful in -part because they mean we don't have to write out (or think about) -all of the details over and over again. This good habit of writing -things out only once is known as the "Don't Repeat Yourself" -principle or D.R.Y. +routines as short, expressive, beautiful code. Features in Python, R, +or Java, such as user-defined variables and functions are useful in +part because they mean we don't have to write out (or think about) all +of the details over and over again. This good habit of writing things +out only once is known as the "Don't Repeat Yourself" principle or +D.R.Y. + ::: Maestro YAML files are a form of code and, in any code, repetition can @@ -31,8 +47,7 @@ lead to problems (e.g. we rename a data file in one part of the YAML but forget to rename it elsewhere). In this episode, we'll set ourselves up with ways to avoid repeating -ourselves by using *environment variables* as *placeholders*. - +ourselves by using _environment variables_ as _placeholders_. ## Placeholders @@ -40,9 +55,11 @@ Over the course of this lesson, we want to use the `amdahl` binary to show how the execution time of a program changes with the number of processes used. In on our current setup, to run amdahl for multiple values of `procs`, we would need to run our workflow, change `procs`, -rerun, and so forth. We'd be repeating our workflow a lot, so let's first try fixing that by defining multiple rules. +rerun, and so forth. We'd be repeating our workflow a lot, so let's +first try fixing that by defining multiple rules. -At the end of our last episode, our `amdahl.yaml` file contained the sections +At the end of our last episode, our `amdahl.yaml` file contained the +sections ```yml (...) @@ -63,7 +80,10 @@ study: walltime: "00:00:30" ``` -Let's call our existing step `amdahl-1` (`name` under `study`) and create a second step called `amdahl-2` which is exactly the same, except that it will define `procs: 8`. While we're at it, let's update `OUTPUT_PATH` so that it is `./Episode4`. 
+
+Let's call our existing step `amdahl-1` (`name` under `study`) and
+create a second step called `amdahl-2` which is exactly the same,
+except that it will define `procs: 8`. While we're at it, let's update
+`OUTPUT_PATH` so that it is `./Episode4`.

 The updated part of the script now looks like

 ```yml
 env:
   variables:
     OUTPUT_PATH: ./Episode4

 study:
   - name: amdahl-1
     description: run in parallel
     run:
       cmd: |
         $(LAUNCHER) amdahl --terse -p .999 >> amdahl.json
       nodes: 1
       procs: 2
       walltime: "00:00:30"
   - name: amdahl-2
     description: run in parallel
     run:
       cmd: |
         $(LAUNCHER) amdahl --terse -p .999 >> amdahl.json
       nodes: 1
       procs: 8
       walltime: "00:00:30"
 ```

 ::: challenge

-Update `amdahl.yaml` to include the new info shown above. Run a dry run
-to see what your output directory structure looks like.
+Update `amdahl.yaml` to include the new info shown above. Run a dry
+run to see what your output directory structure looks like.

 :::

@@ -104,17 +124,20 @@
 Now let's start to get rid of some of the redundancy in our new
 workflow.

 First off, defining the parallel proportion (`-p .999`) in two places
-makes our lives harder. Now if we want to change this value, we have to
-update it in two places, but we can make this easier by using an
+makes our lives harder. Now if we want to change this value, we have
+to update it in two places, but we can make this easier by using an
 environment variable.

-Let's create another environment variable in the `variables` second under
-`env`. We can define a new parallel proportion as `P: .999`. Then, under
-`run`'s `cmd` for each step, we can call this environment variable with the
-syntax `$(P)`. `$(P)` holds the place of and will be substituted by `.999`
-when Maestro creates a Slurm submission script for us.
+Let's create another environment variable in the `variables` section
+under `env`. We can define a new parallel proportion as `P: .999`.
+Then, under `run`'s `cmd` for each step, we can call this environment
+variable with the syntax `$(P)`. `$(P)` holds the place of and will be
+substituted by `.999` when Maestro creates a Slurm submission script
+for us.

-Let's also create an environment variable for our output file, `amdahl.json` called `OUTPUT` and then call that variable from our `cmd` fields.
+Let's also create an environment variable, `OUTPUT`, for our output
+file, `amdahl.json`, and then call that variable from our `cmd`
+fields.

 Our updated section will now look like this:

@@ -146,20 +169,20 @@ study:
       walltime: "00:00:30"
 ```

-We've added two new placeholders to make our YAML script to make it a tad
-bit more efficient. Note that we had already been using a placeholder given to
-us by Maestro: $(LAUNCHER) holds the place of a call to
-`srun `
-
+We've added two new placeholders to make our YAML script a tad more
+efficient. Note that we had already been using a placeholder given to
+us by Maestro: `$(LAUNCHER)` holds the place of a call to `srun` with
+the appropriate arguments.

 ::: challenge

-Run your updated `amdahl.yaml` and check results, to verify your workflow is
-working with the changes you've made so far.
+Run your updated `amdahl.yaml` and check results, to verify your
+workflow is working with the changes you've made so far.

 ::::::solution

 The full YAML text is
+
 ```yml
 description:
   name: Amdahl
@@ -201,18 +224,21 @@

 ## Maestro's global.parameters

-We're almost ready to perform our scaling study -- to see how the execution time
-changes as we use more processors in the job. Unfortunately, we're still
-repeating ourselves a lot because, in spite of the environment variables
-we created, most of the information defined for steps `amdahl-1` and `amdahl-2`
-is the same. Only the `procs` field changes!
+We're almost ready to perform our scaling study -- to see how the
+execution time changes as we use more processors in the job.
+Unfortunately, we're still repeating ourselves a lot because, in
+spite of the environment variables we created, most of the information
+defined for steps `amdahl-1` and `amdahl-2` is the same. Only the
+`procs` field changes!

-A great way to avoid repeating ourselves here by defining a **parameter** that
-lists multiple values of tasks and runs a separate job step for each value.
+A great way to avoid repeating ourselves here is to define a
+__parameter__ that lists multiple values of tasks and runs a separate
+job step for each value.
We do -this by adding a `global.parameters` section at the bottom of the script. We -then define individual parameters within this section. Each parameter includes -a list of `values` (Each element is used in its own job step.) and a `label`. -(The `label` helps define how the output directory structure is named.) +We're almost ready to perform our scaling study -- to see how the +execution time changes as we use more processors in the +job. Unfortunately, we're still repeating ourselves a lot because, in +spite of the environment variables we created, most of the information +defined for steps `amdahl-1` and `amdahl-2` is the same. Only the +`procs` field changes! + +A great way to avoid repeating ourselves here by defining a +__parameter__ that lists multiple values of tasks and runs a separate +job step for each value. We do this by adding a `global.parameters` +section at the bottom of the script. We then define individual +parameters within this section. Each parameter includes a list of +`values` (Each element is used in its own job step.) and a `label`. +(The `label` helps define how the output directory structure is +named.) This is what it looks like to define a global parameter: @@ -223,20 +249,21 @@ global.parameters: label: TASKS.%% ``` -Note that the label should include `%%` as above; the `%%` is itself a placeholder! -The directory created for the output of each job step will be identified by the -value of each parameter it used, and the parameter's value will be inserted to -replace the `%%`. +Note that the label should include `%%` as above; the `%%` is itself a +placeholder! The directory created for the output of each job step +will be identified by the value of each parameter it used, and the +parameter's value will be inserted to replace the `%%`. -Next, we should update the line under `run` -> `cmd` defining `procs` to include the name -of the parameter enclosed in `$()`: +Next, we should update the line under `run` -> `cmd` defining `procs` +to include the name of the parameter enclosed in `$()`: ```yml procs: $(TASKS) ``` -If we make this change for steps `amdahl-1` *and* `amdahl-2`, they will now -look *exactly* the same, so we can simply condense them to one step. +If we make this change for steps `amdahl-1` _and_ `amdahl-2`, they +will now look _exactly_ the same, so we can simply condense them to +one step. The full YAML file will look like @@ -276,21 +303,23 @@ global.parameters: ::: challenge -Run `maestro run --dry amdahl.yaml` using the above YAML file -and investigate the resulting directory structure. How does -the list of task values under `global.parameters` change the -output directory organization? +Run `maestro run --dry amdahl.yaml` using the above YAML file and +investigate the resulting directory structure. How does the list of +task values under `global.parameters` change the output directory +organization? ::::::solution -Under your current working directory, you should see a directory structure -created with the following format -- `Episode4/Amdahl_-