Merge pull request #2 from tkphd/tinkering
Linting of Markdown & YAML, and wrapping long lines
xorJane authored Jun 4, 2024
2 parents e0745a2 + cca9771 commit d5bd8e9
Showing 11 changed files with 825 additions and 525 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -49,3 +49,4 @@ docs/
# translation temp files
po/*~

*~
200 changes: 114 additions & 86 deletions episodes/01-introduction.md
@@ -5,140 +5,161 @@ exercises: 30
---

::: questions

- "How do I run a simple command with Maestro?"

:::

:::objectives

- "Create a Maestro YAML file"

:::

## What is the workflow I'm interested in?

In this lesson we will make an experiment that takes an application which runs
in parallel and investigate it's scalability. To do that we will need to gather
data, in this case that means running the application multiple times with
different numbers of CPU cores and recording the execution time. Once we've
done that we need to create a visualisation of the data to see how it compares
against the ideal case.
In this lesson we will build an experiment that takes an application
which runs in parallel and investigates its scalability. To do that we
will need to gather data: in this case, that means running the
application multiple times with different numbers of CPU cores and
recording the execution time. Once we've done that, we need to create
a visualization of the data to see how it compares against the ideal
case.

From the visualisation we can then decide at what scale it
makes most sense to run the application at in production to maximise the use of
From the visualization we can then decide at what scale it makes most
sense to run the application in production to maximize the use of
our CPU allocation on the system.

We could do all of this manually, but there are useful tools to help us manage
data analysis pipelines like we have in our experiment. Today we'll learn about
one of those: Maestro.
We could do all of this manually, but there are useful tools to help
us manage data analysis pipelines like we have in our
experiment. Today we'll learn about one of those: Maestro.

In order to get started with Maestro, let's begin by taking a simple command
and see how we can run that via Maestro. Let's choose the command `hostname`
which prints out the name of the host where the command is executed:
In order to get started with Maestro, let's begin by taking a simple
command and see how we can run that via Maestro. Let's choose the
command `hostname` which prints out the name of the host where the
command is executed:

```bash
janeh@pascal83:~$ hostname
hostname
```

```output
pascal83
```

That prints out the result but Maestro relies on files to know the status of
your workflow, so let's redirect the output to a file:
That prints out the result but Maestro relies on files to know the
status of your workflow, so let's redirect the output to a file:

```bash
janeh@pascal83:~$ hostname > hostname_login.txt
```

## Writing a Maestro YAML

Edit a new text file named `hostname.yaml`.
Edit a new text file named `hostname.yaml`. The file extension is a
recursive initialism for ["YAML Ain't Markup Language"][yaml-lang], a
popular format for configuration files and key-value data
serialization. For more, see the Wikipedia page, esp. [YAML
Syntax](https://en.wikipedia.org/wiki/YAML#Syntax).

Contents of `hostname.yaml`:
[yaml-lang]: https://yaml.org

Contents of `hostname.yaml` (spaces matter!):

```yml
description:
    name: Hostnames
    description: Report a node's hostname.

study:
    - name: hostname-login
      description: Write the login node's hostname to a file.
      run:
          cmd: |
              hostname > hostname_login.txt
```
::: callout
## Key points about this file
1. The name of `hostname.yaml` is not very important; it gives us information
about file contents and type, but maestro will behave the same if you rename
it to `hostname` or `foo.txt`.
1. The file specifies fields in a hierarchy. For example, `name`, `description`,
and `run` are all passed to `study` and are at the same level in the hierarchy.
`description` and `study` are both at the top level in the hierarchy.
1. Indentation indicates the hierarchy and should be consistent. For example, all
the fields passed directly to `study` are indented relative to `study` and
their indentation is all the same.
1. The commands executed during the study are given under `cmd`. Starting this
entry with `|` and a newline character allows us to specify multiple commands.
1. The example YAML file above is pretty minimal; all fields shown are required.
1. The names given to `study` can include letters, numbers, and special characters.

1. The name of `hostname.yaml` is not very important; it gives us
information about file contents and type, but maestro will behave
the same if you rename it to `hostname` or `foo.txt`.
2. The file specifies fields in a hierarchy. For example, `name`,
`description`, and `run` are all passed to `study` and are at the
same level in the hierarchy. `description` and `study` are both at
the top level in the hierarchy.
3. Indentation indicates the hierarchy and should be consistent. For
example, all the fields passed directly to `study` are indented
relative to `study` and their indentation is all the same.
4. The commands executed during the study are given under
`cmd`. Starting this entry with `|` and a newline character allows
us to specify multiple commands.
5. The example YAML file above is pretty minimal; all fields shown are
required.
6. The names given to `study` can include letters, numbers, and
special characters.

:::
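
Since indentation carries the hierarchy, one way to avoid slips is to
create the file from the shell with a here-document and then check it
for tab characters, which YAML does not allow for indentation. This is
a minimal sketch: the four-space indentation is an assumption (any
consistent number of spaces works), and the `grep` check is only a
quick sanity test of ours, not something Maestro provides.

```bash
# Write hostname.yaml with consistent, space-only indentation.
# Quoting 'EOF' stops the shell from expanding anything inside.
cat > hostname.yaml <<'EOF'
description:
    name: Hostnames
    description: Report a node's hostname.

study:
    - name: hostname-login
      description: Write the login node's hostname to a file.
      run:
          cmd: |
              hostname > hostname_login.txt
EOF

# YAML forbids tabs for indentation; flag any line that starts with one.
if grep -n "$(printf '^\t')" hostname.yaml; then
    echo "Tabs found: replace them with spaces."
else
    echo "No leading tabs found."
fi
```

Every line under `cmd: |` that is indented deeper than `cmd` is passed
to the shell as part of the step's script, so further commands can be
added on their own lines at that same depth.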

Back in the shell we'll run our new rule. At this point, we may see an error if
a required field is missing or if our indentation is inconsistent.
Back in the shell we'll run our new study. At this point, we may see
an error if a required field is missing or if our indentation is
inconsistent.

```bash
$ maestro run hostname.yaml
janeh@pascal83:~$ maestro run hostname.yaml
```

::: callout

## `bash: maestro: command not found...`

If your shell tells you that it cannot find the command `maestro` then we need
to make the software available somehow. In our case, this means activating the
python virtual environment where maestro is installed.
If your shell tells you that it cannot find the command `maestro` then
we need to make the software available somehow. In our case, this
means activating the python virtual environment where maestro is
installed.

```bash
source /usr/global/docs/training/janeh/maestro_venv/bin/activate
```

You can tell this command has already been run when `(maestro_venv)` appears
before your command prompt:

You can tell this command has already been run when `(maestro_venv)`
appears before your command prompt:

```bash
janeh@pascal83:~$ source /usr/global/docs/training/janeh/maestro_venv/bin/activate
(maestro_venv) janeh@pascal83:~$
```

Now that the `maestro_venv` virtual environment has been activated, the `maestro`
command should be available, but let's double check
Now that the `maestro_venv` virtual environment has been activated,
the `maestro` command should be available, but let's double check:

```bash
(maestro_venv) janeh@pascal83:~$ which maestro
```

```output
/usr/global/docs/training/janeh/maestro_venv/bin/maestro
```

:::
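
If you switch shells often, the check and the activation can be
combined. The sketch below defines a hypothetical helper,
`ensure_maestro` (our own name, not part of Maestro), which activates
a virtual environment only when `maestro` is not already on the
`PATH`. It assumes the environment has the usual `bin/activate`
script.

```bash
# Hypothetical helper: make `maestro` available if it is not already.
# $1 is the path to the virtual environment where maestro is installed.
ensure_maestro() {
    if command -v maestro >/dev/null 2>&1; then
        # Already on the PATH; report where it was found.
        echo "maestro found at $(command -v maestro)"
    elif [ -f "$1/bin/activate" ]; then
        # Activate the virtual environment in the current shell.
        . "$1/bin/activate"
    else
        echo "maestro not found and no virtual environment at $1" >&2
        return 1
    fi
}
```

With the path used in this lesson, the call would be
`ensure_maestro /usr/global/docs/training/janeh/maestro_venv`.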

## Running maestro

Once you have `maestro` available to you, run `maestro run hostname.yaml`
and enter `y` when prompted
Once you have `maestro` available to you,
run `maestro run hostname.yaml` and enter `y` when prompted:

```bash
(maestro_venv) janeh@pascal83:~$ maestro run hostname.yaml
```

```output
[2024-03-20 15:39:34: INFO] INFO Logging Level -- Enabled
[2024-03-20 15:39:34: WARNING] WARNING Logging Level -- Enabled
[2024-03-20 15:39:34: CRITICAL] CRITICAL Logging Level -- Enabled
[2024-03-20 15:39:34: INFO] Loading specification -- path = hostname.yaml
[2024-03-20 15:39:34: INFO] Directory does not exist. Creating directories to /g/g0/janeh/Hostnames_20240320-153934/logs
[2024-03-20 15:39:34: INFO] Directory does not exist. Creating directories to ~/Hostnames_20240320-153934/logs
[2024-03-20 15:39:34: INFO] Adding step 'hostname-login' to study 'Hostnames'...
[2024-03-20 15:39:34: INFO]
------------------------------------------
@@ -148,43 +169,47 @@ Submission throttle limit = 0
Use temporary directory = False
Hash workspaces = False
Dry run enabled = False
Output path = /g/g0/janeh/Hostnames_20240320-153934
Output path = ~/Hostnames_20240320-153934
------------------------------------------
Would you like to launch the study? [yn] y
Study launched successfully.
```

and look at the outputs. You should have a new directory whose name includes a
date and timestamp and that starts with the `name` given under `description`
at the top of `hostname.yaml`.
Then look at the outputs. You should have a new directory whose name
starts with the `name` given under `description` at the top of
`hostname.yaml` and ends with a date and timestamp.

In that directory will be a subdirectory for every `study` run from
`hostname.yaml`. The subdirectories for each study include all output files
for that study
`hostname.yaml`. The subdirectories for each study include all output
files for that study.

```bash
(maestro_venv) janeh@pascal83:~$ cd Hostnames_20240320-153934/
(maestro_venv) janeh@pascal83:~/Hostnames_20240320-153934$ ls
```

```output
batch.info Hostnames.pkl Hostnames.txt logs status.csv
hostname-login Hostnames.study.pkl hostname.yaml meta
```

```bash
(maestro_venv) janeh@pascal83:~/Hostnames_20240320-153934$ cd hostname-login/
(maestro_venv) janeh@pascal83:~/Hostnames_20240320-153934/hostname-login$ ls
```

```output
hostname-login.2284862.err hostname-login.2284862.out hostname-login.sh hostname_login.txt
```

::: challenge

To which file will the login node's hostname, `pascal83`, be written?

1. hostname-login.2284862.err
2. hostname-login.2284862.out
3. hostname-login.sh
4. hostname_login.txt
1. `hostname-login.2284862.err`
2. `hostname-login.2284862.out`
3. `hostname-login.sh`
4. `hostname_login.txt`

:::::: solution
(4) hostname_login.txt
@@ -198,44 +223,47 @@ we'll see that output, if the run worked!
::: challenge

This one is tricky! In the example above, `pascal83` was written to
`.../Hostnames_{date}_{time}/hostname-login/hostname_login.txt`.
`~/Hostnames_{date}_{time}/hostname-login/hostname_login.txt`.

Where would `hello` be written for the following YAML?

```yml
description:
    name: MyHello
    description: Report a node's hostname.

study:
    - name: give-salutation
      description: Write the login node's hostname to a file
      run:
          cmd: |
              echo "hello" > greeting.txt
```
```

1. `.../give-salutation_{date}_{time}/greeting/greeting.txt`
2. `.../greeting_{date}_{time}/give_salutation/greeting.txt`
3. `.../MyHello_{date}_{time}/give-salutation/greeting.txt`
4. `.../MyHello_{date}_{time}/greeting/greeting.txt`
1. `~/give-salutation_{date}_{time}/greeting/greeting.txt`
2. `~/greeting_{date}_{time}/give_salutation/greeting.txt`
3. `~/MyHello_{date}_{time}/give-salutation/greeting.txt`
4. `~/MyHello_{date}_{time}/greeting/greeting.txt`

:::::: solution

(3) `.../MyHello_{date}_{time}/give-salutation/greeting.txt`

The toplevel folder created starts with the `name` field under `description`; here, that's `MyHello`.
Its subdirectory is named after the `study`; here, that's `give-salutation`.
The file created is `greeting.txt` and this stores the output of `echo "hello"`.
The top-level folder created starts with the `name` field under
`description`; here, that's `MyHello`. Its subdirectory is named after
the `study`; here, that's `give-salutation`. The file created is
`greeting.txt` and this stores the output of `echo "hello"`.

::::::
:::
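
Because every run creates a new timestamped directory, it can help to
locate the most recent one from the shell. The sketch below is one way
to do that. `latest_study_dir` is a hypothetical helper of ours, not a
Maestro command, and it assumes the `{name}_{date}-{time}` directory
naming seen in the output above, which sorts chronologically.

```bash
# Hypothetical helper (not part of Maestro): print the newest output
# directory whose name starts with the given study name.
latest_study_dir() {
    latest=""
    # Globs expand in sorted order, so the last match has the latest timestamp.
    for dir in "$1"_*/; do
        if [ -d "$dir" ]; then
            latest=$dir
        fi
    done
    printf '%s\n' "$latest"
}

study_dir=$(latest_study_dir MyHello)
if [ -n "$study_dir" ]; then
    # Each step gets its own subdirectory holding its script and outputs.
    ls "${study_dir}give-salutation/"
    cat "${study_dir}give-salutation/greeting.txt"
else
    echo "No MyHello_* directory found; run the study first."
fi
```

The same function works for the first example by passing `Hostnames`
and looking in the `hostname-login` subdirectory instead.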

::: keypoints

- "You execute `maestro run` with a YAML file including information about your run."
- "Your run includes a description and at least one study (a step in your run)."
- "Your maestro run creates a directory with subdirectories and outputs for each study."
- You execute `maestro run` with a YAML file including information
about your run.
- Your run includes a description and at least one study (a step in
your run).
- Your maestro run creates a directory with subdirectories and
outputs for each study.

:::