HPC Episode #18

Open
wants to merge 41 commits into main
Changes from 8 commits

Commits (41)
52b79be
future -> crew
multimeric Jun 15, 2023
a93a6f3
Fix typo
multimeric Jun 15, 2023
e3fee83
HPC draft
multimeric Jun 30, 2023
3af1a0d
Start worker section
multimeric Jun 30, 2023
f8928bc
Finished draft of heterogenous workers
multimeric Jul 7, 2023
0f8e054
Update dependencies to have crew, finish first draft of HPC
multimeric Jul 10, 2023
5255346
Update knit_exit comment
multimeric Jul 10, 2023
f246ac1
Merge branch 'main' of github.com:carpentries-incubator/targets-works…
multimeric Jul 10, 2023
9c054c4
Better wording when discussing HPC
multimeric Jul 10, 2023
46fc239
Don't assume learner expectations
multimeric Jul 10, 2023
589075b
Update episodes/hpc.Rmd
multimeric Jul 10, 2023
629c7aa
Update episodes/hpc.Rmd
multimeric Jul 10, 2023
5a67422
Update episodes/hpc.Rmd
multimeric Jul 10, 2023
180f290
Update episodes/hpc.Rmd
multimeric Jul 11, 2023
31353d6
Code review, and update dependencies
multimeric Jul 12, 2023
5cbd438
Merge branch 'hpc' of github.com:multimeric/targets-workshop into hpc
multimeric Jul 12, 2023
a482945
Minor improvements and rewording
multimeric Jul 12, 2023
540385e
Install crew.cluster
multimeric Jul 20, 2023
704759d
Elaborate on `sacct`
multimeric Jul 20, 2023
414a742
Tweaks to the HPC episode
multimeric Jul 24, 2023
4092364
Use crew_controller_group, and add memory example
multimeric Jul 25, 2023
06b2cb9
Update episodes/hpc.Rmd
multimeric Jul 25, 2023
141e774
Update episodes/hpc.Rmd
multimeric Jul 25, 2023
4622cbb
Merge branch 'main' of github.com:carpentries-incubator/targets-works…
multimeric Jul 9, 2024
9bf64e5
Merge branch 'hpc' of github.com:multimeric/targets-workshop into hpc
multimeric Jul 9, 2024
a9c04a5
Remove some superfluous files
multimeric Jul 9, 2024
3dd8238
Remove unused figures
multimeric Jul 9, 2024
6cc5c25
I guess these files should be tracked?
multimeric Jul 9, 2024
4723369
Add missing activate file
multimeric Jul 10, 2024
d872dfb
Add missing .gitignore
multimeric Jul 10, 2024
24aa767
Fix execution typo
multimeric Jul 10, 2024
5e40be1
Start slurm for build
multimeric Jul 10, 2024
4eb5f06
Force dep on {htmlwidgets}
joelnitta Jul 10, 2024
9489bea
Update renv and packages
joelnitta Jul 10, 2024
e012aaf
Update renv.lock
joelnitta Jul 10, 2024
7548c49
Reduce memory requirement
multimeric Jul 10, 2024
66c3e5b
Merge branch 'hpc' of github.com:multimeric/targets-workshop into hpc
multimeric Jul 10, 2024
2c9a19a
Re-run CI
multimeric Dec 13, 2024
905a3b7
Merge branch 'main' of github.com:carpentries-incubator/targets-works…
multimeric Dec 13, 2024
0ca5b00
Revert .github and renv changes
multimeric Dec 13, 2024
54e1b53
Merge branch 'main' of github.com:carpentries-incubator/targets-works…
multimeric Dec 13, 2024
1 change: 1 addition & 0 deletions config.yaml
@@ -69,6 +69,7 @@ episodes:
- branch.Rmd
- parallel.Rmd
- quarto.Rmd
- hpc.Rmd

# Information for Learners
learners:
295 changes: 295 additions & 0 deletions episodes/hpc.Rmd
@@ -0,0 +1,295 @@
---
title: 'Deploying Targets on HPC'
teaching: 10
exercises: 2
---

```{R, echo=FALSE}
# Exit sensibly when Slurm isn't installed
if (!nzchar(Sys.which("sbatch"))) {
  knitr::knit_exit("sbatch was not detected. Likely Slurm is not installed. Exiting.")
}
```

:::::::::::::::::::::::::::::::::::::: questions

- Why would we use HPC to run Targets workflows?
- How can we run Targets workflows on Slurm?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Be able to run a `targets` workflow on a High Performance Computing (HPC) cluster with Slurm

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: instructor

Episode summary: Show how to deploy a targets workflow on an HPC cluster using Slurm and `crew.cluster`

::::::::::::::::::::::::::::::::::::::::::::::::

```{r}
#| label: setup
#| echo: FALSE
#| message: FALSE
#| warning: FALSE
library(targets)
library(tarchetypes)
library(quarto) # don't actually need to load, but put here so renv catches it
source("https://raw.githubusercontent.com/joelnitta/targets-workshop/main/episodes/files/functions.R?token=$(date%20+%s)") # nolint

# Increase width for printing tibbles
options(width = 140)
```

## Advantages of HPC

If your analysis involves computationally intensive or long-running tasks, such as training machine learning models or processing very large amounts of data, it can quickly become infeasible to run it on a single machine.
If you are part of an organisation with access to a High Performance Computing (HPC) cluster, you can use `targets` to spread the work across the cluster's many machines and scale up your analysis.
This differs from the parallel execution we have learned about so far, which spawns extra R processes on the *same machine* to speed up execution.

## Configuring Targets for Slurm

Fortunately, using HPC is as simple as changing the Targets `controller`.
In this section we will assume that our HPC uses Slurm as its job scheduler, but you can easily use other schedulers such as PBS/TORQUE, Sun Grid Engine (SGE) or LSF.

In the Parallel Processing section, we used the following configuration:
```{R}
library(crew)
tar_option_set(
  controller = crew_controller_local(workers = 2)
)
```
To configure this for Slurm, we just swap out the controller with a new one from the `crew.cluster` package:

```{R}
library(crew.cluster)
tar_option_set(
  controller = crew_controller_slurm(
    workers = 3,
    script_lines = "module load R"
  )
)
```

There are a number of options you can pass to `crew_controller_slurm()` to fine-tune the Slurm execution, [which you can find here](https://wlandau.github.io/crew.cluster/reference/crew_controller_slurm.html).
Here we are only using two:

* `workers` sets the number of jobs that are submitted to Slurm to process targets.
* `script_lines` adds some lines to the Slurm submit script used by Targets. This is useful for loading Environment Modules and adding `#SBATCH` options, as the short sketch after this list shows.
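
For example, `script_lines` accepts a character vector, so you can combine extra `#SBATCH` flags with module loading. This is only a sketch: the partition name and the exact module name are placeholders for whatever your cluster actually provides.

```{R, eval=FALSE}
library(crew.cluster)
crew_controller_slurm(
  workers = 3,
  script_lines = c(
    "#SBATCH --partition=compute",  # extra scheduler options for the worker jobs
    "module load R"                 # software environment available to each worker
  )
)
```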

Let's run the modified workflow:

```{R, eval=FALSE}
source("R/packages.R")
source("R/functions.R")

library(crew.cluster)
tar_option_set(
  controller = crew_controller_slurm(
    workers = 3,
    script_lines = "module load R"
  )
)

tar_plan(
  # Load raw data
  tar_file_read(
    penguins_data_raw,
    path_to_file("penguins_raw.csv"),
    read_csv(!!.x, show_col_types = FALSE)
  ),
  # Clean data
  penguins_data = clean_penguin_data(penguins_data_raw),
  # Build models
  models = list(
    combined_model = lm(
      bill_depth_mm ~ bill_length_mm, data = penguins_data),
    species_model = lm(
      bill_depth_mm ~ bill_length_mm + species, data = penguins_data),
    interaction_model = lm(
      bill_depth_mm ~ bill_length_mm * species, data = penguins_data)
  ),
  # Get model summaries
  tar_target(
    model_summaries,
    glance_with_mod_name_slow(models),
    pattern = map(models)
  ),
  # Get model predictions
  tar_target(
    model_predictions,
    augment_with_mod_name_slow(models),
    pattern = map(models)
  )
)
```

::: challenge
## Increasing Resources

Q: How would you modify your `_targets.R` if your targets needed 200GB of RAM?

::: hint
Check the arguments for [`crew_controller_slurm`](https://wlandau.github.io/crew.cluster/reference/crew_controller_slurm.html#arguments-1).
:::
::: solution
```R
tar_option_set(
  controller = crew_controller_slurm(
    workers = 3,
    script_lines = "module load R",
    # Added: request 200 GB of memory for the single CPU of each worker
    slurm_memory_gigabytes_per_cpu = 200,
    slurm_cpus_per_task = 1
  )
)
```
:::
:::

## HPC Workers

Despite what you might expect, `crew` does not submit one Slurm job for each target.
Instead, it uses persistent workers, meaning that you define a pool of workers when configuring the workflow.
In our example above we used 3 workers.
For each worker, `crew` submits a single Slurm job, and these workers will process multiple targets over their lifetime.
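
Because the workers are persistent, each worker's Slurm job holds onto its allocation until the pipeline shuts it down. If you would rather not keep idle workers around, `crew` controllers can be told to scale down automatically. A sketch, assuming your installed version of `crew.cluster` supports the `seconds_idle` argument (current versions do):

```{R, eval=FALSE}
library(crew.cluster)
tar_option_set(
  controller = crew_controller_slurm(
    workers = 3,
    script_lines = "module load R",
    seconds_idle = 60  # a worker shuts down after 60 seconds with no targets to run
  )
)
```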

We can verify that these worker jobs have been submitted using `sacct`:

```{bash}
sacct
```
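
The default `sacct` listing is fairly terse. If you want more detail about what the worker jobs did, you can ask for specific fields; this is plain Slurm usage rather than anything specific to `crew`, and the field list here is just one reasonable choice:

```bash
# Job ID, job name (widened to 30 characters), state, wall time and peak memory
sacct --format=JobID,JobName%30,State,Elapsed,MaxRSS
```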

The upside of this approach is that we don't have to work out the minutiae of how long each target takes to build, or what resources it needs.
It also means that we don't submit a lot of jobs, making our Slurm usage more efficient and easier to monitor.

The downside of this mechanism is that **the resources of the worker have to be sufficient to build each of your targets**.

::: challenge
## Choosing a Worker

Q: Say we have two targets. One uses 100 GB of RAM and 1 CPU, and the other needs 10 GB of RAM and 8 CPUs to run a multi-threaded function. What worker configuration do we use?

::: solution
If we have a single worker configuration, it needs the maximum of each resource across all targets: 100 GB of RAM and 8 CPUs.
We might use a controller a bit like this:
```{R, results="hide"}
crew_controller_slurm(
  name = "cpu_worker",
  workers = 3,
  script_lines = "
#SBATCH --cpus-per-task=8
module load R",
  # ~13 GB per CPU across the 8 CPUs gives roughly the 100 GB we need
  slurm_memory_gigabytes_per_cpu = 13
)
```
:::
:::

## Heterogeneous Workers

In some cases we may prefer heterogeneous workers, for example when some of our targets need a GPU while others only need a CPU.
To do this, we first define each worker configuration, giving it a `name` argument in `crew_controller_slurm()`.
Note that this time we aren't passing the controller straight into `tar_option_set()`:

```{R, results="hide"}
library(crew.cluster)
crew_controller_slurm(
  name = "cpu_worker",
  workers = 3,
  script_lines = "module load R",
  slurm_memory_gigabytes_per_cpu = 200,
  slurm_cpus_per_task = 1
)
```
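
For `targets` to be able to send work to a controller by its name, the named controllers still need to be registered with the pipeline: save each controller to a variable, combine them with `crew::crew_controller_group()`, and pass the group to `tar_option_set()`. A minimal sketch, repeating the controller above so it can be stored in a variable (any further controllers, such as a GPU worker, would be added to the same group):

```{R, eval=FALSE}
library(crew)
library(crew.cluster)

cpu_controller <- crew_controller_slurm(
  name = "cpu_worker",
  workers = 3,
  script_lines = "module load R",
  slurm_memory_gigabytes_per_cpu = 200,
  slurm_cpus_per_task = 1
)

# Register the named controller(s) with the pipeline as a single group
tar_option_set(
  controller = crew_controller_group(cpu_controller)
)
```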

Then we specify this controller by name in each target definition:

```{R, results="hide"}
tar_target(
  name = cpu_task,
  command = run_model2(data),
  resources = tar_resources(
    crew = tar_resources_crew(controller = "cpu_worker")
  )
)
```

::: challenge
## Mixing GPU and CPU targets

Q: Say we have the following targets workflow. How would we modify it so that `gpu_task` is only run in a GPU Slurm job?
```{R, eval=FALSE}
graphics_devices <- function() {
  system2("lshw", c("-class", "display"), stdout = TRUE, stderr = FALSE)
}

tar_plan(
  tar_target(
    cpu_hardware,
    graphics_devices()
  ),
  tar_target(
    gpu_hardware,
    graphics_devices()
  )
)
```

::: hint
You will need to define two different crew controllers.
:::
::: solution
```R
graphics_devices <- function() {
  system2("lshw", c("-class", "display"), stdout = TRUE, stderr = FALSE)
}

library(crew)
library(crew.cluster)

cpu_controller <- crew_controller_slurm(
  name = "cpu_worker",
  workers = 3,
  script_lines = "module load R",
  slurm_memory_gigabytes_per_cpu = 200,
  slurm_cpus_per_task = 1
)
gpu_controller <- crew_controller_slurm(
  name = "gpu_worker",
  workers = 3,
  script_lines = "#SBATCH --gres=gpu:1
module load R",
  slurm_memory_gigabytes_per_cpu = 200,
  slurm_cpus_per_task = 1
)

# Register both controllers so that targets can dispatch to them by name
tar_option_set(
  controller = crew_controller_group(cpu_controller, gpu_controller)
)

tar_plan(
  tar_target(
    cpu_hardware,
    graphics_devices(),
    resources = tar_resources(
      crew = tar_resources_crew(controller = "cpu_worker")
    )
  ),
  tar_target(
    gpu_hardware,
    graphics_devices(),
    resources = tar_resources(
      crew = tar_resources_crew(controller = "gpu_worker")
    )
  )
)
```
:::
:::

::::::::::::::::::::::::::::::::::::: keypoints

- `crew.cluster::crew_controller_slurm()` is used to configure a workflow to use Slurm
- Crew uses persistent workers on HPC, and you need to choose your resources accordingly
- You can create heterogeneous workers by making multiple named calls to `crew_controller_slurm()` and combining them with `crew_controller_group()`

::::::::::::::::::::::::::::::::::::::::::::::::
2 changes: 1 addition & 1 deletion learners/setup.md
@@ -31,4 +31,4 @@ install.packages(

There is a [Posit Cloud](https://posit.cloud/) instance with RStudio and all necessary packages pre-installed available, so you don't need to install anything on your own computer. You may need to create an account (free).

Click this link to open: <https://posit.cloud/content/6064275>
Click this link to open: <https://posit.cloud/content/6064275>