diff --git a/episodes/03-disk-image.md b/episodes/03-disk-image.md
index 32cb876..b0b88f9 100644
--- a/episodes/03-disk-image.md
+++ b/episodes/03-disk-image.md
@@ -49,6 +49,7 @@ We create a bucket for these logs with
 gcloud storage buckets create gs:/// --location europe-west4
 ```
 
+### Go installed
 
 ### Enabling services
 
@@ -80,7 +81,7 @@ When services are enabled, some "service accounts" with specific roles get creat
 gcloud projects get-iam-policy 
 ```
 
-Often, the resources need to work with each other. In this case, the Cloud Build service needs to have two additional roles to access Compute resources (the image disk belongs to that category)
+Often, the resources need to work with each other. In this case, the Cloud Build service needs to have two additional roles to access Compute resources (the image disk belongs to that category).
 
 Add them with
@@ -130,9 +131,15 @@ To create the image with the script, you must have `go` installed.
 Run the script with
 
 ```bash
-go run ./cli --project-name= --image-name=pfnano-disk-image --zone=europe-west4-a --gcs-path=gs:// --disk-size-gb=50 --container-image=ghcr.io/katilp/pfnano-image-build:main --timeout 100m
+go run ./cli --project-name= --image-name=pfnano-disk-image --zone=europe-west4-a --gcs-path=gs:// --disk-size-gb=50 --container-image=docker.io/cernopendata/cernopendata-client:latest --container-image=docker.io/rootproject/root:latest --container-image=ghcr.io/cms-dpoa/pfnano-image-build:main --timeout 100m
 ```
 
+:::::::::::::::::::::::::::::::::::::::::: callout
+
+Note that while images can in most cases be "pulled" from Docker Hub by specifying only the image name (e.g. `cernopendata/cernopendata-client` and `rootproject/root`), in this script you must give the full registry address starting with `docker.io` **and** specify a tag (e.g. `:latest`).
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
 :::::::::::::::::::::::::::::::::::::::::: spoiler
 
 ### Error: Quota exceeded?
diff --git a/episodes/05-workflow.md b/episodes/05-workflow.md
index 81e66f8..69336b2 100644
--- a/episodes/05-workflow.md
+++ b/episodes/05-workflow.md
@@ -93,6 +93,14 @@ replicaset.apps/argo-server-5f7b589d6f            1         1         1       24s
 replicaset.apps/workflow-controller-864c88655d   1         1         1       24s
 ```
 
+## About Argo Workflows
+
+The data processing example is defined as an Argo workflow. You can learn more about Argo Workflows in their [documentation](https://argo-workflows.readthedocs.io/en/latest/).
+
+Every step in the workflow runs in a container, and there are several ways to pass information between the steps.
+
+The example configuration in `argo/argo_bucket_run.yaml` has comments to help you understand how files and/or parameters can be passed from one step to another.
+
 ## Submit a test job
 
 Edit the parameters in the `argo/argo_bucket_run.yaml` so that they are
@@ -100,19 +108,15 @@ Edit the parameters in the `argo/argo_bucket_run.yaml` so that they are
 ```
 parameters:
   - name: nEvents
-    #FIXME
     # Number of events in the dataset to be processed (-1 is all)
     value: 1000
   - name: recid
-    #FIXME
     # Record id of the dataset to be processed
     value: 30511
   - name: nJobs
-    #FIXME
     # Number of jobs the processing workflow should be split into
     value: 2
   - name: bucket
-    #FIXME
     # Name of cloud storage bucket for storing outputs
     value:
 ```
@@ -142,6 +146,22 @@
 gs:///pfnano/30511/scatter/pfnanooutput1.root
 gs:///pfnano/30511/scatter/pfnanooutput2.root
 ```
 
+### Delete resources
+
+Delete the workflow after each run so that the "pods" do not accumulate. They are no longer running, but they remain visible.
+
+```bash
+argo delete -n argo @latest
+```
+
+Do not keep the cluster if you are not running jobs. The cost accrues for the time the cluster exists, not for the time it is in use. You can delete all resources created by the Terraform script with
+
+```bash
+terraform destroy
+```
+
+Confirm with "yes".
+
 ## Costs