
Setting up JAMO

Andrew Tritt edited this page Nov 13, 2024 · 4 revisions

Running JAMO at NERSC

The jamo directory contains all the necessary files for setting up JAMO at NERSC. Running JAMO at NERSC consists of three main steps:

  1. Building the JAMO application as a Docker image. The relevant files for this can be found in jamo/docker.
  2. Instantiating JAMO on NERSC's Spin infrastructure. The necessary files for this can be found in jamo/k8s and jamo/config.
  3. Running the data transfer service on Perlmutter and the Data transfer nodes. The necessary files for this can be found in jamo/dt_service.

Before charging ahead, make sure you have access to Spin, NERSC's container-based platform for service deployment. You will need to attend a SpinUp Workshop. It will also be useful to get access to NERSC's private container registry.

Building the JAMO application

Prerequisites

Before getting started, you will need an account on JGI's Gitlab and a personal access token (PAT). You can create an account on JGI's Gitlab by going here and signing in with your LBL LDAP. Once you are signed in, click on your avatar and go to Preferences->Access Tokens->Add Token. Give the token a name and set an expiration date. Under Select scopes, check the box for read_repository.

Getting the JAMO code

After checking out this repository, change into jamo/docker. From here, set the USER and PAT environment variables to your JGI Gitlab username and PAT, respectively, and run get_code.sh to retrieve all the necessary code for running JAMO.

USER=<JGI Gitlab Username> PAT=<Gitlab personal access token> bash get_code.sh

Building the Docker image

Now that you have all the JAMO code, you can build a Docker image for running JAMO at NERSC. You will need to make your image available from Spin at some Docker registry. This tutorial uses NERSC's private registry. See the NERSC documentation for getting access to this registry.

Once you have access, sign in to the registry:

docker login registry.nersc.gov

Now build and push your image to the registry:

docker build -t registry.nersc.gov/<PID>/jamo-service:<TAG> --push .

You will need to set a tag (i.e. <TAG>). Please see the registry for the current set of tags and choose one that does not already exist. You will also need to fill in the NERSC project ID (i.e. <PID>).

If you are building your image locally, you will probably need to use docker buildx to build your image for multiple platforms or, at the very least, for the platform running NERSC Spin (i.e. linux/amd64). Below is an example of how you would build a multi-platform image that runs on both Apple silicon (linux/arm64) and x86 Linux machines (linux/amd64).

docker buildx build --platform linux/amd64,linux/arm64 -t registry.nersc.gov/<PID>/jamo-service:<TAG> --push .

Running JAMO on Spin

NERSC's Spin infrastructure uses Rancher, a management and orchestration framework for Kubernetes clusters. The jamo/k8s directory contains the Kubernetes configuration files for setting up all the components necessary for running JAMO. The Rancher Objects list below indicates, for each YAML file, which type of Kubernetes object to create in Rancher, along with a brief description of the object's purpose and/or how to adapt the config file for your instance.

  • Secret: Storage->Secrets
  • ConfigMap: Storage->ConfigMaps
  • PersistentVolumeClaim: Storage->PersistentVolumeClaims
  • Deployment: Workloads->Deployments
  • Ingress: Service Discovery->Ingresses

You will need to update the namespace for all objects. Unless otherwise noted (e.g. Deployments), you will need to modify the YAML path metadata.namespace to reflect the name of your namespace.
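As a sketch, the namespace update can be scripted rather than edited by hand. Assuming a hypothetical namespace jamo-dev (substitute your own), something like the following would rewrite every namespace field under jamo/k8s, including the spec.template.metadata.namespace fields in the Deployments:

```shell
# Hypothetical namespace; replace with the namespace of your Spin project.
NAMESPACE=jamo-dev

# Rewrite every "namespace:" field in the Kubernetes YAML files under jamo/k8s.
find jamo/k8s -name '*.yaml' -exec \
  sed -i "s/^\( *namespace:\).*/\1 ${NAMESPACE}/" {} +
```

A quick grep -r 'namespace:' jamo/k8s afterwards confirms the change took effect before you import the files into Rancher.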

Rancher Objects

  1. secrets/jamo-dev-cert.yaml - Secret containing the JAMO service SSL certificate. To create this Secret, you will need to request a certificate for your service. Before you do that, you will need to pick a Fully Qualified Domain Name, or FQDN. This is the DNS name you will submit to LBLnet (step 12). Once you know what you want your JAMO host (a.k.a. FQDN, DNS name, URL) to be, you can request a certificate through the lab here. To request a certificate, you will need to generate a certificate signing request (CSR). You can ask your favorite AI chatbot how to do this or see instructions here. There are more details on requesting certificates available at the Berkeley Lab Commons.

    After submitting your request, you should receive an email from Sectigo Certificate Manager with links for downloading your certificate. You will need to download the Certificate (w/ issuer after), PEM encoded option. Create this Secret through the Rancher UI by creating a TLS Certificate (i.e. Storage->Secrets->Create->TLS Certificate) and copying the PEM-encoded certificate into the Certificate form.

  2. secrets/google-oauth-secrets.yaml - Secret containing Google OAuth secrets. You can set this up here. JAMO needs this for setting up accounts with Google credentials. You will need to update the following in the YAML file:

    • The OAuth Secrets. This is the base64 encoding of the JSON string of the OAuth secrets you downloaded from Google. You can also create a new secret through the Rancher UI with the name google-oauth-secrets and copy your JSON string in there.
  3. secrets/sf-api-key.yaml - Secret for connecting to the Superfacility API. You can create one of these here. JAMO needs this for connecting Google credentials to NERSC accounts. You will need to update the following in the YAML file:

    • The Superfacility API key. This is the base-64 encoding of the PEM string of the API key. You can also create a new secret through the Rancher UI with the name sf-api-key and copy your PEM string in there.
  4. secrets/jamo-mongo-pass.yaml - Secret containing a password for MongoDB. This can be anything--you will not need to remember it or write it down anywhere else. You will need to update the following in the YAML file:

    • The MongoDB password (YAML path data.password). This is the base-64 encoding of the password. You can also create a new secret through the Rancher UI with the name jamo-mongo-pass.
  5. secrets/jamo-mysql-pass.yaml - Secret containing a password for MySQL. This can be anything--you will not need to remember it or write it down anywhere else. You will need to update the following in the YAML file:

    • The MySQL password (YAML path data.password). This is the base-64 encoding of the password. You can also create a new secret through the Rancher UI with the name jamo-mysql-pass.
  6. cm/sql-config.yaml - ConfigMap for storing MySQL configurations. The following components in this YAML file will need to be updated:

    • The MySQL user and the LapinPy Core and Tape database names in the initialization SQL statements (YAML path data.init.sql).
  7. pvc/jamo-mongo.yaml - PersistentVolumeClaim (i.e. storage) for MongoDB to write to. You can also create a new PVC through the Rancher UI with the name jamo-mongo.

  8. pvc/jamo-mysql.yaml - PersistentVolumeClaim (i.e. storage) for MySQL to write to. You can also create a new PVC through the Rancher UI with the name jamo-mysql.

  9. deployments/jamo-mongo.yaml - Deployment for MongoDB. The following components in this YAML file will need to be updated:

    • The deployment namespace (YAML path metadata.namespace and spec.template.metadata.namespace)
    • The MongoDB root username (YAML path spec.template.spec.containers.env.value where spec.template.spec.containers.env.name == MONGO_INITDB_ROOT_USERNAME)
  10. deployments/jamo-mysql.yaml - Deployment for MySQL. The following components in this YAML file will need to be updated:

    • The deployment namespace (YAML path metadata.namespace and spec.template.metadata.namespace)
    • The MySQL username (YAML path spec.template.spec.containers.env.value where spec.template.spec.containers.env.name == MYSQL_USER)
    • The MySQL database name (YAML path spec.template.spec.containers.env.value where spec.template.spec.containers.env.name == MYSQL_DATABASE)
  11. deployments/jamo-app.yaml - Deployment for running the JAMO service. The following components in this YAML file will need to be updated:

    • The deployment namespace (YAML path metadata.namespace and spec.template.metadata.namespace)
    • The URL of the JAMO image to use (YAML path spec.template.spec.containers.image)
    • The user to run the containers as (YAML path spec.template.spec.containers.securityContext.runAsUser). For Taskforce5's JAMO, we use the t5user collaboration account from the m4521 project.
    • The pod storage (YAML path spec.template.spec.volumes). Update vol-jamo-config and vol-jamo-wd to use paths on the community file system that you have access to.
  12. ingresses/jamo-ingress.yaml - Ingress for routing connections to the JAMO service. You can create this through the Rancher UI. You will need to create the standard ingress host first. Once you have your ingress, you can request a DNS name from LBLnet, at which point you can update your ingress.

    • The ingress name (YAML path metadata.name) and the Spin URL (YAML path spec.rules.host where spec.rules.host == .*svc.spin.nersc.org). The host URL needs to conform to the following pattern: <NAME>.<NAMESPACE>.<INSTANCE>.svc.spin.nersc.org, where NAME is the service name, NAMESPACE is your namespace, and INSTANCE is the Rancher instance (i.e. development or production).
    • The DNS name for your JAMO service (YAML path spec.rules.host). You will need to update this after you get a DNS name from LBLnet. LBLnet will not add a DNS record for a CNAME that does not exist, so you must create the ingress first to get a CNAME (i.e. <NAME>.<NAMESPACE>.<INSTANCE>.svc.spin.nersc.org) for them to point your DNS name at.
    • The TLS certificate (YAML path spec.tls). This must be the TLS Certificate Secret you created in step 1.
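Two recurring tasks in the steps above are generating the CSR for step 1 and base64-encoding values for the Secret YAML files. Below is a minimal sketch; the FQDN jamo.example.lbl.gov and the password are placeholders, so substitute your own values:

```shell
# Generate an RSA key and a certificate signing request (CSR) for step 1.
# "jamo.example.lbl.gov" is a placeholder FQDN; use the DNS name you picked.
openssl req -new -newkey rsa:2048 -nodes \
  -keyout jamo.key -out jamo.csr \
  -subj "/CN=jamo.example.lbl.gov"

# Base64-encode a secret value for a data field (e.g. data.password in
# jamo-mongo-pass.yaml). -n keeps a trailing newline out of the encoding.
echo -n 'my-mongo-password' | base64
```

The same base64 invocation works for the OAuth JSON string and the Superfacility API PEM; for file contents, piping the file through base64 (with -w 0 on GNU coreutils) produces the single-line value the YAML expects.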

Running the Data Transfer Service

Once you have the JAMO service up and running, you need to start the data transfer service, or DTS. The DTS does the work of ingesting, backing up, and restoring files.

Scripts for running the DTS are located in jamo/dt_service. Since the majority of the work done by the DTS is transferring data, it is best practice to run it on the data transfer nodes (DTNs). The DTNs do not mount Perlmutter scratch, so we also need to run a service on Perlmutter to ingest data submitted to JAMO from there.

To keep these services alive, we run cron jobs. Perlmutter does not allow direct running of cron jobs. Instead, you need to use scrontab.

  • dt_service/run_dt_service_nersc_prod.sh - script for running on DTNs dtn03 and dtn04. This script will check for a running DTS process and start one if no such process exists.
  • dt_service/crontab.dtn.sh - crontab script for running run_dt_service_nersc_prod.sh every 2 minutes. This file will need to be modified to point to run_dt_service_nersc_prod.sh and the directory you want to save logs to.
  • dt_service/run_dt_service_perlmutter_prod.sh - script for running on Perlmutter. This script will run a DTS process; it does not check for an existing process.
  • dt_service/scrontab.perlmutter.sh - scrontab script for running run_dt_service_perlmutter_prod.sh. Submit this to SLURM using scrontab. This file will need to be modified to point to run_dt_service_perlmutter_prod.sh and the directory you want to save logs to. You will need access to the workflow queue to submit this script as is. If you do not have access to the workflow queue, you can submit to the cron queue, but you will have to alter the time request.
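As an illustration of the cron setup described above, the entries might look like the following. The paths, log directory, project account, schedule, and time request shown here are placeholders, so check the actual files in jamo/dt_service for the exact invocations:

```
# crontab entry on dtn03/dtn04 (crontab -e): run the watchdog every 2 minutes;
# the script itself only starts a DTS process if none is running.
*/2 * * * * bash /path/to/jamo/dt_service/run_dt_service_nersc_prod.sh >> /path/to/logs/dtn_dt_service.log 2>&1

# scrontab entry on Perlmutter (scrontab -e); #SCRON lines pass sbatch options.
# Restarted daily here since the Perlmutter script does not guard against
# duplicate processes.
#SCRON -q workflow
#SCRON -A <project>
#SCRON -t 24:00:00
0 0 * * * bash /path/to/jamo/dt_service/run_dt_service_perlmutter_prod.sh >> /path/to/logs/perlmutter_dt_service.log 2>&1
```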