R Environment Development
Arachne data node runs R code in pre-defined R environments. An R environment contains all the necessary R package and system dependencies to run the R code. This page describes the process of creating new R environments for Arachne.
Arachne's R environments are created using Dockerfiles. Generally the base image should be one of the Rocker versioned R images. From that base image additional R packages and system dependencies are added including database drivers.
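As a rough illustration of the paragraph above, the sketch below writes a small Dockerfile to disk. Everything inside it is an assumption made for the example rather than Arachne's official recipe: the R version in the base image tag, the choice of `default-jdk` as the Java dependency, the package list, and the PostgreSQL driver. The driver folder matches the path used by the test script later on this page. A real environment will likely need more system libraries and R packages.

```shell
# Write an illustrative Dockerfile into the given directory (default: current directory).
# The helper name and all Dockerfile contents are assumptions for illustration only.
write_example_dockerfile() (
  target_dir="${1:-.}"
  cat > "$target_dir/Dockerfile" <<'EOF'
# Base image: a Rocker versioned R image (the R version is an arbitrary choice)
FROM rocker/r-ver:4.2.3

# System dependencies: Java is needed by JDBC-based database drivers
RUN apt-get update \
 && apt-get install -y --no-install-recommends default-jdk \
 && rm -rf /var/lib/apt/lists/* \
 && R CMD javareconf

# R packages used by the study code
RUN R -e "install.packages(c('DatabaseConnector', 'readr'))"

# JDBC drivers, stored where the test script on this page looks for them
RUN mkdir -p /opt/hades/jdbc_drivers \
 && R -e "DatabaseConnector::downloadJdbcDrivers('postgresql', pathToDriver = '/opt/hades/jdbc_drivers')"
EOF
)
```

Calling `write_example_dockerfile .` places the file in the current folder, ready for the build step.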
Once you have created a Dockerfile with the dependencies needed to run your R code, build the image on your computer with the following command:
docker build -t {docker registry url}/{image name}:{tag} .
Make sure you run this command from the folder where the Dockerfile is located so that . refers to the folder containing the Dockerfile.
For example
docker build -t executionengine.azurecr.io/hades:2023q3 .
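If you build several images, it may help to compose the image reference from its parts instead of typing it each time. The following is a minimal sketch; `image_ref`, `build_image`, and the `DOCKER_BIN` override are hypothetical names introduced here, not part of Docker or Arachne.

```shell
# Compose "{docker registry url}/{image name}:{tag}" from its three parts.
image_ref() {
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

# Build the image from the Dockerfile in the given folder (default: current folder).
# DOCKER_BIN is a hypothetical override for dry runs, e.g. DOCKER_BIN=echo.
build_image() {
  "${DOCKER_BIN:-docker}" build -t "$1" "${2:-.}"
}
```

For instance, `build_image "$(image_ref executionengine.azurecr.io hades 2023q3)"` runs the same build as the example above.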
A Docker registry is a remote host where Docker images are stored and from which they are distributed. Many cloud vendors offer hosted Docker registries, and Docker Hub (https://hub.docker.com/) is a widely used public one. Once you have built the Docker image, push it to your registry.
docker push {docker registry url}/{image name}:{tag}
docker push executionengine.azurecr.io/hades:2023q3
Be sure to give each new version of your image a new tag so it is uniquely identified by the image-tag combination.
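The example tag `2023q3` suggests a year-plus-quarter naming scheme, although this page does not state that as a rule. Assuming that convention, a tag could be derived from a date as sketched below; `quarter_tag` is a hypothetical helper. If you rebuild more than once per quarter, you would need an extra suffix to keep tags unique.

```shell
# Derive a "<year>q<quarter>" tag from an ISO date (YYYY-MM-DD).
# With no argument, today's date is used.
quarter_tag() (
  iso_date="${1:-$(date +%Y-%m-%d)}"
  year="${iso_date%%-*}"          # text before the first dash
  rest="${iso_date#*-}"           # text after the first dash
  month="${rest%%-*}"             # two-digit month
  month="${month#0}"              # drop a leading zero so arithmetic stays decimal
  echo "${year}q$(( (month - 1) / 3 + 1 ))"
)
```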
If your registry requires a login, you may need to run docker login before pushing your image.
docker login -u <user> -p <password> executionengine.azurecr.io
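Passing the password with `-p` leaves it in your shell history, and Docker itself warns that this is insecure. A possible alternative is the `--password-stdin` flag of `docker login`, sketched below. `registry_login`, the `REGISTRY_PASSWORD` environment variable, and the `DOCKER_BIN` override are assumptions made for this example.

```shell
# Log in without putting the password on the command line.
# REGISTRY_PASSWORD is a hypothetical environment variable holding the secret.
# DOCKER_BIN is a hypothetical override for dry runs, e.g. DOCKER_BIN=echo.
registry_login() {
  printf '%s' "$REGISTRY_PASSWORD" |
    "${DOCKER_BIN:-docker}" login -u "$1" --password-stdin "$2"
}
```

Usage would look like `registry_login <user> executionengine.azurecr.io` after exporting `REGISTRY_PASSWORD`.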
You can also set up ssh keys to avoid logging in each time you push. Follow the vendor's
Testing your image is important. To spin up a local container on the command line based on your image for interactive tests use the command
docker run -it --rm {docker registry url}/{image name}:{tag}
docker run -it --rm executionengine.azurecr.io/hades:2023q3
The -it flags tell Docker to provide an interactive terminal. --rm causes the container (a single running instance of the image) to be removed after you exit it.
This command starts the container with the image's default command. To explicitly start the container in a bash shell, use
docker run -it --rm {docker registry url}/{image name}:{tag} bash
From inside the container, start an R session by running R. From R you can interactively test R code inside the container.
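A quick non-interactive check can complement the interactive session, for example confirming that a required package loads. The sketch below assumes `Rscript` is on the image's PATH, which is the case for Rocker images but is not guaranteed for every base image. `check_r_package` and `DOCKER_BIN` are hypothetical names.

```shell
# Check that an R package loads inside a throwaway container.
# Arguments: image reference, R package name.
# DOCKER_BIN is a hypothetical override for dry runs, e.g. DOCKER_BIN=echo.
check_r_package() {
  "${DOCKER_BIN:-docker}" run --rm "$1" Rscript -e "library($2)"
}
```

For example, `check_r_package executionengine.azurecr.io/hades:2023q3 DatabaseConnector` exits with a non-zero status if the package cannot be loaded.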
While interactive testing is useful, automated tests are the real goal. A test should resemble a custom R study or project: a folder containing a main.R script that serves as the entry point, plus any other files and folders needed to run the R code.
A simple test you can run is connecting to the CDM database and running a query.
# main.R script for a simple database connection test
library(DatabaseConnector)

# read the connection settings from environment variables
dbmsType <- Sys.getenv("DBMS_TYPE")
connString <- Sys.getenv("CONNECTION_STRING")
dbmsUser <- Sys.getenv("DBMS_USERNAME")
dbmsPwd <- Sys.getenv("DBMS_PASSWORD")
cdmSchema <- Sys.getenv("CDM_SCHEMA")

if (dbmsType != "" && connString != "" && dbmsUser != "" && cdmSchema != "") {
  # try to set up the db connection
  print("Setting up db connection")
  conn <- DatabaseConnector::connect(dbms = dbmsType,
                                     connectionString = connString,
                                     user = dbmsUser,
                                     password = dbmsPwd,
                                     pathToDriver = "/opt/hades/jdbc_drivers")
  # count the rows in the person table of the CDM schema
  personCount <- DBI::dbGetQuery(conn, paste0("SELECT COUNT(*) AS n FROM ", cdmSchema, ".person"))[[1]]
  # write results to the /results folder
  readr::write_lines(paste("Number of persons:", personCount), "/results/output.txt")
  DatabaseConnector::disconnect(conn)
  print("test complete")
} else {
  # fail loudly so an automated run does not pass silently
  stop("Missing one or more of DBMS_TYPE, CONNECTION_STRING, DBMS_USERNAME, CDM_SCHEMA")
}
Next, commit this test to a GitHub repository, then run the script inside a container based on your image.
{todo: add example}
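One possible way to run the test folder locally is to mount it into a container and call `Rscript` on main.R. This is only a sketch and not necessarily how Arachne launches studies: the `/code` mount point, the `run_study_test` helper, and the `DOCKER_BIN` override are assumptions. The `/results` path and the environment variable names come from the script above. Passing `-e NAME` without a value forwards the variable from your current shell, and both host paths should be absolute.

```shell
# Run main.R from a study folder inside a throwaway container.
# Arguments: image reference, absolute path to the study folder,
#            absolute path to a host folder that receives the results.
# DOCKER_BIN is a hypothetical override for dry runs, e.g. DOCKER_BIN=echo.
run_study_test() (
  image="$1"
  study_dir="$2"
  results_dir="$3"
  "${DOCKER_BIN:-docker}" run --rm \
    -v "$study_dir:/code" \
    -v "$results_dir:/results" \
    -w /code \
    -e DBMS_TYPE -e CONNECTION_STRING -e DBMS_USERNAME -e DBMS_PASSWORD -e CDM_SCHEMA \
    "$image" Rscript main.R
)
```

After a successful run, the person count written by the script should appear in output.txt inside the host results folder.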
Arachne also supports R environments packaged as tar.gz files, which allows Arachne to be used without Docker. The Linux chroot command is used to run R code in the tar.gz environment. To build a tar.gz file from a Docker image that has been created for Arachne, run the following commands on the machine that has the Docker image.
# start a container from the image
docker run -it --rm -d --name builder {docker registry url}/{image name}:{tag} bash
# tar the entire file system of the image
docker exec -it builder tar --exclude /tmp --exclude /proc --exclude /sys -czf /tmp/{image_tag}.tar.gz /
# copy the tarball to the host machine
docker cp builder:/tmp/{image_tag}.tar.gz /tmp/{image_tag}.tar.gz
docker stop builder
Now you have a tar.gz file that can be used in Arachne.
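Before handing the tarball to Arachne, a quick sanity check can confirm that it actually contains an R executable. The sketch below lists the archive and looks for an entry ending in `bin/R`. `tarball_has_r` is a hypothetical helper, and the check assumes R is installed under a `bin` directory, as it is in Rocker images.

```shell
# Succeed (exit status 0) if the tar.gz archive contains an entry ending in "bin/R".
tarball_has_r() {
  tar -tzf "$1" | grep -Eq '(^|/)bin/R$'
}
```

For example, `tarball_has_r /tmp/{image_tag}.tar.gz && echo "R found"` prints a message only when the executable is present.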
When installing the Arachne data node you can specify options in the application's config.yml file. In the docker section of the Arachne execution engine's config.yml, replace the url parameter with the URL of your custom Docker registry. Then provide the name of the image for your study in the Arachne user interface.
docker:
  enable: true
  socket: unix:///var/run/docker.sock
  registry:
    url: https://registry-1.docker.io