
R Code Development

Konstantin Yaroshovets edited this page Oct 9, 2024 · 4 revisions

This page describes the interface for writing a custom study to run on ARACHNE DataNode.

A custom study is contained in a single folder and uploaded to Arachne as a ZIP file. At a minimum this folder needs one R script that Arachne will run. This script is referred to as the entry point and is often called main.R or codeToRun.R, but it can have any name you wish. The entry point file is specified in the Arachne user interface when the code runs.

The folder you pass to Arachne will be opened and the entry point R script executed. Any files you create in that folder will be included in the output and can be downloaded after execution. Anything printed to standard out or standard error will be captured in a log file in the study output.

Arachne will make several environment variables available to your R session that can be used to connect to the CDM Database. In your R script, make sure you check that the environment variables you require are available and populated (i.e. not the empty string "").
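
A minimal sketch of such a check, using the variable names listed later on this page (the error message is illustrative):

```r
# Stop early if a required connection variable is missing or empty
requiredVars <- c("DBMS_TYPE", "CONNECTION_STRING",
                  "DBMS_USERNAME", "DBMS_PASSWORD", "DBMS_SCHEMA")
for (v in requiredVars) {
  if (Sys.getenv(v) == "") {
    stop(sprintf("Required environment variable '%s' is not set", v))
  }
}
```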

General requirements for analysis code

  1. Analysis code should be written in SQL, R, or a combination of the two.
  2. You should have an entry point script that contains your startup code.
  3. It’s recommended to follow the OHDSI templates and project structure, as seen in the https://github.com/ohdsi-studies repositories.
  4. SQL code should be written in OHDSI SQL (OHDSI's generic SQL dialect, very similar to MS T-SQL).

Helpful resources: https://ohdsi.github.io/TheBookOfOhdsi/SqlAndR.html#implementing-the-study-using-sql-and-r
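
As a sketch, OHDSI SQL can be parameterized and translated to the target database dialect at run time with the SqlRender package; the query below is illustrative:

```r
library(SqlRender)

# OHDSI SQL with a @parameter placeholder (illustrative query)
sql <- "SELECT COUNT(*) FROM @cdm_schema.person;"
sql <- render(sql, cdm_schema = Sys.getenv("DBMS_SCHEMA"))

# Translate to the dialect of the target database, e.g. "postgresql"
sql <- translate(sql, targetDialect = Sys.getenv("DBMS_TYPE"))
```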

Preparing OHDSI Analysis

Let’s say a developer has analysis code taken from one of the studies available at https://github.com/ohdsi-studies or downloaded from ATLAS (Estimation or Prediction).

A typical analysis follows a common structure. We are interested in the template of the “start” script used to configure the analysis. Usually it can be found at extras/CodeToRun.R.

The CodeToRun.R file can be copied as Main.R; it will need some adjustments for ARACHNE DataNode.

As the next step, we should extend and configure the Main.R file:

Package Installation Code

This piece of code should be added at the beginning of the Main.R file. It installs the R project as an R package:

# Install the study project itself as an R package from the working directory
tryCatch({
    install.packages(".", repos = NULL, type = "source", INSTALL_opts = c("--no-multiarch"))
}, error = function(e) {
    message("Package installation failed: ", conditionMessage(e))
})

Database Connection Properties

As the next step, the developer must configure the database connection settings. The ARACHNE DataNode already has a datasource connection configured, so all connection details are available as environment variables:

  • DBMS_TYPE - Type of database dialect
  • CONNECTION_STRING - Database Connection String
  • DBMS_USERNAME - Database username
  • DBMS_PASSWORD - Database password
  • DBMS_SCHEMA - OMOP schema name (including vocabularies)
  • RESULT_SCHEMA - Results schema name (assuming DDL executed for ATLAS)
  • TARGET_SCHEMA - Target schema for analysis output
  • COHORT_TARGET_TABLE - Name of cohort table for analysis output
  • JDBC_DRIVER_PATH - Path to the JDBC driver

Having all these names, the developer can obtain all necessary connection details from the environment variables:

dbms <- Sys.getenv("DBMS_TYPE")
connectionString <- Sys.getenv("CONNECTION_STRING")
user <- Sys.getenv("DBMS_USERNAME")
pwd <- Sys.getenv("DBMS_PASSWORD")
cdmDatabaseSchema <- Sys.getenv("DBMS_SCHEMA")
resultsDatabaseSchema <- Sys.getenv("RESULT_SCHEMA")
cohortsDatabaseSchema <- Sys.getenv("TARGET_SCHEMA")
cohortTable <- Sys.getenv("COHORT_TARGET_TABLE")

# An unset JDBC_DRIVER_PATH comes through as "", but
# createConnectionDetails expects NULL in that case
driversPath <- Sys.getenv("JDBC_DRIVER_PATH")
if (driversPath == "") driversPath <- NULL

connectionDetails <- DatabaseConnector::createConnectionDetails(
    dbms = dbms,
    connectionString = connectionString,
    user = user,
    password = pwd,
    pathToDriver = driversPath
)

Output Folder and Temporary Location

In most cases the developer will need to configure an output folder for the analysis. Define its location and create it in the current working directory. Results will be archived and returned as a ZIP file.

Here is how the developer can configure the path in R and then create the folder:

outputFolder <- file.path(getwd(), "my_results")
dir.create(outputFolder, showWarnings = FALSE)

This code defines the my_results directory in the working directory as the outputFolder value and creates it.

If the developer needs a temporary location and these files do not need to be returned, the Linux system /tmp directory can be used. Let’s say the developer needs such a folder and passes its location into the code:

tmpFolder <- '/tmp/my_tmp_folder'
dir.create(tmpFolder)
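
If the temporary files are not needed afterwards, the folder can be removed at the end of the script; a minimal sketch:

```r
# Delete the temporary folder and its contents once the analysis is done
unlink(tmpFolder, recursive = TRUE)
```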

Metadata File

You can add a special file to your ZIP archive that will automatically populate the fields of the submission form in ARACHNE DataNode. The application looks for two possible names:

  • metadata.json or
  • execution-config.json

The content of the file should specify the following parameters:

{
    "analysisName": "Simvastatin",
    "analysisType": "CUSTOM",
    "runtimeEnvironmentName": "Default Runtime",
    "dockerRuntimeEnvironmentImage": "odysseusinc/r-hades:latest",
    "entryPoint": "MyAnalysis/main.R",
    "studyName": "My study"
}

Where:

  • analysisName - Name of your analysis or package
  • analysisType - Type of the analysis in ARACHNE DataNode. Use CUSTOM for R code.
  • runtimeEnvironmentName - Name of the Tarball Runtime environment
  • dockerRuntimeEnvironmentImage - Name and tag of the Docker image with the Runtime environment
  • entryPoint - Path inside ZIP archive to the main file to start execution of your code
  • studyName - Name of your study

Packing Analysis

Select all files and folders and archive them as a ZIP file. The developer will then submit this archive to ARACHNE DataNode for execution.
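
As a sketch, the archive can also be created from R with utils::zip, assuming the study folder is the current working directory (the archive name is illustrative):

```r
# Archive everything in the study folder into my_study.zip
files <- list.files(".", recursive = TRUE)
utils::zip(zipfile = "my_study.zip", files = files)
```

Make sure the entryPoint value in the metadata file matches the path of the entry point script inside the resulting archive.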

Examples

You can find examples at the links below: