R Code Development
This page describes the interface for writing a custom study to run on Arachne datanode.
A custom study is contained in a single folder and uploaded to Arachne as a ZIP file. At a minimum, this folder needs to contain one R script that Arachne will run. This script is referred to as the entry point and is often called main.R or codeToRun.R, but it can have any name you wish. The entry point file is specified in the Arachne user interface when the code runs.
The folder you pass in to Arachne will be opened and the entrypoint R script will be executed. Any files you create in that folder will be included as output and can be downloaded after execution. Any output printed to standard out or standard error will be included in a log file in the study output.
Arachne will make several environment variables available to your R session that can be used to connect to the CDM Database. In your R script, make sure you check that the environment variables you require are available and populated (i.e. not the empty string "").
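Such a check can be done at the top of the entry point script. A minimal sketch in base R, with a hypothetical helper name (the variable names match the list of environment variables described further below):

```r
# Hypothetical helper: fail fast if any required variable is missing or empty
checkEnvVars <- function(requiredVars) {
  missing <- requiredVars[Sys.getenv(requiredVars) == ""]
  if (length(missing) > 0) {
    stop("Missing required environment variables: ",
         paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# In the entry point, call this before creating the database connection:
# checkEnvVars(c("DBMS_TYPE", "CONNECTION_STRING", "DBMS_USERNAME",
#                "DBMS_PASSWORD", "DBMS_SCHEMA"))
```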
- Analysis code should be written in SQL, R, or a combination of both.
- You should have a start file (the entry point) that contains your starting code.
- It is recommended to use the OHDSI templates and project structure, as seen in the repositories at https://github.com/ohdsi-studies
- SQL code should be written in OHDSI SQL (OHDSI's generic SQL dialect, very similar to MS T-SQL).
Helpful resources: https://ohdsi.github.io/TheBookOfOhdsi/SqlAndR.html#implementing-the-study-using-sql-and-r
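OHDSI SQL is rendered and translated to the target database dialect at run time with the SqlRender package. A short sketch, assuming SqlRender is installed (the schema parameter and query are illustrative):

```r
library(SqlRender)

# Parameterized OHDSI SQL, written in the T-SQL-like OHDSI dialect
sql <- "SELECT TOP 10 * FROM @cdm_schema.person;"

# Fill in the parameter, then translate to the target dialect
rendered <- render(sql, cdm_schema = "cdm")
translated <- translate(rendered, targetDialect = "postgresql")
cat(translated)
```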
Let’s say a developer has analysis code taken from one of the studies available at https://github.com/ohdsi-studies or downloaded from ATLAS (Estimation or Prediction).
Here you can see a typical analysis structure. We are interested in a template for the “start” script that configures the analysis. Usually, it can be found at this location: extras/CodeToRun.R.
The CodeToRun.R file can be copied as Main.R; it will then need some adjustments to run on ARACHNE DataNode.
As a next step, we extend and configure the Main.R file:
This piece of code should be added at the beginning of the Main.R file. It installs the R project as an R package:
setwd("./")
tryCatch({
  # Install the study project from the current directory as a source package
  install.packages(file.path("."), repos = NULL, type = "source", INSTALL_opts = c("--no-multiarch"))
}, finally = {})
As the next step, the developer must configure the database connection settings. The ARACHNE DataNode already has a data source connection configured, so all connection details are available as environment variables:
- DBMS_TYPE - type of database dialect
- CONNECTION_STRING - database connection string
- DBMS_USERNAME - database username
- DBMS_PASSWORD - database password
- DBMS_SCHEMA - OMOP CDM schema name (including vocabularies)
- RESULT_SCHEMA - results schema name (assuming the DDL has been executed for ATLAS)
- TARGET_SCHEMA - target schema for analysis output
- COHORT_TARGET_TABLE - name of the cohort table for analysis output
- JDBC_DRIVER_PATH - path to the JDBC driver
Using these names, the developer can obtain all the necessary connection details from the environment variables:
dbms <- Sys.getenv("DBMS_TYPE")
connectionString <- Sys.getenv("CONNECTION_STRING")
user <- Sys.getenv("DBMS_USERNAME")
pwd <- Sys.getenv("DBMS_PASSWORD")
cdmDatabaseSchema <- Sys.getenv("DBMS_SCHEMA")
resultsDatabaseSchema <- Sys.getenv("RESULT_SCHEMA")
cohortsDatabaseSchema <- Sys.getenv("TARGET_SCHEMA")
cohortTable <- Sys.getenv("COHORT_TARGET_TABLE")

# Pass the JDBC driver path only when one is provided
driversPath <- Sys.getenv("JDBC_DRIVER_PATH")
if (driversPath == "") driversPath <- NULL

connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = dbms,
  connectionString = connectionString,
  user = user,
  password = pwd,
  pathToDriver = driversPath
)
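With the connection details in place, it can be useful to verify the connection before running the full analysis. A minimal sketch using DatabaseConnector, assuming a connectionDetails object and cdmDatabaseSchema variable as shown above (the query is illustrative):

```r
# Open a connection, run a trivial query against the CDM, and close it again
connection <- DatabaseConnector::connect(connectionDetails)
personCount <- DatabaseConnector::querySql(
  connection,
  sprintf("SELECT COUNT(*) AS person_count FROM %s.person", cdmDatabaseSchema)
)
print(personCount)
DatabaseConnector::disconnect(connection)
```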
In most cases the developer will need to configure an output folder for the analysis. Define its location and create it inside the current working directory; the results will be archived and returned as a ZIP file.
Here is how the developer can configure the path in R and then execute the command to create it:
outputFolder <- file.path(getwd(), 'my_results')
dir.create(outputFolder)
This code defines the 'my_results' directory as the outputFolder value and then creates it.
If the developer needs a temporary location for files that do not need to be returned as output, the Linux system /tmp directory can be used. For example, to create such a folder and pass its location into the code:
tmpFolder <- '/tmp/my_tmp_folder'
dir.create(tmpFolder)
You can add a special file to your ZIP archive that will automatically populate the fields in the submission form in ARACHNE DataNode. The application looks for two possible names:
- metadata.json
- execution-config.json
The content of the file should specify the following parameters:
{
  "analysisName": "Simvastatin",
  "analysisType": "CUSTOM",
  "runtimeEnvironmentName": "Default Runtime",
  "dockerRuntimeEnvironmentImage": "odysseusinc/r-hades:latest",
  "entryPoint": "MyAnalysis/main.R",
  "studyName": "My study"
}
Where:
- analysisName - name of your analysis or package
- analysisType - type of the analysis in ARACHNE DataNode; use CUSTOM for R code
- runtimeEnvironmentName - name of the tarball runtime environment
- dockerRuntimeEnvironmentImage - name and tag of the Docker image with the runtime environment
- entryPoint - path inside the ZIP archive to the main file that starts execution of your code
- studyName - name of your study
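If you prefer to generate this file from R rather than writing it by hand, here is a sketch using the jsonlite package (the values are the same illustrative ones as in the example above):

```r
library(jsonlite)

metadata <- list(
  analysisName = "Simvastatin",
  analysisType = "CUSTOM",
  runtimeEnvironmentName = "Default Runtime",
  dockerRuntimeEnvironmentImage = "odysseusinc/r-hades:latest",
  entryPoint = "MyAnalysis/main.R",
  studyName = "My study"
)

# auto_unbox = TRUE writes scalars as JSON strings, not one-element arrays
write_json(metadata, "metadata.json", auto_unbox = TRUE, pretty = TRUE)
```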
Select all files and folders and archive them as a ZIP file. The developer will use this archive in ARACHNE DataNode to submit the execution.
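The archive can also be created from R itself. A sketch using utils::zip, which requires a zip utility on the system PATH (the archive name is illustrative):

```r
# Zip everything in the study folder, excluding any previously built archive
files <- setdiff(list.files(".", recursive = TRUE), "my_study.zip")
utils::zip("my_study.zip", files = files)
```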
You can find examples following these links below: