Commit

Merge pull request #20 from PediatricOpenTargets/logstar/gene-all-cancer
📫Implement `/tpm/gene-all-cancer/json` and `/tpm/gene-all-cancer/plot` API endpoints
logstar authored Sep 14, 2021
2 parents 7dce737 + acb0b6f commit b265d7b
Showing 85 changed files with 1,349 additions and 667 deletions.
37 changes: 37 additions & 0 deletions .github/linters/.markdown-lint.yml
@@ -0,0 +1,37 @@
---
###########################
###########################
## Markdown Linter rules ##
###########################
###########################

# Linter rules doc:
# - https://github.com/DavidAnson/markdownlint
#
# Note:
# To comment out a single error:
# <!-- markdownlint-disable -->
# any violations you want
# <!-- markdownlint-restore -->
#

###############
# Rules by id #
###############
MD004: false # Unordered list style
MD007:
  indent: 2 # Unordered list indentation
MD013: false # Line length
MD026:
  punctuation: ".,;:!。,;:" # Trailing punctuation not allowed in headings
MD029: false # Ordered list item prefix
MD033: false # Allow inline HTML
MD036: false # Emphasis used instead of a heading
MD024:
  allow_different_nesting: true # heading duplication is allowed for
  # non-sibling headings

#################
# Rules by tags #
#################
blank_lines: false # Error on blank lines
4 changes: 2 additions & 2 deletions .github/workflows/linter.yml
@@ -15,7 +15,7 @@ jobs:
with:
fetch-depth: 0
- name: Lint Code Base
uses: github/super-linter@v4.6.2
uses: github/super-linter@v4.7.1
env:
VALIDATE_ALL_CODEBASE: false
DEFAULT_BRANCH: main
@@ -35,5 +35,5 @@ jobs:

- name: Lint

run: lintr::lint_dir(linters = lintr::with_defaults(object_name_linter = NULL, assignment_linter = NULL, line_length_linter = NULL, spaces_left_parentheses_linter = NULL, commented_code_linter = NULL, object_length_linter = NULL, cyclocomp_linter = lintr::cyclocomp_linter(complexity_limit = 25L)))
run: lintr::lint_dir(linters = lintr::with_defaults(object_name_linter = NULL, assignment_linter = NULL, line_length_linter = NULL, spaces_left_parentheses_linter = NULL, commented_code_linter = NULL, object_length_linter = NULL, cyclocomp_linter = lintr::cyclocomp_linter(complexity_limit = 35L)))
shell: Rscript {0}
2 changes: 1 addition & 1 deletion OpenPedCan-analysis
99 changes: 51 additions & 48 deletions README.md
@@ -1,42 +1,49 @@
# OpenPedCan-api
# OpenPedCan-api <!-- omit in toc -->

[![GitHub Super-Linter](https://github.com/PediatricOpenTargets/OpenPedCan-api/workflows/Lint%20Code%20Base/badge.svg)](https://github.com/marketplace/actions/super-linter)

`OpenPedCan-api` implements OpenPedCan (Open Pediatric Cancers) project public API (application programming interface) to transfer [OpenPedCan-analysis](https://github.com/PediatricOpenTargets/OpenPedCan-analysis) results and plots via HTTP, which is publicly available at <https://openpedcan-api-dev.d3b.io/__docs__/>.

- [OpenPedCan-api](#openpedcan-api)
- [API endpoint specifications](#api-endpoint-specifications)
- [Deploy `OpenPedCan-api`](#deploy-openpedcan-api)
- [Test run `OpenPedCan-api` server locally](#test-run-openpedcan-api-server-locally)
- [`git clone` `OpenPedCan-api` repository](#git-clone-openpedcan-api-repository)
- [Run static R code analysis using R package `lintr`](#run-static-r-code-analysis-using-r-package-lintr)
- [(Optional) Build data model files](#optional-build-data-model-files)
- [Build `OpenPedCan-api` docker image](#build-openpedcan-api-docker-image)
- [Run `OpenPedCan-api` docker image](#run-openpedcan-api-docker-image)
- [Test `OpenPedCan-api` server using `curl`](#test-openpedcan-api-server-using-curl)
- [API system design](#api-system-design)
- [Data model layer](#data-model-layer)
- [Analysis logic layer](#analysis-logic-layer)
- [API layer](#api-layer)
- [HTTP server layer](#http-server-layer)
- [Testing layer](#testing-layer)
- [Deployment layer](#deployment-layer)
- [API Development road map](#api-development-road-map)

## API endpoint specifications

<https://openpedcan-api-dev.d3b.io/__docs__/> specifies the following API endpoint attributes.
`OpenPedCan-api` implements the OpenPedCan (Open Pediatric Cancers) project public API (application programming interface), which transfers [OpenPedCan-analysis](https://github.com/PediatricOpenTargets/OpenPedCan-analysis) results and plots via HTTP. The API is publicly available at <https://openpedcan-api-qa.d3b.io/__docs__/>.

- [1. API endpoint specifications](#1-api-endpoint-specifications)
- [2. Deploy `OpenPedCan-api`](#2-deploy-openpedcan-api)
- [3. Test run `OpenPedCan-api` server locally](#3-test-run-openpedcan-api-server-locally)
- [3.1. `git clone` `OpenPedCan-api` repository](#31-git-clone-openpedcan-api-repository)
- [3.2. Run static R code analysis using R package `lintr`](#32-run-static-r-code-analysis-using-r-package-lintr)
- [3.3. (Optional) Build data model files](#33-optional-build-data-model-files)
- [3.4. Build `OpenPedCan-api` docker image](#34-build-openpedcan-api-docker-image)
- [3.5. Run `OpenPedCan-api` docker image](#35-run-openpedcan-api-docker-image)
- [3.6. Test `OpenPedCan-api` server using `curl`](#36-test-openpedcan-api-server-using-curl)
- [4. API system design](#4-api-system-design)
- [4.1. Data model layer](#41-data-model-layer)
- [4.2. Analysis logic layer](#42-analysis-logic-layer)
- [4.3. API layer](#43-api-layer)
- [4.4. HTTP server layer](#44-http-server-layer)
- [4.5. Testing layer](#45-testing-layer)
- [4.6. Deployment layer](#46-deployment-layer)
- [5. API Development roadmap](#5-api-development-roadmap)

## 1. API endpoint specifications

<https://openpedcan-api-qa.d3b.io/__docs__/> specifies the following API endpoint attributes.

- HTTP request method
- Path
- Parameters
- Response media type

## Deploy `OpenPedCan-api`
## 2. Deploy `OpenPedCan-api`

Following is a comment by @blackdenc at <https://github.com/PediatricOpenTargets/OpenPedCan-api/issues/5#issuecomment-904824004>.
According to comments and messages by @blackdenc:

> As far as building in Jenkins, we build the container and tag it, then push it to ECR and give that tag to the ECS task definition at runtime.
`OpenPedCan-api` is deployed with the following steps:

- Build and tag the `OpenPedCan-api` docker image using `Dockerfile`.
- Push the built image to Amazon Elastic Container Registry (ECR).
- Pass the ECR docker image tag to the Amazon Elastic Container Service (ECS) Fargate (?) task definition at runtime.
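
The three steps above can be sketched as a dry-run bash function that prints, rather than runs, the commands. This is a minimal sketch, not the actual Jenkins pipeline: the registry address, AWS region, function name `print_deploy_commands`, and tagging by git commit hash are hypothetical placeholders. The `docker build`/`tag`/`push` and `aws ecr get-login-password | docker login` commands themselves are standard.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders; replace with the real ECR registry and region.
ECR_REGISTRY="${ECR_REGISTRY:-123456789012.dkr.ecr.us-east-1.amazonaws.com}"
AWS_REGION="${AWS_REGION:-us-east-1}"
IMAGE_NAME="open-ped-can-api"

# print_deploy_commands TAG
# Prints (does not run) the build, tag, login, and push commands for one tag.
print_deploy_commands() {
  local tag="$1"
  local remote="${ECR_REGISTRY}/${IMAGE_NAME}:${tag}"
  echo "docker build -t ${IMAGE_NAME}:${tag} ."
  echo "docker tag ${IMAGE_NAME}:${tag} ${remote}"
  echo "aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${ECR_REGISTRY}"
  echo "docker push ${remote}"
  # The ECS task definition would then reference this image URI at runtime.
  echo "# ECS task definition image: ${remote}"
}
```

For example, `print_deploy_commands "$(git rev-parse --short HEAD)"` prints the commands for the current commit, which can be reviewed before running them.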

<https://openpedcan-api-qa.d3b.io/__docs__/> is the QA server, which only deploys the `main` branch of the repository.

<https://openpedcan-api-dev.d3b.io/__docs__/> is the DEV server, which deploys any new branch of the repository. The QA environment remains unchanged until a new commit is merged into `main`.

`Dockerfile` builds the `OpenPedCan-api` docker image to be run on Amazon ECS.

@@ -45,7 +52,7 @@ To deploy without using docker:
- `Rscript --vanilla main.R` needs to be run with the same working directory as the last `WORKDIR` path in `Dockerfile` prior to the docker instruction `ENTRYPOINT ["Rscript", "--vanilla", "main.R"]`.
- Build or download `db` files according to the commands in `Dockerfile`.
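
To find the directory that `main.R` expects as its working directory, the last `WORKDIR` path can be read from `Dockerfile`. This is a minimal sketch that assumes each `WORKDIR` takes a single absolute path without variable substitution; relative `WORKDIR` paths, which Docker resolves against the previous `WORKDIR`, are not handled. The function name `last_workdir` is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# last_workdir DOCKERFILE
# Prints the argument of the last WORKDIR instruction in DOCKERFILE.
# Docker instructions are case-insensitive, so the keyword is upper-cased
# before comparing. Exits with status 1 if there is no WORKDIR.
last_workdir() {
  awk 'toupper($1) == "WORKDIR" { dir = $2 }
       END { if (dir == "") exit 1; print dir }' "$1"
}
```

After recreating that directory layout on the host and building or downloading the `db` files, the server could be started with `cd "$(last_workdir Dockerfile)" && Rscript --vanilla main.R`.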

## Test run `OpenPedCan-api` server locally
## 3. Test run `OpenPedCan-api` server locally

Test run `OpenPedCan-api` server with the following steps:

@@ -72,7 +79,7 @@ R package jsonlite 1.7.2
R package lintr 2.0.1
```

### `git clone` `OpenPedCan-api` repository
### 3.1. `git clone` `OpenPedCan-api` repository

```bash
# Change URL if a fork repo needs to be used
@@ -86,15 +93,15 @@ git checkout -t origin/the-branch-that-needs-to-be-tested
# git checkout COMMIT_HASH_ID
```

### Run static R code analysis using R package `lintr`
### 3.2. Run static R code analysis using R package `lintr`

```bash
./tests/run_r_lintr.sh
```

If there are any syntax errors, post the full error messages as a comment in the GitHub pull request.

### (Optional) Build data model files
### 3.3. (Optional) Build data model files

Use the following bash command to build data model files locally in the `db` directory. This step requires more than 25 GB of memory.

@@ -108,7 +115,7 @@ Use the following bash command to build data model files locally to the `db` dir
- Copy data model files from `open-ped-can-api-build-db` docker image to host `db` directory.
- Check `sha256sum` for data model files.
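
The checksum step can be reproduced on the host with GNU coreutils `sha256sum --check`, assuming (as `db/sha256sum.txt` suggests) that each listed file name is relative to the `db` directory. This is a minimal sketch; the function name `verify_db_checksums` is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# verify_db_checksums DB_DIR
# Checks every file listed in DB_DIR/sha256sum.txt against its recorded
# SHA-256 digest. Runs in a subshell so the caller's directory is unchanged.
# --strict also fails on improperly formatted checksum lines.
verify_db_checksums() {
  (cd "$1" && sha256sum --check --strict sha256sum.txt)
}
```

For example, `verify_db_checksums db` prints `tpm_data_lists.rds: OK` when the file matches, and exits with a non-zero status if any listed file is missing or differs.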

### Build `OpenPedCan-api` docker image
### 3.4. Build `OpenPedCan-api` docker image

Use the following bash commands to build the `OpenPedCan-api` docker image.

@@ -126,30 +133,30 @@ Use the following bash commands to Build `OpenPedCan-api` docker image.

Note for developers: For `docker build` with docker cache and remote pre-built data model files, pass `--build-arg CACHE_DATE=$(date +%s)` to the `docker build` command to use the latest remote data model files on each build.

### Run `OpenPedCan-api` docker image
### 3.5. Run `OpenPedCan-api` docker image

```bash
docker run --rm -p 8082:80 open-ped-can-api
```

Note for developers: To run extra R `stopifnot(...)` assertions, pass `-e DEBUG=1` to the `docker run` command.

### Test `OpenPedCan-api` server using `curl`
### 3.6. Test `OpenPedCan-api` server using `curl`

Test the running server with the following command.

```bash
./tests/curl_test_endpoints.sh
```

`tests/curl_test_endpoints.sh` sends multiple HTTP requests to `localhost:8082` by default, with the following steps. The port number of `localhost` can be changed by passing the `bash` environment variable `API_PORT` with a different value, but there has to be a `OpenPedCan-api` server listening on the port.
`tests/curl_test_endpoints.sh` sends multiple HTTP requests to `localhost:8082` by default, with the following steps. The `localhost` port number can be changed by setting the `bash` environment variable `LOCAL_API_HOST_PORT` to a different value, but an `OpenPedCan-api` server must be listening on that port. The API HTTP server host can be changed to <https://openpedcan-api-qa.d3b.io/__docs__/> or <https://openpedcan-api-dev.d3b.io/__docs__/> by setting the environment variable `API_HOST=qa` or `API_HOST=dev`, respectively.

- Send an HTTP request using `curl`.
- Output the HTTP response body to `tests/http_response_output_files/png` or `tests/http_response_output_files/json`.
- Print HTTP response status code, content type, and run time.
- If the response body content type is JSON, convert the JSON file to a TSV file in `tests/results`.
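
The host selection and the per-request output described above can be sketched as follows. This is not the actual `tests/curl_test_endpoints.sh`; it is a minimal sketch that assumes the base URLs are the documented `__docs__` URLs without the `/__docs__/` path, and the function names `api_base_url` and `request_endpoint` are hypothetical. The status code, content type, and run time come from standard `curl --write-out` variables.

```bash
#!/usr/bin/env bash
set -euo pipefail

# api_base_url
# Prints the base URL to test, chosen from environment variables:
#   API_HOST=qa             -> QA server
#   API_HOST=dev            -> DEV server
#   API_HOST unset or empty -> localhost on LOCAL_API_HOST_PORT (default 8082)
api_base_url() {
  case "${API_HOST:-}" in
    qa) echo "https://openpedcan-api-qa.d3b.io" ;;
    dev) echo "https://openpedcan-api-dev.d3b.io" ;;
    "") echo "http://localhost:${LOCAL_API_HOST_PORT:-8082}" ;;
    *)
      echo "Unknown API_HOST: ${API_HOST}" >&2
      return 1
      ;;
  esac
}

# request_endpoint PATH_AND_QUERY OUTPUT_FILE
# Writes the response body to OUTPUT_FILE, then prints the HTTP status code,
# content type, and total run time in seconds, separated by tabs.
request_endpoint() {
  local base_url
  base_url="$(api_base_url)"
  curl --silent --show-error --output "$2" \
    --write-out '%{http_code}\t%{content_type}\t%{time_total}\n' \
    "${base_url}$1"
}
```

For example, `API_HOST=qa request_endpoint "/tpm/gene-all-cancer/json?PARAMS" out.json`, where `PARAMS` is a placeholder for the query parameters documented at `__docs__`.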

## API system design
## 4. API system design

The `OpenPedCan-api` server system has the following layers:

@@ -164,23 +171,23 @@ For more details about implementations, see [Test run `OpenPedCan-api` server lo

The root directory of this repository should only contain the starting points of the different layers and configuration files.

### Data model layer
### 4.1. Data model layer

`db` directory contains files that implement the data model layer.

`db/build_db.sh` builds data model files that are used by the analysis logic layer.

`db/load_db.sh` loads local or remote pre-built data model files to the HTTP server layer.

### Analysis logic layer
### 4.2. Analysis logic layer

`src` directory contains files that implement the analysis logic layer.

### API layer
### 4.3. API layer

Discussions in PedOT meetings, the Slack workspace, GitHub issues, etc. specify the API layer.

### HTTP server layer
### 4.4. HTTP server layer

`main.R` runs the `OpenPedCan-api` HTTP server. The HTTP server is implemented using [libuv](http://docs.libuv.org/en/stable/design.html) and [http-parser](https://github.com/nodejs/http-parser), and is called by the [R package plumber](https://github.com/rstudio/plumber).

@@ -192,22 +199,18 @@ The API HTTP server handles every HTTP request [sequentially](https://www.rplumb
- Convert the return value of the endpoint R function to defined response content type, e.g. JSON and PNG.
- Send HTTP response to the request address.

### Testing layer
### 4.5. Testing layer

The `tests` directory contains all tools and code for testing the API server. `tests/http_response_output_files` contains the API server response plots and tables. `tests/results` contains results generated during test runs.

### Deployment layer
### 4.6. Deployment layer

`Jenkinsfile` and `Dockerfile` specify the procedures to deploy the `OpenPedCan-api` server.

## API Development road map
## 5. API Development roadmap

Implementation action items:

- Implement endpoint `GET /tpm/gene-all-cancer/json`.
- Implement endpoint `GET /tpm/gene-all-cancer/plot`.
- Remove gene down-sampling procedure in `db/tpm_data_lists.R`, in order to include all genes.
- Build data model in another R process. Load the data model in the API server R process. This may reduce RAM usage.
- Build data model into a Postgres database. Implement/refactor R functions to interact with the Postgres database. This will reduce RAM usage. This may reduce run time.
- Add unit tests to R functions.
- Send more informative response HTTP status code. Currently, all failures use status code 500.
35 changes: 35 additions & 0 deletions changelog.md
@@ -0,0 +1,35 @@
# OpenPedCan-api

## v0.2.0-alpha

### Changed

- Updated data model using [`OpenPedCan-analysis` v9 release data](https://github.com/PediatricOpenTargets/OpenPedCan-analysis/pull/103).
- Enabled endpoint Cross-Origin Resource Sharing (CORS) by default.
- Changed `/tpm/gene-disease-gtex` boxplot title from "Disease vs. GTEx tissue bulk gene expression" to "Primary tumor vs GTEx tissue bulk gene expression".
- Changed "cohort =" to "Dataset =" in boxplot and summary table x-axis labels.
- Changed "cohort" to "Dataset" in boxplot summary table columns.
- Increased minimum number of samples required per `Disease` or `GTEx_tissue_subgroup` from 1 to 3.
- Rotated boxplot x-axis labels by 45 degrees.
- Changed `tests/curl_test_endpoints.sh` variable `API_PORT` to `LOCAL_API_HOST_PORT`.
- Updated `README.md`.

### Added

- Implemented HTTP GET method for `/tpm/gene-all-cancer/json` API endpoint.
- Implemented HTTP GET method for `/tpm/gene-all-cancer/plot` API endpoint.
- Added `cors` filter to enable endpoint CORS by default.
- Added `API_HOST` variable in `tests/curl_test_endpoints.sh`, in order to test DEV and QA hosts.
- Added this `changelog.md`.

## v0.1.0-alpha

### Added

- HTTP GET method for `/tpm/gene-disease-gtex/json` API endpoint.
- HTTP GET method for `/tpm/gene-disease-gtex/plot` API endpoint.
- Tools for building API data model in `db` directory.
- Tools for testing the local API HTTP server, in the `tests` directory.
- `Dockerfile` for running API HTTP server.
- This `README.md` that specifies development procedure, system design, and development roadmap.
- Git submodule [`OpenPedCan-analysis`](https://github.com/PediatricOpenTargets/OpenPedCan-analysis).
4 changes: 2 additions & 2 deletions db/build_db.sh
@@ -14,9 +14,9 @@ cd ..
echo "Download OpenPedCan-analysis data release..."
# The commit ID to checkout to build data model.
#
# 61e23154a34e1d8b3fc1c50a67dd8f79c2067776 points to v8 release with updated
# 96132ae1e7485d9ab129380898fac5e255ccb36f points to v9 release with updated
# OpenPedCan-analysis/analyses/long-format-table-utils/annotator.
OPEN_PED_CAN_ANALYSIS_COMMIT="61e23154a34e1d8b3fc1c50a67dd8f79c2067776"
OPEN_PED_CAN_ANALYSIS_COMMIT="96132ae1e7485d9ab129380898fac5e255ccb36f"

# If submodule repo url changes, e.g. rename, this will update the URL according
# to the one in .gitmodules.
2 changes: 1 addition & 1 deletion db/sha256sum.txt
@@ -1 +1 @@
173554ee34308ccd63bd4ce2d408a87d24b7dc925b539f172f150dce001700d5 tpm_data_lists.rds
f381ed881caec94a0b9a4be8fdb800fa69c3b9d2f3b4e44766ebe42117e2ba5b tpm_data_lists.rds
45 changes: 35 additions & 10 deletions src/get_gene_tpm_boxplot.R
@@ -36,9 +36,9 @@ get_gene_tpm_boxplot <- function(gene_tpm_boxplot_tbl) {
stopifnot(is.factor(uniq_x_label_vec))
stopifnot(all(!is.na(uniq_x_label_vec)))

sample_type_vec <- unique(gene_tpm_boxplot_tbl$sample_type)
stopifnot(is.character(sample_type_vec))
stopifnot(all(sample_type_vec %in% c("disease", "normal")))
uniq_sample_type_vec <- unique(gene_tpm_boxplot_tbl$sample_type)
stopifnot(is.character(uniq_sample_type_vec))
stopifnot(all(uniq_sample_type_vec %in% c("disease", "normal")))

efo_id_vec <- purrr::discard(unique(gene_tpm_boxplot_tbl$EFO), is.na)
stopifnot(is.character(efo_id_vec))
@@ -51,15 +51,41 @@ get_gene_tpm_boxplot <- function(gene_tpm_boxplot_tbl) {
if (length(gtex_subgroup_vec) > 0) {
title <- paste(
paste0(gene_symbol, " (", ensg_id, ")"),
"Disease vs. GTEx tissue bulk gene expression",
"Primary tumor vs GTEx tissue bulk gene expression",
sep = "\n")
} else {
title <- paste(
paste0(gene_symbol, " (", ensg_id, ")"),
"Disease tissue bulk gene expression",
"Primary tumor tissue bulk gene expression",
sep = "\n")
}

if (identical(length(uniq_sample_type_vec), 1L)) {
# Only one sample type.
#
# Use grey for either one.
box_fill_colors <- c("disease" = "grey80", "normal" = "grey80")
} else {
# More than one sample type.
#
# Use red for disease and grey for normal.
box_fill_colors <- c("disease" = "red3", "normal" = "grey80")
}

# The x-axis labels are long and rotated 45 degrees, so with the default
# margin they extend beyond the right edge of the plot. Widen the right margin
# to fit all text.
plot_margin <- ggplot2::theme_get()$plot.margin

if (!identical(length(plot_margin), 4L)) {
plot_margin <- rep(grid::unit(x = 5.5, units = "points"), 4)
}

rightmost_x_label <- dplyr::last(
levels(gene_tpm_boxplot_tbl$x_labels), default = "")
# Set the right margin to 0.8 times the width of the rightmost x label. A label
# rotated by 45 degrees extends about cos(45 degrees), i.e. roughly 0.71, of
# its width to the right, so 0.8 leaves some slack.
plot_margin[2] <- grid::unit(
x = 0.8, units = "strwidth", data = rightmost_x_label)

gene_tpm_boxplot <- ggplot2::ggplot(gene_tpm_boxplot_tbl,
ggplot2::aes(x = x_labels, y = TPM,
fill = sample_type)) +
@@ -69,12 +95,11 @@ get_gene_tpm_boxplot <- function(gene_tpm_boxplot_tbl) {
ggplot2::ylab("TPM") +
ggplot2::xlab("") +
ggplot2_publication_theme(base_size = 12) +
ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90,
vjust = 0.5,
hjust = 1)) +
ggplot2::theme(
axis.text.x = ggplot2::element_text(angle = -45, vjust = 1, hjust = 0),
plot.margin = plot_margin) +
ggplot2::ggtitle(title) +
ggplot2::scale_fill_manual(values = c("disease" = "red3",
"normal" = "grey80")) +
ggplot2::scale_fill_manual(values = box_fill_colors) +
ggplot2::guides(fill = "none")

return(gene_tpm_boxplot)