📫Implement /tpm/gene-all-cancer/json and /tpm/gene-all-cancer/plot API endpoints #20

Merged: 24 commits, Sep 14, 2021
24 commits
466cbc0
Implement GET gene-all-cancer table and plot endpoints
logstar Aug 30, 2021
3c18342
Update README.md
logstar Aug 31, 2021
34394f1
Merge branch 'main' into logstar/gene-all-cancer
logstar Sep 1, 2021
23e43b7
Rerun curl tests with new x label format
logstar Sep 1, 2021
5908a3f
Rotate boxplot x labels by 45 degrees
logstar Sep 1, 2021
2d3cf67
Update README.md
logstar Sep 1, 2021
dd0496a
Change min number of samples per group from 1 to 3
logstar Sep 1, 2021
3c2773d
Change cohort to Dataset for boxplot and summary table
logstar Sep 2, 2021
0d27089
Change Disease to Primary tumor in boxplot titles
logstar Sep 2, 2021
ef6edec
Update OpenPedCan-analysis to v9 release
logstar Sep 2, 2021
fb9dc06
Update data model using OpenPedCan-analysis v9 release
logstar Sep 2, 2021
3e48af8
Add changelog.md
logstar Sep 2, 2021
bffdd15
Increment API version
logstar Sep 2, 2021
c3a4212
Update README.md
logstar Sep 2, 2021
2ea1a2b
Update changelog.md
logstar Sep 2, 2021
fc04f9b
Update .github/workflows/linter.yml
logstar Sep 2, 2021
9744f41
Add .markdown-lint.yml
logstar Sep 2, 2021
70c1652
Move .markdown-lint.yml
logstar Sep 2, 2021
0f8f2d7
Update README.md
logstar Sep 2, 2021
cc60259
Add curl test options for QA and DEV hosts
logstar Sep 2, 2021
82af416
Enable endpoint Cross-Origin Resource Sharing by default
logstar Sep 9, 2021
1c20da3
Update changelog.md
logstar Sep 9, 2021
47cc4ca
Clarify endpoint parameters
logstar Sep 12, 2021
acb0b6f
Remove /plot and /sum testing endpoints
logstar Sep 13, 2021
37 changes: 37 additions & 0 deletions .github/linters/.markdown-lint.yml
@@ -0,0 +1,37 @@
---
###########################
###########################
## Markdown Linter rules ##
###########################
###########################

# Linter rules doc:
# - https://github.com/DavidAnson/markdownlint
#
# Note:
# To comment out a single error:
# <!-- markdownlint-disable -->
# any violations you want
# <!-- markdownlint-restore -->
#

###############
# Rules by id #
###############
MD004: false # Unordered list style
MD007:
  indent: 2 # Unordered list indentation
MD013: false # Line length
MD026:
  punctuation: ".,;:!。,;:" # List of not allowed
MD029: false # Ordered list item prefix
MD033: false # Allow inline HTML
MD036: false # Emphasis used instead of a heading
MD024:
  allow_different_nesting: true # heading duplication is allowed for
                                # non-sibling headings

#################
# Rules by tags #
#################
blank_lines: false # Error on blank lines
4 changes: 2 additions & 2 deletions .github/workflows/linter.yml
@@ -15,7 +15,7 @@ jobs:
with:
fetch-depth: 0
- name: Lint Code Base
uses: github/super-linter@v4.6.2
uses: github/super-linter@v4.7.1
env:
VALIDATE_ALL_CODEBASE: false
DEFAULT_BRANCH: main
@@ -35,5 +35,5 @@ jobs:

- name: Lint

run: lintr::lint_dir(linters = lintr::with_defaults(object_name_linter = NULL, assignment_linter = NULL, line_length_linter = NULL, spaces_left_parentheses_linter = NULL, commented_code_linter = NULL, object_length_linter = NULL, cyclocomp_linter = lintr::cyclocomp_linter(complexity_limit = 25L)))
run: lintr::lint_dir(linters = lintr::with_defaults(object_name_linter = NULL, assignment_linter = NULL, line_length_linter = NULL, spaces_left_parentheses_linter = NULL, commented_code_linter = NULL, object_length_linter = NULL, cyclocomp_linter = lintr::cyclocomp_linter(complexity_limit = 35L)))
shell: Rscript {0}
2 changes: 1 addition & 1 deletion OpenPedCan-analysis
99 changes: 51 additions & 48 deletions README.md
@@ -1,42 +1,49 @@
# OpenPedCan-api
# OpenPedCan-api <!-- omit in toc -->

[![GitHub Super-Linter](https://github.com/PediatricOpenTargets/OpenPedCan-api/workflows/Lint%20Code%20Base/badge.svg)](https://github.com/marketplace/actions/super-linter)

`OpenPedCan-api` implements OpenPedCan (Open Pediatric Cancers) project public API (application programming interface) to transfer [OpenPedCan-analysis](https://github.com/PediatricOpenTargets/OpenPedCan-analysis) results and plots via HTTP, which is publicly available at <https://openpedcan-api-dev.d3b.io/__docs__/>.

- [OpenPedCan-api](#openpedcan-api)
- [API endpoint specifications](#api-endpoint-specifications)
- [Deploy `OpenPedCan-api`](#deploy-openpedcan-api)
- [Test run `OpenPedCan-api` server locally](#test-run-openpedcan-api-server-locally)
- [`git clone` `OpenPedCan-api` repository](#git-clone-openpedcan-api-repository)
- [Run static R code analysis using R package `lintr`](#run-static-r-code-analysis-using-r-package-lintr)
- [(Optional) Build data model files](#optional-build-data-model-files)
- [Build `OpenPedCan-api` docker image](#build-openpedcan-api-docker-image)
- [Run `OpenPedCan-api` docker image](#run-openpedcan-api-docker-image)
- [Test `OpenPedCan-api` server using `curl`](#test-openpedcan-api-server-using-curl)
- [API system design](#api-system-design)
- [Data model layer](#data-model-layer)
- [Analysis logic layer](#analysis-logic-layer)
- [API layer](#api-layer)
- [HTTP server layer](#http-server-layer)
- [Testing layer](#testing-layer)
- [Deployment layer](#deployment-layer)
- [API Development road map](#api-development-road-map)

## API endpoint specifications

<https://openpedcan-api-dev.d3b.io/__docs__/> specifies the following API endpoint attributes.
`OpenPedCan-api` implements the public API (application programming interface) of the OpenPedCan (Open Pediatric Cancers) project. The API transfers [OpenPedCan-analysis](https://github.com/PediatricOpenTargets/OpenPedCan-analysis) results and plots via HTTP and is publicly available at <https://openpedcan-api-qa.d3b.io/__docs__/>.

- [1. API endpoint specifications](#1-api-endpoint-specifications)
- [2. Deploy `OpenPedCan-api`](#2-deploy-openpedcan-api)
- [3. Test run `OpenPedCan-api` server locally](#3-test-run-openpedcan-api-server-locally)
- [3.1. `git clone` `OpenPedCan-api` repository](#31-git-clone-openpedcan-api-repository)
- [3.2. Run static R code analysis using R package `lintr`](#32-run-static-r-code-analysis-using-r-package-lintr)
- [3.3. (Optional) Build data model files](#33-optional-build-data-model-files)
- [3.4. Build `OpenPedCan-api` docker image](#34-build-openpedcan-api-docker-image)
- [3.5. Run `OpenPedCan-api` docker image](#35-run-openpedcan-api-docker-image)
- [3.6. Test `OpenPedCan-api` server using `curl`](#36-test-openpedcan-api-server-using-curl)
- [4. API system design](#4-api-system-design)
- [4.1. Data model layer](#41-data-model-layer)
- [4.2. Analysis logic layer](#42-analysis-logic-layer)
- [4.3. API layer](#43-api-layer)
- [4.4. HTTP server layer](#44-http-server-layer)
- [4.5. Testing layer](#45-testing-layer)
- [4.6. Deployment layer](#46-deployment-layer)
- [5. API Development roadmap](#5-api-development-roadmap)

## 1. API endpoint specifications

<https://openpedcan-api-qa.d3b.io/__docs__/> specifies the following API endpoint attributes.

- HTTP request method
- Path
- Parameters
- Response media type

## Deploy `OpenPedCan-api`
## 2. Deploy `OpenPedCan-api`

Following is a comment by @blackdenc at <https://github.com/PediatricOpenTargets/OpenPedCan-api/issues/5#issuecomment-904824004>.
According to comments and messages by @blackdenc:

> As far as building in Jenkins, we build the container and tag it, then push it to ECR and give that tag to the ECS task definition at runtime.
`OpenPedCan-api` is deployed with the following steps:

- Build and tag `OpenPedCan-api` docker image using `Dockerfile`.
- Push the built image to Amazon Elastic Container Registry (ECR).
- Pass the ECR docker image tag to Amazon Elastic Container Service (ECS) Fargate (?) task definition at runtime.

<https://openpedcan-api-qa.d3b.io/__docs__/> is the QA server that will only deploy the `main` branch of the repository.

<https://openpedcan-api-dev.d3b.io/__docs__/> is the DEV server that will deploy any new branch of the repository. The QA environment will remain unchanged until a new commit is merged to `main`.

`Dockerfile` builds the `OpenPedCan-api` docker image to be run on Amazon ECS.

@@ -45,7 +52,7 @@ To deploy without using docker:
- `Rscript --vanilla main.R` needs to be run with the same working directory as the last `WORKDIR` path in `Dockerfile` prior to the docker instruction `ENTRYPOINT ["Rscript", "--vanilla", "main.R"]`.
- Build or download `db` files according to the commands in `Dockerfile`.

## Test run `OpenPedCan-api` server locally
## 3. Test run `OpenPedCan-api` server locally

Test run `OpenPedCan-api` server with the following steps:

@@ -72,7 +79,7 @@ R package jsonlite 1.7.2
R package lintr 2.0.1
```

### `git clone` `OpenPedCan-api` repository
### 3.1. `git clone` `OpenPedCan-api` repository

```bash
# Change URL if a fork repo needs to be used
@@ -86,15 +93,15 @@ git checkout -t origin/the-branch-that-needs-to-be-tested
# git checkout COMMIT_HASH_ID
```

### Run static R code analysis using R package `lintr`
### 3.2. Run static R code analysis using R package `lintr`

```bash
./tests/run_r_lintr.sh
```

If there is any syntax error, comment in the GitHub pull request with the full error messages.

### (Optional) Build data model files
### 3.3. (Optional) Build data model files

Use the following bash command to build data model files locally to the `db` directory. This step requires more than 25 GB of memory.

@@ -108,7 +115,7 @@ Use the following bash command to build data model files locally to the `db` dir
- Copy data model files from `open-ped-can-api-build-db` docker image to host `db` directory.
- Check `sha256sum` for data model files.
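
The checksum step can be sketched as a small shell helper. This is only an illustration, not the repository's build script: the function name `verify_db_checksums` is hypothetical, and it assumes GNU coreutils `sha256sum` and a checksum list in the `<sha256>  <file name>` format shown in `db/sha256sum.txt`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): verify the files in a
# directory against a checksum list produced by sha256sum.
#
# $1: directory that holds the data model files, e.g. db
# $2: checksum list, absolute path or path relative to $1
verify_db_checksums() {
  local db_dir="$1" checksum_file="$2"
  # Run in a subshell so that the caller's working directory is unchanged.
  # --strict also fails on malformed checksum lines.
  ( cd "${db_dir}" && sha256sum --check --strict "${checksum_file}" )
}
```

Under these assumptions, `verify_db_checksums db sha256sum.txt` returns a non-zero exit status if `tpm_data_lists.rds` does not match its recorded checksum.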

### Build `OpenPedCan-api` docker image
### 3.4. Build `OpenPedCan-api` docker image

Use the following bash commands to build the `OpenPedCan-api` docker image.

@@ -126,30 +133,30 @@ Use the following bash commands to Build `OpenPedCan-api` docker image.

Note for developers: For `docker build` with docker cache and remote pre-built data model files, pass `--build-arg CACHE_DATE=$(date +%s)` to the `docker build` command to use the latest remote data model files on each build.

### Run `OpenPedCan-api` docker image
### 3.5. Run `OpenPedCan-api` docker image

```bash
docker run --rm -p 8082:80 open-ped-can-api
```

Note for developers: To run extra R `stopifnot(...)` assertions, pass `-e DEBUG=1` to `docker run` command.

### Test `OpenPedCan-api` server using `curl`
### 3.6. Test `OpenPedCan-api` server using `curl`

Test the running server with the following command.

```bash
./tests/curl_test_endpoints.sh
```

`tests/curl_test_endpoints.sh` sends multiple HTTP requests to `localhost:8082` by default, with the following steps. The port number of `localhost` can be changed by passing the `bash` environment variable `API_PORT` with a different value, but there has to be a `OpenPedCan-api` server listening on the port.
`tests/curl_test_endpoints.sh` sends multiple HTTP requests to `localhost:8082` by default, with the following steps. The port number of `localhost` can be changed by setting the `bash` environment variable `LOCAL_API_HOST_PORT` to a different value, but an `OpenPedCan-api` server must be listening on that port. The API HTTP server host can be changed to <https://openpedcan-api-qa.d3b.io/__docs__/> or <https://openpedcan-api-dev.d3b.io/__docs__/> by setting the environment variable `API_HOST=qa` or `API_HOST=dev`, respectively.

- Send an HTTP request using `curl`.
- Output the HTTP response body to `tests/http_response_output_files/png` or `tests/http_response_output_files/json`.
- Print HTTP response status code, content type, and run time.
- If response body content type is JSON, convert the JSON file to TSV file in `tests/results`.
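
The host selection and the per-request report described above could look roughly like the following sketch. It is not the content of `tests/curl_test_endpoints.sh`: the function names `resolve_api_base_url` and `curl_endpoint` are hypothetical, and treating an unset `API_HOST` as `local` is an assumption made here for illustration. The `curl --write-out` variables `http_code`, `content_type`, and `time_total` are standard `curl` features.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not taken from tests/curl_test_endpoints.sh.

# Print the API base URL selected by API_HOST and LOCAL_API_HOST_PORT.
# Assumption: an unset API_HOST means the local server on port 8082.
resolve_api_base_url() {
  local api_host="${API_HOST:-local}"
  local port="${LOCAL_API_HOST_PORT:-8082}"
  case "${api_host}" in
    qa)    printf '%s\n' "https://openpedcan-api-qa.d3b.io" ;;
    dev)   printf '%s\n' "https://openpedcan-api-dev.d3b.io" ;;
    local) printf '%s\n' "http://localhost:${port}" ;;
    *)     printf 'Unknown API_HOST: %s\n' "${api_host}" >&2; return 1 ;;
  esac
}

# Send one GET request, write the response body to a file, and print the
# HTTP status code, content type, and run time.
#
# $1: endpoint path including any query string, supplied by the caller
# $2: output file for the response body
curl_endpoint() {
  local path="$1" output_file="$2" base_url
  base_url="$(resolve_api_base_url)" || return 1
  curl -s -o "${output_file}" \
    -w 'status=%{http_code} content_type=%{content_type} seconds=%{time_total}\n' \
    "${base_url}${path}"
}
```

With this sketch, `API_HOST=qa` selects the QA host and `LOCAL_API_HOST_PORT=9000` moves the local target to port 9000, while any other `API_HOST` value is rejected before a request is sent.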

## API system design
## 4. API system design

The `OpenPedCan-api` server system has the following layers:

@@ -164,23 +171,23 @@ For more details about implementations, see [Test run `OpenPedCan-api` server lo

The root directory of this repository should only contain starting points of different layers and configuration files.

### Data model layer
### 4.1. Data model layer

`db` directory contains files that implement the data model layer.

`db/build_db.sh` builds data model files that are used by analysis logic layer.

`db/load_db.sh` loads local or remote pre-built data model files to the HTTP server layer.

### Analysis logic layer
### 4.2. Analysis logic layer

`src` directory contains files that implement the analysis logic layer.

### API layer
### 4.3. API layer

Discussions in PedOT meetings, Slack workspace, GitHub issues, etc. specify the API layer.

### HTTP server layer
### 4.4. HTTP server layer

`main.R` runs the `OpenPedCan-api` HTTP server. The HTTP server is implemented using [libuv](http://docs.libuv.org/en/stable/design.html) and [http-parser](https://github.com/nodejs/http-parser) and called by [R package plumber](https://github.com/rstudio/plumber).

@@ -192,22 +199,18 @@ The API HTTP server handles every HTTP request [sequentially](https://www.rplumb
- Convert the return value of the endpoint R function to defined response content type, e.g. JSON and PNG.
- Send HTTP response to the request address.

### Testing layer
### 4.5. Testing layer

The `tests` directory contains all tools and code for testing the API server. `tests/http_response_output_files` contains the API server response plots and tables. `tests/results` contains results generated during test runs.

### Deployment layer
### 4.6. Deployment layer

Jenkinsfile and Dockerfile specify the procedures to deploy the `OpenPedCan-api` server.

## API Development road map
## 5. API Development roadmap

Implementation action items:

- Implement endpoint `GET /tpm/gene-all-cancer/json`.
- Implement endpoint `GET /tpm/gene-all-cancer/plot`.
- Remove gene down-sampling procedure in `db/tpm_data_lists.R`, in order to include all genes.
- Build data model in another R process. Load the data model in the API server R process. This may reduce RAM usage.
- Build data model into a Postgres database. Implement/refactor R functions to interact with the Postgres database. This will reduce RAM usage. This may reduce run time.
- Add unit tests to R functions.
- Send more informative HTTP response status codes. Currently, all failures use status code 500.
35 changes: 35 additions & 0 deletions changelog.md
@@ -0,0 +1,35 @@
# OpenPedCan-api

## v0.2.0-alpha

### Changed

- Updated data model using [`OpenPedCan-analysis` v9 release data](https://github.com/PediatricOpenTargets/OpenPedCan-analysis/pull/103).
- Enabled endpoint Cross-Origin Resource Sharing (CORS) by default.
- Changed `/tpm/gene-disease-gtex` boxplot title from "Disease vs. GTEx tissue bulk gene expression" to "Primary tumor vs GTEx tissue bulk gene expression".
- Changed "cohort =" to "Dataset =" in boxplot and summary table x-axis labels.
- Changed "cohort" to "Dataset" in boxplot summary table columns.
- Increased minimum number of samples required per `Disease` or `GTEx_tissue_subgroup` from 1 to 3.
- Rotated boxplot x-axis labels by 45 degrees.
- Changed `tests/curl_test_endpoints.sh` variable `API_PORT` to `LOCAL_API_HOST_PORT`.
- Updated `README.md`.

### Added

- Implemented HTTP GET method for `/tpm/gene-all-cancer/json` API endpoint.
- Implemented HTTP GET method for `/tpm/gene-all-cancer/plot` API endpoint.
- Added `cors` filter to enable endpoint CORS by default.
- Added `API_HOST` variable in `tests/curl_test_endpoints.sh`, in order to test DEV and QA hosts.
- Added this `changelog.md`.

## v0.1.0-alpha

### Added

- HTTP GET method for `/tpm/gene-disease-gtex/json` API endpoint.
- HTTP GET method for `/tpm/gene-disease-gtex/plot` API endpoint.
- Tools for building API data model in `db` directory.
- Tools for testing local API HTTP server, in `tests` directory.
- `Dockerfile` for running API HTTP server.
- This `README.md` that specifies development procedure, system design, and development roadmap.
- Git submodule [`OpenPedCan-analysis`](https://github.com/PediatricOpenTargets/OpenPedCan-analysis).
4 changes: 2 additions & 2 deletions db/build_db.sh
@@ -14,9 +14,9 @@ cd ..
echo "Download OpenPedCan-analysis data release..."
# The commit ID to checkout to build data model.
#
# 61e23154a34e1d8b3fc1c50a67dd8f79c2067776 points to v8 release with updated
# 96132ae1e7485d9ab129380898fac5e255ccb36f points to v9 release with updated
# OpenPedCan-analysis/analyses/long-format-table-utils/annotator.
OPEN_PED_CAN_ANALYSIS_COMMIT="61e23154a34e1d8b3fc1c50a67dd8f79c2067776"
OPEN_PED_CAN_ANALYSIS_COMMIT="96132ae1e7485d9ab129380898fac5e255ccb36f"

# If submodule repo url changes, e.g. rename, this will update the URL according
# to the one in .gitmodules.
2 changes: 1 addition & 1 deletion db/sha256sum.txt
@@ -1 +1 @@
173554ee34308ccd63bd4ce2d408a87d24b7dc925b539f172f150dce001700d5 tpm_data_lists.rds
f381ed881caec94a0b9a4be8fdb800fa69c3b9d2f3b4e44766ebe42117e2ba5b tpm_data_lists.rds
45 changes: 35 additions & 10 deletions src/get_gene_tpm_boxplot.R
@@ -36,9 +36,9 @@ get_gene_tpm_boxplot <- function(gene_tpm_boxplot_tbl) {
stopifnot(is.factor(uniq_x_label_vec))
stopifnot(all(!is.na(uniq_x_label_vec)))

sample_type_vec <- unique(gene_tpm_boxplot_tbl$sample_type)
stopifnot(is.character(sample_type_vec))
stopifnot(all(sample_type_vec %in% c("disease", "normal")))
uniq_sample_type_vec <- unique(gene_tpm_boxplot_tbl$sample_type)
stopifnot(is.character(uniq_sample_type_vec))
stopifnot(all(uniq_sample_type_vec %in% c("disease", "normal")))

efo_id_vec <- purrr::discard(unique(gene_tpm_boxplot_tbl$EFO), is.na)
stopifnot(is.character(efo_id_vec))
@@ -51,15 +51,41 @@ get_gene_tpm_boxplot <- function(gene_tpm_boxplot_tbl) {
if (length(gtex_subgroup_vec) > 0) {
title <- paste(
paste0(gene_symbol, " (", ensg_id, ")"),
"Disease vs. GTEx tissue bulk gene expression",
"Primary tumor vs GTEx tissue bulk gene expression",
sep = "\n")
} else {
title <- paste(
paste0(gene_symbol, " (", ensg_id, ")"),
"Disease tissue bulk gene expression",
"Primary tumor tissue bulk gene expression",
sep = "\n")
}

if (identical(length(uniq_sample_type_vec), 1L)) {
# Only one sample type.
#
# Use grey for either one.
box_fill_colors <- c("disease" = "grey80", "normal" = "grey80")
} else {
# More than one sample type.
#
# Use red for disease and grey for normal.
box_fill_colors <- c("disease" = "red3", "normal" = "grey80")
}

# The x-axis labels are long and rotated 45 degrees, so they are out of the
# plot in the default margin. Increase right margin to fit all text.
plot_margin <- ggplot2::theme_get()$plot.margin

if (!identical(length(plot_margin), 4L)) {
plot_margin <- rep(grid::unit(x = 5.5, units = "points"), 4)
}

rightmost_x_label <- dplyr::last(
levels(gene_tpm_boxplot_tbl$x_labels), default = "")
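# Set the right margin to 0.8 times the width of the last x label. A label
# rotated by 45 degrees spans about 0.71 (cos 45 degrees) of its width
# horizontally, so 0.8 covers it with some slack.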
plot_margin[2] <- grid::unit(
x = 0.8, units = "strwidth", data = rightmost_x_label)

gene_tpm_boxplot <- ggplot2::ggplot(gene_tpm_boxplot_tbl,
ggplot2::aes(x = x_labels, y = TPM,
fill = sample_type)) +
@@ -69,12 +95,11 @@ get_gene_tpm_boxplot <- function(gene_tpm_boxplot_tbl) {
ggplot2::ylab("TPM") +
ggplot2::xlab("") +
ggplot2_publication_theme(base_size = 12) +
ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90,
vjust = 0.5,
hjust = 1)) +
ggplot2::theme(
axis.text.x = ggplot2::element_text(angle = -45, vjust = 1, hjust = 0),
plot.margin = plot_margin) +
ggplot2::ggtitle(title) +
ggplot2::scale_fill_manual(values = c("disease" = "red3",
"normal" = "grey80")) +
ggplot2::scale_fill_manual(values = box_fill_colors) +
ggplot2::guides(fill = "none")

return(gene_tpm_boxplot)