Commit

Merge branch 'release-16.2.0'
rxu17 committed Feb 14, 2024
2 parents be43eb6 + 012e48c commit 429dafb
Showing 42 changed files with 1,410 additions and 231 deletions.
15 changes: 7 additions & 8 deletions .github/workflows/ci.yml
@@ -61,6 +61,8 @@ jobs:
needs: [test, lint]
runs-on: ubuntu-latest
if: github.event_name == 'release'
+ permissions:
+ id-token: write
steps:
- uses: actions/checkout@v2
- name: Set up Python
@@ -70,11 +72,8 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
- pip install setuptools wheel twine
- - name: Build and publish
- env:
- TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
- TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
- run: |
- python setup.py sdist bdist_wheel
- twine upload dist/*
+ pip install setuptools wheel build
+ - name: Build distributions
+ run: python -m build
+ - name: Publish to pypi
+ uses: pypa/gh-action-pypi-publish@release/v1
31 changes: 29 additions & 2 deletions CONTRIBUTING.md
@@ -94,8 +94,14 @@ This package uses [semantic versioning](https://semver.org/) for releasing new v
### Testing
#### Running test pipeline
Make sure to run each of the [pipeline steps here](README.md#developing-locally) on the test pipeline and verify that your pipeline runs as expected. This is __not__ automatically run by Github Actions and have to be manually run.
#### Running tests
##### Tests in Python
This package uses [`pytest`](https://pytest.org/en/latest/) to run tests. The test code is located in the [tests](./tests) subdirectory.
Here's how to run the test suite:
@@ -104,7 +110,17 @@ Here's how to run the test suite:
pytest -vs tests/
```

Tests are also run automatically by Github Actions on any pull request and are required to pass before merging.
Tests in Python are also run automatically by Github Actions on any pull request and are required to pass before merging.

##### Tests in R

This package uses [`testthat`](https://testthat.r-lib.org/) to run tests in R. The test code is located in the [testthat](./R/tests/testthat) subdirectory.

Here's how to run the test suite:

```shell
Rscript -e "testthat::test_dir('R/tests/testthat/')"
```

#### Test Development

@@ -134,6 +150,17 @@ Follow gitflow best practices as linked above.
1. Merge `main` back into `develop`
1. Push `develop`

### DockerHub
### Modifying Docker

Follow this section when modifying the [Dockerfile](https://github.com/Sage-Bionetworks/Genie/blob/main/Dockerfile):

1. Have your synapse authentication token handy
1. ```docker build -f Dockerfile -t <some_docker_image_name> .```
1. ```docker run --rm -it -e SYNAPSE_AUTH_TOKEN=$YOUR_SYNAPSE_TOKEN <some_docker_image_name>```
1. Run [test code](README.md#developing-locally) relevant to the dockerfile changes to make sure changes are present and working
1. Once changes are tested, follow [genie contributing guidelines](#developing) for adding it to the repo
1. Once deployed to main, make sure docker image was successfully deployed remotely (our docker image gets automatically deployed) [here](https://hub.docker.com/repository/docker/sagebionetworks/genie/builds)

#### Dockerhub

This repository does not use github actions to push docker images. By adding the `sagebiodockerhub` github user as an Admin to this GitHub repository, we can configure an automated build in DockerHub. You can view the builds [here](https://hub.docker.com/repository/docker/sagebionetworks/genie/builds). To get admin access to the DockerHub repository, ask Sage IT to be added to the `genieadmin` DockerHub team.
4 changes: 2 additions & 2 deletions Dockerfile
@@ -41,7 +41,7 @@ RUN apt-get update && apt-get install -y --allow-unauthenticated --no-install-re
# texlive-generic-recommended \
texlive-latex-extra \
# genome nexus
- openjdk-8-jre \
+ openjdk-11-jre \
# This is for reticulate
python3.8-venv && \
apt-get clean && \
@@ -83,6 +83,6 @@ WORKDIR /root/
# Must move this git clone to after the install of Genie,
# because must update cbioportal
RUN git clone https://github.com/cBioPortal/cbioportal.git -b v5.3.19
- RUN git clone https://github.com/Sage-Bionetworks/annotation-tools.git -b 0.0.2
+ RUN git clone https://github.com/Sage-Bionetworks/annotation-tools.git -b 0.0.4

WORKDIR /root/Genie
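The README instructions in this commit ask developers to `git checkout` the cbioportal and annotation-tools versions pinned in this Dockerfile. As a hedged sketch (not part of the Genie repository), the helper below pulls those pins out of the `RUN git clone <url> -b <ref>` lines. `pinned_clones` is a hypothetical name, and it only recognizes `-b` placed directly after the URL, which is how both clone lines in this Dockerfile are written.

```python
import re
from typing import Dict

# Matches lines such as:
#   RUN git clone https://github.com/cBioPortal/cbioportal.git -b v5.3.19
# Assumption: "-b <ref>" immediately follows the URL, as in this Dockerfile.
_CLONE_RE = re.compile(
    r"^\s*RUN\s+git\s+clone\s+(?P<url>\S+)\s+-b\s+(?P<ref>\S+)",
    re.MULTILINE,
)


def pinned_clones(dockerfile_text: str) -> Dict[str, str]:
    """Map each cloned repository name to the branch or tag it is pinned to."""
    pins: Dict[str, str] = {}
    for match in _CLONE_RE.finditer(dockerfile_text):
        # Repository name = last path segment of the URL, without ".git".
        repo = match.group("url").rstrip("/").rsplit("/", 1)[-1]
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]
        pins[repo] = match.group("ref")
    return pins
```

Applied to the updated Dockerfile, this reports `cbioportal` pinned to `v5.3.19` and `annotation-tools` pinned to `0.0.4`; each value is the ref to check out inside the matching local clone.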
41 changes: 41 additions & 0 deletions R/dashboard_template_functions.R
@@ -0,0 +1,41 @@
# ---------------------------------------------------------------------------
# Title: dashboard_template_functions.R
# Description: This script contains helper functions used in
# templates/dashboardTemplate.Rmd
# ---------------------------------------------------------------------------

#' This function gets the database to synapse id mapping table,
#' maps the provided database_name to its synapse id and returns it
#'
#' @param database_name (str) database name in database
#' to synapse id mapping table
#' @param database_synid_mappingid (str) synapse id of the database
#' to synapse id mapping table
#'
#' @return (str) synapse id of the mapped database name
get_syn_id_from_mapped_database <- function(database_name, database_synid_mappingid){
database_synid_mapping = synTableQuery(sprintf('select * from %s',
database_synid_mappingid))
database_synid_mappingdf = as.data.frame(database_synid_mapping)
table_synid = database_synid_mappingdf$Id[database_synid_mappingdf$Database == database_name]
return(table_synid)
}

#' This function creates a table of failed annotation counts by grouped columns
#' @param maf_data (data.frame) input maf data frame
#' @param group_by_cols (str vector) list of columns to create counts by
#' @param counts_col_name (str) name to give to the counts column
#'
#' @return (data.frame) counts table
get_failed_annotation_table_counts <- function(maf_data, group_by_cols, counts_col_name){
table_counts <- table(maf_data[maf_data$Annotation_Status == "FAILED", group_by_cols])

if (nrow(table_counts) == 0){
counts_table <- data.frame(matrix(ncol = length(group_by_cols) + 1, nrow = 0))
} else{
counts_table <- as.data.frame(table_counts)
}
colnames(counts_table) <- c(group_by_cols, counts_col_name)
counts_table <- counts_table[do.call(order, counts_table[group_by_cols]), ]
return(counts_table)
}
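`get_syn_id_from_mapped_database` queries the whole mapping table and keeps the `Id` values whose `Database` column equals the requested name. A rough Python analogue of just that filtering step is sketched below. It assumes the rows are already in memory as dicts keyed by those two column names; fetching them from Synapse (for example with the `synapseclient` package's `tableQuery`) is deliberately left out, and `syn_ids_for_database` is a hypothetical name.

```python
from typing import Iterable, List, Mapping


def syn_ids_for_database(
    mapping_rows: Iterable[Mapping[str, str]], database_name: str
) -> List[str]:
    """Return the Id of every mapping row whose Database equals database_name.

    Like the R helper, which returns a vector, the result can hold zero or
    several ids, so callers should check its length before using it.
    """
    return [row["Id"] for row in mapping_rows if row["Database"] == database_name]
```

With a row `{"Database": "main", "Id": "syn7208886"}` (the pairing the R test for this function expects), looking up `"main"` returns `["syn7208886"]`, and a name absent from the table returns an empty list.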
69 changes: 69 additions & 0 deletions R/tests/testthat/test_dashboard_template_functions.R
@@ -0,0 +1,69 @@
# tests for dashboard_template_functions.R

source("../../dashboard_template_functions.R")

library(synapser)
library(testthat)

sample_counts_table <- function() {
data <- data.frame(
Center = factor(c("GOLD","SAGE", "TEST"),
levels = c("GOLD", "SAGE", "TEST")),
Counts = c(1, 2, 1)
)
return(data)
}

empty_counts_table <- function() {
data <- data.frame(
Center = logical(),
Counts = logical()
)
return(data)
}

sample_maf_table <- function() {
data <- data.frame(
Center = c("TEST", "TEST", "SAGE", "SAGE", "GOLD", "BRONZE"),
Tumor_Sample_Barcode = c("SAGE1", "SAGE2", "SAGE3", "SAGE4", "SAGE5", "SAGE6"),
Annotation_Status = c("SUCCESS", "FAILED", "FAILED", "FAILED", "FAILED", "SUCCESS")
)
return(data)
}

sample_maf_table_no_failed_annotations <- function() {
data <- data.frame(
Center = c("TEST", "SAGE", "GOLD"),
Tumor_Sample_Barcode = c("SAGE1", "SAGE2", "SAGE3"),
Annotation_Status = c("SUCCESS", "SUCCESS", "SUCCESS")
)
return(data)
}


test_that("get_syn_id_from_mapped_database_gets_correct_value", {
synLogin()
result <- get_syn_id_from_mapped_database(
database_name = "main",
database_synid_mappingid = "syn11600968"
)
expect_equal(result, "syn7208886")
})


test_that("get_failed_annotation_table_counts_returns_expected_output", {
result <- get_failed_annotation_table_counts(
maf_data=sample_maf_table(),
group_by_cols="Center",
counts_col_name="Counts")
expect_equal(result, sample_counts_table())
})

test_that("get_failed_annotation_table_counts_returns_empty_table_with_no_failed_annotations", {
result <- get_failed_annotation_table_counts(
maf_data=sample_maf_table_no_failed_annotations(),
group_by_cols="Center",
counts_col_name="Counts")
expect_equal(result, empty_counts_table())
})

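`get_failed_annotation_table_counts` keeps only rows with `Annotation_Status == "FAILED"`, tabulates them by the grouping columns, and sorts the result. The same idea in plain Python with `collections.Counter` is sketched below as an illustration, not code from this repository. Rows are assumed to be dicts keyed by MAF column name, `failed_annotation_counts` is a hypothetical name, and where the R version returns a zero-row data frame, this one returns an empty list.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Sequence


def failed_annotation_counts(
    maf_rows: Iterable[Mapping[str, str]],
    group_by_cols: Sequence[str],
    counts_col_name: str = "Counts",
) -> List[Dict[str, object]]:
    """Count FAILED annotations per combination of group_by_cols.

    Returns one dict per group (grouping values plus a counts column),
    sorted by the grouping values; an empty list if nothing failed.
    """
    # Key each failed row by the tuple of its grouping-column values.
    counts = Counter(
        tuple(row[col] for col in group_by_cols)
        for row in maf_rows
        if row["Annotation_Status"] == "FAILED"
    )
    result: List[Dict[str, object]] = []
    for key in sorted(counts):
        record: Dict[str, object] = dict(zip(group_by_cols, key))
        record[counts_col_name] = counts[key]
        result.append(record)
    return result
```

Fed the six rows from `sample_maf_table()`, this yields GOLD 1, SAGE 2, TEST 1, matching `sample_counts_table()`; BRONZE drops out because its only row succeeded.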
26 changes: 21 additions & 5 deletions README.md
@@ -42,7 +42,6 @@ genie validate data_clinical_supp_SAGE.txt SAGE
```



## Contributing

Please view [contributing guide](CONTRIBUTING.md) to learn how to contribute to the GENIE package.
@@ -65,6 +64,16 @@ These are instructions on how you would develop and test the pipeline locally.
pip install -r requirements-dev.txt
```
If you are having trouble with the above, try installing via `pipenv`
1. Specify a python version that is supported by this repo:
```pipenv --python <python_version>```
1. [pipenv install from requirements file](https://docs.pipenv.org/en/latest/advanced.html#importing-from-requirements-txt)
1. Activate your `pipenv`:
```pipenv shell```
1. Configure the Synapse client to authenticate to Synapse.
1. Create a Synapse [Personal Access token (PAT)](https://help.synapse.org/docs/Managing-Your-Account.2055405596.html#ManagingYourAccount-PersonalAccessTokens).
1. Add a `~/.synapseConfig` file
@@ -83,33 +92,40 @@ These are instructions on how you would develop and test the pipeline locally.
1. Run the different pipelines on the test project. The `--project_id syn7208886` points to the test project.
1. Validate all the files.
1. Validate all the files **excluding vcf files**:
```
python bin/input_to_database.py main --project_id syn7208886 --onlyValidate
```
1. Validate **all** the files:
```
python bin/input_to_database.py mutation --project_id syn7208886 --onlyValidate --genie_annotation_pkg ../annotation-tools
```
1. Process all the files aside from the mutation (maf, vcf) files. The mutation processing was split because it takes at least 2 days to process all the production mutation data. Ideally, there is a parameter to exclude or include file types to process/validate, but that is not implemented.
```
python bin/input_to_database.py main --project_id syn7208886 --deleteOld
```
1. Process the mutation data. Be sure to clone this repo: https://github.com/Sage-Bionetworks/annotation-tools. This repo houses the code that re-annotates the mutation data with genome nexus. The `--createNewMafDatabase` will create a new mutation tables in the test project. This flag is necessary for production data for two main reasons:
1. Process the mutation data. Be sure to clone this repo: https://github.com/Sage-Bionetworks/annotation-tools and `git checkout` the version of the repo pinned to the [Dockerfile](https://github.com/Sage-Bionetworks/Genie/blob/main/Dockerfile). This repo houses the code that re-annotates the mutation data with genome nexus. The `--createNewMafDatabase` will create a new mutation tables in the test project. This flag is necessary for production data for two main reasons:
* During processing of mutation data, the data is appended to the data, so without creating an empty table, there will be duplicated data uploaded.
* By design, Synapse Tables were meant to be appended to. When a Synapse Tables is updated, it takes time to index the table and return results. This can cause problems for the pipeline when trying to query the mutation table. It is actually faster to create an entire new table than updating or deleting all rows and appending new rows when dealing with millions of rows.
* If you run this more than once on the same day, you'll run into an issue with overwriting the narrow maf table as it already exists. Be sure to rename the current narrow maf database under `Tables` in the test synapse project and try again.
```
python bin/input_to_database.py mutation --project_id syn7208886 --deleteOld --genie_annotation_pkg ../annotation-tools --createNewMafDatabase
```
1. Create a consortium release. Be sure to add the `--test` parameter. Be sure to clone the cbioportal repo: https://github.com/cBioPortal/cbioportal
1. Create a consortium release. Be sure to add the `--test` parameter. Be sure to clone the cbioportal repo: https://github.com/cBioPortal/cbioportal and `git checkout` the version of the repo pinned to the [Dockerfile](https://github.com/Sage-Bionetworks/Genie/blob/main/Dockerfile)
```
python bin/database_to_staging.py Jan-2017 ../cbioportal TEST --test
```
1. Create a public release. Be sure to add the `--test` parameter. Be sure to clone the cbioportal repo: https://github.com/cBioPortal/cbioportal
1. Create a public release. Be sure to add the `--test` parameter. Be sure to clone the cbioportal repo: https://github.com/cBioPortal/cbioportal and `git checkout` the version of the repo pinned to the [Dockerfile](https://github.com/Sage-Bionetworks/Genie/blob/main/Dockerfile)
```
python bin/consortium_to_public.py Jan-2017 ../cbioportal TEST --test
2 changes: 1 addition & 1 deletion genie/__init__.py
@@ -7,6 +7,6 @@

# create version in __init__.py
# https://packaging.python.org/en/latest/guides/single-sourcing-package-version/
- __version__ = "16.1.0"
+ __version__ = "16.2.0"

__all__ = ["__version__"]
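The comment above points to the PyPA guide on single-sourcing the package version. One approach that guide has described is reading `__version__` out of `__init__.py` with a regular expression instead of importing the package. A minimal sketch of that approach follows; `parse_version` and `read_version` are hypothetical helpers, not functions in Genie.

```python
import re
from pathlib import Path

# Matches a module-level assignment such as: __version__ = "16.2.0"
_VERSION_RE = re.compile(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", re.MULTILINE)


def parse_version(source: str) -> str:
    """Extract the __version__ string from Python source text."""
    match = _VERSION_RE.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def read_version(init_path: str) -> str:
    """Read a file such as genie/__init__.py and return its __version__."""
    return parse_version(Path(init_path).read_text(encoding="utf-8"))
```

Because `^` is anchored per line, the `__all__ = ["__version__"]` line is not mistaken for the assignment.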
63 changes: 35 additions & 28 deletions genie/__main__.py
@@ -44,49 +44,35 @@ def build_parser():

subparsers = parser.add_subparsers(
title="commands",
- description="The following commands are available:",
- help='For additional help: "genie <COMMAND> -h"',
+ description="The following commands are available: ",
+ help='For additional help use: "genie <COMMAND> -h"',
)

parser_validate = subparsers.add_parser(
- "validate", help="Validates GENIE file formats"
+ "validate", help="Validates GENIE file formats. "
)

parser_validate.add_argument(
"filepath",
type=str,
nargs="+",
- help="File(s) that you are validating."
- "If you validation your clinical files and you have both sample and "
- "patient files, you must provide both",
+ help="File(s) that you are validating. "
+ "If you have separate clinical sample and patient files, "
+ "you must provide both files when validating.",
)

parser_validate.add_argument("center", type=str, help="Contributing Centers")

- parser_validate.add_argument(
- "--format_registry_packages",
- type=str,
- nargs="+",
- default=["genie_registry"],
- help="Python package name(s) to get valid file formats from (default: %(default)s).",
- )
-
- parser_validate.add_argument(
- "--oncotree_link", type=str, help="Link to oncotree code"
- )
-
validate_group = parser_validate.add_mutually_exclusive_group()

validate_group.add_argument(
"--filetype",
type=str,
- help="By default, the validator uses the filename to match "
+ help="Use the --filetype {FILETYPE} parameter to ignore filename validation. "
+ "By default, the validator uses the filename to match "
"the file format. If your filename is incorrectly named, "
- "it will be invalid. If you know the file format you are "
- "validating, you can ignore the filename validation and skip "
- "to file content validation. "
- "Note, the filetypes with SP at "
- "the end are for special sponsored projects.",
+ "it will be invalid. "
+ "Options: [maf, vcf, clinical, assayinfo, bed, cna, sv, seg, mutationsInCis]",
)

validate_group.add_argument(
@@ -98,18 +84,39 @@ def build_parser():
"to this directory.",
)

+ parser_validate.add_argument(
+ "--oncotree_link",
+ type=str,
+ help="Specify an oncotree url when validating your clinical "
+ "file "
+ "(e.g: https://oncotree.info/api/tumorTypes/tree?version=oncotree_2021_11_02). "
+ "By default the oncotree version used will be specified in this entity: "
+ "syn13890902",
+ )
+
+ parser_validate.add_argument(
+ "--nosymbol-check",
+ action="store_true",
+ help="Ignores specific post-processing validation criteria related to HUGO symbols "
+ "in the structural variant and cna files.",
+ )
+
# TODO: remove this default when private genie project is ready
parser_validate.add_argument(
"--project_id",
type=str,
default="syn3380222",
- help="Synapse Project ID where data is stored. (default: %(default)s).",
+ help="FOR DEVELOPER USE ONLY: Synapse Project ID where data is stored. "
+ "(default: %(default)s).",
)

parser_validate.add_argument(
- "--nosymbol-check",
- action="store_true",
- help="Do not check hugo symbols of fusion and cna file",
+ "--format_registry_packages",
+ type=str,
+ nargs="+",
+ default=["genie_registry"],
+ help="FOR DEVELOPER USE ONLY: Python package name(s) to get valid file formats "
+ "from (default: %(default)s).",
)

parser_validate.set_defaults(func=validate._perform_validate)
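To see how the reordered `validate` options parse, here is a trimmed, standalone reconstruction of that subcommand using only the options visible in this diff. It is a sketch, not Genie's actual `build_parser`: help strings and the `set_defaults(func=...)` hook are omitted, `dest="command"` is added for convenience, and the second member of the mutually exclusive group is hidden in this view, so only `--filetype` is placed in it.

```python
import argparse


def build_validate_parser() -> argparse.ArgumentParser:
    """Trimmed sketch of the `genie validate` argument layout shown above."""
    parser = argparse.ArgumentParser(prog="genie")
    subparsers = parser.add_subparsers(title="commands", dest="command")

    validate = subparsers.add_parser("validate", help="Validates GENIE file formats.")
    # One or more files; separate clinical sample and patient files go together.
    validate.add_argument("filepath", type=str, nargs="+")
    validate.add_argument("center", type=str)

    # Genie's group has another mutually exclusive option that the diff view
    # collapses, so it is not reproduced here.
    group = validate.add_mutually_exclusive_group()
    group.add_argument("--filetype", type=str)

    validate.add_argument("--oncotree_link", type=str)
    validate.add_argument("--nosymbol-check", action="store_true")
    # Developer-only options, with the defaults shown in the diff.
    validate.add_argument("--project_id", type=str, default="syn3380222")
    validate.add_argument(
        "--format_registry_packages", type=str, nargs="+", default=["genie_registry"]
    )
    return parser
```

argparse exposes `--nosymbol-check` as the attribute `nosymbol_check`, and even though `filepath` takes `nargs="+"`, the last positional token is still assigned to `center`, so sample and patient files can be passed together ahead of the center name.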
1 change: 1 addition & 0 deletions genie/config.py
@@ -1,4 +1,5 @@
"""Configuration to obtain registry classes"""

+ import importlib
import logging

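This module's docstring says it obtains registry classes, and the change adds `import importlib`, which suggests that format packages named at runtime (compare the `--format_registry_packages` option, default `genie_registry`) are imported by name. The sketch below shows that general pattern with `importlib.import_module` and `inspect.getmembers`. It is an assumption about intent, not Genie's code: `collect_subclasses` is hypothetical, and the base class Genie actually filters on is not shown in this diff.

```python
import importlib
import inspect
from typing import Dict, Iterable


def collect_subclasses(package_names: Iterable[str], base: type) -> Dict[str, type]:
    """Import each named module and collect its public subclasses of `base`.

    Returns a mapping of class name -> class, excluding `base` itself.
    """
    found: Dict[str, type] = {}
    for name in package_names:
        module = importlib.import_module(name)  # import by dotted name at runtime
        for attr_name, obj in inspect.getmembers(module, inspect.isclass):
            if attr_name.startswith("_"):
                continue  # skip private helpers
            if issubclass(obj, base) and obj is not base:
                found[attr_name] = obj
    return found
```

As a stand-in for a registry package, `collect_subclasses(["collections"], dict)` picks up `Counter`, `OrderedDict` and `defaultdict`, but not `ChainMap`, which behaves like a mapping without subclassing `dict`.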