Docker vn #553

Draft
wants to merge 19 commits into
base: staging
@@ -7,7 +7,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11"]
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
steps:
- uses: actions/checkout@v3

23 changes: 23 additions & 0 deletions .github/workflows/pr-priority-label.yaml
@@ -0,0 +1,23 @@
name: Pull Request Has Priority Label
on:
  pull_request:
    types: [opened, labeled, unlabeled, synchronize]
jobs:
  pr-priority-label:
    runs-on: ubuntu-latest
    permissions:
      issues: write
      pull-requests: write
    outputs:
      status: ${{ steps.check-labels.outputs.status }}
    steps:
      - id: check-labels
        uses: mheap/github-action-required-labels@v5
        with:
          mode: exactly
          count: 1
          labels: "priority:*"
          use_regex: true
          add_comment: true
          message: "PRs require a priority label. Please add one."
          exit_type: failure
9 changes: 7 additions & 2 deletions Dockerfile
@@ -2,9 +2,14 @@
# Runs service on port 80.
# Healthchecks service up every 5m.

FROM python:3.7
FROM python:3.9
RUN apt update ; apt install -y rsync
RUN pip install pipenv uvicorn[standard]
ENV SEQREPO_ROOT_DIR=/usr/local/share/seqrepo/2021-01-29
ENV GENE_NORM_DB_URL=http://dynamodb:8001
ENV AWS_ACCESS_KEY_ID='DUMMYIDEXAMPLE'
ENV AWS_SECRET_ACCESS_KEY='DUMMYEXAMPLEKEY'
ENV AWS_DEFAULT_REGION='us-west-2'
Comment on lines +8 to +12
Member

Maybe we should consider having these as ARGs.

Author

These values are required during container runtime as well. It would be better to continue with ENV unless we are planning for build time customisation.
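
To make the distinction in this thread concrete: a value set with `ENV` stays in the container's environment at run time and can be overridden per container with `docker run -e`, whereas an `ARG` value exists only during `docker build`. The sketch below is a dry run that prints the command instead of executing it. The image tag `variation-normalization`, the network `tulip-net`, the port mapping, and the override value are illustrative assumptions, not settings confirmed by this PR.

```shell
# Dry run by default: commands are printed, not executed.
# Set DOCKER=docker in the environment to run them for real.
DOCKER="${DOCKER:-echo docker}"

# Hypothetical names; adjust to your own setup.
image=variation-normalization
network=tulip-net

# Override an ENV default baked into the image, for this container only.
$DOCKER run -d --net "$network" -p 80:80 \
  -e AWS_DEFAULT_REGION=us-east-1 \
  "$image"
```

Variables not passed with `-e` keep the defaults declared by the `ENV` lines in the Dockerfile.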

COPY . /app
WORKDIR /app
RUN if [ ! -f "Pipfile.lock" ] ; then pipenv lock ; else echo Pipfile.lock exists ; fi
@@ -13,4 +18,4 @@ EXPOSE 80
HEALTHCHECK --interval=5m --timeout=3s \
CMD curl -f http://localhost/variation || exit 1

CMD pipenv run uvicorn variation.main:app --port 80 --host 0.0.0.0
CMD cd src && pipenv run uvicorn variation.main:app --log-level debug --port 80 --host 0.0.0.0
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2018-2023 VICC
Copyright (c) 2018-2024 VICC

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
89 changes: 72 additions & 17 deletions README.md
rajatkapoordfci marked this conversation as resolved.
@@ -1,25 +1,28 @@
# Variation Normalization

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5894937.svg)](https://doi.org/10.5281/zenodo.5894937)
[![image](https://img.shields.io/pypi/v/variation-normalizer.svg)](https://pypi.python.org/pypi/variation-normalizer) [![image](https://img.shields.io/pypi/l/variation-normalizer.svg)](https://pypi.python.org/pypi/variation-normalizer) [![image](https://img.shields.io/pypi/pyversions/variation-normalizer.svg)](https://pypi.python.org/pypi/variation-normalizer) [![Actions status](https://github.com/cancervariants/variation-normalization/actions/workflows/checks.yaml/badge.svg)](https://github.com/cancervariants/variation-normalization/actions/checks.yaml)[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5894937.svg)](https://doi.org/10.5281/zenodo.5894937)

Services and guidelines for normalizing variation terms to [VRS](https://vrs.ga4gh.org/en/latest) compatible representations.
<!-- description -->
The Variation Normalizer parses and translates free-text descriptions of genomic variations into computable objects conforming to the [Variation Representation Specification (VRS)](https://vrs.ga4gh.org/en/latest), enabling consistent and accurate variant harmonization across a diversity of genomic knowledge resources.
<!-- /description -->

Public OpenAPI endpoint: <https://normalize.cancervariants.org/variation>
---

Installing with pip:
[Live OpenAPI endpoint](https://normalize.cancervariants.org/variation)

---

## Installation

Install from [PyPI](https://pypi.org/project/variation-normalizer):

```shell
pip install variation-normalizer
python3 -m pip install variation-normalizer
```

The variation-normalization repo depends on VRS models, and therefore each variation-normalizer package on PyPI uses a particular version of VRS. The correspondences between packages may be summarized as:
---

| variation-normalization branch | variation-normalizer version | gene-normalizer version | VRS version |
| ---- | --- | ---- | --- |
| [main](https://github.com/cancervariants/variation-normalization/tree/main) | 0.6.X | 0.1.X | [1.X.X](https://github.com/ga4gh/vrs) |
| [staging](https://github.com/cancervariants/variation-normalization/tree/staging) | 0.8.X | 0.3.X | [2.0-alpha](https://github.com/ga4gh/vrs/tree/2.0-alpha) |

## About
## Normalization

Variation Normalization works by using four main steps: tokenization, classification, validation, and translation. During tokenization, we split strings on whitespace and parse to determine the type of token. During classification, we specify the order of tokens a classification can have. We then do validation checks such as ensuring references for a nucleotide or amino acid matches the expected value and validating a position exists on the given transcript. During translation, we return a VRS Allele object.

@@ -36,7 +39,18 @@ Variation Normalizer accepts input from GRCh37 or GRCh38 assemblies.

We are working towards adding more types of variations, coordinates, and representations.

### Endpoints

### VRS Versioning

The variation-normalization repo depends on VRS models, and therefore each variation-normalizer package on PyPI uses a particular version of VRS. The correspondences between packages may be summarized as:

| variation-normalization branch | variation-normalizer version | gene-normalizer version | VRS version |
| ---- | --- | ---- | --- |
| [main](https://github.com/cancervariants/variation-normalization/tree/main) | 0.6.X | 0.1.X | [1.X.X](https://github.com/ga4gh/vrs) |
| [staging](https://github.com/cancervariants/variation-normalization/tree/staging) | 0.8.X | 0.3.X | [2.0-alpha](https://github.com/ga4gh/vrs/tree/2.0-alpha) |


### Available Endpoints

#### `/to_vrs`

@@ -48,7 +62,7 @@ Returns a VRS Variation aligned to the prioritized transcript. The Variation Nor

If a genomic variation query _is_ given a gene (E.g. `BRAF g.140753336A>T`), the associated cDNA representation will be returned. This is because the gene provides additional strand context. If a genomic variation query is _not_ given a gene, the GRCh38 representation will be returned.

## Developer Instructions
## Development

Clone the repo:

@@ -68,7 +82,7 @@ pipenv shell
pipenv update && pipenv install --dev
```

### Backend Services
### Required resources

Variation Normalization relies on some local data caches which you will need to set up. It uses pipenv to manage its environment, which you will also need to install.

@@ -154,11 +168,11 @@ uvicorn variation.main:app --reload
Next, view the OpenAPI docs on your local machine:
<http://127.0.0.1:8000/variation>

### Init coding style tests
### Code QC

Code style is managed by [Ruff](https://docs.astral.sh/ruff/) and checked prior to commit.

Check style with `ruff`:
To perform formatting and check style:

```shell
python3 -m ruff format . && python3 -m ruff check --fix .
@@ -186,3 +200,44 @@ From the _root_ directory of the repository:
```shell
pytest tests/
```

## Docker Setup:
Member

Suggested change
## Docker Setup:
## Docker Setup

This section deals with setting up Variation Normalizer's backend dependencies via Docker. You must have Docker installed for this section. See more [here](https://docs.docker.com/engine/install/).

To create a new Docker network, use the [docker network create](https://docs.docker.com/reference/cli/docker/network/create/) command. For example, `docker network create tulip-net`

## SeqRepo
Member

This should be a subheading

Suggested change
## SeqRepo
### SeqRepo

Variation Normalizer depends on [Biocommons SeqRepo](https://github.com/biocommons/biocommons.seqrepo). It is recommended to keep the SeqRepo data on a volume attached to the container, since its size exceeds 10 GB and it can take a while to download.
1. Pull the image from Docker Hub Repository:

```shell
docker pull biocommons/seqrepo
```
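
One way to follow the volume recommendation is a named Docker volume, sketched below as a dry run that prints the commands instead of executing them. The volume name `seqrepo_vol` is hypothetical, and the mount point `/usr/local/share/seqrepo` is an assumption inferred from the `SEQREPO_ROOT_DIR` value in this PR's Dockerfile; check the SeqRepo image documentation before relying on it.

```shell
# Dry run by default: commands are printed, not executed.
# Set DOCKER=docker in the environment to run them for real.
DOCKER="${DOCKER:-echo docker}"

seqrepo_volume=seqrepo_vol               # hypothetical volume name
seqrepo_mount=/usr/local/share/seqrepo   # assumed mount point inside the container

# Create the volume once so the downloaded data survives container removal.
$DOCKER volume create "$seqrepo_volume"

# Attach the volume when starting the SeqRepo container.
$DOCKER run --name seqrepo \
  -v "${seqrepo_volume}:${seqrepo_mount}" \
  biocommons/seqrepo
```

Other containers can then mount the same named volume with a matching `-v` flag instead of downloading the data again.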

## UTA
Member

This should be a subheading

Suggested change
## UTA
### UTA

The Postgres UTA instance is another dependancy required for Variation Normalizer. To setup Container for UTA postgres Db instance. Follow the following steps:
Member

Suggested change
The Postgres UTA instance is another dependancy required for Variation Normalizer. To setup Container for UTA postgres Db instance. Follow the following steps:
[Biocommons UTA](https://github.com/biocommons/uta) is another dependency required for the Variation Normalizer.

a.) Pull the image from Docker Hub Repository by typing following command in terminal.
Set the uta_v env variable by typing command uta_v=<"name of the version>. For eg uta_v=uta_20210129b.
Command : docker pull biocommons/uta:$uta_v
b.) Once the image is downnloaded, Start the container with the command :
docker run
-d
-e POSTGRES_PASSWORD=some-password-that-you-make-up
-v /tmp:/tmp
-v uta_vol:/var/lib/postgresql/data
--name $uta_v
--net=<"name of the network> \
biocommons/uta:$uta_v
Comment on lines +218 to +229
Member

Suggested change
a.) Pull the image from Docker Hub Repository by typing following command in terminal.
Set the uta_v env variable by typing command uta_v=<"name of the version>. For eg uta_v=uta_20210129b.
Command : docker pull biocommons/uta:$uta_v
b.) Once the image is downnloaded, Start the container with the command :
docker run
-d
-e POSTGRES_PASSWORD=some-password-that-you-make-up
-v /tmp:/tmp
-v uta_vol:/var/lib/postgresql/data
--name $uta_v
--net=<"name of the network> \
biocommons/uta:$uta_v
1. Pull the image from Docker Hub Repository:
```shell
docker pull biocommons/uta:$uta_v
```

Where `uta_v` is the name of the UTA version, for example `uta_v=uta_20210129b`.

b.) Once the image is downloaded, run the following:

docker run  
-d  
-e POSTGRES_PASSWORD=some-password-that-you-make-up  
-v /tmp:/tmp  
-v uta_vol:/var/lib/postgresql/data  
--name $uta_v  
--net=<"name of the network> \  
biocommons/uta:$uta_v
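
As displayed, the `docker run` command above has a line continuation on only one line, so pasting it into a shell would run each line as a separate command. A minimal sketch of a form that can be pasted as a unit follows. It is a dry run that prints the commands instead of executing them, and it assumes the network is the `tulip-net` example created earlier in this README; substitute your own network name and password.

```shell
# Dry run by default: commands are printed, not executed.
# Set DOCKER=docker in the environment to run them for real.
DOCKER="${DOCKER:-echo docker}"

uta_v=uta_20210129b   # UTA version, taken from the example above
network=tulip-net     # assumed network name

# Pull the UTA image for the chosen version.
$DOCKER pull "biocommons/uta:${uta_v}"

# Start the Postgres-backed UTA container; every line but the last ends in a backslash.
$DOCKER run \
  -d \
  -e POSTGRES_PASSWORD=some-password-that-you-make-up \
  -v /tmp:/tmp \
  -v uta_vol:/var/lib/postgresql/data \
  --name "$uta_v" \
  --net="$network" \
  "biocommons/uta:${uta_v}"
```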


### DynamoDB
AWS provides a Docker image for running DynamoDB locally. The local instance requires credentials (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`). You can provide dummy values for these if you do not have an AWS account.
1. Pull the image from Docker Hub repository and start the container:

```shell
docker run --net tulip-net -d --name dynamodb -p 8001:8001 amazon/dynamodb-local:1.18.0 -jar DynamoDBLocal.jar -port 8001
```
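
To confirm that the local instance accepts requests with dummy credentials, one option is to list its tables with the AWS CLI. This is a sketch that assumes the AWS CLI is installed on the host and that port 8001 is published as in the command above; the credential values are the placeholders from this PR's Dockerfile. It is a dry run that prints the command instead of executing it.

```shell
# Dry run by default: the command is printed, not executed.
# Set AWS=aws in the environment to run it for real.
AWS="${AWS:-echo aws}"

# Dummy credentials; DynamoDB local only needs them to be present.
export AWS_ACCESS_KEY_ID=DUMMYIDEXAMPLE
export AWS_SECRET_ACCESS_KEY=DUMMYEXAMPLEKEY
export AWS_DEFAULT_REGION=us-west-2

# Host-side address of the container started above (assumes -p 8001:8001).
dynamodb_url=http://localhost:8001

# List tables on the local endpoint instead of the real AWS service.
$AWS dynamodb list-tables --endpoint-url "$dynamodb_url"
```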

### Running the Dockerfile locally

1. Build the image from the Dockerfile:

```shell
docker build -t variation-normalization .
```
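
Once the image is built, a container can be started on the same network and checked against the URL used by the Dockerfile's `HEALTHCHECK`. The sketch below is a dry run that prints the commands instead of executing them; the container name, the `tulip-net` network, and the `-p 80:80` port mapping are assumptions rather than steps stated in this README.

```shell
# Dry run by default: commands are printed, not executed.
# Set DOCKER=docker and CURL=curl in the environment to run them for real.
DOCKER="${DOCKER:-echo docker}"
CURL="${CURL:-echo curl}"

image=variation-normalization   # tag used in the build command above
network=tulip-net               # assumed network shared with the backend containers

# Start the service and publish container port 80 on the host.
$DOCKER run -d --name variation-normalization --net "$network" -p 80:80 "$image"

# Same request the HEALTHCHECK makes; -f makes curl exit non-zero on HTTP errors.
$CURL -f http://localhost/variation
```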