Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker Changes #574

Open
wants to merge 26 commits into
base: staging
Choose a base branch
from
Open

Docker Changes #574

wants to merge 26 commits into from

Conversation

rajatkapoordfci
Copy link

@rajatkapoordfci rajatkapoordfci commented Jul 23, 2024

PR Scope : For phase 1 , the scope is limited to dockerization of the Variation Normalizer. As of now , Variation Normalizer has to started after all the Db services are up and ready with data download, This is a manual step in this phase. Once a user has confirmed all the Db services are ready with data then only Variation Normalizer is started.

Phase 2 : We can target , automating the orchestration of all services required for Variation Normalizer. Primary scope would be removal of the manual step of checking whether all db services are up and loaded with respective data.

Please review the PR for docker related changes.

jsstevenson and others added 26 commits March 22, 2024 09:33
Resolves `invalid value workflow reference: no version specified`
A working dockerfile will be added back in #553
* Methods now raise ValueError if incorrect enum value passed
close #552

* Remove `vrs_ref_allele_seq` from `GnomadVcfToProteinService` model.
Instead, this is now stored in `variation.locaation.sequence`
@korikuzma
Copy link
Member

@rajatkapoordfci Thanks for the PR! Testing it out now.

- seqrepo_vol:/usr/local/share/seqrepo
command: >
sh -c "pipenv run gene_norm_update --update_all --update_merged &&
cd src wait_for_db &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wait_for_db?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait_for_db is the custom function to handle the database race condition , as depends on only checks whether service is running or not. It will be implemented in next phase. Please review further and I will remove this command for now along with other review comments(if any) for this phase.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajatkapoordfci Would you be able to update the PR description with what this phase intended to achieve and what the next phase(s) include?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Add Phase scope in the description.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

# Runs service on port 80.
# Healthchecks service up every 5m.

FROM python:3.10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably good to use the latest Python version

Suggested change
FROM python:3.10
FROM python:3.12

Copy link
Author

@rajatkapoordfci rajatkapoordfci Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can upgrade the python version to 3.12 , however this would depend on the compatibility with all the dependant modules of Variation Normalizer. If currently Variation Normalizer is working on 3.12 , we can upgrade to 3.12.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Variation Normalizer works with 3.12

Dockerfile Outdated
Comment on lines 10 to 11
RUN if [ ! -f "Pipfile.lock" ] ; then pipenv lock ; else echo Pipfile.lock exists ; fi
RUN pipenv sync
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of pipenv, can we just install deps using pip?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be done , but this would require change in Docker commands. I have followed the existing setup of Variation Normalizer mentioned in Readme which is using pip. Any particular reason for moving to pip approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajatkapoordfci We've been removing pipenv from our projects in favor of pip (and we'll probably switch to uv). pipenv is also pretty slow


From the root directory , where Docker and docker-compose file are , run the following commands:

*docker-compose up -d db
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is clearer to rename db to seqrepo

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, your team can finalise the naming convention for the services . I will rename them accordingly.


*docker-compose up -d db
*docker-compose up -d uta
*docker-compose up -d dynamodb-local
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is clearer to rename dynamodb-local to vicc-normalizers-ddb or something similar. I assume in the future, we will be re-using this for gene/disease/therapy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, your team can finalise the naming convention for the services . I will rename them accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not get this to work on my machine and had to make some modifications. Have you pushed all your commits? And were you able to get this working on your machine?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes , It working fine on my machine without any change. Please share whether it's working in your machine or not. If not what are the errors. We can connect on call as well to discuss further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajatkapoordfci what machine are you using again? We use MacOS.

The main issues I encountered were not having the correct dependencies installed (gene-normalizer etl requires the etl optional dependency to be installed) and permissions errors.

If you want, I can make a PR into your branch with the changes I had to make in order to get it semi-working. Cool-Seq-Tool creates a genomic table (which I'm not sure this is the best approach and I have an issue to revisit this) and requires some additional setup before starting the service. This part I did not get to fixing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/cancervariants/variation-normalization/tree/docker-compose-kori Here is my branch where I made some changes. I didn't get to fixing the Cool-Seq-Tool additional genomic table in UTA db yet. @rajatkapoordfci thanks again for working on this. It's great work and exciting to see!

@korikuzma korikuzma removed their assignment Nov 14, 2024
Copy link

github-actions bot commented Jan 6, 2025

This PR is stale because it has been open 3 day(s) with no activity. Please review this PR.

@github-actions github-actions bot added the stale label Jan 6, 2025
@korikuzma korikuzma removed the stale label Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority:medium Medium priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants