Neural search through protein sequences using the ProtBert model and the Jina AI framework.
App demo:
protein_search.mp4
- ProtBert https://huggingface.co/Rostlab/prot_bert
- Jina AI https://jina.ai
- 3D protein models http://3dmol.org
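For orientation, the core idea behind neural search over protein sequences can be sketched as below. This is a minimal illustration, not the repository's code: the function names (format_sequence, embed, cosine, rank) and the mean-pooling step are assumptions made for the example. The preprocessing follows the ProtBert model card, which expects upper-case residues separated by spaces with the rare residues U, Z, O and B mapped to X.

```python
import math
import re


def format_sequence(sequence):
    """Upper-case a raw amino-acid string, map rare residues (U, Z, O, B) to X,
    and separate residues with spaces, as the ProtBert model card describes."""
    cleaned = re.sub(r"[UZOB]", "X", sequence.strip().upper())
    return " ".join(cleaned)


def embed(sequence):
    """Embed one sequence with ProtBert (mean pooling is an assumed choice)."""
    # Imported lazily: torch and transformers are large and only needed here.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
    model = BertModel.from_pretrained("Rostlab/prot_bert")
    inputs = tokenizer(format_sequence(sequence), return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, tokens, hidden_size)
    # Average over all tokens (special tokens included) to get one vector.
    return hidden.mean(dim=1).squeeze(0).tolist()


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, indexed):
    """Return (protein_id, score) pairs sorted by descending cosine similarity."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in indexed.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In a sketch like this, each indexed protein is embedded once, and a query is answered by embedding the query sequence and calling rank against the stored vectors.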
First, clone the repository with git:
git clone https://github.com/georgeamccarthy/protein_search/ # Cloning
cd protein_search # Changing directory
If you're familiar with Docker, you can simply run make docker (assuming you're running Linux).
The above command will:
- Create the container for the frontend, install dependencies, and start the Streamlit application
- Create the container for the backend, install dependencies, and start the Jina application
- Print links in the logs that you can use to access the two containers
Visually, you should see something like,
From there on, you should be able to visit the Streamlit frontend and enter your protein-related query.
Some notes before you use this route:
- Docker takes a few moments to build the wheels for the dependencies, so the pip step in each of the containers may last as long as 1-2 minutes.
- The torch dependency in backend/requirements.txt is 831.1 MB at the time of writing. Unless you get red-colored logs, everything is fine and torch is just taking time to install.
- This project uses the Rostlab/prot_bert pre-trained model from HuggingFace, which is 1.68 GB in size.
The great news is that you will need to install these dependencies and build the images only once. Docker will cache all of the layers and steps, and caching for the pre-trained model has been integrated.
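How the repository wires up model caching is not shown here. One common approach, sketched below purely as an assumption, is to pass a fixed cache_dir to from_pretrained so the 1.68 GB download happens only on the first call. The environment variable PROTEIN_SEARCH_CACHE, the default folder, and the function names are hypothetical.

```python
import os
from pathlib import Path


def model_cache_dir(default="~/.cache/protein_search"):
    """Pick the cache folder: PROTEIN_SEARCH_CACHE (hypothetical) if set, else a default."""
    return Path(os.environ.get("PROTEIN_SEARCH_CACHE", default)).expanduser()


def load_protbert():
    """Download ProtBert on the first call; later calls read from the cache folder."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import BertModel, BertTokenizer

    cache = model_cache_dir()
    cache.mkdir(parents=True, exist_ok=True)
    tokenizer = BertTokenizer.from_pretrained(
        "Rostlab/prot_bert", do_lower_case=False, cache_dir=str(cache)
    )
    model = BertModel.from_pretrained("Rostlab/prot_bert", cache_dir=str(cache))
    return tokenizer, model
```

Inside a container, a folder like this only survives rebuilds if it lives on a mounted volume or in a cached image layer.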
Some more functionality is provided:
- To stop the logs from docker, press Ctrl+C
- To resume, run make up
- To remove the containers from the background, run make remove
- To build the containers again, run make docker
When you introduce new changes, neither container needs to be restarted.
For each of the folders frontend and backend, run the following commands:
- Making a new venv virtual environment
cd folder_to_go_into/ # `folder_to_go_into` is either `frontend` or `backend`
python3 -m venv venv
source venv/bin/activate
- Installing dependencies
pip install -r requirements.txt
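If you are unsure whether the virtual environment is active before installing, a quick check like the following may help. It is a small sketch that relies on the standard library only; the function name in_virtualenv is made up for this example.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside an environment created by venv."""
    # venv points sys.prefix at the environment and keeps the original
    # installation in sys.base_prefix, so the two differ only inside a venv.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

When the environment is active, the interpreter path printed should point inside the venv folder you created.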
If in backend, run python3 src/app.py
Open a new terminal, head back into the frontend folder, repeat venv creation and dependency installation, and run streamlit run app.py.
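With both processes started, you may want to confirm that each one is accepting connections. The sketch below is a generic TCP check, not part of the repository. Port 8501 is Streamlit's default; the backend port is not stated here, so BACKEND_PORT is a placeholder you would need to replace.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    STREAMLIT_PORT = 8501  # Streamlit's default port
    BACKEND_PORT = 0  # placeholder: set to the port your Jina app reports in its logs
    print("frontend up:", is_listening("localhost", STREAMLIT_PORT))
    if BACKEND_PORT:
        print("backend up:", is_listening("localhost", BACKEND_PORT))
```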
Refer to the Makefile for the specific commands.
To format code following the black standard
$ make format
Code linting with flake8
$ make lint
Testing
$ make test
Testing with coverage analysis
$ make coverage
Format, test and coverage
$ make build