This project wrangles short-read genomic alignments, for example from wastewater sampling, into a format for easy import into Loculus and its sequence database SILO.

sr2silo is designed to process nucleotide alignments from `.bam` files together with their metadata, translate and align the reads in amino-acid space, gracefully handle all insertions and deletions, and upload the results to the LAPIS-SILO backend.
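To illustrate the kind of input sr2silo works on, here is a minimal sketch (not sr2silo's actual code) that iterates over a nucleotide alignment with pysam and tallies insertions and deletions from each read's CIGAR operations; the file path is a placeholder:

```python
# Minimal sketch, not sr2silo's actual implementation: walk a BAM alignment with
# pysam and count inserted/deleted bases per read from its CIGAR operations.
import pysam

# "alignments.bam" is a placeholder path.
with pysam.AlignmentFile("alignments.bam", "rb") as bam:
    for read in bam.fetch(until_eof=True):
        if read.is_unmapped or read.cigartuples is None:
            continue
        # BAM CIGAR operation codes: 1 = insertion, 2 = deletion.
        inserted = sum(length for op, length in read.cigartuples if op == 1)
        deleted = sum(length for op, length in read.cigartuples if op == 2)
        print(read.query_name, read.reference_start, inserted, deleted)
```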
- `.github/workflows`: Contains GitHub Actions used for building, testing, and publishing.
- `.devcontainer/devcontainer.json`: Contains the configuration for the development container, such as which extensions to install and whether or not to mount the project directory into the container.
- `.vscode/settings.json`: Contains VSCode settings specific to the project, such as the Python interpreter to use and the maximum line length for auto-formatting.
- `src`: Place new source code here.
- `scripts`: Place scripts and temporary or intermediate work here.
- `tests`: Contains Python-based test cases to validate the source code.
- `pyproject.toml`: Contains metadata about the project and configurations for additional tools used to format, lint, type-check, and analyze Python code.
To build the package and manage dependencies, we use Poetry. Install it and become familiar with its basic functionality by reading the Poetry documentation.
- Build and set up the Conda environment using the Makefile:

  ```bash
  make setup
  ```

  This command creates the Conda environment (if not already created), installs Poetry, and sets up Diamond.

- Install additional development dependencies:

  ```bash
  poetry install --with dev
  poetry run pre-commit install
  ```

- Run the tests (a minimal example test is sketched after this list):

  ```bash
  poetry run pytest
  ```
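As a hypothetical illustration of the test layout, a small test under `tests/` could look like this; the helper is defined inline for the example and is not part of sr2silo's actual API:

```python
# tests/test_cigar_example.py -- hypothetical example; the helper below is defined
# inline for illustration and is not part of sr2silo's actual API.
def count_deleted_bases(cigartuples):
    """Count deleted bases from pysam-style (operation, length) CIGAR tuples."""
    return sum(length for op, length in cigartuples if op == 2)


def test_count_deleted_bases():
    # 0 = alignment match, 2 = deletion in the BAM CIGAR encoding.
    assert count_deleted_bases([(0, 50), (2, 3), (0, 10)]) == 3
```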
This is currently implemented as a script and is under heavy development. To run it, we recommend building it with Docker Compose, as it relies on other Rust components.

Edit the `docker-compose.env` file in the `docker-compose` directory with the following paths:
```bash
SAMPLE_DIR=../../../data/sr2silo/daemon_test/samples/A1_05_2024_10_08/20241024_2411515907/alignments/
SAMPLE_ID=A1_05_2024_10_08
BATCH_ID=20241024_2411515907
TIMELINE_FILE=../../../data/sr2silo/daemon_test/timeline.tsv
NEXTCLADE_REFERENCE=sars-cov2
RESULTS_DIR=./results
KEYCLOAK_TOKEN_URL=https://authentication-wise-seqs.loculus.org/realms/loculus/protocol/openid-connect/token
SUBMISSION_URL=https://backend-wise-seqs.loculus.org/test/submit?groupId={group_id}&dataUseTermsType=OPEN
CI=false
```
`KEYCLOAK_TOKEN_URL` and `SUBMISSION_URL` are used for the submission to LAPIS. `CI` determines whether `sr2silo` runs in a Continuous Integration pipeline, in which case it mocks uploads and skips submissions.
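As a rough sketch of that submission flow (assumptions, not the documented Loculus API: the grant type, client id, credential variables, multipart field name, and output file are all placeholders):

```python
# Hypothetical sketch of the submission flow: request a token from Keycloak, then
# POST the processed output to the submission endpoint. The grant type, client id,
# credentials, payload field name, and file path are assumptions, not sr2silo's
# or Loculus' documented interface.
import os
import requests

token_url = os.environ["KEYCLOAK_TOKEN_URL"]
submission_url = os.environ["SUBMISSION_URL"].format(group_id=1)  # placeholder group id

# Assumed OAuth2 password grant; real deployments may use a different grant/client.
token_response = requests.post(
    token_url,
    data={
        "grant_type": "password",
        "client_id": "backend-client",                         # placeholder
        "username": os.environ.get("KEYCLOAK_USER", ""),       # placeholder
        "password": os.environ.get("KEYCLOAK_PASSWORD", ""),   # placeholder
    },
)
token_response.raise_for_status()
access_token = token_response.json()["access_token"]

if os.environ.get("CI", "false").lower() == "true":
    print("CI mode: mocking upload and skipping submission")
else:
    with open("results/processed.ndjson", "rb") as handle:  # placeholder output file
        response = requests.post(
            submission_url,
            headers={"Authorization": f"Bearer {access_token}"},
            files={"sequenceFile": handle},  # field name is an assumption
        )
    response.raise_for_status()
```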
To upload the processed outputs, S3 storage is required. For sensitive information like AWS credentials, use Docker secrets. Create the following files in the `secrets` directory:

- `secrets/aws_access_key_id.txt`: YourAWSAccessKeyId
- `secrets/aws_secret_access_key.txt`: YourAWSSecretAccessKey
- `secrets/aws_default_region.txt`: YourAWSRegion
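Docker Compose mounts file-based secrets under `/run/secrets/` by default. As a minimal sketch of how they could be read and used with boto3 (not sr2silo's actual upload code; the secret names assume the Compose file registers them under the same names, and the bucket and object key are placeholders):

```python
# Sketch, not sr2silo's actual code: read AWS credentials from Docker secrets
# mounted at /run/secrets/ and upload one processed file to S3 with boto3.
from pathlib import Path

import boto3


def read_secret(name: str) -> str:
    # Compose mounts file-based secrets at /run/secrets/<secret name> by default.
    return Path(f"/run/secrets/{name}").read_text().strip()


s3 = boto3.client(
    "s3",
    aws_access_key_id=read_secret("aws_access_key_id"),
    aws_secret_access_key=read_secret("aws_secret_access_key"),
    region_name=read_secret("aws_default_region"),
)
# Bucket name and object key are placeholders.
s3.upload_file("results/processed.ndjson", "example-bucket", "sr2silo/processed.ndjson")
```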
To process a single sample, run the following command:

```bash
docker-compose --env-file .env up --build
```
The code quality checks that run on GitHub can be found in `.github/workflows/test.yml` for the Python package CI/CD. We are using:
- Ruff to lint the code.
- Black to format the code.
- Pyright to check the types.
- Pytest to run the unit tests and workflows.
- Interrogate to check the documentation.
This project welcomes contributions and suggestions. For details, visit the repository's Contributor License Agreement (CLA) and Code of Conduct pages.