Update command line parameters (#32) #33

Merged
merged 1 commit into from
Jan 4, 2024
20 changes: 10 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,5 @@
VILOCA: VIral LOcal haplotype reconstruction and mutation CAlling for short and long read data
===============
[![Build Status](https://travis-ci.org/cbg-ethz/shorah.svg?branch=master)](https://travis-ci.org/cbg-ethz/shorah)
[![Bioconda package](https://img.shields.io/conda/dn/bioconda/shorah.svg?label=Bioconda)](https://bioconda.github.io/recipes/shorah/README.html)
[![Docker container](https://quay.io/repository/biocontainers/shorah/status)](https://quay.io/repository/biocontainers/shorah)


VILOCA is an open source project for the analysis of next generation sequencing
data. It is designed to analyse genetically heterogeneous samples. Its tools
@@ -17,22 +13,26 @@ genetic variants present in a mixed sample.
For installation, Miniconda is recommended: https://docs.conda.io/en/latest/miniconda.html.
We recommend installing VILOCA in a clean conda environment:
```
conda create --name env_viloca libshorah
conda create --name env_viloca --channel conda-forge --channel bioconda libshorah
conda activate env_viloca
pip install git+https://github.com/LaraFuhrmann/VILOCA@master
pip install git+https://github.com/cbg-ethz/VILOCA@master
```

### Example
To test your installation, we recommend running the program on `tests/data_1`.
To test your installation, run VILOCA on `tests/data_1`:
```
viloca run -b test_aln.cram -f test_ref.fasta -z scheme.insert.bed --mode use_quality_scores
```


If the sequencing amplicon strategy is known, we recommend using the amplicon mode of the program, which takes the `<smth>.insert.bed` file as input:
`shorah shotgun -b test_aln.cram -f test_ref.fasta -z scheme.insert.bed --mode use_quality_scores`
`viloca run -b test_aln.cram -f test_ref.fasta -z scheme.insert.bed --mode use_quality_scores`

If the sequencing quality scores are not trustworthy, the sequencing error parameters can be learned instead:
`shorah shotgun -b test_aln.cram -f test_ref.fasta -z scheme.insert.bed --mode learn_error_params`.
`viloca run -b test_aln.cram -f test_ref.fasta -z scheme.insert.bed --mode learn_error_params`.

If there is no information on the sequencing amplicon strategy available, run:
`shorah shotgun -b test_aln.cram -f test_ref.fasta --mode use_quality_scores`
`viloca run -b test_aln.cram -f test_ref.fasta --mode use_quality_scores`
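
The three invocations above differ only in whether `-z` is passed and which `--mode` is chosen. A minimal sketch of that choice, assuming a hypothetical helper `build_viloca_command` (not part of VILOCA) and using only the flags shown in this README:

```python
from typing import List, Optional


def build_viloca_command(
    alignment: str,
    reference: str,
    insert_bed: Optional[str] = None,
    trust_quality_scores: bool = True,
) -> List[str]:
    """Hypothetical helper: assemble the argument list for `viloca run`."""
    cmd = ["viloca", "run", "-b", alignment, "-f", reference]
    # Pass -z only when the amplicon scheme (insert BED file) is known.
    if insert_bed is not None:
        cmd += ["-z", insert_bed]
    # Learn the error parameters when the quality scores cannot be trusted.
    mode = "use_quality_scores" if trust_quality_scores else "learn_error_params"
    cmd += ["--mode", mode]
    return cmd


print(" ".join(build_viloca_command("test_aln.cram", "test_ref.fasta", "scheme.insert.bed")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once VILOCA is installed in the active environment.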

### Parameters
There are several parameters available:
631 changes: 480 additions & 151 deletions poetry.lock

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions pyproject.toml
@@ -1,16 +1,16 @@
[tool.poetry]
name = "ShoRAH"
name = "VILOCA"
version = "0.1.0"
description = "SHOrt Reads Assembly into Haplotypes"
license = "GPL-3.0-only"
authors = ["Benjamin Langer <[email protected]>"]
authors = ["Benjamin Langer <[email protected]>", "Lara Fuhrmann <[email protected]>"]
build = "build.py"
packages = [
{ include = "shorah" }
{ include = "viloca" }
]

[tool.poetry.scripts]
shorah = 'shorah.cli:main'
viloca = 'viloca.cli:main'

[tool.poetry.dependencies]
python = ">=3.9.9,<3.11"
2 changes: 1 addition & 1 deletion tests/data_1/shotgun_test.sh
@@ -1,4 +1,4 @@
#!/bin/bash

shorah shotgun -a 0.1 -w 201 -x 100000 -p 0.9 -c 0 \
viloca run -a 0.1 -w 201 -x 100000 -p 0.9 -c 0 \
-r HXB2:2469-3713 -R 42 -f test_ref.fasta -b test_aln.cram --out_format csv "$@"
2 changes: 1 addition & 1 deletion tests/data_5/shotgun_prepare.sh
@@ -1,3 +1,3 @@
#!/bin/bash

shorah shotgun -a 0.1 -w 42 -x 100000 -p 0.9 -c 0 -r REF:43-273 -R 42 -b test_aln.cram -f ref.fasta
viloca run -a 0.1 -w 42 -x 100000 -p 0.9 -c 0 -r REF:43-273 -R 42 -b test_aln.cram -f ref.fasta
2 changes: 1 addition & 1 deletion tests/test_b2w.py
@@ -2,7 +2,7 @@
import filecmp
import os
import glob
from shorah import b2w, tiling
from viloca import b2w, tiling
import math
import libshorah

24 changes: 18 additions & 6 deletions tests/test_b2w_mapping.py
@@ -1,7 +1,8 @@
from array import array
import pytest
from cigar import Cigar
from shorah import b2w
from viloca import b2w
import hashlib

class MockAlignedSegment:
    def __init__(self, query_name: str, reference_start: int, query_sequence: str, cigarstring: str):
@@ -35,10 +36,10 @@ def add_indels(self, indels_map):
        cnt = self.reference_start
        for i in self.cigartuples:
            if i[0] == 1: # insert TODO Justify -1
                indels_map.append((self.query_name, self.reference_start, hash(self.cigarstring), cnt-1, i[1], 0)) # cnt-1
                indels_map.append((self.query_name, self.reference_start, hashlib.sha1(self.cigarstring.encode()).hexdigest(), cnt-1, i[1], 0)) # cnt-1
            elif i[0] == 2: # del
                for k in range(i[1]):
                    indels_map.append((self.query_name, self.reference_start, hash(self.cigarstring), cnt+k, 0, 1))
                    indels_map.append((self.query_name, self.reference_start, hashlib.sha1(self.cigarstring.encode()).hexdigest(), cnt+k, 0, 1))
                cnt += i[1]
            else:
                cnt += i[1]
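
The hunk above replaces the built-in `hash` with a SHA-1 hex digest of the CIGAR string. The diff does not state the motivation. One plausible reason is that Python randomizes `str` hashes per interpreter process unless `PYTHONHASHSEED` is fixed, while a SHA-1 digest of the same bytes never changes. A minimal sketch, assuming a hypothetical helper `cigar_digest` that is not part of VILOCA:

```python
import hashlib


def cigar_digest(cigarstring: str) -> str:
    """Hypothetical helper: stable hex digest of a CIGAR string."""
    # Same expression as in the updated mock: encode to bytes, then SHA-1.
    return hashlib.sha1(cigarstring.encode()).hexdigest()


digest = cigar_digest("3M1I2M")
# A SHA-1 hex digest is always 40 hexadecimal characters long.
print(len(digest))
# The same input yields the same digest on every call and in every process.
print(digest == cigar_digest("3M1I2M"))
```

Unlike `hash`, the digest is a string rather than an integer, so any code comparing it against stored values must use the same representation.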
@@ -326,11 +327,22 @@ def test_run_one_window(mArr, spec, window_length, window_start, extended_window
    mock_dict = mocker.MagicMock()
    mock_dict.__getitem__.return_value = 42

    arr, _, _, _, _, _ = b2w._run_one_window(
    # added by Lara
    original_window_length = window_length
    control_window_length = window_length

    if extended_window_mode:
        for pos, val in max_indel_at_pos.items():
            if window_start <= pos < window_start + original_window_length:
                control_window_length += val


    arr, _, _, _, = b2w._run_one_window(
        mock_samfile,
        window_start,
        window_start, # 0 based
        "HXB2-does-not-matter",
        window_length,
        control_window_length,
        0,
        mock_dict,
        0,
@@ -343,4 +355,4 @@ def test_run_one_window(mArr, spec, window_length, window_start, extended_window
    print(arr)

    for idx, el in enumerate(arr):
        assert el.split("\n")[1] == spec[idx]
        assert el.split("\n")[1] == spec[idx]
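
The control-length logic added to `test_run_one_window` can be read in isolation: in extended-window mode, the window grows by the maximum insertion length at every position inside the original window. A minimal sketch of the same arithmetic, where the function name `extended_window_length` and the sample values are illustrative assumptions rather than part of the test suite:

```python
from typing import Dict


def extended_window_length(
    window_start: int,
    window_length: int,
    max_indel_at_pos: Dict[int, int],
    extended_window_mode: bool = True,
) -> int:
    """Hypothetical helper mirroring the loop added to the test."""
    if not extended_window_mode:
        return window_length
    control_window_length = window_length
    for pos, val in max_indel_at_pos.items():
        # Only positions in [window_start, window_start + window_length) count.
        if window_start <= pos < window_start + window_length:
            control_window_length += val
    return control_window_length


# Positions 10 and 14 fall inside [10, 15); positions 9 and 15 do not.
print(extended_window_length(10, 5, {9: 2, 10: 1, 14: 3, 15: 4}))  # → 9
```

The bounds are checked against the original window length, so extending the window does not pull in further positions.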
2 changes: 1 addition & 1 deletion tests/test_envp_post.py
@@ -1,5 +1,5 @@
from unittest.mock import patch, mock_open
from shorah import envp_post
from viloca import envp_post

DEFAULT_MOCK_DATA = "default mock data"

4 changes: 2 additions & 2 deletions tests/test_pooled_post.py
@@ -1,5 +1,5 @@
from unittest.mock import patch, mock_open
from shorah import pooled_post
from viloca import pooled_post
import numpy as np

DEFAULT_MOCK_DATA = "default mock data"
@@ -48,4 +48,4 @@ def open_side_effect(name):
# open("debug/w-HXB2-2938-3138.dbg"),
# open("support/w-HXB2-2938-3138.reads-support.fas"),
# open("corrected/w-HXB2-2938-3138.reads-cor.fas"),
# "shorah") # TODO
# "shorah") # TODO
4 changes: 2 additions & 2 deletions tests/test_pooled_pre.py
@@ -1,6 +1,6 @@
import pysam
import os
from shorah import pooled_pre
from viloca import pooled_pre

def test__annotate_alignment_file():
    out = "out.bam"
@@ -48,4 +48,4 @@ def test_pre_process_pooled():
os.remove(out + ".bai")

assert a[0] == a[1] != 0
assert a[2] == 0
assert a[2] == 0
4 changes: 2 additions & 2 deletions tests/test_shorah_snv.py
@@ -1,5 +1,5 @@
import pytest
from shorah.shorah_snv import _compare_ref_to_read, SNP_id, SNV
from viloca.shorah_snv import _compare_ref_to_read, SNP_id, SNV


@pytest.mark.parametrize("ref, seq, spec", [
@@ -39,4 +39,4 @@ def test_compare_ref_to_read(ref, seq, spec):

assert snp == spec

assert tot_snv == len(snp)
assert tot_snv == len(snp)
2 changes: 1 addition & 1 deletion tests/test_tiling.py
@@ -1,4 +1,4 @@
from shorah import tiling
from viloca import tiling
import pytest

def test_equispaced():
File renamed without changes.
2 changes: 1 addition & 1 deletion shorah/__main__.py → viloca/__main__.py
@@ -8,7 +8,7 @@
- https://docs.python.org/2/using/cmdline.html#cmdoption-m
- https://docs.python.org/3/using/cmdline.html#cmdoption-m
"""
from shorah.cli import main
from viloca.cli import main

if __name__ == "__main__":
    main()