Update command line parameters #32

Merged Nov 27, 2023; 25 commits:
76cfd15
working multiprocessing
LaraFuhrmann Sep 6, 2023
b2f8e52
counter and control_window_length are now computed in advance of _run…
LaraFuhrmann Sep 8, 2023
17bfca7
[test] test_run_one_window had to be adapted as it has now also the c…
LaraFuhrmann Sep 8, 2023
b8b2450
[test] test_run_one_window had to be adapted as it has now also the c…
LaraFuhrmann Sep 8, 2023
9005626
[test] correct expected output
LaraFuhrmann Sep 8, 2023
d863bb5
[test] computation control_window_length
LaraFuhrmann Sep 8, 2023
9b5c349
fixed: problem with hash not being deterministic which caused problem…
LaraFuhrmann Nov 6, 2023
5214a89
clean up
LaraFuhrmann Nov 6, 2023
a96928f
correct indent
LaraFuhrmann Nov 6, 2023
b344ae1
random hash to deterministic hash
LaraFuhrmann Nov 6, 2023
6c2f9e5
update multiprocessing modules that are needed
LaraFuhrmann Nov 6, 2023
e6c5e52
fix: computation control window length
LaraFuhrmann Nov 9, 2023
44b66e4
delete empty log file
LaraFuhrmann Nov 22, 2023
1ad8951
delete empty log file
LaraFuhrmann Nov 22, 2023
caeded9
shotgun command to run command
LaraFuhrmann Nov 22, 2023
f15d199
update log file naming
LaraFuhrmann Nov 22, 2023
211461c
[README] add channels to conda enviroment installation]
LaraFuhrmann Nov 23, 2023
956af67
updated readme with new commands
LaraFuhrmann Nov 23, 2023
f9e7cb4
update imported packages to viloca, and other naming
LaraFuhrmann Nov 23, 2023
9be8f5b
update viloca naming here
LaraFuhrmann Nov 23, 2023
4fe78c8
update viloca naming here
LaraFuhrmann Nov 23, 2023
1d4185f
[unit tests] update commands
LaraFuhrmann Nov 23, 2023
a7ae559
update author
LaraFuhrmann Nov 23, 2023
da27b76
build is needed for libshorah
LaraFuhrmann Nov 23, 2023
98a8e98
updated readme
LaraFuhrmann Nov 27, 2023
20 changes: 10 additions & 10 deletions README.md
@@ -1,9 +1,5 @@
 VILOCA: VIral LOcal haplotype reconstruction and mutation CAlling for short and long read data
 ===============
-[![Build Status](https://travis-ci.org/cbg-ethz/shorah.svg?branch=master)](https://travis-ci.org/cbg-ethz/shorah)
-[![Bioconda package](https://img.shields.io/conda/dn/bioconda/shorah.svg?label=Bioconda)](https://bioconda.github.io/recipes/shorah/README.html)
-[![Docker container](https://quay.io/repository/biocontainers/shorah/status)](https://quay.io/repository/biocontainers/shorah)
-

 VILOCA is an open source project for the analysis of next generation sequencing
 data. It is designed to analyse genetically heterogeneous samples. Its tools
@@ -17,22 +13,26 @@ genetic variants present in a mixed sample.
 For installation miniconda is recommended: https://docs.conda.io/en/latest/miniconda.html.
 We recommend to install VILOCA in a clean conda environment:
 ```
-conda create --name env_viloca libshorah
+conda create --name env_viloca --channel conda-forge --channel bioconda libshorah
 conda activate env_viloca
-pip install git+https://github.com/LaraFuhrmann/VILOCA@master
+pip install git+https://github.com/cbg-ethz/VILOCA@master
 ```

 ### Example
-To test your installation, we recommend running the program on `tests/data_1`.
+To test your installation run VILOCA `tests/data_1`:
+```
+viloca run -b test_aln.cram -f test_ref.fasta -z scheme.insert.bed --mode use_quality_scores
+```
+

 If the sequencing amplicon strategy is known, we recommend using the amplicon-mode of the program, which takes as input the `<smth>.insert.bed` - file:
-`shorah shotgun -b test_aln.cram -f test_ref.fasta -z scheme.insert.bed --mode use_quality_scores`
+`viloca run -b test_aln.cram -f test_ref.fasta -z scheme.insert.bed --mode use_quality_scores`

 If the sequencing quality scores are not trustable, the sequencing error parameters can also be learned:
-`shorah shotgun -b test_aln.cram -f test_ref.fasta -z scheme.insert.bed --mode learn_error_params`.
+`viloca run -b test_aln.cram -f test_ref.fasta -z scheme.insert.bed --mode learn_error_params`.

 If there is no information on the sequencing amplicon strategy available, run:
-`shorah shotgun -b test_aln.cram -f test_ref.fasta --mode use_quality_scores`
+`viloca run -b test_aln.cram -f test_ref.fasta --mode use_quality_scores`

 ### Parameters
 There are several parameters available:
631 changes: 480 additions & 151 deletions poetry.lock

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions pyproject.toml
@@ -1,16 +1,16 @@
 [tool.poetry]
-name = "ShoRAH"
+name = "VILOCA"
 version = "0.1.0"
 description = "SHOrt Reads Assembly into Haplotypes"
 license = "GPL-3.0-only"
-authors = ["Benjamin Langer <[email protected]>"]
+authors = ["Benjamin Langer <[email protected]>, Lara Fuhrmann <[email protected]>"]
 build = "build.py"
 packages = [
-    { include = "shorah" }
+    { include = "viloca" }
 ]

 [tool.poetry.scripts]
-shorah = 'shorah.cli:main'
+viloca = 'viloca.cli:main'

 [tool.poetry.dependencies]
 python = ">=3.9.9,<3.11"
2 changes: 1 addition & 1 deletion tests/data_1/shotgun_test.sh
@@ -1,4 +1,4 @@
 #!/bin/bash

-shorah shotgun -a 0.1 -w 201 -x 100000 -p 0.9 -c 0 \
+viloca run -a 0.1 -w 201 -x 100000 -p 0.9 -c 0 \
 -r HXB2:2469-3713 -R 42 -f test_ref.fasta -b test_aln.cram --out_format csv "$@"
2 changes: 1 addition & 1 deletion tests/data_5/shotgun_prepare.sh
@@ -1,3 +1,3 @@
 #!/bin/bash

-shorah shotgun -a 0.1 -w 42 -x 100000 -p 0.9 -c 0 -r REF:43-273 -R 42 -b test_aln.cram -f ref.fasta
+viloca run -a 0.1 -w 42 -x 100000 -p 0.9 -c 0 -r REF:43-273 -R 42 -b test_aln.cram -f ref.fasta
2 changes: 1 addition & 1 deletion tests/test_b2w.py
@@ -2,7 +2,7 @@
 import filecmp
 import os
 import glob
-from shorah import b2w, tiling
+from viloca import b2w, tiling
 import math
 import libshorah

24 changes: 18 additions & 6 deletions tests/test_b2w_mapping.py
@@ -1,7 +1,8 @@
 from array import array
 import pytest
 from cigar import Cigar
-from shorah import b2w
+from viloca import b2w
+import hashlib

 class MockAlignedSegment:
     def __init__(self, query_name: str, reference_start: int, query_sequence: str, cigarstring: str):
@@ -35,10 +36,10 @@ def add_indels(self, indels_map):
         cnt = self.reference_start
         for i in self.cigartuples:
             if i[0] == 1: # insert TODO Justify -1
-                indels_map.append((self.query_name, self.reference_start, hash(self.cigarstring), cnt-1, i[1], 0)) # cnt-1
+                indels_map.append((self.query_name, self.reference_start, hashlib.sha1(self.cigarstring.encode()).hexdigest(), cnt-1, i[1], 0)) # cnt-1
             elif i[0] == 2: # del
                 for k in range(i[1]):
-                    indels_map.append((self.query_name, self.reference_start, hash(self.cigarstring), cnt+k, 0, 1))
+                    indels_map.append((self.query_name, self.reference_start, hashlib.sha1(self.cigarstring.encode()).hexdigest(), cnt+k, 0, 1))
                 cnt += i[1]
             else:
                 cnt += i[1]
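
Note on the `hash(...)` to `hashlib.sha1(...)` change in the hunk above: Python's built-in `hash()` for `str` is salted per interpreter process unless `PYTHONHASHSEED` is fixed, so its value can differ between runs and between separately started processes. A cryptographic digest depends only on the bytes of the input. The following is a minimal sketch of that difference, not code from this PR; the helper names `stable_id` and `builtin_hash_in_new_interpreter` and the example CIGAR string are made up for illustration.

```python
import hashlib
import os
import subprocess
import sys


def stable_id(text: str) -> str:
    """Hex SHA-1 digest of the text: identical in every process and on every run."""
    return hashlib.sha1(text.encode()).hexdigest()


def builtin_hash_in_new_interpreter(text: str, seed: str) -> int:
    """Evaluate hash(text) in a fresh interpreter started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        capture_output=True, text=True, check=True, env=env,
    )
    return int(out.stdout)


cigar = "3M1I5M2D4M"  # hypothetical CIGAR string, for illustration only

# Two interpreters with different hash seeds generally disagree on hash(str) ...
salted_a = builtin_hash_in_new_interpreter(cigar, "1")
salted_b = builtin_hash_in_new_interpreter(cigar, "2")
print("built-in hash agrees across seeds:", salted_a == salted_b)

# ... while the SHA-1 digest depends only on the bytes of the string.
print("sha1 digest:", stable_id(cigar))
```

The digest is used here only as a stable identifier, not for security, so SHA-1 is adequate for that purpose.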
@@ -326,11 +327,22 @@ def test_run_one_window(mArr, spec, window_length, window_start, extended_window
     mock_dict = mocker.MagicMock()
     mock_dict.__getitem__.return_value = 42

-    arr, _, _, _, _, _ = b2w._run_one_window(
+    # added by Lara
+    original_window_length = window_length
+    control_window_length = window_length
+
+    if extended_window_mode:
+        for pos, val in max_indel_at_pos.items():
+            if window_start <= pos < window_start + original_window_length:
+                control_window_length += val
+
+
+    arr, _, _, _, = b2w._run_one_window(
         mock_samfile,
-        window_start,
+        window_start, # 0 based
         "HXB2-does-not-matter",
-        window_length,
+        control_window_length,
         0,
         mock_dict,
         0,
@@ -343,4 +355,4 @@ def test_run_one_window(mArr, spec, window_length, window_start, extended_window
     print(arr)

     for idx, el in enumerate(arr):
-        assert el.split("\n")[1] == spec[idx]
+        assert el.split("\n")[1] == spec[idx]
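
Note on the `control_window_length` setup added to `test_run_one_window` above: the test now widens the window length by the value stored in `max_indel_at_pos` for every position that falls inside the original window, and only when `extended_window_mode` is set. The sketch below restates that logic as a standalone function. The name `compute_control_window_length` is hypothetical and is not shown to exist in `viloca.b2w` by this diff, and reading `max_indel_at_pos` as a map from reference position to largest insertion length is an assumption based on its name.

```python
from typing import Mapping


def compute_control_window_length(
    window_start: int,
    window_length: int,
    max_indel_at_pos: Mapping[int, int],
    extended_window_mode: bool,
) -> int:
    """Hypothetical helper mirroring the setup added to test_run_one_window."""
    if not extended_window_mode:
        return window_length
    # Sum the per-position values for positions in [window_start, window_start + window_length).
    extra = sum(
        val
        for pos, val in max_indel_at_pos.items()
        if window_start <= pos < window_start + window_length
    )
    return window_length + extra


# Made-up numbers: positions 10 and 14 lie inside the window [10, 15); 9 and 15 do not.
max_indel_at_pos = {9: 2, 10: 1, 14: 3, 15: 4}
print(compute_control_window_length(10, 5, max_indel_at_pos, True))   # 9
print(compute_control_window_length(10, 5, max_indel_at_pos, False))  # 5
```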
2 changes: 1 addition & 1 deletion tests/test_envp_post.py
@@ -1,5 +1,5 @@
 from unittest.mock import patch, mock_open
-from shorah import envp_post
+from viloca import envp_post

 DEFAULT_MOCK_DATA = "default mock data"

4 changes: 2 additions & 2 deletions tests/test_pooled_post.py
@@ -1,5 +1,5 @@
 from unittest.mock import patch, mock_open
-from shorah import pooled_post
+from viloca import pooled_post
 import numpy as np

 DEFAULT_MOCK_DATA = "default mock data"
@@ -48,4 +48,4 @@ def open_side_effect(name):
 # open("debug/w-HXB2-2938-3138.dbg"),
 # open("support/w-HXB2-2938-3138.reads-support.fas"),
 # open("corrected/w-HXB2-2938-3138.reads-cor.fas"),
-# "shorah") # TODO
+# "shorah") # TODO
4 changes: 2 additions & 2 deletions tests/test_pooled_pre.py
@@ -1,6 +1,6 @@
 import pysam
 import os
-from shorah import pooled_pre
+from viloca import pooled_pre

 def test__annotate_alignment_file():
     out = "out.bam"
@@ -48,4 +48,4 @@ def test_pre_process_pooled():
     os.remove(out + ".bai")

     assert a[0] == a[1] != 0
-    assert a[2] == 0
+    assert a[2] == 0
4 changes: 2 additions & 2 deletions tests/test_shorah_snv.py
@@ -1,5 +1,5 @@
 import pytest
-from shorah.shorah_snv import _compare_ref_to_read, SNP_id, SNV
+from viloca.shorah_snv import _compare_ref_to_read, SNP_id, SNV


 @pytest.mark.parametrize("ref, seq, spec", [
@@ -39,4 +39,4 @@ def test_compare_ref_to_read(ref, seq, spec):

     assert snp == spec

-    assert tot_snv == len(snp)
+    assert tot_snv == len(snp)
2 changes: 1 addition & 1 deletion tests/test_tiling.py
@@ -1,4 +1,4 @@
-from shorah import tiling
+from viloca import tiling
 import pytest

 def test_equispaced():
File renamed without changes.
2 changes: 1 addition & 1 deletion shorah/__main__.py → viloca/__main__.py
@@ -8,7 +8,7 @@
 - https://docs.python.org/2/using/cmdline.html#cmdoption-m
 - https://docs.python.org/3/using/cmdline.html#cmdoption-m
 """
-from shorah.cli import main
+from viloca.cli import main

 if __name__ == "__main__":
     main()