v1.4.0 update (#32)
* bump v1.4.0

* bump v1.4.0

* Update nv_te_analyzer.py

* Update nanovar

* Update nv_report.py

* Update nv_characterize.py

* Revert "Update nv_report.py"

This reverts commit eb494aa.

* Update nanovar

Add pysam.faidx for temp1.fa

* Revert "Revert "Update nv_report.py""

This reverts commit 1445a1b.

* Revert "Update nv_characterize.py"

This reverts commit 762d408.

* Update nv_te_analyzer.py

* Remove print debug

* Edit Travis

* Update basic capabilities

* Update quickrun

* update requirements.txt

* update report
cytham authored Sep 8, 2021
1 parent 7a3ca7a commit b6bfe6a
Showing 28 changed files with 2,068 additions and 1,698 deletions.
16 changes: 15 additions & 1 deletion CHANGELOG.txt
@@ -4,9 +4,23 @@ NanoVar Changelog
Release Summary:


Version 1.4.0 - Sept 1, 2021
* Implemented a large cytogenetic variation detection algorithm through CytoCAD (Add the parameter "--cnv hg38" during run)
* Added LINE (L1) and SINE (Alu) novel insertion detection functionality (NanoVar screens the sequence of INS SVs
  for L1 and Alu elements and outputs the results in the INFO column of the VCF file, e.g. TE=L1HS)
* Updated curated hg38 filter file (added all N regions)
* Expanded the CIGAR reading values to include '=' and 'X'
* Improved breakpoint clustering algorithm and rectified bugs
* Modified setup.py to state compatibility with python3.8
* Fixed Numpy VisibleDeprecationWarning in nv_report.py
* Added '--pickle' argument for debugging purposes (Hidden option)
* Added '--archivefasta' argument for debugging purposes (Hidden option)
* Added '--blastout' argument for debugging purposes (Hidden option)


Version 1.3.9 - Mar 24, 2021
* Fixed nv_detect_algo insertion and deletion large size bug
* Added pysam >=0.15.4 into bioconda metal.yml as prerequisite
* Added pysam >=0.15.3 into bioconda metal.yml as prerequisite
* Added pybedtools >=0.8.2 prerequisite to fix RuntimeWarning buffering=1 error (Refer to https://github.com/daler/pybedtools/issues/322)
* Prevent repeated read-indexes by adjusting seed (Thanks to Geoffrey Woodland)
* Improve read cluster exception message (Thanks to Geoffrey Woodland)
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -4,3 +4,4 @@ recursive-include nanovar/gaps *.bed
recursive-include nanovar/model *.h5
recursive-include nanovar/css *.css
recursive-include nanovar/js *.js
recursive-include nanovar/ref *L1*
13 changes: 9 additions & 4 deletions README.md
@@ -5,7 +5,7 @@
<br/><br/>

## NanoVar - Structural variant caller using low-depth long-read sequencing
[![Build Status](https://travis-ci.org/cytham/nanovar.svg?branch=master)](https://travis-ci.com/cytham/nanovar)
[![Build Status](https://app.travis-ci.com/cytham/nanovar.svg?branch=master)](https://app.travis-ci.com/github/cytham/nanovar)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/nanovar)](https://pypi.org/project/nanovar/)
[![PyPI versions](https://img.shields.io/pypi/v/nanovar)](https://pypi.org/project/nanovar/)
[![Conda](https://img.shields.io/conda/v/bioconda/nanovar)](https://anaconda.org/bioconda/nanovar)
@@ -26,23 +26,28 @@ NanoVar is a genomic structural variant (SV) caller that utilizes low-depth long
* Requires 4x and 8x sequencing depth for detecting homozygous and heterozygous SVs respectively.
* Rapid computational speed (Takes <3 hours to map and analyze 12 gigabases datasets (4x) using 24 CPU threads)
* Approximates SV genotype
* Detect large chromosomal copy-number variation using [CytoCAD](https://github.com/cytham/cytocad)
* Identifies full-length LINE and SINE insertions (Marked by "TE=" in the INFO column of VCF file)
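
As a rough illustration of the "TE=" marker mentioned above, the following Python sketch pulls the TE value out of the INFO column of VCF records. The function names and the sample record are hypothetical and not part of NanoVar; only the `TE=` key and the `L1HS` example value come from the changelog.

```python
def te_annotation(info):
    """Return the value of the TE key in a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        if key == "TE" and sep:
            return value
    return None


def te_insertions(vcf_lines):
    """Yield (chrom, pos, te) for records whose INFO column carries a TE tag."""
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:  # INFO is the 8th column of a VCF record
            continue
        te = te_annotation(cols[7])
        if te is not None:
            yield cols[0], int(cols[1]), te
```

Passing an open VCF file handle as `vcf_lines` would stream the records without loading the whole file.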

## Getting Started

### Quick run

```
nanovar [Options] -t 24 -f hg38 sample.fq/sample.bam ref.fa working_dir
nanovar [Options] -t 24 -f hg38 --cnv hg38 sample.fq/sample.bam ref.fa working_dir
```

| Parameter | Argument | Comment |
| :--- | :--- | :--- |
| `-t` | num_threads | Indicate number of CPU threads to use |
| `-f` (Optional) | gap_file (Optional) | Choose built-in gap BED file or specify own file to exclude gap regions in the reference genome. Built-in gap files include: hg19, hg38 and mm10|
| `--cnv` | hg38 | Perform large CNV detection using CytoCAD (Only works for hg38 genome) |
| - | sample.fq/sample.bam | Input long-read FASTA/FASTQ file or mapped BAM file |
| - | ref.fa | Input reference genome in FASTA format |
| - | working_dir | Specify working directory |

See [wiki](https://github.com/cytham/nanovar/wiki) for entire list of options.
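
For scripted runs, the quick-run command can be assembled in Python. This is a minimal sketch that uses only the parameters listed in the table above; the helper name `build_nanovar_command` and its defaults are assumptions for illustration, not part of NanoVar.

```python
def build_nanovar_command(reads, reference, workdir, threads=24, gap_file=None, cnv=None):
    """Build the nanovar argument list from the quick-run parameters."""
    if cnv is not None and cnv != "hg38":
        # The README states that --cnv only works for the hg38 genome.
        raise ValueError("--cnv currently only supports 'hg38'")
    cmd = ["nanovar", "-t", str(threads)]
    if gap_file:
        cmd += ["-f", gap_file]   # built-in name or path to a BED file
    if cnv:
        cmd += ["--cnv", cnv]
    cmd += [reads, reference, workdir]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; building a list rather than a shell string avoids quoting problems with file paths.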

#### Output
| Output file | Comment |
| :--- | :--- |
@@ -61,7 +66,7 @@ There are three ways to install NanoVar:
# Installing from bioconda automatically installs all dependencies
conda install -c bioconda nanovar
```
#### Option 2: Pip (See dependencies below)
#### Option 2: PyPI (See dependencies below)
```
# Installing from PyPI requires own installation of dependencies, see below
pip install nanovar
@@ -145,6 +150,6 @@ Although NanoVar is provided with a universal model and threshold score, instruc

* For BND SVs, NanoVar is unable to calculate the actual number of SV-opposing reads (normal reads) at the novel adjacency as
there are two breakends from distant locations. It is not clear whether the novel adjacency is derived from both or either
breakends, in cases of balanced and unbalanced variants, and therefore its not possible to know which breakend location(s) to
breakends, in cases of balanced and unbalanced variants, and therefore it is not possible to know which breakend location(s) to
consider for counting normal reads. Currently, NanoVar approximates the normal read count by the minimum count from either
breakend location. Although this helps in capturing unbalanced BNDs, it might lead to some false positives.
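
The approximation described in this limitation amounts to taking the smaller of the two per-breakend counts. The sketch below only restates that rule; the function name and arguments are hypothetical and do not reflect NanoVar's internal code.

```python
def approximate_bnd_normal_reads(normal_at_breakend1, normal_at_breakend2):
    """Approximate the SV-opposing (normal) read count for a BND call.

    Each argument is the number of normal reads spanning one breakend
    location; the approximation keeps the smaller of the two.
    """
    if normal_at_breakend1 < 0 or normal_at_breakend2 < 0:
        raise ValueError("read counts must be non-negative")
    return min(normal_at_breakend1, normal_at_breakend2)
```

Taking the minimum keeps the normal-read estimate low when only one breakend is covered by intact reads, which is why unbalanced BNDs are still captured, at the cost of the false positives noted above.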
70 changes: 0 additions & 70 deletions nanovar/gaps/hg19_filter.bed

This file was deleted.

238 changes: 238 additions & 0 deletions nanovar/gaps/hg38_curated_filter_main.bed
@@ -0,0 +1,238 @@
chr1 0 10400 10000 telomere
chr1 207266 258066 50000 Nregion
chr1 297568 348368 50000 Nregion
chr1 535588 586388 50000 Nregion
chr1 2702381 2746690 43509 Nregion
chr1 12953984 13004784 50000 Nregion
chr1 16798763 16849563 50000 Nregion
chr1 29551833 29554235 1602 Nregion
chr1 121619600 143200400 21580000 centromere,Nregion
chr1 223558535 223609335 50000 Nregion
chr1 228557964 228608764 50000 Nregion
chr1 248946022 248956822 10000 telomere
chr10 0 10400 10000 telomere
chr10 38499600 42217400 3717000 centromere,Nregion
chr10 47779968 47870768 90000 Nregion
chr10 133690066 133740866 50000 Nregion
chr10 133787022 133797822 10000 telomere
chr11 0 60400 60000 telomere,Nregion
chr11 50819600 54550400 3730000 centromere,Nregion
chr11 70955296 71056096 100000 Nregion
chr11 87977802 88003296 24694 Nregion
chr11 135076222 135087022 10000 telomere
chr12 0 10400 10000 telomere
chr12 7083250 7085050 1000 Nregion
chr12 34718600 37260400 2541000 centromere,Nregion
chr12 132222962 132224762 1000 Nregion
chr12 133264909 133275709 10000 telomere
chr13 0 18410400 18410000 telomere,centromere,Nregion
chr13 86202579 86253379 50000 Nregion
chr13 111703455 111754255 50000 Nregion
chr13 111793041 111843841 50000 Nregion
chr13 113672620 113723420 50000 Nregion
chr13 114353928 114364728 10000 telomere
chr14 0 19700400 19700000 telomere,centromere,Nregion
chr14 106883318 107044118 160000 telomere,Nregion
chr15 0 22400400 22400000 telomere,centromere,Nregion
chr15 23226474 23277274 50000 Nregion
chr15 84269666 84320466 50000 Nregion
chr15 101980789 101991589 10000 telomere
chr16 0 10400 10000 telomere
chr16 18436086 18486886 50000 Nregion
chr16 33214195 33264995 50000 Nregion
chr16 33392011 33442811 50000 Nregion
chr16 34288929 34339729 50000 Nregion
chr16 34521110 34571910 50000 Nregion
chr16 34576405 34581365 4160 Nregion
chr16 36259600 46420400 10160000 centromere,Nregion
chr16 90227945 90338745 110000 telomere,Nregion
chr17 0 60400 60000 telomere,Nregion
chr17 447788 489387 40799 Nregion
chr17 21659600 26940400 5280000 centromere,Nregion
chr17 81742142 81792942 50000 Nregion
chr17 81795881 81798127 1446 Nregion
chr17 83247041 83257841 10000 telomere
chr18 0 10400 10000 telomere
chr18 15409600 20940400 5530000 centromere,Nregion
chr18 46969512 47020312 50000 Nregion
chr18 80262885 80373685 110000 telomere,Nregion
chr19 0 60400 60000 telomere,Nregion
chr19 24439600 27245400 2805000 centromere,Nregion
chr19 58607216 58618016 10000 telomere
chr2 0 10400 10000 telomere
chr2 16144719 16146519 1000 Nregion
chr2 32866730 32868530 1000 Nregion
chr2 32916225 32918025 1000 Nregion
chr2 89259600 94510400 5250000 centromere,Nregion
chr2 97439218 97490018 50000 Nregion
chr2 242183129 242193929 10000 telomere
chr20 0 60400 60000 telomere,Nregion
chr20 26363840 26365814 1174 Nregion
chr20 26379600 31170400 4790000 centromere,Nregion
chr20 64333767 64444567 110000 telomere,Nregion
chr21 0 5010400 5010000 telomere,Nregion
chr21 5165846 5216646 50000 Nregion
chr21 5393158 5443958 50000 Nregion
chr21 5448612 5499412 50000 Nregion
chr21 5627196 5677996 50000 Nregion
chr21 5795609 5846409 50000 Nregion
chr21 5916193 5966993 50000 Nregion
chr21 6160971 6211771 50000 Nregion
chr21 6376858 6427658 50000 Nregion
chr21 6579781 6630581 50000 Nregion
chr21 6738685 6789485 50000 Nregion
chr21 6933819 6984619 50000 Nregion
chr21 7149127 7199927 50000 Nregion
chr21 7327465 7378265 50000 Nregion
chr21 7500490 7551290 50000 Nregion
chr21 7693300 7744100 50000 Nregion
chr21 7865346 7916146 50000 Nregion
chr21 8049439 8100239 50000 Nregion
chr21 8260571 8311371 50000 Nregion
chr21 8471960 8522760 50000 Nregion
chr21 8706315 8757115 50000 Nregion
chr21 8886204 8987004 100000 Nregion
chr21 9195687 9246487 50000 Nregion
chr21 9376743 9527543 150000 Nregion
chr21 10169468 10270268 100000 Nregion
chr21 10273927 10324727 50000 Nregion
chr21 10809600 12967400 2157000 centromere,Nregion
chr21 43212062 43262862 50000 Nregion
chr21 46699583 46710383 10000 telomere
chr22 0 10510400 10510000 telomere,Nregion
chr22 10784243 10835043 50000 Nregion
chr22 10874172 10924972 50000 Nregion
chr22 10966324 11017124 50000 Nregion
chr22 11068587 11119387 50000 Nregion
chr22 11160521 11211321 50000 Nregion
chr22 11377656 11428456 50000 Nregion
chr22 11496937 11547737 50000 Nregion
chr22 11630888 11681688 50000 Nregion
chr22 11724229 11775029 50000 Nregion
chr22 11977155 12027955 50000 Nregion
chr22 12225188 12275988 50000 Nregion
chr22 12438290 12489090 50000 Nregion
chr22 12641330 12692130 50000 Nregion
chr22 12725804 12776604 50000 Nregion
chr22 12817737 12868537 50000 Nregion
chr22 12899600 15300400 2400000 centromere,Nregion
chr22 16279272 16303243 23171 Nregion
chr22 16303896 16305827 1131 Nregion
chr22 18238729 18339529 100000 Nregion
chr22 18433113 18483913 50000 Nregion
chr22 18659164 18709964 50000 Nregion
chr22 49973465 49975765 1500 Nregion
chr22 50808068 50818868 10000 telomere
chr3 0 10400 10000 telomere
chr3 90564895 90569228 3533 Nregion
chr3 90722058 90772858 50000 Nregion
chr3 91249505 91256821 6516 Nregion
chr3 91257490 91260580 2290 Nregion
chr3 91264981 91277394 11613 Nregion
chr3 91549600 93706400 2156000 centromere,Nregion
chr3 198235159 198295959 60000 telomere,Nregion
chr4 0 10400 10000 telomere
chr4 1428958 1434606 4848 Nregion
chr4 1435394 1441952 5758 Nregion
chr4 8797077 8816877 19000 Nregion
chr4 9272516 9323316 50000 Nregion
chr4 31818895 31832969 13274 Nregion
chr4 32832616 32839416 6000 Nregion
chr4 49079600 51810400 2730000 centromere,Nregion
chr4 58878393 58921781 42588 Nregion
chr4 190122721 190173521 50000 Nregion
chr4 190204155 190214955 10000 telomere
chr5 0 10400 10000 telomere
chr5 17530148 17580948 50000 Nregion
chr5 46429600 50120400 3690000 centromere,Nregion
chr5 139452259 139454059 1000 Nregion
chr5 155759924 155761724 1000 Nregion
chr5 181477859 181538659 60000 telomere,Nregion
chr6 0 60400 60000 telomere,Nregion
chr6 58449600 60230400 1780000 centromere,Nregion
chr6 61356629 61363466 6037 Nregion
chr6 95020390 95071190 50000 Nregion
chr6 167590993 167641793 50000 Nregion
chr6 170745579 170806379 60000 telomere,Nregion
chr7 0 10400 10000 telomere
chr7 237446 240642 2396 Nregion
chr7 58099600 60900400 2800000 centromere,Nregion
chr7 61327388 61378188 50000 Nregion
chr7 61527620 61578420 50000 Nregion
chr7 61963769 61967463 2894 Nregion
chr7 61975704 62026504 50000 Nregion
chr7 62456379 62507179 50000 Nregion
chr7 143650404 143701204 50000 Nregion
chr7 159335573 159346373 10000 telomere
chr8 0 60400 60000 telomere,Nregion
chr8 7616727 7667527 50000 Nregion
chr8 12233945 12284745 50000 Nregion
chr8 43979600 45950400 1970000 centromere,Nregion
chr8 85663822 85714622 50000 Nregion
chr8 145078236 145139036 60000 telomere,Nregion
chr9 0 10400 10000 telomere
chr9 41225586 41229778 3392 Nregion
chr9 43219600 60520400 17300000 centromere,Nregion
chr9 60688032 60738832 50000 Nregion
chr9 60779121 60829921 50000 Nregion
chr9 61003487 61054287 50000 Nregion
chr9 61231566 61282366 50000 Nregion
chr9 61468408 61519208 50000 Nregion
chr9 61734968 61785768 50000 Nregion
chr9 62149338 62250138 100000 Nregion
chr9 62748432 62799232 50000 Nregion
chr9 62957971 63008771 50000 Nregion
chr9 63202462 63253262 50000 Nregion
chr9 63491864 63542664 50000 Nregion
chr9 63918047 63968847 50000 Nregion
chr9 64134613 64185413 50000 Nregion
chr9 64214762 64315562 100000 Nregion
chr9 64997724 65048524 50000 Nregion
chr9 65079682 65130482 50000 Nregion
chr9 65324723 65375523 50000 Nregion
chr9 65594791 65645591 50000 Nregion
chr9 66390987 66591787 200000 Nregion
chr9 67920152 68220952 300000 Nregion
chr9 134182692 134185936 2444 Nregion
chr9 138334317 138395117 60000 telomere,Nregion
chrX 0 10400 10000 telomere
chrX 44421 95221 50000 Nregion
chrX 133471 222746 88475 Nregion
chrX 1948945 2133394 183649 Nregion
chrX 37098862 37286237 186575 Nregion
chrX 49347994 49528794 180000 Nregion
chrX 50228564 50279364 50000 Nregion
chrX 58539600 62500400 3960000 centromere,Nregion
chrX 114280798 114331598 50000 Nregion
chrX 115738549 115839349 100000 Nregion
chrX 116557379 116595966 37787 Nregion
chrX 120878981 120929781 50000 Nregion
chrX 144425206 144476006 50000 Nregion
chrX 156030495 156041295 10000 telomere
chrY 0 10400 10000 telomere
chrY 44421 95221 50000 Nregion
chrY 133471 222746 88475 Nregion
chrY 1948945 2133394 183649 Nregion
chrY 9046514 9055574 8260 Nregion
chrY 9057208 9108008 50000 Nregion
chrY 9113919 9116771 2052 Nregion
chrY 9403313 9454113 50000 Nregion
chrY 10266544 10600400 333056 centromere,Nregion
chrY 10633040 10646233 12393 Nregion
chrY 10649589 10651821 1432 Nregion
chrY 10673658 10676944 2486 Nregion
chrY 10679315 10682842 2727 Nregion
chrY 10693792 10744592 50000 Nregion
chrY 10852500 10855535 2235 Nregion
chrY 10890019 10892342 1523 Nregion
chrY 10896125 10898584 1659 Nregion
chrY 10922086 10923964 1078 Nregion
chrY 10965294 10967684 1590 Nregion
chrY 11592502 11643302 50000 Nregion
chrY 11659974 11662581 1807 Nregion
chrY 20207393 20258193 50000 Nregion
chrY 21739142 21741841 1899 Nregion
chrY 21788881 21805681 16000 Nregion
chrY 26672814 56673614 30000000 Nregion
chrY 56771109 56821909 50000 Nregion
chrY 57217015 57227815 10000 telomere
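
To illustrate how a filter file like this one could be consumed, here is a minimal Python sketch that parses the five columns and reports which regions overlap a query interval. Reading the columns as chromosome, start, end, size and label is an assumption based on the rows above, and the names `GapRegion`, `parse_filter_bed` and `overlapping_regions` are hypothetical; how NanoVar itself applies the file is not shown in this commit.

```python
from collections import defaultdict, namedtuple

GapRegion = namedtuple("GapRegion", "chrom start end size label")


def parse_filter_bed(lines):
    """Group filter regions by chromosome from whitespace-separated BED lines."""
    regions = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        chrom, start, end, size, label = fields[:5]
        # The fourth column is stored as given and not recomputed from start/end.
        regions[chrom].append(GapRegion(chrom, int(start), int(end), int(size), label))
    return regions


def overlapping_regions(regions, chrom, start, end):
    """Return regions on chrom overlapping the half-open interval [start, end)."""
    return [r for r in regions.get(chrom, []) if r.start < end and start < r.end]
```

A linear scan per query is enough for a file of 238 rows; sorting each chromosome's regions and bisecting would only matter for much larger filter sets.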