Releases: Gabaldonlab/jloh
v0.20.0
Log of changes:
- Changed --hybrid to --assign-blocks in jloh extract to facilitate understanding of the parameter's function.
- Fixed issue in stats when having zero homozygous SNPs, which would assign everything to "homo".
- Fixed issue in stats which would return "nan" for quantiles when having no homozygous SNP.
- Fixed issue in stats which was returning 0 SNPs as a threshold; the minimum is now set to 1.
- Minor changes in the stderr readout of jloh extract, increased readability.
- Changed behavior of pybedtools in terms of tmp folder. Now associated with an alphanumeric string to avoid overwriting of tmp files when working in the same tmpdir with two jloh instances (shell issue).
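
The tmp folder change can be illustrated with a minimal sketch. This is not the jloh source code: the function name make_unique_tmpdir, the prefix, and the tag length are all assumptions made for illustration. The idea is only that each instance creates its own subfolder, tagged with a random alphanumeric string, inside the shared tmpdir.

```python
import os
import secrets
import string


def make_unique_tmpdir(base_dir, prefix="tmp_", length=8):
    """Create and return a subfolder of base_dir with a random alphanumeric tag.

    Hypothetical helper: name, prefix and tag length are illustrative only.
    """
    alphabet = string.ascii_letters + string.digits
    while True:
        tag = "".join(secrets.choice(alphabet) for _ in range(length))
        path = os.path.join(base_dir, prefix + tag)
        try:
            # os.makedirs raises FileExistsError if another instance
            # already owns this name, so we simply draw a new tag.
            os.makedirs(path)
            return path
        except FileExistsError:
            continue
```

A folder created this way could then be handed to pybedtools (for example through pybedtools.set_tempdir) so that two instances sharing one tmpdir never write to the same files.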
v0.19.0
Log of changes:
- Updated g2g module. Now it presents output as BED file, and filters it by minimum length of g2g block.
- Changed algorithm of the sim module. The simulation now more closely resembles a real genome. The genome is broken into haplotypes of variable size, sampled from a distribution centered around a mean that the user can pass as a parameter (--mean-haplotype-size). Each haplotype is assigned a divergence rate. The divergence is variable and, like the haplotype size, is sampled from a distribution centered around the value declared in --divergence. Then, based on the fraction of --loh declared by the user, some of these haplotypes are assigned divergence = 0. The algorithm then adjusts the divergence rate of the non-LOH haplotypes to bring the total average back to --divergence. Finally, random mutations are introduced in each haplotype based on its assigned divergence rate.
- Added GPL3.0 licence
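
The sim algorithm described above can be sketched roughly as follows. This is not the jloh sim implementation. It assumes normal distributions with a standard deviation of 25% of the mean, treats --loh as a fraction of the haplotype count, and rescales using a length-weighted average. The function names plan_haplotypes and mutate are hypothetical.

```python
import random


def plan_haplotypes(genome_length, mean_size, divergence, loh_fraction, seed=0):
    """Return a list of (start, end, divergence) haplotypes covering the genome."""
    rng = random.Random(seed)

    # 1. Break the genome into haplotypes of variable size
    #    (assumption: normal distribution around mean_size).
    blocks = []
    start = 0
    while start < genome_length:
        size = max(1, int(rng.gauss(mean_size, mean_size * 0.25)))
        end = min(start + size, genome_length)
        blocks.append([start, end, 0.0])
        start = end

    # 2. Assign each haplotype a divergence sampled around the target value.
    for block in blocks:
        block[2] = max(0.0, rng.gauss(divergence, divergence * 0.25))

    # 3. Set a fraction of the haplotypes to divergence = 0 (the LOH blocks).
    n_loh = int(round(loh_fraction * len(blocks)))
    for i in rng.sample(range(len(blocks)), n_loh):
        blocks[i][2] = 0.0

    # 4. Rescale the remaining haplotypes so that the length-weighted
    #    average divergence returns to the target value.
    weighted = sum((e - s) * d for s, e, d in blocks)
    if weighted > 0:
        scale = divergence * genome_length / weighted
        for block in blocks:
            block[2] = min(1.0, block[2] * scale)

    return [tuple(block) for block in blocks]


def mutate(sequence, blocks, seed=0):
    """Introduce random substitutions at each haplotype's divergence rate."""
    rng = random.Random(seed)
    seq = list(sequence)
    for start, end, div in blocks:
        for i in range(start, end):
            if rng.random() < div:
                seq[i] = rng.choice([b for b in "ACGT" if b != seq[i]])
    return "".join(seq)
```

In this sketch, step 4 is what keeps the genome-wide divergence at the requested value even though the LOH haplotypes contribute none.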
v0.18.0
Log of changes:
- Using --min-snps-kbp instead of simple --min-snps. This is because a small block with 10 SNPs is much more SNP-dense than a 10 kbp block with only 10 SNPs. This applies to jloh stats and jloh extract.
- Fixed help section of jloh stats which still said "density".
- Removed the --snp-distance parameter. It is now inferred from the values of --min-snps-kbp. This reduces the parameters for the user, making it a more comfortable tool to use.
- Added the --skip-trimming parameter to the nextflow workflow.
- Made jloh stats multithreaded.
- Changed some syntax in jloh stats to make it compatible with python from version 3.6 instead of from 3.9.
- Fixed the usage of jloh extract in the workflows, according to the modifications listed above.
- Added a step where the bam file is subdivided in separate files by chromosome, which made the code run ~10x faster.
- Changed default suggested --min-snps-kbp parameter setting from 5% quantile to 50% quantile in the jloh stats module, following our findings reported in the manuscript.
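
The reasoning behind --min-snps-kbp can be made concrete with a small sketch. The helper names and the threshold of 5 SNPs/kbp are assumptions for illustration, not values or functions taken from jloh.

```python
def snps_per_kbp(n_snps, block_length_bp):
    """SNP density of a block, expressed as SNPs per 1000 bp."""
    if block_length_bp <= 0:
        raise ValueError("block length must be positive")
    return n_snps / (block_length_bp / 1000)


def passes_density(n_snps, block_length_bp, min_snps_kbp):
    """True if the block is at least as SNP-dense as the threshold."""
    return snps_per_kbp(n_snps, block_length_bp) >= min_snps_kbp


# Two blocks with the same raw SNP count but very different densities.
small_block = snps_per_kbp(10, 500)      # 20.0 SNPs/kbp
large_block = snps_per_kbp(10, 10_000)   # 1.0 SNPs/kbp

# With a hypothetical threshold of 5 SNPs/kbp only the small block passes,
# whereas a raw count threshold of 10 SNPs would treat them identically.
print(passes_density(10, 500, 5), passes_density(10, 10_000, 5))
```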
v0.17.0
Log of changes:
- Added the intersect module to perform intersections/removals with the output files of two runs (similar to bedtools)
- Added the chimeric module to find genes harboring chimeras between two different haplotypes
- Added the junctions module to calculate statistics on neighboring blocks from different origins (REF, ALT, or simply from two different calls)
- Fixed a small bug in the default mode, that was not producing the heterozygous BED file.
- Modified the density module to produce more values. Specifically, besides the mean SNP density, it calculates a distribution and extracts the quantiles. It does the same for SNP distances. These values are useful as thresholds in jloh extract, so the module produces an estimate of which parameters would fit best. The module has been renamed jloh stats.
- Modified the --min-snps and the --snp-distance parameters so that two values have to be passed, one for heterozygous and one for homozygous SNPs. These values can be estimated with jloh stats, or passed by the user.
- All heterozygous blocks within --min-length - 1 from each other are now merged before the generation of REF and ALT LOH blocks.
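
The merging of nearby heterozygous blocks can be sketched as below. This is an illustration of distance-based interval merging (similar in spirit to bedtools merge with its -d option), not the code used by jloh extract. The function name and the (chrom, start, end) tuple layout are assumptions.

```python
def merge_close_blocks(blocks, max_gap):
    """Merge blocks on the same chromosome separated by at most max_gap bp.

    blocks: iterable of (chrom, start, end) tuples.
    """
    merged = []
    for chrom, start, end in sorted(blocks):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= max_gap:
            # Close enough to the previous block: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(block) for block in merged]


# With a hypothetical --min-length of 1000, the maximum gap is 999 bp.
min_length = 1000
het_blocks = [("chr1", 0, 500), ("chr1", 1400, 2000), ("chr1", 3500, 4000)]
print(merge_close_blocks(het_blocks, min_length - 1))
# [('chr1', 0, 2000), ('chr1', 3500, 4000)]
```

The first two blocks are 900 bp apart and are merged. The third is 1500 bp away and stays separate.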
v0.16.2
Log of changes:
- Fixed bug in printing the *tsv file that was generating two lines per record on the B subgenome in a few cases
- Fixed small bugs in the run_with_real_data.nf workflow script.
- Fixed small bug in jloh sim.
v0.16.0
Log of changes:
- Updated nextflow workflow with --hybrid mode.
- Fixed jloh sim in how it finds regions of relevance, reducing false positives in jloh extract.
- Fixed bug in default mode that was not using the VCF files properly.
- Re-introduced writing of BED file with heterozygous regions that are discarded in the first step of jloh extract.
- New parameter in jloh sim: --loh-mean-length, which controls the average length of any introduced LOH block, defaults at 5000 (before it was hardcoded as 1000 bp).
- Adjusted parameters in the --default/--sensitive/--relaxed modes of jloh g2g.
- Fixed bug in jloh extract at the stage of the LOH candidates that saw intervals starting from position "-1".
- Adjusted output of jloh extract so that the tsv file has blocks in 1-based coordinates while the bed file has blocks in 0-based half-open coordinates.
- Fixed bug in jloh sim that was not introducing variants when divergence was < 0.05.
- Added output file in jloh sim: the non-divergent file, containing regions where no variation has been introduced.
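
The two coordinate systems mentioned above differ only in the start position. The sketch below shows the conversion, assuming the 1-based tsv coordinates are inclusive on both ends. That detail is an assumption, and the function names are hypothetical.

```python
def bed_to_one_based(start, end):
    """Convert a 0-based half-open interval [start, end) to 1-based inclusive."""
    if start < 0 or end <= start:
        raise ValueError("invalid BED interval")
    return start + 1, end


def one_based_to_bed(start, end):
    """Convert a 1-based inclusive interval [start, end] to 0-based half-open."""
    if start < 1 or end < start:
        # A 1-based start of 0 would become -1 in BED, which is never valid.
        raise ValueError("invalid 1-based interval")
    return start - 1, end


# The first 100 bases of a chromosome in both systems.
print(bed_to_one_based(0, 100))   # (1, 100)
print(one_based_to_bed(1, 100))   # (0, 100)
```

In both systems the block length is recoverable: end - start for BED, end - start + 1 for 1-based inclusive.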
v0.15.0
Log of changes:
- jloh g2g now creates two separate BED files with regions, one per parental subgenome. This fixes an issue arising when they have chromosomes with the same names.
- The same is done for jloh extract as well, now. Streams of LOH blocks are now left separate so that when same chromosome names are there in the two parents, they don't get confused.
- In jloh extract, the "candidates" file is tsv now, not bed.
- Fixed an issue in jloh extract and jloh g2g when removing the temporary folder.
- Changed jloh sim script entirely to make the code more readable. Now works with parallel threads too.
- g2g now has a --sensitive and a --relaxed mapping parameter, that defines how nucmer matches are found.
v0.14.0
Log of changes:
- New module: JLOH sim. This module creates a copy of a genome with some divergence and LOH introduced. It is based on a script that is part of the redundans tool: fasta2diverged.py. This script has been updated to account for more functions and to work with hybrid genomes as well (i.e. producing two copies of a genome, with some mutations each, some of which homozygous).
- JLOH g2g now uses --est-divergence instead of --min-identity, so the user can specify a value of divergence between subgenomes. The value goes from 0 to 1, where e.g. 0.01 is 1% divergence. This value is used to limit which mapping results to keep (1-min-identity = divergence).
- JLOH g2g now produces a BED file with regions to KEEP, not to discard. See next comment.
- JLOH extract now uses a --regions BED file instead of a --mask BED file. This file is still produced by JLOH g2g. This allows for more control with BEDtools intersect.
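
The relationship between --est-divergence and the former --min-identity (1 - min-identity = divergence) can be sketched as follows. The alignment records and the function names are made up for illustration. How JLOH g2g applies the cutoff internally is not shown in these notes.

```python
def divergence_to_min_identity(est_divergence):
    """Translate an estimated divergence (0 to 1) into a minimum identity (0 to 1)."""
    if not 0.0 <= est_divergence <= 1.0:
        raise ValueError("divergence must be between 0 and 1")
    return 1.0 - est_divergence


def keep_alignments(alignments, est_divergence):
    """Keep hypothetical (name, identity) records at or above the identity cutoff."""
    min_identity = divergence_to_min_identity(est_divergence)
    return [(name, ident) for name, ident in alignments if ident >= min_identity]


# Hypothetical mapping results with fractional identities.
hits = [("aln1", 0.995), ("aln2", 0.97), ("aln3", 0.99)]

# With 1% estimated divergence, only hits with at least 99% identity remain.
print(keep_alignments(hits, 0.01))
```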
v0.13.0
Log of changes:
- New parameter of JLOH extract: --mask, which allows you to pass a BED file containing regions that you don't want to include in the final list of blocks.
- New module: JLOH g2g, which allows you to map one genome onto the other and produce the --mask file.
v0.12.3
Log of changes:
- Fixed treatment of temporary files: a folder is now created within --output-dir, and this folder is removed at the end of the process.