Releases: epi2me-labs/wf-human-variation
v2.2.0
Added
- Output `{{sample}}.stats.json` file describing some key metrics for the analysis.
- Summary of gene coverage if a 4-column BED is provided.
- Automated sex determination using relative coverage of chrX and chrY.
- Retry strategy added to `snp:aggregate_pileup_variants` to prevent out-of-memory errors.
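The sex determination item above relies on relative chrX and chrY coverage. The following is a minimal sketch of that general idea only: the function name `infer_sex`, the use of mean depth, and the 0.2 ratio cut-off are illustrative assumptions, not the workflow's actual implementation or thresholds. It returns the `XX`/`XY` labels that `--sex` accepts.

```python
def infer_sex(chrx_mean_depth: float, chry_mean_depth: float,
              min_y_to_x_ratio: float = 0.2) -> str:
    """Return "XY" or "XX" from the mean read depth on chrX and chrY.

    min_y_to_x_ratio is a hypothetical cut-off chosen for illustration;
    it is not taken from the workflow.
    """
    if chrx_mean_depth <= 0:
        raise ValueError("chrX depth must be positive to compute a ratio")
    # A sample with a Y chromosome shows appreciable chrY depth relative to chrX.
    ratio = chry_mean_depth / chrx_mean_depth
    return "XY" if ratio >= min_y_to_x_ratio else "XX"
```

For example, `infer_sex(30.0, 0.3)` gives `"XX"` under these assumptions, while `infer_sex(15.0, 14.0)` gives `"XY"`.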
Changed
- `--GVCF --phased` will produce a phased GVCF.
- Changed default phasing algorithm to `whatshap`, with the possibility to change the phasing to `longphase` with `--use_longphase true`.
  - The intermediate phasing is still performed using `longphase`.
- Setting `--snp --sv --phased` will emit individually phased SNPs and SVs.
- Phased bedMethyl files now follow the pattern `{{ alias }}.wf_mods.{{ haplotype }}.bedmethyl.gz`.
- `--sex` parameter uses `XX` and `XY` rather than "female" and "male".
- Update `modkit` to v0.2.6.
- Improved modkit runtime by increasing default threads and increasing the default interval size.
- Improved modkit runtime by increasing the default interval size and running modkit on individual contigs.
- `modkit` is now run only on chromosomes 1-22, X, Y and MT, unless `--include_all_ctgs` is provided.
- Increased minimum CPU requirement for the workflow to 16.
- Filtering of SVs using a BED file now includes sites only partially overlapping the specified regions.
- `basecaller_cfg` will be inferred from the `basecall_model` DS key of input read groups, if available
  - Providing `--basecaller_cfg` will not be required if `basecall_model` is present in the DS tag of the read groups of the input BAM
  - `--basecaller_cfg` will be ignored if a `basecall_model` is found in the input BAM
- Reconciled workflow with wf-template v5.1.2
- Update to Clair3 v1.0.8.
- Update to longphase v1.7.1.
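The `basecaller_cfg` inference described above can be pictured with a short sketch. It assumes the read group `DS` field holds space-separated `key=value` pairs including `basecall_model=...`; that layout, the function names, and the model name in the usage note are illustrative assumptions, not the workflow's own code.

```python
import re
from typing import Optional


def basecall_model_from_header(sam_header: str) -> Optional[str]:
    """Return the first basecall_model found in the DS field of an @RG line."""
    for line in sam_header.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            if field.startswith("DS:"):
                match = re.search(r"basecall_model=(\S+)", field)
                if match:
                    return match.group(1)
    return None


def resolve_basecaller_cfg(sam_header: str,
                           cli_value: Optional[str] = None) -> Optional[str]:
    """Prefer the model recorded in the BAM header; otherwise use the CLI value."""
    found = basecall_model_from_header(sam_header)
    return found if found is not None else cli_value
```

Given a header line such as `@RG\tID:rg1\tDS:basecall_model=dna_r10.4.1_e8.2_400bps_sup@v4.2.0 runid=abc`, `resolve_basecaller_cfg` returns the model from the header and ignores any value passed on the command line, mirroring the behaviour listed above.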
Fixed
- Update schema to allow selection of multiple BAM files per sample in EPI2ME.
- Spectre CNV report not handling cases when no CNVs detected.
- Lines denoting normal maximum and pathogenic minimum thresholds now correctly displayed on STR repeat content plots.
- Workflow will not emit `sample.cram` if `sample.haplotagged.cram` has been created by the workflow to save storage.
- Emitting nonsense `input.1` file
Removed
- Single-step joint phasing of SV and SNP.
- `--output_separate_phased` as the workflow emits only individually phased VCFs.
- A copy of the reference and the generated reference cache is no longer output by the workflow.
  - The workflow encourages use of readily available standard reference sequences, so re-emitting the input reference as a workflow output is unnecessarily consuming disk space.
v2.1.0
Changed
- ClinVar version in SnpEff container updated to version 20240307.
- Convert to BAM only when `--cnv --use_qdnaseq` is selected.
- Update to Clair3 v1.0.6.
- Update Spectre to fix an error when parsing Clair3 VCFs with multiple AFs.
- Support for an input folder of multiple BAM files per sample with `--bam` (instead of only allowing a single BAM per sample).
- `refine_with_sv` to be run by chromosome in order to reduce memory footprint.
Fixed
- Force minimap2 to clean up memory more aggressively. Empirically this reduces peak-memory use over the course of execution.
- Handling of input VCF files with `--vcf_fn`.
- `--phased --sv --snp` generates a truncated VCF file when `#` appears in the VCF `INFO` field
- Some reporting scripts using too much memory.
Removed
- CRAM as supported input format.
- `old_ref` parameter as providing the reference of an existing CRAM is no longer needed.
v2.0.0
Changed
- CNV calling with `--cnv` is now performed using Spectre, which is optimised for long reads.
  - Legacy CNV calling using QDNAseq may still be carried out with `--cnv --use_qdnaseq`.
  - The bin size parameter has been renamed from `--bin_size` to `--qdnaseq_bin_size`.
- Skip CNV CRAM to BAM conversion if downsampling is required, to avoid creating an unnecessary intermediate file.
- The output of `--depth_intervals` now has `.bedgraph.gz` extension.
- SV workflow outputs SVs in the autosomes, sex chromosomes and MT; use `--include_all_ctgs` to output calls on all the sequences.
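The contig restriction described in the last item can be sketched as a simple name filter. This is an illustration under stated assumptions: the helper `select_contigs` is hypothetical, and the handling of an optional `chr` prefix and of `M`/`MT` as mitochondrial names is assumed rather than taken from the workflow.

```python
from typing import Iterable, List


def select_contigs(contigs: Iterable[str],
                   include_all_ctgs: bool = False) -> List[str]:
    """Keep autosomes, sex chromosomes and MT unless include_all_ctgs is set."""
    if include_all_ctgs:
        return list(contigs)
    # Assumed naming: 1-22, X, Y, and M or MT, with an optional "chr" prefix.
    wanted = {str(i) for i in range(1, 23)} | {"X", "Y", "M", "MT"}
    kept = []
    for name in contigs:
        bare = name[3:] if name.startswith("chr") else name
        if bare in wanted:
            kept.append(name)
    return kept
```

With this sketch, unplaced or alternate sequences such as `chrUn_KI270302v1` are dropped by default and retained only when `include_all_ctgs=True`.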
Added
- Output definitions for coverage files.
- N50 and mean coverage added to alignment report.
Fixed
- EPI2ME Desktop incorrectly allowed selection of directory for `tr_bed`.
- `failedQCReport` failing to generate a report.
v1.11.0
Changed
- Add an additional `whatshap haplotag` process after the final VCF phasing.
- Updates to the phasing subworkflow significantly impact the runtime and storage requirement for the workflow, as detailed here.
- Several performance improvements which should noticeably reduce the running time of the workflow
Fixed
- Updated the version of Straglr, which addresses the following:
  - Repeats can now be called in RFC1
  - Start position of called STRs is 1-based rather than 0-based
  - VCF headers now match those in the `FORMAT` field
- Generate `allChromosomes.bed` using `samtools faidx` index instead of `pyfaidx`, to avoid a KeyError
- Inconsistent file ownership of bundled Clair3 model files which could lead to subuid errors in some environments
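The STR start-position fix above concerns coordinate conventions: BED intervals are 0-based and half-open, whereas VCF positions are 1-based. A minimal sketch of the conversion follows; the helper name `bed_to_one_based` is hypothetical and this is not Straglr's code.

```python
from typing import Tuple


def bed_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based half-open [start, end) interval to 1-based closed."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    # Only the start shifts; the half-open end already equals the closed end.
    return start + 1, end
```

The interval length is unchanged by the conversion: `end - start` bases in BED terms equals `end - (start + 1) + 1` in 1-based closed terms.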
v1.10.1
Fixed
- Bug report form.
v1.10.0
Added
- Clair3 4.3.0 models.
Changed
- `--phase_vcf`, `--joint_phasing` and `--phase_mod` are now deprecated in favour of `--phased`; see the README for more details.
- `--use_longphase_intermediate` is now deprecated, and `--use_longphase false` will use `whatshap` consistently throughout
- Running `--phase --snp --use_longphase false` will now phase indels too
- `--basecalling_cfg` currently provides the configuration to Clair3.
- The `clair3:` prefix to Clair3 specific models is no longer required.
Fixed
- CNV report generation fails if there is no consensus on the copy number of a chromosome
  - `Undetermined` category has been added to the `Chromosome Copy Summary` to account for these cases
- `readStats` reports metrics on the downsampled BAM when `--downsample_coverage` is requested.
- Spurious warning for missing MM/ML tags when a BAM fails the coverage threshold
Removed
- wf-basecalling subworkflow
  - fast5_dir input and other basecalling related options have been removed from the workflow parameters
  - Users should run the standalone wf-basecalling workflow and provide the output to wf-human-variation
- Mapula statistics with `--mapula`
v1.9.2
Fixed
- `--joint_phasing` generating single-chromosome VCF files.
v1.9.1
Changed
- ClinVar annotation of SVs has been temporarily removed due to not being correctly incorporated. SnpEff annotations are still produced as part of the final SV VCF.
- New documentation
Removed
- `--annotation_threads` parameter, as the SnpEff process does not support multithreading.
Fixed
- Truncated SV VCF header generated from `vcfsort`.
- `sed` crashing with I/O error in some instances.
- Missing flagstats file in output directory.
v1.9.0
Added
- STR workflow report now includes additional plots which display repeat units and interruptions in each supporting read
- CNV workflow now outputs an indexed VCF file to the output directory
Changed
- Legend symbols in STR genotyping plot
- Unambiguous naming of bedMethyl files generated with `--mod`
  - Unphased outputs will have the pattern `[sample_name].wf_mods.bedmethyl.gz`
  - Phased outputs will have the pattern `[sample_name]_[1|2|ungrouped].wf_mods.bedmethyl.gz`
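The two naming patterns listed for this release can be expressed as a small helper. This is a sketch for illustration only: the function name `bedmethyl_name` is hypothetical, and it covers the v1.9.0 patterns rather than the later phased pattern introduced in v2.2.0.

```python
from typing import Optional


def bedmethyl_name(sample_name: str, haplotype: Optional[str] = None) -> str:
    """Build a bedMethyl file name following the v1.9.0 patterns."""
    if haplotype is None:
        # Unphased output: [sample_name].wf_mods.bedmethyl.gz
        return f"{sample_name}.wf_mods.bedmethyl.gz"
    if haplotype not in {"1", "2", "ungrouped"}:
        raise ValueError("haplotype must be '1', '2' or 'ungrouped'")
    # Phased output: [sample_name]_[1|2|ungrouped].wf_mods.bedmethyl.gz
    return f"{sample_name}_{haplotype}.wf_mods.bedmethyl.gz"
```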
Fixed
- Report step failing if bcftools stats file has only some sub-sections
- Clair3 ignoring the bed file
- merge_haplotagged_contigs incorrectly generating intermediate CRAM when input is BAM
- STR content generation failing due to forward slash in disease name in `variant_catalog_hg38.json`
- Report name for the read alignment statistics now follows the pattern `[sample_name].wf-human-alignment-report.html`
v1.8.3
Fixed
- configure-jbrowse breaking on unescaped spaces