Releases: PapenfussLab/gridss
2.5.0
Improves worst-case assembly performance. Worst-case runtime and memory usage should be much improved.
- Added new safety check to prevent assembly stalling
- Reduced peak memory by downsampling each read instead of loading entire region then downsampling
- Tighter default assembly configuration parameters (reduces active size of assembly graph)
- Exporting BED files containing assembly timeouts/safety abort regions to assembly working directory
- Additional assembly runtime optimisations
- Improved usability of wrapper script #220 #217.
- Better error messages and more environment checks (e.g. check for java 1.8+)
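The memory saving from downsampling each read as it arrives, rather than loading a whole region first, can be illustrated with a minimal sketch. This is not the GRIDSS implementation: the function name `downsample_stream`, the `keep_fraction` parameter, and the use of a uniform random keep/drop decision are all assumptions made for illustration.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def downsample_stream(reads: Iterable[T], keep_fraction: float, seed: int = 0) -> Iterator[T]:
    """Yield each read with probability keep_fraction, one read at a time.

    Hypothetical helper: the keep/drop rule is an assumption, not GRIDSS's own.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    rng = random.Random(seed)
    for read in reads:
        # The decision is made as soon as the read is seen, so rejected
        # reads are never accumulated in memory.
        if rng.random() < keep_fraction:
            yield read
```

Because the function is a generator, peak memory depends on the reads that are kept by the caller, not on the number of reads in the region.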
2.4.0
Improved performance and usability, and reduced the minimum reported event size to 10bp.
- Minimum callable event size reduced to 10bp
- This parameter is configurable by specifying a CONFIGURATION_FILE file containing variantcalling.minSize=10
- Runtime of assembly step decreased by 27% by optimising data structures used
- Runtime of variant calling (max clique) decreased by 50% by reverting back to parallel processing of all orientations
- Added wrapper shell scripts
- gridss.sh: a wrapper to the full GRIDSS pipeline
- gridss_lite.sh: wraps a faster but less sensitive GRIDSS pipeline. Sub-clonal and hard to call variants callable by the full GRIDSS pipeline will be missed by this pipeline.
- #74 Using CTX instead of ITX to match BreakDancer notation in simple event annotator script
- Improved startup times by caching the in-memory 2-bit encoded representation of the reference genome.
- A REFERENCE.gridsscache file will be created in the same directory as the reference genome.
- gridss.
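As an illustration of how the `variantcalling.minSize` setting mentioned in these notes could drive a size filter, here is a minimal sketch. It assumes a simple `key=value` properties-style file and an inclusive threshold; the helper names `parse_properties`, `min_event_size`, and `is_reportable` are hypothetical and are not part of GRIDSS.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def min_event_size(settings: dict, default: int = 10) -> int:
    """Return the configured minimum event size, or the default if unset."""
    return int(settings.get("variantcalling.minSize", default))


def is_reportable(event_size: int, settings: dict) -> bool:
    """Assumption: events at or above the minimum size are reported."""
    return event_size >= min_event_size(settings)
```

For example, `is_reportable(32, parse_properties("variantcalling.minSize=50"))` is false under these assumptions, because the configured threshold overrides the default.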
2.3.0
Optimisation and bug fix release. The headline feature of this release is the GRIDSS lite driver script.
- Added gridss_lite.sh driver script. Lower sensitivity but twice as fast
- gridss.sh is now a proper driver script with command line parsing
- Default minimum called event size is now 10bp (down from 32bp)
- Improved error messages
- Unknown chromosomes in BLACKLIST bed will now raise an error
- This prevents a blacklist with the wrong prefix being used ineffectively (e.g. "chr1" in blacklist vs "1" in reference)
- Added ENCFF001TDO.bed hg19 blacklist without chr prefixes
- Max clique calling for each orientation (++, +-, -+, --, -, +) is now done in parallel
- Q2 tag is optional and is no longer populated by default
- This reduces the size of the intermediate files
- fallback base quality score is 20
- No longer auto-adding contigs to sequence dictionary
- No longer allowing unknown contigs in breakpoint ALT alleles
- Bug fix: IdentifyVariants was not writing to temp files when requested
- Bug fix: no longer recreating sam header for every assembly contig
- Bug fix: async BAM parsing now respects CRC flag
- Fixed race condition in async BAM reader
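The blacklist check described in this release (rejecting a BED file whose contig names are not in the reference, such as "chr1" against a reference that uses "1") can be sketched as follows. This is only an illustration of the idea: `validate_blacklist_contigs` is a hypothetical name, and the handling of header lines is an assumption rather than GRIDSS behaviour.

```python
def validate_blacklist_contigs(bed_lines, reference_contigs) -> int:
    """Raise ValueError if any BED interval names a contig missing from the reference.

    Returns the number of intervals checked.
    """
    known = set(reference_contigs)
    checked = 0
    for line_number, line in enumerate(bed_lines, start=1):
        line = line.strip()
        # Skip blank lines and common BED header lines (assumed convention).
        if not line or line.startswith(("#", "track", "browser")):
            continue
        contig = line.split(None, 1)[0]
        if contig not in known:
            raise ValueError(
                f"line {line_number}: contig '{contig}' is not in the reference"
            )
        checked += 1
    return checked
```

Failing early is the point of the design: a blacklist with the wrong prefix would otherwise match nothing and silently have no effect.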
2.2.3
Bug fix and QOL improvement release
- Added friendly error message if specified BLACKLIST file does not exist
- #199 AnnotateUntemplatedSequence no longer hangs if there are no records to annotate
- #202 Added gridss program group header to GRIDSS generated BAMs
- Added GRIDSS version number to VCF header
- #213 Excluding local assembly anchoring from SC calculation so SC field can be used for variant linking purposes
- Added ASC attribute
- #205 fixed ASSEMBLY_ONLY filter reported for variants with indel support but not SR/RP support
- Added error message when input and output VCFs are the same file
- AnnotateUntemplatedSequence now only writes BEALN field if it does not already exist (useful for viral annotations)
- Fixed some typos and descriptions
- Added max clique telemetry. Enable by specifying a configuration file with visualisation.maxCliqueTelemetry = true.
2.2.2
2.2.1
Bug fix release. All users of v2.2.0 should upgrade to this release as v2.2.0 can randomly hang.
- #200 updated to latest version of htsjdk async BAM patch
- #189 Using unclipped start/end to determine max mapped read length
- #191 Removed trailing space in header due to warning message in perl VCF parsing library
- GeneratePonBedpe: streaming both BED and BEDPE to minimise memory usage
- Add htsjdk patch for VCF memory leak
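The difference between clipped and unclipped coordinates (#189) can be shown with a small sketch that derives the unclipped start and end of an alignment from its CIGAR string. It assumes 1-based inclusive coordinates and treats both soft (S) and hard (H) clips as clipping; the function name `unclipped_span` is hypothetical and this is not the GRIDSS code.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REFERENCE_CONSUMING = set("MDN=X")
CLIPPING = set("SH")


def unclipped_span(alignment_start: int, cigar: str) -> tuple:
    """Return (unclipped_start, unclipped_end), 1-based inclusive."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]

    # Sum the clipped bases at the start of the CIGAR.
    leading = 0
    for length, op in ops:
        if op not in CLIPPING:
            break
        leading += length

    # Sum the clipped bases at the end of the CIGAR.
    trailing = 0
    for length, op in reversed(ops):
        if op not in CLIPPING:
            break
        trailing += length

    reference_length = sum(length for length, op in ops if op in REFERENCE_CONSUMING)
    alignment_end = alignment_start + reference_length - 1
    return alignment_start - leading, alignment_end + trailing
```

For a read aligned at position 100 with CIGAR `5S90M5S`, the aligned span covers 90 reference bases while the unclipped span covers 100, which is why the choice of coordinates changes the computed read length.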
2.2.0
This release contains significant performance improvements as well as new tools useful for somatic variant calling, and high-speed calling of targeted regions.
- High-speed calling of targeted regions is now possible
- New command-line utility: gridss.ExtractFullReads
- New command-line utility: gridss.IndexedExtractFullReads
- Added gridss_targeted.sh example script
- New command-line utility: gridss.GeneratePonBedpe
- Used to generate a reference panel of normals for GRIDSS somatic calling.
- The Hartwig Medical Foundation has released a comprehensive hg19 GRIDSS PON based on almost 4000 WGS tumour/normal samples. It can be found at https://resources.hartwigmedicalfoundation.nl
- Increased default minimum read mapq from 10 to 20.
- This results in a lower FDR but fewer calls in questionable regions of the genome
- #191 Removed trailing space in header due to warning message in perl VCF parsing library
- #189 Using unclipped start/end to determine max mapped read length
- Build script has been switched to maven shade to enable direct inclusion of htsjdk and picard patches
- Improved runtime performance through improved parallelism
- Direct inclusion of htsjdk#1264 and htsjdk#1249
- Tweaked default assembly configuration parameters
- Including picard tools patch that enables asynchronous read and parallel processing of GRIDSS and picard tools metrics
2.1.0
This release adds additional command-line utilities and arguments that enable GRIDSS to be run in a targeted manner.
As usual, this release also contains bug fixes and performance enhancements.
- Added ExtractFullReads utility to extract all fragments overlapping a set of regions of interest (ROI). Unlike simpler utilities such as samtools view, both reads in a read pair are extracted if either of them overlaps an ROI, and all chimeric alignment records are extracted if a split read overlaps an ROI.
- AllocateEvidence can now recalculate the read and assembly support independently.
- Fixed a bug causing breakpoints with 1 base of untemplated inserted sequence to be reported as clean breakpoints with 1bp more of reference sequence (i.e. the untemplated inserted sequence was treated as being aligned to one side of the breakpoint even though the base is mismatched).
- Improved multi-threading including a direct patch of samtools/htsjdk#1249
- Added gridss_separate.sh example script which splits out the pipeline steps performed by CallVariants into separate command line invocations.
- Updated htsjdk and picard versions.
Note that since GRIDSS uses the picard command line parser, the picard syntax transition period also applies to GRIDSS.
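The fragment-level extraction rule described for ExtractFullReads (keep every alignment record of a fragment if any one of them overlaps an ROI) can be sketched as below. This is a semantic illustration only: the `Alignment` record, the in-memory grouping by read name, and the function names are assumptions, and the real utility works on BAM files rather than Python lists.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Alignment(NamedTuple):
    read_name: str
    contig: str
    start: int  # 1-based inclusive
    end: int    # 1-based inclusive


Region = Tuple[str, int, int]  # (contig, start, end), 1-based inclusive


def overlaps(aln: Alignment, regions: List[Region]) -> bool:
    """True if the alignment overlaps at least one region."""
    return any(
        aln.contig == contig and aln.start <= end and aln.end >= start
        for contig, start, end in regions
    )


def extract_full_fragments(alignments: List[Alignment], regions: List[Region]) -> List[Alignment]:
    """Keep all records of every fragment that has any record in an ROI."""
    by_name = defaultdict(list)
    for aln in alignments:
        by_name[aln.read_name].append(aln)
    selected = {
        name
        for name, records in by_name.items()
        if any(overlaps(record, regions) for record in records)
    }
    # Preserve the input order of the retained records.
    return [aln for aln in alignments if aln.read_name in selected]
```

A plain region query would return only the records that overlap the region themselves; here a mate or chimeric record on another chromosome is retained because it shares a read name with an overlapping record.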
2.0.1
Minor bug fix release
- CollectGridssMetricsAndExtractSVReads now correctly passes INCLUDE_DUPLICATES command line parameter to ExtractSVReads
- Added configuration option to dump the read names of all supporting reads for each variant call.
- Disabled by default since this massively inflates the VCF file size
2.0.0
GRIDSS output now includes the reporting of single breakends. When the breakpoint sequence is novel or multi-mapping on one side, an unambiguous breakpoint call cannot be made as only one side of the breakpoint is uniquely mappable. In such cases GRIDSS will now call a single breakend. See section 5.4.9 of the VCF file format specifications document for details on how single breakends are represented in VCF.
- Single breakends are now included in the default output VCF
- The command-line tool gridss.AnnotateUntemplatedSequence can be used to identify potential partner breakends both in the reference genome, and against other reference databases (such as viral databases for the identification of integrated DNA virus).
This release also includes the following bug fixes:
- #174 Reporting no inexact homology when a misalignment of the flanking sequence is detected
- #174 No longer reporting negative inexact homology when untemplated inserted sequence is present
- #172 BANRPQ and BANSRQ headers were set to Integer when values were actually Float
- Fixes crash when using GRIDSS with bcftools
Note: Due to their nature, single breakend calls have a higher false positive rate than breakpoint calls. Furthermore, due to the nature of read mapping, single breakend calls are enriched in low complexity sequence such as micro-satellites. Although there are many true positives, the false positive rate in these regions is higher for all general purpose SV callers such as GRIDSS.
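For downstream scripts that need to separate the two call types, the VCF specification writes a single breakend ALT allele with a leading or trailing `.` (for example `G.` or `.G`), whereas a breakpoint ALT contains square brackets around the partner position. A minimal sketch of that distinction follows; `classify_alt` is a hypothetical helper, not a GRIDSS function, and it ignores any other ALT forms beyond labelling them "other".

```python
def classify_alt(alt: str) -> str:
    """Classify a VCF ALT allele as 'breakpoint', 'single_breakend', or 'other'."""
    # Breakpoint notation includes the partner locus in [ ] brackets.
    if "[" in alt or "]" in alt:
        return "breakpoint"
    # Single breakend notation: bases with a '.' on the side of the break.
    if len(alt) > 1 and (alt.startswith(".") or alt.endswith(".")):
        return "single_breakend"
    return "other"
```

Filtering on this classification is one way to apply stricter thresholds to single breakend calls, given their higher false positive rate.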