4.0.5.2
Highlights of this release include major Funcotator performance improvements on hg19/b37 inputs, a newly rewritten Java version of FilterVariantTranches, HaplotypeCaller bamout improvements, and improved Python integration by eliminating timeouts.
As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/.
Funcotator Improvements
- Improve handling of hg19/b37 references (#4586).
  - Fixed a performance bug involving excessive cache misses when querying data sources, resulting in major performance improvements when running on hg19/b37 data (performance increased by approx. 30x with v1.4.20180615 of the standard Funcotator data sources) (#4586).
  - Automatically detect when b37 data is run against an hg19 data source and convert contig names to be hg19 compliant.
    - Assumes all data sources for the hg19 reference are compliant with hg19 contig names. User-created data sources will have to honor this.
  - Perform additional validation on input data to ensure a given reference FASTA has a sequence dictionary that is a superset of that of the given VCF. This is a more stringent check than is automatically performed by the GATK. Can be disabled with the --disable-sequence-dictionary-validation flag.
  - Released a new version of the data sources to go with this release (1.4.20180615), necessary because the data sources needed to be made consistent with hg19 (before, they were a mix of hg19 and b37 contig names).
  - Updated the minimum required data source version to be the latest release.
  - Updated the getDbSNP.sh and createSqliteCosmicDb.sh data source scripts to preprocess those data sources to have hg19-compliant contig names.
  - Removed the --allow-hg19-gencode-b37-contig-matching flag.
  - Removed the --allow-hg19-gencode-b37-contig-matching-override flag.
- User defined transcripts were being used as a filter rather than a priority order. The filtering step has been eliminated. Fixes #4918 (#4931)
- Added custom MAF fields to MafOutputRenderer (#4917)
- LocatableXsv data sources now produce at most 1 funcotation per allele pair. (#4936)
- LocatableXsv data sources now provide the correct number of funcotations (#4915)
- Preserve VCF fields in MAF output (#4872)
- Fixed an error when spanning deletions overlap coding regions (#4881)
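The b37-to-hg19 contig handling and the stricter sequence-dictionary check listed above can be illustrated with a minimal sketch. This is not Funcotator's code: the function names `b37_to_hg19_contig` and `missing_from_reference`, the name-to-length dictionary representation, and the toy contig lengths are all assumptions made for illustration. The sketch only renames primary contigs and does not address any sequence differences between the two references.

```python
# Hypothetical helpers, not part of GATK or Funcotator.

# Primary b37 contigs that differ from hg19 only by the missing "chr" prefix.
PRIMARY_B37 = {str(i) for i in range(1, 23)} | {"X", "Y"}


def b37_to_hg19_contig(name: str) -> str:
    """Map a b37-style contig name to an hg19-style name (primary contigs only)."""
    if name.startswith("chr"):
        return name              # already hg19-style
    if name == "MT":
        return "chrM"            # the mitochondrial contig is named differently
    if name in PRIMARY_B37:
        return "chr" + name
    return name                  # other contigs are left untouched in this sketch


def missing_from_reference(reference_dict: dict, vcf_dict: dict) -> list:
    """Return VCF contigs that are absent from the reference dictionary
    or present with a different length (assumed superset criterion)."""
    return sorted(
        name for name, length in vcf_dict.items()
        if reference_dict.get(name) != length
    )


# Toy dictionaries (contig name -> length); the lengths are made up.
reference = {"chr1": 1000, "chr2": 800, "chrM": 16}
vcf = {b37_to_hg19_contig(n): l for n, l in {"1": 1000, "MT": 16}.items()}
print(missing_from_reference(reference, vcf))  # an empty list means the check passes
```

A non-empty result would correspond to the situation in which the validation fails, unless it has been turned off with --disable-sequence-dictionary-validation.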
HaplotypeCaller/Mutect2
- Improvements to FilterMutectCalls. Eliminates about 3% of all false positives in DREAM while reducing sensitivity by about 0.1%
- Fix many questionable -bamout alignments where, because of a bad choice of Smith-Waterman parameters, deletions were preferred over single-base substitutions. (#4858)
  - Result is many fewer spurious indels in the -bamout output.
  - Introduced new SmithWaterman parameters affecting realignment of the reads to their best haplotype. This also changes some annotations that depend on the alignment, such as BaseQualityRankSum and ReadPositionRankSum. The changes are slight and make things more correct.
- Modify the behavior of (BaseGraph) getNextReferenceVertex for non-ref paths (#4889)
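To see how a choice of Smith-Waterman parameters can make a deletion score higher than a single-base substitution, consider a read whose last base disagrees with the haplotype. The sketch below scores the two candidate alignments by hand under affine gap penalties. The `SWParams` type, the `score_alignment` helper, and both parameter sets are hypothetical values chosen for illustration; they are not claimed to be the values used by GATK before or after this release.

```python
from typing import NamedTuple


class SWParams(NamedTuple):
    match: int
    mismatch: int
    gap_open: int    # cost of the first base in a gap
    gap_extend: int  # cost of each additional base in the same gap


def score_alignment(hap_row: str, read_row: str, p: SWParams) -> int:
    """Score two equal-length gapped rows ('-' marks a gap) with affine gap penalties."""
    if len(hap_row) != len(read_row):
        raise ValueError("rows must have equal length")
    score, in_gap = 0, False
    for h, r in zip(hap_row, read_row):
        if h == "-" or r == "-":
            score += p.gap_extend if in_gap else p.gap_open
            in_gap = True
        else:
            score += p.match if h == r else p.mismatch
            in_gap = False
    return score


# Hypothetical parameter sets (match, mismatch, gap_open, gap_extend).
gap_friendly = SWParams(200, -150, -260, -11)  # match + gap_open > mismatch
gap_averse = SWParams(10, -15, -30, -5)        # match + gap_open < mismatch

# Haplotype ends in ...ACGTG, read ends in ...ACGG.
substitution = ("ACGT", "ACGG")    # last read base aligned as a mismatch
deletion = ("ACGTG", "ACG-G")      # T deleted so the final G can match

for name, params in [("gap_friendly", gap_friendly), ("gap_averse", gap_averse)]:
    sub = score_alignment(*substitution, params)
    dele = score_alignment(*deletion, params)
    print(name, "substitution:", sub, "deletion:", dele)
```

With the first set the deletion alignment scores 540 against 450 for the substitution, so an aligner would report a spurious deletion. With the second set the substitution wins, 15 against 10. The deciding quantity in this toy case is whether the match score plus the gap-open penalty exceeds the mismatch penalty.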
FilterVariantTranches
- Rewrite VCF Tranche filtering in Java, with tests (#4800)
Engine
- StreamingPythonExecutor no longer uses timeouts or relies on prompt synchronization. (#4757)
- Allow concordance tools (AbstractConcordanceWalker) to use NIO for truth call set (#4905)
- Add pre- and post- apply variant transformer to VariantWalkerBase
MarkDuplicatesSpark
- Fixed a missing special case in MarkDuplicates ReadsKey code to better match current picard results (#4899)
- Reworked the keys for MarkDuplicatesSpark to be sufficient for grouping on their own. (#4878)
- Improve the error message for duplicate read name issues in MarkDuplicates (#4879)
Structural Variants
- Add tests for AssemblyContigWithFineTunedAlignments (#4961)
- Fix no index output for assembly bam file (#4945)
- Overhaul tests on assembly-based non-complex breakpoint and type inference code (#4835)
- Simple fix to remove trailing slash in GCS_SAVE_PATH to avoid double slashes in GCS_RESULTS_DIR (#4873)
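The trailing-slash fix in the last item addresses a common path-joining pitfall. The following is a minimal sketch of the idea in Python, not the script that was changed: the `results_dir` function, the bucket path, and the run name are hypothetical, and only the variable names GCS_SAVE_PATH and GCS_RESULTS_DIR come from the item above.

```python
def results_dir(gcs_save_path: str, run_name: str) -> str:
    """Join a save path and a run name, tolerating trailing slashes on the save path."""
    return gcs_save_path.rstrip("/") + "/" + run_name


# Hypothetical values standing in for GCS_SAVE_PATH and the derived GCS_RESULTS_DIR.
GCS_SAVE_PATH = "gs://example-bucket/sv-results/"
GCS_RESULTS_DIR = results_dir(GCS_SAVE_PATH, "run-001")
print(GCS_RESULTS_DIR)  # gs://example-bucket/sv-results/run-001
```

Without the `rstrip("/")` call, the joined path would contain a double slash after `sv-results`.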