
4.0.5.2

@cmnbroad released this on 29 Jun at 21:01

Highlights of this release include major Funcotator performance improvements on hg19/b37 inputs, a newly rewritten Java version of FilterVariantTranches, HaplotypeCaller -bamout improvements, and improved Python integration by eliminating timeouts.

As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/.

Funcotator Improvements

  • Improve handling of hg19/B37 references (#4586).
    • Fixed a performance bug involving excessive cache misses when querying data sources, resulting in major
      performance improvements on hg19/b37 data (approximately 30x faster with v1.4.20180615 of
      the standard Funcotator data sources) (#4586).
    • Automatically detect when b37 data are run against an hg19 data source, and convert contig names to be hg19-compliant.
    • All data sources for the hg19 reference are assumed to use hg19 contig names. User-created data
      sources must follow this convention.
    • Perform additional validation on input data to ensure that a given reference FASTA has a sequence
      dictionary that is a superset of the given VCF's sequence dictionary. This is a more stringent check than the one
      automatically performed by the GATK. It can be disabled with the --disable-sequence-dictionary-validation flag.
    • Released a new version of the data sources (1.4.20180615) to go with this release. This was necessary because the
      data sources had to be made consistent with hg19 (previously they used a mix of hg19 and b37 contig names).
    • Updated the minimum required data source version to the latest release.
    • Updated the getDbSNP.sh and createSqliteCosmicDb.sh data source scripts to preprocess those data sources
      to have hg19-compliant contig names.
    • Removed the --allow-hg19-gencode-b37-contig-matching flag.
    • Removed the --allow-hg19-gencode-b37-contig-matching-override flag.
  • User-defined transcripts were being used as a filter rather than as a priority order. The filtering step has been eliminated. Fixes #4918 (#4931)
  • Added custom MAF fields to MafOutputRenderer (#4917)
  • LocatableXsv data sources now produce at most 1 funcotation per allele pair. (#4936)
  • LocatableXsv data sources now provide the correct number of funcotations (#4915)
  • Preserve VCF fields in MAF output (#4872)
  • Fixed an error when spanning deletions overlap coding regions (#4881)
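The b37-to-hg19 contig renaming and the stricter sequence dictionary check from #4586 can be illustrated with a small Python sketch. It assumes b37 primary contigs are named 1-22, X, Y and MT, and hg19 primary contigs chr1-chr22, chrX, chrY and chrM. The function names are hypothetical; this is not Funcotator's Java implementation, and it does not handle alternate, random or decoy contigs.

```python
# Hypothetical sketch, not Funcotator's implementation.
B37_PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def b37_to_hg19_contig(name: str) -> str:
    """Rename a b37 primary contig to its hg19 name; other names pass through unchanged."""
    if name == "MT":
        # Renaming only: hg19 chrM and b37 MT are not the same mitochondrial sequence.
        return "chrM"
    if name in B37_PRIMARY:
        return "chr" + name
    return name


def looks_like_b37(contigs) -> bool:
    """Heuristic: unprefixed primary contig names ("1", "X", "MT") suggest b37 input."""
    return any(c in B37_PRIMARY for c in contigs)


def dictionary_problems(reference: dict, vcf: dict) -> list:
    """List VCF contigs missing from, or disagreeing in length with, the reference.

    Both arguments map contig name -> length. An empty result means the
    reference dictionary is a superset of the VCF dictionary.
    """
    problems = []
    for contig, length in vcf.items():
        if contig not in reference:
            problems.append(f"{contig}: missing from reference")
        elif reference[contig] != length:
            problems.append(f"{contig}: length {length} != reference {reference[contig]}")
    return problems
```

For example, a VCF whose contigs are 1 and MT would first be renamed to chr1 and chrM, and only then compared against the hg19 reference dictionary with dictionary_problems.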

HaplotypeCaller/Mutect2

  • Improvements to FilterMutectCalls: eliminates about 3% of all false positives in the DREAM data while reducing sensitivity by about 0.1%
  • Fixed many questionable -bamout alignments in which, because of a poor choice of Smith-Waterman parameters,
    deletions were preferred over single-base substitutions (#4858).
    The result is far fewer spurious indels in the -bamout output.
  • Introduced new Smith-Waterman parameters for realigning reads to their best haplotype. This
    also changes some annotations that depend on the alignment, such as BaseQualityRankSum and ReadPositionRankSum.
    The changes are slight and make these annotations more correct.
  • Modify the behavior of (BaseGraph) getNextReferenceVertex for non-ref paths (#4889)
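Why the Smith-Waterman parameters matter: when gaps are cheap relative to mismatches, an aligner scores a single-base substitution below the equivalent deletion-plus-insertion and picks the indels. The Python sketch below scores both alignments under two hypothetical parameter sets (not the values GATK uses), with a simple affine gap model in which the first base of a gap costs gap_open and each further base in the same gap costs gap_extend.

```python
from typing import NamedTuple


class SWParams(NamedTuple):
    match: int
    mismatch: int
    gap_open: int    # cost of the first base of a gap
    gap_extend: int  # cost of each additional base in the same gap


def score_alignment(top: str, bottom: str, p: SWParams) -> int:
    """Score a gapped pairwise alignment ('-' marks a gap) with affine gap costs."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    score = 0
    prev_gap_row = None  # which row had a gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("column with gaps in both rows")
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            # Continuing a gap in the same row extends it; otherwise a new gap opens.
            score += p.gap_extend if prev_gap_row == row else p.gap_open
            prev_gap_row = row
        else:
            score += p.match if a == b else p.mismatch
            prev_gap_row = None
    return score


# The same difference (G vs T) expressed two ways.
SUBSTITUTION = ("ACGTA", "ACTTA")    # one mismatch
DEL_PLUS_INS = ("ACG-TA", "AC-TTA")  # one deletion plus one insertion

# Hypothetical parameter sets for illustration only.
gap_friendly = SWParams(match=10, mismatch=-100, gap_open=-20, gap_extend=-5)
balanced = SWParams(match=10, mismatch=-15, gap_open=-30, gap_extend=-5)
```

With gap_friendly the substitution scores -60 and the deletion-plus-insertion scores 0, so the indel representation wins; with balanced the substitution scores 25 against -20 and is preferred.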

FilterVariantTranches

  • Rewrote VCF tranche filtering in Java, with tests (#4800)
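Tranche filtering turns a target truth sensitivity into a score cutoff. The Python sketch below shows that idea under simplifying assumptions: each variant carries a single score annotation (for example, a CNN score), truth_scores are the scores of variants at truth-resource sites, and SNPs and indels are not treated separately. The function names and the TRANCHE_80 filter label are hypothetical; this is not the Java implementation from #4800.

```python
import math


def tranche_threshold(truth_scores, sensitivity_pct: float) -> float:
    """Highest score cutoff that still retains at least sensitivity_pct % of truth-site scores."""
    if not truth_scores:
        raise ValueError("need at least one truth-site score")
    ranked = sorted(truth_scores, reverse=True)
    keep = math.ceil(len(ranked) * sensitivity_pct / 100.0)
    keep = max(1, min(keep, len(ranked)))  # clamp to a valid count
    return ranked[keep - 1]


def apply_tranche(variants, threshold: float, filter_name: str):
    """Return (id, FILTER) pairs: PASS at or above the threshold, else the tranche filter name."""
    return [(vid, "PASS" if score >= threshold else filter_name) for vid, score in variants]
```

For instance, with truth-site scores 0.9, 0.8, 0.7, 0.6 and 0.1, an 80% tranche keeps the top four truth sites, so the cutoff is 0.6.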

Engine

  • StreamingPythonExecutor no longer uses timeouts or relies on prompt synchronization. (#4757)
  • Allow concordance tools (AbstractConcordanceWalker) to use NIO for truth call set (#4905)
  • Add pre- and post-apply variant transformers to VariantWalkerBase
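One general way to synchronize with a child Python process without timeouts or prompt parsing is for the child to write an acknowledgment after every command while the parent blocks until it arrives. The Python sketch below illustrates that pattern. The child script, the ack/nck tokens and the AckedPythonProcess class are assumptions made for illustration, not GATK's Java StreamingPythonExecutor. Commands must be single lines and must not print to stdout, because this sketch shares stdout with the acknowledgments; a dedicated channel (such as a named pipe) avoids that limitation.

```python
import subprocess
import sys

# Hypothetical child program: executes one line of Python per stdin line, then
# writes "ack" (success) or "nck" (exception) on its own line and flushes.
CHILD_SRC = r"""
import sys
ns = {}
while True:
    line = sys.stdin.readline()
    if not line:
        break  # EOF: parent closed stdin
    try:
        exec(line, ns)
        reply = "ack"
    except Exception:
        reply = "nck"
    sys.stdout.write(reply + "\n")
    sys.stdout.flush()
"""


class AckedPythonProcess:
    """Parent side: each send() blocks until the child acknowledges; no timeout is used."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            [sys.executable, "-u", "-c", CHILD_SRC],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def send(self, command: str) -> bool:
        """Send one single-line command; return True on ack, False on nck."""
        assert self.proc.stdin is not None and self.proc.stdout is not None
        self.proc.stdin.write(command.rstrip("\n") + "\n")
        self.proc.stdin.flush()
        reply = self.proc.stdout.readline().strip()  # blocks until the child answers
        if reply not in ("ack", "nck"):
            raise RuntimeError(f"unexpected reply from child: {reply!r}")
        return reply == "ack"

    def close(self) -> int:
        assert self.proc.stdin is not None
        self.proc.stdin.close()  # EOF tells the child loop to exit
        return self.proc.wait()
```

Because the parent waits for an explicit acknowledgment, a long-running command simply takes longer instead of tripping a timeout, and a crashed child shows up as an unexpected (empty) reply.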

MarkDuplicatesSpark

  • Fixed a missing special case in the MarkDuplicates ReadsKey code to better match current Picard results (#4899)
  • Reworked the keys for MarkDuplicatesSpark so that they are sufficient for grouping on their own (#4878)
  • Improved the error message for MarkDuplicates issues caused by duplicate read names (#4879)
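Roughly, a key that is sufficient for grouping on its own means two read pairs belong to the same duplicate set exactly when their keys are equal, so no secondary comparison is needed after grouping. The Python sketch below assumes a simplified key (library plus reference index, unclipped 5' position and strand of each mate) and picks the representative by a precomputed sum of base qualities, similar in spirit to Picard's SUM_OF_BASE_QUALITIES scoring strategy. PairKey and mark_duplicates are hypothetical names; unpaired fragments, optical duplicates and pair-orientation normalization are ignored.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PairKey:
    """Hypothetical grouping key: equal keys mean the pairs are duplicates of each other."""
    library: str
    ref_index1: int
    unclipped_5prime1: int
    reverse1: bool
    ref_index2: int
    unclipped_5prime2: int
    reverse2: bool


def mark_duplicates(pairs):
    """pairs: iterable of (read_name, PairKey, sum_of_base_qualities).

    Returns the set of read names marked as duplicates.
    """
    groups = defaultdict(list)
    for name, key, score in pairs:
        groups[key].append((score, name))  # grouping uses the key alone
    duplicates = set()
    for members in groups.values():
        best = max(members)  # highest score is kept; ties fall back to read name
        duplicates.update(name for score, name in members if (score, name) != best)
    return duplicates
```

Because PairKey is frozen, it is hashable and can be used directly as a grouping key, which mirrors the idea that the key alone determines the duplicate set.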

Structural Variants

  • Add tests for AssemblyContigWithFineTunedAlignments (#4961)
  • Fix missing index output for the assembly BAM file (#4945)
  • Overhaul tests on assembly-based non-complex breakpoint and type inference code (#4835)
  • Simple fix to remove trailing slash in GCS_SAVE_PATH to avoid double slashes in GCS_RESULTS_DIR (#4873)
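The trailing-slash fix amounts to normalizing the prefix before joining. In shell, the parameter expansion ${GCS_SAVE_PATH%/} removes one trailing slash; the Python sketch below strips any number of them. The results_dir function and its run_name argument are hypothetical, not the script's actual code.

```python
def results_dir(gcs_save_path: str, run_name: str) -> str:
    """Join a save path and a run name with exactly one '/' between them."""
    # Strip trailing slashes on the prefix (and leading ones on the name) so that
    # "gs://bucket/sv/" and "gs://bucket/sv" produce the same results directory.
    return gcs_save_path.rstrip("/") + "/" + run_name.lstrip("/")
```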

Misc

  • Upgraded Picard 2.18.2 -> 2.18.7 (#4949)
  • Update htsjdk 2.15.1 -> 2.16.0 (#4914)
  • Added support to PrintReadsSpark for non-coordinate-sorted BAMs (#4853)
  • Added a --sort-order option to SortSamSpark (#4545)
  • Increased boot disk size on GATK tasks in the M2 WDL to accommodate the 4.0.5.0 docker image (#4877)