
4.0.9.0

droazen released this 20 Sep 17:11
6e352bb

Highlights of this release include important fixes and improvements to the HaplotypeCaller, in particular support for genotyping spanning deletions and a fix to the reference confidence calculation around indels. This release also brings support for "Requester Pays" GCS (Google Cloud Storage) buckets, fasta.gz support for the -R/--reference argument, a port of LeftAlignAndTrimVariants from GATK3, a new tool, FuncotatorDataSourceDownloader, for downloading Funcotator data sources, and bug fixes to Mutect2, VariantRecalibrator, and SelectVariants.

As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/

  • HaplotypeCaller

    • Fixed the reference confidence calculation upstream of indels (#5172)
      • Improve hom-ref GQs near indels in GVCFs. Also consider bases on either side of indels informative if local assembly has been performed.
      • The previous behavior generated some PL=0,0,0 no-calls because the CIGAR of reads containing indels wasn't taken into account when determining which reads were informative for the indel reference confidence model. The local realignment wasn't being used inside the active region previously either, which has been fixed. A related change considers bases on either side of indels informative if local assembly has been performed (but not during active region detection). Both result in far fewer 0,0,0 calls. Unfortunately there are still some 0,0,X homRef calls related to #5171.
    • Make HaplotypeCaller genotype and output spanning deletions (#4963)
      • Modifies HaplotypeCaller so that it can output and genotype spanning deletion alleles represented by the * allele.
      • Fixes #2960
      • Previously, the output of HaplotypeCaller would not include spanning deletion alleles when run in single sample VCF mode or in genotype given alleles mode, even when that genotype would be more appropriate. In the joint calling workflow GenotypeGVCFs adds genotypes for spanning deletions, although the input likelihoods will not be broken out to specifically account for spanning deletion alleles.
    • Simplify HaplotypeBAMWriter code. #944 (#5122)
  • Mutect2

    • Mutect2 now emits DP values in the FORMAT field (#5185)
    • Add --get-af-from-ad option to recalculate the allele fraction based on AD instead of the Bayesian estimate (#5118)
      • Recommended for mitochondrial applications
    • Fixed a StringIndexOutOfBoundsException crash in the ReferenceBases annotation when a variant is within 10 base pairs of the end of a chromosome (#5151)
    • Restore base quality filter code that got removed unintentionally in #4895. (#5123)
    • Remove the extra space in the MutectVersion header line (previously written as Mutect Version) (#5184)
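The --get-af-from-ad option described above derives the allele fraction from AD rather than from the Bayesian estimate. The arithmetic can be sketched as follows, assuming AD lists the reference depth first and then one depth per alternate allele (the VCF convention). The function name is hypothetical and this is an illustration, not Mutect2's own code:

```python
def allele_fractions_from_ad(ad):
    """Return one allele fraction per ALT allele from an AD list [ref, alt1, alt2, ...]."""
    total = sum(ad)
    if total == 0:
        # No informative reads: the fraction is undefined, so report 0.0 (an assumption).
        return [0.0] * (len(ad) - 1)
    # Each ALT fraction is that allele's depth divided by the summed allele depths.
    return [alt_depth / total for alt_depth in ad[1:]]


print(allele_fractions_from_ad([90, 10]))  # ref depth 90, alt depth 10
```

For low-frequency calls, such as those in mitochondrial data, this direct ratio reflects the observed read counts without any prior pulling the estimate.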
  • Added support for "Requester Pays" GCS (Google Cloud Storage) buckets via new --gcs-project-for-requester-pays argument (#5140)

  • Added fasta.gz support to the -R/--reference argument in walker tools (#5120)

  • Added GCS/NIO support to the --tmp-dir argument (#4469)

  • Upgraded google-cloud-java to the official 0.62.0 release, and moved off our custom fork of the library. This release includes the retry for transient 502 errors that we added to our fork in GATK 4.0.8.0 (#5194) (#5135)

  • Ported the LeftAlignAndTrimVariants tool from GATK3 (#5144)

  • VariantRecalibrator: the serialized model now sets annotation order (#3655)

    • This addresses a problem where serialized GMMs for VQSR assumed that the annotation order would be the same between the commands that generated them and the commands that used them. VQSR no longer depends on the commandline order of the annotations.
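The idea behind the VariantRecalibrator fix above, storing the annotation order with the model and mapping inputs onto it by name, can be sketched like this. The function and variable names are hypothetical, and the sketch only illustrates the principle rather than the actual VQSR serialization:

```python
def reorder_to_model(model_annotations, requested_annotations, values):
    """Reorder `values` (given in command-line order) into the order stored with the model."""
    if set(model_annotations) != set(requested_annotations):
        raise ValueError("annotations do not match those stored with the model")
    # Look values up by annotation name so command-line order no longer matters.
    by_name = dict(zip(requested_annotations, values))
    return [by_name[name] for name in model_annotations]


model_order = ["QD", "FS", "MQ"]  # order saved alongside the model
print(reorder_to_model(model_order, ["MQ", "QD", "FS"], [60.0, 12.5, 1.2]))
```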
  • SelectVariants: Drop sites with the * allele as the only ALT when running with --exclude-non-variants (#5129)
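The SelectVariants rule above can be pictured as a predicate over a site's ALT alleles. This is a minimal sketch with a hypothetical function name; it covers only the * allele check, and the real --exclude-non-variants option applies its other criteria as well:

```python
def has_real_alt(alt_alleles):
    """True if at least one ALT is something other than the spanning-deletion allele '*'."""
    return any(allele != "*" for allele in alt_alleles)


sites = {"site_a": ["*"], "site_b": ["T", "*"], "site_c": ["G"]}
# Keep only sites that carry an ALT allele besides '*'.
kept = [name for name, alts in sites.items() if has_real_alt(alts)]
print(kept)
```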

  • Funcotator:

    • Created a new FuncotatorDataSourceDownloader tool to download data sources. (#5150)
    • Add an experimental FilterFuncotations tool (#4991)
    • Updated COSMIC to annotate protein change strings with their counts. (#5181)
    • Fix INDEL start/stop position and alleles for VCF gencode output. (#5131)
    • Get datasource version from a manifest file instead of the README (#5149)
    • Extract a new FuncotatorEngine to make it easier to write additional tools in the future that leverage Funcotator's annotation engine (#5134)
    • Handle character encoding error cases. (#5124)
  • CNNScoreVariants:

    • Add WDLs and JSONs to run CNNScoreVariants in a single-sample workflow (#4774)
    • Added --python-profile argument to enable Python profiling. (#4953)
  • CNV tools:

    • Produce an IGV-compatible seg file alongside the copy ratio calls in CallCopyRatioSegments (#5115)
    • Added optional mappability and segmental-duplication annotation to AnnotateIntervals. (#5162)
    • Improvements and refactoring of the Nucleotide class (#4846)
  • SV tools:

    • Bug fix to read name mangling in ExtractOriginalAlignmentRecordsByNameSpark (#5107)
    • Added an InsertSizeDistribution class to represent expected insert-size distribution (normal and log-normal distributed) parameterized by insert size mean and stddev (#4827)
    • Added documentation clarification and additional validation to SVInterval (#5157)
    • Test and utils clean up (#5116)
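For the log-normal case of the insert-size distribution mentioned above, a mean and standard deviation can be converted to the distribution's log-space parameters by moment matching. The sketch below assumes the mean and stddev describe the insert size itself rather than its logarithm. The function names are hypothetical and the code is not taken from the InsertSizeDistribution class:

```python
import math


def lognormal_params(mean, stddev):
    """Return (mu, sigma) of the log-normal whose mean and stddev match the inputs."""
    if mean <= 0 or stddev <= 0:
        raise ValueError("mean and stddev must be positive")
    sigma_sq = math.log(1.0 + (stddev / mean) ** 2)
    mu = math.log(mean) - sigma_sq / 2.0
    return mu, math.sqrt(sigma_sq)


def lognormal_mean_stddev(mu, sigma):
    """Inverse mapping: mean and stddev of a log-normal with parameters (mu, sigma)."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    variance = (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)
    return mean, math.sqrt(variance)


mu, sigma = lognormal_params(300.0, 50.0)
print(lognormal_mean_stddev(mu, sigma))  # recovers the inputs up to rounding
```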
  • MarkDuplicatesSpark:

    • Switched MarkDuplicatesSpark tile-parsing code to use shorts in order to match Picard (#5165)
    • Added better error messages around missing read groups in MarkDuplicatesSpark (#5177)
  • Clone read base qualities rather than reference them directly in the read clipper code to prevent unsafe array operations (#4926)

  • Fix three bugs in the AlignmentUtils class (#3494)

    • The treatment of D-over-D in function applyCigarToCigar() was backward.
    • In function createReadAlignedToRef() the read start position passed to the leftAlignIndel() call was incorrect if the haplotype has an indel relative to reference.
    • When the leftAlignIndel() call drops any leading D operator in the result cigar, the read start position needs to be adjusted accordingly.
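The third fix above follows from the fact that a deletion consumes reference bases: if leading D operators are removed from a CIGAR, the alignment start has to move right by their total length. Here is a minimal sketch of that adjustment, using a hypothetical helper and a CIGAR represented as (length, operator) pairs rather than the AlignmentUtils code itself:

```python
def drop_leading_deletions(start, cigar):
    """Strip leading D operators and advance the start by the reference bases they covered.

    cigar is a list of (length, operator) pairs, e.g. [(2, "D"), (10, "M")].
    """
    shift = 0
    index = 0
    while index < len(cigar) and cigar[index][1] == "D":
        shift += cigar[index][0]  # each deleted base consumes one reference position
        index += 1
    return start + shift, cigar[index:]


print(drop_leading_deletions(100, [(2, "D"), (10, "M"), (1, "I"), (5, "M")]))
```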
  • Test infrastructure improvements:

    • Split out gatk-testUtils as a separate artifact in our build system (#5112)
    • Skip push builds if there is a pull request (cuts the total number of Travis builds roughly in half) (#5156)
    • We now share the test settings between the main build and the docker tests (#5155)
  • Documented use of --temp-dir with GenomicsDBImport. (#5047)

  • Deleted obsolete experimental tool MarkDuplicatesGATK in favor of MarkDuplicatesSpark (#5166)

  • Deleted obsolete experimental tool BaseRecalibratorSparkSharded (#5192)

  • Upgraded htsjdk to version 2.16.1 (#5168)

  • Upgraded Picard to version 2.18.13. (#5173)