4.beta.4
Pre-release
Highlights of this release include fixes to the GATK4 HaplotypeCaller to bring it closer to the output of the GATK3 HaplotypeCaller (although many of these fixes still need to be applied to HaplotypeCallerSpark), fixes for longstanding indexing and CRAM-related bugs in htsjdk, bash tab completion support for GATK commands, and many improvements to Mutect2 and the SV tools.
A docker image for this release can be found in the broadinstitute/gatk repository on dockerhub. Within the image, cd into /gatk, then run gatk-launch commands as usual.
Note: Due to our current dependency on a snapshot of google-cloud-java, this release cannot be published to maven central.
Changes in this release:
- HaplotypeCaller: a number of important updates and fixes to bring it closer to GATK 3.x's output (most of these fixes apply only to HaplotypeCaller, not HaplotypeCallerSpark) (#3519)
  - reduce memory usage of the AssemblyRegion traversal by an order of magnitude
  - create empty pileup objects for uncovered loci internally (fixes occasional gaps between GVCF blocks as well as some calling artifacts)
  - when determining active regions, only consider loci within the user's intervals
  - port some additional changes to the GATK 3.x HaplotypeCaller to GATK4
  - fix bug with handling of the MQ annotation
- Added bash tab completion support for GATK commands (#3424)
- Updated to Intel GKL 0.5.8, which fixes a bug in AVX detection that was behaving incorrectly on some AMD systems (#3513)
- Upgrade htsjdk to 2.11.0-4-g958dc6e-SNAPSHOT to pick up an important VCF header performance fix. (#3504)
- Updated google-cloud-nio dependency to 0.20.4-alpha-20170727.190814-1:shaded (#3373)
- Fix tabix indexing bugs in htsjdk, and reenable the IndexFeatureFile tool (#3425)
- Fix longstanding issue with CRAM MD5 slice calculation in htsjdk (#3430)
- Started publishing nightly builds
- Performance improvements to allow MD+BQSR+HC Spark pipeline to scale to a full genome (#3106)
- Eliminate expensive toString() call in GenotypeGVCFs (#3478)
- ValidateVariants gvcf memory optimization (#3445)
- Simplified Mutect2 annotations (#3351)
- Fix MuTect2 INFO field types in the VCF header (#3422)
- SV tools: fixed possibility of a negative fragment length that shouldn't have happened (#3463)
- Added command line argument for IntervalMerging based on GATK3 (#3254)
- Added 'nio_max_retries' option as a command line accessible option for GATK tools (#3328)
- Fix aligned PathSeq input getting filtered by WellformedReadFilter (#3453)
- Patch the ReferenceBases annotation to handle the case where no reference is present (#3299)
- Honor index/MD5 creation for HaplotypeCaller/Mutect2 bamouts. (#3374)
- Fix SV pipeline default init script handling (#3467)
- SV tools: improve the test bam (#3455)
- SV tools: improved filtering for smallish indels (#3376)
- Extends BwaMemImageSingleton into a cache, BwaMemImageCache, that can… (#3359)
- Try installing R packages from multiple CRAN repos in case some are down (#3451)
- Run Oncotator (optional) in the CNV case WDL. (#3408)
- Add option to run Spark tests only (#3377)
- Added a .dockerignore file (#3418)
- Code cleanup in the sv discovery package (#3361) and fixes #3224
- Implement PathSeq taxon hit scoring in Spark (#3406)
- Add option to skip pre-Bwa repartitioning in PSFilter (#3405)
- Update the GQ after PLs get subset (#3409)
- Removed the explicit System.exit(0) from Main (#3400)
- build_docker.sh can run tests again #3191 #3160 (#3323)
- Minor doc fixes #3173 (#3332)
- Use ReadClipper in BaseQualityClipReadTransformer (#3388)
- PathSeq adapter trimming and simple repeat masking (#3354)
- Add scripts to manage SV spark jobs and copy result (#3370)
- Output empty VQSLOD tranches in scatterTranches mode if no variant has VQSLOD high enough for requested threshold (#3397)
- Option to filter short pathogen reference contigs (#3355)
- Rewrote hapmap autoval wdl (#3379)
- Fixed contamination calculation, added error bars to output (#3385)
- Wrote wdl for Mutect panel of normals (#3386)
- Turn off tranches plots if no output Rscript is specified (for annotation plots) (#3383)
- Mutect2 wdls output the contamination (#3375)
- Increased maximum copy-ratio variance slice-sampling bound. (#3378)
- Replace --allowMissingData with --errorIfMissingData (gives opposite default behavior as previously) and print NA for null object in VariantsToTable (#3190)
- docs for proposed tumor-in-normal tool (#3264)
- Fixed the git version for the output jar on docker automatic builds (#3496)
- Use correct logger class in MathUtils (#3479)
- Make ShardBoundaryShard implement Serializable (#3245)
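
To illustrate why the "Update the GQ after PLs get subset (#3409)" change matters, here is a minimal Python sketch; it is not GATK's code. It assumes diploid genotypes, the VCF ordering of the PL array, and a GQ defined as the gap between the two lowest PLs, capped at 99. The function names and the example PL values are hypothetical.

```python
MAX_GQ = 99  # assumed cap on genotype quality


def pl_index(j, k):
    """Index of diploid genotype j/k (j <= k) in a VCF-ordered PL array."""
    return k * (k + 1) // 2 + j


def subset_pls(pls, kept_alleles):
    """Keep only PLs for genotypes built from kept_alleles, then renormalize.

    kept_alleles holds original allele indices (0 = reference).
    The result is shifted so that the most likely genotype has PL 0.
    """
    kept = sorted(kept_alleles)
    subset = []
    # Enumerate genotypes in VCF order over the new allele indices.
    for new_k in range(len(kept)):
        for new_j in range(new_k + 1):
            subset.append(pls[pl_index(kept[new_j], kept[new_k])])
    best = min(subset)
    return [pl - best for pl in subset]


def gq_from_pls(pls):
    """GQ as the difference between the two lowest PLs, capped at MAX_GQ."""
    lowest, second = sorted(pls)[:2]
    return min(second - lowest, MAX_GQ)


# Hypothetical PLs for alleles 0, 1, 2 in the order 0/0, 0/1, 1/1, 0/2, 1/2, 2/2.
pls = [60, 0, 45, 5, 50, 80]
gq_before = gq_from_pls(pls)        # 0/1 vs 0/2 are close, so GQ is low
pls_after = subset_pls(pls, [0, 1])  # drop allele 2
gq_after = gq_from_pls(pls_after)    # the close competitor 0/2 is gone
print(gq_before, pls_after, gq_after)
```

In this made-up example the second-best genotype before subsetting involves the dropped allele, so a GQ carried over unchanged would understate the confidence of the remaining call. Recomputing it from the subset PLs avoids that mismatch.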