Daniel Cameron edited this page Jan 16, 2020 · 5 revisions

gridss_somatic_filter.R performs a number of filtering steps.

Annotations

SV phasing

The LOCAL_LINKED_BY and REMOTE_LINKED_BY fields contain identifiers indicating structural variant phasing. Breakends sharing an identifier have been determined to be phased. The following prefixes are currently used:

prefix | phasing | meaning
asm | definitely cis | An assembly contig spans across both breakpoints.
tra | definitely cis | Transitive breakpoint found. The presence of A-B, B-C, and an imprecise A-C breakpoint call indicates that A-B and B-C are phased and that the A-C call is due to read pairs spanning across B.
bpbp | likely cis | Nearby breakpoints in opposite orientations, consistent with a translocation/templated insertion of distal sequence.
bpbe | likely cis | Nearby breakpoint and breakend in opposite orientations. Likely to be a templated insertion.
inv | likely cis | Breakpoints appear to be a simple inversion.

For breakpoints more than 1000bp apart, the SC field can be used to determine if two nearby events are phased trans. If the (homology-adjusted) width of the CIGAR encoding the interval of support spans past the position of the adjacent breakpoint, then the events are not adjacent on the same chromatid.

Complicating matters further, if a breakpoint has been amplified and multiple copies exist, it could be simultaneously cis and trans with another breakpoint if only some of the copies are adjacent.

Tumour Allele Fraction (TAF)

A TAF field is added that is an estimate of the average variant allele fraction across all tumour samples. This field is not purity adjusted.

VCF Filters

A description for each VCF filter can be found in the VCF header of the somatic filtering script.

  • If a reference genome is not supplied, the small.replacement.fp filter will not be applied because, without knowing the reference sequence, it is not possible to determine whether the replacement sequence corresponds to a simple inversion.

Transitive call reduction

In cancer genomes, SVs frequently occur in close proximity (less than 500bp). In such cases, read pairs can span across one or more short inner fragments. Transitive call reduction filters out the transitive calls and annotates the spanned calls.

For example, if DNA segments A - B - C - D are connected and B and C are small, transitive read pair support will be present for A-C, A-D, and B-D, and imprecise variant calls will be made for these. In this example, the A-C, A-D, and B-D variants will be filtered, and A-B, B-C, and C-D will be annotated as linked by the filtered transitive calls.
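
The A - B - C - D example above can be sketched in Python as follows. This is an illustration only, not the logic used by gridss_somatic_filter.R: the function name, the representation of calls as pairs of segment names, and the assumption that the segment order is already known are all hypothetical.

```python
def transitive_reduction(chain, calls):
    """Separate direct calls from transitive calls along a known segment chain.

    chain: segment names in the order they are joined, e.g. ["A", "B", "C", "D"].
    calls: list of (left, right) segment pairs that were called.
    Returns (kept, links): the calls joining adjacent segments, and a dict
    mapping each kept call to the transitive calls that span it.
    """
    index = {name: i for i, name in enumerate(chain)}
    # Calls between adjacent segments are kept (hypothetical simplification).
    kept = [c for c in calls if index[c[1]] - index[c[0]] == 1]
    # Calls that skip over one or more inner segments are treated as transitive.
    transitive = [c for c in calls if index[c[1]] - index[c[0]] > 1]
    links = {k: [] for k in kept}
    for left, right in transitive:
        for k in kept:
            # A kept call is spanned when it lies inside the transitive call.
            if index[left] <= index[k[0]] and index[k[1]] <= index[right]:
                links[k].append((left, right))
    return kept, links


chain = ["A", "B", "C", "D"]
calls = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("A", "D"), ("B", "D")]
kept, links = transitive_reduction(chain, calls)
```

Here `kept` holds A-B, B-C, and C-D, and `links` records which of the filtered A-C, A-D, and B-D calls span each of them.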

Equivalent variant

In some circumstances, a single variant may be incorrectly reported as multiple variants. For example, a breakpoint in which one side occurs in low mappability sequence may have a single breakend variant call with a high QUAL score, and a breakpoint call to one of the candidate locations with a low mappability score (typically these are due to the aligner overestimating the mapq of some reads). To adjust for this, variants sharing a breakend (within 5bp) whose sequences have an edit distance of less than 0.1 per base will be annotated in the LOCAL_LINKED_BY field with an eqv prefix.
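
The equivalence test can be sketched as below. The 5bp and 0.1 thresholds come from the description above. Everything else is an assumption made for illustration: the function names, the use of Levenshtein distance, normalising by the longer of the two sequences, and treating a pair of empty sequences as not comparable. The sketch also assumes both breakends are already known to be on the same chromosome.

```python
def edit_distance(a, b):
    """Levenshtein distance using the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def is_equivalent(pos_a, seq_a, pos_b, seq_b,
                  max_distance_bp=5, max_edit_per_base=0.1):
    """Return True when two breakend calls look like the same event."""
    if abs(pos_a - pos_b) > max_distance_bp:
        return False
    longest = max(len(seq_a), len(seq_b))
    if longest == 0:
        return False  # assumption: nothing to compare, so do not link
    return edit_distance(seq_a, seq_b) / longest < max_edit_per_base
```

For instance, two 20bp breakend sequences that differ at a single base and sit 3bp apart would be treated as equivalent under these assumptions, while the same sequences 10bp apart would not.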

As breakpoint sequence determination requires a reference genome, this step is not performed if a reference genome is not supplied.

Assembly-based phasing

If a GRIDSS assembly breakend spans across multiple structural variants, these variants can be phased as cis. Assembly-phased variants are annotated with an asm prefix in the LOCAL_LINKED_BY and REMOTE_LINKED_BY fields.

Event-based linkage

Events occurring nearby can sometimes be linked according to known variant types. The following event type linkages are annotated:

  • bebe: two adjacent breakends. Indicative of a simple insertion of non-reference or repetitive sequence. Common for LINE insertions.
  • bebp: templated insertion in which only one side can be unambiguously placed. Common for LINE insertions due to the poly(A) tail causing assembly truncation and a single breakend variant on one side.
  • inv: simple inversion
  • dsb: likely double-stranded break

Variants whose supporting fragment counts differ by more than gridss.min_rescue_portion will not be linked unless AAAAAA occurs in the inserted sequence of either variant.
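
One way to read this rule is sketched below. It assumes that "differ by more than gridss.min_rescue_portion" means the smaller supporting fragment count, divided by the larger, falls below the configured portion. That interpretation and the function name `can_link` are assumptions, not taken from the script.

```python
MIN_RESCUE_PORTION = 0.25  # default value of gridss.min_rescue_portion


def can_link(support_a, support_b, ins_seq_a="", ins_seq_b="",
             min_portion=MIN_RESCUE_PORTION):
    """Decide whether two nearby variants may be linked as a single event."""
    # Exemption from the text: AAAAAA in either inserted sequence.
    if "AAAAAA" in ins_seq_a or "AAAAAA" in ins_seq_b:
        return True
    larger = max(support_a, support_b)
    if larger == 0:
        return False  # assumption: no support on either side, nothing to link
    # Assumed reading: the weaker variant needs at least min_portion of the
    # supporting fragments of the stronger variant.
    return min(support_a, support_b) / larger >= min_portion
```

Under this reading, variants supported by 40 and 10 fragments can be linked, while 40 and 9 cannot unless one of the inserted sequences contains AAAAAA.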

In some cases, a variant will be linked via multiple mechanisms or variants. In such cases, only the linkage to the highest QUAL event will be kept.

Finally, event links to variants that are PON filtered are removed.
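
The last two steps can be sketched as follows, assuming they are applied in the order described: the highest QUAL link is chosen first, and links to PON-filtered variants are dropped afterwards. The tuple layout and the function name are hypothetical.

```python
def resolve_links(candidate_links, pon_filtered):
    """Keep one link per variant, then drop links to PON-filtered partners.

    candidate_links: list of (variant_id, partner_id, partner_qual) tuples.
    pon_filtered: set of variant ids that carry the PON filter.
    Returns a dict mapping variant_id to the partner_id it stays linked to.
    """
    best = {}
    for variant, partner, qual in candidate_links:
        # Keep only the link to the highest QUAL partner.
        if variant not in best or qual > best[variant][1]:
            best[variant] = (partner, qual)
    # Remove links whose remaining partner is PON filtered.
    return {v: p for v, (p, _) in best.items() if p not in pon_filtered}


links = [("v1", "v2", 500.0), ("v1", "v3", 900.0), ("v4", "v5", 400.0)]
resolved = resolve_links(links, pon_filtered={"v5"})
```

In this example v1 stays linked to v3 only, and v4 loses its link because v5 is PON filtered.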

Variant rescue

Low quality variants are rescued and included in the high confidence somatic call set if they are linked to a variant included in the high confidence call set by a mechanism other than equivalence (eqv).
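
A minimal sketch of the rescue rule, assuming a single pass in which rescued variants do not go on to rescue further variants (the text does not say whether rescue is applied repeatedly). The data layout and the function name are hypothetical.

```python
def rescue(high_confidence, links):
    """Return the low quality variants rescued by a high confidence partner.

    high_confidence: set of variant ids already in the high confidence set.
    links: list of (variant_id, partner_id, prefix) tuples, where prefix is
           the linkage type such as "asm", "inv", "dsb" or "eqv".
    """
    rescued = set()
    for variant, partner, prefix in links:
        if variant in high_confidence:
            continue  # already reported, nothing to rescue
        # Equivalence links do not rescue: they describe the same event twice.
        if partner in high_confidence and prefix != "eqv":
            rescued.add(variant)
    return rescued


links = [("low1", "hi1", "asm"), ("low2", "hi1", "eqv"), ("low3", "low1", "inv")]
rescued = rescue({"hi1"}, links)
```

Only low1 is rescued here: low2 is linked by equivalence alone, and low3 is linked to a variant that was not in the original high confidence set.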

Dangling breakends

If one breakend in a breakpoint is filtered, the other breakend is also filtered.
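
In code, this step amounts to keeping a breakend only when its partner also passes. The sketch below assumes single breakends have no partner entry and are left untouched. The names are hypothetical.

```python
def remove_dangling(passing, partner_of):
    """Drop breakends whose partner breakend has been filtered.

    passing: set of breakend ids that passed all earlier filters.
    partner_of: dict mapping a breakend id to its partner breakend id.
                Single breakends are assumed to have no entry.
    """
    return {b for b in passing
            if b not in partner_of or partner_of[b] in passing}


partner_of = {"bp1o": "bp1h", "bp1h": "bp1o", "bp2o": "bp2h", "bp2h": "bp2o"}
remaining = remove_dangling({"bp1o", "bp1h", "bp2o", "be3"}, partner_of)
```

Here bp2o is removed because its partner bp2h was filtered, while the single breakend be3 is kept.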

Configuration settings

The somatic filtering script uses a number of configuration settings from gridss_config.R.

gridss.short_event_size_threshold = 1000

Distance between breakends used to define a small event. Distance is defined in terms of the nominal position (i.e. the middle of any range of uncertainty).

gridss.allowable_normal_contamination = 0.03

Allowable level of contamination of tumour reads in the normal. The default is 3% (0.03).

Note that we have found that a small amount of flow cell cross-contamination occurs on Illumina sequencers, so a few reads from amplified tumour regions can be seen in all samples on the same sequencing run.

gridss.min_normal_depth = 8

Minimum depth across the breakend in the matched normal.

gridss.min_direct_read_support

Minimum number of reads providing direct support for the variant.

gridss.max_homology_length = 50

Maximum length of an exact sequence homology across a breakpoint.

gridss.max_inexact_homology_length = 50

Maximum length of an inexact sequence homology across a breakpoint.

gridss.max_allowable_short_event_strand_bias = 0.95

Maximum allowable strand bias in the soft clipped/split read support for short events.

gridss.min_qual = 350

Minimum QUAL score required to report a breakpoint

gridss.single_breakend_multiplier = 1000/350

Minimum QUAL score multiplier required for single breakend calling.
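
The arithmetic behind the two defaults can be sketched as below, assuming the multiplier is applied to gridss.min_qual to obtain the threshold for single breakends. That is a reading of the setting names rather than something confirmed by the script, and `required_qual` is a hypothetical helper.

```python
MIN_QUAL = 350                           # default of gridss.min_qual
SINGLE_BREAKEND_MULTIPLIER = 1000 / 350  # default of gridss.single_breakend_multiplier


def required_qual(is_single_breakend):
    """Return the assumed minimum QUAL score for a call to be reported."""
    if is_single_breakend:
        # Single breakends need proportionally more evidence.
        return MIN_QUAL * SINGLE_BREAKEND_MULTIPLIER
    return MIN_QUAL
```

With the defaults, this works out to a threshold of approximately 1000 for single breakends, compared with 350 for breakpoints.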

gridss.min_af = 0.005

Minimum tumour allele fraction

gridss.pon.min_samples = 1

Minimum number of panel of normal samples required to filter a variant

gridss.dsb.maxgap=35

Maximum gap (in either direction) between breakends to annotate a pair of breakpoints as a double-stranded break.

gridss.insertion.maxgap=gridss.dsb.maxgap

Maximum gap (in either direction) between breakends to annotate a pair of variants as an insertion.

The default of 35 matches the logic used by manta.

gridss.inversion.maxgap=gridss.dsb.maxgap

Maximum gap (in either direction) between breakpoints to annotate a pair of variants as a simple inversion.

gridss.templatedinsertion.maxgap=10000

Maximum size of a translocation/templated insertion to consider when performing transitive reduction.

gridss.very_hard_filters = c("normalSupport", "SRNormalSupport")

Filters that will cause a variant to be excluded from both the somatic and the full somatic output files. Valid values are any VCF filter except qual.

gridss.soft_filters = c("PON")

Filters that do not exclude the variant from the somatic output file. Valid values are any VCF filter except qual.
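
The interaction between the two filter lists can be sketched as below. The sketch assumes that a variant carrying any other filter is written to the full somatic output only, and that variants in the somatic output also appear in the full output. Neither point is stated explicitly above, and the function and file labels are hypothetical.

```python
VERY_HARD_FILTERS = {"normalSupport", "SRNormalSupport"}  # gridss.very_hard_filters
SOFT_FILTERS = {"PON"}                                    # gridss.soft_filters


def output_files(filters):
    """Return which output files a variant with these VCF filters is written to."""
    filters = set(filters)
    if filters & VERY_HARD_FILTERS:
        return set()             # excluded from both output files
    if filters - SOFT_FILTERS:
        return {"full"}          # assumed: other filters keep it out of the somatic file only
    return {"somatic", "full"}   # unfiltered, or soft filters only
```

Under these assumptions, a variant flagged only with PON still reaches the somatic output, while one flagged with normalSupport appears in neither file.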

gridss.min_rescue_portion = 0.25

Minimum support a rescued variant must have, expressed as a proportion of the support for the rescuing variant. This limit prevents a noise variant from being rescued by a high quality variant. The portion is calculated using total supporting read count, not QUAL score.

gridss.min_event_size = 8

Minimum simple event size to report. The default matches the minimum event size reported by GRIDSS (so, using default settings, this filter does nothing).
