`AggregateSvPileup` should account for inaccurate split-read breakpoint positions #13

pamelarussell · 2022-05-24T21:39:11Z

Currently, `AggregateSvPileup` merges breakpoints whose left and right positions are each within a distance threshold of one another, regardless of the type of read evidence behind them: split-read (the breakpoint occurs inside a sequenced read) or read-pair (the breakpoint occurs in the unsequenced insert between mates).
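
To make the current behavior concrete, here is a minimal Python sketch of merging with one threshold for every evidence type. It illustrates the behavior described above and is not the actual `AggregateSvPileup` code; `Breakpoint`, `merge_single_threshold`, and the single-contig simplification are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical breakpoint; both sides assumed to be on fixed contigs/strands."""
    left: int
    right: int
    evidence: str  # "split_read" or "read_pair"; ignored by the merge below


def within(a: Breakpoint, b: Breakpoint, max_dist: int) -> bool:
    # Both the left and right positions must be within the same threshold.
    return abs(a.left - b.left) <= max_dist and abs(a.right - b.right) <= max_dist


def merge_single_threshold(bps: List[Breakpoint], max_dist: int) -> List[List[Breakpoint]]:
    """Single-linkage merge using one threshold regardless of evidence type."""
    clusters: List[List[Breakpoint]] = []
    for bp in bps:
        # Every existing cluster this breakpoint links to is merged with it.
        hits = [c for c in clusters if any(within(bp, m, max_dist) for m in c)]
        merged = [bp] + [m for c in hits for m in c]
        clusters = [c for c in clusters if all(c is not h for h in hits)]
        clusters.append(merged)
    return clusters
```

With a threshold tight enough for split reads (say 10 bp), a read-pair breakpoint 150 bp away from a split-read breakpoint of the same event lands in its own cluster; loosening the single threshold to fix that would also merge distinct, precisely located split-read events.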

However, these two types of evidence locate the breakpoint with different precision and should use different distance thresholds. Split-read evidence is likely to point to a precise position, whereas the position inferred from a read pair can be off by as much as the inner distance (insert size minus the lengths of both reads). A procedure like the following should be used instead:
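
For reference, the inner distance is the insert size minus the two read lengths. The Python sketch below computes it per pair and estimates a dataset-wide maximum from a sample of pairs. `inner_distance`, `empirical_max_inner_distance`, the nearest-rank quantile, and the 0.99 default are hypothetical choices, not part of `AggregateSvPileup`.

```python
import math
from typing import Sequence


def inner_distance(insert_size: int, read1_len: int, read2_len: int) -> int:
    """Unsequenced gap between mates: insert size minus both read lengths.

    A negative value means the mates overlap.
    """
    return insert_size - read1_len - read2_len


def empirical_max_inner_distance(samples: Sequence[int], quantile: float = 0.99) -> int:
    """Estimate a maximum inner distance from sampled pairs (nearest-rank quantile).

    A high quantile rather than the true max keeps a few aberrant pairs from
    inflating the threshold (an assumption, not a requirement).
    """
    if not samples:
        raise ValueError("need at least one sampled inner distance")
    if not 0.0 < quantile <= 1.0:
        raise ValueError("quantile must be in (0, 1]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(quantile * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]
```

For example, a 500 bp insert sequenced 2x150 leaves a 200 bp unsequenced gap, so a read-pair breakpoint could sit anywhere in that window.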

1. "Seed" clusters by clustering only breakpoints that have split-read evidence.
2. "Seed" additional clusters with breakpoints that have read-pair evidence.
3. Use read-pair events to aggregate clusters when the distance is within the inner distance (computed empirically by sampling).
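
A minimal Python sketch of the three steps above, under stated assumptions: breakpoints are reduced to `(left, right)` positions on fixed contigs, clustering is greedy single-linkage, read-pair seeds use the inner distance as their threshold, and each read-pair cluster is attached to the nearest split-read cluster rather than used to merge split-read clusters with each other. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Breakpoint:
    left: int
    right: int
    split_read: bool  # True: split-read evidence; False: read-pair evidence


Cluster = List[Breakpoint]


def single_linkage(bps: List[Breakpoint], max_dist: int) -> List[Cluster]:
    """Greedy single-linkage: link if both sides are within max_dist of some member."""
    clusters: List[Cluster] = []
    for bp in bps:
        hits = [c for c in clusters
                if any(abs(bp.left - m.left) <= max_dist and abs(bp.right - m.right) <= max_dist
                       for m in c)]
        merged = [bp] + [m for c in hits for m in c]
        clusters = [c for c in clusters if all(c is not h for h in hits)]
        clusters.append(merged)
    return clusters


def gap(a: Cluster, b: Cluster) -> int:
    """Closest approach between two clusters, taking the worse side of each pair."""
    return min(max(abs(x.left - y.left), abs(x.right - y.right)) for x in a for y in b)


def seed_then_aggregate(bps: List[Breakpoint], split_dist: int, inner_dist: int) -> List[Cluster]:
    # Step 1: seed clusters from split-read breakpoints only, with a tight threshold.
    sr_seeds = single_linkage([b for b in bps if b.split_read], split_dist)
    # Step 2: seed additional clusters from read-pair breakpoints (looser threshold).
    rp_seeds = single_linkage([b for b in bps if not b.split_read], inner_dist)
    # Step 3: attach each read-pair seed to the nearest split-read seed within inner_dist.
    result = [list(c) for c in sr_seeds]
    leftovers: List[Cluster] = []
    for rp in rp_seeds:
        gaps = [(gap(rp, sr), i) for i, sr in enumerate(sr_seeds)]
        near = [(g, i) for g, i in gaps if g <= inner_dist]
        if near:
            _, best = min(near)
            result[best].extend(rp)
        else:
            leftovers.append(rp)
    return result + leftovers
```

Attaching to the nearest seed instead of letting a read pair bridge two split-read clusters is a design choice in this sketch; it keeps two precisely located events from being merged through imprecise evidence.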

tfenne · 2023-09-28T22:27:10Z

Agreed. I think a multi-pass strategy would work, though I would suggest something different:

1. Have parameters `max-split-read-distance` and `read-pair-inner-distance` (or compute the latter).
2. Aggregate events with split-read evidence within `max-split-read-distance`. This parameter should probably be set based on aligner parameters (e.g., how far from the breakpoint would a single sequencing error cause the read to be clipped at that point?).
3. Take all read-pair evidence and see whether it can be said to support a single event defined by aggregating split reads; if so, assign it. Here I think compatibility should be determined by whether the sum of the distances on both sides is < the max inner distance, rather than by evaluating each side independently.
4. Take the remaining read pairs and, if they could support multiple events, try to break ties based on position, or split the count?
5. Take the remaining read pairs and cluster those independently.
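
One way to sketch this multi-pass strategy in Python, with stated assumptions: breakpoints are `(left, right)` positions on fixed contigs, an event's position is the lower median of its split-read members, split-read aggregation is greedy single-linkage, and the tie-breaking step combines both suggestions by assigning to the nearest event and splitting the count only on exact ties. `Event`, `multi_pass`, and the underscore parameter names are hypothetical stand-ins for `max-split-read-distance` and `read-pair-inner-distance`.

```python
from dataclasses import dataclass
from statistics import median_low
from typing import List, Tuple


@dataclass(frozen=True)
class Breakpoint:
    left: int
    right: int


@dataclass
class Event:
    left: int
    right: int
    split_reads: int = 0
    read_pairs: float = 0.0  # may be fractional when a read pair's count is split


def single_linkage(bps: List[Breakpoint], max_dist: int) -> List[List[Breakpoint]]:
    """Greedy single-linkage: both sides must be within max_dist of some member."""
    clusters: List[List[Breakpoint]] = []
    for bp in bps:
        hits = [c for c in clusters
                if any(abs(bp.left - m.left) <= max_dist and abs(bp.right - m.right) <= max_dist
                       for m in c)]
        merged = [bp] + [m for c in hits for m in c]
        clusters = [c for c in clusters if all(c is not h for h in hits)]
        clusters.append(merged)
    return clusters


def multi_pass(split_reads: List[Breakpoint], read_pairs: List[Breakpoint],
               max_split_read_distance: int, read_pair_inner_distance: int) -> List[Event]:
    # Aggregate split-read evidence with the tight, aligner-driven threshold.
    events = [Event(median_low([b.left for b in c]), median_low([b.right for b in c]), len(c))
              for c in single_linkage(split_reads, max_split_read_distance)]

    unassigned: List[Breakpoint] = []
    for rp in read_pairs:
        # Compatible if the distances on BOTH sides sum to less than the inner
        # distance, rather than checking each side independently.
        compatible: List[Tuple[int, Event]] = []
        for ev in events:
            d = abs(rp.left - ev.left) + abs(rp.right - ev.right)
            if d < read_pair_inner_distance:
                compatible.append((d, ev))
        if not compatible:
            unassigned.append(rp)
            continue
        # Tie-break by position (smallest summed distance); split the count on exact ties.
        best = min(d for d, _ in compatible)
        winners = [ev for d, ev in compatible if d == best]
        for ev in winners:
            ev.read_pairs += 1 / len(winners)

    # Cluster the remaining read pairs on their own with the looser threshold.
    for c in single_linkage(unassigned, read_pair_inner_distance):
        events.append(Event(median_low([b.left for b in c]), median_low([b.right for b in c]),
                            0, float(len(c))))
    return events
```

Note how the summed rule differs from a per-side check: a read pair 120 bp and 90 bp away on the two sides passes a per-side test against a 200 bp inner distance, but its summed distance (210 bp) fails, so it falls through to independent clustering.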

pamelarussell added the `enhancement` (New feature or request) label on May 24, 2022