Wrong filtering in Alignment metrics #943
base: master
@@ -79,7 +79,7 @@ protected PerUnitMetricCollector<AlignmentSummaryMetrics, Comparable<?>, SAMReco

     @Override
     public void acceptRecord(final SAMRecord rec, final ReferenceSequence ref) {
-        if (!rec.isSecondaryOrSupplementary()) {
+        if (!rec.getNotPrimaryAlignmentFlag()){
             super.acceptRecord(rec, ref);
         }
     }
@@ -271,10 +271,12 @@ private void collectQualityData(final SAMRecord record, final ReferenceSequence

     // If the read isn't an aligned PF read then look at the read for no-calls
     if (record.getReadUnmappedFlag() || record.getReadFailsVendorQualityCheckFlag() || !doRefMetrics) {
-        final byte[] readBases = record.getReadBases();
-        for (int i = 0; i < readBases.length; i++) {
-            if (SequenceUtil.isNoCall(readBases[i])) {
-                badCycleHistogram.increment(CoordMath.getCycle(record.getReadNegativeStrandFlag(), readBases.length, i));
+        if (!record.getSupplementaryAlignmentFlag()) {
+            final byte[] readBases = record.getReadBases();
+            for (int i = 0; i < readBases.length; i++) {
+                if (SequenceUtil.isNoCall(readBases[i])) {
+                    badCycleHistogram.increment(CoordMath.getCycle(record.getReadNegativeStrandFlag(), readBases.length, i));
+                }
             }
         }
     }
@@ -314,7 +316,8 @@ else if (!record.getReadFailsVendorQualityCheckFlag()) {
         if (mismatch) hqMismatchCount++;
     }

-    if (mismatch || SequenceUtil.isNoCall(readBases[readBaseIndex])) {
+    if (!record.getSupplementaryAlignmentFlag()

Ditto.

If I understand correctly, we can leave this "if" condition for supplementary reads (according to yfarjoun's comment)?

I think that this is right. @nh13, what do you think?

Ah no, sorry, this is more complex. Given that we are only looping over the alignment blocks, we need to look at the supplemental reads in order to see the bases. I guess it's as @nh13 said in the code comment: for metrics that have to do with bases (the badCycleHistogram is one), we want to see the supplemental reads, but for metrics that have to do with reads, we do not. I think this means that supplemental reads should be filtered from collectReadData (if supplemental, return) but not from collectQualityData.

Thank you for clarifying this! Okay, so I want to make sure that I understand correctly. :) I can remove all filtering for supplementary reads from collectQualityData() and leave the check for secondary reads in acceptRecord(). Will that be the necessary logic?

I think that's right. @nh13?

+            && (mismatch || SequenceUtil.isNoCall(readBases[readBaseIndex]))) {
         badCycleHistogram.increment(CoordMath.getCycle(record.getReadNegativeStrandFlag(), readBases.length, i));
     }
 }
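
The rule the reviewers converge on in the thread above can be sketched in isolation. This is a minimal illustration, not Picard's code: the class `FilterSketch` and its method names are hypothetical, and each record is reduced to its integer FLAG field. Only the two bit values (0x100 for secondary, 0x800 for supplementary) come from the SAM specification. The sketch assumes the outcome discussed in the thread: secondary alignments are rejected outright, while supplementary alignments are skipped for per-read counts but kept for per-base counts such as the bad-cycle histogram.

```java
/** Hypothetical sketch of the filtering rule; not the Picard implementation. */
final class FilterSketch {
    // FLAG bits as defined by the SAM specification.
    static final int SECONDARY = 0x100;
    static final int SUPPLEMENTARY = 0x800;

    static boolean isSecondary(final int flags) {
        return (flags & SECONDARY) != 0;
    }

    static boolean isSupplementary(final int flags) {
        return (flags & SUPPLEMENTARY) != 0;
    }

    /** Record-level gate: secondary alignments never reach the collectors. */
    static boolean acceptRecord(final int flags) {
        return !isSecondary(flags);
    }

    /** Per-read metrics count each read once, so supplementary records are skipped. */
    static boolean countsTowardReadMetrics(final int flags) {
        return acceptRecord(flags) && !isSupplementary(flags);
    }

    /** Per-base metrics need the bases carried by supplementary records too. */
    static boolean countsTowardBaseMetrics(final int flags) {
        return acceptRecord(flags);
    }

    public static void main(final String[] args) {
        final int[] examples = {99, 2195, 99 | SECONDARY};
        for (final int flags : examples) {
            System.out.println(flags
                    + " read-level=" + countsTowardReadMetrics(flags)
                    + " base-level=" + countsTowardBaseMetrics(flags));
        }
    }
}
```

Keeping the two predicates separate makes the intent explicit at each call site, which is the concern raised in the review: the program should state clearly which record types each metric sees.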
@@ -0,0 +1,10 @@
>chr6 | ||
NAATTGTTCTTAGTTTCTCGGTTTATGTGCTCTTCCAGGTGGGTAACACA | ||
ATAATGGCCTTCCAGATCGTAAGAGCGACGTGTGTTGCACCAGTGTCGAT | ||
C | ||
>chr8 | ||
CACATCGTGAATCTTACAATCTGCGGTTTCAGATGTGGAGCGATGTGTGA | ||
GAGATTGAGCAACTGATCTGAAAAGCAGACACAGCTATTCCTAAGATGAC | ||
CCCAGGTTCAAATGTGCAGCCCCTTTTGAGAGATTTTTTTTTTGGGCTGG | ||
AAAAAAGACACAGCTATTCCTAAGATGACAAGATCAGAAAAAAAGTCAAG | ||
CA
@@ -0,0 +1,8 @@
@HD VN:1.0 SO:coordinate
@SQ SN:chr6 LN:101
@SQ SN:chr8 LN:202
@RG ID:0 SM:Hi,Momma! LB:whatever PU:me PL:ILLUMINA
SL-XAV:1:1:0:700#0/2 137 chr6 1 255 101M * 0 0 NAATTGTTCTNAGTTTCTCGGTTTATGTGCTCTTCCAGGTGGGTAACACAATAATGGCCTTCCAGATCGTAAGAGCGACGTGTGTTGCACNAGTGTCGATC &0::887::::6/646::838388811/679:87640&./2+/-4/28:3,536/4''&&.78/(/554/./02*)*',-(57()&.6(6:(0601'/(,* RG:Z:0

It would be good if this example were more realistic, so that the primary read were aligned for only, say, 60 bases and the supplementary read were aligned for the remaining 41 bases. Also, the bases and the qualities should be the same for the read and its supplemental alignment.

@PolinaBevad are you going to add a more realistic test?

SL-XAV:1:1:0:105#0/1 99 chr8 102 255 101M = 1 -79 NCAGGTTCAANTGTGCAGCCCNTTTTGAGAGATNNNNNNNNTGNNCTGNAANANNGACACAGCTATTCCTAAGATGACAAGATCAGANAANAAGTCAAGCA &06665578::41.*/7577/&/77403-324.&&&&&&&&/.&&..&&.0&&&&',:9:/-/(55002020+3'12+2/&.2-&//&),&*&&&&&&&51 RG:Z:0
SL-XAV:1:1:0:105#0/2 2195 chr8 1 255 101M = 102 79 CACATCGTGANTCTTACAATCTGCGGTTTCAGATGTGGAGCGATGTGTGAGAGATTGAGCAACTGATCTGAAAAGCAGACACAGCTATTCNTAAGATGACN /))3--/&*()&)&&+'++.'-&,(.))'4,)&'&&,')8,&&*'.&*0'225/&)3-8//)*,5-*).7851453583.3568526:863688:::85.& RG:Z:0
SL-XAV:1:1:0:764#0/2 165 * 0 0 * chr6 1 0 NACAGATGCANATATTAACAGGCTTTAAAGGACAGATGGACTGCAATACAATAATAGAGTACGTCAACACTCCACAGATCGCTAGAGCATNACATCGGTGT &/:5358::9999::99998255::7275,,/5567-'+387537857:54-4.51'31059547320;73/720+22.4(6.;((.;(;8()(''&&2&& RG:Z:0
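
To see which of the four test records exercises the supplementary path, it can help to decode the FLAG column (the second field of each record). The sketch below is a hypothetical helper, not part of Picard or htsjdk: the class `SamFlagSketch` and the label strings are made up for illustration, and only the bit positions follow the SAM specification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that names the bits set in a SAM FLAG value. */
final class SamFlagSketch {
    // Labels for bits 0x1 through 0x800, in SAM specification order.
    private static final String[] NAMES = {
        "paired", "proper_pair", "unmapped", "mate_unmapped",
        "reverse", "mate_reverse", "first_of_pair", "second_of_pair",
        "secondary", "fails_qc", "duplicate", "supplementary"
    };

    /** Returns the labels of all bits set in the given FLAG, lowest bit first. */
    static List<String> decode(final int flag) {
        final List<String> set = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((flag & (1 << bit)) != 0) {
                set.add(NAMES[bit]);
            }
        }
        return set;
    }

    public static void main(final String[] args) {
        // FLAG values of the four records in the test file above.
        final int[] flagsInTestFile = {137, 99, 2195, 165};
        for (final int flag : flagsInTestFile) {
            System.out.println(flag + " -> " + decode(flag));
        }
    }
}
```

Of the four values, only 2195 has the 0x800 bit set, so that record is the one the supplementary filtering acts on; none of the records is flagged as secondary.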

Do we want to include secondary reads as well? I think not. We should probably filter secondary reads here. Even if they are filtered upstream somehow (I don't think that they are), this program should be clear that secondary alignments are filtered out.

Done, added check for secondary read here.