Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong filtering in Alignment metrics #943

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/main/java/picard/analysis/AlignmentSummaryMetrics.java
Original file line number Diff line number Diff line change
Expand Up @@ -148,7 +148,7 @@ public enum Category { UNPAIRED, FIRST_OF_PAIR, SECOND_OF_PAIR, PAIR }
public double PCT_PF_READS_IMPROPER_PAIRS;

/**
* The number of instrument cycles in which 80% or more of base calls were no-calls.
* The number of instrument cycles in which 80% or more of base calls were no-calls or mismatched the reference.
*/
public long BAD_CYCLES;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,7 @@ protected PerUnitMetricCollector<AlignmentSummaryMetrics, Comparable<?>, SAMReco

@Override
public void acceptRecord(final SAMRecord rec, final ReferenceSequence ref) {
if (!rec.isSecondaryOrSupplementary()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to include secondary reads as well? I think not. we should probably filter secondary reads here. Even if they are filtered upstream somehow (I don't think that they are) this program should be clear that secondary alignements are filtered out.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, added check for secondary read here.

if (!rec.getNotPrimaryAlignmentFlag()){
super.acceptRecord(rec, ref);
}
}
Expand Down Expand Up @@ -271,10 +271,12 @@ private void collectQualityData(final SAMRecord record, final ReferenceSequence

// If the read isn't an aligned PF read then look at the read for no-calls
if (record.getReadUnmappedFlag() || record.getReadFailsVendorQualityCheckFlag() || !doRefMetrics) {
final byte[] readBases = record.getReadBases();
for (int i = 0; i < readBases.length; i++) {
if (SequenceUtil.isNoCall(readBases[i])) {
badCycleHistogram.increment(CoordMath.getCycle(record.getReadNegativeStrandFlag(), readBases.length, i));
if (!record.getSupplementaryAlignmentFlag()) {
final byte[] readBases = record.getReadBases();
for (int i = 0; i < readBases.length; i++) {
if (SequenceUtil.isNoCall(readBases[i])) {
badCycleHistogram.increment(CoordMath.getCycle(record.getReadNegativeStrandFlag(), readBases.length, i));
}
}
}
}
Expand Down Expand Up @@ -314,7 +316,8 @@ else if (!record.getReadFailsVendorQualityCheckFlag()) {
if (mismatch) hqMismatchCount++;
}

if (mismatch || SequenceUtil.isNoCall(readBases[readBaseIndex])) {
if (!record.getSupplementaryAlignmentFlag()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Author

@Ksenomorfa Ksenomorfa Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, we can leave this "if" condition for supplementary reads (according to yfarjoun comment)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is right.. @nh13 what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah no. sorry. this is more complex. Given that we are only looping over the alignment blocks, we need to look at the supplemental reads in order to see the bases....I guess, it's as @nh13 said in the code comment: for metrics that have to do with bases (so the badCycleHistogram is one), we want to see the supplemental reads, but for metrics that have to do with reads, we do not. I think that this means that supplemental reads should be filtered from collectReadData (if supplemental, return) but not from collectQualityData.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for clarifying this! Okay, so I want to make sure that I understand right.. :) I can remove all filtering for supplementary from collectQualityData() , and leave check for secondary in acceptRecord() - this will be necessary logic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's right. @nh13 ?

&& (mismatch || SequenceUtil.isNoCall(readBases[readBaseIndex]))) {
badCycleHistogram.increment(CoordMath.getCycle(record.getReadNegativeStrandFlag(), readBases.length, i));
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -117,6 +117,77 @@ public void test() throws IOException {
}
}


@Test
public void testSupplementaryReadsAndBases() throws IOException {
final File input = new File(TEST_DATA_DIR, "summary_alignment_stats_test_supplementary.sam");
final File reference = new File(TEST_DATA_DIR, "summary_alignment_stats_test_supplementary.fasta");
final File outfile = File.createTempFile("alignmentMetrics", ".txt");
outfile.deleteOnExit();
final String[] args = new String[] {
"INPUT=" + input.getAbsolutePath(),
"OUTPUT=" + outfile.getAbsolutePath(),
"REFERENCE_SEQUENCE=" + reference.getAbsolutePath(),
};
Assert.assertEquals(runPicardCommandLine(args), 0);

final MetricsFile<AlignmentSummaryMetrics, Comparable<?>> output = new MetricsFile<>();
output.read(new FileReader(outfile));

Assert.assertEquals(output.getMetrics().size(), 3);
for (final AlignmentSummaryMetrics metrics : output.getMetrics()) {
Assert.assertEquals(metrics.MEAN_READ_LENGTH, 101.0);
switch (metrics.CATEGORY) {
case FIRST_OF_PAIR:
Assert.assertEquals(metrics.TOTAL_READS, 1);
Assert.assertEquals(metrics.PF_READS, 1);
Assert.assertEquals(metrics.PF_NOISE_READS, 0);
Assert.assertEquals(metrics.PF_HQ_ALIGNED_READS, 1);
Assert.assertEquals(metrics.PF_HQ_ALIGNED_Q20_BASES, 21);
Assert.assertEquals(metrics.PF_HQ_MEDIAN_MISMATCHES, 19.0);
Assert.assertEquals(metrics.PF_READS_ALIGNED, 1);
Assert.assertEquals(metrics.PF_READS_IMPROPER_PAIRS, 0);
Assert.assertEquals(metrics.PCT_PF_READS_IMPROPER_PAIRS, 0.0);
Assert.assertEquals(metrics.PF_ALIGNED_BASES, 101);
Assert.assertEquals(metrics.PF_MISMATCH_RATE, 0.188119);
Assert.assertEquals(metrics.BAD_CYCLES, 19);
break;
case SECOND_OF_PAIR:
Assert.assertEquals(metrics.TOTAL_READS, 2);
Assert.assertEquals(metrics.PF_READS, 2);
Assert.assertEquals(metrics.PF_NOISE_READS, 0);
Assert.assertEquals(metrics.PF_HQ_ALIGNED_READS, 1);
Assert.assertEquals(metrics.PF_HQ_ALIGNED_Q20_BASES, 66);
Assert.assertEquals(metrics.PF_HQ_MEDIAN_MISMATCHES, 2.5);
Assert.assertEquals(metrics.PF_READS_ALIGNED, 1);
Assert.assertEquals(metrics.PF_READS_IMPROPER_PAIRS, 1);
Assert.assertEquals(metrics.PCT_PF_READS_IMPROPER_PAIRS, 1.0);
Assert.assertEquals(metrics.PF_ALIGNED_BASES, 202);
Assert.assertEquals(metrics.PCT_READS_ALIGNED_IN_PAIRS, 0.0);
Assert.assertEquals(metrics.PF_MISMATCH_RATE, 0.024752);
Assert.assertEquals(metrics.BAD_CYCLES, 3);
break;
case PAIR:
Assert.assertEquals(metrics.TOTAL_READS, 3);
Assert.assertEquals(metrics.PF_READS, 3);
Assert.assertEquals(metrics.PF_NOISE_READS, 0);
Assert.assertEquals(metrics.PF_HQ_ALIGNED_READS, 2);
Assert.assertEquals(metrics.PF_HQ_ALIGNED_Q20_BASES, 87);
Assert.assertEquals(metrics.PF_HQ_MEDIAN_MISMATCHES, 3.0);
Assert.assertEquals(metrics.PF_READS_ALIGNED, 2);
Assert.assertEquals(metrics.PF_READS_IMPROPER_PAIRS, 1);
Assert.assertEquals(metrics.PCT_PF_READS_IMPROPER_PAIRS, 0.5);
Assert.assertEquals(metrics.PF_ALIGNED_BASES, 303);
Assert.assertEquals(metrics.PF_MISMATCH_RATE, 0.079208);
Assert.assertEquals(metrics.BAD_CYCLES, 22);
break;
case UNPAIRED:
default:
Assert.fail("Data does not contain this category: " + metrics.CATEGORY);
}
}
}

@Test
public void testBisulfite() throws IOException {
final File input = new File(TEST_DATA_DIR, "summary_alignment_bisulfite_test.sam");
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
>chr6
NAATTGTTCTTAGTTTCTCGGTTTATGTGCTCTTCCAGGTGGGTAACACA
ATAATGGCCTTCCAGATCGTAAGAGCGACGTGTGTTGCACCAGTGTCGAT
C
>chr8
CACATCGTGAATCTTACAATCTGCGGTTTCAGATGTGGAGCGATGTGTGA
GAGATTGAGCAACTGATCTGAAAAGCAGACACAGCTATTCCTAAGATGAC
CCCAGGTTCAAATGTGCAGCCCCTTTTGAGAGATTTTTTTTTTGGGCTGG
AAAAAAGACACAGCTATTCCTAAGATGACAAGATCAGAAAAAAAGTCAAG
CA
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
@HD VN:1.0 SO:coordinate
@SQ SN:chr6 LN:101
@SQ SN:chr8 LN:202
@RG ID:0 SM:Hi,Momma! LB:whatever PU:me PL:ILLUMINA
SL-XAV:1:1:0:700#0/2 137 chr6 1 255 101M * 0 0 NAATTGTTCTNAGTTTCTCGGTTTATGTGCTCTTCCAGGTGGGTAACACAATAATGGCCTTCCAGATCGTAAGAGCGACGTGTGTTGCACNAGTGTCGATC &0::887::::6/646::838388811/679:87640&./2+/-4/28:3,536/4''&&.78/(/554/./02*)*',-(57()&.6(6:(0601'/(,* RG:Z:0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be good if this example were more realistic, so that the primary read were only aligned for, say 60 bases and the supplementary read would be aligned for the remaining 41 bases. also, the bases and the qualities should be the same for the read and its supplemental alignment....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PolinaBevad are you going to add a more realistic test?

SL-XAV:1:1:0:105#0/1 99 chr8 102 255 101M = 1 -79 NCAGGTTCAANTGTGCAGCCCNTTTTGAGAGATNNNNNNNNTGNNCTGNAANANNGACACAGCTATTCCTAAGATGACAAGATCAGANAANAAGTCAAGCA &06665578::41.*/7577/&/77403-324.&&&&&&&&/.&&..&&.0&&&&',:9:/-/(55002020+3'12+2/&.2-&//&),&*&&&&&&&51 RG:Z:0
SL-XAV:1:1:0:105#0/2 2195 chr8 1 255 101M = 102 79 CACATCGTGANTCTTACAATCTGCGGTTTCAGATGTGGAGCGATGTGTGAGAGATTGAGCAACTGATCTGAAAAGCAGACACAGCTATTCNTAAGATGACN /))3--/&*()&)&&+'++.'-&,(.))'4,)&'&&,')8,&&*'.&*0'225/&)3-8//)*,5-*).7851453583.3568526:863688:::85.& RG:Z:0
SL-XAV:1:1:0:764#0/2 165 * 0 0 * chr6 1 0 NACAGATGCANATATTAACAGGCTTTAAAGGACAGATGGACTGCAATACAATAATAGAGTACGTCAACACTCCACAGATCGCTAGAGCATNACATCGGTGT &/:5358::9999::99998255::7275,,/5567-'+387537857:54-4.51'31059547320;73/720+22.4(6.;((.;(;8()(''&&2&& RG:Z:0