Allele Balance #14

mullerbsf · 2019-10-11T01:48:47Z

Dear Vikas Bansal,

I am trying to estimate the allele balance for each sample genotyped with CRISP. Is there a way to obtain that information from the output?

For example, from a total of 100 reads supporting the call, a heterozygous sample might have 80 reads to the reference allele and 20 reads to the alternative allele.

From the FORMAT field, I see the following options: GT:GQ:DP:ADf:ADr:ADb. The total depth seems to be included, but it is not clear to me how to obtain how many of these reads were relative to each allele.

Thank you!

vibansal · 2019-10-16T16:56:36Z

ADf is a comma separated list of the read counts for each allele on the forward strand. You can sum up the ADf, ADr and ADb values for each allele to get the allele balance.

mullerbsf · 2019-10-18T00:01:50Z

Thank you.

I am still confused about some of the results, as often they seem to not add up. Take the examples below that I retrieved from a representative marker:

FORMAT: GT:GQ:DP:ADf:ADr:ADb

SAMPLE 1: 0/0/0/1/1:0:18:2,3:4,1:0,0
SAMPLE 2: 0/0/1/1/1:3:25:2,9:4,3:0,0

It seems like the total depth is 18 and 25, respectively. The sums of the ADf, ADr and ADb fields, however, are not equal to the total depth.

I appreciate your attention.
Regards

vibansal · 2019-10-21T22:28:26Z

The total read depth (DP) includes all reads. Some of these may be filtered out for the calculation of ADf, ADr and ADb which are used for variant calling.

Update: this is not correct

mullerbsf · 2019-10-22T02:50:01Z

Dear Bansal,

Then, I did not understand why the DP definition from CRISP VCF output says: "filtered reads only"?
DP: "Total number of reads (+strand,-strand,bidirectional) across all pools (filtered reads only)".

This definition generates all the misunderstandings about CRISP output.

Thank you!

vibansal · 2019-10-22T18:36:41Z

I have checked the code and the above definition is correct.

DP includes counts for all alleles at the site so it can be greater than the counts for the two alleles output in the VCF. For example, a site with a ref allele = 'A' and variant allele = 'C' can have reads supporting 'G' or 'T'. These reads are included in the DP count.

mullerbsf · 2019-10-22T20:16:39Z

Dear Bansal,

I checked the BAM files, and your argument about other reads supporting other possible alleles is not valid (see attached the screenshot from IGV). This a clear SNP with only reads supporting REF=C and ALT=T, no reads are supporting the A or G. Therefore, the DP field from CRISP output is showing the total reads depth without filtering. For the ADf, ADr and ADb there are some filters applied in the counts.

SAMPLE 1

SAMPLE 2

When compare the CRISP and FreeBayes outputs, the FreeBayes is giving the correct number of DP and AD fields without filtering.

CRISP output

1 19545 Chla_rei_1_19545 C T 3470 PASS
FORMAT: GT:GQ:DP:ADf:ADr:ADb
SAMPLE 1: 0/0/0/0/1:9:40:10,5:11,2:1,0
SAMPLE 2: 0/0/0/0/1:14:51:17,2:9,4:0,1

FreeBayes output

1 19545 Chla_rei_1_19545 C T 12110.9 .
FORMAT: GT:DP:AD:RO:QR:AO:QA
SAMPLE 1: 0/0/0/1/1:41:23,18:23:893:18:552
SAMPLE 2: 0/0/1/1/1:52:26,26:26:1057:26:909

I want to use the CRISP output to run SNPGenie software. To be able to do this, I need to double-check with you about the DP field and all the ADf, ADr and ADb.

Based on my interpretation of CRISP output, I will need to create a code to sum all these fields to replace the DP and AD fields.

Then, for the first sample, it will be something like this:
FORMAT: GT:GQ:DP:ADf:ADr:ADb
SAMPLE 1: 0/0/0/0/1:9:40:10,5:11,2:1,0

The update file will be:
FORMAT: GT:GQ:DP:AD
SAMPLE 1: 0/0/0/0/1:9:29:22,7

Can you confirm if this is correct?

Thank you very much for this help.

vibansal · 2019-10-23T00:46:28Z

Yes, you are correct. I will have to update the code to output the correct DP value.

vibansal closed this as completed Oct 16, 2019

vibansal reopened this Oct 23, 2019

mullerbsf mentioned this issue Oct 23, 2019

SNPGenie with CRISP output chasewnelson/SNPGenie#24

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allele Balance #14

Allele Balance #14

mullerbsf commented Oct 11, 2019

vibansal commented Oct 16, 2019

mullerbsf commented Oct 18, 2019 •

edited

Loading

vibansal commented Oct 21, 2019 •

edited

Loading

mullerbsf commented Oct 22, 2019

vibansal commented Oct 22, 2019

mullerbsf commented Oct 22, 2019 •

edited

Loading

vibansal commented Oct 23, 2019

Allele Balance #14

Allele Balance #14

Comments

mullerbsf commented Oct 11, 2019

vibansal commented Oct 16, 2019

mullerbsf commented Oct 18, 2019 • edited Loading

vibansal commented Oct 21, 2019 • edited Loading

Update: this is not correct

mullerbsf commented Oct 22, 2019

vibansal commented Oct 22, 2019

mullerbsf commented Oct 22, 2019 • edited Loading

SAMPLE 1

SAMPLE 2

CRISP output

FreeBayes output

vibansal commented Oct 23, 2019

mullerbsf commented Oct 18, 2019 •

edited

Loading

vibansal commented Oct 21, 2019 •

edited

Loading

mullerbsf commented Oct 22, 2019 •

edited

Loading