Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allele Balance #14

Open
mullerbsf opened this issue Oct 11, 2019 · 7 comments
Open

Allele Balance #14

mullerbsf opened this issue Oct 11, 2019 · 7 comments

Comments

@mullerbsf
Copy link

Dear Vikas Bansal,

I am trying to estimate the allele balance for each sample genotyped with CRISP. Is there a way to obtain that information from the output?

For example, from a total of 100 reads supporting the call, a heterozygous sample might have 80 reads to the reference allele and 20 reads to the alternative allele.

From the FORMAT field, I see the following options: GT:GQ:DP:ADf:ADr:ADb. The total depth seems to be included, but it is not clear to me how to obtain how many of these reads were relative to each allele.

Thank you!

@vibansal
Copy link
Owner

ADf is a comma separated list of the read counts for each allele on the forward strand. You can sum up the ADf, ADr and ADb values for each allele to get the allele balance.

@mullerbsf
Copy link
Author

mullerbsf commented Oct 18, 2019

Thank you.

I am still confused about some of the results, as often they seem to not add up. Take the examples below that I retrieved from a representative marker:

FORMAT: GT:GQ:DP:ADf:ADr:ADb

SAMPLE 1: 0/0/0/1/1:0:18:2,3:4,1:0,0
SAMPLE 2: 0/0/1/1/1:3:25:2,9:4,3:0,0

It seems like the total depth is 18 and 25, respectively. The sums of the ADf, ADr and ADb fields, however, are not equal to the total depth.

I appreciate your attention.
Regards

@vibansal
Copy link
Owner

vibansal commented Oct 21, 2019

The total read depth (DP) includes all reads. Some of these may be filtered out for the calculation of ADf, ADr and ADb which are used for variant calling.

Update: this is not correct

@mullerbsf
Copy link
Author

Dear Bansal,

Then, I did not understand why the DP definition from CRISP VCF output says: "filtered reads only"?
DP: "Total number of reads (+strand,-strand,bidirectional) across all pools (filtered reads only)".

This definition generates all the misunderstandings about CRISP output.

Thank you!

@vibansal
Copy link
Owner

I have checked the code and the above definition is correct.

DP includes counts for all alleles at the site so it can be greater than the counts for the two alleles output in the VCF. For example, a site with a ref allele = 'A' and variant allele = 'C' can have reads supporting 'G' or 'T'. These reads are included in the DP count.

@mullerbsf
Copy link
Author

mullerbsf commented Oct 22, 2019

Dear Bansal,

I checked the BAM files, and your argument about other reads supporting other possible alleles is not valid (see attached the screenshot from IGV). This a clear SNP with only reads supporting REF=C and ALT=T, no reads are supporting the A or G. Therefore, the DP field from CRISP output is showing the total reads depth without filtering. For the ADf, ADr and ADb there are some filters applied in the counts.

SAMPLE 1

SAMPLE1_D710_D501

SAMPLE 2

SAMPLE2_D701_D501

When compare the CRISP and FreeBayes outputs, the FreeBayes is giving the correct number of DP and AD fields without filtering.

CRISP output

1 19545 Chla_rei_1_19545 C T 3470 PASS
FORMAT: GT:GQ:DP:ADf:ADr:ADb
SAMPLE 1: 0/0/0/0/1:9:40:10,5:11,2:1,0
SAMPLE 2: 0/0/0/0/1:14:51:17,2:9,4:0,1

FreeBayes output

1 19545 Chla_rei_1_19545 C T 12110.9 .
FORMAT: GT:DP:AD:RO:QR:AO:QA
SAMPLE 1: 0/0/0/1/1:41:23,18:23:893:18:552
SAMPLE 2: 0/0/1/1/1:52:26,26:26:1057:26:909

I want to use the CRISP output to run SNPGenie software. To be able to do this, I need to double-check with you about the DP field and all the ADf, ADr and ADb.

Based on my interpretation of CRISP output, I will need to create a code to sum all these fields to replace the DP and AD fields.

Then, for the first sample, it will be something like this:
FORMAT: GT:GQ:DP:ADf:ADr:ADb
SAMPLE 1: 0/0/0/0/1:9:40:10,5:11,2:1,0

The update file will be:
FORMAT: GT:GQ:DP:AD
SAMPLE 1: 0/0/0/0/1:9:29:22,7

Can you confirm if this is correct?

Thank you very much for this help.

@vibansal vibansal reopened this Oct 23, 2019
@vibansal
Copy link
Owner

Yes, you are correct. I will have to update the code to output the correct DP value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants