Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How does ivar handle samtools mpileup output for overlapping Paired-End Reads? #113

Open
ggrimes opened this issue Dec 13, 2021 · 1 comment

Comments

@ggrimes
Copy link

ggrimes commented Dec 13, 2021

When you have overlapping paired-end reads, e.g. paired-reads covering short amplicons how does ivar treat the overlapping base

e.g. for this pairs file with one read having a mismatch at position 22376
#pairs.bam content

@HD     VN:1.0  SO:coordinate
@SQ     SN:MN908947.3   LN:29903
@PG     ID:bowtie2      PN:bowtie2      VN:2.4.2        CL:"/usr/local/bin/bowtie2-align-s --wrapper basic-0 -x ./bowtie2/nCoV-2019.reference --threads 6.0 --local --very-sensitive-local --seed 1 -1 19980GN0002L01_1.trim.fastq.gz -2 19980GN0002L01_2.trim.fastq.gz"
@PG     ID:samtools     PN:samtools     PP:bowtie2      VN:1.11 CL:samtools view -@ 6.0 -F4 -bhS -o 19980GN0002L01.bam -
@PG     ID:samtools.1   PN:samtools     PP:samtools     VN:1.12 CL:samtools sort -@ 6 -o 19980GN0002L01.sorted.bam -T 19980GN0002L01.sorted 19980GN0002L01.bam
@PG     ID:ivar-trim    PN:ivar VN:1.3.1        CL:ivar trim -m 30 -q 20 -e -i 19980GN0002L01.sorted.bam -b subARTIC_primers_v4.bed -p 19980GN0002L01.ivar_trim
@PG     ID:samtools.2   PN:samtools     PP:samtools.1   VN:1.12 CL:samtools sort -@ 6 -o 19980GN0002L01.ivar_trim.sorted.bam -T 19980GN0002L01.ivar_trim.sorted 19980GN0002L01.ivar_trim.bam
M05898:173:000000000-K4CKD:1:2109:17009:11978   163     MN908947.3      22325   44      26S121M =       22343   191     AAGTTTATTTGACTCCTGGTGATTCTTCTTCAGGTTGGACAGCTGGTGCTGCAGCTTATTATGTGGGTTATCTTCAACCTAGGACTTTTCTATTAAAATATAATGAAAATGGAACCATTACAGATGCTGTAGACTGTGCACTTGACC     BBBABFFFFFFFGGGGGGGGGGHHHHHHHHHHHHHHHGHHHGHHHHHGHHHHHHHHHHHHHHHHHHGGGHHHHHHHHHHHHHHGHHHHGHHGHHHHHHGHHHHHHHHGHHHHHHHGHHHHHHHHGFHHHHHGHHHHFHGHHHHHGHH     AS:i:286        XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:143        YS:i:287        YT:Z:CP XA:i:16
M05898:173:000000000-K4CKD:1:2109:17009:11978   83      MN908947.3      22343   44      121M26S =       22303   -191    GGTGCTGCAGCTTATTATGTGGGTTATCTTCAAACTAGGACTTTTCTATTAAAATATAATGAAAATGGAACCATTACAGATGCTGTAGACTGTGCACTTGACCCTCTCTCAGAAACAAAGTGTACGTTGAAATCCTTCACTGTAGAA     HHHHHHHHHHHHHHHHHGGGHHHHHHHHHHHHHEHHHHHHHHHHHHHHGHHHHHFHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHFHHHHGGGGGHHHGHHHHHHHHHHHGHHHGGGGGGGGGGGFFFFFFFCCCCC     AS:i:287        XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:33C113     YS:i:286        YT:Z:CP XA:i:17

If I run

samtools mpileup -aa -A -d 600000 -B -Q 0 pairs.bam| ivar variants -p test -q 0 -t 0 -r ../nCoV-2019.reference.fasta.txt -m 0

The output of test.csv

REGION	POS	REF	ALT	REF_DP	REF_RV	REF_QUAL	ALT_DP	ALT_RV	ALT_QUAL	ALT_FREQ	TOTAL_DP	PVAL	PASS	GFF_FEATURE	REF_CODON	REF_AA	ALT_CODON	ALT_AA
MN908947.3	22376	C	A	1	0	31	1	1	0	0.5	2	1	FALSE	NA	NA	NA	NA	NA

With Total Depth of 2

If I change the samtools mpileup option -Q 1 , to skip bases with baseQ/BAQ smaller than ,I don't get any output

REGION	POS	REF	ALT	REF_DP	REF_RV	REF_QUAL	ALT_DP	ALT_RV	ALT_QUAL	ALT_FREQ	TOTAL_DP	PVAL	PASS	GFF_FEATURE	REF_CODON	REF_AA	ALT_CODON	ALT_AA

Should I be changing the per-Base Alignment Quality to >0 for mpileup?

There is a thread here on the issue, which suggests setting -Q0 also disables overlap read detection.

samtools/samtools#1036

from this thread someone states

The overlap detection is a hack on top of the original code. It lowers base qualities of overlapping bases so that they are effectively removed from variant calling. This implies that the overlapping bases are counted twice in raw depth calculation, but are counted correctly when counting only high-quality bases. 

Any help with how I should process this would be greatly appreciated.

Software Versions

iVar version 1.3.1
samtools 1.12
Using htslib 1.12

Thanks,

Graeme

@ggrimes
Copy link
Author

ggrimes commented Dec 13, 2021

I should also that mention mpileup sets the lower quality base to BAQ of zero in the case of overlapping reads but the depth is reported as 2

samtools mpileup -aa -A -d 600000 -B -Q 0 pairs.bam|grep 22376
[mpileup] 1 samples in 1 input files
MN908947.3	22376	N	2	Ca	@!

In the case of -Q 1 only the higher quality base is reported with a depth of 1

samtools mpileup -aa -A -d 600000 -B -Q 1 pairs.bam|grep 22376
[mpileup] 1 samples in 1 input files
MN908947.3	22376	N	1	C	@

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant