The number of primer hits much more less than the total reads. #2087

MizanurRahman24 · 2025-02-26T02:44:03Z

Hi,
I am processing raw NovaSeq data using DADA2 and have used Cutadapt for primer removal. I used the following command to identify primer sequences in the raw reads and then remove them using Cutadapt.

"
#Identify primers
FWD <- "GTGYCAGCMGCCGCGGTAA" ## 515f
REV <- "GGACTACNVGGGTWTCTAAT" ## 806r

allOrients <- function(primer) {

Create all orientations of the input sequence

require(Biostrings)
dna <- DNAString(primer) # The Biostrings works w/ DNAString objects rather than character vectors
orients <- c(Forward = dna, Complement = complement(dna), Reverse = reverse(dna),
RevComp = reverseComplement(dna))
return(sapply(orients, toString)) # Convert back to character vector
}
FWD.orients <- allOrients(FWD)
REV.orients <- allOrients(REV)
FWD.orients
REV.orients
fnFs.filtN <- file.path(getwd(), "filtN", basename(fnFs)) # Put N-filterd files in filtN/ subdirectory
fnRs.filtN <- file.path(getwd(), "filtN", basename(fnRs))
filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = TRUE)

primerHits <- function(primer, fn) {

Counts number of reads in which the primer is found

nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE)
return(sum(nhits > 0))
}
rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[1]]),
FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.filtN[[1]]),
REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.filtN[[1]]),
REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[1]]))

"

But when I checked the results of primer hits, the number of primer sequence hits was much lower than the total number of reads, which is 433,332. My professor asked me why the number of primer hits was significantly lower than the total reads, but I couldn't answer. I tried to study this topic, but unfortunately, I couldn't find a solution to this issue. Could you please help me understand why the number of primer hits is much lower than the total number of reads?

benjjneb · 2025-02-26T02:58:13Z

Does your amplicon sequencing protocol lead to sequencing of the primers? The most common 515f/806r sequencing protocol is the EMP protocol, and that does not include the primers on the sequenced portions of the amplicons. Hence you will find little to no primer sequences in your data.

This is something the sequencing provider should know.

MizanurRahman24 · 2025-02-26T05:24:03Z

Hi @benjjneb
Thank you very much for your prompt response. I checked the sequencing information, and my data was generated following the EMP protocol for the 16S rRNA V4 region. Since the EMP protocol does not include primers in the sequenced portion, should I check for primer sequences before data processing? I have already checked and found some hits in the sequences. Should I remove these primers using Cutadapt, or can I proceed without trimming them, considering that the EMP protocol typically does not include primer sequences?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The number of primer hits much more less than the total reads. #2087

The number of primer hits much more less than the total reads. #2087

MizanurRahman24 commented Feb 26, 2025

benjjneb commented Feb 26, 2025

MizanurRahman24 commented Feb 26, 2025

The number of primer hits much more less than the total reads. #2087

The number of primer hits much more less than the total reads. #2087

Comments

MizanurRahman24 commented Feb 26, 2025

Create all orientations of the input sequence

Counts number of reads in which the primer is found

benjjneb commented Feb 26, 2025

MizanurRahman24 commented Feb 26, 2025