Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The number of primer hits much more less than the total reads. #2087

Open
MizanurRahman24 opened this issue Feb 26, 2025 · 2 comments
Open

Comments

@MizanurRahman24
Copy link

Hi,
I am processing raw NovaSeq data using DADA2 and have used Cutadapt for primer removal. I used the following command to identify primer sequences in the raw reads and then remove them using Cutadapt.

"
#Identify primers
FWD <- "GTGYCAGCMGCCGCGGTAA" ## 515f
REV <- "GGACTACNVGGGTWTCTAAT" ## 806r

allOrients <- function(primer) {

Create all orientations of the input sequence

require(Biostrings)
dna <- DNAString(primer) # The Biostrings works w/ DNAString objects rather than character vectors
orients <- c(Forward = dna, Complement = complement(dna), Reverse = reverse(dna),
RevComp = reverseComplement(dna))
return(sapply(orients, toString)) # Convert back to character vector
}
FWD.orients <- allOrients(FWD)
REV.orients <- allOrients(REV)
FWD.orients
REV.orients
fnFs.filtN <- file.path(getwd(), "filtN", basename(fnFs)) # Put N-filterd files in filtN/ subdirectory
fnRs.filtN <- file.path(getwd(), "filtN", basename(fnRs))
filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = TRUE)

primerHits <- function(primer, fn) {

Counts number of reads in which the primer is found

nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE)
return(sum(nhits > 0))
}
rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[1]]),
FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.filtN[[1]]),
REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.filtN[[1]]),
REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[1]]))

"

Image

But when I checked the results of primer hits, the number of primer sequence hits was much lower than the total number of reads, which is 433,332. My professor asked me why the number of primer hits was significantly lower than the total reads, but I couldn't answer. I tried to study this topic, but unfortunately, I couldn't find a solution to this issue. Could you please help me understand why the number of primer hits is much lower than the total number of reads?

@benjjneb
Copy link
Owner

Does your amplicon sequencing protocol lead to sequencing of the primers? The most common 515f/806r sequencing protocol is the EMP protocol, and that does not include the primers on the sequenced portions of the amplicons. Hence you will find little to no primer sequences in your data.

This is something the sequencing provider should know.

@MizanurRahman24
Copy link
Author

Hi @benjjneb
Thank you very much for your prompt response. I checked the sequencing information, and my data was generated following the EMP protocol for the 16S rRNA V4 region. Since the EMP protocol does not include primers in the sequenced portion, should I check for primer sequences before data processing? I have already checked and found some hits in the sequences. Should I remove these primers using Cutadapt, or can I proceed without trimming them, considering that the EMP protocol typically does not include primer sequences?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants