Does your amplicon sequencing protocol lead to sequencing of the primers? The most common 515f/806r sequencing protocol is the Earth Microbiome Project (EMP) protocol, and that protocol does not include the primers in the sequenced portion of the amplicons. Hence you will find little to no primer sequence in your data.
This is something the sequencing provider should know.
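One quick way to check this on the data itself is to look at where the primer occurs in each read. The sketch below is a dependency-free illustration in base R, not part of DADA2 or Cutadapt: the helper names `primer_regex` and `primer_start` and the toy reads are invented for this example. It assumes that when primers are sequenced they sit at the very start of nearly every read, so a low share of reads with a match at position 1 would be consistent with a protocol that does not sequence them.

```r
# Map each IUPAC nucleotide code to the set of bases it stands for.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

# Turn a degenerate primer into a regular expression (hypothetical helper).
primer_regex <- function(primer) {
  paste(iupac[strsplit(primer, "")[[1]]], collapse = "")
}

# Position of the first primer match in each read, -1 when there is none
# (hypothetical helper).
primer_start <- function(primer, reads) {
  as.integer(regexpr(primer_regex(primer), reads))
}

FWD <- "GTGYCAGCMGCCGCGGTAA"  # 515f, as in the original post

# Toy reads, invented for illustration only.
toy_reads <- c(
  "GTGCCAGCAGCCGCGGTAATACGTAGG",  # primer at the start of the read
  "TACGTAGGGTGCGAGCGTTAATCGGAA",  # no primer in the read
  "ACGTGTGTCAGCCGCCGCGGTAATACG"   # primer present, but not at the start
)

starts <- primer_start(FWD, toy_reads)
print(starts)             # first match position per read
print(mean(starts == 1))  # share of reads that begin with the primer
```

On real data the reads would come from `sread(readFastq(fn))` converted with `as.character()`, and the same check would be repeated for the reverse primer on the reverse reads.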
Hi @benjjneb
Thank you very much for your prompt response. I checked the sequencing information, and my data was generated following the EMP protocol for the 16S rRNA V4 region. Since the EMP protocol does not include primers in the sequenced portion, should I check for primer sequences before data processing? I have already checked and found some hits in the sequences. Should I remove these primers using Cutadapt, or can I proceed without trimming them, considering that the EMP protocol typically does not include primer sequences?
Hi,
I am processing raw NovaSeq data with DADA2 and have used Cutadapt for primer removal. Before running Cutadapt, I used the following code to count the primer sequences present in the raw reads.
```r
library(dada2)       # filterAndTrim()
library(ShortRead)   # readFastq(), sread()
library(Biostrings)  # DNAString(), vcountPattern()

# fnFs and fnRs hold the forward and reverse FASTQ paths, defined earlier in the workflow.

# Identify primers
FWD <- "GTGYCAGCMGCCGCGGTAA"   # 515f
REV <- "GGACTACNVGGGTWTCTAAT"  # 806r

allOrients <- function(primer) {
  # Create all orientations of the input sequence
  dna <- DNAString(primer)  # Biostrings works with DNAString objects rather than character vectors
  orients <- c(Forward = dna, Complement = complement(dna), Reverse = reverse(dna),
               RevComp = reverseComplement(dna))
  return(sapply(orients, toString))  # Convert back to character vector
}
FWD.orients <- allOrients(FWD)
REV.orients <- allOrients(REV)
FWD.orients
REV.orients

# Put N-filtered files in the filtN/ subdirectory
fnFs.filtN <- file.path(getwd(), "filtN", basename(fnFs))
fnRs.filtN <- file.path(getwd(), "filtN", basename(fnRs))
filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = TRUE)

primerHits <- function(primer, fn) {
  # Count the number of reads in which the primer is found
  # fixed = FALSE lets ambiguity codes such as Y, M, N, V and W match
  nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE)
  return(sum(nhits > 0))
}

# Primer hits for the first sample, by primer, read direction and orientation
rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[1]]),
      FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.filtN[[1]]),
      REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.filtN[[1]]),
      REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[1]]))
```
But when I checked the results, the number of reads containing a primer sequence was much lower than the total number of reads, which is 433,332. My professor asked me why, but I couldn't answer, and I haven't been able to find an explanation on my own. Could you please help me understand why the number of primer hits is so much lower than the total number of reads?
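To make the comparison with the total explicit, the hit counts can be expressed as a percentage of all reads. The snippet below is only a sketch: the counts in `hits` are placeholders invented for illustration (the actual table is not shown in this post), and in practice `hits` would be the matrix returned by the `rbind(...)` call. As a rough guide, values close to 100% for the forward primer in the forward reads and the reverse primer in the reverse reads would suggest that the primers were sequenced, while values close to zero would suggest they were not.

```r
# Total reads in the sample, as stated in the post.
n_reads <- 433332

# Placeholder counts, invented for illustration only; replace with the
# matrix returned by the rbind(...) call.
hits <- rbind(
  FWD.ForwardReads = c(150, 0, 0, 0),
  FWD.ReverseReads = c(0, 0, 0, 90),
  REV.ForwardReads = c(0, 0, 0, 110),
  REV.ReverseReads = c(130, 0, 0, 0)
)
colnames(hits) <- c("Forward", "Complement", "Reverse", "RevComp")

# Percentage of reads that contain each primer orientation.
pct_hits <- round(100 * hits / n_reads, 3)
print(pct_hits)
```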