Missing reads creating phyloseq object after dada2 pipeline #1746

GingerMATE · 2024-04-22T11:23:59Z

Hello,

im encountering a similar issue to #1681. After completing the Dada2 pipeline according to the tutorial, i created a phyloseq object, but in doing so i loose reads in almost all my samples. Contrary to issue #1681 none of my samples go to 0 reads but some loose over 50%. I also tried @Rosiekstein solution to that problem, but it didnt work for me.

Here is an excerpt from my track_reads file with added entry for when i create my phyloseq object:

sample | input | filtered | dadaF | dadaR | merged | nonchim | finalPercReadsKept % | reads physeq obj
ArcB1 | 82650 | 47038 | 45576 | 46265 | 43855 | 42770 | 51.7 | 42770
AuB1 | 90559 | 56066 | 54249 | 55233 | 52257 | 50278 | 55.5 | 32479
AuB2 | 100718 | 62615 | 59976 | 60933 | 57438 | 55009 | 54.6 | 53549
AuB3 | 137090 | 81540 | 80356 | 80983 | 79213 | 76150 | 55.5 | 27221

There is a large iscrepancy between the seqtab. nochim reads and the phyloseq object.

Here is the code i use to create my phyloseq object:

Read sample Metadata and sample names

sample_metadata <- read.table("dna_metadata.csv", header=TRUE, sep=",", stringsAsFactors=FALSE)
rownames(sample_metadata) <- sample_metadata$sample_id
head(sample_metadata)

Read asv and taxa files

asv_counts_table <- readRDS("seqtab_nochim.rds")
asv_seqs <- colnames(asv_counts_table)
asv_headers <- vector(dim(asv_counts_table)[2], mode="character")
for (i in 1:dim(asv_counts_table)[2]) {
asv_headers[i] <- paste("ASV", i, sep="_")
}
colnames(asv_counts_table) <- asv_headers
head(asv_counts_table)
class(asv_counts_table)
sample_id <- sample_metadata$sample_id
rownames(asv_counts_table) <- sample_id

load taxa file

taxa_with_species <- read.table("taxa_.csv", header=TRUE, row.names=1, sep=",", stringsAsFactors=FALSE)
head(taxa_with_species)
class(taxa_with_species)
taxa_with_species <- as.matrix(taxa_with_species)

Create phyloseq object

physeq_ps137 <- phyloseq(otu_table(asv_counts_table, taxa_are_rows=FALSE),
tax_table(taxa_with_species), sample_data(sample_metadata))

physeq_ps137
readcount(physeq_ps137)

#And this is the code i used to create the seqtab.nochim table in the Dada2 pipeline:

seqtab <- makeSequenceTable(mergers)

output_directory="/Analysis/Pipeline"

saveRDS(mergers, "/Analysis/RData/mergers.rds")

save.image(file.path(output_directory, "mergers.RData"))

#Inspect the merger data.frame from the first sample

head(mergers)

class(mergers)

dim(seqtab)

#Inspect distribution of sequence lengths

table(nchar(getSequences(seqtab)))

seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE, verbose=TRUE) # minFoldParentOverAbundance = 8, multithread= parallel::detectCores(),

table(nchar(getSequences(seqtab.nochim)))

dim(seqtab.nochim)

sum(seqtab.nochim)/sum(seqtab)

saveRDS(seqtab, "/Analysis/RData/seqtab.rds")

saveRDS(seqtab.nochim, "/Analysis/RData/seqtab_nochim.rds")

Any idea what could cause the reads to disappear? Thanks in advance!

GingerMATE · 2024-04-22T11:43:58Z

I also dont fully understand what exactly the readcount function does. Maybe thats my mistake?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing reads creating phyloseq object after dada2 pipeline #1746

Missing reads creating phyloseq object after dada2 pipeline #1746

GingerMATE commented Apr 22, 2024 •

edited

Loading

GingerMATE commented Apr 22, 2024

Missing reads creating phyloseq object after dada2 pipeline #1746

Missing reads creating phyloseq object after dada2 pipeline #1746

Comments

GingerMATE commented Apr 22, 2024 • edited Loading

Read sample Metadata and sample names

Read asv and taxa files

load taxa file

Create phyloseq object

GingerMATE commented Apr 22, 2024

GingerMATE commented Apr 22, 2024 •

edited

Loading