DspikeIn

Welcome to the DspikeIn R package repository!

📚 Table of Contents

Getting Started
Data Preparation
Processing
Bias Correction
Visualization
- Visualization
- Detect common ASVs/OTUs
Credits
- Acknowledgement
- Citing DspikeIn

DspikeIn Package

The DspikeIn package was developed to facilitate:

Verifying the phylogenetic distances of ASVs/OTUs resulting from spiked species.
Preprocessing data.
Calculating the spike-in scaling factor.
Converting relative abundance to absolute abundance.
Estimating acceptable spiked species retrieval %
Data transformation, Differential abundance and visualization.

Tetragenococcus halophilus and Dekkera bruxellensis were selected as taxa to spike into gut microbiome samples based on our previous studies WalkerLab.

GCN Normalization with QIIME2 Plugin

Opinions on gene copy number (GCN) correction for the 16S rRNA marker vary, with proponents citing improved accuracy and critics noting limitations. While GCN correction is not included in the DspikeIn package, it can be applied to relative abundance counts using tools like the q2-gcn-norm plugin in Qiime2 (rrnDB v5.7) or methods outlined by Louca et al., 2018,including PICRUSt, CopyRighter, and PAPRICA. Due to variability in rDNA gene copy numbers (Lavrinienko et al., 2021), GCN corrections were not applied. However, targeted adjustments can be made to prevent overestimating specific fungal taxa.

Command Example

qiime gcn-norm copy-num-normalize \
  --i-table table-dada2.qza \
  --i-taxonomy taxonomy.qza \
  --o-gcn-norm-table table-normalized.qza

Installation

To install the DspikeIn package, follow these steps...

If you encounter issues installing the package due to missing dependencies, follow these steps to install all required packages first:

Step 1: Install Required Packages

To install the required packages, use the following script:

CRAN packages

# Install CRAN packages
install.packages(c("stats", "dplyr", "ggplot2", "flextable","ggpubr", "randomForest", "ggridges", "ggalluvial","tibble", "matrixStats", "RColorBrewer", "ape", "rlang", "scales", "magrittr", "phangorn"))

# Load CRAN packages
lapply(c("stats", "dplyr", "ggplot2", "flextable","ggpubr","randomForest", "ggridges", "ggalluvial","tibble", "matrixStats", "RColorBrewer", "ape", "rlang", "scales", "magrittr", "phangorn"), library, character.only = TRUE)

Bioconductor Packages

# Install BiocManager if not installed
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

# Install Bioconductor packages
BiocManager::install(c("phyloseq", "msa", "DESeq2","ggtree", "edgeR", "Biostrings", "DECIPHER", "microbiome"))

# Load Bioconductor packages
lapply(c("phyloseq", "msa", "DESeq2", "edgeR", "Biostrings","ggtree", "DECIPHER", "microbiome"), library, character.only = TRUE)

GitHub Packages

# Install remotes if not installed
install.packages("remotes")

# Install GitHub packages
remotes::install_github("mikemc/speedyseq")
remotes::install_github("microsud/microbiomeutilities")

# Load GitHub packages
library(speedyseq)
library(microbiomeutilities)

Step 2: Install DspikeIn Package

# Installation
#Instructions for how to install the DspikeIn package.

# Using devtools
install.packages("devtools")
devtools::install_github("mghotbi/DspikeIn")
library(DspikeIn)

# Or using remotes
install.packages("remotes")
remotes::install_github("mghotbi/DspikeIn")
library(DspikeIn)

Acknowledgement

DspikeIn builds on the excellent phyloseq package.

Requirements

DspikeIn works with a phyloseq object containing 7 taxonomic ranks. To estimate absolute abundance, the spiked.volume column in the metadata is required.

To remove strain from the taxonomic ranks

library(phyloseq)
# Function to remove strain information from taxonomy columns
remove_strain_info <- function(tax_table) {
  # Define the regex pattern for common strain identifiers
  pattern <- "Strain.*|strain.*|\\s*\\[.*\\]|\\s*\\(.*\\)"  # Adjust the regex to match specific strain formats
  # Apply the pattern to each column of the taxonomy table
  for (col in colnames(tax_table)) { 
    tax_table[, col] <- gsub(pattern, "", tax_table[, col])  # Remove strain info
    tax_table[, col] <- trimws(tax_table[, col])  # Trim trailing whitespace
  }
    return(tax_table)}
# Step 1: Extract the taxonomy table
taxonomy <- tax_table(ps)
# Step 2: Remove strain information (including the `Strain` column)
cleaned_taxonomy <- remove_strain_info(taxonomy)
# Remove the `Strain` column if it exists
if ("Strain" %in% colnames(cleaned_taxonomy)) {
  cleaned_taxonomy <- cleaned_taxonomy[, colnames(cleaned_taxonomy) != "Strain"]}
# Step 3: Update the taxonomy table in the phyloseq object
tax_table(ps) <- cleaned_taxonomy
# Step 4: Verify the changes
print(head(tax_table(ps)))  # Display the first few rows

To add species rank to the taxonomic ranks

library(phyloseq)
# Step 1: Extract the taxonomy table from the phyloseq object
taxonomy <- tax_table(ps)
# Step 2: Add a `species` column based on the Genus column and unique identifiers
taxonomy[, "species"] <- paste0(taxonomy[, "Genus"], "_OTU", seq_len(nrow(taxonomy)))
# Step 3: Update the taxonomy table in the phyloseq object
tax_table(ps) <- taxonomy
# Step 4: Verify the changes
print(head(tax_table(ps)))  # Display the first few rows to confirm

Dataset for practicing DspikeIn Package 📊

You can download the practice dataset for the DspikeIn package by clicking the link below:
👉 Download Dataset

# Make a new directory and set it as your working directory
create_directory("DspikeIn_16S_OTU", set_working_dir = TRUE)
getwd()


# Therefore, please start by creating a phyloseq object and follow the instructions.
# To create your phyloseq object, please refer to the phyloseq tutorial (https://joey711.github.io/phyloseq).
# The phyloseq object needs to include OTU/ASV, Taxa, phylogenetic tree, DNA reference, 
# and metadata containing spiked species volume, starting from 0 (no spike species added) to 4 (4 μl of spike cell added).

# Note: DspikeIn requires 'spiked.volume'; any other format is not readable."¯\\_(ツ)_/¯  ¯\\_(ツ)_/¯  ¯\\_(ツ)_/¯  ¯\\_(ツ)_/¯"

# We are going to work with a subset of the dataset for both ASVs and OTUs
# approaches to accelerate this workshop.

physeq_16SOTU <-readRDS("Relative16SOTU.rds")
physeq_ITSOTU <-readRDS("RelativeITSOTU.rds")

physeq_16SOTU <- tidy_phyloseq(physeq_16SOTU)

# Ensure your metadata contains spiked volumes:
physeq_16SOTU@sam_data$spiked.volume

Prepare the required information for our Protocol

Pre-process one Spiked-in Species

# Required Information 
# Please note that the Spike cell numbers, species name, and selected hashcodes are customizable and can be tailored to the specific needs of individual studies.
# Moreover, to proceed with the DspikeIn package, you only need to select one method to specify your spiked species: either by hashcodes or species name.

library(phyloseq)
# 16S rRNA
# presence of 'spiked.volume' column in metadata
spiked_cells <-1847
species_name <- spiked_species <- c("Tetragenococcus_halophilus", "Tetragenococcus_sp")
merged_spiked_species<-"Tetragenococcus_halophilus"
Tetra <- subset_taxa(physeq_16SOTU,Species=="Tetragenococcus_halophilus" | Species=="Tetragenococcus_sp")
hashcodes <- row.names(phyloseq::tax_table(Tetra))

# ITS rDNA
# presence of 'spiked.volume' column in metadata
spiked_cells <- 733
species_name <- spiked_species<-merged_spiked_species<-"Dekkera_bruxellensis"
Dekkera <- subset_taxa(physeq_ITSOTU, Species=="Dekkera_bruxellensis")
hashcodes <- row.names(phyloseq::tax_table(Dekkera))

Prepare the Required Information for the Synthetic Community

Pre-process List of Spiked-in Species

# Define the list of spiked-in species
spiked_species <- c("Pseudomonas aeruginosa", "Escherichia coli", "Clostridium difficile")

# Define the corresponding copy numbers for each spiked-in species
spiked_cells_list <- c(10000, 20000, 15000) # or equal number of copies-> spiked_cells_list <- c(200, 200, 200)

Validation using phylogenetic tree

Plot phylogenetic tree with Bootstrap Values

This step will be helpful for handling ASVs with/without Gene Copy Number Correction This section demonstrates how to use various functions from the package to plot and analyze phylogenetic trees.

# In case there are several OTUs/ASVs resulting from the spiked species, you may want to check the phylogenetic distances.
# We first read DNA sequences from a FASTA file, to perform multiple sequence alignment and compute a distance matrix using the maximum likelihood method, then we construct a phylogenetic tree
# Use the Neighbor-Joining method  based on a Jukes-Cantor distance matrix and plot the tree with bootstrap values.
# we compare the Sanger read of Tetragenococcus halophilus with the FASTA sequence of Tetragenococcus halophilus from our phyloseq object.

# Subset the phyloseq object to include only Tetragenococcus species first
Tetra <- subset_taxa(Tetra, !is.na(taxa_names(Tetra)) & taxa_names(Tetra) != "")
tree <- phy_tree(Tetra)
ref_sequences_Tetra <- refseq(Tetra)
writeXStringSet(ref_sequences_Tetra, "ref_sequences_Tetra.fasta")
# postitive control 
Tetra_control_sequences <- Biostrings::readDNAStringSet("~/Tetra_Ju.fasta")

# combine the Tetragenococcus FASTA files (from your dataset and the Sanger fasta of Tetragenococcus, positive control)
combined_sequences <- c(ref_sequences_Tetra, Tetra_control_sequences)
writeXStringSet(combined_sequences, filepath = "~/combined_fasta_file")
combined_sequences <- Biostrings::readDNAStringSet("~/combined_fasta_file")

# Plot Neighbor-Joining tree with bootstrap values to compare Tetragenococcus in your dataset with your positive control
fasta_path <- "~/combined_fasta_file"
plot_tree_nj(fasta_path, output_file = "neighbor_joining_tree_with_bootstrap.png")


# Plot phylogenetic tree
plot_tree_custom(Tetra, output_prefix = "p0", width = 18, height = 18, layout = "circular")

# Plot the tree with glommed OTUs at 0.2 resolution/ or modify it
plot_glommed_tree(Tetra, resolution = 0.2, output_prefix = "top", width = 18, height = 18)

# Plot the phylogenetic tree with multiple sequence alignment
plot_tree_with_alignment(Tetra, output_prefix = "tree_alignment", width = 15, height = 15)

# Plot phylogenetic tree with bootstrap values and cophenetic distances
Bootstrap_phy_tree_with_cophenetic(Tetra, output_file = "tree_with_bootstrap_and_cophenetic.png", bootstrap_replicates = 500)

Figure 1.

Neighbor Joining Tree with Bootstrap	Tetra Plot with Bootstrap	Cophenetic tree with Bootstrap

## Aligned Sequences

The result of the aligned sequences is shown below:
DNAStringSet object of length 5:
    width seq                                                                                                                                         names               
[1]   292 -------------------TACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGC...CTGGTCTGTAACTGACGCTGAGGCTCGAAAGCGTGGGTAGCAAACAGG-------------------- 2ddb215ff668b6a24...
[2]   292 -------------------TACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGC...CTGGTCTGTAACTGACGCTGAGGCTCGAAAGCGTGGGTAGCAAACAGG-------------------- Tetragenococcus h...
[3]   292 -------------------TACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGC...CTGGTCTGTAACTGACGCTGAGGCTCGAAAGCGTAGGTAGCAAACAGG-------------------- 65ab824f29da71010...
[4]   292 -------------------TACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGC...CTGGTCTGTAACTGACGCTGAGACTCGAAAGCGTGGGTAGCAAACAGG-------------------- e49935179f23c00fb...
[5]   292 -------------------TACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGC...CTGGACTGTAACTGACGCTGAGGCTCGAAAGCGTGGGGAGCAAACAGG-------------------- 0350f990080b4757a...

We selected the OTU approach using de novo robust clustering algorithms at a 97% similarity threshold, following the methods outlined by Westcott and Schloss (2015).

Subsetting and Preprocessing Spiked Data

Subset the part of the data which is spiked. Keep solely spiked samples using the spiked.volume column.

# Subset spiked samples (264 samples are spiked)
spiked_16S_OTU <- subset_samples(physeq16S_OTU, spiked.volume %in% c("2", "1"))
spiked_16S_OTU <- tidy_phyloseq(spiked_16S_OTU)

Examine Your Count Table/Biom File Before Going Further

# Summarize the initial statistics for ASVs/OTUs
initial_stat_ASV <- summ_phyloseq_ASV_OTUID(spiked_16S_OTU)

# Summarize the initial statistics sample-wise
initial_stat_sampleWise <- summ_phyloseq_sampleID(spiked_16S_OTU)

# Summarize the count data
summ_count_phyloseq(physeq_16S_OTU)

# Check the summary statistics
# Ensure the input is in dataframe format for this function
calculate_summary_stats_table(initial_stat_sampleWise)

Check if transformation is required for spike volume variation.

# Adjust abundance by one-third
readAdj16S <- adjust_abundance_one_third(spiked_16S_OTU, factor = 3)
summ_count_phyloseq(readAdj16S)

# Random subsampling with reduction factor foe count and taxa
red16S <- random_subsample_WithReductionFactor(spiked_16S_OTU, reduction_factor = 3)
summ_count_phyloseq(red16S)

Preprocessing for Scaling Factor Calculation

Preprocessing One Species Scaling Factor

If the spiked species appear in several OTUs/ASVs, check their phylogenetic distances and compare them to the reference sequences of your positive control.

# Modify the threshold of acceptable spiked species % as needed. 
# For detailed guidance on acceptable thresholds (passed_range), 
# please refer to the instructions in our upcoming paper.

# Merge the spiked species
# merge_method = "max": Selects the maximum abundance among OTUs/ASVs of the spiked species, ensuring the most abundant ASV is retained.
# merge_method = "sum": Sums the abundances of OTUs/ASVs of the spiked species, providing a cumulative total.

species_name <- "Tetragenococcus_halophilus"

# Merge using "sum" or "max" method
Spiked_16S_sum_scaled <- Pre_processing_species(
  spiked_16S_OTU, 
  species_name, 
  merge_method = "sum", 
  output_file = "merged_physeq_sum.rds")

# Merge hashcodes using "sum" or "max" method
Spiked_16S_sum_scaled <- Pre_processing_hashcodes(
  spiked_16S_OTU, 
  hashcodes, 
  merge_method = "sum", 
  output_prefix = "merged_physeq_sum")

# Summarize count
summ_count_phyloseq(Spiked_16S_sum_scaled)

# Tidy phyloseq object
Spiked_16S_OTU_scaled <- tidy_phyloseq(Spiked_16S_sum_scaled)


# Customize the passed_range and merged_spiked_species/merged_spiked_hashcodes based on your preferences.

Spiked Species Retrieval %

Calculate Spiked Species Retrieval % for One Species

# Customize the passed_range and merged_spiked_species/merged_spiked_hashcodes based on your preferences.
# passed_range = "c(0.1, 11)": threshold of acceptable spiked species %
# Select either merged_spiked_species or merged_spiked_hashcodes

merged_spiked_species <- c("Tetragenococcus_halophilus")
result <- calculate_spike_percentage(
  Spiked_16S_sum_scaled, 
  merged_spiked_species, 
  passed_range = c(0.1, 11))
calculate_summary_stats_table(result)

# Define your merged_spiked_hashcodes
merged_Tetra <- subset_taxa(
  Spiked_16S_OTU_scaled, 
  Species == "Tetragenococcus_halophilus")

merged_spiked_hashcodes <- row.names(tax_table(merged_Tetra))
result <- calculate_spike_percentage(
  Spiked_16S_OTU_scaled,  
  merged_spiked_hashcodes, 
  passed_range = c(0.1, 11))

calculate_summary_stats_table(result)

# If you decide to remove the failed reads and go forward with passed reads, here is what you need to do
# You can also go forward with the original file and remove the failed reads 
# after converting relative to absolute abundance

# Filter to get only the samples that passed
passed_samples <- result$Sample[result$Result == "passed"]

# Subset the original phyloseq object to keep only the samples that passed
passed_physeq <- prune_samples(
  passed_samples, 
  Spiked_16S_OTU_scaled)

Calculate Spiked Species Retrieval % for List of Species

#ps= is phyloseq obj
spiked_species_list <- c("Pseudomonas aeruginosa", "Escherichia coli", "Clostridium difficile")
result <- calculate_spike_percentage_list(ps, merged_spiked_species = spiked_species_list, passed_range = c(0.1, 10)) #change the range based on your system specifications
print(result)

Preprocessing List of Species Scaling Factor

spiked_species <- c("Pseudomonas aeruginosa", "Escherichia coli", "Clostridium difficile")
merged_physeq_sum <- Pre_processing_species_list(physeq, spiked_species, merge_method = "sum")

Estimating Scaling Factors After Pre-Processing

Scaling Factors for One Spiked Species

To estimate scaling factors, ensure you have the merged_spiked_species data, which contains the merged species derived from the spiking process. As we have already merged either hashcodes or spiked species and are aware of the contents of the taxa table, we can proceed from here with merged_spiked_species.

# Define the merged spiked species
merged_spiked_species <- c("Tetragenococcus_halophilus")

# Calculate spikeIn factors
result <- calculate_spikeIn_factors(Spiked_16S_OTU_scaled, spiked_cells, merged_spiked_species)

# Check the outputs
scaling_factors <- result$scaling_factors
physeq_no_spiked <- result$physeq_no_spiked
spiked_16S_total_reads <- result$spiked_16S_total_reads
spiked_species_reads <- result$spiked_species_reads

Scaling Factors for a List of Spiked Species

This example demonstrates how to calculate scaling factors after merging redundant spike-in species.

# Define spiked-in species and their corresponding cell counts
spiked_species_list <- list(
  c("Pseudomonas aeruginosa"),
  c("Escherichia coli"),
  c("Clostridium difficile"))

spiked_cells_list <- c(10000, 20000, 15000)

scaling_factors <- calculate_list_average_scaling_factors(
  merged_physeq_sum,
  spiked_species_list,
  spiked_cells_list,
  merge_method = "sum" ) # or max

Convert Relative Counts to Absolute Counts and Create a New Phyloseq Object

# Convert relative counts data to absolute counts
absolute <- convert_to_absolute_counts(Spiked_16S_OTU_scaled, scaling_factors)
absolute_counts <- absolute$absolute_counts
physeq_absolute_abundance_16S_OTU <- absolute$physeq_obj


# summary statistics 
post_eval_summary <- calculate_summary_stats_table(absolute_counts)
print(post_eval_summary)

Let's check the conclusion and get the report table of spiked species success or failure.

Conclusion

# Define the parameters once.
merged_spiked_species <- c("Tetragenococcus_halophilus")
max_passed_range <- 35

# Subset the phyloseq object to exclude blanks
physeq_absolute_abundance_16S_OTU_perc <- subset_samples(physeq_absolute_abundance_16S_OTU, sample.or.blank != "blank")

# Generate the spike success report and summary statistics
summary_stats <- conclusion(physeq_absolute_abundance_16S_OTU_perc, merged_spiked_species, max_passed_range)
print(summary_stats)

Here is an example of a success or failure report:

#Save your file for later. Please stay tuned for the rest: Comparisons and several visualization methods to show how important it is to convert relative to absolute abundance in the context of microbial ecology.

physeq_absolute_16S_OTU <- tidy_phyloseq(physeq_absolute_abundance_16S_OTU_perc)
saveRDS(physeq_absolute_16S_OTU, "physeq_absolute_16S_OTU.rds")

Normalization and Differential Abundance

# Bolstad, B.M., Irizarry, R.A., Åstrand, M. and Speed, T.P., 2003. A comparison of normalization methods for high-density oligonucleotide array data based on variance and bias. Bioinformatics, 19(2), pp.185-193.
# Gagnon-Bartsch, J.A. and Speed, T.P., 2012. Using control genes to correct for unwanted variation in microarray data. Biostatistics, 13(3), pp.539-552.
# Risso, D., Ngai, J., Speed, T.P. and Dudoit, S., 2014. Normalization of RNA-seq data using factor analysis of control genes or samples. Nature biotechnology, 32(9), pp.896-902.
# Gagnon-Bartsch, J.A., Jacob, L. and Speed, T.P., 2013. Removing unwanted variation from high dimensional data with negative controls. Berkeley: Tech Reports from Dep Stat Univ California, pp.1-112.


# Normalization Methods:
# group_var <- "Diet"  
# result_TMM <- normalization_set(ps, method = "TMM", groups = "group_var")
# result_UQ <- normalization_set(ps, method = "UQ", groups = group_var)
# result_med <- normalization_set(ps, method = "med", groups = group_var)
# result_Pois <- normalization_set(ps, method = "Poisson", groups = "group_var")
# result_DES <- normalization_set(ps, method = "DESeq", groups = "group_var")
# result_med <- normalization_set(ps, method = "med", groups = "group_var")
# result_rle <- normalization_set(ps, method = "rle")
# result_css <- normalization_set(ps, method = "CSS")
# result_tss <- normalization_set(ps, method = "tss")
# result_rar <- normalization_set(ps, method = "rar")
# result_CLR <- normalization_set(ps, method = "clr")


results_edgeR <- perform_and_visualize_DA(
  ps = ps,
  method = "edgeR",
  group_var = "Treatment",
  contrast = c("Control", "Diet"),
  output_csv_path = "DA_edgeR.csv",
  target_glom = "Genus",
  significance_level = 0.05)

print(results_edgeR$plot)
head(results_edgeR$results)  # View significant taxa
ps_sig<- results_edgeR$ps_significant # extract significant taxa in phyloseq obj for plotting

results_DESeq2 <- perform_and_visualize_DA(
  ps = ps,
  method = "DESeq2",                # method
  group_var = "Treatment",          # factor
  contrast = c("Control", "Diet"),  # levels of contrast
  output_csv_path = "DA_DESeq2.csv",
  target_glom = "Genus",
  significance_level = 0.05)

print(results_DESeq2$plot)    # Volcano plot
head(results_DESeq2$results)  # View significant taxa
ps_sig<- results_DESeq2$ps_significant # extract significant taxa in phyloseq obj for plotting

Customized filtering

# Proportion adjustment
normalized_physeq <- proportion_adj(ps, output_file = "proportion_adjusted_physeq.rds")
summ_count_phyloseq(normalized_16S)


# Relativize and filter taxa based on selected thresholds
FT_physeq <- relativized_filtered_taxa(
  ps,
  threshold_percentage = 0.0001,
  threshold_mean_abundance = 0.0001,
  threshold_count = 5,
  threshold_relative_abundance = 0.0001)
summ_count_phyloseq(FT_physeq)

# Adjust prevalence based on the minimum reads
physeq_min <- adjusted_prevalence(ps, method = "min")

Estimating the system-specific optimal range of spiked species retrieval using biological metrics

system specific spiked species retrieval

Previously, Roa et al., 2021 reported an acceptable range of 0.1% to 10% for spiked species retrieval. Here, we demonstrate that retrieved spiked species are correlated with biological metrics such as abundance, richness, evenness, and beta dispersion, indicating that the acceptable range is system-dependent and can be changed based on system specifications.

plot_object <- regression_plot(data = metadata,
x_var = "Richness",  #  metadata needs to be in data frame format
y_var = "Total_Reads_spiked",
 custom_range = c(0.1, 15, 30, 50, 75, 100),  # ranges of percentage 
 plot_title = NULL)  # title/ either NULL or you add it

print(plot_object)

Beta Dispersion (16S)	Evenness (16S)

Visualization

# taxa barplot
# taxa barplot
 #abundance_type = "absolute"/"relative"
 #ps_ABS= phyloseq object with absolute counts
 
 
 bp_free <- taxa_barplot( physeq = ps_ABS,
                          target_glom = "Genus",
                          treatment_variable = "Genotype",
                          fill_variable = "Genus",           # Fill bars by Genus
                          abundance_type = "absolute",
                          facet_variable = "Treatment",      # Facet by Treatment 
                          x_scale = "free",                  # can be "fixed"
                          legend_size = 10,                
                          top_n_taxa = 30,                   # the top 30 taxa
                          xlab=NULL,                          
                          legend_columns = 1,
                          palette = color_palette$MG)  # This is DspikeIn custom color palette (MG)
 print(bp_free$barplot)
 
 #ps_rel= phyloseq object with relative counts
 # x_scale = "fixed" or "free" 
 
 bp_fix <- taxa_barplot(physeq = ps_rel,
                        target_glom = "Genus",
                        treatment_variable = "Genotype",
                        abundance_type = "relative",
                        facet_variable = "Treatment",
                        fill_variable = "Genus",
                        x_scale = "free", 
                        legend_size = 10,
                        top_n_taxa = 30,
                        xlab=NULL,
                        legend_columns = 1,    # legend col
                        palette = color_palette$MG)
 print(bp_fix$barplot)

Absolute Abundance	Relative Abundance

# Check abundance distribution via Ridge Plots before and after converting to absolute abundance
ridgeP_before <- ridge_plot_it(spiked_16S_OTU, taxrank = "Family", top_n = 10)
ridgeP_after <- ridge_plot_it(physeq_absolute_16S_OTU, taxrank = "Family", top_n = 10)+ my_custom_theme() #custome theme embedded in DspikeIn

Absolute Abundance	Relative Abundance

# core_microbiome
#ps_ABS=absolute count
#ps_Rel=relative count

custom_detections <- list(
  prevalences = seq(0.05, 1, 0.01),  # Custom prevalences
  thresholds = 10^seq(log10(0.05), log10(1), length = 8),  # detection thresholds
  min_prevalence = 0.4,  # Custom minimum prevalence
  taxa_order = "ascending")  # Order taxa by ascending/desc abundance


# Create and save the plot
plot_result <- plot_core_microbiome_custom(physeq = ps_ABS, 
                                           detections = custom_detections, 
                                           taxrank = "Family", 
                                           output_core_rds = "core_microbiome.rds", 
                                           output_core_csv = "core_microbiome.csv")
print(plot_result)

plot_result <- plot_core_microbiome_custom(physeq = ps_Rel, 
                                           detections = custom_detections, 
                                           taxrank = "Family", 
                                           output_core_rds = "core_microbiome.rds", 
                                           output_core_csv = "core_microbiome.csv")

# core.microbiome is automatically saved in your working directory so you can go ahead and plot it in any way you prefer
core.microbiome <- readRDS("core.microbiome.rds")

Absolute Abundance	Relative Abundance

Visualization Core microbiome distribution

# shift to long-format data frame and plot the abundance of taxa across the factor of your interest
# Generate alluvial plot
# Convert a phyloseq object to a long-format data frame
# pps_Abs <- phyloseq::psmelt(ps_ABS)

# Example of total reads calculation for relative abundance
# total_reads <- sum(pps_Abs$Abundance)

# Generate an alluvial plot
# Alluvial plot accepts facet

alluvial_plot_abs <- alluvial_plot( data = pps_Abs,
axes = c("Clade.Order","Animal.ecomode","Ecoregion.III", "Diet", "Reproduction","Metamorphosis"),
abundance_threshold = 10000,
fill_variable = "Genus",
silent = TRUE,
abundance_type = "absolute",
top_taxa = 23,
text_size = 4,
legend_ncol = 1,
custom_colors = DspikeIn::color_palette$light_MG)  # Use the extended palette from your package


# Generate alluvial plot for relative abundance
 pps_Rel <- phyloseq::psmelt(ps_Rel)
total_reads <- sum(pps_Rel$Abundance)

alluvial_plot_rel <- alluvial_plot(data = pps_rel,
axes = c("Clade.Order","Animal.ecomode","Ecoregion.III", "Diet", "Reproduction","Metamorphosis"),
abundance_threshold = 0.1,
fill_variable = "Genus",
silent = TRUE,
abundance_type = "relative",
total_reads = total_reads,
top_taxa = 23,
 text_size = 4,
 legend_ncol = 1,
custom_colors = DspikeIn::color_palette$light_MG)

Absolute Abundance	Relative Abundance

# selecting the most important ASVs/OTUs through RandomForest classification
# Salamander_absolute= subset of our phyloseq object
rf_physeq <- RandomForest_selected_ASVs(ps_physeq_absolute_16S_OTU, response_var = "Host_Species", na_vars = c("Habitat","Diet", "Ecoregion_III", "Host_genus", "Animal_type"))
RP=ridge_plot_it(rf_physeq)
RP+facet_wrap(~Diet)

Detect common ASVs-OTUs

DspikeIn includes a feature for detecting common taxa that are dominant or essential across multiple methods.

#detect common ASVs/OTUs
# The input is the list of phyloseq objects
results <- detect_common_asvs_taxa(list(rf_physeq, FTspiked_16S , core.microbiome), 
                                    output_common_asvs_rds = "common_asvs.rds", 
                                    output_common_taxa_rds = "common_taxa.rds")

common_asvs_phyloseq <- results$common_asvs_phyloseq
common_taxa_phyloseq <- results$common_taxa_phyloseq

If you use this package and find it useful:

⭐ Give us stars on GitHub!
🖋️ Cite us using the link below:

Citing DspikeIn

Name		Name	Last commit message	Last commit date
Latest commit History 567 Commits
R		R
inst/extdata		inst/extdata
man		man
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.md		README.md

mghotbi/DspikeIn

Folders and files

Latest commit

History

Repository files navigation