technical question using ordinate phyloseq function with envfit of vegan Package #1761

cabraham03 · 2024-06-26T05:21:02Z

Hi. I have a technical question.
I want to generate a NMDS plot along with the OTUs based on the envfit statistics. If I have less than 3,000 OTUs I don’t have any problem, but if I have a >19,000 OTUs my machine just simply stop or get slow with envfit.

So, I want to generate a NMDS ordination using ordinate phyloseq function, and then obtain the distribution of the Samples ($points) and the distribution of the Species ($species), and then make a envfit analyses and used the significant (< 0.05) to filter and plot them together, but using the distribution generated with ordinate function.

I will use GlobalPatterns as examples, its a good example because it present >19,000 OTUs

GP <- GlobalPatterns
set.seed(1000)
GP.ord <- ordinate(GP, "NMDS", "bray", k=3)

extract Samples (points)

Sites <- data.frame(GP.ord$points)
Sites$SampleID <- rownames(Sites)
SD <- data.frame(sample_data(GP))
SD <- SD[, c(1,6)]
colnames(SD)[1] <- "SampleID"
Sites <- inner_join(Sites, SD, by = "SampleID")

now Species

Species <- data.frame(GP.ord$species)
Species$OTUsID <- rownames(Species)
taxt <- data.frame(as(tax_table(GP), "matrix"))
taxt$OTUsID <- rownames(taxt)
Species <- inner_join(Species, taxt, by = "OTUsID")
rownames(Species) <- Species$OTUsID                         
Species$OTUsID <- NULL

To avoid problems, I will generate a filter and then use envfit to obtain the significant

wh0 = genefilter_sample(GP, filterfun_sample(function(x) x > 5), A=0.3*nsamples(GP))
GP1 = prune_taxa(wh0, GP)
set.seed(1000)
GP1.ord <- ordinate(GP1, "NMDS", "bray", k=3)
Otus <- data.frame(as(otu_table(GP1), "matrix"))
Otus <- t(Otus)

now envfit

set.seed(1000)
spp.fit <- envfit(GP1.ord, Otus, permutations = 999, choices=c(1:3))

obtain the significant

spp.fit_df <- data.frame(cbind(r2 = spp.fit$vectors$r, spp.pvals = spp.fit$vectors$pvals ))
spp.fit_df <- spp.fit_df[spp.fit_df$spp.pvals <= 0.05, ]

then filter the significant OTUs from the Species generated with ordinate function



head(Species, 3)
             MDS1        MDS2       MDS3 Kingdom        Phylum        Class        Order        Family      Genus                  Species
549322 -1.2016958 -0.01723596 -0.3002742 Archaea Crenarchaeota Thermoprotei         <NA>          <NA>       <NA>                     <NA>
522457 -1.2311695  0.08762658 -0.7164607 Archaea Crenarchaeota Thermoprotei         <NA>          <NA>       <NA>                     <NA>
951    -0.4113979  1.48264357  0.2590104 Archaea Crenarchaeota Thermoprotei Sulfolobales Sulfolobaceae Sulfolobus Sulfolobusacidocaldarius

head(spp.fit_df, 3)
              r2 spp.pvals
250392 0.4012361     0.001
365828 0.3045288     0.017
12812  0.4500423     0.001

Species.sig <- Species[rownames(Species) %in% rownames(spp.fit_df), ]

and then generate a plot !!


ggplot(Sites, aes(x = MDS1, y = MDS2)) +                                                                           # Sites
    geom_vline(xintercept = 0, linetype="dashed", size = 0.5, color= "#999999") +                                  # V line
    geom_hline(yintercept = 0, linetype="dashed", size = 0.5, color= "#999999") +                                  # H line
    geom_point(aes(fill = SampleType), size = 7, shape = 21 ,color = "00000000", alpha=1) +                        # Sites points
    scale_fill_manual(values = MyPalette1 ) +                                                             # Colors Points
    geom_point(data = Species.sig, aes(x=MDS1, y=MDS2, color = Family), size = 2.5, alpha=0.6, shape=18) +     # Species
    scale_color_manual(values = MyPalette2) +
    geom_segment(data = Species.sig, aes(x = 0, xend=MDS1, y=0, yend=MDS2), 
                              arrow = arrow(length = unit(0.25, "cm")), colour = "grey",
                              lwd=0.25, linetype="dashed", alpha=0.7) +                           
    theme_bw()

My question is, if I have so many OTUs is valid to filter and then obtain the statistics with envfit, and generate the distribution with ordinate phyloseq function ? I mean, the direction, and length of the arrow in Species is representative of the relatedness with the Samples distribution using this method ?

I know that I can generate the the plot with for Samples and Species obtained from the filter data (GP1) and generate the ordinate and envfit, but I rather to use all OTUs to generate the Sample Distribution !!!

any suggestion ?
Thanks

The text was updated successfully, but these errors were encountered:

cabraham03 changed the title ~~technical question using ordinate phyloseq function with envfit of vegan libray~~ technical question using ordinate phyloseq function with envfit of vegan Package Jun 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

technical question using ordinate phyloseq function with envfit of vegan Package #1761

technical question using ordinate phyloseq function with envfit of vegan Package #1761

cabraham03 commented Jun 26, 2024

technical question using ordinate phyloseq function with envfit of vegan Package #1761

technical question using ordinate phyloseq function with envfit of vegan Package #1761

Comments

cabraham03 commented Jun 26, 2024