Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

technical question using ordinate phyloseq function with envfit of vegan Package #1761

Open
cabraham03 opened this issue Jun 26, 2024 · 0 comments

Comments

@cabraham03
Copy link

Hi. I have a technical question.
I want to generate a NMDS plot along with the OTUs based on the envfit statistics. If I have less than 3,000 OTUs I don’t have any problem, but if I have a >19,000 OTUs my machine just simply stop or get slow with envfit.

So, I want to generate a NMDS ordination using ordinate phyloseq function, and then obtain the distribution of the Samples ($points) and the distribution of the Species ($species), and then make a envfit analyses and used the significant (< 0.05) to filter and plot them together, but using the distribution generated with ordinate function.

I will use GlobalPatterns as examples, its a good example because it present >19,000 OTUs

GP <- GlobalPatterns
set.seed(1000)
GP.ord <- ordinate(GP, "NMDS", "bray", k=3)

extract Samples (points)

Sites <- data.frame(GP.ord$points)
Sites$SampleID <- rownames(Sites)
SD <- data.frame(sample_data(GP))
SD <- SD[, c(1,6)]
colnames(SD)[1] <- "SampleID"
Sites <- inner_join(Sites, SD, by = "SampleID")

now Species

Species <- data.frame(GP.ord$species)
Species$OTUsID <- rownames(Species)
taxt <- data.frame(as(tax_table(GP), "matrix"))
taxt$OTUsID <- rownames(taxt)
Species <- inner_join(Species, taxt, by = "OTUsID")
rownames(Species) <- Species$OTUsID                         
Species$OTUsID <- NULL

To avoid problems, I will generate a filter and then use envfit to obtain the significant

wh0 = genefilter_sample(GP, filterfun_sample(function(x) x > 5), A=0.3*nsamples(GP))
GP1 = prune_taxa(wh0, GP)
set.seed(1000)
GP1.ord <- ordinate(GP1, "NMDS", "bray", k=3)
Otus <- data.frame(as(otu_table(GP1), "matrix"))
Otus <- t(Otus)

now envfit

set.seed(1000)
spp.fit <- envfit(GP1.ord, Otus, permutations = 999, choices=c(1:3))

obtain the significant

spp.fit_df <- data.frame(cbind(r2 = spp.fit$vectors$r, spp.pvals = spp.fit$vectors$pvals ))
spp.fit_df <- spp.fit_df[spp.fit_df$spp.pvals <= 0.05, ]

then filter the significant OTUs from the Species generated with ordinate function



head(Species, 3)
             MDS1        MDS2       MDS3 Kingdom        Phylum        Class        Order        Family      Genus                  Species
549322 -1.2016958 -0.01723596 -0.3002742 Archaea Crenarchaeota Thermoprotei         <NA>          <NA>       <NA>                     <NA>
522457 -1.2311695  0.08762658 -0.7164607 Archaea Crenarchaeota Thermoprotei         <NA>          <NA>       <NA>                     <NA>
951    -0.4113979  1.48264357  0.2590104 Archaea Crenarchaeota Thermoprotei Sulfolobales Sulfolobaceae Sulfolobus Sulfolobusacidocaldarius

head(spp.fit_df, 3)
              r2 spp.pvals
250392 0.4012361     0.001
365828 0.3045288     0.017
12812  0.4500423     0.001

Species.sig <- Species[rownames(Species) %in% rownames(spp.fit_df), ]

and then generate a plot !!


ggplot(Sites, aes(x = MDS1, y = MDS2)) +                                                                           # Sites
    geom_vline(xintercept = 0, linetype="dashed", size = 0.5, color= "#999999") +                                  # V line
    geom_hline(yintercept = 0, linetype="dashed", size = 0.5, color= "#999999") +                                  # H line
    geom_point(aes(fill = SampleType), size = 7, shape = 21 ,color = "00000000", alpha=1) +                        # Sites points
    scale_fill_manual(values = MyPalette1 ) +                                                             # Colors Points
    geom_point(data = Species.sig, aes(x=MDS1, y=MDS2, color = Family), size = 2.5, alpha=0.6, shape=18) +     # Species
    scale_color_manual(values = MyPalette2) +
    geom_segment(data = Species.sig, aes(x = 0, xend=MDS1, y=0, yend=MDS2), 
                              arrow = arrow(length = unit(0.25, "cm")), colour = "grey",
                              lwd=0.25, linetype="dashed", alpha=0.7) +                           
    theme_bw()

My question is, if I have so many OTUs is valid to filter and then obtain the statistics with envfit, and generate the distribution with ordinate phyloseq function ? I mean, the direction, and length of the arrow in Species is representative of the relatedness with the Samples distribution using this method ?

I know that I can generate the the plot with for Samples and Species obtained from the filter data (GP1) and generate the ordinate and envfit, but I rather to use all OTUs to generate the Sample Distribution !!!

any suggestion ?
Thanks

@cabraham03 cabraham03 changed the title technical question using ordinate phyloseq function with envfit of vegan libray technical question using ordinate phyloseq function with envfit of vegan Package Jun 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant