Replies: 3 comments
-
This turned out to be even more instructive than I thought, even though I don't think I have a concrete answer to your question. @jarioksa Do you have thoughts on this?

First, I think it is a good idea to drop a random plant from each larger group in turn, refit the ordination/model, and run the permutation test for each refit — you really should check that the result of your one test doesn't depend on the particular plants you happened to drop if you only did this once.

How to combine the resulting p values is where this becomes a bit more intriguing. I don't think it is right to simply pool all the pseudo-F statistics obtained via permutation, as they are not testing the same thing — the observed F values will vary because the data change. I think you could look at this like a meta-analysis: you have multiple observations of the p value and want to combine them in some way. If they were independent tests you could just average them, but I don't think your p values would be independent, as you are reusing most of the same data in each test.

Because they are not independent, I'm not sure whether Fisher's or Stouffer's methods are appropriate here either. Fisher's method has the null hypothesis that all of the p values are consistent with no effect and the alternative that at least one of them is inconsistent with no effect. Stouffer's method seems more appropriate here as it assumes that each test carries some information about the effect — if the truth is that there is some non-zero effect of your covariate(s), you'd expect each p value to contribute information to the decision, rather than assuming that only one (or more, but not all) of them does.

So, my hand-waving answer would be to treat the multiple p values you would obtain as a meta-analysis and look into Stouffer's method (and related methods for combining non-independent p values), or even ask whether you can treat the p values as being independent?
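For what it's worth, a minimal sketch of Stouffer's method in R (pvals is a hypothetical vector of one-sided p values from the repeated tests; this plain version also assumes independence, which, as noted above, is questionable here):

## Stouffer's method: convert p values to z scores and pool them
stouffer <- function(pvals) {
    z <- qnorm(pvals, lower.tail = FALSE)  ## p values -> one-sided z scores
    Z <- sum(z) / sqrt(length(z))          ## combined z, assuming independence
    pnorm(Z, lower.tail = FALSE)           ## back-transform to a combined p value
}
stouffer(c(0.04, 0.02, 0.07))

For correlated p values a weighted or correlation-adjusted variant would be needed; the sketch only shows the basic transformation.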
-
I really do not know how to do this correctly. There are two issues: the observed test statistic and the permutation statistic. These should be based on the same model and the same data.
Honestly, I don't think anybody would do this. Alternatively, the permutation design needs deep rethinking. I think that simple permutation schemes are not suitable for this kind of design.
-
Actually, I tried my proposal. For testing I used a subset of the dune data; the Management classes have these counts:

> summary(dune.env$Management)
BF HF NM SF
 3  5  6  6

Dropping the smallest class (BF) leaves me 17 cases:

> k <- dune.env$Management != "BF"
> y <- dune[k,]
> x <- dune.env[k,]
> bsfs <- as.matrix(expand.grid(which(x$Management == "SF"), which(x$Management == "NM")))
## a bit tricky, because how() does not know about the subset we used in the function call
> ctrl
Permutation Design:

Blocks:
  Defined by: none

Plots:
  Plots: x$Management[-bsfs[i, ]]
  Permutation type: free
  Mirrored?: No

Within Plots:
  Permutation type: free

Permutation details:
  Number of permutations: 999
  Max. number of permutations allowed: 9999
  Evaluate all permutations?: No. Activation limit: 5040
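(The definition of ctrl is not shown above; judging from the printed design, something like the following would reproduce it. This is an assumption on my part, and because the plot strata refer to bsfs[i, ], the object has to be created, or re-created, once i is defined.)

## assumed reconstruction of ctrl; not part of the original transcript
ctrl <- how(plots = Plots(strata = x$Management[-bsfs[i, ]], type = "free"),
            nperm = 999)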
> mods <- list()
> for(i in seq_len(nrow(bsfs))) mods[[i]] <- anova(rda(y ~ Management + Moisture + A1, x,
subset = seq(17)[-bsfs[i,]]), permutations=ctrl)
> sapply(mods, function(z) z[1,2:4])
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
Variance 54.37059 51.52078 51.73607 52.10031 54.01866 50.9419 55.67994 52.83241
F 2.372717 2.068919 2.146908 2.16539 2.376655 2.19638 2.396599 2.093796
Pr(>F) 0.038 0.038 0.03 0.017 0.003 0.003 0.02 0.018
[,9] [,10] [,11] [,12] [,13] [,14] [,15]
Variance 52.95215 53.24202 55.00078 52.29582 56.98787 54.14703 53.39465
F 2.164452 2.173404 2.354113 2.241749 2.642457 2.27949 2.247816
Pr(>F) 0.012 0.01 0.004 0.001 0.002 0.001 0.003
[,16] [,17] [,18] [,19] [,20] [,21] [,22]
Variance 54.50398 56.66776 53.90729 56.59286 52.86264 53.3986 54.17335
F 2.353598 2.618412 2.431594 2.565946 2.139308 2.244888 2.306793
Pr(>F) 0.002 0.001 0.003 0.001 0.012 0.005 0.004
[,23] [,24] [,25] [,26] [,27] [,28] [,29]
Variance 56.25088 53.64593 55.85008 53.10538 53.20715 53.22754 55.92516
F 2.534918 2.381106 2.738956 2.373853 2.459837 2.444747 2.794404
Pr(>F) 0.001 0.001 0.001 0.001 0.002 0.001 0.001
[,30] [,31] [,32] [,33] [,34] [,35] [,36]
Variance 52.54938 54.61189 51.61613 51.48958 52.43657 53.6817 51.27859
F 2.498685 2.471325 2.133335 2.184013 2.264268 2.400008 2.325749
Pr(>F)      0.001    0.001    0.001    0.003    0.001    0.001    0.001

So yes, it makes a difference what is removed, but it can be analysed. Unlike @gavinsimpson, I don't see any reason to use the meta-analysis toolset, as long as you think that the choice of removed units is a part of the randomization process. In this case this process – selection of two units to be removed, with permutations – gives an averaged P-value of:

> mean(sapply(mods, function(z) z[1,4]))
[1] 0.006805556

PS. There was an earlier version of this before Tue 30 Aug 2022 16:22 UTC where I was confused. The main source of that confusion was that in my first model I used Management both as the plot in the permutation design and as the only constraint, which usually makes little sense.
-
First, thanks to Gavin for directing me to this forum. And apologies if this is a duplicate post--I tried once before but think I failed to actually post. Now, for my actual question:
I'm analyzing some compositional datasets using permutation tests in vegan (and thus also permute). The research question is whether sugar and amino acid composition of pollen and nectar change under different environmental conditions imposed in a split-plot design. To reflect the experimental design, I've used the 'how' argument to restrict permutations by designating blocks and plots. My problem is that in the first block, each treatment group contained four plants, and in the second block each group contained three. There was also a failure in lab analysis so, for one dataset, one treatment group is down to two plants. Thus my design is unbalanced.
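Roughly, the kind of restricted design meant here can be set up like this (a sketch only; dat, block and wplot are hypothetical names, not the actual data):

library(permute)

## hypothetical names: 'dat' is the design data, 'block' the blocking factor,
## 'wplot' the whole-plot factor; which level gets permuted (plots vs. within
## plots) depends on the level at which each treatment was applied
ctrl <- how(blocks = dat$block,
            plots  = Plots(strata = dat$wplot, type = "free"),
            within = Within(type = "none"),
            nperm  = 999)

Permuting whole plots generally requires the plots to be of equal size, which is why the unbalanced groups are a problem in the first place.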
To get the permutation test to run, at the moment I've done a workaround where I randomly selected one plant per four-plant group to drop from the dataset so all groups have three. (And for the dataset with the two-plant group, just to see how things would look, I added a dummy third plant where each response value for that plant was the mean of the existing two plants--which feels dodgy and maybe shouldn't see the light of day.)
Since my dataset doesn't have very many plants to begin with, I worry that the test result could change depending on which individual plants are dropped. If I were able to repeat the permutation test on different versions of the dataset itself, with different plants dropped each time, I would worry less about this.
In other words, I would like to do the drop-one-random-plant-per-group routine many times, and do the permutations on each different version of the dataset, and then compile alllll those permutations into the significance test. I don't think I've seen this done though. Is this something people do? If this approach actually makes sense, is there code out there already that I can modify to make it happen? (It's not something I am capable of coding from scratch.)
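If it helps to picture it, a rough sketch of that routine might look like this (hypothetical names: comm is the compositional data, env the design data with a group factor and a treatment factor; a real version would also rebuild the restricted permutation design for each subset, which is omitted here to keep the sketch short):

library(vegan)

## repeatedly drop one random plant from each group with more than three
## plants, refit, and collect the permutation p value for the first term
pvals <- replicate(200, {
    drop1 <- sapply(split(seq_len(nrow(env)), env$group),
                    function(i) if (length(i) > 3) sample(i, 1) else NA)
    keep  <- setdiff(seq_len(nrow(env)), drop1[!is.na(drop1)])
    anova(rda(comm[keep, ] ~ treatment, data = env[keep, ]))[1, "Pr(>F)"]
})
summary(pvals)

The collected p values could then be inspected for stability, or combined in one of the ways discussed in this thread.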
Thanks for considering this. Please let me know if I should also include data.
Elsa