Replies: 3 comments
-
This turned out to be even more instructive than I thought, even though I don't think I have a concrete answer to your question. @jarioksa Do you have thoughts on this?

First, I think it is a good idea to drop a random plant from each larger group in turn, refit the ordination/model, and run the permutation test for each refit — you really should check that the result of your one test doesn't depend on the particular plants you happened to drop if you only did this once.

How to combine the resulting p values is where this becomes a bit more intriguing. I don't think it is right to simply pool all the pseudo-F statistics obtained via permutation, as they are not testing the same thing — the observed F values will vary because the data change. I think you could look at this like a meta-analysis: you have multiple observations of the p value and want to combine them in some way. If they were independent tests you could just average them, but I don't think your p values would be independent, as you are reusing most of the same data in each test.

Because they are not independent, I'm not sure whether Fisher's or Stouffer's methods are appropriate here either. Fisher's method has the null hypothesis that all of the p values are consistent with no effect and the alternative that at least one of them is inconsistent with no effect. Stouffer's method seems more appropriate here as it assumes that each test carries some information about the effect — if the truth is that there is some non-zero effect of your covariate(s), you'd expect each p value to contribute information to the decision, rather than assuming that only one (or more, but not all) of them does.

So, my hand-waving answer would be to treat the multiple p values you would obtain as a meta-analysis and look into Stouffer's method (and related methods for combining non-independent p values), or even ask whether you can treat the p values as being independent?
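For what it's worth, a minimal sketch of Stouffer's method in R (pvals is a hypothetical vector of one-sided p values from the repeated tests; this plain version also assumes independence, which, as noted above, is questionable here):

## Stouffer's method: convert p values to z scores and pool them
stouffer <- function(pvals) {
    z <- qnorm(pvals, lower.tail = FALSE)  ## p values -> one-sided z scores
    Z <- sum(z) / sqrt(length(z))          ## combined z, assuming independence
    pnorm(Z, lower.tail = FALSE)           ## back-transform to a combined p value
}
stouffer(c(0.04, 0.02, 0.07))

For correlated p values a weighted or correlation-adjusted variant would be needed; the sketch only shows the basic transformation.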
-
I really do not know how to do this correctly. There are two issues: the observed test statistic and the permutation statistic. These should be based on the same model and the same data.
Honestly, I don't think anybody would do this. Alternatively, the permutation design needs deep rethinking. I think that simple permutation schemes are not suitable for this kind of design.
-
Actually, I tried my proposal. For testing I used a subset of the dune data; the Management classes have these counts:

> summary(dune.env$Management)
BF HF NM SF
 3  5  6  6

Dropping the smallest class (BF) leaves me 17 cases:

> k <- dune.env$Management != "BF"
> y <- dune[k,]
> x <- dune.env[k,]
> bsfs <- as.matrix(expand.grid(which(x$Management == "SF"), which(x$Management == "NM")))
## a bit tricky, because how() does not know about the subset we used in the function call
> ctrl
Permutation Design:

Blocks:
  Defined by: none

Plots:
  Plots: x$Management[-bsfs[i, ]]
  Permutation type: free
  Mirrored?: No

Within Plots:
  Permutation type: free

Permutation details:
  Number of permutations: 999
  Max. number of permutations allowed: 9999
  Evaluate all permutations?: No. Activation limit: 5040
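(The definition of ctrl is not shown above; judging from the printed design, something like the following would reproduce it. This is an assumption on my part, and because the plot strata refer to bsfs[i, ], the object has to be created, or re-created, once i is defined.)

## assumed reconstruction of ctrl; not part of the original transcript
ctrl <- how(plots = Plots(strata = x$Management[-bsfs[i, ]], type = "free"),
            nperm = 999)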
> mods <- list()
> for(i in seq_len(nrow(bsfs))) mods[[i]] <- anova(rda(y ~ Management + Moisture + A1, x,
subset = seq(17)[-bsfs[i,]]), permutations=ctrl)
> sapply(mods, function(z) z[1,2:4])
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
Variance 54.37059 51.52078 51.73607 52.10031 54.01866 50.9419 55.67994 52.83241
F 2.372717 2.068919 2.146908 2.16539 2.376655 2.19638 2.396599 2.093796
Pr(>F) 0.038 0.038 0.03 0.017 0.003 0.003 0.02 0.018
[,9] [,10] [,11] [,12] [,13] [,14] [,15]
Variance 52.95215 53.24202 55.00078 52.29582 56.98787 54.14703 53.39465
F 2.164452 2.173404 2.354113 2.241749 2.642457 2.27949 2.247816
Pr(>F) 0.012 0.01 0.004 0.001 0.002 0.001 0.003
[,16] [,17] [,18] [,19] [,20] [,21] [,22]
Variance 54.50398 56.66776 53.90729 56.59286 52.86264 53.3986 54.17335
F 2.353598 2.618412 2.431594 2.565946 2.139308 2.244888 2.306793
Pr(>F) 0.002 0.001 0.003 0.001 0.012 0.005 0.004
[,23] [,24] [,25] [,26] [,27] [,28] [,29]
Variance 56.25088 53.64593 55.85008 53.10538 53.20715 53.22754 55.92516
F 2.534918 2.381106 2.738956 2.373853 2.459837 2.444747 2.794404
Pr(>F) 0.001 0.001 0.001 0.001 0.002 0.001 0.001
[,30] [,31] [,32] [,33] [,34] [,35] [,36]
Variance 52.54938 54.61189 51.61613 51.48958 52.43657 53.6817 51.27859
F 2.498685 2.471325 2.133335 2.184013 2.264268 2.400008 2.325749
Pr(>F)      0.001    0.001    0.001    0.003    0.001    0.001    0.001

So yes, it makes a difference what is removed, but it can be analysed. Unlike @gavinsimpson, I don't see any reason to use the meta-analysis toolset, as long as you think that the choice of removed units is a part of the randomization process. In this case this process – selection of two units to be removed, with permutations – gives an averaged P-value of:

> mean(sapply(mods, function(z) z[1,4]))
[1] 0.006805556

PS. There was an earlier version of this before Tue 30 Aug 2022 16:22 UTC where I was confused. The main source of that confusion was that in my first model I used Management both as the plot in the permutation design and as the only constraint, which usually makes little sense.
-
First, thanks to Gavin for directing me to this forum. And apologies if this is a duplicate post--I tried once before but think I failed to actually post. Now, for my actual question:
I'm analyzing some compositional datasets using permutation tests in vegan (and thus also permute). The research question is whether sugar and amino acid composition of pollen and nectar change under different environmental conditions imposed in a split-plot design. To reflect the experimental design, I've used the 'how' argument to restrict permutations by designating blocks and plots. My problem is that in the first block, each treatment group contained four plants, and in the second block each group contained three. There was also a failure in lab analysis so, for one dataset, one treatment group is down to two plants. Thus my design is unbalanced.
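Roughly, the kind of restricted design meant here can be set up like this (a sketch only; dat, block and wplot are hypothetical names, not the actual data):

library(permute)

## hypothetical names: 'dat' is the design data, 'block' the blocking factor,
## 'wplot' the whole-plot factor; which level gets permuted (plots vs. within
## plots) depends on the level at which each treatment was applied
ctrl <- how(blocks = dat$block,
            plots  = Plots(strata = dat$wplot, type = "free"),
            within = Within(type = "none"),
            nperm  = 999)

Permuting whole plots generally requires the plots to be of equal size, which is why the unbalanced groups are a problem in the first place.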
To get the permutation test to run, at the moment I've done a workaround where I randomly selected one plant per four-plant group to drop from the dataset so all groups have three. (And for the dataset with the two-plant group, just to see how things would look, I added a dummy third plant where each response value for that plant was the mean of the existing two plants--which feels dodgy and maybe shouldn't see the light of day.)
Since my dataset doesn't have very many plants to begin with, I worry that the test result could change depending on which individual plants are dropped. If I were able to repeat the permutation test on different versions of the dataset itself, with different plants dropped each time, I would worry less about this.
In other words, I would like to do the drop-one-random-plant-per-group routine many times, and do the permutations on each different version of the dataset, and then compile alllll those permutations into the significance test. I don't think I've seen this done though. Is this something people do? If this approach actually makes sense, is there code out there already that I can modify to make it happen? (It's not something I am capable of coding from scratch.)
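If it helps to picture it, a rough sketch of that routine might look like this (hypothetical names: comm is the compositional data, env the design data with a group factor and a treatment factor; a real version would also rebuild the restricted permutation design for each subset, which is omitted here to keep the sketch short):

library(vegan)

## repeatedly drop one random plant from each group with more than three
## plants, refit, and collect the permutation p value for the first term
pvals <- replicate(200, {
    drop1 <- sapply(split(seq_len(nrow(env)), env$group),
                    function(i) if (length(i) > 3) sample(i, 1) else NA)
    keep  <- setdiff(seq_len(nrow(env)), drop1[!is.na(drop1)])
    anova(rda(comm[keep, ] ~ treatment, data = env[keep, ]))[1, "Pr(>F)"]
})
summary(pvals)

The collected p values could then be inspected for stability, or combined in one of the ways discussed in this thread.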
Thanks for considering this. Please let me know if I should also include data.
Elsa