-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential Error in pred_interval_esmeans()
#46
Comments
Thanks @zacrobinson5 for exploring this in more detail. Yes, looks like a bug. I'll have a closer look at this and write a few more tests to catch this as soon as possible. Appreciate you exploring things in detail. |
Hi @daniel1noble , I hope all is well! I am sure you're extremely busy so no worries if you haven't gotten a chance to look into this further but any updates here? Thank you as always for your work! |
Hi Zac. Sorry, I’ve not managed to get to this yet. I’m just finishing up teaching and things are a little crazy. I’m hoping to get into this in the coming weeks when I have a bit more 'free time'.
Cheers,
Dan
|
@zacrobinson5 Sorry it's taken me a while together to this. Can you clarify where the problem arises because I'm seeing the exact same values for your "proof" and "ram.mv". We also don't yet take two slopes in orchaRd either, so I should probably have a warning for that. Here's the output I get from your code (thanks for that!). Without diving too deep yet, can you tell me what you expect the values should be? I assumed that the ram.mv and the proofs would all be the same based on your comments "# all PIs are the same", but I'm not getting that from your code: final.df |
@zacrobinson5 Sorry, in dealing with #45 I found a bug, and you picked up on one of the main issues, but it was slightly different, and ended up being a little more complicated because of how the function works across different types of rma models. gamma slots are treated a little differently between models. I believe this fixes your issue, but please let me know if not. I still need to do some testing for double random slope models as I'm not sure it will perform as expected under all conditions. For now #46 and #45 should be sorted but again please do let me know if not and I'll dig deeper. |
…at. It leads to one additional issue when using rma.uni models becuase the gamma2 slot is stored as NULL values. So, we need to work around that.
Going to close this issue for now. If you find more issues I can re-open |
@daniel1noble Thanks for the work here sir - I think there is still an issue here, but I could be mistaken. What I tried to do in the "proof" data frame is demonstrate that when you calculate the prediction interval with the formula that only includes sigma, it ends up matching the output from the function for each type of model. I don't think this is how it's supposed to work - i.e., tau2/gamma2 should be taken into account for the relevant models. This should result in the "proof" row and the output from pred_interval_esmeans() being different if I'm understanding everything under the hood correctly. Here is the formula I'm using for the proofs which ends up matching the output of the function in all cases, regardless of model specification: tmp <- summary(e1)
tmp <- tmp[ , ]
test.stat <- stats::qt(0.975, tmp$df[[1]])
sigmas <- sum(m1$sigma2)
PI <- test.stat * base::sqrt(tmp$SE^2 + sigmas)
tmp$lower.PI <- tmp$emmean - PI
tmp$upper.PI <- tmp$emmean + PI
p1b=tmp%>%
as.data.frame()%>%
mutate(model="proof (Intercept)") |
@daniel1noble played around with this a bit more. I realized in my original code I didn't specify the "struct" argument to which Wolfgang mentioned to me is necessary to get the equivalent of "random slopes" in these models. I've adapted this code a little bit and now am getting a different error when I go to use the function. Here is the error:
library(metafor)
library(orchaRd)
library(tidyverse)
data("dat.konstantopoulos2011")
df=dat.konstantopoulos2011%>%
as.data.frame()
#fit random intercept model
m1=rma.mv(yi ~ year, vi, random = list(~1|study, ~1|district, ~1|school), data = df, method = "REML",
control = list(optimizer="optim"))
e1=emmeans(emmprep(m1),~year)
p1=pred_interval_esmeans(m1,e1,mod = "year")%>%
as.data.frame()%>%
mutate(model="rma.mv (Intercept)")
#fit single random slope model
m2=rma.mv(yi ~ year, vi, random = list(~year|study, ~1|district, ~1|school), data = df, method = "REML",
struct = "GEN",
control = list(optimizer="optim"))
e2=emmeans(emmprep(m2),~year)
p2=pred_interval_esmeans(m2,e2,mod = "year")%>% ### GET ERROR HERE
as.data.frame()%>%
mutate(model="rma.mv (Slope x 1)")
#fit double random slope model
m3=rma.mv(yi ~ year, vi, random = list(~year|study, ~year|district, ~1|school), data = df, method = "REML",
struct = c("GEN","GEN"),
control = list(optimizer="optim"))
e3=emmeans(emmprep(m3),~year)
p3=pred_interval_esmeans(m3,e3,mod = "year")%>% ### GET ERROR HERE
as.data.frame()%>%
mutate(model="rma.mv (Slope x 2)")
#combine dfs
prediction.intervals=rbind(p1,p2,p3)
#proofs
tmp <- summary(e1)
tmp <- tmp[ , ]
test.stat <- stats::qt(0.975, tmp$df[[1]])
sigmas <- sum(m1$sigma2)
PI <- test.stat * base::sqrt(tmp$SE^2 + sigmas)
tmp$lower.PI <- tmp$emmean - PI
tmp$upper.PI <- tmp$emmean + PI
p1b=tmp%>%
as.data.frame()%>%
mutate(model="proof (Intercept)")
tmp <- summary(e2)
tmp <- tmp[ , ]
test.stat <- stats::qt(0.975, tmp$df[[1]])
sigmas <- sum(m2$sigma2)
PI <- test.stat * base::sqrt(tmp$SE^2 + sigmas)
tmp$lower.PI <- tmp$emmean - PI
tmp$upper.PI <- tmp$emmean + PI
p2b=tmp%>%
as.data.frame()%>%
mutate(model="proof (Slope x 1)")
tmp <- summary(e3)
tmp <- tmp[ , ]
test.stat <- stats::qt(0.975, tmp$df[[1]])
sigmas <- sum(m3$sigma2)
PI <- test.stat * base::sqrt(tmp$SE^2 + sigmas)
tmp$lower.PI <- tmp$emmean - PI
tmp$upper.PI <- tmp$emmean + PI
p3b=tmp%>%
as.data.frame()%>%
mutate(model="proof (Slope x 2)")
#compare proofs
final.df=rbind(p1,p1b,p2,p2b,p3,p3b)
view(final.df)
|
@zacrobinson5 Thanks for clarifying the your expectations. I wasn't clear on whether the proofs were what you expected or not. I usually work the other way and match values to correct values. When I tested yesterday, the proofs no longer matched what |
I'll add one more note here @zacrobinson5. I noticed that you're using the |
@zacrobinson5 Just a quick update on this. First, I think you need to stick with In terms of the example you provide. Here's where I got to exploring things.
I need to do some more testing on this issue, but part of the problem maybe that you're not using I've run a few other examples and did some checks (both manual and with |
@daniel1noble Thank you so much for your continued work looking into this sir - I genuinely appreciate it! Using mod_results seems to address the problem from my tests! I suppose one follow up I have from a practical perspective is that for the project I'm working on with this, I'm exploring some non-linear patterns and am fitting some splines which is how I ended up on using emmprep in concert with orchaRd - mod_results is giving me some funky results with those
emmprep() takes the params argument which it seems like mod_results() should as well but I've had some trouble specifying it - any advice there? I'm going to go ahead and close the issue here as this seems to be fixed for the most part so long as you're using the functions within orchaRd's ecosystem - thanks again! EDIT: I can get the spline models to give me outputs that look correct - just not sure what the error message is getting at |
@daniel1noble sorry one other question - is this quote still accurate after the updates?
And specifically do you mean: ~slope|study , slope|group, 1|es or ~slope1+slope2|study, 1|group, 1|es or both? |
@zacrobinson5 Great that it seems to be working now. The intercept message above is with respect to mod = ~1 + p1 + p2, where you fit explicitly an intercept. Often we fit. mod = -1 + p1 + p2. So the message is just suggesting to fit the former. The reason is the emmeans can calculate the marginalised means for each group anyway. Ih regards to the second message. That is now null and void because, as you say, now that gamma is incorporated it can deal with two slopes. There still might be situations it fails, so it's still got a warning message for now until we have a diversity of testing situations. Hope that all helps! Cheers, |
Hey @daniel1noble I am revisiting this issue with a fresh pair of eyes and (I think) an improved understanding. I think I've figured out a few things in the function for Here is me explaining a few changes to the function that (might) be worth making, but please let me know if I am woefully off base. Was just poking around and figured I'd ask!
Now, I could be misunderstanding some of what is going on here, but I think the final method of calculating Current: sigmas <- sum(model$sigma2)
taus <- model$tau2
gammas <- model$gamma2
### skip some lines ###
PI <- test.stat * sqrt(tmp$SE^2 + sigmas + taus + gammas) ## tau and gamma should be wrapped in sum() ?
The key here is that
Current: sigmas <- sum(model$sigma2)
taus <- model$tau2
gammas <- model$gamma2
w_tau <- model$g.levels.k
w_gamma <- model$g.levels.k ## h.levels.k instead?
if(mod == "1"){
tau <- weighted_var(taus, weights = w_tau)
gamma <- weighted_var(gamma, weights = w_gamma) ## gamma should be gammas ?
PI <- test.stat * sqrt(tmp$SE^2 + sigmas + tau + gamma) Potentially better version: pred_interval_esmeans <- function(model, mm, mod, ...){
tmp <- summary(mm)
tmp <- tmp[ , ]
test.stat <- stats::qt(0.975, tmp$df[[1]])
if(length(model$tau2) <= 1 | length(model$gamma2) <= 1){ # Note this should fix #46 but code is repetitive and needs to be cleaned up. Other issue is how this plays with different rma. objects. uni models will treat slots for gamma NULL and we need to deal with this.
sigmas <- sum(model$sigma2)
taus <- model$tau2
gamma2 <- ifelse(is.null(model$gamma2), 0, model$gamma2)
PI <- test.stat * base::sqrt(tmp$SE^2 + sigmas + sum(taus) + sum(gamma2))
} else {
sigmas <- sum(model$sigma2)
taus <- model$tau2
gammas <- model$gamma2
w_tau <- model$g.levels.k
w_gamma <- model$h.levels.k
if(mod == "1"){
tau <- weighted_var(taus, weights = w_tau)
gamma <- weighted_var(gammas, weights = w_gamma)
PI <- test.stat * sqrt(tmp$SE^2 + sigmas + tau + gamma)
} else {
PI <- test.stat * sqrt(tmp$SE^2 + sigmas + sum(taus) + sum(gammas))
}
}
tmp$lower.PI <- tmp$emmean - PI
tmp$upper.PI <- tmp$emmean + PI
# renaming "overall" to ""
if(tmp[1,1] == "overall"){tmp[,1] <- "intrcpt"}
return(tmp)
} EDIT 3/19/24
PI <- test.stat * sqrt(tmp$SE^2 + sigmas + sum(taus) + sum(gammas) + sum(diag(vcov(model, type = "fixed"))))
or maybe better yet
PI <- test.stat * sqrt(tmp$SE^2 + sigmas + sum(taus) + sum(gammas) + var(model$X%*%model$b)) This seems closer to the way pi.lb <- pred - crit * sqrt(vpred + sum(x$sigma2) + x$tau2 + x$gamma2 + newvi)
pi.ub <- pred + crit * sqrt(vpred + sum(x$sigma2) + x$tau2 + x$gamma2 + newvi) |
@zacrobinson5 Thanks for all this. Give me a few weeks and I will look into it. Sorry. It's very busy at the moment and I have some urgent things that need to be dealt with, but this all sounds sensible to me. |
@daniel1noble sounds good and no worries! I added one other thing after I played around with my suggested tweaks a bit more |
Been working on this some more - here is where I'm at after some great feedback from the r-sig-meta mailing list: I think I still need covariances between multiple random slopes get.pi.emmeans.rma.mv <- function(model, mm){
tmp <- summary(mm)
tmp <- tmp[ , ]
test.stat <- stats::qt(mm@misc$level+(1-mm@misc$level)/2, tmp$df[[1]])
n_g <- colnames(model$G)
n_g2 <- if(is.null(n_g)){0} else{n_g[!n_g %in% c("intrcpt", "outer")]}
l_g <- if(is.null(n_g)){0} else{mm@linfct[,colnames(mm@linfct) %in% n_g2]}
v_g <- if(is.null(n_g)){0} else{diag(model$G)%>%as.data.frame()%>%.[n_g2,]}
cv_g <- if(is.null(n_g)){0} else{model$G[upper.tri(model$G)]}
n_h <- colnames(model$H)
n_h2 <- if(is.null(n_h)){0} else{n_h[!n_h %in% c("intrcpt", "outer")]}
l_h <- if(is.null(n_h)){0} else{mm@linfct[,colnames(mm@linfct) %in% n_h2]}
v_h <- if(is.null(n_h)){0} else{diag(model$H)%>%as.data.frame()%>%.[n_h2,]}
cv_h <- if(is.null(n_h)){0} else{model$H[upper.tri(model$H)]}
V_b <- tmp$SE^2
V_ri <- sum(model$sigma2 + model$tau2[1] + model$gamma2[1])
V_g <- if(is.null(n_g)){0} else{((l_g^2)*v_g)%>%as.data.frame()%>%rowSums()}
V_g_cv <- if(is.null(n_g)){0}
else{((2*l_g)*cv_g[seq(1,ncol(l_g))])%>%as.data.frame()%>%rowSums()}
V_h <- if(is.null(n_h)){0} else{((l_h^2)*v_h)%>%as.data.frame()%>%rowSums()}
V_h_cv <- if(is.null(n_h)){0}
else{((2*l_h)*cv_h[seq(1,ncol(l_h))])%>%as.data.frame()%>%rowSums()}
PI <- test.stat * sqrt(V_b + V_ri + V_g + V_g_cv + V_h + V_h_cv)
tmp$lower.PI <- tmp$emmean - PI
tmp$upper.PI <- tmp$emmean + PI
return(tmp)
}
|
Sorry for the multiple messages, I (think) this is the final version with all covariances integrated: get.pi.emmeans.rma.mv <- function(mm,model){
tmp <- summary(mm)
tmp <- tmp[ , ]
test.stat <- stats::qt(mm@misc$level+(1-mm@misc$level)/2, df=tmp$df[[1]])
V_se <- tmp$SE^2
V_ri <- sum(model$sigma2)
n_g <- colnames(model$G)
colnames(mm@linfct)[1] <- if(colnames(mm@linfct)[1]=="(Intercept)"){"intrcpt"} else{colnames(mm@linfct)[1]}
g_g <- model$X[,colnames(mm@linfct) %in% n_g]
V_g <- if(is.null(n_g)){0} else{sum(diag(crossprod((g_g %*% model$G),g_g)))/nobs.rma(model)}
n_h <- colnames(model$H)
colnames(mm@linfct)[1] <- if(colnames(mm@linfct)[1]=="(Intercept)"){"intrcpt"} else{colnames(mm@linfct)[1]}
g_h <- model$X[,colnames(mm@linfct) %in% n_h]
V_h <- if(is.null(n_h)){0} else{sum(diag(crossprod((g_h %*% model$H),g_h)))/nobs.rma(model)}
V=cbind(V_ri,V_g,V_h)%>%
as.data.frame()%>%
rowSums()
PI <- test.stat * sqrt(V_se + V)
tmp$lower.PL <- tmp$emmean - PI
tmp$upper.PL <- tmp$emmean + PI
return(tmp)
}
|
@zacrobinson5 Really sorry I haven't had a chance to deep dive into this yet Zac. Shinichi and I are just swamped at the moment. Any chance you could fork the repo, edit the relevant code, run the tests |
@daniel1noble No worries at all Dan! I will give this a shot when able, pretty new to Git so might take me a bit to figure it out |
Describe the bug
In comparing the prediction intervals on some random intercept vs random slope models, I think there may be an error in the
pred_interval_esmeans()
function.In the code of the function, the following is listed:
if(length(model$tau2) <= 1){ #including gamma2
I believe this ifelse() statement is supposed to determine the way to calculate the prediction intervals based on the ~inner | outer structure of the random effects in rma.mv. Because
tau2
will always return a number rather than NULL, the result oflength(model$tau2)
will always be <=1, which sends it down the first "path" of the ifelse() statement which only takes into account the sigma(s):sigmas <- sum(model$sigma2)
PI <- test.stat * base::sqrt(tmp$SE^2 + sigmas)
The comment next to (i.e., "including gamma2) it is what makes me think this isn't the desired function, as if multiple
~ inner
arguments are included, the length of tau2 and gamma2 arguments together would be >1, which would send it down a different "path" that includes tau and gamma in the calculations which are otherwise ignored.To Reproduce
Steps to reproduce the behavior:
Expected behavior
I'd expect the tau and gamma values to be taken into account for models with more complex
~inner | outer
structuresDesktop (please complete the following information):
The text was updated successfully, but these errors were encountered: