group-level parameters #112

DominiqueMakowski · 2021-05-31T10:04:35Z

What's the most catchy & appropriate name; estimate_individual, estimate_groupspecific, estimate_grouplevel, estimate_random, estimate_individualeffects, estimate_scores, estimate_individualparameters...

The text was updated successfully, but these errors were encountered:

mattansb · 2021-05-31T12:14:19Z

I like grouplevel.. or groupwise?

DominiqueMakowski · 2021-05-31T13:55:10Z

I like grouplevel too but I do recall @bwiernik made a distinction between group-level and group-specific

DominiqueMakowski · 2021-06-02T08:05:20Z

After quite some tinkering I think the most intuitive and straightforward design is estimate_random:

its default output is mostly a cleaned version of parameters(x, group_level = TRUE) (but formatted to be specific to random effects, so some non-applicable columns have been dropped)
It can be plotted
It can be re-organized to match the original input of random factors (and directly cbinded with the original data) using reshape_random() (in the line of our other specific reshapers like reshape_ci, reshape_loadings, reshape_iterations, etc.)

library(modelbased)
library(lme4)
library(see)

data <- lme4::sleepstudy
data <- rbind(data, data)
data$Newfactor <- rep(c("A", "B", "C", "D"))

model <- lmer(Reaction ~ Days + (1 + Days|Subject) + (1|Newfactor), data = data)

random <- estimate_random(model)
head(random)
#> Group     | Level |   Parameter | Coefficient |   SE |          95% CI
#> ----------------------------------------------------------------------
#> Newfactor |     A | (Intercept) |        0.58 | 2.04 | [ -3.42,  4.58]
#> Newfactor |     B | (Intercept) |        2.32 | 2.05 | [ -1.69,  6.34]
#> Newfactor |     C | (Intercept) |       -2.04 | 2.04 | [ -6.04,  1.97]
#> Newfactor |     D | (Intercept) |       -0.87 | 2.05 | [ -4.88,  3.15]
#> Subject   |   308 | (Intercept) |       -3.28 | 9.24 | [-21.39, 14.82]
#> Subject   |   308 |        Days |       10.37 | 1.73 | [  6.98, 13.75]

plot(random)

# Rematch to original random data
x <- reshape_random(random, indices = c("Coefficient", "SE"))
head(x)
#>   Subject Newfactor Newfactor_Coefficient_Intercept Newfactor_SE_Intercept
#> 1     308         A                       0.5818450               2.042339
#> 2     308         B                       2.3221207               2.049567
#> 3     308         C                      -2.0368499               2.042339
#> 4     308         D                      -0.8671158               2.049567
#> 5     308         A                       0.5818450               2.042339
#> 6     308         B                       2.3221207               2.049567
#>   Subject_Coefficient_Intercept Subject_SE_Intercept Subject_Coefficient_Days
#> 1                     -3.283123             9.238225                  10.3656
#> 2                     -3.283123             9.238225                  10.3656
#> 3                     -3.283123             9.238225                  10.3656
#> 4                     -3.283123             9.238225                  10.3656
#> 5                     -3.283123             9.238225                  10.3656
#> 6                     -3.283123             9.238225                  10.3656
#>   Subject_SE_Days
#> 1        1.728378
#> 2        1.728378
#> 3        1.728378
#> 4        1.728378
#> 5        1.728378
#> 6        1.728378

^{Created on 2021-06-02 by the reprex package (v1.0.0)}

strengejacke · 2021-06-02T08:09:09Z

that looks nice!

bwiernik · 2021-06-02T22:01:53Z

These are the just the estimated random effects (returned with ranef()) right? If so, I like "estimate_random".

"estimate_group_specific" should return the random effects + fixed effects (what is returned with coef()).

DominiqueMakowski · 2021-06-03T02:02:28Z

What I did is I added an argument to estimate_random called deviation = TRUE:

The description is as follows:

modelbased/R/estimate_random.R

Line 6 in 66a45db

    
           #' @param deviation If \code{TRUE}, the coefficients are the ones estimated natively by the model (as they are returned by for instance \code{lme4::ranef()}). They correspond to the \emph{deviation} of each individual group from their fixed effect. As such, a coefficient close to 0 means that the participants' effect is the same as the population-level effect (in other words, in is "in the norm"). If \code{FALSE}, will return the sum of the random effect and its corresponding fixed effects. This argument can be used to reproduce the results given by \code{lme4::ranef()} and \code{coef()} (see \code{?coef.merMod}). Note that the sum is made between the random effects parameters and the fixed effect coefficient (which is a point-estimate), and thus does not take into account the variability and uncertainty in the fixed effects estimations. Thus, use these with caution.

I also mention that in the new vignette I started: feel free to correct and improve!

bwiernik · 2021-06-03T02:34:42Z

We aren't currently reporting SEs or intervals for the deviation = FALSE estimates for lme4 or glmmTMB, right? We shouldn't ever do that for lme4 and should wait to do so for glmmTMB until a proper method is available (Ben Bolker has said he would look into it). For bayesian models we can just derive from the posterior of course.

DominiqueMakowski · 2021-06-03T02:43:53Z

From my current understanding the output of coef (https://github.com/lme4/lme4/blob/9c673edb76ae19165ffe0a45b375737bb02f1fc3/R/lmer.R#L657) is literally just the sum of ranef and fixef (i.e., the point-estimate of the coefficients). As such, I don't see the difference between adding the fixef point-estimate to the random coefficient and adding the fixef point-estimate to say the random CI bounds. With the same logic, the SE stays the same (since we it doesn't correct for the variability of the fixed effect anyway). So I don't think it's inherently wrong to return the same SE and the summed CI, given the current state of things (again, based on how coef currently works)

mattansb · 2021-06-03T05:20:39Z

If this is ranef+fixef, I think estimate_random is a bad fit :/

DominiqueMakowski · 2021-06-03T05:35:00Z

I don't see it that way, for me it's still the random effects (or rather group-level effects), it's just that they axe expressed in the same "unit" as the fixed effect (rather than being expressed in relation to it), which makes it often easier to interpret but it doesn't change inherently the nature of the coefficients.

It's a bit like doing Sepal.Length / Species instead of Sepal.Length * Species, it doesn't change the model, but it shows the effect for each level rather than the "deviation" of the effect from the intercept. That might not be the best example but you see what I mean (i hope)

Afaik it's mostly a "rescaling" of the values for more interpretability rather than a fundamental alteration of the meaning...

mattansb · 2021-06-03T05:46:30Z

for me it's still the random effects (or rather group-level effects)

To me those aren't the same thing... :/

Random = distribution around the fixed
Group level effect = the predicted effect at the level

mattansb · 2021-06-03T05:59:33Z

It actually matters quite a bit.....

library(lme4)
#> Loading required package: Matrix

m1 <- lmer(mpg ~ 1 + (1|gear), data = mtcars)
m2 <- lmer(mpg ~ 0 + (1|gear), data = mtcars)

# Same
coef(m1)
#> $gear
#>   (Intercept)
#> 3    16.46041
#> 4    24.15615
#> 5    21.22401
#> 
#> attr(,"class")
#> [1] "coef.mer"
coef(m2)
#> $gear
#>   (Intercept)
#> 3    16.05235
#> 4    24.43000
#> 5    21.16515
#> 
#> attr(,"class")
#> [1] "coef.mer"

# Very diff
ranef(m1)
#> $gear
#>   (Intercept)
#> 3   -4.153112
#> 4    3.542629
#> 5    0.610483
#> 
#> with conditional variances for "gear"
ranef(m2)
#> $gear
#>   (Intercept)
#> 3    16.05235
#> 4    24.43000
#> 5    21.16515
#> 
#> with conditional variances for "gear"

^{Created on 2021-06-03 by the reprex package (v2.0.0)}

DominiqueMakowski · 2021-06-03T06:16:04Z

when I said it doesn't matter I didn't mean that the values are the same ^^ naturally they change in respect to their fixed effect. But yeah in your m2 since "their fixed effect" (the intercept) = 0 so ranef and coef are the same I'm not sure what point did you have in mind?

DominiqueMakowski · 2021-06-03T06:18:28Z

Group level effect = the predicted effect at the level

It's not really the predicted effect as much as simply the random parameter expressed in "absolute" (i.e., not "relative" to its fixed effect)

mattansb · 2021-06-03T06:25:30Z

The random part is the predicted deviation of the effect, so when adding the fixed effect it is the predicted effect. AFAIK this is the terminology used for the output of coef() here... These values aren't present as-is in the data (because of shrinkage), but they are the predicted values...

DominiqueMakowski · 2021-06-03T06:30:24Z

ah ok I see what you mean by predicted here ^^ (and thank god none of us is from ML)

strengejacke · 2021-06-03T06:49:12Z

ranef() returns BLUPs, best linear unbiased prediction. The random effects are indeed not estimated, unlike fixed effects.

bwiernik · 2021-06-03T08:37:43Z

Let's change the name to estimate_blup(). Also rather than "deviation", I think we need a different name for the argument. "deviation" is really only accurate for specific model specifications. How about component with options for "ranef"/"random" or "group_specific"/"total"/"coefficient" corresponding to the current deviation = true/false options.

Explanation:

The more I think about it, the more I really don't like calling the output of coef() "random". It's not. It's the sum of the fixed and random components. I think this is going to encourage confusion on users' parts. See https://wviechtb.github.io/metafor/reference/blup.html and glmmTMB/glmmTMB#691 for some discussion.

The SEs are absolutely not the same. In most cases, they are not even close. The SE for the output of coef() is sqrt(SE^2_fixed + SE^2_random + 2*cov_fixedrandom). The covariance term tends to be very strong and negative. Because of some of the linear algebra tricks lme4 uses (and Doug Bates' philosophical objections), it's not possible to estimate that covariance with lme4. Ben Bolker has said he is open to glmmTMB reporting these SEs, but there are some implementation details to work out. glmmTMB/glmmTMB#691

mattansb · 2021-06-03T08:46:01Z

I agree with all of this, expect for estimate_blup() - no one will have any idea what that means :/

DominiqueMakowski · 2021-06-03T08:53:58Z

Yeah, I think estimate_random is alright and understandable as a general name

what about type="random"/"total" or relative [to the fixed effect] = TRUE/FALSE?

I'll remove the SE

bwiernik · 2021-06-03T08:57:46Z

I am really uncomfortable calling the function estimate_random for something that is not just random.

If these are going to go into one function, then I think that "blup" is the accurate label.

These are the BLUPs or eBLUPs of either the conditional modes of the random effect distributions or the conditional modes of the total coefficient distributions.

BLUP is quite widespread terminology, "best linear unbiased prediction (BLUP)" is a term used widely in mixed effects model literature and in most mixed effects model textbooks and course materials.

We can try to come up with another label, but if we can't and "blup" isn't acceptable, then I think separate functions are needed, such as estimate_random and estimate_coefficient.

One other reason I like blup here is that it can then also be extended in the future. For example, we could add component="conditional_residual" for the deviation of observed scores from the BLUP.

mattansb · 2021-06-03T09:06:46Z

Agreed :/ Though I still think estimate_grouplevel is both specific enough and ambiguous enough for our needs here...

bwiernik · 2021-06-03T09:10:04Z

I am okay with estimate_grouplevel with an argument for "random"/"coefficient" like I described above.

bwiernik · 2021-06-03T09:14:21Z

"condtional" could be another option, but that's probably too confusing given that glmmTMB also uses "conditional" to refer to conditional on the zero-inflated model as well as conditional on group membership.

DominiqueMakowski · 2021-06-03T09:20:43Z

estimate_grouplevel() is a fair compromise, I also find estimate_blups terribly un-easystatsy

DominiqueMakowski · 2021-06-03T09:22:23Z

I'm worried that component or effect as an arg will be confusing given that these are also present in parameters() and insight's functions

bwiernik · 2021-06-03T09:27:26Z

How about "what" or "estimate" or "estimate.what"/"estimate_what"?

DominiqueMakowski · 2021-06-03T09:29:20Z

what about one of:
estimate_grouplevel(model, type = c("random", "total"))
estimate_grouplevel(model, type = c("random", "coefficient"))

bwiernik · 2021-06-03T09:34:48Z

I like it. Let's go with "coefficient"

mattansb · 2021-06-03T09:51:15Z

estimate_grouplevel(model, type = c("random", "coefficient")) 👍

DominiqueMakowski · 2021-06-03T10:58:59Z

should reshape_random() be renamed reshape_grouplevel() or is it fine?

DominiqueMakowski · 2021-06-03T11:12:40Z

actually "coefficient" might not be the best, because it's confusing with the indices arg that can be "Coefficient" (to retrieve the column name), what about "full" or "total" or "absolute" or ... "blup"? 😁

mattansb · 2021-06-03T12:25:32Z

Why does it matter if the argument value might conflict with another argument's value?

mattansb · 2021-06-03T12:26:01Z

should reshape_random() be renamed reshape_grouplevel() or is it fine?

Hmmm I think should also be changed.

DominiqueMakowski · 2021-06-03T13:06:11Z

Why does it matter if the argument value might conflict with another argument's value?

reshape_grouplevel(estimate_grouplevel(x, "coefficient"), "Coefficient") or

x %>%
  estimate_grouplevel("coefficient") %>%  # lowercase
  reshape_grouplevel("Coefficient")  # uppercase

is arguably not the most elegant and clear ^^

bwiernik · 2021-06-03T15:55:32Z

I don't think that clash is that big a deal. It's not as unclear when the arguments are named.

x %>%
  estimate_grouplevel(type = "coefficient") %>% 
  reshape_grouplevel(indices = "Coefficient")

"full" or "total" or "absolute" or ... "blup"

"total" should be okay--I'm not thinking of a place where this would be inaccurate. Another option might be "effect" but that's probably not as clear.

"blup" could refer to either random or total. The best prediction of what?

DominiqueMakowski · 2021-06-04T03:56:44Z

Made the fixes, I think we're good for submission for this round, next update will focus on visualisation (#114 which needs lightbeam geoms in see etc.). Will pass it to the winbuilder first and then submit

DominiqueMakowski added the Feature idea 🔥 label May 31, 2021

DominiqueMakowski changed the title ~~group-level parameters: function name~~ group-level parameters May 31, 2021

DominiqueMakowski added a commit that referenced this issue May 31, 2021

estimate_individual #112

f21d2bf

DominiqueMakowski closed this as completed Jun 4, 2021

group-level parameters #112

group-level parameters #112

Comments

DominiqueMakowski commented May 31, 2021 • edited Loading

mattansb commented May 31, 2021

DominiqueMakowski commented May 31, 2021

DominiqueMakowski commented Jun 2, 2021 • edited Loading

strengejacke commented Jun 2, 2021

bwiernik commented Jun 2, 2021

DominiqueMakowski commented Jun 3, 2021

bwiernik commented Jun 3, 2021

DominiqueMakowski commented Jun 3, 2021 • edited Loading

mattansb commented Jun 3, 2021

DominiqueMakowski commented Jun 3, 2021

mattansb commented Jun 3, 2021

mattansb commented Jun 3, 2021

DominiqueMakowski commented Jun 3, 2021 • edited Loading

DominiqueMakowski commented Jun 3, 2021 • edited Loading

mattansb commented Jun 3, 2021

DominiqueMakowski commented Jun 3, 2021

strengejacke commented Jun 3, 2021

bwiernik commented Jun 3, 2021 • edited Loading

mattansb commented Jun 3, 2021

DominiqueMakowski commented Jun 3, 2021 • edited Loading

bwiernik commented Jun 3, 2021 • edited Loading

mattansb commented Jun 3, 2021

bwiernik commented Jun 3, 2021

bwiernik commented Jun 3, 2021

DominiqueMakowski commented Jun 3, 2021

DominiqueMakowski commented Jun 3, 2021

bwiernik commented Jun 3, 2021

DominiqueMakowski commented Jun 3, 2021

bwiernik commented Jun 3, 2021

mattansb commented Jun 3, 2021

DominiqueMakowski commented Jun 3, 2021

DominiqueMakowski commented Jun 3, 2021

mattansb commented Jun 3, 2021

mattansb commented Jun 3, 2021

DominiqueMakowski commented Jun 3, 2021

bwiernik commented Jun 3, 2021

DominiqueMakowski commented Jun 4, 2021

DominiqueMakowski commented May 31, 2021 •

edited

Loading

DominiqueMakowski commented Jun 2, 2021 •

edited

Loading

DominiqueMakowski commented Jun 3, 2021 •

edited

Loading

DominiqueMakowski commented Jun 3, 2021 •

edited

Loading

DominiqueMakowski commented Jun 3, 2021 •

edited

Loading

bwiernik commented Jun 3, 2021 •

edited

Loading

DominiqueMakowski commented Jun 3, 2021 •

edited

Loading

bwiernik commented Jun 3, 2021 •

edited

Loading