Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

group-level parameters #112

Closed
DominiqueMakowski opened this issue May 31, 2021 · 37 comments
Closed

group-level parameters #112

DominiqueMakowski opened this issue May 31, 2021 · 37 comments
Labels
Feature idea 🔥 New feature or request

Comments

@DominiqueMakowski
Copy link
Member

DominiqueMakowski commented May 31, 2021

Follow up of easystats/parameters#486

What's the most catchy & appropriate name; estimate_individual, estimate_groupspecific, estimate_grouplevel, estimate_random, estimate_individualeffects, estimate_scores, estimate_individualparameters...

@DominiqueMakowski DominiqueMakowski added the Feature idea 🔥 New feature or request label May 31, 2021
@DominiqueMakowski DominiqueMakowski changed the title group-level parameters: function name group-level parameters May 31, 2021
@mattansb
Copy link
Member

I like grouplevel.. or groupwise?

DominiqueMakowski added a commit that referenced this issue May 31, 2021
@DominiqueMakowski
Copy link
Member Author

I like grouplevel too but I do recall @bwiernik made a distinction between group-level and group-specific

@DominiqueMakowski
Copy link
Member Author

DominiqueMakowski commented Jun 2, 2021

After quite some tinkering I think the most intuitive and straightforward design is estimate_random:

  • its default output is mostly a cleaned version of parameters(x, group_level = TRUE) (but formatted to be specific to random effects, so some non-applicable columns have been dropped)
  • It can be plotted
  • It can be re-organized to match the original input of random factors (and directly cbinded with the original data) using reshape_random() (in the line of our other specific reshapers like reshape_ci, reshape_loadings, reshape_iterations, etc.)
library(modelbased)
library(lme4)
library(see)

data <- lme4::sleepstudy
data <- rbind(data, data)
data$Newfactor <- rep(c("A", "B", "C", "D"))

model <- lmer(Reaction ~ Days + (1 + Days|Subject) + (1|Newfactor), data = data)

random <- estimate_random(model)
head(random)
#> Group     | Level |   Parameter | Coefficient |   SE |          95% CI
#> ----------------------------------------------------------------------
#> Newfactor |     A | (Intercept) |        0.58 | 2.04 | [ -3.42,  4.58]
#> Newfactor |     B | (Intercept) |        2.32 | 2.05 | [ -1.69,  6.34]
#> Newfactor |     C | (Intercept) |       -2.04 | 2.04 | [ -6.04,  1.97]
#> Newfactor |     D | (Intercept) |       -0.87 | 2.05 | [ -4.88,  3.15]
#> Subject   |   308 | (Intercept) |       -3.28 | 9.24 | [-21.39, 14.82]
#> Subject   |   308 |        Days |       10.37 | 1.73 | [  6.98, 13.75]

plot(random)

# Rematch to original random data
x <- reshape_random(random, indices = c("Coefficient", "SE"))
head(x)
#>   Subject Newfactor Newfactor_Coefficient_Intercept Newfactor_SE_Intercept
#> 1     308         A                       0.5818450               2.042339
#> 2     308         B                       2.3221207               2.049567
#> 3     308         C                      -2.0368499               2.042339
#> 4     308         D                      -0.8671158               2.049567
#> 5     308         A                       0.5818450               2.042339
#> 6     308         B                       2.3221207               2.049567
#>   Subject_Coefficient_Intercept Subject_SE_Intercept Subject_Coefficient_Days
#> 1                     -3.283123             9.238225                  10.3656
#> 2                     -3.283123             9.238225                  10.3656
#> 3                     -3.283123             9.238225                  10.3656
#> 4                     -3.283123             9.238225                  10.3656
#> 5                     -3.283123             9.238225                  10.3656
#> 6                     -3.283123             9.238225                  10.3656
#>   Subject_SE_Days
#> 1        1.728378
#> 2        1.728378
#> 3        1.728378
#> 4        1.728378
#> 5        1.728378
#> 6        1.728378

Created on 2021-06-02 by the reprex package (v1.0.0)

@strengejacke
Copy link
Member

that looks nice!

@bwiernik
Copy link
Contributor

bwiernik commented Jun 2, 2021

These are the just the estimated random effects (returned with ranef()) right? If so, I like "estimate_random".

"estimate_group_specific" should return the random effects + fixed effects (what is returned with coef()).

@DominiqueMakowski
Copy link
Member Author

What I did is I added an argument to estimate_random called deviation = TRUE:

The description is as follows:

#' @param deviation If \code{TRUE}, the coefficients are the ones estimated natively by the model (as they are returned by for instance \code{lme4::ranef()}). They correspond to the \emph{deviation} of each individual group from their fixed effect. As such, a coefficient close to 0 means that the participants' effect is the same as the population-level effect (in other words, in is "in the norm"). If \code{FALSE}, will return the sum of the random effect and its corresponding fixed effects. This argument can be used to reproduce the results given by \code{lme4::ranef()} and \code{coef()} (see \code{?coef.merMod}). Note that the sum is made between the random effects parameters and the fixed effect coefficient (which is a point-estimate), and thus does not take into account the variability and uncertainty in the fixed effects estimations. Thus, use these with caution.

I also mention that in the new vignette I started: feel free to correct and improve!

@bwiernik
Copy link
Contributor

bwiernik commented Jun 3, 2021

We aren't currently reporting SEs or intervals for the deviation = FALSE estimates for lme4 or glmmTMB, right? We shouldn't ever do that for lme4 and should wait to do so for glmmTMB until a proper method is available (Ben Bolker has said he would look into it). For bayesian models we can just derive from the posterior of course.

@DominiqueMakowski
Copy link
Member Author

DominiqueMakowski commented Jun 3, 2021

From my current understanding the output of coef (https://github.com/lme4/lme4/blob/9c673edb76ae19165ffe0a45b375737bb02f1fc3/R/lmer.R#L657) is literally just the sum of ranef and fixef (i.e., the point-estimate of the coefficients). As such, I don't see the difference between adding the fixef point-estimate to the random coefficient and adding the fixef point-estimate to say the random CI bounds. With the same logic, the SE stays the same (since we it doesn't correct for the variability of the fixed effect anyway). So I don't think it's inherently wrong to return the same SE and the summed CI, given the current state of things (again, based on how coef currently works)

@mattansb
Copy link
Member

mattansb commented Jun 3, 2021

If this is ranef+fixef, I think estimate_random is a bad fit :/

@DominiqueMakowski
Copy link
Member Author

I don't see it that way, for me it's still the random effects (or rather group-level effects), it's just that they axe expressed in the same "unit" as the fixed effect (rather than being expressed in relation to it), which makes it often easier to interpret but it doesn't change inherently the nature of the coefficients.

It's a bit like doing Sepal.Length / Species instead of Sepal.Length * Species, it doesn't change the model, but it shows the effect for each level rather than the "deviation" of the effect from the intercept. That might not be the best example but you see what I mean (i hope)

Afaik it's mostly a "rescaling" of the values for more interpretability rather than a fundamental alteration of the meaning...

@mattansb
Copy link
Member

mattansb commented Jun 3, 2021

for me it's still the random effects (or rather group-level effects)

To me those aren't the same thing... :/

  • Random = distribution around the fixed
  • Group level effect = the predicted effect at the level

@mattansb
Copy link
Member

mattansb commented Jun 3, 2021

It actually matters quite a bit.....

library(lme4)
#> Loading required package: Matrix

m1 <- lmer(mpg ~ 1 + (1|gear), data = mtcars)
m2 <- lmer(mpg ~ 0 + (1|gear), data = mtcars)

# Same
coef(m1)
#> $gear
#>   (Intercept)
#> 3    16.46041
#> 4    24.15615
#> 5    21.22401
#> 
#> attr(,"class")
#> [1] "coef.mer"
coef(m2)
#> $gear
#>   (Intercept)
#> 3    16.05235
#> 4    24.43000
#> 5    21.16515
#> 
#> attr(,"class")
#> [1] "coef.mer"

# Very diff
ranef(m1)
#> $gear
#>   (Intercept)
#> 3   -4.153112
#> 4    3.542629
#> 5    0.610483
#> 
#> with conditional variances for "gear"
ranef(m2)
#> $gear
#>   (Intercept)
#> 3    16.05235
#> 4    24.43000
#> 5    21.16515
#> 
#> with conditional variances for "gear"

Created on 2021-06-03 by the reprex package (v2.0.0)

@DominiqueMakowski
Copy link
Member Author

DominiqueMakowski commented Jun 3, 2021

when I said it doesn't matter I didn't mean that the values are the same ^^ naturally they change in respect to their fixed effect. But yeah in your m2 since "their fixed effect" (the intercept) = 0 so ranef and coef are the same I'm not sure what point did you have in mind?

@DominiqueMakowski
Copy link
Member Author

DominiqueMakowski commented Jun 3, 2021

Group level effect = the predicted effect at the level

It's not really the predicted effect as much as simply the random parameter expressed in "absolute" (i.e., not "relative" to its fixed effect)

@mattansb
Copy link
Member

mattansb commented Jun 3, 2021

The random part is the predicted deviation of the effect, so when adding the fixed effect it is the predicted effect. AFAIK this is the terminology used for the output of coef() here... These values aren't present as-is in the data (because of shrinkage), but they are the predicted values...

@DominiqueMakowski
Copy link
Member Author

ah ok I see what you mean by predicted here ^^ (and thank god none of us is from ML)

@strengejacke
Copy link
Member

ranef() returns BLUPs, best linear unbiased prediction. The random effects are indeed not estimated, unlike fixed effects.

@bwiernik
Copy link
Contributor

bwiernik commented Jun 3, 2021

Let's change the name to estimate_blup(). Also rather than "deviation", I think we need a different name for the argument. "deviation" is really only accurate for specific model specifications. How about component with options for "ranef"/"random" or "group_specific"/"total"/"coefficient" corresponding to the current deviation = true/false options.

Explanation:

The more I think about it, the more I really don't like calling the output of coef() "random". It's not. It's the sum of the fixed and random components. I think this is going to encourage confusion on users' parts. See https://wviechtb.github.io/metafor/reference/blup.html and glmmTMB/glmmTMB#691 for some discussion.

The SEs are absolutely not the same. In most cases, they are not even close. The SE for the output of coef() is sqrt(SE^2_fixed + SE^2_random + 2*cov_fixedrandom). The covariance term tends to be very strong and negative. Because of some of the linear algebra tricks lme4 uses (and Doug Bates' philosophical objections), it's not possible to estimate that covariance with lme4. Ben Bolker has said he is open to glmmTMB reporting these SEs, but there are some implementation details to work out. glmmTMB/glmmTMB#691

@mattansb
Copy link
Member

mattansb commented Jun 3, 2021

I agree with all of this, expect for estimate_blup() - no one will have any idea what that means :/

@DominiqueMakowski
Copy link
Member Author

DominiqueMakowski commented Jun 3, 2021

Yeah, I think estimate_random is alright and understandable as a general name

what about type="random"/"total" or relative [to the fixed effect] = TRUE/FALSE?

I'll remove the SE

@bwiernik
Copy link
Contributor

bwiernik commented Jun 3, 2021

I am really uncomfortable calling the function estimate_random for something that is not just random.

If these are going to go into one function, then I think that "blup" is the accurate label.

These are the BLUPs or eBLUPs of either the conditional modes of the random effect distributions or the conditional modes of the total coefficient distributions.

BLUP is quite widespread terminology, "best linear unbiased prediction (BLUP)" is a term used widely in mixed effects model literature and in most mixed effects model textbooks and course materials.

We can try to come up with another label, but if we can't and "blup" isn't acceptable, then I think separate functions are needed, such as estimate_random and estimate_coefficient.

One other reason I like blup here is that it can then also be extended in the future. For example, we could add component="conditional_residual" for the deviation of observed scores from the BLUP.

@mattansb
Copy link
Member

mattansb commented Jun 3, 2021

Agreed :/ Though I still think estimate_grouplevel is both specific enough and ambiguous enough for our needs here...

@bwiernik
Copy link
Contributor

bwiernik commented Jun 3, 2021

I am okay with estimate_grouplevel with an argument for "random"/"coefficient" like I described above.

@bwiernik
Copy link
Contributor

bwiernik commented Jun 3, 2021

"condtional" could be another option, but that's probably too confusing given that glmmTMB also uses "conditional" to refer to conditional on the zero-inflated model as well as conditional on group membership.

@DominiqueMakowski
Copy link
Member Author

estimate_grouplevel() is a fair compromise, I also find estimate_blups terribly un-easystatsy

@DominiqueMakowski
Copy link
Member Author

I'm worried that component or effect as an arg will be confusing given that these are also present in parameters() and insight's functions

@bwiernik
Copy link
Contributor

bwiernik commented Jun 3, 2021

How about "what" or "estimate" or "estimate.what"/"estimate_what"?

@DominiqueMakowski
Copy link
Member Author

what about one of:
estimate_grouplevel(model, type = c("random", "total"))
estimate_grouplevel(model, type = c("random", "coefficient"))

@bwiernik
Copy link
Contributor

bwiernik commented Jun 3, 2021

I like it. Let's go with "coefficient"

@mattansb
Copy link
Member

mattansb commented Jun 3, 2021

estimate_grouplevel(model, type = c("random", "coefficient")) 👍

@DominiqueMakowski
Copy link
Member Author

should reshape_random() be renamed reshape_grouplevel() or is it fine?

@DominiqueMakowski
Copy link
Member Author

actually "coefficient" might not be the best, because it's confusing with the indices arg that can be "Coefficient" (to retrieve the column name), what about "full" or "total" or "absolute" or ... "blup"? 😁

@mattansb
Copy link
Member

mattansb commented Jun 3, 2021

Why does it matter if the argument value might conflict with another argument's value?

@mattansb
Copy link
Member

mattansb commented Jun 3, 2021

should reshape_random() be renamed reshape_grouplevel() or is it fine?

Hmmm I think should also be changed.

@DominiqueMakowski
Copy link
Member Author

Why does it matter if the argument value might conflict with another argument's value?

reshape_grouplevel(estimate_grouplevel(x, "coefficient"), "Coefficient") or

x %>%
  estimate_grouplevel("coefficient") %>%  # lowercase
  reshape_grouplevel("Coefficient")  # uppercase

is arguably not the most elegant and clear ^^

@bwiernik
Copy link
Contributor

bwiernik commented Jun 3, 2021

I don't think that clash is that big a deal. It's not as unclear when the arguments are named.

x %>%
  estimate_grouplevel(type = "coefficient") %>% 
  reshape_grouplevel(indices = "Coefficient") 

"full" or "total" or "absolute" or ... "blup"

"total" should be okay--I'm not thinking of a place where this would be inaccurate. Another option might be "effect" but that's probably not as clear.

"blup" could refer to either random or total. The best prediction of what?

@DominiqueMakowski
Copy link
Member Author

Made the fixes, I think we're good for submission for this round, next update will focus on visualisation (#114 which needs lightbeam geoms in see etc.). Will pass it to the winbuilder first and then submit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature idea 🔥 New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants