Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

marginalmeans backend for estimate_means() #214

Open
DominiqueMakowski opened this issue Sep 26, 2022 · 4 comments
Open

marginalmeans backend for estimate_means() #214

DominiqueMakowski opened this issue Sep 26, 2022 · 4 comments

Comments

@DominiqueMakowski
Copy link
Member

Okay I made a few attempts, but the issue is I cannot seem to be able to pass a datagrid to marginaleffects::marginalmeans(). Which blocks me from doing things like:

library(modelbased)
#> Warning: package 'modelbased' was built under R version 4.2.1

model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)

estimate_means(model, fixed = "Sepal.Width")
#> We selected `at = c("Species")`.
#> Estimated Marginal Means
#> 
#> Species    | Sepal.Width | Mean |   SE |       95% CI
#> -----------------------------------------------------
#> setosa     |        3.06 | 1.43 | 0.08 | [1.28, 1.58]
#> versicolor |        3.06 | 4.50 | 0.07 | [4.35, 4.65]
#> virginica  |        3.06 | 5.61 | 0.06 | [5.50, 5.72]
#> 
#> Marginal means estimated at Species
estimate_means(model, at = c("Species", "Sepal.Width"), length = 2)
#> Estimated Marginal Means
#> 
#> Species    | Sepal.Width | Mean |   SE |       95% CI
#> -----------------------------------------------------
#> setosa     |        2.00 | 1.35 | 0.21 | [0.92, 1.77]
#> versicolor |        2.00 | 3.61 | 0.15 | [3.33, 3.90]
#> virginica  |        2.00 | 4.88 | 0.17 | [4.54, 5.23]
#> setosa     |        4.40 | 1.54 | 0.15 | [1.24, 1.84]
#> versicolor |        4.40 | 5.63 | 0.29 | [5.05, 6.20]
#> virginica  |        4.40 | 6.53 | 0.25 | [6.04, 7.02]
#> 
#> Marginal means estimated at Species, Sepal.Width
estimate_means(model, at = "Species=c('versicolor', 'setosa')")
#> NOTE: Results may be misleading due to involvement in interactions
#> Estimated Marginal Means
#> 
#> Species    | Mean |   SE |       95% CI
#> ---------------------------------------
#> versicolor | 4.50 | 0.07 | [4.35, 4.65]
#> setosa     | 1.43 | 0.08 | [1.28, 1.58]
#> 
#> Marginal means estimated at Species
estimate_means(model, at = "Sepal.Width=c(2, 4)")
#> NOTE: Results may be misleading due to involvement in interactions
#> Estimated Marginal Means
#> 
#> Sepal.Width | Mean |   SE |       95% CI
#> ----------------------------------------
#> 2.00        | 3.28 | 0.10 | [3.07, 3.49]
#> 4.00        | 4.35 | 0.10 | [4.15, 4.55]
#> 
#> Marginal means estimated at Sepal.Width
estimate_means(model, at = c("Species", "Sepal.Width=0"))
#> Estimated Marginal Means
#> 
#> Species    | Sepal.Width | Mean |   SE |       95% CI
#> -----------------------------------------------------
#> setosa     |        0.00 | 1.18 | 0.50 | [0.19, 2.17]
#> versicolor |        0.00 | 1.93 | 0.49 | [0.97, 2.90]
#> virginica  |        0.00 | 3.51 | 0.51 | [2.50, 4.52]
#> 
#> Marginal means estimated at Species, Sepal.Width
estimate_means(model, at = "Sepal.Width", length = 5)
#> NOTE: Results may be misleading due to involvement in interactions
#> Estimated Marginal Means
#> 
#> Sepal.Width | Mean |   SE |       95% CI
#> ----------------------------------------
#> 2.00        | 3.28 | 0.10 | [3.07, 3.49]
#> 2.60        | 3.60 | 0.06 | [3.49, 3.71]
#> 3.20        | 3.92 | 0.04 | [3.84, 4.01]
#> 3.80        | 4.25 | 0.08 | [4.08, 4.41]
#> 4.40        | 4.57 | 0.14 | [4.30, 4.84]
#> 
#> Marginal means estimated at Sepal.Width
estimate_means(model, at = "Sepal.Width=c(2, 4)")
#> NOTE: Results may be misleading due to involvement in interactions
#> Estimated Marginal Means
#> 
#> Sepal.Width | Mean |   SE |       95% CI
#> ----------------------------------------
#> 2.00        | 3.28 | 0.10 | [3.07, 3.49]
#> 4.00        | 4.35 | 0.10 | [4.15, 4.55]
#> 
#> Marginal means estimated at Sepal.Width

Created on 2022-09-26 by the reprex package (v2.0.1)

Basically controlling at which levels of Sepal.Width I want the means. Is that the right way of doing it or should one directly use predictions on the datagrid?

@bwiernik is there a reason do use marginalmeans() over simply get_predicted on the datagrid one wants the means of? (@vincentarelbundock but I don't wanna tag you each time I need help with some probably basic thing ^^)

@bwiernik
Copy link
Contributor

marginaleffects has its own datagrid function, so passing a grid is definitely possible.

The purpose of marginal means over get_predicted is if you want to marginalize over other predictors (eg, average mean in control group across ethnic groups). marginaleffects has several options for how to handle continuous variables when averaging

@vincentarelbundock
Copy link
Contributor

To begin, let me introduce a conceptual difference between two functions: predictions() and marginalmeans().

predictions() accepts a grid (newdata argument) in the form of a data frame, and it makes predictions for each row of that data frame. When calling predictions(), we can also use the by argument to average-out or "marginalize" across some variables from the model.

marginalmeans() is a handy shortcut function which does not accept a grid, but does three things under the hood:

  1. Create a grid automatically
  2. Call predictions() to make predictions on that grid
  3. Marginalize across some categorical variables

In almost all of the examples you show, there is no averaging across categorical variables going on in estimate_means(), so we don't actually have to call marginalmeans(). Calling predictions() works just fine. At the bottom of this post I give code to replicate the results of all your examples.

Before that, I'll give a few examples of marginalmeans() to illustrate the equivalences.

model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)

mod <- lm(mpg ~ factor(am) + factor(vs) + factor(gear), data = mtcars)

# 1 call to `marginalmeans()` is equivalent to 3 calls to `estimate_means()`
marginalmeans(mod)
estimate_means(mod, at = "am")
estimate_means(mod, at = "vs")
estimate_means(mod, at = "gear")

# Marginalize across values of `gear`. note the `interaction` argument. 
marginalmeans(mod, variables = c("am", "vs"), interaction = TRUE)
estimate_means(mod, at = c("am", "vs"))

Now, replications of your examples:

library(modelbased)
library(marginaleffects)
model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)

estimate_means(
    model,
    fixed = "Sepal.Width")
predictions(
    model,
    newdata = datagrid(Species = unique))

estimate_means(
    model, 
    at = c("Species", "Sepal.Width"), length = 2)
predictions(
    model,
    newdata = datagrid(Species = unique, Sepal.Width = c(2, 4.4)))

estimate_means(
    model,
    at = "Species=c('versicolor', 'setosa')")
predictions(
    model,
    newdata = datagrid(Species = c("versicolor", "setosa")))

estimate_means(
    model, 
    at = "Sepal.Width",
    length = 5)
predictions(
    model,
    newdata = datagrid(Sepal.Width = fivenum, Species = unique),
    by = "Sepal.Width")

estimate_means(
    model,
    at = "Sepal.Width=c(2, 4)")
predictions(
    model,
    newdata = datagrid(Sepal.Width = c(2, 4), Species = unique),
    by = "Sepal.Width")

@vincentarelbundock
Copy link
Contributor

Of course, you can use the easystats version of the datagrid() builder and feed that to the newdata argument in predictions(). This would bring the two approaches closer in syntax.

@strengejacke
Copy link
Member

#248 as reminder/reference.

I have some experience now in wrapping marginaleffects functions to get the desired results, so maybe we can try to continue on this issue. The major distinctions/use cases I see (but feel free to correct/add):

  • predict() - i.e. holding non-focal terms constant at mean/reference level/mode
  • emmeans() - i.e. mean and "weighting" factors for non-focal terms
  • counterfactual - i.e. averaging across non-focal terms

Furthermore, for mixed models:

  • predictions conditioned on fixed effects only ("conditional effects")
  • predictions for the full model, taking random effects into account ("marginal effects", in the random-effects meaning)

Once we have these different estimate_*() functions/options, we can easily get the contrasts or pairwise comparisons by providing the hypothesis argument to those function calls.

WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants