get_predicted.glmmTMB tests and not transformations #449

Closed
wants to merge 1 commit into from

Conversation

vincentarelbundock
Contributor

This PR is related to #413. It might be a bit more controversial than the others, so I’ll leave it here for a bit to give people a chance to comment.

I removed the call to .get_predicted_transform from get_predicted.glmmTMB because I think it produced incorrect results:

library(insight)
library(glmmTMB)

m <- glmmTMB(count ~ spp + mined + (1 | site),
             zi = ~ spp + mined,
             family = nbinom2, data = Salamanders)

unknown <- get_predicted(m, predict = "expectation")
known <- predict(m, type = "response", se.fit = TRUE)

unknown |> as.data.frame() |> head()
#   Predicted        SE     CI_low  CI_high
# 1 0.5387752 0.2504592 0.21662650 1.339996
# 2 1.0768783 0.3636606 0.55554192 2.087452
# 3 0.3554236 0.2528622 0.08813904 1.433258
# 4 2.4701755 0.6106873 1.52156375 4.010195
# 5 2.4950498 0.6142041 1.54006744 4.042208
# 6 2.1819828 0.5595765 1.31995150 3.606988

known$fit |> head()
# mu_predict mu_predict mu_predict mu_predict mu_predict mu_predict 
#  0.1546241  0.3090553  0.1020037  2.0732045  2.0940814  1.8313260

known$se.fit |> head()
# mu_predict mu_predict mu_predict mu_predict mu_predict mu_predict 
# 0.07476462 0.15006330 0.06712653 0.52401542 0.50728643 0.45872776

The documentation of the predict.glmmTMB() function states:

"response" expected value; this is mu*(1-p) for zero-inflated models and ‘mu’ otherwise

So the fact that response does not equal expectation is surprising.
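
Since predict.glmmTMB() exposes the pieces separately, the documented identity can be checked by hand (a quick sketch, assuming the type = "conditional" and type = "zprob" options return mu and p for the model m above):

mu <- predict(m, type = "conditional")  # mu: the conditional mean
p  <- predict(m, type = "zprob")        # p: the zero-inflation probability
head(mu * (1 - p))                      # should match known$fit, per the docs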

I’m not sure it’s my place to open a big philosophical debate about this, but I would argue that insight should do its best to avoid making automatic transformations to numerical values in general, and to the scale of predicted values in particular. The mission statement of the package on CRAN is:

“A tool to provide an easy, intuitive and consistent access to information contained in various R models”

So I think we should concentrate on access to information, and not on transformation of information. For example, providing a “dictionary” of common response types makes sense, because many models share types but use different naming conventions. So we make things more uniform. But automagically transforming the scale of the response can be dangerous – as illustrated by the code above – and it’s a very very deep rabbit hole that makes the codebase hard to understand and hard to manage.
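
To illustrate, the kind of "dictionary" I have in mind is just a mapping from uniform names to each package's own type labels (a hypothetical sketch, not insight's actual internal table):

# Hypothetical table mapping uniform predict names to package-specific types
type_dictionary <- list(
  glm     = c(expectation = "response", link = "link"),
  glmmTMB = c(expectation = "response", link = "link"),
  polr    = c(expectation = "probs",    classification = "class")
)
# e.g., predict(model, type = type_dictionary[["glm"]][["expectation"]])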

For insight to play a role as an infrastructure package, it’s crucial that it never produce erroneous or surprising results. Avoiding transformations, keeping things simple, and sticking close to the modelling package’s information feels like a safer strategy.

Anyway, just my 2 cents. Feel free to ignore or tell me it's all dumb ;)

@bwiernik
Contributor

I disagree about insight not doing transformations. One of the big problems get_predicted() is designed to solve is that packages vary widely in what response options they make available in their predict methods and in the labels they use for those options. This is especially a problem when packages use the same label for different things.

One of the design goals here is that, eg, "expectation" should always return the expected values of the response, conditional on the predictors

@vincentarelbundock
Contributor Author

I disagree about insight not doing transformations. One of the big problems get_predicted() is designed to solve is that packages vary widely in what response options they make available in their predict methods and in the labels they use for those options.

That makes a lot of sense.

Would you then agree that get_predicted(model, predict="expectation") should apply no transformation by default, unless the model’s predict() method lacks a type that returns the expected value of the response?

Right now we call .get_predicted_transform in the methods for the following models:

  • lm
  • glmmTMB
  • lmerMod (inherited by merMod)
  • gam (inherited by gamm)

I’m not sure about gam models, but I think that in all the other cases, calling predict(type="response") gives us what we expect with predict="expectation". No?

For example in a logit glmer we can get predicted probabilities in [0, 1]:

library(lme4)
# Loading required package: Matrix

mod <- glmer(am ~ mpg + (1 | cyl), family = binomial, data = mtcars)
# boundary (singular) fit: see ?isSingular
predict(mod, type = "response") |> head()
#         Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive 
#         0.4610951         0.4610951         0.5978984         0.4917199 
# Hornet Sportabout           Valiant 
#         0.2969009         0.2599331

@vincentarelbundock
Contributor Author

Addendum: Maybe one justification is that we need predictions on the link scale to get confidence intervals that do not go outside the 0-1 range in logit, etc.
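
For instance, with a plain logit glm (a minimal sketch: a Wald interval built on the link scale, then mapped back through plogis()):

fit <- glm(am ~ mpg, family = binomial, data = mtcars)
pred <- predict(fit, type = "link", se.fit = TRUE)
# build the interval on the link scale, then back-transform:
ci_low  <- plogis(pred$fit - 1.96 * pred$se.fit)  # cannot drop below 0
ci_high <- plogis(pred$fit + 1.96 * pred$se.fit)  # cannot exceed 1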

@bwiernik
Contributor

Can you take a step back and describe the issue from the beginning? I'm not sure what the .get_predicted_transform function does.

@vincentarelbundock
Contributor Author

vincentarelbundock commented Oct 11, 2021

Frankly, I'm quite confused about how the code works. AFAICT:

  • get_predicted_args takes a model and a predict argument, and creates an args list with two variables: type and scale
    • type: most of the time this is set to "link"
    • scale: when predict="expectation", this is typically set to "response"
  • get_predicted.CLASS uses the modelling package's function to get predictions using the type variable set above:
    • predict(type = args$type)
    • Usually, this gives us predictions on the link scale
  • get_predicted_ci creates confidence intervals. Again, usually on the link scale.
  • get_predicted_transform uses the args$scale value to transform the predicted values and confidence intervals from the link scale to whatever scale was chosen by the user (e.g., if predict="expectation", then args$scale="response"); see the sketch below
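
A pseudo-code sketch of that flow (function names follow the bullets above; the argument lists are simplified):

args <- get_predicted_args(model, predict = "expectation")
# args$type == "link"; args$scale == "response"
pred <- predict(model, type = args$type)                # link-scale predictions
ci   <- get_predicted_ci(model, pred)                   # link-scale intervals
out  <- get_predicted_transform(model, pred, ci, args)  # back-transform both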

The argument in the first post of this PR is that we should not make predictions on the link scale and then transform them ourselves. Instead, we should call predict(model, type = "response") and benefit from the modelling packages' work.

However, I see now that getting confidence intervals might require us to go through the link scale, because doing 1.96*SE would sometimes produce interval bounds outside the variable's natural scale.

That said, doing the link transforms ourselves still feels both dangerous and like a time sink.

Notes:

  • Some predict methods have a se.fit argument which returns intervals. We probably shouldn't do the transforms there ourselves.
  • Some predict methods have a se.fit argument which returns only standard errors. In those cases, I would still not do the transform, and just return the standard error without intervals. If we really want intervals, we can do the transformation ourselves, but we need extensive tests covering every link function supported by the model (see the sketch below). That is not the case right now. The glmmTMB example in the first post would probably have been caught by such tests.
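
By "extensive tests" I mean something like the following, repeated for every family/link combination a package supports (a sketch, assuming a fitted model m whose predict() method has a working type = "response"):

test_that("expectation matches the package's own response predictions", {
  expect_equal(
    as.numeric(get_predicted(m, predict = "expectation")),
    unname(predict(m, type = "response")),
    tolerance = 1e-6
  )
})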

@vincentarelbundock
Contributor Author

Here's a minimal working example of what goes wrong in the glmmTMB transformation:

library(glmmTMB)
library(insight)
library(testthat)

data("fish")
m1 <- glmmTMB(
  count ~ child + camper + (1 | persons),
  ziformula = ~ child + camper + (1 | persons),
  data = fish,
  family = truncated_poisson())

Link predictions are the same:

known_link <- predict(m1, type = "link")
insight_link <- get_predicted(m1, predict = "link")

known_link |> head()
# [1] 0.02278348 0.75632274 0.02278348 0.50905645 0.02278348 0.96834946
insight_link |> head()
# [1] 0.02278348 0.75632274 0.02278348 0.50905645 0.02278348 0.96834946

Response predictions are different:

known_response <- predict(m1, type = "response")
insight_response <- get_predicted(m1, predict = "expectation")

known_response |> head()
# [1] 0.2400130 0.9726394 0.2400130 0.4613930 0.2400130 0.6362097
insight_response |> head()
# [1] 1.023045 2.130428 1.023045 1.663721 1.023045 2.633594

Response predictions produced by insight are equal to insight::link_inverse(m1)(known_link), which is the back-transformation insight applies internally:

insight::link_inverse(m1)(known_link) |> head()
# [1] 1.023045 2.130428 1.023045 1.663721 1.023045 2.633594
insight_response |> head()
# [1] 1.023045 2.130428 1.023045 1.663721 1.023045 2.633594

So my guess is that link_inverse gives us the wrong inverse link, or at least a different one from the one used internally by the glmmTMB package. I think this underscores the complexity of the problem: doing the transforms ourselves requires keeping track of every link function for every model type supported by every package, and then making sure we test all of them.

@vincentarelbundock
Contributor Author

@bwiernik sorry to bug you with such a long series of stream-of-consciousness posts. I should figure things out properly before starting big discussions on PRs like this.

I figured out how to back out the same predictions as predict.glmmTMB and will open a model-specific issue about this.
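
For reference, one plausible form of that back-transformation, using m1 and known_link from my previous post (a sketch under assumptions, not the verified fix: it takes type = "zprob" to return the zero-inflation probability, and uses the standard mean of a zero-truncated Poisson, lambda / (1 - exp(-lambda))):

lambda <- exp(known_link)                    # conditional-model mean parameter
p <- predict(m1, type = "zprob")             # zero-inflation probability
# expected value of a zero-inflated *truncated* Poisson:
head((1 - p) * lambda / (1 - exp(-lambda)))  # should line up with known_response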

I think I still hold some of the views expressed above, but there's probably no point in rethinking all that if what we have can be made to work in predictable ways.

Thanks!

@bwiernik
Contributor

At a broad level, I think the issue is that packages are very inconsistent in what infrastructure they provide. A method generally needs to be model-specific.

vincentarelbundock deleted the get_predicted_glmmTMB_type branch October 11, 2021 20:37