get_predicted.glmmTMB tests and not transformations #449
Conversation
I disagree about `insight` not doing transformations. One of the big problems `get_predicted()` is designed to solve is that packages vary widely in what response options they make available in their predict methods and in the labels they use for those options. This is especially a problem when packages use the same label for different things. One of the design goals here is that, e.g., `"expectation"` should always return the expected values of the response, conditional on the predictors.
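To make that labeling problem concrete, here is a small illustrative sketch (the models are invented for illustration and are not from this PR): base R's `glm` calls expected values `"response"`, while `MASS::polr` calls the analogous quantity `"probs"`.

```r
library(MASS)

# glm labels expected values "response"
fit_glm <- glm(am ~ mpg, family = binomial, data = mtcars)
head(predict(fit_glm, type = "response"))

# polr labels the analogous quantity "probs" (MASS's housing example)
fit_polr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
head(predict(fit_polr, type = "probs"))

# get_predicted() aims to expose both under a single label, e.g.
# insight::get_predicted(fit_glm, predict = "expectation")
# insight::get_predicted(fit_polr, predict = "expectation")
```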
That makes a lot of sense. Would you then agree that we should get those expected values from the modelling package itself rather than computing them ourselves? Right now we call `predict()` on the link scale and then transform the results ourselves.

I'm not sure our own transformations are ever needed. For example, in a logit model, `predict()` already returns results on the response scale when asked:

```r
library(lme4)
# Loading required package: Matrix

mod <- glmer(am ~ mpg + (1 | cyl), family = binomial, data = mtcars)
# boundary (singular) fit: see ?isSingular

predict(mod, type = "response") |> head()
#         Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive
#         0.4610951         0.4610951         0.5978984         0.4917199
# Hornet Sportabout           Valiant
#         0.2969009         0.2599331
```
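For a plain binomial GLMM like this one, the response-scale predictions are just the inverse-link transform of the link-scale predictions, which is presumably why transforming by hand looks tempting. A quick check, continuing the example above:

```r
# predict on the link scale, then apply the inverse logit by hand;
# for this model the two routes agree exactly
link_preds <- predict(mod, type = "link")
all.equal(plogis(link_preds), predict(mod, type = "response"))
# [1] TRUE
```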
Addendum: maybe one justification is that we need predictions on the link scale to get confidence intervals that do not go outside the 0-1 range for a logit model, etc.
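A sketch of that justification (illustrative only, not insight's actual implementation): build a Wald-type interval on the link scale, then map it through the inverse link, so the bounds necessarily stay inside (0, 1).

```r
# illustrative sketch: interval built on the link scale, then
# back-transformed through the inverse logit
fit <- glm(am ~ mpg, family = binomial, data = mtcars)
pr <- predict(fit, type = "link", se.fit = TRUE)

ci_link <- cbind(
  lower = plogis(pr$fit - 1.96 * pr$se.fit),
  upper = plogis(pr$fit + 1.96 * pr$se.fit)
)
range(ci_link)  # always within (0, 1)

# whereas fit +/- 1.96 * se computed directly on the response
# scale can produce bounds below 0 or above 1
```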
Can you take a step back and describe the issue from the beginning? I'm not sure what the `.get_predicted_transform` function does.
Frankly, I'm quite confused about how the code works. AFAICT, the argument in the first post of this PR is that we should not make predictions on the link scale and then transform them ourselves. Instead, we should call the modelling package's own `predict()` method with the appropriate type.

However, I see now that getting confidence intervals might require us to go through the link scale, because computing fit ± 1.96 × SE on the response scale would sometimes produce interval bounds outside the variable's natural scale. That said, doing the link transforms ourselves still feels dangerous and like a time sink.
Here's a minimal working example of what goes wrong in the `glmmTMB` case:

```r
library(glmmTMB)
library(insight)
library(testthat)

data("fish")

m1 <- glmmTMB(
  count ~ child + camper + (1 | persons),
  ziformula = ~ child + camper + (1 | persons),
  data = fish,
  family = truncated_poisson()
)
```

Link predictions are the same:

```r
known_link <- predict(m1, type = "link")
insight_link <- get_predicted(m1, predict = "link")

known_link |> head()
# [1] 0.02278348 0.75632274 0.02278348 0.50905645 0.02278348 0.96834946
insight_link |> head()
# [1] 0.02278348 0.75632274 0.02278348 0.50905645 0.02278348 0.96834946
```

Response predictions are different:

```r
known_response <- predict(m1, type = "response")
insight_response <- get_predicted(m1, predict = "expectation")

known_response |> head()
# [1] 0.2400130 0.9726394 0.2400130 0.4613930 0.2400130 0.6362097
insight_response |> head()
# [1] 1.023045 2.130428 1.023045 1.663721 1.023045 2.633594
```

The response predictions produced by `insight` are exactly the inverse-link transform of the link-scale predictions:

```r
insight::link_inverse(m1)(known_link) |> head()
# [1] 1.023045 2.130428 1.023045 1.663721 1.023045 2.633594
insight_response |> head()
# [1] 1.023045 2.130428 1.023045 1.663721 1.023045 2.633594
```

So my guess is that `get_predicted()` makes predictions on the link scale and then applies the inverse link itself, which is not the correct response transformation for a zero-inflated or truncated model.
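For what it's worth, a rough sketch of what `type = "response"` accounts for here (assuming `predict.glmmTMB()` supports the documented `type = "zprob"` and `type = "conditional"` options): the expected response combines the zero-inflation probability with the truncated conditional mean, so the inverse link of the conditional linear predictor alone cannot reproduce it.

```r
# sketch: decompose glmmTMB's response-scale prediction
p_zero    <- predict(m1, type = "zprob")        # P(structural zero)
cond_mean <- predict(m1, type = "conditional")  # mean of the truncated part

# expected response = P(nonzero) * truncated conditional mean;
# this should agree with predict(m1, type = "response")
manual_response <- (1 - p_zero) * cond_mean
all.equal(unname(manual_response), unname(known_response))
```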
@bwiernik sorry to bug you with such a long series of stream-of-consciousness posts. I should figure things out properly before starting big discussions on PRs like this. I figured out how to back out the same predictions as `predict.glmmTMB()`. I think I still hold some of the views expressed above, but there's probably no point in rethinking all that if what we have can be made to work in predictable ways. Thanks!
At a broad level, I think the issue is that packages are very inconsistent in what infrastructure they provide. A method generally needs to be model-specific.
This PR is related to #413. It might be a bit more controversial than the others, so I’ll leave it here for a bit to give people a chance to comment.
I removed the call to `.get_predicted_transform` from `get_predicted.glmmTMB` because I think it produced incorrect results. The documentation of the `predict.glmmTMB()` function states that `type = "response"` returns expected values on the response scale, so the fact that `response` does not equal `expectation` is surprising.

I'm not sure it's my place to open a big philosophical debate about this, but I would argue that `insight` should do its best to avoid making automatic transformations to numerical values in general, and to the scale of predicted values in particular. The package's mission statement on CRAN centres on easy and consistent access to the information contained in model objects. So I think we should concentrate on access to information, not on transformation of information. For example, providing a "dictionary" of common response types makes sense, because many models share types but use different naming conventions, so we make things more uniform. But automagically transforming the scale of the response can be dangerous, as illustrated by the code above, and it's a very deep rabbit hole that makes the codebase hard to understand and hard to manage.
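As an illustration, such a dictionary could be as simple as a lookup from unified labels to each package's own `type` argument. A hypothetical sketch, not insight's actual internals:

```r
# hypothetical response-type dictionary; the mapping and the helper
# are illustrative, not insight's actual implementation
type_dictionary <- list(
  glm     = c(expectation = "response", link = "link"),
  glmmTMB = c(expectation = "response", link = "link"),
  polr    = c(expectation = "probs")
)

# translate a unified label into the type string the package understands
translate_type <- function(model_class, label) {
  type_dictionary[[model_class]][[label]]
}

translate_type("glmmTMB", "expectation")
# [1] "response"
```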
For `insight` to play a role as an infrastructure package, it's crucial that it never produce erroneous or surprising results. Avoiding transformations, keeping things simple, and sticking close to the modelling package's information feels like a safer strategy.

Anyway, just my 2 cents. Feel free to ignore or tell me it's all dumb ;)