
appropriate R^2 for fitted models? #44

Closed
YSanchezAraujo opened this issue Feb 14, 2024 · 2 comments

YSanchezAraujo commented Feb 14, 2024

Is there a function (I've searched the API and can't find it but maybe have missed it) to compute variance explained? I'm using robust models from this package to compute p-values for correlations in the case of a single independent variable:

y = b0 + b1 * x

Ideally I'd also want to compute the correlation coefficient, which for the model above in the OLS case is just sign(b1) * sqrt(R^2). But here I can't simply predict the responses and compute R^2 as usual, because the result can be negative.
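
For concreteness, a minimal sketch of that OLS identity on made-up data, using plain least squares and Statistics.cor (no robust fitting involved):

```julia
using Statistics

# Made-up data for a single-predictor model.
x = randn(100)
y = 0.5 .+ 2.0 .* x .+ randn(100)

X = hcat(ones(length(x)), x)    # design matrix [1 x]
b0, b1 = X \ y                  # OLS coefficients
yhat = X * [b0, b1]

ss_res = sum(abs2, y .- yhat)
ss_tot = sum(abs2, y .- mean(y))
R2 = 1 - ss_res / ss_tot

sign(b1) * sqrt(R2) ≈ cor(x, y)   # true for OLS with an intercept
```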

I see in the API one possibility is (I think):

R^2 = 1 - StatsBase.deviance(model) / StatsBase.nulldeviance(model)

but I'm wondering if there's potentially the same issue here?

getzze (Owner) commented Feb 15, 2024

There is no exact equivalent of R2 for robust models, but there are different definitions of pseudo-coefficients of determination.
See https://juliastats.org/StatsBase.jl/stable/statmodels/#StatsAPI.r2, which lists the deviance formula you mentioned.

You can also get it like this:

R2 = StatsBase.r2(model, :devianceratio)

I would use this one, as it is strictly equivalent to R2 in the OLS case, but I don't know if it's guaranteed to be non-negative. If it is negative, it means the fit with a slope is worse than no fit; in that case you can consider that R2 = 0.
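
For example, a minimal sketch assuming RobustModels.jl's rlm with MEstimator{TukeyLoss}() (swap in whatever estimator you are actually using); the r2 call is the StatsBase one linked above:

```julia
using RobustModels, StatsBase, StatsModels, DataFrames

# Made-up data; substitute your own single-predictor dataset.
df = DataFrame(x = randn(100), y = randn(100))

# Assumed RobustModels.jl call; adjust the estimator/loss to your actual model.
m = rlm(@formula(y ~ x), df, MEstimator{TukeyLoss}())

# Deviance-ratio pseudo-R2, as discussed above; treat a negative value as 0.
R2 = max(StatsBase.r2(m, :devianceratio), 0.0)
```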

But I am not sure this is what you want. If what you want is a signed correlation, it would be better to z-score y and x before doing the regression, so that the coefficients b0 and b1 are on a standardized scale. For OLS, z-scoring before the regression gives b0 = 0 and b1 = Σᵢ xᵢyᵢ / (n − 1) (with the sample standard deviation used for z-scoring), which is exactly the Pearson correlation of x and y.

You can use StatsBase.ZScoreTransform to z-score:
https://juliastats.org/StatsBase.jl/stable/transformations/#Standardization-a.k.a-Z-score-Normalization
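
For example, a sketch of the z-scoring approach; zscore from StatsBase is used here (for a single vector it does the same standardization as the ZScoreTransform linked above), and the rlm call is the same assumption as in the previous sketch:

```julia
using RobustModels, StatsBase, StatsModels, DataFrames

# Made-up data; substitute your own x and y.
x, y = randn(100), randn(100)

# zscore subtracts the mean and divides by the sample standard deviation.
df = DataFrame(zx = zscore(x), zy = zscore(y))

# Same assumed RobustModels.jl call as above; with z-scored inputs the
# fitted slope acts as a robust, signed correlation-like coefficient.
m = rlm(@formula(zy ~ zx), df, MEstimator{TukeyLoss}())
b0, b1 = coef(m)    # b0 should be close to 0; b1 is the standardized slope
```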

YSanchezAraujo (Author) commented

Thanks for the response and additional info! Closing this.
