RFC: Return fitted model as fitresult
#40
Mmm. I sense your frustration and the simplicity of your suggestion is naturally appealing. But I, for one, have concerns:
What if we return the raw
@OkonSamuel may want to comment.
@rikhuijzer all those should have been moved to the model's report. Keeping the glm_model object caused some concerns (see #16), as the model wraps the dataset itself. The model has hyper_parameters that you could use to specify what you want to extract from the glm_model. If it's an absolute must to have the whole glm_object returned, we could add it to the report rather than fitresult and add a key to that controls whether or not this is returned and also add a warning about memory concerns.
Yes I was editing a LaTeX document, while running PDF generation, while thinking about the analysis, while having Pluto open, while reading supervisor comments, while going through StatsBase and GLM and I should probably not have filed an issue immediately and instead let it sit for a day! My apologies.
Yes, I do think that could work. That would already make some extra functions available, I think.
Well, I don't know if it's an absolute must. What I needed, for example, was a different p-value than the default pvalues = 2 .* Distributions.cdf.(Distributions.Normal(), -abs.(z_scores)), and GLM returns the z-score, which you can extract by parsing the returned text 😅. So I understand that most of our problems here were caused by GLM having a weird API from the perspective of machine learning (storing data, not having many keywords, having very elaborate types). Maybe we should just let it be? My problem is solved for now by manually calculating p-values and parsing the text. Again, apologies for the noise.
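For reference, a minimal sketch of the manual p-value calculation mentioned above, assuming the z-scores are already available as a plain vector. The values in z_scores are made up for illustration, and the one-sided variant is only one example of a "different" p-value, not necessarily the one that was needed here.

```julia
using Distributions

# Hypothetical z-scores, one per coefficient (illustrative values only).
z_scores = [-2.5, 0.3, 1.96]

# Default two-sided p-values: twice the lower tail below -|z|.
two_sided = 2 .* cdf.(Normal(), -abs.(z_scores))

# One alternative: one-sided (upper-tail) p-values, P(Z >= z).
one_sided_upper = ccdf.(Normal(), z_scores)
```

Note the broadcast dot in cdf.(...): cdf is applied element-wise to the vector of z-scores.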
I tried to use a GLM/StatsBase function to extract data from the fitted model. However, this is impossible because this package does not return the original GLM model (fitted_lm is dropped):
MLJGLMInterface.jl/src/MLJGLMInterface.jl
Lines 367 to 383 in 924d0e9
This is fine for most use cases, but has one problem: Julia relies heavily on the fact that people can "attach" arbitrary functions to certain objects. In this case, for example, Julia returns an object when calling GLM.lm(@formula(y ~ x), data). Subsequently, people can call functions like GLM.confint to get the confidence interval. Currently, GLM.confint cannot be called because the fitted model is dropped by MLJ.
Can we switch MLJGLMInterface over to just report the fitted model as fitresult (specifically, fitresult = fitted_lm)? I suggest fully dropping the FitResult struct that is defined in this package. Clients can still obtain information such as the coefficients via the appropriate GLM functions, such as GLM.coef. We can add this and a few other functions to the docstring.
Because this would be a breaking release, I also suggest moving to version 1.0.0, so that future releases can specify whether it's a major, minor, or patch release (https://semver.org).
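To illustrate what keeping the fitted model makes possible, here is a minimal sketch using GLM directly, outside MLJ. The toy data and the variable names are assumptions for illustration only; this is not the code in MLJGLMInterface.

```julia
using DataFrames, GLM

# Toy data, made up for illustration.
data = DataFrame(x = [1.0, 2.0, 3.0, 4.0, 5.0],
                 y = [2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a linear model; the returned object is what this RFC calls fitted_lm.
fitted_lm = GLM.lm(@formula(y ~ x), data)

# Generic functions that work on the fitted model object:
coefs = GLM.coef(fitted_lm)       # coefficient estimates (intercept, slope)
ses   = GLM.stderror(fitted_lm)   # standard errors of the coefficients
ci    = GLM.confint(fitted_lm)    # confidence intervals, one row per coefficient

# Test statistics without parsing printed output: estimate / standard error.
stats = coefs ./ ses
```

If fitresult were the fitted model itself, the same calls could presumably be made on the fitresult obtained through MLJ.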
@ablaom Any thoughts?