-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retain data in tabular form #45
Conversation
After reading some old PRs I realised we need I just added the |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #45 +/- ##
==========================================
+ Coverage 96.79% 97.46% +0.67%
==========================================
Files 1 1
Lines 187 158 -29
==========================================
- Hits 181 154 -27
+ Misses 6 4 -2 ☔ View full report in Codecov by Sentry. |
Thanks @tiemvanderdeure for this substantial PR 🙏🏾 In testing locally, I was struggling to cook up data sets for which Cholesky factorization would not fail, so I tried a simple one-dimensional example (here adapted from the GLM docs). But it's not working either: using CategoricalArrays, StableRNGs, MLJModelInterface
using MLJGLMInterface # dev'ed from the source branch of this PR
rng = StableRNG(1); # Ensure example can be reproduced
y = rand(rng, 100)
X = (;x = categorical(repeat([1, 2, 3, 4], 25)))
model = LinearRegressor()
MLJModelInterface.fit(model, 0, X, y)
# ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed.
# Stacktrace:
# [1] checkpositivedefinite
# @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/LinearAlgebra/src/factorization.jl:67 [inlined]
# [2] #cholesky!#140
# @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/LinearAlgebra/src/cholesky.jl:269 [inlined]
# [3] cholesky! (repeats 2 times)
# @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/LinearAlgebra/src/cholesky.jl:267 [inlined] Am I missing something? |
Good catch! Clearly we need some more tests. I just looked at this and there are a few things going on here. Firstly Secondly, it appears the position of I'll implement the second straight away. Changing default settings might be a bit trickier, what do you think? |
Generally we mirror the defaults of the wrapped model. In this commit (some time back) I see that @OkonSamuel Do you remember a reason for making the new default |
Maybe we should make a new PR for the defaults, just to get things not too mixed up. Checks pass and as far as I am concerned this PR is good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Good to go.
Thanks @tiemvanderdeure for your valuable contribution and patience. 🙏🏾
Thanks @tiemvanderdeure! I don't know the details of this PR, but I see 160 lines removed and only 100 added so that looks very good to me! Thanks to you and Anthony for improving this package 😃 |
I rewrote some parts of this package in order to interface with GLM.jl more directly. Hopefully, this will make sure all the functionality GLM.jl has will also be available through MLJ.
This pull solves both #44 and #42.
I removed
_matrix_and_features
as this is where the variable types (CategoricalVector
) was lost.In order to predict with categorical values properly, we need the entire object returned by
GLM.glm
, as this contains information about classes. For now, I simply removed theFitResult
and return theTableRegressionModel
returned byGLM.glm
instead (we can revert this if it is problematic, and find some other solution). I also changedpredict
, so we callGLM.predict
, rather than constructing the model matrix in this package.GLM.predict
callsStatsModels.modelcols
, which does the dummy-encoding for categorical variables.