Allow handling of categorical predictor variables #44
Comments
Thanks for chiming in here @tiemvanderdeure, and for the offer of help. Yes, this is not a bug but a feature limitation. I expect (but haven't checked) that GLM is just one-hot encoding here, so as a workaround you could use MLJ's […].

The ordinary way of extending functionality in this case starts by expanding the […]: `input = Table(Continuous, Finite)`. This is a contract that the user can supply any table for input […]. Next, it will be up to the implementation to ensure it passes off the categorical columns in the form that GLM expects. I don't know what this is - […]. It might also be a good idea to store the class pools in the […]. At present […]
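To make the proposed contract concrete, here is a minimal sketch of checking a table against `Table(Continuous, Finite)` using ScientificTypes.jl. The table `X` and its column names are invented for illustration, and this is not code from the package under discussion.

```julia
using ScientificTypes   # scitype, Table, Continuous, Finite
using CategoricalArrays # categorical

# Hypothetical input: one continuous column and one categorical column.
# A NamedTuple of vectors is treated as a table.
X = (x = [1.0, 2.0, 3.0], g = categorical(["a", "b", "a"]))

# The expanded declaration admits tables mixing Continuous and Finite columns...
accepts_mixed = scitype(X) <: Table(Continuous, Finite)

# ...whereas a Continuous-only declaration rejects the same table.
continuous_only = scitype(X) <: Table(Continuous)
```

Under these assumptions `accepts_mixed` should be `true` and `continuous_only` should be `false`, which is the difference the widened declaration is meant to make.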
GLM is doing one-hot encoding, yes. (Or I think more specifically, StatsModels is.) I think it would just be easiest to pass them on as a […]. I think we could get away with just passing something like […]. And yes, I'm all for some basic checks on data types provided. If we just reconstruct the model matrix using […]
I did not know that GLM handles tabular input, and that […]. I expect we should also mirror the new handling at the predict stage.
Categorical variables are supported as of #45.
When `fit!`ing a model, any categorical predictor variables are converted to floating-point values before the data is passed to `GLM.lm`, so any information about levels is lost and the predictor is treated as if it were continuous.

`fit_data_scitype` does say categorical values aren't allowed, which leads me to think that there might be some particular reason that categorical predictors are handled this way? If there isn't, I'll go ahead and write a PR later this week.

`GLM.lm` supports categorical predictor values, so I can't immediately see why this should be a problem.

[…] gives […]
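As an illustration of the point that `GLM.lm` accepts categorical predictors directly, here is a minimal sketch. It is not the example from the original post; the data and the column names `y`, `x`, and `g` are made up, and the table is assumed to be a plain NamedTuple of vectors.

```julia
using GLM               # lm, @formula, coef
using CategoricalArrays # categorical

# Made-up data: a NamedTuple of vectors works as a Tables.jl column table.
data = (
    y = [1.1, 2.3, 2.9, 4.2, 5.1, 5.8],
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    g = categorical(["a", "b", "c", "a", "b", "c"]),
)

# The formula machinery (StatsModels) expands the categorical column g
# into indicator columns instead of treating it as a number.
model = lm(@formula(y ~ x + g), data)

# One intercept, one slope for x, and one indicator per non-reference level of g.
n_coefs = length(coef(model))
```

With three levels in `g`, `n_coefs` should be 4. If `g` were first converted to floating-point codes, the model would instead fit a single slope for it, which is the loss of level information described above.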