
rpart fails to predict on titanic task #97

Closed
QuayAu opened this issue Dec 20, 2018 · 8 comments

QuayAu commented Dec 20, 2018

task = mlr_tasks$get("titanic")
learner = mlr_learners$get("classif.rpart")
resampling = mlr_resamplings$get("cv")

resample(task, learner, resampling)

throws:

 Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object,  : 
  factor Cabin has new levels A14, A21, B101, B39, B69, B78, C104, C106, C116, C28, C49, C7, C82, C86, C95, D19, D9, F, T 
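
The same error is reproducible with rpart alone, independent of mlr3; here is a minimal sketch (the toy data is made up for illustration):

library(rpart)

train = data.frame(y = factor(c("a", "a", "b", "b")),
                   x = factor(c("p", "p", "q", "q")))
test  = data.frame(x = factor("r"))  # level "r" never seen during training

fit = rpart(y ~ x, data = train,
            control = rpart.control(minsplit = 2, cp = 0))
predict(fit, newdata = test)
# fails in model.frame.default() with the same "factor x has new levels" error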

mllg commented Dec 20, 2018

Titanic needs preprocessing; better not to use it for automatic tests.

mllg closed this as completed Dec 20, 2018

mllg commented Dec 20, 2018

You could create a preprocessed version and add it to the package, though.
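
A minimal sketch of what such a preprocessed version could look like, assuming the raw data from the CRAN titanic package; the dropped columns match the error above, but none of this is the eventual package code:

library(titanic)  # assumed data source (titanic_train)

titanic_clean = titanic_train
titanic_clean$Cabin  = NULL  # hundreds of near-unique levels, the cause of the error above
titanic_clean$Name   = NULL  # free text, not usable as a factor
titanic_clean$Ticket = NULL  # near-unique identifier
titanic_clean$Survived = factor(titanic_clean$Survived, labels = c("no", "yes"))
str(titanic_clean)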

berndbischl reopened this Dec 20, 2018

berndbischl commented Dec 20, 2018

@QuayAu really don't use titanic for what we discussed yesterday. It seems like a bad choice.

@mllg isn't the posted issue at least relevant?
What I mean is: mlr3 shouldn't fail on something like this?


mllg commented Dec 20, 2018

rpart breaks; there is nothing I can do about it. You need to merge levels or use stratification.
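
For illustration, merging rare levels could look like this (a base-R sketch, not an mlr3 feature; merge_rare_levels and min_count are made-up names):

# Collapse every level observed fewer than `min_count` times
# into a shared ".other" level.
merge_rare_levels = function(x, min_count = 10) {
  counts = table(x)
  rare = names(counts)[counts < min_count]
  levels(x)[levels(x) %in% rare] = ".other"
  x
}

cabin = factor(titanic::titanic_train$Cabin)  # assumes the CRAN titanic package
table(merge_rare_levels(cabin, min_count = 5))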


berndbischl commented Dec 20, 2018

> rpart breaks; there is nothing I can do about it. You need to merge levels or use stratification.

Bah, I hate this issue. Does anyone know how other toolboxes handle this?

Is it possible to have a "fallback"? That would only work if the fallback applies only to the observations where we break, which would imply testing them all one by one. That is totally infeasible, as it is far too expensive.


berndbischl commented Dec 20, 2018

Well, we DO know which levels a learner has seen during training. So we DO know on which observations it WILL break, even without calling it?
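
That check can indeed be done without any predict() call; a sketch of the vectorized version (rows_with_unseen_levels is a made-up helper, not mlr3 API, and it assumes test has the same columns as train):

# Flag every test row that carries a factor level absent from the
# training split, in one vectorized pass over the factor columns.
rows_with_unseen_levels = function(train, test) {
  bad = rep(FALSE, nrow(test))
  for (col in names(Filter(is.factor, train))) {
    seen = unique(as.character(train[[col]]))
    bad = bad | !(as.character(test[[col]]) %in% seen)
  }
  which(bad)
}

The rows returned here are exactly the candidates for a fallback prediction, so the expensive one-by-one testing from the previous comment would not be needed.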

berndbischl commented Dec 20, 2018

In any case, this does not seem like an mlr3 issue, and unless somebody posts otherwise, I don't see a simple solution here. I will try to think about this further in pipelines.


berndbischl commented Dec 20, 2018

I tried to outline a solution in this issue:

mlr-org/mlr3pipelines#71
