Use EvoTrees instead of XGBoost in documentation #57
It's too bad that this setup is so much more verbose than just calling `XGBoostClassifier()`.
Well, we can just use `EvoTrees(; nrounds=100, eta=x)` (I don't remember the default in `XGBoostClassifier`) and would get the same setting, since `XGBoostClassifier` just uses `nrounds = 100` by default without any tuning of this hyperparameter. Based on the comments above, I thought it would be good to highlight how it can be set/estimated in a better way. Maybe we should add a comment, though, and show `EvoTrees(; nrounds=100)` as well.
Yeah, for a usage example in the docstring I slightly prefer the simpler approach. But I agree that it is good to document the more robust approach somewhere.
Since we only support `Deterministic` and `Probabilistic`, perhaps we should constrain the types for `rstar` and update the docstring so that it only takes `Union{MLJModelInterface.Probabilistic,MLJModelInterface.Deterministic}`.
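A minimal sketch of what that constrained signature could look like (the argument names and the method body here are hypothetical; only the type restriction is the point):

```julia
using MLJModelInterface

# Restrict the classifier argument to the two supported supervised
# model types; anything else fails with a MethodError at dispatch time.
const SupportedModel = Union{MLJModelInterface.Probabilistic,
                             MLJModelInterface.Deterministic}

function rstar(classifier::SupportedModel, samples, chain_indices; kwargs...)
    # ... existing implementation ...
end
```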
Sure, why not; it's probably more user-friendly to error out when calling `rstar`. I wonder, though, how useful the information would be in the docstring - do users actually know about `Probabilistic`/`Deterministic`, or even MLJModelInterface?
Perhaps not. And I'm not certain how common the other subtypes of `Supervised` are.
We could also use StatisticalTraits, e.g. `StatisticalTraits.prediction_type`: throw a descriptive error if `rstar` is called with a model for which it is not `:probabilistic` or `:deterministic` (see https://github.com/JuliaAI/MLJModelInterface.jl/blob/d9e9703947fc04b0a5e63680289e41d0ba0d65bd/src/model_traits.jl#L27-L28), and dispatch on it (using `Val`; it seems all these traits return `Symbol`s, but since they are based on the type of the models, the compiler should be smart enough to handle it).
And not add anything more to the docstring, since we would not restrict the type at all.
`prediction_type` shows up in the model search and if you print `info(model)`: https://alan-turing-institute.github.io/MLJ.jl/dev/model_search/. So it seems to be quite official?
I think this presumes that the results of `fit` and `predict` for such models would be of the same form as we need. But it's not clear to me from the docs that they are (e.g. https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#Outlier-detection-models). We would be relying on an undocumented trait implementation, where the decisions about which traits apply are not defined anywhere. In particular, for supervised outlier detection, it's not clear to me whether these models support multiple labels as we have. To be convinced, I'd need to see a test case of one of these detectors (see https://github.com/OutlierDetectionJL/OutlierDetection.jl) used to compute `rstar`.
There's even an example that filters on `prediction_type` on the same page.
It seems there are large portions of their traits interface that are undocumented. I wonder if, instead of using `prediction_type`, it would be more useful to use `input_scitype` to check that the model accepts tables of continuous values, `target_scitype` to check that the model supports multiclass labels, and then `predict_scitype` to determine whether the predictions are labels, probabilities, or something else (in which case we error). These have the benefit that the `scitype` interface is documented (mostly; `predict_scitype` is not: https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#Trait-declarations) and connects directly to what we need: we assume the model accepts certain inputs and labels and makes certain types of predictions.
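A hedged sketch of what those scitype checks might look like (`input_scitype` and `target_scitype` are real MLJ traits; the helper name and the exact scitype expressions are assumptions that would need testing against MLJ):

```julia
using MLJModelInterface: input_scitype, target_scitype
using ScientificTypesBase: Table, Continuous, Multiclass

# Check the model's declared scitypes against what rstar actually feeds it:
# a table of continuous features and a multiclass vector of chain labels.
function _check_scitypes(model)
    Table(Continuous) <: input_scitype(model) ||
        throw(ArgumentError("model does not accept tables of continuous values"))
    AbstractVector{<:Multiclass} <: target_scitype(model) ||
        throw(ArgumentError("model does not support multiclass targets"))
    return nothing
end
```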
I think that's a good idea, but unfortunately it seems that `Pipeline`s do not support the traits properly (maybe they can't, even though all the information should be available when working with instances?). And supporting `Unknown` as well seems less satisfying, since that is the fallback for models that don't implement the traits... I wonder if this could/should be fixed in MLJBase for `Pipeline`, and hence in principle could work for models that implement the traits? In any case, we could at least dispatch on the supported prediction types in `_rstar` instead of restricting it to specific model types.
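That last suggestion could look roughly like this, keeping the public `rstar` signature unrestricted and dispatching internally (the `_rstar` method bodies are placeholders):

```julia
using MLJModelInterface: prediction_type

# Dispatch internally on the model's prediction type rather than
# constraining the public signature to specific model types.
_rstar(model, samples, chain_indices) =
    _rstar(Val(prediction_type(model)), model, samples, chain_indices)

function _rstar(::Val{:probabilistic}, model, samples, chain_indices)
    # ... compute R* from predicted chain-membership probabilities ...
end

function _rstar(::Val{:deterministic}, model, samples, chain_indices)
    # ... compute R* from hard chain-membership predictions ...
end
```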