
Use EvoTrees instead of XGBoost in documentation #57

Merged · 24 commits · Jan 17, 2023

Conversation

devmotion (Member)

IMO XGBoost has never been a joy to work with in our tests (see the comment in #56). This PR replaces it with EvoTrees, a native Julia implementation of gradient boosting (supposedly faster than XGBoost).

Fixes #56.

@devmotion (Member Author)

Seems similarly problematic 😅

@sethaxen (Member)

sethaxen commented Jan 4, 2023

Most of the errors seem to be due to EvoTrees not supporting all Tables.jl tables. It seems like an easy fix; I've opened an issue: Evovest/EvoTrees.jl#205

@sethaxen (Member)

sethaxen commented Jan 4, 2023

I've closed Evovest/EvoTrees.jl#205 since I think this is an issue in our implementation (see Evovest/EvoTrees.jl#205 (comment)).

Normally, to use MLJBase a user would construct a machine using MLJBase.machine, which ties together the data and the model. Internally, this method calls MLJModelInterface.reformat on the data to format it as expected by fit and predict. However, since we use only MLJModelInterface, we can't use machine. Though it's not explicitly documented, it seems MLJ expects that reformat is called before calling fit, so we should do that in our implementation. I'll open a PR.
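
A minimal sketch of the pattern described above; `classifier`, `xtable`, and `ycategorical` are placeholders for the classifier and training data used in `rstar`:

```julia
import MLJModelInterface

# Without MLJBase's `machine`, we call `reformat` ourselves so the data has
# the layout this particular model expects ...
data = MLJModelInterface.reformat(classifier, xtable, ycategorical)

# ... and pass the reformatted data on to `fit` (verbosity level 0).
fitresult, cache, report = MLJModelInterface.fit(classifier, 0, data...)
```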

@sethaxen (Member)

sethaxen commented Jan 4, 2023

The other error (https://github.com/TuringLang/MCMCDiagnosticTools.jl/actions/runs/3833986140/jobs/6525937107#step:6:773) is caused because we currently use an eltype equality check on the predictions to detect whether the model is deterministic or probabilistic, and EvoTrees changes the eltype. The MLJModelInterface API requires that all probabilistic models be subtypes of Probabilistic and all deterministic models subtypes of Deterministic, so we should just constrain our methods to take a union of those two model types and then dispatch on the model to choose the algorithm.
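
A hedged sketch of that dispatch; the `_rstar` bodies below are simplified stand-ins (the real algorithms compute the R* statistic), but the model-type dispatch is the point:

```julia
import MLJModelInterface
using Distributions: pdf
using Statistics: mean

# Only these two model types are supported; the `rstar` entry point would be
# restricted to this union.
const SupportedModel = Union{
    MLJModelInterface.Deterministic,MLJModelInterface.Probabilistic
}

# Deterministic classifiers predict labels, so compare them directly.
_rstar(::MLJModelInterface.Deterministic, predictions, ytest) =
    mean(predictions .== ytest)

# Probabilistic classifiers predict distributions over labels; no eltype
# check is needed because the model type already tells us which case applies.
_rstar(::MLJModelInterface.Probabilistic, predictions, ytest) =
    mean(pdf.(predictions, ytest))
```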

src/rstar.jl
Comment on lines 52 to 54
xtest, ytest = MLJModelInterface.reformat(
classifier, MLJModelInterface.selectrows(xtable, test_ids), ycategorical[train_ids]
)
Member

I don't think we can assume that reformat will produce any particular output type for ytest, so perhaps better to directly index it:

Suggested change
xtest, ytest = MLJModelInterface.reformat(
classifier, MLJModelInterface.selectrows(xtable, test_ids), ycategorical[train_ids]
)
xtest, = MLJModelInterface.selectrows(classifier, test_ids, xdata)
ytest = ycategorical[test_ids]

Member Author

That would be inconsistent with the suggestion above though? In line with the implementation above we could use

Suggested change
xtest, ytest = MLJModelInterface.reformat(
classifier, MLJModelInterface.selectrows(xtable, test_ids), ycategorical[train_ids]
)
xtest, ytest = MLJModelInterface.selectrows(classifier, test_ids, xdata, ydata)

Member

The problem is that xdata and ydata are now some type that we don't know, because the reformat output could be anything. So here xtest and ytest are correctly a subset of the rows, but we don't know that ytest is a vector anymore. e.g. it could be a NamedTuple with the vector stored in some field.

Member Author

I assumed the output of predict would be of the same type as the reformatted labels. And in that case _rstar would fail anyway. But maybe that assumption is not correct.

Member

The docs are somewhat ambiguous on this. They say this:

In the case of Deterministic models, yhat should have the same scitype as the y passed to fit (see above). If y is a CategoricalVector (classification) then elements of the prediction yhat must have a pool == to the pool of the target y presented in training, even if not all levels appear in the training data or prediction itself.

In the case of Probabilistic models with univariate targets, yhat must be an AbstractVector or table whose elements are distributions. In the common case of a vector (single target), this means one distribution per row of Xnew.

However, this docs page often makes statements about the inputs X and y that it clarifies are not applicable if one implements reformat. Since there is no unreformat method that converts the output of reformat to the vector type the user is expected to provide, I think we can surmise that predict is expected to return an AbstractVector type for the predictions of the same type that the user should provide to fit. So passing predictions and ycategorical[test_ids] to _rstar should be fine.

I'm not 100% sure about this, but I don't know what else might make sense. To be 100% sure about this, we would probably need to check what predict does for a Machine input.

Member

This is where predict is defined for a Machine: https://github.com/JuliaAI/MLJBase.jl/blob/c5d755e157c853850d5f2c4b9693ddf8d9bd469a/src/operations.jl#L130-L139. There's no processing of the output of predict, so it must conform to the type expected by a user (vector with the same scitype as the user-provided y).

One of the MLJ devs on Slack confirmed that outputs of predict are not formatted.
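
So, as a sketch (with placeholder variable names), the predictions can be compared against the unformatted categorical labels directly:

```julia
# `predict` returns user-facing predictions even when `reformat` is
# implemented, so no "unreformat" step is needed before `_rstar`.
predictions = MLJModelInterface.predict(classifier, fitresult, xtest)
result = _rstar(classifier, predictions, ycategorical[test_ids])
```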

@sethaxen sethaxen mentioned this pull request Jan 4, 2023
@devmotion (Member Author)

Some of the test failures are due to the non-reproducibility of the fits: EvoTreeClassifier requires an rng keyword argument if one wants to fix a seed/RNG.

@jeremiedb

@devmotion Model constructors such as EvoTreeClassifier already default to a specific seed (123). However, the rng field of the model resulting from these constructors is an RNG, by default TaskLocalRNG, whose state will change following a model fit. Therefore, to get the same results between two runs, you could either build the classifier with EvoTreeClassifier(...) before each run, or call Random.seed!(1234) prior to each run.
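
A minimal sketch of these options, together with the rng keyword mentioned above (hyperparameter values are illustrative):

```julia
using EvoTrees, Random

# Option 1: rebuild the classifier before each run to reset its RNG state.
classifier = EvoTreeClassifier(; nrounds=100, eta=0.1)

# Option 2: reseed the task-local RNG prior to each run.
Random.seed!(1234)

# Option 3: fix the seed (or pass an AbstractRNG) explicitly via `rng`.
classifier = EvoTreeClassifier(; rng=123)
```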

@devmotion (Member Author)

The RNG issues are fixed by the last commit. The remaining test failures are caused by:

  • Constructing an EvoTreeClassifier is too verbose and prints @info about default keyword arguments (breaks doctests)
  • EvoTreeClassifier does not produce the desired result in two tests (the mean of the rstar distribution should be 2 but the classifier returns ~1.54)

Maybe the first point is an upstream issue, and the second point could be fixed by choosing better hyperparameters (since the tests pass with SVC and passed with XGBoost, maybe it is also possible to use better defaults upstream?).

@jeremiedb

jeremiedb commented Jan 8, 2023

  • Constructing an EvoTreeClassifier is too verbose and prints @info about default keyword arguments (breaks doctests)

I'm not sure I understand how info printing could break tests. Such @info printing occurs in the current tests without issue and was incorporated on purpose. I'm not sure there's anything to be fixed on EvoTrees.jl's side here.

  • EvoTreeClassifier does not produce the desired result in two test (mean of the rstar distribution should be 2 but the classifier returns ~1.54)

This is indeed likely the result of different defaults. Notably, XGBoost has nrounds=100 and eta=0.3, while EvoTrees has nrounds=10 and eta=0.1. Nonetheless, I'd strongly advise fitting any GBT-based model with nrounds determined by early stopping on evaluation data. That being said, EvoTreeClassifier models are currently seeded with equal probability for each class, which I think could be improved by defaulting to the class probabilities observed in the training data. I'll make a quick update to change these defaults, so observed means would match even if the model per se is poorly trained.

@devmotion (Member Author)

devmotion commented Jan 8, 2023

Thank you for the explanations!

I'm not sure I understand how info printing could break tests

It breaks the current doctests: https://github.com/TuringLang/MCMCDiagnosticTools.jl/actions/runs/3864437268/jobs/6587210552#step:4:466. Personally, I'd argue that in general printing the keys of the unspecified keyword arguments is not very useful (it does not list their values) and also a bit surprising (in my experience it is not common behaviour for functions with keyword arguments). Maybe @info could be changed to @debug, or a verbosity keyword argument/setting added? Another (better?) approach could be to print nothing in the constructor and only list the model settings when fitting, depending on the verbosity setting supported by MLJ. We also allow users to specify verbosity in our estimator and forward it to MLJ, so users who prefer more verbose output could still obtain the list/values of the model options by setting verbosity.

@devmotion (Member Author)

Nonetheless, I'd strongly advise fitting any GBT-based model with nrounds determined by early stopping on evaluation data.

Users don't have access to the data the model is trained with, and we would like them to be able to specify arbitrary models. I'm not familiar with all the details of MLJ - do you know if it is possible to specify a model such that MLJ.fit(model, X, y, verbosity) automatically splits X and y (randomly) into a training and an evaluation data set and fits the model wrapped in model on the training part with early stopping on the evaluation part?

@jeremiedb

jeremiedb commented Jan 8, 2023

@devmotion I like the suggestion regarding the printing in the constructor; it looks like a reasonable change to implement. I've removed the printing of params info in the constructor and have instead delegated that reporting to the internal API fit_evotree.
Bias initialization for the classifier will also be set to reflect the actual overall class probabilities.
These will be integrated in EvoTrees v0.14.5, which is awaiting merging in the General registry.

Regarding the error in the doctest, for my personal education: looking at

julia> distribution = rstar(EvoTreeClassifier(), samples);

I remain unclear why printing may cause the doctest to fail. The same line in a regular test wouldn't fail, so I'm likely missing some obvious behavior of doctests which I'm not familiar with.

Finally, regarding having a proper setup to get a properly calibrated iterative model, I think the following provides a fully working example: https://github.com/JuliaAI/MLJIteration.jl#sample-usage. Note: use the eta parameter, as in EvoTreeClassifier(eta=0.1), rather than η, since the latter is from the old syntax.
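
A hedged sketch following that sample usage; the resampling, measure, and controls below are illustrative choices, not the exact settings used in this PR:

```julia
using MLJ, MLJIteration, EvoTrees

# Wrap the classifier so that the number of boosting rounds is determined by
# early stopping on a held-out evaluation set instead of being fixed up front.
model = IteratedModel(;
    model=EvoTreeClassifier(; eta=0.1),
    resampling=Holdout(; fraction_train=0.8),   # train/eval split
    measure=log_loss,                           # early-stopping criterion
    controls=[Step(5), Patience(2), NumberLimit(100)],
)
```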

@jeremiedb

It appeared the predictions could underflow when running on Float32 for perfectly separable data (~0%, ~100%). v0.14.6 fixes that and passes tests locally (though SVM failed because of Julia v1.8.4 issues on Windows).

@coveralls

coveralls commented Jan 9, 2023

Pull Request Test Coverage Report for Build 3920236472

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 21 of 21 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.007%) to 95.218%

Totals
  • Change from base Build 3906657106: +0.007%
  • Covered Lines: 677
  • Relevant Lines: 711


@devmotion (Member Author)

I also re-checked, and indeed all direct dependents of MCMCDiagnosticTools already require at least Julia 1.6 in their latest releases: https://juliahub.com/ui/Packages/MCMCDiagnosticTools/pEXcT/0.2.1?page=2

src/rstar.jl
@@ -161,7 +160,9 @@ function rstar(classif::MLJModelInterface.Supervised, x::AbstractArray{<:Any,3};
end

 # R⋆ for deterministic predictions (algorithm 1)
-function _rstar(predictions::AbstractVector{T}, ytest::AbstractVector{T}) where {T}
+function _rstar(
+    ::MLJModelInterface.Deterministic, predictions::AbstractVector, ytest::AbstractVector
Member

Since we only support Deterministic and Probabilistic, perhaps we should constrain the types for rstar and update the docstring to only take Union{MLJModelInterface.Probabilistic,MLJModelInterface.Deterministic}.

Member Author

Sure, why not, probably it's more user-friendly to error out when calling rstar. I wonder though how useful the information would be in the docstring - do users actually know about Probabilistic/Deterministic or even MLJModelInterface?

Member

Perhaps not. And I'm not certain how common the other subtypes of Supervised are.

Member Author

We could also use StatisticalTraits, e.g., StatisticalTraits.prediction_type: throw a descriptive error if rstar is called with a model for which it is not :probabilistic or :deterministic (see https://github.com/JuliaAI/MLJModelInterface.jl/blob/d9e9703947fc04b0a5e63680289e41d0ba0d65bd/src/model_traits.jl#L27-L28), and dispatch on it (using Val; it seems all these traits return Symbols, but since they are based on the type of the models, the compiler should be smart enough to handle it).
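
A hedged sketch of that approach; the helper names and error message are illustrative:

```julia
import MLJModelInterface

function rstar_algorithm(model)
    # `prediction_type` is a trait (a Symbol) derived from the model's type.
    pt = MLJModelInterface.prediction_type(model)
    if pt !== :probabilistic && pt !== :deterministic
        throw(ArgumentError(
            "only models with probabilistic or deterministic predictions are supported",
        ))
    end
    # Dispatch on `Val(pt)`; since the trait is a function of the model type,
    # the compiler should usually be able to resolve this statically.
    return _rstar_algorithm(Val(pt))
end

_rstar_algorithm(::Val{:deterministic}) = :algorithm_1
_rstar_algorithm(::Val{:probabilistic}) = :algorithm_2
```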

Member Author

And not add anything more to the docstring since we would not restrict the type at all.

Member Author

prediction_type shows up in the model search and if you print info(model): https://alan-turing-institute.github.io/MLJ.jl/dev/model_search/ So it seems to be quite official?

Member

This point would actually be the strongest argument in favour of traits: we would also support models that are not subtypes of Probabilistic or Deterministic (for whatever reason) and that we do not know about, but whose predictions would still be of the desired form (probabilistic or deterministic).

I think this presumes that the results of fit and predict for such models would be of the same form as we need. But it's not clear to me from the docs that they are (e.g. https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#Outlier-detection-models). We would be relying on an undocumented trait implementation, where the decisions about which traits apply are not defined anywhere. In particular, for supervised outlier detection, it's not clear to me whether these models support multiple labels as we have. To be convinced, I'd need to see a test case of one of these detectors (see https://github.com/OutlierDetectionJL/OutlierDetection.jl) used to compute rstar.

Member Author

So it seems to be quite official?

There's even an example that filters on prediction_type on the same page:

julia> filter(model) = model.is_supervised &&
                       model.input_scitype >: MLJ.Table(Continuous) &&
                       model.target_scitype >: AbstractVector{<:Multiclass{3}} &&
                       model.prediction_type == :deterministic

Member

It seems there are large portions of their traits interface that are undocumented.

I wonder if instead of using prediction_type it is more useful to use input_scitype to check that the models accept tables of continuous values, target_scitype to check that the model supports multiclass labels, and then predict_scitype to determine whether the predictions are labels, probabilities, or something else (error).

These have the benefit that the scitype interface is documented (mostly; not predict_scitype: https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#Trait-declarations) and connects directly to what we need (we assume the model accepts certain inputs and labels and makes certain types of predictions).
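
A hedged sketch of those checks; the helper name and error messages are illustrative, and the scitypes are assumed to come from ScientificTypesBase:

```julia
import MLJModelInterface
using ScientificTypesBase: Continuous, Density, Finite, Table

function check_classifier_traits(model)
    # The model must accept tables of continuous features ...
    MLJModelInterface.input_scitype(model) >: Table(Continuous) ||
        throw(ArgumentError("model must accept continuous tabular input"))
    # ... and (multiclass) finite targets.
    MLJModelInterface.target_scitype(model) >: AbstractVector{<:Finite} ||
        throw(ArgumentError("model must support finite (multiclass) targets"))
    # The prediction scitype determines whether predictions are labels,
    # probabilities, or something we cannot handle.
    ps = MLJModelInterface.predict_scitype(model)
    ps <: AbstractVector{<:Finite} && return :deterministic
    ps <: AbstractVector{<:Density{<:Finite}} && return :probabilistic
    throw(ArgumentError("unsupported prediction scitype: $ps"))
end
```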

Member Author

I think that's a good idea, but unfortunately it seems that Pipelines do not support the traits properly (maybe they can't - even though, when working with instances, all the information should be available?):

julia> using MLJBase, MLJXGBoostInterface, MLJModelInterface

julia> const MMI = MLJModelInterface

julia> classifier = XGBoostClassifier();

julia> MMI.input_scitype(classifier)
Table{<:AbstractVector{<:Continuous}}

julia> MMI.target_scitype(classifier)
AbstractVector{<:Finite} (alias for AbstractArray{<:Finite, 1})

julia> MMI.predict_scitype(classifier)
AbstractVector{Density{<:Finite}} (alias for AbstractArray{ScientificTypesBase.Density{<:Finite}, 1})

julia> classifier = Pipeline(XGBoostClassifier(); operation=predict_mode);

julia> MMI.input_scitype(classifier)
Unknown

julia> MMI.target_scitype(classifier)
AbstractVector{<:Finite} (alias for AbstractArray{<:Finite, 1})

julia> MMI.predict_scitype(classifier)
Unknown

And supporting Unknown as well seems less satisfying, since that is the fallback for models that don't implement the traits... I wonder if this could/should be fixed in MLJBase for Pipeline, so that it could in principle work for models that implement the traits?

In any case, we could at least dispatch on the supported prediction types in _rstar instead of restricting it to specific model types.

@devmotion (Member Author)

Unfortunately, the Windows issues are not fixed on Julia 1.8.5.

probabilistic classifier.

```jldoctest rstar
-julia> distribution = rstar(XGBoostClassifier(), samples);
+julia> model = IteratedModel(;
Member

It's too bad that this setup is so much more verbose than just calling XGBoostClassifier().

Member Author

Well, we could just use EvoTrees(; nrounds=100, eta=x) (I don't remember the default in XGBoostClassifier) and would get the same setting, since XGBoostClassifier just uses nrounds = 100 by default without any tuning of this hyperparameter. Based on the comments above, I thought it would be good though to highlight how it can be set/estimated in a better way. Maybe we should add a comment though and show EvoTrees(; nrounds=100) as well.

Member

Yeah for a usage example in the docstring I slightly prefer the simpler approach. But I agree that it is good to document the more robust approach somewhere.

@devmotion (Member Author)

I added more explanations to the examples in the docstring in 36763fb and a draft for how possibly we could use traits in 86baaaf.

@devmotion devmotion changed the title Use EvoTrees instead of XGBoost Use EvoTrees instead of XGBoost in documentation Jan 15, 2023
@sethaxen sethaxen left a comment (Member)

Nice! I had a few minor suggestions. Otherwise I think it's only missing tests where the trait errors are thrown.

@devmotion (Member Author)

I refactored the code and added more tests.

@devmotion devmotion requested a review from sethaxen January 17, 2023 01:54
@sethaxen sethaxen closed this Jan 17, 2023
@sethaxen sethaxen reopened this Jan 17, 2023
@sethaxen sethaxen left a comment (Member)

LGTM, thanks!

@devmotion devmotion merged commit 2e07d21 into main Jan 17, 2023
@delete-merged-branch delete-merged-branch bot deleted the dw/evotrees branch January 17, 2023 12:53
Successfully merging this pull request may close these issues: Rstar tests failing