expose DMatrix weight in keyword arguments of xgboost()? #210

Closed
Moelf opened this issue Nov 27, 2024 · 3 comments

Comments

Moelf (Contributor) commented Nov 27, 2024

Currently, is there no way to train an XGBoost model with weights without manually constructing a DMatrix? For example:

julia> bst = xgboost((df[!, [:a, :b]], y); sample_weight=weight)
[ Info: XGBoost: starting training.
┌ Warning: [12:07:04] WARNING: /workspace/srcdir/xgboost/src/learner.cc:742:
│ Parameters: { "sample_weight" } are not used.
└ @ XGBoost ~/.julia/packages/XGBoost/nqMqQ/src/XGBoost.jl:34
[ Info: [1]	train-rmse:0.93162391395696198
[ Info: [2]	train-rmse:0.75281592504136685
[ Info: [3]	train-rmse:0.61133704948309542
[ Info: [4]	train-rmse:0.49637999119159382
[ Info: [5]	train-rmse:0.40479635736799408
[ Info: [6]	train-rmse:0.33210060930888158
[ Info: [7]	train-rmse:0.27471060867502523
[ Info: [8]	train-rmse:0.22963046432969758
[ Info: [9]	train-rmse:0.19354232801162505
[ Info: [10]	train-rmse:0.16331169715654856
[ Info: Training rounds complete.
Booster()

julia> bst = xgboost((df[!, [:a, :b]], y); weight=weight)
[ Info: XGBoost: starting training.
┌ Warning: [12:07:04] WARNING: /workspace/srcdir/xgboost/src/learner.cc:742:
│ Parameters: { "weight" } are not used.
└ @ XGBoost ~/.julia/packages/XGBoost/nqMqQ/src/XGBoost.jl:34
[ Info: [1]	train-rmse:0.93162391395696198
[ Info: [2]	train-rmse:0.75281592504136685
[ Info: [3]	train-rmse:0.61133704948309542
[ Info: [4]	train-rmse:0.49637999119159382
[ Info: [5]	train-rmse:0.40479635736799408
[ Info: [6]	train-rmse:0.33210060930888158
[ Info: [7]	train-rmse:0.27471060867502523
[ Info: [8]	train-rmse:0.22963046432969758
[ Info: [9]	train-rmse:0.19354232801162505
[ Info: [10]	train-rmse:0.16331169715654856
[ Info: Training rounds complete.
Booster()

In the Python wrapper, this is called sample_weight.
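(For context, the manual workaround is to build the DMatrix yourself and attach the weights there. A minimal sketch, assuming the DMatrix constructor forwards a `weight` keyword to the underlying info setter, as XGBoost.jl's convenience constructors do for `label`; the data here is made up to mirror the example above:)

```julia
using XGBoost, DataFrames

# toy data standing in for the df/y/weight from the example
df = DataFrame(a = randn(100), b = randn(100))
y = df.a .+ 2 .* df.b .+ 0.1 .* randn(100)
weight = rand(100)  # one non-negative weight per row

# construct the DMatrix explicitly so the per-row weights are attached;
# the `weight` keyword is an assumption about the constructor — if it is
# not supported, setting the "weight" info field on the DMatrix after
# construction is the fallback
dm = DMatrix((df[!, [:a, :b]], y); weight = weight)

# training on the pre-built DMatrix picks up the weights
bst = xgboost(dm; num_round = 10)
```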

Moelf (Contributor, Author) commented Nov 27, 2024

I can make a PR if people think this is a good idea.

ExpandingMan (Collaborator) commented:

I'm a little out of the loop here, as I haven't worked on this package in a while, so please correct me if I get this wrong, but I believe the reason it works this way is that we mirror the C API, in which weights are a parameter of the DMatrix. I don't really see why it's a problem to create one when it's needed; as I recall, DMatrix has plenty of its own convenience methods, so this shouldn't be hard.

I'm not necessarily opposed to adding an argument to xgboost, but if you want to go that route, it might make more sense to add a slot for the DMatrix arguments in general: dmatrix_args or something?
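(The suggested slot could look something like the sketch below; the `dmatrix_args` name and the splatting are illustrations of the idea, not existing API:)

```julia
using XGBoost

# hypothetical wrapper: forward a NamedTuple of DMatrix keyword
# arguments from the training entry point down to the DMatrix
# constructor, so callers never touch DMatrix directly
function xgboost_with_dmatrix_args(data; dmatrix_args = NamedTuple(), kw...)
    dm = DMatrix(data; dmatrix_args...)   # weight, label, etc. land here
    return xgboost(dm; kw...)             # remaining kwargs are booster params
end

# usage: weights reach the DMatrix without constructing it manually
# bst = xgboost_with_dmatrix_args((X, y);
#     dmatrix_args = (weight = w,), num_round = 10)
```

This keeps the xgboost() keyword namespace reserved for booster parameters, which matches the existing behavior of passing unknown kwargs through to the C learner.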

Moelf (Contributor, Author) commented Nov 27, 2024

Good point -- part of the motivation is that the MLJ interface package simply passes keyword arguments through to xgboost(), so anything not exposed that way has to be handled specially in the interface package: https://github.com/JuliaAI/MLJXGBoostInterface.jl/blob/402861a70fb532f8eddec77dc9d40c6c515d6668/src/MLJXGBoostInterface.jl#L150

I guess "mirror the C API" is a reasonable guideline to follow. Looking at MLJ (https://juliaai.github.io/MLJ.jl/stable/weights/), it has a concept of weights, so maybe the right way to go about this is to make a PR to MLJXGBoostInterface.

see: JuliaAI/MLJXGBoostInterface.jl#56

@Moelf Moelf closed this as completed Nov 27, 2024