expose DMatrix weight in keyword arguments of xgboost()? #210

Closed
Moelf opened this issue Nov 27, 2024 · 3 comments

Comments

Moelf (Contributor) commented Nov 27, 2024

Currently, is there no way to train an XGBoost model with weights without manually constructing a DMatrix? For example:

julia> bst = xgboost((df[!, [:a, :b]], y); sample_weight=weight)
[ Info: XGBoost: starting training.
┌ Warning: [12:07:04] WARNING: /workspace/srcdir/xgboost/src/learner.cc:742:
│ Parameters: { "sample_weight" } are not used.
└ @ XGBoost ~/.julia/packages/XGBoost/nqMqQ/src/XGBoost.jl:34
[ Info: [1]	train-rmse:0.93162391395696198
[ Info: [2]	train-rmse:0.75281592504136685
[ Info: [3]	train-rmse:0.61133704948309542
[ Info: [4]	train-rmse:0.49637999119159382
[ Info: [5]	train-rmse:0.40479635736799408
[ Info: [6]	train-rmse:0.33210060930888158
[ Info: [7]	train-rmse:0.27471060867502523
[ Info: [8]	train-rmse:0.22963046432969758
[ Info: [9]	train-rmse:0.19354232801162505
[ Info: [10]	train-rmse:0.16331169715654856
[ Info: Training rounds complete.
Booster()

julia> bst = xgboost((df[!, [:a, :b]], y); weight=weight)
[ Info: XGBoost: starting training.
┌ Warning: [12:07:04] WARNING: /workspace/srcdir/xgboost/src/learner.cc:742:
│ Parameters: { "weight" } are not used.
└ @ XGBoost ~/.julia/packages/XGBoost/nqMqQ/src/XGBoost.jl:34
[ Info: [1]	train-rmse:0.93162391395696198
[ Info: [2]	train-rmse:0.75281592504136685
[ Info: [3]	train-rmse:0.61133704948309542
[ Info: [4]	train-rmse:0.49637999119159382
[ Info: [5]	train-rmse:0.40479635736799408
[ Info: [6]	train-rmse:0.33210060930888158
[ Info: [7]	train-rmse:0.27471060867502523
[ Info: [8]	train-rmse:0.22963046432969758
[ Info: [9]	train-rmse:0.19354232801162505
[ Info: [10]	train-rmse:0.16331169715654856
[ Info: Training rounds complete.
Booster()

In the Python wrapper, this is called sample_weight.
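(For context, the manual workaround is to build the DMatrix yourself and attach the weights there. A minimal sketch, assuming the DMatrix constructor forwards a `weight` keyword to the underlying info setter, as XGBoost.jl's convenience constructors do for `label`; the data here is made up to mirror the example above:)

```julia
using XGBoost, DataFrames

# toy data standing in for the df/y/weight from the example
df = DataFrame(a = randn(100), b = randn(100))
y = df.a .+ 2 .* df.b .+ 0.1 .* randn(100)
weight = rand(100)  # one non-negative weight per row

# construct the DMatrix explicitly so the per-row weights are attached;
# the `weight` keyword is an assumption about the constructor — if it is
# not supported, setting the "weight" info field on the DMatrix after
# construction is the fallback
dm = DMatrix((df[!, [:a, :b]], y); weight = weight)

# training on the pre-built DMatrix picks up the weights
bst = xgboost(dm; num_round = 10)
```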

Moelf (Contributor, Author) commented Nov 27, 2024

I can make a PR if people think this is a good idea.

ExpandingMan (Collaborator) commented:

I'm a little out of the loop here, as I haven't worked on this package in a while, so please correct me if I get this wrong, but I believe the reason it works this way is that we mirror the C API, in which weights are a parameter of the DMatrix. I don't really see why it's a problem to create one when it's needed; as I recall, DMatrix has plenty of its own convenience methods, so this shouldn't be hard.

I'm not necessarily opposed to adding an argument to xgboost, but if you want to go that route, it might make more sense to add a slot for the DMatrix arguments in general: dmatrix_args or something?
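(The suggested slot could look something like the sketch below; the `dmatrix_args` name and the splatting are illustrations of the idea, not existing API:)

```julia
using XGBoost

# hypothetical wrapper: forward a NamedTuple of DMatrix keyword
# arguments from the training entry point down to the DMatrix
# constructor, so callers never touch DMatrix directly
function xgboost_with_dmatrix_args(data; dmatrix_args = NamedTuple(), kw...)
    dm = DMatrix(data; dmatrix_args...)   # weight, label, etc. land here
    return xgboost(dm; kw...)             # remaining kwargs are booster params
end

# usage: weights reach the DMatrix without constructing it manually
# bst = xgboost_with_dmatrix_args((X, y);
#     dmatrix_args = (weight = w,), num_round = 10)
```

This keeps the xgboost() keyword namespace reserved for booster parameters, which matches the existing behavior of passing unknown kwargs through to the C learner.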

Moelf (Contributor, Author) commented Nov 27, 2024

Good point -- part of the motivation is that the MLJ interface package simply passes keyword arguments through to xgboost(), so anything not exposed that way has to be handled specially in the interface package: https://github.com/JuliaAI/MLJXGBoostInterface.jl/blob/402861a70fb532f8eddec77dc9d40c6c515d6668/src/MLJXGBoostInterface.jl#L150

I guess "mirror the C API" is a reasonable guideline to follow. Looking at MLJ (https://juliaai.github.io/MLJ.jl/stable/weights/), it has a concept of weights, so maybe the right way to go about this is to make a PR to MLJXGBoostInterface.

see: JuliaAI/MLJXGBoostInterface.jl#56

@Moelf Moelf closed this as completed Nov 27, 2024