jameslamb changed the title from "training eval_set is does not default to "training" in Dask" to "training eval_set does not default to "training" in Dask" on Jun 22, 2021.
Description
When using `eval_set` in LightGBM, if one or more components of `eval_set` are just the training data (e.g. `eval_set[0][0]` is `X` and `eval_set[0][1]` is `y`), then that component's default eval name is `"training"` in the corresponding `eval_set` artifacts such as `best_score_`, `evals_result_`, etc. This happens by default when `eval_names` is `None`.

The implementation of `eval_set` for `DaskLGBMModel` estimators involves asking the `(X, y)` pair in each individual eval set within `eval_set`: "hey, are you just the training `X` and `y`?" If they are, Dask LightGBM does not copy the training set parts, so we skip having to `.compute()` them multiple times on the Dask cluster.

However, when a Dask LightGBM estimator is trained with an `eval_set` that contains the training `(X, y)`, LightGBM does not automatically name this set `"training"` in the default `eval_names`, even though `_train_part` detects it as the training data. Instead, it names the set `valid_<index>`, just like the other, non-training validation sets.

Note: as of #4101
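To make the expected behavior concrete, here is a minimal Python sketch of the default naming rule as described above, assuming it works the way we understand the non-Dask path to work. An eval set whose `X` and `y` are the very same objects as the training `X` and `y` (an identity check with `is`, not an equality check) is reported as `"training"`, and every other eval set gets `valid_<index>`, where the index is its position in `eval_set`. The helper `default_eval_names` is hypothetical and made up for illustration; it is not part of LightGBM's API, and it only approximates the logic in LightGBM itself.

```python
from typing import Any, List, Optional, Sequence, Tuple


def default_eval_names(
    X: Any,
    y: Any,
    eval_set: Sequence[Tuple[Any, Any]],
    eval_names: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper (not part of LightGBM) sketching the default naming rule."""
    names: List[str] = []
    for i, (X_eval, y_eval) in enumerate(eval_set):
        if eval_names is not None and len(eval_names) > i:
            # Explicitly passed names always take precedence.
            names.append(eval_names[i])
        elif X_eval is X and y_eval is y:
            # Identity check: this eval set *is* the training data,
            # so it is reported under the name "training".
            names.append("training")
        else:
            # Any other eval set is named after its position in eval_set.
            names.append(f"valid_{i}")
    return names


X_train = [[1.0], [2.0], [3.0]]
y_train = [0.0, 1.0, 0.0]
X_valid = [[4.0]]
y_valid = [1.0]

print(default_eval_names(X_train, y_train, [(X_train, y_train), (X_valid, y_valid)]))
# prints ['training', 'valid_1']
```

Under the behavior reported in this issue, the Dask estimators would instead report `valid_0` and `valid_1` for the same `eval_set`, even though the first eval set is the training data.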
Reproducible example
Environment info
LightGBM version or commit hash: 3.2.1.99
Command(s) you used to install LightGBM
Additional Comments
This may be an issue with distributed LightGBM training in general, not specifically with the Dask LightGBM code in `dask.py`.