[dask] add support for eval sets and custom eval functions #4101
_train_part model.fit args to lines Co-authored-by: James Lamb <[email protected]>
_train_part model.fit args to lines, pt2 Co-authored-by: James Lamb <[email protected]>
_train_part model.fit args to lines pt3 Co-authored-by: James Lamb <[email protected]>
dask_model.fit args to lines Co-authored-by: James Lamb <[email protected]>
use is instead of id() Co-authored-by: James Lamb <[email protected]>
…cks. need to merge master - WiP
…support for eval_at for dask ranker
…ng to terminate too early
Ah, you just mean copying the contents of this note, right? Happy to duplicate. But could we just copy a link or say "see note for custom eval_metric functions in Sklearn API docs"?
For this PR I'm totally fine with just one line of a concatenation in Dask code like this one LightGBM/python-package/lightgbm/sklearn.py Line 722 in c7134fa
Ideally, I think we can templatize it like other docstrings with shape types later. Or is it OK to use wording …
Ah, OK, thanks, this makes sense. Addressed in 5d4ddc8 unless James thinks the custom eval note should be reformatted like …
We are in the process of migrating to f-strings: #4136.
Also, while rendering and checking my current suggestions, I noticed that there are no Returns sections for the fit() methods of the Dask estimators. Is that OK?
I think that's ok. If it causes confusion in the future we can make it more Dask-specific.
I guess they should have a return block similar to those in the equivalent scikit-learn estimators, but there are no return sections for e.g. …
Co-authored-by: Nikita Titov <[email protected]>
Created #4402.
@ffineis Thank you so much for all your hard work! Very important enhancement.
LGTM!
Thanks @StrikerRUS!! Appreciate the thorough vetting.
I did another review tonight, looks good to me! I noticed one thing but it's very small, so I'm going to approve / merge this and open a follow-up PR for it.
Thank you SO MUCH for your help with this very impactful contribution to the Dask interface.
Followup PR regarding #3952: implements `eval_set` functionality for `lightgbm.dask`, but without early stopping. This is implemented to work with all eval-* parameters:

- `eval_set`
- `eval_names`
- `eval_sample_weight`
- `eval_class_weight` (for `DaskLGBMClassifier` only)
- `eval_init_score`
- `eval_group` (for `DaskLGBMRanker` only)

When an individual `eval_set`, `eval_sample_weight`, `eval_init_score`, or `eval_group` is the same as `(data, label)`/`sample_weight`/`init_score`/`group`, just use the latter instead of having to `compute` the training set/weights/init_score/group multiple times.

This is all that's going on: making little mini eval sets out of delayed parts in a consistent manner.
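To make the "mini eval sets" idea concrete, here is a rough sketch in plain Python. It is not the code in this PR: `build_parts`, the `TRAIN_SET` sentinel, the round-robin placement of eval chunks, and the use of plain lists in place of Dask delayed objects are all assumptions made for illustration. It only shows the two ideas described above: eval chunks are attached to the training parts, and an eval collection that is the very same object as the training collection (checked with `is`) is replaced by a marker instead of being carried around a second time.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical sentinel meaning "reuse this part's own training data/label".
TRAIN_SET = "__train_set__"


def build_parts(
    X_parts: Sequence[Any],
    y_parts: Sequence[Any],
    eval_sets: Sequence[Tuple[Sequence[Any], Sequence[Any]]],
) -> List[Dict[str, Any]]:
    """Attach chunks of each eval set to the training parts.

    Chunk i of every eval set is attached to training part i % n_parts
    (an assumed placement rule, chosen only to keep the sketch short).
    """
    parts: List[Dict[str, Any]] = [
        {"data": x, "label": y, "eval_set": {}} for x, y in zip(X_parts, y_parts)
    ]
    n_parts = len(parts)

    for set_idx, (eval_X_parts, eval_y_parts) in enumerate(eval_sets):
        # Identity check: same object as the training collection, so no
        # need to materialize it again.
        reuse_X = eval_X_parts is X_parts
        reuse_y = eval_y_parts is y_parts

        for chunk_idx, (ex, ey) in enumerate(zip(eval_X_parts, eval_y_parts)):
            target = parts[chunk_idx % n_parts]["eval_set"]
            chunk = (TRAIN_SET if reuse_X else ex, TRAIN_SET if reuse_y else ey)
            target.setdefault(set_idx, []).append(chunk)

    return parts


# Two training parts; eval set 0 is the training data itself,
# eval set 1 is a separate collection with a single chunk.
X_parts = [[1, 2], [3, 4]]
y_parts = [[0, 1], [1, 0]]
eval_X = [[9, 9]]
eval_y = [[1, 1]]

parts = build_parts(X_parts, y_parts, [(X_parts, y_parts), (eval_X, eval_y)])
```

In this toy run the second part receives no chunk of eval set 1, which is the kind of uneven allocation discussed under "Other things to know".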
Other things to know:

… `eval_set` parts. This check is now performed prior to the `client.submit` call found in the `_train` function. Model training still completes in this scenario, but depending on which worker returns its `futures_classifier`, the `best_score_` and `evals_result_` attributes can be empty or contain data. Moreover, when a worker is missing `eval_set` entirely, this will fail once `early_stopping_rounds` becomes supported: local worker calls to `model.fit(..., eval_data=None, early_stopping_rounds=x)` will throw an exception.
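For illustration, a check of this kind could look roughly like the sketch below. It is not the check implemented in this PR: `workers_missing_eval_sets`, the `worker_parts` mapping (worker address to list of part dictionaries), and the `"eval_set"` key layout are hypothetical, and plain dictionaries stand in for whatever the real code passes to each worker. The function reports, per worker, which eval set indices have no chunk on that worker, so the caller can warn or raise before any work is submitted.

```python
from typing import Any, Dict, List, Set


def workers_missing_eval_sets(
    worker_parts: Dict[str, List[Dict[str, Any]]],
    n_eval_sets: int,
) -> Dict[str, List[int]]:
    """Return {worker: [eval set indices with no chunk on that worker]}."""
    expected: Set[int] = set(range(n_eval_sets))
    missing: Dict[str, List[int]] = {}

    for worker, worker_part_list in worker_parts.items():
        seen: Set[int] = set()
        for part in worker_part_list:
            # A part without an "eval_set" entry contributes nothing.
            seen.update(part.get("eval_set", {}).keys())

        absent = sorted(expected - seen)
        if absent:
            missing[worker] = absent

    return missing


# Hypothetical allocation of parts to three workers, with two eval sets.
worker_parts = {
    "tcp://w1": [{"eval_set": {0: ["chunk"], 1: ["chunk"]}}],
    "tcp://w2": [{"eval_set": {0: ["chunk"]}}],
    "tcp://w3": [{"data": "only training data"}],
}

report = workers_missing_eval_sets(worker_parts, n_eval_sets=2)
```

A caller might emit a warning when `report` is non-empty today, and switch to raising an error once early stopping is supported, since a worker with no eval data cannot evaluate a stopping criterion.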