[dask] add support for custom objective functions (fixes #3934) #4920
Conversation
One of the new tests …

Pushed 8b78419 to increase … (`LightGBM/tests/python_package_test/test_dask.py`, lines 279 to 280, at af5b40e).
@ffineis if you have time / interest I'd welcome your thoughts on this PR too 👀
@jameslamb I left some suggestions on the multi-class objective function to try to get about the same score as the built-in one. Was that your intention?
Co-authored-by: José Morales <[email protected]>
Thanks very much @jmoralez ! I think you caught some critical mistakes in my implementation. My goal isn't to perfectly replicate the built-in one, just to have something good enough to use in unit tests to build confidence that the Dask estimators are handling custom objective functions correctly.
Thanks for working on this! I have a few minor suggestions below:
Co-authored-by: Nikita Titov <[email protected]>
It has failed again in …
I think this multiclass objective I've added has numerical stability problems. I'll experiment a bit.
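(For context, the standard fix for this kind of softmax overflow is to subtract the row-wise maximum before exponentiating. A minimal sketch of that pattern, not the exact code from this PR:)

```python
import numpy as np

def _stable_softmax(raw_scores):
    # raw_scores: (n_samples, n_classes) raw model outputs.
    # Subtracting the row-wise max before exponentiating keeps
    # np.exp() from overflowing without changing the result.
    shifted = raw_scores - raw_scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)
```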
In #4925 I'm testing that objective function against the built-in one and it seems to be working ok. Maybe increasing the learning rate to 0.1 could mimic the results from …
Force-pushed from 903c0ad to 54df09e.
I noticed that Linux Azure Pipelines CI jobs started to time out randomly two or three days ago.
Thanks for letting me know! Looks like that is what happened here as well. By the way... @jmoralez @StrikerRUS I pushed changes to this PR last night and I think it's ready for another review. See the diff in 54df09e. I found the following issues that were causing the multiclass classification tests to fail: …
I ran the …
Thank you very much for digging deep into the multiclass objective problems! I left a few final minor comments below.
Co-authored-by: Nikita Titov <[email protected]>
LGTM!
Thanks! @jmoralez do you want a chance to re-review, since I made some nontrivial changes after your earlier ✅?
I guess the following warning from the timed-out CI job can help us find a root cause: …

UPD: created #4948 for this.
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Fixes #3934.
This PR proposes changes to add support for the use of custom objective functions with `lightgbm.dask`. Nothing functionally needed to change in the package to support this, but this PR makes the following changes: …
Where did the objective functions added in the tests come from?
Binary classification (logistic regression)
`LightGBM/tests/python_package_test/test_sklearn.py`, line 57 (at af5b40e)
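For reference, a sigmoid-based binary logloss objective for the scikit-learn interface generally looks like the following (a sketch of the pattern, not necessarily the exact code at that line; the function name is illustrative):

```python
import numpy as np

def _objective_logistic_regression(y_true, y_pred):
    # convert raw scores to probabilities with the sigmoid
    y_pred = 1.0 / (1.0 + np.exp(-y_pred))
    # gradient and hessian of binary logloss w.r.t. the raw scores
    grad = y_pred - y_true
    hess = y_pred * (1.0 - y_pred)
    return grad, hess
```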
Multiclass classification (multi-logloss)
Converted this code from the R demos to Python. I tested it manually against the R package to confirm that the two functions produce the same results.
`LightGBM/R-package/demo/multiclass_custom_objective.R`, line 45 (at af5b40e)
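The converted objective follows the usual softmax cross-entropy gradient/hessian pattern. A hedged sketch of that pattern for the scikit-learn interface, assuming `y_pred` arrives as an `(n_samples, n_classes)` array of raw scores (the exact reshaping depends on the LightGBM version, and the function name is illustrative):

```python
import numpy as np

def _objective_multiclass(y_true, y_pred):
    num_rows, num_class = y_pred.shape
    # numerically stable softmax over each row of raw scores
    prob = np.exp(y_pred - y_pred.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)
    # gradient of multi-logloss: p_k - 1{k == true class}
    grad = prob.copy()
    grad[np.arange(num_rows), y_true.astype(np.int32)] -= 1.0
    # diagonal hessian approximation, scaled by K/(K-1) as in
    # LightGBM's built-in multiclass objective
    factor = num_class / (num_class - 1)
    hess = factor * prob * (1.0 - prob)
    return grad, hess
```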
Ranking and Regression (regression_l2)
`LightGBM/tests/python_package_test/test_sklearn.py`, line 51 (at af5b40e)
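The least-squares objective is the simplest of the three; a sketch of the gradient/hessian pair (assumed to match the referenced test helper):

```python
import numpy as np

def _objective_least_squares(y_true, y_pred):
    # gradient and hessian of 0.5 * (y_pred - y_true)^2
    grad = y_pred - y_true
    hess = np.ones(len(y_true))
    return grad, hess
```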
I couldn't find an example of a custom objective function for the learning-to-rank task. This is something that we've received feature requests about in the past (e.g. #1896, #2239, #3381), but none of those discussions produced Python or R code that could be used as a custom objective function for learning-to-rank with LightGBM.
I believe it's fine, for the purposes of these tests, to use the L2 objective for the ranking task. I found that it produced good-enough results (measured by Spearman correlation of the predicted scores with the actual labels) that improved as I increased `num_iterations`, which I took as a good sign.
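(As an illustration of that check, with hypothetical scores and labels, the rank correlation could be computed like so:)

```python
import numpy as np
from scipy.stats import spearmanr

# hypothetical predicted scores and true relevance labels
preds = np.array([0.9, 0.2, 0.5, 0.7])
labels = np.array([3, 0, 1, 2])

# a correlation near 1.0 means the predicted ordering matches the labels
print(spearmanr(preds, labels).correlation)
```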
Why not just add a custom objective function test to `test_{task}`?

I considered adding something like the following to `test_classifier`, `test_regressor`, and `test_ranker` in the Dask tests:

```python
@pytest.mark.parametrize('objective', ["regression_l2", _objective_least_squares])
```
That would ensure that custom objective functions are tested for every combination of data type, boosting type, and distributed training strategy.
However, I decided against this because there are already so many Dask tests, and I didn't want to add too much incremental burden to this project's CI. I believe the tests added in this PR provide good enough confidence that the Dask estimators are passing through custom objective functions to the `lightgbm.sklearn` equivalents.
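To make the tested behavior concrete, here is a minimal, hedged sketch of what passing a custom objective to a Dask estimator looks like after this change (the data, cluster setup, and parameter values are illustrative):

```python
import dask.array as da
import numpy as np
from dask.distributed import Client, LocalCluster
from lightgbm import DaskLGBMRegressor

def _objective_least_squares(y_true, y_pred):
    # same callable signature the scikit-learn interface expects
    grad = y_pred - y_true
    hess = np.ones(len(y_true))
    return grad, hess

if __name__ == "__main__":
    with Client(LocalCluster(n_workers=2)) as client:
        X = da.random.random((1_000, 10), chunks=(100, 10))
        y = da.random.random((1_000,), chunks=(100,))
        # the callable is passed straight through to the
        # lightgbm.sklearn estimator used for training
        model = DaskLGBMRegressor(objective=_objective_least_squares, n_estimators=10)
        model.fit(X, y)
        preds = model.predict(X).compute()
```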