[dask] add support for custom objective functions (fixes #3934) #4920

Merged
merged 16 commits on Jan 17, 2022


jameslamb
Collaborator

@jameslamb jameslamb commented Dec 30, 2021

Fixes #3934.

This PR proposes changes to add support for the use of custom objective functions with lightgbm.dask. Nothing functionally needed to change in the package to support this, but this PR makes the following changes:

  • adds unit tests on the use of custom objective functions for the main types of tasks
  • adds a note about custom objective functions to the Dask API docs
  • adds a section in the "Distributed Learning Guide" showing an example of how to use a custom objective function in lightgbm.dask
  • updates the type hints in the Dask scikit-learn model objects (contributing to [python-package] type hints in python package #3756)

Where did the objective functions added in the tests come from?

Binary classification (logistic regression)

def logregobj(y_true, y_pred):
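
The body of logregobj is not shown here. As a minimal sketch of what a binary logistic objective of this shape could look like, assuming the scikit-learn-style convention in which a custom objective receives (y_true, y_pred) with raw scores and returns (grad, hess): the name binary_logistic_objective is illustrative and is not taken from the PR.

```python
import numpy as np


def binary_logistic_objective(y_true, y_pred):
    """Gradient and Hessian of log loss with respect to raw scores.

    Hypothetical helper: y_pred is assumed to hold raw (pre-sigmoid) scores.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    prob = 1.0 / (1.0 + np.exp(-y_pred))  # sigmoid maps raw scores to probabilities
    grad = prob - y_true                  # first derivative of log loss
    hess = prob * (1.0 - prob)            # second derivative of log loss
    return grad, hess


# Two samples with a raw score of 0 (probability 0.5), one per class.
grad, hess = binary_logistic_objective([0, 1], [0.0, 0.0])
```

At a raw score of 0 the gradient pushes the negative sample down and the positive sample up by the same amount, which is an easy sanity check for an objective like this.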

Multiclass classification (multi-logloss)

Converted this code from the R demos to Python. I tested it manually against the R package to confirm that the two functions produce the same results.

custom_multiclass_obj <- function(preds, dtrain) {
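
The R function above is truncated. As a rough Python sketch of a softmax (multi-logloss) objective, under the assumption that y_true holds integer class labels and y_pred holds raw scores with shape (n_samples, n_classes): the name multiclass_softmax_objective is hypothetical. Depending on the LightGBM version, predictions may arrive as a flattened 1-D array instead, in which case they would need reshaping first.

```python
import numpy as np


def multiclass_softmax_objective(y_true, y_pred):
    """Gradient and diagonal Hessian of multi-logloss w.r.t. raw scores.

    Assumes y_true has shape (n_samples,) with integer labels and
    y_pred has shape (n_samples, n_classes) with raw scores.
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=float)
    # Subtract the row-wise max before exponentiating for numeric stability.
    shifted = y_pred - y_pred.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    prob = exp_scores / exp_scores.sum(axis=1, keepdims=True)
    one_hot = np.eye(y_pred.shape[1])[y_true]
    grad = prob - one_hot        # softmax probability minus one-hot label
    hess = prob * (1.0 - prob)   # diagonal of the softmax Hessian
    return grad, hess


# Two samples, three classes, all raw scores equal (uniform probabilities).
grad, hess = multiclass_softmax_objective([0, 2], np.zeros((2, 3)))
```

Shifting by the row-wise maximum leaves the softmax unchanged but keeps np.exp from overflowing on large raw scores.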

Ranking and Regression (regression_l2)

def objective_ls(y_true, y_pred):
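
The body of objective_ls is not shown here. A minimal sketch of a least-squares (L2) objective with the same (y_true, y_pred) signature might look like the following; least_squares_objective is an illustrative name, and the loss is assumed to be 0.5 * (y_pred - y_true)^2 per sample.

```python
import numpy as np


def least_squares_objective(y_true, y_pred):
    """Gradient and Hessian of 0.5 * (y_pred - y_true)**2 per sample."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    grad = y_pred - y_true        # derivative of the squared-error loss
    hess = np.ones_like(y_true)   # second derivative is constant
    return grad, hess


grad, hess = least_squares_objective([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```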

I couldn't find an example of a custom objective function for the learning-to-rank task. This is something that we've received feature requests about in the past (e.g. #1896, #2239, #3381), but none of those discussions produced Python or R code that could be used as a custom objective function for learning-to-rank with LightGBM.

I believe it's fine, for the purposes of these tests, to use the L2 objective for the ranking task. I found that it produced good-enough results (measured by Spearman correlation of the predicted scores with the actual labels), and those results improved as I increased num_iterations, which I took as a good sign.
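
As an illustration of the metric mentioned above, Spearman correlation can be computed as the Pearson correlation of the two rank vectors. The sketch below uses only NumPy; the helper names and the sample labels and scores are hypothetical and are not taken from the PR's tests.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float).ravel()
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)               # highest rank in each tie group
    mean_rank = last_rank - (counts - 1) / 2.0  # midpoint of each tie group
    return mean_rank[inverse.ravel()]


def spearman_correlation(labels, scores):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(labels), average_ranks(scores))[0, 1])


labels = [0, 1, 1, 2, 3]             # hypothetical relevance labels (with a tie)
scores = [0.1, 0.4, 0.35, 0.8, 0.9]  # hypothetical predicted scores
rho = spearman_correlation(labels, scores)
```

Because relevance labels usually contain ties, the ranks are averaged within each tie group rather than assigned arbitrarily.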

Why not just add a custom objective function test to test_{task}?

I considered adding something like the following to test_classifier, test_regressor, and test_ranker in the Dask tests:

@pytest.mark.parametrize('objective', ["regression_l2", _objective_least_squares])

That would ensure that custom objective functions are tested for every combination of data type, boosting type, and distributed training strategy.

However, I decided against this because there are already so many Dask tests, and I didn't want to add too much incremental burden to this project's CI. I believe the tests added in this PR provide good enough confidence that the Dask estimators are passing through custom objective functions to the lightgbm.sklearn equivalents.

@jameslamb
Collaborator Author

One of the new tests, test_classifier_custom_objective[multiclass-classification-dataframe-with-categorical], failed across some (but not all) of the Python jobs.

AssertionError: found values in 'a' and 'b' which differ by more than the allowed amount

  • cuda 10.0 (link)
  • cuda 11.4.2 (link)
  • Linux_latest regular (link)
  • Linux bdist (link)
  • Linux sdist (link)
  • Linux regular (link)

Pushed 8b78419 to increase num_iterations and num_leaves, matching the values used in the existing test_classifier test.

"n_estimators": 50,
"num_leaves": 31

@jameslamb
Collaborator Author

@ffineis if you have time / interest I'd welcome your thoughts on this PR too 👀

@jmoralez
Collaborator

@jameslamb I left some suggestions on the multi-class objective function to try to get about the same score as the built-in one. Was that your intention?

@jameslamb
Collaborator Author

I left some suggestions on the multi-class objective function to try to get about the same score as the built-in one. Was that your intention?

Thanks very much @jmoralez ! I think you caught some critical mistakes in my implementation. My goal isn't to perfectly replicate the built-in one, just to have something good enough to use in unit tests to build confidence that the Dask estimators are handling custom objective functions correctly.

Collaborator

@StrikerRUS StrikerRUS left a comment


Thanks for working on this! I have a few minor suggestions below:

@StrikerRUS
Collaborator

StrikerRUS commented Jan 5, 2022

One of the new tests, test_classifier_custom_objective[multiclass-classification-dataframe-with-categorical], failed across some (but not all) of the Python jobs.

It has failed again in the Linux regular CI job:

            # probability estimates should be similar
>           assert_eq(p1_proba, p2_proba, atol=0.03)
...
>           assert allclose(a, b, **kwargs), msg
E           AssertionError: found values in 'a' and 'b' which differ by more than the allowed amount
...
=========================== short test summary info ============================
FAILED ../tests/python_package_test/test_dask.py::test_classifier_custom_objective[multiclass-classification-dataframe-with-categorical]
= 1 failed, 649 passed, 12 skipped, 2 xfailed, 398 warnings in 407.40s (0:06:47) =

https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=11911&view=logs&j=c28dceab-947a-5848-c21f-eef3695e5f11&t=fa158246-17e2-53d4-5936-86070edbaacf
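
To make the failing assertion concrete, here is a small sketch of an absolute-tolerance comparison, assuming the check behaves like numpy.allclose with atol=0.03. The probability arrays are made-up values, not output from the failing job.

```python
import numpy as np

# Hypothetical class-probability estimates from two models.
p1 = np.array([[0.20, 0.50, 0.30], [0.70, 0.20, 0.10]])
p2 = np.array([[0.22, 0.49, 0.29], [0.69, 0.20, 0.11]])  # within 0.03 of p1
p3 = np.array([[0.30, 0.40, 0.30], [0.70, 0.20, 0.10]])  # off by 0.10 in places

close_12 = bool(np.allclose(p1, p2, atol=0.03))
close_13 = bool(np.allclose(p1, p3, atol=0.03))

# The largest element-wise difference helps judge how far off a failure is.
max_diff_13 = float(np.abs(p1 - p3).max())
```

A single element outside the tolerance is enough to fail the comparison, which is consistent with a test that passes on some CI jobs and fails on others.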

@jameslamb
Collaborator Author

It has failed again in the Linux regular CI job

I think this multiclass objective I've added has numeric stability problems. I'll experiment a bit.

@jmoralez
Collaborator

jmoralez commented Jan 5, 2022

In #4925 I'm testing that objective function against the built-in one, and it seems to be working OK. Maybe increasing the learning rate to 0.1 could mimic the results from test_classifier and achieve that atol of 0.03.

@jameslamb jameslamb force-pushed the dask-custom-objective branch from 903c0ad to 54df09e Compare January 13, 2022 03:30
@StrikerRUS
Collaborator

I noticed that Linux Azure Pipelines CI jobs started to time out randomly two or three days ago.

@jameslamb
Collaborator Author

I noticed that Linux Azure Pipelines CI jobs started to time out randomly two or three days ago.

Thanks for letting me know! Looks like that is what happened here as well.

By the way... @jmoralez @StrikerRUS I pushed changes to this PR last night and I think it's ready for another review. See the diff in 54df09e.

I found the following issues that were causing the multiclass classification tests to fail:

  1. .predict_proba() doesn't produce a probability when using a custom objective; its output is identical to .predict(raw_score=True). That made the test code confusing, and I was accidentally comparing raw scores with code like assert_eq(p1_proba, p2_proba).
  2. I forgot to use different logic for converting probabilities to classes in binary classification vs. multi-class classification.
  3. Multiclass classification with categorical features was very sensitive to small differences in Dataset construction, probably because of the small number of samples used in these tests. I propose addressing this by removing more randomness: setting seed, deterministic=True, and force_col_wise=True.
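
The first two issues can be illustrated with a small sketch that converts raw scores to probabilities and then to class labels, using different logic for the binary and multiclass cases. It assumes the custom objective is a logistic or softmax loss, so that sigmoid or softmax is the right transformation; raw_scores_to_classes is a hypothetical helper, not code from the PR.

```python
import numpy as np


def raw_scores_to_classes(raw_scores):
    """Return (probabilities, predicted classes) from raw model scores.

    Assumes 1-D input means binary classification (one score per sample)
    and 2-D input means multiclass (one score per sample and class).
    """
    raw_scores = np.asarray(raw_scores, dtype=float)
    if raw_scores.ndim == 1:
        # Binary: sigmoid, then threshold the positive-class probability.
        prob = 1.0 / (1.0 + np.exp(-raw_scores))
        return prob, (prob > 0.5).astype(int)
    # Multiclass: row-wise softmax, then pick the most probable class.
    shifted = raw_scores - raw_scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    prob = exp_scores / exp_scores.sum(axis=1, keepdims=True)
    return prob, prob.argmax(axis=1)


binary_prob, binary_cls = raw_scores_to_classes([-2.0, 0.5, 3.0])
multi_prob, multi_cls = raw_scores_to_classes([[2.0, 1.0, 0.0], [0.0, 0.0, 5.0]])
```

Comparing the transformed probabilities, rather than the raw scores, is what makes a tolerance such as atol=0.03 meaningful.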

I ran the test_classifier_custom_objective test as of this branch 50 consecutive times, using my dockerized setup for testing lightgbm.dask, and all 50 runs succeeded.

@jameslamb jameslamb requested a review from StrikerRUS January 13, 2022 21:06
Collaborator

@StrikerRUS StrikerRUS left a comment


Thank you very much for digging deep into multiclass objective problems!
I left some last minor comments below.

Collaborator

@StrikerRUS StrikerRUS left a comment


LGTM!

@jameslamb
Collaborator Author

Thanks! @jmoralez do you want a chance to re-review, since I made some nontrivial changes after your earlier ✅?

@StrikerRUS
Collaborator

StrikerRUS commented Jan 14, 2022

I noticed that Linux Azure Pipelines CI jobs started to time out randomly two or three days ago.

Thanks for letting me know! Looks like that is what happened here as well.

I guess the following warning from the timed-out CI job may help us find the root cause:

2022-01-14T22:12:20.3133305Z tests/python_package_test/test_sklearn.py::test_sklearn_integration[LGBMRegressor()-check_regressors_train(readonly_memmap=True,X_dtype=float32)]
2022-01-14T22:12:20.3133965Z   /root/miniconda/envs/test-env/lib/python3.8/site-packages/threadpoolctl.py:546: RuntimeWarning: 
2022-01-14T22:12:20.3134409Z   Found Intel OpenMP ('libiomp') and LLVM OpenMP ('libomp') loaded at
2022-01-14T22:12:20.3134727Z   the same time. Both libraries are known to be incompatible and this
2022-01-14T22:12:20.3135026Z   can cause random crashes or deadlocks on Linux when loaded in the
2022-01-14T22:12:20.3135283Z   same Python program.
2022-01-14T22:12:20.3135545Z   Using threadpoolctl may cause crashes or deadlocks. For more
2022-01-14T22:12:20.3135827Z   information and possible workarounds, please see
2022-01-14T22:12:20.3136165Z       https://github.com/joblib/threadpoolctl/blob/master/multiple_openmp.md
2022-01-14T22:12:20.3136411Z   
2022-01-14T22:12:20.3136614Z     warnings.warn(msg, RuntimeWarning)

UPD: created #4948 for this.

@jameslamb jameslamb requested a review from jmoralez January 15, 2022 02:14
@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023

Successfully merging this pull request may close these issues.

[dask] Support custom objective functions
3 participants