[dask] Result shape from DaskLGBMClassifier.predict(pred_contrib=True) for CSC matrices is inconsistent with LGBMClassifier #3881
Comments
Added this to #2302, where we store feature requests for this project. Anyone is welcome to contribute this feature. Leave a comment below if you'd like to pick it up and the issue can be re-opened.
One addition: the same is true for CSR matrices as well. See the following core Python API test to better understand what is expected: LightGBM/tests/python_package_test/test_engine.py, lines 1058 to 1100 in 217642c.
re-opening this to note that I am currently working on a fix for this, to try to unblock #4351
…rices match those from sklearn estimators (fixes #3881) (#4378)
* test_classifier working
* adding tests
* docs
* tests
* revert unnecessary changes in tests
* test output type
* linting
* linting
* use from_delayed() instead
* docstring pycodestyle is happy with
* isort
* put pytest skips back
* respect sparse return type
* fix doc
* remove unnecessary dask_array_concatenate()
* Apply suggestions from code review
Co-authored-by: Nikita Titov <[email protected]>
* Apply suggestions from code review
Co-authored-by: Nikita Titov <[email protected]>
* update predict_proba() docstring
* remove unnecessary np.array()
* Update python-package/lightgbm/dask.py
Co-authored-by: Nikita Titov <[email protected]>
* fix assertion
* fix test use of len()
* restore np.array() in tests
* use np.asarray() instead
* use toarray()
* remove empty functions in compat
Co-authored-by: Nikita Titov <[email protected]>
See the discussion in #3866 (comment) for full details.
`lightgbm.dask.DaskLGBMClassifier` tries to stay as close as possible to the API of `lightgbm.sklearn.LGBMClassifier`. This feature describes one known inconsistency.

In `lightgbm.sklearn.LGBMClassifier`, for multiclass classification tasks, if you call `.predict(X, pred_contrib=True)` and `X` is a `scipy.sparse.csc_matrix`, the result will be a list of CSC matrices, one per class.

In `lightgbm.dask.DaskLGBMClassifier`, for multiclass classification tasks, if you call `.predict(X, pred_contrib=True)` and `X` is a Dask Array whose partitions are each a `scipy.sparse.csc_matrix`, the result will be a Dask Array that, once `.compute()`'d, returns a `scipy.sparse.coo_matrix`.

To complete this feature, try to make Dask's behavior match the behavior from `lightgbm.sklearn.LGBMClassifier`, or document why that can't / shouldn't be done.