
[python] Lightgbm.sklearn LGBMRegressor Subclassing Support #3758

Closed
civilinformer opened this issue Jan 13, 2021 · 4 comments
@civilinformer

I am trying to plug the LGBMRegressor into my ML pipeline and so need to subclass it.

That does not seem to work, well at least parameters are not set and I have not gotten any further than that.

Here is some code to reproduce the problem:

from lightgbm.sklearn import LGBMRegressor

class LRegressor(LGBMRegressor):
    '''
    A light wrapper over LGBMRegressor to deal with the usual problems.
    '''
    version = 0.1
    __version__ = version

    lg_params = [ 'lgbm_use_gpu' ]

    def __init__(self, lgbm_use_gpu=True, **params):
        self.lgbm_use_gpu = lgbm_use_gpu

        lgbm_params = {}
        for key, value in params.items():
            if key not in LRegressor.lg_params:
                lgbm_params[key] = value

        if self.lgbm_use_gpu:
            lgbm_params['device'] = 'gpu'
            lgbm_params['gpu_device_id'] = 0
            lgbm_params['gpu_platform_id'] = 0
            lgbm_params['gpu_use_db'] = True
            lgbm_params['max_bin'] = 256

        super().__init__(**lgbm_params)

    def set_params(self, **params):
        lgbm_params = super().get_params()
        new_params = {}
        wrapper_params = {}
        for key, value in params.items():
            if key in LRegressor.lg_params:
                wrapper_params[key] = value
            elif key in lgbm_params:
                new_params[key] = value
            else:
                print(f"Unknown parameter {key} attempting to set LRegressor with value: {value}")

        for key, value in wrapper_params.items():
            setattr(self, key, value)

        super().set_params(**new_params)

    def get_params(self, deep=False):

        params = super().get_params()
        for key in LRegressor.lg_params:
            params[key] = getattr(self, key)

        return params

Now when trying to use this:

In [1]: from LRegressor import LRegressor as LG

In [2]: lg = LG(silent=False, gpu_device_id=3)                                                                                                                                      
Unknown parameter gpu_device_id attempting to set LRegressor with value: 0
Unknown parameter device attempting to set LRegressor with value: gpu
Unknown parameter gpu_platform_id attempting to set LRegressor with value: 0
Unknown parameter gpu_use_db attempting to set LRegressor with value: True
Unknown parameter max_bin attempting to set LRegressor with value: 256

In [3]: lg.get_params()                                                                                                                                                             
Out[3]: {'lgbm_use_gpu': True}

There is no effect. For some reason subclassing does not work?!

On the other hand getting params works for LGBMRegressor directly, as it should:

In [4]: from lightgbm.sklearn import LGBMRegressor as LG                                                                                                                           

In [5]: lg = LG()                                                                                                                                                                  

In [6]: lg.get_params()                                                                                                                                                            
Out[6]: 
{'boosting_type': 'gbdt',
 'class_weight': None,
 'colsample_bytree': 1.0,
 'importance_type': 'split',
 'learning_rate': 0.1,
 'max_depth': -1,
 'min_child_samples': 20,
 'min_child_weight': 0.001,
 'min_split_gain': 0.0,
 'n_estimators': 100,
 'n_jobs': -1,
 'num_leaves': 31,
 'objective': None,
 'random_state': None,
 'reg_alpha': 0.0,
 'reg_lambda': 0.0,
 'silent': True,
 'subsample': 1.0,
 'subsample_for_bin': 200000,
 'subsample_freq': 0}

Any insight as to why subclassing is not working? Shouldn't it work?

@jhn-nt

jhn-nt commented Sep 13, 2021

Having the same issue with LGBMClassifier, subclassing using the sklearn interface seems quite burdensome.

@jameslamb jameslamb changed the title Lightgbm.sklearn LGBMRegressor Subclassing Support [python] Lightgbm.sklearn LGBMRegressor Subclassing Support Oct 2, 2021
@jameslamb
Collaborator

This issue was originally opened in January 2021, prior to the lightgbm 3.2.0 release (March 2021).

There have been many changes in lightgbm 3.2.0, 3.2.1, and in the upcoming 3.3.0 release (#4633). I think #3192, which changed the method resolution order to comply with scikit-learn's recommendations, is especially relevant here.


For some reason subclassing does not work?!

I don't think it's accurate to say "subclassing does not work". For example, consider the sample code below.

On latest master (a77260f), I'm able to sub-class lightgbm.sklearn.LGBMRegressor without issue.

git clone --recursive https://github.com/microsoft/LightGBM.git
cd LightGBM/python-package
python setup.py install
from lightgbm import LGBMRegressor
from copy import deepcopy

class CustomRegressor(LGBMRegressor):
    """
    Like ``lightgbm.sklearn.LGBMRegressor``, but always
    sets ``learning_rate`` to 0.123 regardless of what you pass to the constructor,
    just to show it can be done.
    """
    def set_params(self, **params):
        new_params = deepcopy(super().get_params())
        new_params['learning_rate'] = 0.123
        super().set_params(**new_params)

# instantiate a model
reg = CustomRegressor(learning_rate = 0.3)

# notice: the learning_rate value passed to the constructor was ignored and replaced with 0.123
reg.get_params()["learning_rate"]
# 0.123

# confirm that you can train model with this sub-class
from sklearn.datasets import make_regression
X, y = make_regression()
reg.fit(X, y)
# CustomRegressor(learning_rate=0.123)

I think it would be more accurate to say that "it is not obvious how to create a sub-class of one of lightgbm's scikit-learn estimators which overrides parameters in the constructor".

This is definitely challenging! I was confused by this too until I got some help from @StrikerRUS in #3883 (for example, #3883 (review)).

An approach like the one in the original post will not work because it is incompatible with scikit-learn's expectations for how estimator classes are written, and violating those expectations can lead to unexpected and confusing behavior.
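The mechanism behind this is worth seeing concretely: scikit-learn discovers an estimator's parameters by inspecting the signature of `__init__`, and anything absorbed into a `**params` catch-all is invisible to `get_params()`. The sketch below (hypothetical `Wrapper` class and `visible_params` helper, using only the stdlib `inspect` module) mimics that discovery step and reproduces the `Out[3]: {'lgbm_use_gpu': True}` result from the original post.

```python
import inspect

def visible_params(cls):
    """Mimic scikit-learn's parameter discovery: read the names of
    ``__init__``'s explicit arguments, skipping ``self`` and ``**kwargs``."""
    sig = inspect.signature(cls.__init__)
    return [
        name for name, p in sig.parameters.items()
        if name != "self" and p.kind != inspect.Parameter.VAR_KEYWORD
    ]

class Wrapper:
    # Same shape as the original post's __init__: one explicit argument,
    # everything else swallowed by **params.
    def __init__(self, lgbm_use_gpu=True, **params):
        self.lgbm_use_gpu = lgbm_use_gpu

print(visible_params(Wrapper))  # ['lgbm_use_gpu'] — everything else is hidden
```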

From https://scikit-learn.org/stable/developers/develop.html#instantiation

[in an __init__] There should be no logic, not even input validation, and the parameters should not be changed.

And in https://scikit-learn.org/stable/developers/develop.html#parameters-and-init

As model_selection.GridSearchCV uses set_params to apply parameter setting to estimators, it is essential that calling set_params has the same effect as setting parameters using the __init__ method. The easiest and recommended way to accomplish this is to not do any parameter validation in __init__.

This is why lightgbm's scikit-learn estimators do not call super().__init__(), and store anything passed through **kwargs that does not match an explicit keyword argument in a private attribute self._other_params.
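To illustrate the pattern those scikit-learn guidelines imply, here is a minimal, self-contained sketch (hypothetical `ConformantEstimator` class, not part of lightgbm or scikit-learn; its `get_params` uses stdlib `inspect` to mimic how `BaseEstimator.get_params()` discovers parameters). Note that `__init__` stores each argument verbatim, with no validation and no derived values:

```python
import inspect

class ConformantEstimator:
    """Sketch of the estimator pattern scikit-learn expects: each __init__
    argument is stored on an attribute of the same name, so parameters can
    be recovered later purely by introspection."""

    def __init__(self, use_gpu=True, max_bin=255):
        # no logic, no validation, no renaming — just store the arguments
        self.use_gpu = use_gpu
        self.max_bin = max_bin

    def get_params(self, deep=True):
        # read __init__'s signature and look up the matching attributes,
        # the same mechanism sklearn's BaseEstimator.get_params() uses
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}

est = ConformantEstimator(use_gpu=False)
print(est.get_params())  # {'use_gpu': False, 'max_bin': 255}
```

Because the attribute names match the constructor arguments exactly, `set_params` can simply `setattr` each value and `__init__` and `set_params` stay interchangeable, which is what GridSearchCV relies on.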

And it's why lightgbm's Dask estimators (which sub-class lightgbm.sklearn.LGBMRegressor, lightgbm.sklearn.LGBMClassifier, and lightgbm.sklearn.LGBMRanker) use explicit keyword arguments when calling super().__init__().

super().__init__(
    boosting_type=boosting_type,
    num_leaves=num_leaves,
    max_depth=max_depth,
    learning_rate=learning_rate,
    n_estimators=n_estimators,
    subsample_for_bin=subsample_for_bin,
    objective=objective,
    class_weight=class_weight,
    min_split_gain=min_split_gain,
    min_child_weight=min_child_weight,
    min_child_samples=min_child_samples,
    subsample=subsample,
    subsample_freq=subsample_freq,
    colsample_bytree=colsample_bytree,
    reg_alpha=reg_alpha,
    reg_lambda=reg_lambda,
    random_state=random_state,
    n_jobs=n_jobs,
    silent=silent,
    importance_type=importance_type,
    **kwargs
)

If you want to achieve this behavior of "set some parameters based on the value of others", you might have an easier time and run into fewer surprises by overriding set_params() in a sub-class.

I think that the sample code below, for example, accomplishes the same thing as the intent of the post at the top of this issue ("set other GPU parameters to specific values based on whether or not I'm using the GPU").

from copy import deepcopy

from lightgbm import LGBMRegressor

class LRegressor(LGBMRegressor):

    def set_params(self, **params):
        new_params = deepcopy(params)
        if new_params.get("device", None) == "gpu":
            print("using GPU")
            self.gpu_device_id = 0
            new_params['gpu_device_id'] = 0
            new_params['gpu_platform_id'] = 0
            new_params['gpu_use_db'] = True
            new_params['max_bin'] = 256
        else:
            print("not using GPU")
        super().set_params(**new_params)

mod = LRegressor(device="gpu")

# notice that all those params like `gpu_device_id`, `gpu_use_db` are set
mod.get_params()

@no-response

no-response bot commented Nov 1, 2021

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023