Issue: using an already fitted BinaryEncoder instance inside a custom classifier raises a ValueError:
"ValueError: Must train encoder before it can be used to transform data."
Importantly, I do not want to refit the encoder again and again. It should be fitted once, at the beginning, on the whole dataset.
Traceback (most recent call last):
File "/opt/miniconda/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 41, in <module>
res = cross_val_predict(cls, data.loc[:, ["A"]], data.loc[:, "B"], cv=2)
File "/opt/miniconda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 755, in cross_val_predict
for train, test in cv.split(X, y, groups))
File "/opt/miniconda/lib/python3.7/site-packages/joblib/parallel.py", line 921, in __call__
if self.dispatch_one_batch(iterator):
File "/opt/miniconda/lib/python3.7/site-packages/joblib/parallel.py", line 759, in dispatch_one_batch
self._dispatch(tasks)
File "/opt/miniconda/lib/python3.7/site-packages/joblib/parallel.py", line 716, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/opt/miniconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 182, in apply_async
result = ImmediateResult(func)
File "/opt/miniconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 549, in __init__
self.results = batch()
File "/opt/miniconda/lib/python3.7/site-packages/joblib/parallel.py", line 225, in __call__
for func, args, kwargs in self.items]
File "/opt/miniconda/lib/python3.7/site-packages/joblib/parallel.py", line 225, in <listcomp>
for func, args, kwargs in self.items]
File "/opt/miniconda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 841, in _fit_and_predict
estimator.fit(X_train, y_train, **fit_params)
File "", line 20, in fit
X = self.encoder.transform(X)
File "/opt/miniconda/lib/python3.7/site-packages/category_encoders/binary.py", line 125, in transform
return self.base_n_encoder.transform(X)
File "/opt/miniconda/lib/python3.7/site-packages/category_encoders/basen.py", line 214, in transform
raise ValueError('Must train encoder before it can be used to transform data.')
ValueError: Must train encoder before it can be used to transform data.
Thank you for the report with the reproducible example.
The issue is related to cloning in scikit-learn: attributes with a leading underscore, such as _dim, are not carried over to the clone. Unfortunately, simply moving from the prefix to the suffix notation (e.g. _dim -> dim_) is not sufficient. It requires further investigation.
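To illustrate the cloning behaviour discussed here, a minimal sketch follows. It assumes scikit-learn's `OrdinalEncoder` as a stand-in for `BinaryEncoder` (so it runs without category_encoders), and `Wrapper` is a hypothetical class, not the reporter's classifier. `clone` builds a fresh, unfitted estimator from the constructor parameters, and an estimator passed in as a parameter is itself cloned, so its fitted state does not survive. The last lines show one possible workaround, which may not suit every setup: encode once up front and cross-validate on the already encoded features.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import OrdinalEncoder


class Wrapper(BaseEstimator):
    """Hypothetical estimator that stores a pre-fitted encoder as a parameter."""

    def __init__(self, encoder=None):
        self.encoder = encoder


# Toy data, made up for illustration only.
X = np.array([["a"], ["b"], ["a"], ["b"], ["a"], ["b"]])
y = np.array([0, 1, 0, 1, 0, 1])

# Fit the encoder once on the whole dataset.
encoder = OrdinalEncoder().fit(X)
wrapper = Wrapper(encoder=encoder)

# cross_val_predict clones the estimator for every fold; do the same by hand.
cloned = clone(wrapper)

# Fitted attributes such as categories_ exist on the original only.
original_is_fitted = hasattr(wrapper.encoder, "categories_")
clone_is_fitted = hasattr(cloned.encoder, "categories_")
print(original_is_fitted, clone_is_fitted)

# Possible workaround (an assumption, not the maintainers' fix):
# transform once, then cross-validate a plain model on the encoded data.
X_encoded = encoder.transform(X)
predictions = cross_val_predict(LogisticRegression(), X_encoded, y, cv=2)
print(predictions.shape)
```

The nested encoder inside the clone is a new, unfitted object, which is consistent with the "Must train encoder" error raised inside `fit`.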
Versions
sklearn: 0.22.1
category_encoders: 2.1.0