Does MAPIE Regressor support categorical variables? #406
Comments
Hi @valeman, could you provide some code so that we can see the error?

Here @vincentblot28, the code runs fine with CatBoost as the underlying regressor but raises an error with LightGBM.

conformity_score = ResidualNormalisedScore(residual_estimator=sd_predictor, prefit=True)

---------------------------------------------------------------------------
File ~/miniconda3/envs/py39/lib/python3.9/site-packages/mapie/regression/regression.py:539, in MapieRegressor.fit(self, X, y, sample_weight)
File ~/miniconda3/envs/py39/lib/python3.9/site-packages/mapie/conformity_scores/conformity_scores.py:211, in ConformityScore.get_conformity_scores(self, X, y, y_pred)
File ~/miniconda3/envs/py39/lib/python3.9/site-packages/mapie/conformity_scores/residual_conformity_scores.py:403, in ResidualNormalisedScore.get_signed_conformity_scores(self, X, y, y_pred)
File ~/miniconda3/envs/py39/lib/python3.9/site-packages/mapie/conformity_scores/residual_conformity_scores.py:352, in ResidualNormalisedScore._predict_residual_estimator(self, X)
File ~/miniconda3/envs/py39/lib/python3.9/site-packages/lightgbm/sklearn.py:934, in LGBMModel.predict(self, X, raw_score, start_iteration, num_iteration, pred_leaf, pred_contrib, validate_features, **kwargs)
File ~/miniconda3/envs/py39/lib/python3.9/site-packages/sklearn/utils/validation.py:915, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
File ~/miniconda3/envs/py39/lib/python3.9/site-packages/sklearn/utils/_array_api.py:380, in _asarray_with_order(array, dtype, order, copy, xp)
ValueError: could not convert string to float: 'class 1'

Thanks, I think I would need more details (for instance, what is your sd_estimator?). Could you give a reproducible example so I can run it?
@vincentblot28 I think this problem comes from the fact that LightGBM can predict from a DataFrame with categorical columns, but throws an exception when the same data is passed as a NumPy array.

@valeman Below is a simple fix that works with LightGBM.

```python
def get_signed_conformity_scores(
    self,
    X: ArrayLike,
    y: ArrayLike,
    y_pred: ArrayLike
) -> NDArray:
    # .....
    (X_array, y_array, y_pred,
     self.residual_estimator_,
     random_state) = self._check_parameters(X, y, y_pred)

    full_indexes = np.argwhere(
        np.logical_not(np.isnan(y_pred))
    ).reshape((-1,))

    if not self.prefit:
        cal_indexes, res_indexes = train_test_split(
            full_indexes,
            test_size=self.split_size,
            random_state=random_state,
        )
        # TODO: find a cleaner workaround for this.
        # Rebuild a DataFrame with the original column dtypes
        # (assumes X is a pandas DataFrame).
        X_array = pd.DataFrame(X_array, columns=X.columns)
        X_array = X_array.astype(X.dtypes.to_dict())

        self.residual_estimator_ = self._fit_residual_estimator(
            clone(self.residual_estimator_),
            X_array.iloc[res_indexes],
            y_array[res_indexes],
            y_pred[res_indexes]
        )
        residuals_pred = np.maximum(
            np.exp(self._predict_residual_estimator(X_array.iloc[cal_indexes])),
            self.eps
        )
    else:
        # Same dtype restoration for the prefit case.
        X_array = pd.DataFrame(X_array, columns=X.columns)
        X_array = X_array.astype(X.dtypes.to_dict())
        cal_indexes = full_indexes
        residuals_pred = np.maximum(
            self._predict_residual_estimator(X_array.iloc[cal_indexes]),
            self.eps
        )
    # .....
```
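To illustrate why the dtype restoration in the snippet above matters, here is a minimal sketch using only pandas and NumPy. It does not call MAPIE or LightGBM; the column names and values are made up for illustration, and the float cast merely stands in for the conversion that scikit-learn's `check_array` attempts in the traceback.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one numeric column and one categorical column.
X = pd.DataFrame({
    "size": [1.0, 2.5, 3.0],
    "kind": pd.Categorical(["class 1", "class 2", "class 1"]),
})

# Converting to a NumPy array drops the per-column dtypes:
# the mixed columns collapse into a single object-dtype array.
X_array = np.asarray(X)

# A float conversion of that array fails on the category labels,
# which mirrors the ValueError reported in this issue.
try:
    X_array.astype(float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Rebuilding the DataFrame and re-applying the original dtypes
# restores the categorical column.
X_restored = pd.DataFrame(X_array, columns=X.columns)
X_restored = X_restored.astype(X.dtypes.to_dict())

print(X_array.dtype)
print(error_message)
print(X_restored.dtypes)
```

The round trip only works when the original `X` is a DataFrame, since it relies on `X.columns` and `X.dtypes`; a plain array input would need a separate code path.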
MAPIE Regressor works fine with CatBoost and categorical variables; however, with LightGBM it raises the following error:
ValueError: could not convert string to float: 'class 1'