
[Core] Add interpretability capabilities through SHAP #150

Open
AzulGarza opened this issue Jun 8, 2023 · 5 comments

Comments

@AzulGarza
Member

Description

To enhance the interpretability of models trained using MLForecast, we propose leveraging SHAP (SHapley Additive exPlanations). SHAP is compatible with XGBoost, LightGBM, and scikit-learn models. Currently, if we want to use it, we need to create the dataset for which we desire forecast explanations (using preprocess) and iterate over each trained model using the following:

explainer = shap.Explainer(model)
shap_values = explainer(X)

The goal is to introduce a method, possibly named shap_values, to generate SHAP values for the forecasts from all trained models.
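For reference, a minimal end-to-end sketch of the current manual workflow. The freq, lags, and column names below are placeholders, and df is assumed to be a panel dataframe with unique_id, ds, and y columns; the fitted models live in the models_ attribute after calling fit:

import shap
import xgboost as xgb
from mlforecast import MLForecast

fcst = MLForecast(models=[xgb.XGBRegressor()], freq='D', lags=[1, 7])
fcst.fit(df)

# build the same feature matrix the models were trained on
prep = fcst.preprocess(df)
X = prep.drop(columns=['unique_id', 'ds', 'y'])

# one explainer per trained model
for name, model in fcst.models_.items():
    explainer = shap.Explainer(model)
    shap_values = explainer(X)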

Use case

No response

@elisevansartefact

Currently, if we want to use it, we need to create the dataset for which we desire forecast explanations (using preprocess) and iterate over each trained model

I am not sure if this feature will be released soon, but in the meantime I cannot seem to compute SHAP values using the above method.

model = MLForecast(models=[xgb.XGBRegressor()], freq=freq)
model.fit(df)
explainer = shap.Explainer(list(model.models.values())[0])

    169             algorithm = "permutation"
    171     # if we get here then we don't know how to handle what was given to us
    172     else:
--> 173         raise TypeError("The passed model is not callable and cannot be analyzed directly with the given masker! Model: " + str(model))
    175 # build the right subclass
    176 if algorithm == "exact":

TypeError: The passed model is not callable and cannot be analyzed directly with the given masker! Model: XGBRegressor(base_score=None, booster=None, callbacks=None,
             colsample_bylevel=None, colsample_bynode=None,
             colsample_bytree=None, early_stopping_rounds=None,
             enable_categorical=False, eval_metric=None, feature_types=None,
             gamma=None, gpu_id=None, grow_policy=None, importance_type=None,
             interaction_constraints=None, learning_rate=None, max_bin=None,
             max_cat_threshold=None, max_cat_to_onehot=None,
             max_delta_step=None, max_depth=None, max_leaves=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             n_estimators=100, n_jobs=None, num_parallel_tree=None,
             predictor=None, random_state=None, ...)

Perhaps it has something to do with the fact that the models I retrieve using model.models are not fitted, even after running model.fit(df):

list(model.models.values())[0].predict(df)
    646 if not self.__sklearn_is_fitted__():
    647     from sklearn.exceptions import NotFittedError
--> 649     raise NotFittedError("need to call fit or load_model beforehand")
    650 return self._Booster

NotFittedError: need to call fit or load_model beforehand

@jmoralez
Member

Hey @elisevansartefact. The fitted models are stored in the models_ attribute, so something like shap.Explainer(model.models_['XGBRegressor']) should work. The models argument also accepts a dict, in case you prefer a different name for the model.
Also, if you're looking to get explanations for the predictions, you'll need to store the features used at each time step; you may find this thread useful. Here's the relevant snippet:

from functools import partial

import pandas as pd

def extract_features(df, save_list):
    # save the features passed to the model at each prediction step
    save_list.append(df)
    return df

save_list = []
extract_features_callback = partial(extract_features, save_list=save_list)
fcst.predict(..., before_predict_callback=extract_features_callback)
features = pd.concat(save_list)
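Putting the two pieces together, a rough sketch of feeding the captured features to an explainer (the horizon value below is a placeholder, and depending on the mlforecast version the captured frames may carry the series id as the index):

import shap

# placeholder horizon; use whatever horizon you actually forecast
preds = fcst.predict(7, before_predict_callback=extract_features_callback)
features = pd.concat(save_list)

# explain the forecasts with the fitted model stored in models_
explainer = shap.Explainer(fcst.models_['XGBRegressor'])
shap_values = explainer(features)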

@jmakov

jmakov commented Sep 27, 2023

Would probably also want to use https://github.com/linkedin/fasttreeshap instead
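For context, fasttreeshap is intended as a faster drop-in for SHAP's TreeExplainer; a rough sketch of how it might slot into the snippet above (the algorithm and n_jobs arguments are assumptions taken from the project's README, so check the docs for your version):

import fasttreeshap

# drop-in replacement for shap.TreeExplainer (arguments are assumptions)
explainer = fasttreeshap.TreeExplainer(
    fcst.models_['XGBRegressor'], algorithm='v2', n_jobs=-1
)
shap_values = explainer.shap_values(features)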

@jmoralez
Member

Hey folks. We've added a guide that explains how to get the trained models and compute the SHAP values for training and inference. I think this gives full control over how to compute them (sample size, etc.). Please let us know if you'd prefer something integrated into the library.

@gofford

gofford commented Oct 17, 2023

@jmoralez the guide makes this a lot easier, but it's probably worth noting that it only works for single-model recursive fits. If a model (or models) is fitted with a direct strategy, then each model in the list needs its own explainer.
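A rough sketch of handling that case, assuming that after fitting with max_horizon the models_ attribute maps each model name to a list of per-horizon models (check the structure of fcst.models_ in your version):

import shap

for name, horizon_models in fcst.models_.items():
    # with the direct strategy each horizon has its own fitted model,
    # so each one gets its own explainer
    for horizon, m in enumerate(horizon_models, start=1):
        explainer = shap.Explainer(m)
        # features captured via the callback snippet above
        shap_values = explainer(features)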
