
[Core] Add interpretability capabilities through SHAP #150

Open
AzulGarza opened this issue Jun 8, 2023 · 5 comments

Comments

@AzulGarza
Member

Description

To enhance the interpretability of models trained using MLForecast, we propose leveraging SHAP (SHapley Additive exPlanations). SHAP is compatible with XGBoost, LightGBM, and scikit-learn models. Currently, if we want to use it, we need to create the dataset for which we desire forecast explanations (using preprocess) and iterate over each trained model using the following:

explainer = shap.Explainer(model)
shap_values = explainer(X)

The goal is to introduce a method, possibly named shap_values, to generate SHAP values for the forecasts from all trained models.
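For reference, a minimal end-to-end sketch of the current manual workflow. The freq, lags, and column names below are placeholders, and df is assumed to be a panel dataframe with unique_id, ds, and y columns; the fitted models live in the models_ attribute after calling fit:

import shap
import xgboost as xgb
from mlforecast import MLForecast

fcst = MLForecast(models=[xgb.XGBRegressor()], freq='D', lags=[1, 7])
fcst.fit(df)

# build the same feature matrix the models were trained on
prep = fcst.preprocess(df)
X = prep.drop(columns=['unique_id', 'ds', 'y'])

# one explainer per trained model
for name, model in fcst.models_.items():
    explainer = shap.Explainer(model)
    shap_values = explainer(X)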

Use case

No response

@elisevansartefact

Currently, if we want to use it, we need to create the dataset for which we desire forecast explanations (using preprocess) and iterate over each trained model

I am not sure if this feature will be released soon, but in the meantime I cannot seem to compute SHAP values using the above method.

model = MLForecast(models=[xgb.XGBRegressor()], freq=freq)
model.fit(df)
explainer = shap.Explainer(list(model.models.values())[0])

    169             algorithm = "permutation"
    171     # if we get here then we don't know how to handle what was given to us
    172     else:
--> 173         raise TypeError("The passed model is not callable and cannot be analyzed directly with the given masker! Model: " + str(model))
    175 # build the right subclass
    176 if algorithm == "exact":

TypeError: The passed model is not callable and cannot be analyzed directly with the given masker! Model: XGBRegressor(base_score=None, booster=None, callbacks=None,
             colsample_bylevel=None, colsample_bynode=None,
             colsample_bytree=None, early_stopping_rounds=None,
             enable_categorical=False, eval_metric=None, feature_types=None,
             gamma=None, gpu_id=None, grow_policy=None, importance_type=None,
             interaction_constraints=None, learning_rate=None, max_bin=None,
             max_cat_threshold=None, max_cat_to_onehot=None,
             max_delta_step=None, max_depth=None, max_leaves=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             n_estimators=100, n_jobs=None, num_parallel_tree=None,
             predictor=None, random_state=None, ...)

Perhaps it has something to do with the fact that the models I retrieve using model.models are not fitted, even after running model.fit(df):

list(model.models.values())[0].predict(df)
    646 if not self.__sklearn_is_fitted__():
    647     from sklearn.exceptions import NotFittedError
--> 649     raise NotFittedError("need to call fit or load_model beforehand")
    650 return self._Booster

NotFittedError: need to call fit or load_model beforehand

@jmoralez
Member

Hey @elisevansartefact. The fitted models are stored in the models_ attribute, so something like shap.Explainer(model.models_['XGBRegressor']) should work. The models argument also accepts a dict, in case you prefer a different name for the model.
Also, if you're looking to get explanations for the predictions, you'll need to store the features used at each time step; you may find this thread useful. Here's the relevant snippet:

from functools import partial

import pandas as pd

def extract_features(df, save_list):
    # save the features passed to the model at each prediction step
    save_list.append(df)
    return df

save_list = []
extract_features_callback = partial(extract_features, save_list=save_list)
fcst.predict(..., before_predict_callback=extract_features_callback)
features = pd.concat(save_list)
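Putting the two pieces together, a rough sketch of feeding the captured features to an explainer (the horizon value below is a placeholder, and depending on the mlforecast version the captured frames may carry the series id as the index):

import shap

# placeholder horizon; use whatever horizon you actually forecast
preds = fcst.predict(7, before_predict_callback=extract_features_callback)
features = pd.concat(save_list)

# explain the forecasts with the fitted model stored in models_
explainer = shap.Explainer(fcst.models_['XGBRegressor'])
shap_values = explainer(features)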

@jmakov

jmakov commented Sep 27, 2023

Would probably also want to use https://github.com/linkedin/fasttreeshap instead
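For context, fasttreeshap is intended as a faster drop-in for SHAP's TreeExplainer; a rough sketch of how it might slot into the snippet above (the algorithm and n_jobs arguments are assumptions taken from the project's README, so check the docs for your version):

import fasttreeshap

# drop-in replacement for shap.TreeExplainer (arguments are assumptions)
explainer = fasttreeshap.TreeExplainer(
    fcst.models_['XGBRegressor'], algorithm='v2', n_jobs=-1
)
shap_values = explainer.shap_values(features)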

@jmoralez
Member

Hey folks. We've added a guide that explains how to get the trained models and compute the SHAP values for training and inference. I think this gives full control over how to compute them (sample size, etc.). Please let us know if you'd prefer something integrated into the library.

@gofford

gofford commented Oct 17, 2023

@jmoralez the guide makes this a lot easier, but it's probably worth noting that it only works for single-model recursive fits. If a model (or models) is fitted with a direct strategy, then each model in the list needs its own explainer.
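A rough sketch of handling that case, assuming that after fitting with max_horizon the models_ attribute maps each model name to a list of per-horizon models (check the structure of fcst.models_ in your version):

import shap

for name, horizon_models in fcst.models_.items():
    # with the direct strategy each horizon has its own fitted model,
    # so each one gets its own explainer
    for horizon, m in enumerate(horizon_models, start=1):
        explainer = shap.Explainer(m)
        # features captured via the callback snippet above
        shap_values = explainer(features)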
