.. _slep_021:

==================================================
SLEP021: Unified API to compute feature importance
==================================================

:Author: Thomas J Fan, Guillaume Lemaitre
:Status: Draft
:Type: Standards Track
:Created: 2023-03-09

Abstract
--------

This SLEP proposes a common API for computing feature importance.

Detailed description
--------------------

Motivation
~~~~~~~~~~

Data scientists rely on feature importance when inspecting a trained model.
However, there is currently no single algorithm providing **the** feature
importance. In practice, several algorithms are available, each with its own
pros and cons.

In scikit-learn, there are different ways to compute and inspect feature
importances. Some models, e.g. some tree-based models, expose a
`feature_importances_` attribute upon `fit`, and we also have utilities such as
the `permutation_importance` function to compute a different type of feature
importance. There has been some work documenting their limitations, but we have
not provided a nice API to implement alternatives.

Therefore, this SLEP proposes an API for providing feature importance that is
flexible enough to switch between methods and extensible enough to add new
ones. It is a follow-up to the initial discussions in :issue:`20059`.

Current state
~~~~~~~~~~~~~

Available methods
^^^^^^^^^^^^^^^^^

The following methods are available in scikit-learn to provide some feature
importance:

- The function :func:`sklearn.inspection.permutation_importance`. It requires
  a fitted estimator and a dataset. Additional parameters can be provided. The
  function returns a `Bunch` containing 3 attributes: the decrease in score
  for each repetition, the mean, and the standard deviation across repeats
  (see the sketch after this list). This method is therefore estimator
  agnostic.
- The linear estimators have a `coef_` attribute once fitted, which is
  sometimes used as their corresponding importance. We documented the
  limitations when it comes to interpreting those coefficients.
- The decision tree-based estimators have a `feature_importances_` attribute
  once fitted.

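For illustration, a minimal sketch of how the existing function is used (the
dataset and model here are arbitrary choices)::

    >>> from sklearn.datasets import load_iris
    >>> from sklearn.ensemble import RandomForestClassifier
    >>> from sklearn.inspection import permutation_importance
    >>> X, y = load_iris(return_X_y=True)
    >>> model = RandomForestClassifier(random_state=0).fit(X, y)
    >>> result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    >>> result.importances       # raw decreases, shape (n_features, n_repeats)
    >>> result.importances_mean  # mean decrease per feature
    >>> result.importances_std   # standard deviation across repeats
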
Use cases
^^^^^^^^^

The first usage of feature importance is to inspect a fitted model. Usually,
the feature importance will be plotted to visualize the importance of the
features::

    >>> from sklearn.tree import DecisionTreeClassifier
    >>> import matplotlib.pyplot as plt
    >>> tree = DecisionTreeClassifier().fit(X_train, y_train)
    >>> plt.barh(X_train.columns, tree.feature_importances_)

The analysis can be taken further by checking the variance of the feature
importance. :func:`sklearn.inspection.permutation_importance` already provides
a way to do that since it repeats the computation several times. For the
model-specific feature importance, the user can use cross-validation to get an
idea of the dispersion::

    >>> from sklearn.model_selection import cross_validate
    >>> cv_results = cross_validate(tree, X_train, y_train, return_estimator=True)
    >>> feature_importances = [est.feature_importances_ for est in cv_results["estimator"]]
    >>> plt.boxplot(feature_importances, labels=X_train.columns)

The second usage is feature selection. Meta-estimators such as
:class:`sklearn.feature_selection.SelectFromModel` internally use an array of
shape `(n_features,)` to select features and retrain a model on this subset of
features.

By default, :class:`sklearn.feature_selection.SelectFromModel` relies on the
estimator to expose `coef_` or `feature_importances_`::

    >>> from sklearn.feature_selection import SelectFromModel
    >>> SelectFromModel(tree).fit(X_train, y_train)  # `tree` exposes `feature_importances_`

For more flexibility, a string can be provided::

    >>> from sklearn.linear_model import LogisticRegression
    >>> from sklearn.pipeline import make_pipeline
    >>> from sklearn.preprocessing import StandardScaler
    >>> linear_model = make_pipeline(StandardScaler(), LogisticRegression())
    >>> SelectFromModel(
    ...     linear_model, importance_getter="named_steps.logisticregression.coef_"
    ... ).fit(X_train, y_train)

The string corresponds to the name of the attribute holding the feature
importance, which makes it possible to deal with an estimator embedded inside
a pipeline, for instance. Finally, a callable taking an estimator and
returning a NumPy array can also be provided.

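For instance, a minimal sketch of the callable variant (the way the
coefficients are aggregated here is an arbitrary choice)::

    >>> import numpy as np
    >>> def importance_getter(estimator):
    ...     # extract the coefficients of the final pipeline step and reduce
    ...     # them to a single importance value per feature
    ...     coef = estimator.named_steps["logisticregression"].coef_
    ...     return np.abs(coef).sum(axis=0)
    >>> SelectFromModel(linear_model, importance_getter=importance_getter).fit(
    ...     X_train, y_train
    ... )
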
Current pitfalls
~~~~~~~~~~~~~~~~

From a methodological perspective, scikit-learn does not encourage good
practice. Indeed, since it provides a de facto `feature_importances_` attribute
for decision trees, it is tempting for users to believe that this method is
the best one.

In the same spirit, the :class:`sklearn.feature_selection.SelectFromModel`
meta-estimator uses, de facto, the `feature_importances_` or `coef_` attribute
for selecting features.

In both cases, it would be better to ask the user to be more explicit and to
choose a specific method to compute the feature importance for inspection or
feature selection.

Additionally, `feature_importances_` and `coef_` are statistics derived from
the training set. We already documented that the reported
`feature_importances_` will potentially show biases for features used by the
model to overfit. Thus, it will potentially negatively impact the feature
selection once used in the :class:`sklearn.feature_selection.SelectFromModel`
meta-estimator.

From an API perspective, the current functionalities for feature importance
are available via functions or attributes, with no common API.

Solution
~~~~~~~~

A common API
^^^^^^^^^^^^

**Proposal 1**: Expose a parameter in `__init__` to select the method used to
compute the feature importance. The computation would be done via a method,
e.g. `get_feature_importance`, that could take additional parameters required
by the feature importance method. This method could therefore be used
internally by :class:`sklearn.feature_selection.SelectFromModel`.

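A hypothetical sketch of proposal 1; the parameter and method names shown here
do not exist in scikit-learn and are illustrative only::

    >>> tree = DecisionTreeClassifier(feature_importance="permutation")
    >>> tree.fit(X_train, y_train)
    >>> importances = tree.get_feature_importance(X_test, y_test, n_repeats=10)
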
**Proposal 2**: Create a meta-estimator that takes a model and a method in
`__init__`. A method `fit` could then compute the feature importance given
some data. The feature importance could be made available through a fitted
attribute `feature_importances_` or a method `get_feature_importance`. We
could reuse such a meta-estimator in
:class:`sklearn.feature_selection.SelectFromModel`.

We should then rely on a common API for the methods computing the feature
importance. It seems that they should all at least accept a fitted estimator,
some dataset, and potentially some extra parameters.

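A hypothetical sketch of proposal 2; the class name `FeatureImportance` and
its parameters are illustrative only::

    >>> importance = FeatureImportance(tree, method="permutation", n_repeats=10)
    >>> importance.fit(X_test, y_test)
    >>> importance.feature_importances_
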
**Proposal 3**: Similarly to proposal 2 and taking inspiration from the
SHAP package [2]_, we could create a class `Explainer` providing a
`get_feature_importance` method given some data.

Currently, scikit-learn provides only global feature importance. The previous
API could be extended with a `get_samples_importance` method to compute an
explanation per sample when the given method supports it (e.g. Shapley
values).

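A hypothetical sketch of proposal 3; the `Explainer` class and its method
names are illustrative only::

    >>> explainer = Explainer(tree, method="shapley")
    >>> global_importance = explainer.get_feature_importance(X_test, y_test)
    >>> per_sample = explainer.get_samples_importance(X_test)
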
**Proposal 4**: Create a meta-estimator `FeatureImportanceCalculator` that
could be passed to plotting displays or to an
`estimator.get_feature_importance` method.

Plotting
^^^^^^^^

Add a new :class:`sklearn.inspection.FeatureImportanceDisplay` class to
:mod:`sklearn.inspection`. Two methods could be useful for this display: (i)
:meth:`sklearn.inspection.FeatureImportanceDisplay.from_estimator` to plot a
single estimate of the feature importance and (ii)
:meth:`sklearn.inspection.FeatureImportanceDisplay.from_cv_results` to plot an
estimate of the feature importance together with its variance.

The display should therefore be aware of how to retrieve the feature
importance given the estimator.

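A hypothetical sketch of the display API; `FeatureImportanceDisplay` does not
exist in scikit-learn yet::

    >>> FeatureImportanceDisplay.from_estimator(tree, X_test, y_test)
    >>> cv_results = cross_validate(tree, X_train, y_train, return_estimator=True)
    >>> FeatureImportanceDisplay.from_cv_results(cv_results, X_test, y_test)
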
Discussion
----------

Issues where discussions related to feature importance have taken place:
:issue:`20059`, :issue:`21170`.

In the SHAP package [2]_, the API is similar to proposal 2. A class `Explainer`
takes a model, an algorithm, and some additional parameters (that could be
used by some algorithms). The Shapley values are computed and returned by the
method `shap_values`.

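For comparison, a minimal sketch using the third-party SHAP package (the
tree-based explainer is an arbitrary choice here)::

    >>> import shap
    >>> explainer = shap.TreeExplainer(tree)
    >>> shap_values = explainer.shap_values(X_test)
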
Related issues
--------------

Some discussions happened in the past. In this section, we aggregate all issues
related to this topic:

- :issue:`15132`: proposal to add `feature_importances_` to the
  `HistGradientBoosting` classifier and regressor models.
- :issue:`18223`: proposal to implement the PIMP feature importance.
- :issue:`18603`: implement OOB permutation importance for `RandomForest`.
- :issue:`21170`: implement variable importances for linear models.

References and Footnotes
------------------------

.. [1] Each SLEP must either be explicitly labeled as placed in the public
   domain (see this SLEP as an example) or licensed under the `Open
   Publication License`_.

.. [2] https://shap.readthedocs.io/en/latest/

.. _Open Publication License: https://www.opencontent.org/openpub/

Copyright
---------

This document has been placed in the public domain [1]_.