SLEP021: Unified API for computing feature importance #86

Open · wants to merge 11 commits into main
index.rst (1 addition, 0 deletions)
@@ -25,6 +25,7 @@
slep012/proposal
slep017/proposal
slep019/proposal
slep021/proposal

.. toctree::
:maxdepth: 1
slep021/proposal.rst (214 additions, 0 deletions)
@@ -0,0 +1,214 @@
.. _slep_021:

==================================================
SLEP021: Unified API to compute feature importance
==================================================

:Author: Thomas J Fan, Guillaume Lemaitre
:Status: Draft
:Type: Standards Track
:Created: 2023-03-09

Abstract
--------

This SLEP proposes a common API for computing feature importance.

Detailed description
--------------------

Motivation
~~~~~~~~~~

Data scientists rely on feature importance when inspecting a trained model.
However, no single algorithm provides **the** feature importance. In practice,
several algorithms are available, each with its pros and cons.

In scikit-learn, there are different ways to compute and inspect feature
importances. Some models, e.g. some tree-based models, expose a
`feature_importances_` attribute upon `fit`, and we also have utilities such as
the `permutation_importance` function to compute a different type of feature
importance. There has been some work documenting their limitations, but we have
not provided a nice API to implement alternatives.

Therefore, this SLEP proposes an API for computing feature importance that is
flexible enough to switch between methods and extensible enough to add new
ones. It is a follow-up to the initial discussions in :issue:`20059`.

Current state
~~~~~~~~~~~~~

Available methods
^^^^^^^^^^^^^^^^^

The following methods are available in scikit-learn to provide some feature
importance:

- The function :func:`sklearn.inspection.permutation_importance`. It takes a
  fitted estimator and a dataset; additional parameters can be provided. The
  function returns a `Bunch` containing 3 attributes: the decrease in score
  for each repetition, and the mean and standard deviation across the repeats.
  This method is therefore estimator agnostic.
- Linear estimators have a `coef_` attribute once fitted, which is sometimes
  used as their corresponding importance. We documented the limitations when
  it comes to interpreting those coefficients.
Member: a link to the docs for this would be nice for the record.

- The decision tree-based estimators have a `feature_importances_` attribute
once fitted.
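
For concreteness, here is a minimal sketch contrasting these three existing
access patterns (the dataset and estimator choices are illustrative)::

    >>> from sklearn.datasets import load_iris
    >>> from sklearn.ensemble import RandomForestClassifier
    >>> from sklearn.inspection import permutation_importance
    >>> from sklearn.linear_model import LogisticRegression
    >>> X, y = load_iris(return_X_y=True)
    >>> forest = RandomForestClassifier(random_state=0).fit(X, y)
    >>> forest.feature_importances_  # model-specific, derived from training data
    >>> linear = LogisticRegression(max_iter=1000).fit(X, y)
    >>> linear.coef_  # coefficients, sometimes read as importances
    >>> result = permutation_importance(forest, X, y, n_repeats=5, random_state=0)
    >>> result.importances_mean  # estimator-agnostic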

Use cases
^^^^^^^^^

The first usage of feature importance is to inspect a fitted model. Usually,
the feature importances are plotted to visualize how important each feature
is::

>>> tree = DecisionTreeClassifier().fit(X_train, y_train)
>>> plt.barh(X_train.columns, tree.feature_importances_)

The analysis can go further by checking the variance of the feature
importance. :func:`sklearn.inspection.permutation_importance` already provides
a way to do that since it repeats the computation several times. For a
model-specific feature importance, the user can use cross-validation to get an
idea of the dispersion::

    >>> cv_results = cross_validate(tree, X_train, y_train, return_estimator=True)
    >>> feature_importances = [est.feature_importances_ for est in cv_results["estimator"]]
    >>> plt.boxplot(np.asarray(feature_importances), labels=X_train.columns)

The second usage is feature selection. Meta-estimators such as
:class:`sklearn.feature_selection.SelectFromModel` internally use an array of
shape `(n_features,)` to select features and retrain a model on this subset of
features.

By default, :class:`sklearn.feature_selection.SelectFromModel` relies on the
estimator to expose `coef_` or `feature_importances_`::

>>> SelectFromModel(tree).fit(X_train, y_train) # `tree` exposes `feature_importances_`

For more flexibility, a string can be provided::

>>> linear_model = make_pipeline(StandardScaler(), LogisticRegression())
>>> SelectFromModel(
... linear_model, importance_getter="named_steps.logisticregression.coef_"
... ).fit(X_train, y_train)

The string form shown above makes it possible to reach an estimator embedded
inside a pipeline, for instance. Finally, a callable taking an estimator and
returning a NumPy array can also be provided, as sketched below.
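
A minimal sketch of the callable variant (the getter function is only
illustrative)::

    >>> import numpy as np
    >>> def importance_getter(model):
    ...     # aggregate the absolute coefficients of the last pipeline step
    ...     return np.abs(model.named_steps["logisticregression"].coef_).mean(axis=0)
    >>> SelectFromModel(linear_model, importance_getter=importance_getter).fit(
    ...     X_train, y_train
    ... )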

Current pitfalls
~~~~~~~~~~~~~~~~

From a methodological perspective, scikit-learn does not encourage good
practice. Indeed, since it provides a de facto `feature_importances_` attribute
for decision trees, users are tempted to believe that this method is the best
one.

In the same spirit, the :class:`sklearn.feature_selection.SelectFromModel`
meta-estimator uses `feature_importances_` or `coef_` de facto for selecting
features.

In both cases, it would be better to ask the user to be more explicit and to
choose a specific method to compute the feature importance used for inspection
or feature selection.

Additionally, `feature_importances_` and `coef_` are statistics derived from
the training set. We already documented that the reported
`feature_importances_` can be biased towards features used by the model to
overfit. This can in turn negatively impact the feature selection once used in
the :class:`sklearn.feature_selection.SelectFromModel` meta-estimator.

Comment on lines +119 to +122

Member: also a link to the docs we're mentioning would be nice

From an API perspective, the current functionalities for feature importance
are available via functions or attributes, with no common API.

Solution
~~~~~~~~

A common API
^^^^^^^^^^^^

**Proposal 1**: Expose a parameter in `__init__` to select the method used to
compute the feature importance. The computation would be done through a
method, e.g. `get_feature_importance`, that could take additional parameters
requested by the feature importance method. This method could then be used
internally by :class:`sklearn.feature_selection.SelectFromModel`. A sketch is
given below.
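
A minimal sketch of Proposal 1 (the parameter and method names are
hypothetical)::

    >>> tree = DecisionTreeClassifier(feature_importance_method="permutation")
    >>> tree.fit(X_train, y_train)
    >>> importances = tree.get_feature_importance(X_test, y_test, n_repeats=10)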
Comment on lines +135 to +139


Maybe I'm violating some SOLID principle, but we could incorporate feature importance agnostic techniques into some mixin like ClassifierMixin and RegressorMixin and specific methods within each class when applicable. In this sense, we would have the "main" feature importance method selected during init (defining the behavior of get_feature_importance). Still, one could always use the others because the estimator would have get_permutation_importance, get_mdi_importance, get_abs_coef_importance etc.


**Proposal 2**: Create a meta-estimator that takes a model and a method in
`__init__`. Its `fit` method would compute the feature importance given some
data, and the result would then be available through a fitted attribute
`feature_importances_` or a method `get_feature_importance`. We could reuse
such a meta-estimator in
:class:`sklearn.feature_selection.SelectFromModel`.

We should then rely on a common API for the methods computing the feature
importance. It seems that they should all at least accept a fitted estimator,
some dataset, and potentially some extra parameters, as sketched below.
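
A minimal sketch of Proposal 2 (the class name and its parameters are
hypothetical)::

    >>> importance = FeatureImportance(tree, method="permutation", n_repeats=10)
    >>> importance.fit(X_test, y_test)
    >>> importance.feature_importances_
    >>> SelectFromModel(importance, threshold="median")  # possible reuse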

**Proposal 3**: Similarly to Proposal 2 and taking inspiration from the
SHAP package [2]_, we could create an `Explainer` class providing a
`get_feature_importance` method given some data.

Currently, scikit-learn provides only global feature importance. The previous
API could be extended with a `get_samples_importance` method to compute an
explanation per sample when the given method supports it (e.g. Shapley
values). A sketch is given below.
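
A minimal sketch of Proposal 3 (all names are hypothetical)::

    >>> explainer = Explainer(tree, method="permutation")
    >>> explainer.get_feature_importance(X_test, y_test)  # global importance
    >>> explainer.get_samples_importance(X_test)  # per-sample, if supported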
Comment on lines +156 to +158

About "explain_one", I'm trying to think of scenarios where you can use it and
"explain_all" inside the same Explainer. I don't see many scenarios other than
doing something like explain_all(X) = np.mean(np.abs(explain_one(X)), axis=1)
(for SHAP or LIME, for instance). Did you have any other technique in mind?

Member Author: Theoretically speaking, I would be surprised if the average of
the absolute values were always the way to go. Having a class would allow us
to redefine the proper way to combine individual explanations.


**Proposal 4**: Create a meta-estimator `FeatureImportanceCalculator` that
could be passed to plotting displays or to an
`estimator.get_feature_importance` method, as sketched below.

I still need to choose my favorite proposal, although they are all very
elegant (superficially thinking, I prefer the functions option). But I find
proposal 1 less complex than adding the extra layer of abstraction proposed in
2, 3, and 4 (creating meta-estimators).
From the user's point of view, not having to "leave the estimator class"
facilitates and increases the chance of use, at least for a scikit-learn noob.
I don't know if this is a concern. I always see scikit-learn development
following this path and wanted to understand why. Is it just an object
orientation issue?

Member Author:

    From the user's point of view, not having to "leave the estimator class"
    facilitates and increases the chance of use

This is a reasonable point, but I think that most of the inspection/display
tools now live outside the estimators.
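
A minimal sketch of Proposal 4 (all names are hypothetical)::

    >>> calculator = FeatureImportanceCalculator(method="permutation", n_repeats=10)
    >>> importances = tree.get_feature_importance(calculator, X_test, y_test)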

Plotting
^^^^^^^^

Add a new :class:`sklearn.inspection.FeatureImportanceDisplay` class to
:mod:`sklearn.inspection`. Two methods could be useful for this display: (i)
:meth:`sklearn.inspection.FeatureImportanceDisplay.from_estimator` to plot a
single estimate of the feature importance and (ii)
:meth:`sklearn.inspection.FeatureImportanceDisplay.from_cv_results` to plot an
estimate of the feature importance together with its variance.

The display should therefore be aware of how to retrieve the feature
importance given the estimator.
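
A minimal sketch of the display API (the signatures are hypothetical)::

    >>> FeatureImportanceDisplay.from_estimator(tree, X_test, y_test)
    >>> cv_results = cross_validate(tree, X_train, y_train, return_estimator=True)
    >>> FeatureImportanceDisplay.from_cv_results(cv_results, X_train, y_train)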

Discussion
----------

Issues where discussions related to feature importance took place:
:issue:`20059`, :issue:`21170`.

In the SHAP package [2]_, the API is similar to Proposal 2. An `Explainer`
class takes a model, an algorithm, and some additional parameters (that may be
used by some algorithms). The Shapley values are computed and returned by the
`shap_values` method.
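
For reference, a typical use of the SHAP API with a fitted tree-based model
looks like this::

    >>> import shap
    >>> explainer = shap.TreeExplainer(tree)
    >>> shap_values = explainer.shap_values(X_test)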

Related issues
--------------

Some discussions happened in the past. In this section, we aggregate all issues
related to this topic:

- :issue:`15132`: proposal to add `feature_importances_` into the
`HistGradientBoosting` classifier and regressor models.
- :issue:`18223`: proposal to implement the PIMP feature importance.
- :issue:`18603`: implement OOB permutation importance for `RandomForest`.
- :issue:`21170`: implement variable importances for linear models.

References and Footnotes
------------------------

.. [1] Each SLEP must either be explicitly labeled as placed in the public
domain (see this SLEP as an example) or licensed under the `Open Publication
License`_.
.. [2] https://shap.readthedocs.io/en/latest/

.. _Open Publication License: https://www.opencontent.org/openpub/


Copyright
---------

This document has been placed in the public domain [1]_.