Commit bb759e6

Update PFI Docs (#32900)

* Add PFI
* Update original PFI article
* Update original sample
* Link to AutoML
1 parent 072d87c commit bb759e6

2 files changed: +52 -20 lines changed

docs/machine-learning/how-to-guides/explain-machine-learning-model-permutation-feature-importance-ml-net.md

Lines changed: 13 additions & 19 deletions
@@ -1,7 +1,7 @@
 ---
 title: Interpret ML.NET models with Permutation Feature Importance
 description: Understand the feature importance of models with Permutation Feature Importance in ML.NET
-ms.date: 10/05/2021
+ms.date: 12/06/2022
 author: luisquintanilla
 ms.author: luquinta
 ms.custom: mvc,how-to
@@ -105,38 +105,31 @@ string[] featureColumnNames =
     .Select(column => column.Name)
     .Where(columnName => columnName != "Label").ToArray();
 
-// 2. Define estimator with data pre-processing steps
-IEstimator<ITransformer> dataPrepEstimator =
+// 2. Define training pipeline
+IEstimator<ITransformer> sdcaEstimator =
     mlContext.Transforms.Concatenate("Features", featureColumnNames)
-        .Append(mlContext.Transforms.NormalizeMinMax("Features"));
+        .Append(mlContext.Transforms.NormalizeMinMax("Features"))
+        .Append(mlContext.Regression.Trainers.Sdca());
 
-// 3. Create transformer using the data pre-processing estimator
-ITransformer dataPrepTransformer = dataPrepEstimator.Fit(data);
-
-// 4. Pre-process the training data
-IDataView preprocessedTrainData = dataPrepTransformer.Transform(data);
-
-// 5. Define Stochastic Dual Coordinate Ascent machine learning estimator
-var sdcaEstimator = mlContext.Regression.Trainers.Sdca();
-
-// 6. Train machine learning model
-var sdcaModel = sdcaEstimator.Fit(preprocessedTrainData);
+// 3. Train machine learning model
+var sdcaModel = sdcaEstimator.Fit(data);
 ```
 
 ## Explain the model with Permutation Feature Importance (PFI)
 
 In ML.NET use the [`PermutationFeatureImportance`](xref:Microsoft.ML.PermutationFeatureImportanceExtensions) method for your respective task.
 
 ```csharp
+// Use the model to make predictions
+var transformedData = sdcaModel.Transform(data);
+
+// Calculate feature importance
 ImmutableArray<RegressionMetricsStatistics> permutationFeatureImportance =
     mlContext
         .Regression
-        .PermutationFeatureImportance(sdcaModel, preprocessedTrainData, permutationCount:3);
+        .PermutationFeatureImportance(sdcaModel, transformedData, permutationCount:3);
 ```
 
-> [!NOTE]
-> For pipelines that combine the preprocessing transforms and trainer, assuming that the trainer is at the end of the pipeline, you'll need to extract it using the `LastTransformer` property.
-
 The result of using [`PermutationFeatureImportance`](xref:Microsoft.ML.PermutationFeatureImportanceExtensions) on the training dataset is an [`ImmutableArray`](xref:System.Collections.Immutable.ImmutableArray) of [`RegressionMetricsStatistics`](xref:Microsoft.ML.Data.RegressionMetricsStatistics) objects. [`RegressionMetricsStatistics`](xref:Microsoft.ML.Data.RegressionMetricsStatistics) provides summary statistics like mean and standard deviation for multiple observations of [`RegressionMetrics`](xref:Microsoft.ML.Data.RegressionMetrics) equal to the number of permutations specified by the `permutationCount` parameter.
 
 The metric used to measure feature importance depends on the machine learning task used to solve your problem. For example, regression tasks may use a common evaluation metric such as R-squared to measure importance. For more information on model evaluation metrics, see [evaluate your ML.NET model with metrics](../resources/metrics.md).
@@ -179,6 +172,7 @@ Taking a look at the five most important features for this dataset, the price of
 
 ## Next steps
 
+- [Use Permutation Feature Importance (PFI) with AutoML](how-to-use-the-automl-api.md#determine-feature-importance)
 - [Make predictions with a trained model](machine-learning-model-predictions-ml-net.md)
 - [Retrain a model](retrain-model-ml-net.md)
 - [Deploy a model in an ASP.NET Core Web API](serve-model-web-api-ml-net.md)

docs/machine-learning/how-to-guides/how-to-use-the-automl-api.md

Lines changed: 39 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
---
22
title: How to use the ML.NET Automated ML (AutoML) API
33
description: The ML.NET Automated ML (AutoML) API automates the model building process and generates a model ready for deployment. Learn the options that you can use to configure automated machine learning tasks.
4-
ms.date: 11/10/2022
4+
ms.date: 12/06/2022
55
ms.custom: mvc,how-to
66
ms.topic: how-to
77
---
@@ -685,3 +685,41 @@ Checkpoints provide a way for you to save intermediary outputs from the training
 var checkpointPath = Path.Join(Directory.GetCurrentDirectory(), "automl");
 experiment.SetCheckpoint(checkpointPath);
 ```
+
+## Determine feature importance
+
+As machine learning is introduced into more aspects of everyday life such as healthcare, it's of utmost importance to understand why a machine learning model makes the decisions it does. Permutation Feature Importance (PFI) is a technique used to explain classification, ranking, and regression models. At a high level, the way it works is by randomly shuffling data one feature at a time for the entire dataset and calculating how much the performance metric of interest decreases. The larger the change, the more important that feature is. For more information on PFI, see [interpret model predictions using Permutation Feature Importance](explain-machine-learning-model-permutation-feature-importance-ml-net.md).
+
+> [!NOTE]
+> Calculating PFI can be a time consuming operation. How much time it takes to calculate is proportional to the number of feature columns you have. The more features, the longer PFI will take to run.
+
+To determine feature importance using AutoML:
+
+1. Get the best model.
+
+    ```csharp
+    var bestModel = expResult.Model;
+    ```
+
+1. Apply the model to your dataset.
+
+    ```csharp
+    var transformedData = bestModel.Transform(trainValidationData.TrainSet);
+    ```
+
+1. Calculate feature importance using <xref:Microsoft.ML.PermutationFeatureImportanceExtensions.PermutationFeatureImportance%2A>
+
+    In this case, the task is regression but the same concept applies to other tasks like ranking and classification.
+
+    ```csharp
+    var pfiResults =
+        mlContext.Regression.PermutationFeatureImportance(bestModel, transformedData, permutationCount:3);
+    ```
+
+1. Order feature importance by changes to evaluation metrics.
+
+    ```csharp
+    var featureImportance =
+        pfiResults.Select(x => Tuple.Create(x.Key, x.Value.Regression.RSquared))
+            .OrderByDescending(x => x.Item2);
+    ```
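
Reviewer note: the new "Determine feature importance" section describes PFI as shuffling one feature at a time and measuring how much the metric of interest drops. The following is a minimal sketch of that idea in plain C#, independent of ML.NET. It is not how `PermutationFeatureImportance` is implemented; the `PfiSketch` class, its method names, and the `seed` parameter are all hypothetical and exist only for illustration. It assumes a model exposed as a simple prediction function and uses R-squared as the metric.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class PfiSketch
{
    // Coefficient of determination of predictions against the true labels.
    public static double RSquared(double[] labels, double[] predictions)
    {
        double mean = labels.Average();
        double ssTot = labels.Sum(y => (y - mean) * (y - mean));
        double ssRes = labels.Zip(predictions, (y, p) => (y - p) * (y - p)).Sum();
        return 1.0 - ssRes / ssTot;
    }

    // Returns, per feature, the mean drop in R-squared after shuffling
    // that feature's column, averaged over permutationCount shuffles.
    public static double[] Importance(
        Func<double[], double> predict, double[][] rows, double[] labels,
        int permutationCount, int seed)
    {
        var rng = new Random(seed);
        double baseline = RSquared(labels, rows.Select(predict).ToArray());
        int featureCount = rows[0].Length;
        var importance = new double[featureCount];

        for (int f = 0; f < featureCount; f++)
        {
            double totalDrop = 0;
            for (int p = 0; p < permutationCount; p++)
            {
                // Fisher-Yates shuffle of the values in column f.
                double[] column = rows.Select(r => r[f]).ToArray();
                for (int i = column.Length - 1; i > 0; i--)
                {
                    int j = rng.Next(i + 1);
                    (column[i], column[j]) = (column[j], column[i]);
                }

                // Copy each row, swapping in the shuffled value for feature f only.
                double[] predictions = rows.Select((r, i) =>
                {
                    var copy = (double[])r.Clone();
                    copy[f] = column[i];
                    return predict(copy);
                }).ToArray();

                totalDrop += baseline - RSquared(labels, predictions);
            }
            importance[f] = totalDrop / permutationCount;
        }
        return importance;
    }
}
```

As a usage illustration, calling `PfiSketch.Importance(r => 3 * r[0], rows, labels, 3, 42)` on two-feature rows scores the second feature at exactly zero, because the toy model never reads it and shuffling it leaves every prediction unchanged. The outer loop over features also shows why the note in the new section says runtime grows with the number of feature columns.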
