# Shapley Explanations
Shapley explanations are a broad class of techniques characterized by their use
of the Shapley value from cooperative game theory as the building block for
generating quantitative explanations. Methods differ in how the underlying game
is defined, how the Shapley values are estimated, and as a result, how those
values should be interpreted when used as a model explanation. In this section,
we reexamine the Shapley-based model explanation literature from a causal
perspective using our evaluation framework, with a particular focus on the
specific types of target explanatory questions that can be addressed by each
method and the additional considerations, information, and assumptions required
to do so.
## Overview
The key differentiator between Shapley explanations is how the underlying game
– as defined by the payout, players, and the value function – is formulated.
Before getting into the specifics of each method, we review how each of these
components has been adapted to the task of generating model explanations, and
in the case of specifying a value function, how this choice can be reframed
using a causal perspective in the context of our evaluation framework.
The definition of the payout is directly related to the scope of the resulting
explanation. In cooperative game theory, the payout is the value received by
the coalition of players when they cooperate. When applied to model
explanations, the payout is a numeric attribute of the model such as its
accuracy, variance explained, or the value that the model predicts for a
particular input. When the payout is defined in terms of a model prediction,
the result is a local explanation, which is an explanation of a particular
instance. In contrast, global explanations address model behavior in the
aggregate. Payouts defined in terms of model performance (e.g. accuracy,
variance explained, etc.) lead to global explanations. However, in some cases,
global explanations can be constructed from local ones, for example, by
averaging local explanations across all predictions. In this work, we are
primarily interested in Shapley methods that yield local explanations; however,
we briefly cover methods that generate global explanations as well.
In the vast majority of cases, a model’s features are treated as the players in
the game and groups of players constitute a coalition. When reviewing a
particular method, we will omit this detail from our discussion except in those
cases where a different formulation is used.
For the remainder of this paper we use the following notation^[This
notation largely mirrors @kumar_problems_2020 with slight differences to
improve readability]:
- $D$: The set of features $\{1, 2, ..., N\}$, where $N$ is the number of features
- $S$: A subset of features, $S \subseteq D$
- $f$: The machine learning model, $f(x_1, x_2, ..., x_N)$
- $\mathbf{X}$: a multivariate random variable $\{X_1, X_2, ..., X_N\}$
- $\mathbf{x}$: a set of values $\{x_1, x_2, ..., x_N\}$
- $\mathbf{X}_S$: the set of random variables $\{X_i: i \in S\}$
- $\mathbf{X}_{\bar{S}}$: the set of random variables $\{X_i: i \notin S\}$
- $\mathbf{x}_S$: the set of values $\{x_i: i \in S\}$
- $\mathbf{x}_{\bar{S}}$: the set of values $\{x_i: i \notin S\}$
### Value Function
The most consequential aspect of the game formulation is the value function. As
noted previously, the value function specifies the payout for every subset of
players. Since most machine learning models cannot make predictions when inputs
are missing (as is the case for any coalition besides the grand coalition), the
value function must provide a replacement value for every feature that is not
part of the coalition. In essence, it must simulate what happens when features
are removed from the model (@janzing_feature_2019,
@merrick_explanation_2020, @covert_explaining_2020). The
brute-force approach is to avoid simulating altogether and apply the same
learning algorithm (and hyperparameters) to every subset of features,
effectively training $2^N$ separate models where $N$ is the number of features.
Other alternatives include replacing the missing value with a fixed value (e.g.
zero), a value from a reference data point, or the expected value over a
collection of data points (reference distribution). Each of these alternatives
to the brute force approach can be thought of as a special case of the last
option (@merrick_explanation_2020, @sundararajan_many_2020).
However, there are numerous ways to select a reference distribution, which has
generated significant debate over which choice is the correct one.
Within our evaluation framework, the correct reference distribution is the one
aligned with the target explanatory question, which is defined by the intended
level of explanation and whether it is associative, interventional, or
counterfactual. Shapley-based methods can be used to provide either model or
world level explanations. However, much of the Shapley explanation literature
implicitly assumes one of the two and fails to recognize the existence or
validity of the other. There are two notable exceptions:
@basu_shapley_2020 makes a similar distinction using what they call
“modes of interpretation” and @wang_shapley_2021 introduce the notion of
“boundaries of explanation,” which captures a similar idea. Failure to clearly
distinguish between these different levels of explanation has fueled the debate
over which reference distribution is correct. Before discussing particular
methods, we revisit the different classes of reference distributions used by
various Shapley methods and the types of explanatory questions that each can
address.
Reference distributions can be categorized as unconditional or conditional
based on whether they consider features independently or jointly. In cases
where features are independent – a situation that is rarely, if ever, true in
practice – this distinction is irrelevant and the resulting Shapley values for
both types are equivalent. Conditional value functions can be further
categorized based on whether they are observational or interventional. Each of
these three classes of reference distribution (unconditional, observational
conditional, and interventional conditional) yield Shapley explanations at a
particular level (model or world) that can be used to address particular types
of target explanatory questions when interpreted correctly (see
@fig-reference-distribution-selection).
```{mermaid}
%%| label: fig-reference-distribution-selection
%%| fig-cap: Selecting A Reference Distribution
flowchart
A{Level}
A -->|model| B[unconditional]
A -->|world| C{Question}
C -->|associative| D[observational conditional]
C -->|interventional| E[interventional conditional]
C -->|counterfactual| F[interventional conditional]
```
Value functions based on an observational conditional distribution replace
missing features with values that are consistent with the observed
relationships among features.
$$
v(S) = E[f(\mathbf{X}) | \mathbf{X}_S = \mathbf{x}_S]
$$ {#eq-observational-value-function}
This choice of reference distribution is unable to distinguish between
correlation that is due to a causal effect and correlation due to confounding
(i.e. spurious correlation). As a result, the Shapley values generated by
methods that use this formulation are only able to address associative
world-level target explanatory questions.
In contrast, value functions based on an interventional conditional
distribution replace missing features with values that are consistent with the
causal relationships between features.
$$
v(S) = E[f(\mathbf{X}) | do(\mathbf{X}_S = \mathbf{x}_S)]
$$ {#eq-interventional-value-function}
However, methods based on this type of value function may require the
practitioner to provide auxiliary causal information depending on the type of
explanatory question. As noted in the background section on causality,
interventional questions require a GCM and counterfactual questions require an
SCM.
Finally, value functions based on an unconditional reference distribution
replace individual feature values independent of the values of other features.
$$
v(S) = E[f(\mathbf{x}_S, \mathbf{X}_{\bar{S}})]
$$ {#eq-marginal-value-function}
In our view, this type of reference distribution is appropriate for generating
model-level explanations as it effectively ignores real-world causal
relationships by assuming feature independence. As noted previously, the model
itself is sufficient for addressing associative, interventional, and
counterfactual questions so no auxiliary causal information is required.
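To make the distinction concrete, the following sketch shows simple Monte
Carlo approximations of the unconditional (@eq-marginal-value-function) and
observational conditional (@eq-observational-value-function) value functions.
The model `f`, the reference data `X_ref`, and the tolerance used to
approximate the conditioning event are illustrative assumptions rather than
part of any particular method; an interventional version would additionally
require a GCM in order to sample from the post-intervention distribution.
```python
import numpy as np

def marginal_value(f, x, S, X_ref):
    """Unconditional value function: replace the features outside S with
    whole rows from the reference data, ignoring their dependence on x_S."""
    Xs = X_ref.copy()
    Xs[:, S] = x[S]  # keep the explained values for the features in S
    return f(Xs).mean()

def observational_value(f, x, S, X_ref, tol=0.5):
    """Observational conditional value function, approximated empirically:
    average predictions over reference rows whose S-features lie within
    `tol` of x_S (a crude stand-in for E[f(X) | X_S = x_S])."""
    mask = np.all(np.abs(X_ref[:, S] - x[S]) <= tol, axis=1)
    Xs = (X_ref[mask] if mask.any() else X_ref).copy()
    Xs[:, S] = x[S]
    return f(Xs).mean()

# Toy usage with two correlated features: the two value functions disagree
rng = np.random.default_rng(0)
cov = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
X_ref = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=500)
f = lambda X: X[:, 0] + 2 * X[:, 1]
x = X_ref[0]
print(marginal_value(f, x, [0], X_ref), observational_value(f, x, [0], X_ref))
```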
@janzing_feature_2019 make the case that a causal perspective supports
using a marginal reference distribution in all cases. To make their point, they
introduce a distinction between real-world (“true”) features $\tilde{X_1},
\tilde{X_2}, ..., \tilde{X_n}$ and features that serve as input to the model
$X_1, X_2, ..., X_n$ (see @fig-janzing-model-vs-world). Model
features have no causal relationships with one another (i.e. they are
independent) even if causal relationships exist between their real-world
counterparts. Using this setup, they show that the backdoor criterion is
satisfied and that the interventional conditional value function
(@eq-interventional-value-function) yields the same Shapley values as an
unconditional one (@eq-marginal-value-function).
![Model vs. World Causality](./images/janzing_model_vs_wolrd.png){#fig-janzing-model-vs-world}
In our view, the causal perspective assumed by @janzing_feature_2019 is
limited because it implicitly assumes that all target explanatory questions can
be addressed through model-level explanations. To illustrate the point,
consider a model that takes loan amount, income, and savings account balance as
inputs and predicts risk of default. The question “What if I resubmit my loan
application and say my income is $X$?” likely has a model-level intention,
while a question like “What if I increase my income to $X$?” is targeting a
world-level explanation. If higher income is causally related to savings
account balance (e.g. always depositing a fixed percentage of income), the
answer to the second question is not necessarily the same as the answer to the
first. In our evaluation framework, part of what makes an explanation correct
is that it aligns with the target explanatory question. Therefore,
understanding the explainee’s desired level of explanation is critical.
The adoption of the Janzing-style causal perspective with its failure to
adequately distinguish between levels of explanation has led to an unfortunate
pattern in the literature where the terms interventional, marginal, and
unconditional are used interchangeably. Since this equivalence only applies to
model-level explanations, we use the term interventional only when a causal
perspective is intentionally adopted.
### Indirect Influence
Differentiating between model and world-level explanations also helps to
resolve the ongoing “indirect influence” debate in the Shapley explanation
literature. The key question behind this debate is whether a real-world feature
that exhibits only an indirect influence on the model should be assigned a
Shapley value of zero. Another way to frame this question is: should a feature
that is not functionally used by the model be considered irrelevant when
providing a model explanation?
One school of thought argues that Shapley-based attributions should assign zero
importance to irrelevant features. Let $X_1, X_2$ be two features and consider
the model $f(x_1, x_2) = x_2$. The model $f$ clearly does not functionally
depend on $x_1$ and therefore, the argument goes, $x_1$ is irrelevant. Numerous
authors have demonstrated that when a value function based on a conditional
expectation (observational or interventional) is used (see example 3.3 from
@sundararajan_many_2020, section 2 from @merrick_explanation_2020, and example
1 from @janzing_feature_2019), irrelevant features may receive non-zero
attributions. They argue that a non-zero attribution is both counterintuitive
and constitutes an apparent violation of the dummy axiom.
@merrick_explanation_2020 demonstrates the practical implications of this
violation for assessing fairness. Consider two models that make decisions about
who should be hired by a moving company. Model A exclusively considers the
applicant’s gender, a protected category, while model B takes both gender and
the applicant’s lifting capacity into account. If gender and lifting capacity
are correlated and a conditional method is used, both gender and lifting
capacity receive non-zero attributions in model A even though lifting capacity
is functionally irrelevant. They argue that these attributions hide the degree
of bias in the model’s decisions. @sundararajan_many_2020 provides a different
view, suggesting that these attributions may lead practitioners to incorrectly
believe that a model is sensitive to a protected feature. It is also possible
to construct scenarios in which attributions appear to violate the symmetry
(@sundararajan_many_2020, @merrick_explanation_2020) and linearity
(@sundararajan_many_2020) axioms. Opponents of methods based on a conditional
value function use these violations and their implications for assessing model
fairness to support the use of marginal SHAP methods, which never assign a
non-zero attribution to an irrelevant feature (@merrick_explanation_2020).
Model fairness considerations have also been used to justify conditional
Shapley methods. @adler_auditing_2016 propose a method for auditing black box
models for indirect influence, which they argue has implications for assessing
algorithmic fairness. They provide an example of auditing a model used to
approve home loans to ensure that race does not have an undue influence on the
outcome. Even if the model does not functionally depend on race, it may include
other variables (e.g. zipcode) that serve as proxies for race, allowing race to
have an indirect influence on the model. From this standpoint, the fact that
conditional Shapley methods assign non-zero attributions to features that are
not explicitly included in the model is a desirable property.
Both sides acknowledge that the choice of value function is directly related to
the indirect influence debate. The underlying implication in the literature is
that practitioners should use their belief about the right solution to the
indirect influence debate to select a value function. In our view, the value
function should be selected to align with the target explanatory question,
which requires understanding the level of explanation that is sought. If a
model-level explanation is desired, then an unconditional value function is
appropriate and, as a result, irrelevant features will receive zero
attributions. Conversely, if it is a world-level explanation that the
explainee is after, then a conditional method is appropriate and it is not only
permissible, but desirable, that irrelevant features receive non-zero
attributions.
## Global Explanations
Shapley-based model explanations arose out of the relative feature importance
literature where the resulting importances can be viewed as a global model
explanation. These methods formulate the game by treating features as players
and the total variance explained as the payout. We briefly review these
historical roots because the developments in this literature foreshadow those
that occurred later when Shapley-based methods were developed to generate local
explanations.
### Linear Models
The earliest Shapley-based model explanation methods were developed to quantify
relative feature importance for linear models^[See
@gromping_estimators_2007 for a more detailed overview, which we summarize
briefly here.]. The first such method (**LMG**) was made known by
@kruskal_relative_1987, but originated with @lindeman_introduction_1980, who
suggested that relative feature importance be computed by averaging the
contributions of each feature to the total variance explained over all possible
orderings of the features. They justified this computationally-expensive
approach by demonstrating that other approaches for computing relative feature
importance (e.g. comparing the magnitude of the regression coefficients or
decomposing total variance using semipartial correlations) yield different
values depending on the order in which features are considered. However, the
connection between LMG and the Shapley value was not made explicit until
@stufken_hierarchical_1992. In what was largely a re-invention of LMG,
@lipovetsky_analysis_2001 introduced **Shapley Net Effects**, but explicitly
appealed to the axiomatically-grounded Shapley value from cooperative game
theory to justify their approach. One critique of LMG and Shapley Net Effects
was that functionally-irrelevant features could receive a non-zero relative
importance when features are dependent. In response, @feldman_proportional_2005
introduced the proportional marginal variance decomposition (**PMVD**) method,
which weights permutations of features in a data-dependent way such that
functionally-irrelevant features are assigned zero importance. These concerns
are the precursor to the indirect influence debate, albeit without that
particular terminology.
### Black-Box Models
A related, but more general, line of research uses the Shapley value to
estimate relative feature importance for arbitrary black-box models by
attributing the model’s total variance explained to individual features.
@owen_sobol_2014 introduced a Shapley-based method for decomposing the variance
of the output of a model, which was later named **Shapley Effect** by
@song_shapley_2016. To simplify computation, Shapley Effect assumes feature
independence and computes Sobol indices in order to provide an upper and lower
bound for the exact Shapley value. Recognizing the limitation of assuming
feature independence, @song_shapley_2016 extended this approach to handle
dependent features. They propose an algorithm to approximate the Shapley value
that extends @castro_polynomial_2009 and involves two levels of sampling in
order to estimate the necessary conditional variances: sampling feature
permutations and sampling from the empirical distribution. @owen_shapley_2017
provides conceptual backing to @song_shapley_2016, making the case that the
Shapley value is the correct approach to estimating feature importance when
features are dependent. The primary alternative, they argue, is a version of
ANOVA, which avoids the feature independence assumption, but introduces
conceptual problems. First, importances can be negative, which the authors
argue is counter-intuitive. Moreover, the possibility of negative importances
allows for a variable that is not functionally used by a model to receive
non-zero importance. @owen_shapley_2017 argues that the Shapley value is
preferred because it avoids both of these limitations.
Questions about how to handle dependent features and whether
functionally-irrelevant features should be attributed importance drove
methodological advancements around the use of Shapley values to generate global
model explanations. In both the linear and black-box settings, the earliest
methods assumed feature independence with more complicated methods that could
account for feature dependence coming later. Both literatures also contain
arguments about the proper way to handle features that have an indirect
influence on the output. In fact, one of the motivations for PMVD is a concern
over the fact that LMG violates the dummy axiom (referred to as the exclusion
axiom in the original work), which says that functionally irrelevant features
should receive zero importance @gromping_estimators_2007. In discussing
the merits of this concern, @gromping_estimators_2007 notes that the
relevance of the dummy axiom depends on the purpose behind computing relative
feature importance. If the purpose is to understand how much a feature
contributes to a model’s predictions (e.g. a model-level explanation), then a
feature that is not functionally used by the model should receive zero
importance. On the other hand, if the purpose is to understand how real-world
interventions impact the model (world-level explanations), then assigning
non-zero importances to functionally irrelevant features is justified. As the
previous section hopefully makes clear and the subsequent section will expand
upon, each of these trends and debates has an analog in the Shapley-based
local explanation literature.
## Local Explanations
In this section, we review Shapley-based methods for generating local
explanations through the lens of our evaluation framework. Much like the
Shapley-based global explanation literature, the methods and associated debates
in this literature are largely driven by two concerns: how to generate
explanations when features are dependent, and whether features that the model
does not explicitly depend upon should be part of an explanation. Our goal is
to provide enough information about each of the methods – the type of
explanatory questions they address, their underlying assumptions, and their
limitations – to allow practitioners to produce correct Shapley-based
explanations. For a summary of the methods, see @tbl-local-explanation-methods.
| **Method** | **Defined Value Function** | **Estimated Value Function** | **Level** | **Question** |
|-------------------------------------------------|----------------------------|----------------------------|-----------|----------------|
| Shapley Regression Values (SRV) | observational conditional | observational conditional | world | associative |
| Shapley Sampling Values (SSV) | observational conditional | unconditional | model | counterfactual |
| KernelSHAP | observational conditional | unconditional | model | counterfactual |
| Conditional Kernel SHAP | observational conditional | observational conditional | world | associative |
| Baseline Shapley (BSHAP) | N/A | N/A | N/A | N/A |
| Quantitative Input Influence (causal-QII) | unconditional | unconditional | model | counterfactual |
| Distal Asymmetric Shapley Values (d-ASV) | observational conditional | | N/A | N/A |
| Proximate Asymmetric Shapley Values (p-ASV) | observational conditional | | N/A | N/A |
| Causal Shapley Values (CSV) | interventional conditional | interventional conditional | both | interventional |
| Shapley Flow (SF) | N/A | N/A | both | counterfactual |
| Recursive Shapley Values (RSV) | N/A | N/A | both | counterfactual |
: Local Shapley Explanation Methods {#tbl-local-explanation-methods}
### Observational and Implicitly-Causal Methods
The earliest and most commonly-used methods leverage a purely observational
approach to generating explanations. Many of these methods define a value
function in one way, but estimate Shapley values that correspond to a different
value function @kumar_problems_2020. For the purposes of evaluating the
correctness of the resulting explanations, what matters is that the estimated
Shapley values are aligned with the target explanatory questions. Most of these
methods rely on some form of sampling to approximate the Shapley values.
Following @merrick_explanation_2020, we advocate for quantifying the
uncertainty in these estimates as part of generating the explanations.
@merrick_explanation_2020 propose computing confidence intervals for the
estimates; however, we note that other methods from the uncertainty
quantification literature are worth exploring. Unfortunately, a review of those
methods is outside the scope of this work.
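As a concrete illustration of this recommendation, the sketch below re-runs an
arbitrary sampling-based estimator several times and reports a
normal-approximation confidence interval for each attribution. The callable
`estimate_shapley` is a hypothetical stand-in for any of the estimators
discussed below, and the interval construction follows the general spirit of
@merrick_explanation_2020 rather than their exact procedure.
```python
import numpy as np
from scipy.stats import norm

def shapley_confidence_intervals(estimate_shapley, n_repeats=30, alpha=0.05):
    """Re-run a sampling-based Shapley estimator and report a normal-
    approximation confidence interval for each feature's attribution.
    `estimate_shapley` is a zero-argument callable returning one vector of
    estimated Shapley values per call (hypothetical stand-in)."""
    draws = np.array([estimate_shapley() for _ in range(n_repeats)])
    mean = draws.mean(axis=0)
    se = draws.std(axis=0, ddof=1) / np.sqrt(n_repeats)
    z = norm.ppf(1 - alpha / 2)
    return mean, mean - z * se, mean + z * se

# Toy usage with a noisy stand-in estimator for two features
rng = np.random.default_rng(0)
noisy = lambda: np.array([1.0, -0.5]) + rng.normal(scale=0.1, size=2)
print(shapley_confidence_intervals(noisy))
```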
#### Shapley Regression Values
@strumbelj_explaining_2009 proposed Interactions Methods for Explanations
(IME), the earliest method for generating Shapley-based local explanations. The
authors define the target Shapley values using a conditional observational
value function (@eq-observational-value-function) and estimate them
by fitting separate models for each subset of features.
@covert_explaining_2020 showed that this brute-force estimation procedure
yields values that are aligned with the defined value function. Based on the
estimation procedure, subsequent work refers to these as **Shapley
Regression Values** (SRV). The resulting explanations are only able to address
target explanatory questions on the first rung of the ladder of causality
(associative/”how” questions).
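The brute-force estimation procedure behind SRV can be sketched as follows:
train one restricted model per feature subset, treat each restricted model's
prediction at the instance as $v(S)$, and combine the payouts using the subset
form of the Shapley value. The use of plain linear regression as the learning
algorithm is an illustrative assumption; the original procedure retrains
whatever learning algorithm (and hyperparameters) produced the model being
explained.
```python
from itertools import combinations
from math import factorial
import numpy as np
from sklearn.linear_model import LinearRegression

def srv_attributions(X, y, x):
    """Brute-force Shapley Regression Values: retrain the same learner on
    every feature subset and use each restricted model's prediction at x
    as the payout v(S). Exponential in the number of features."""
    n = X.shape[1]
    v = {(): float(y.mean())}  # empty coalition: predict the outcome mean
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            model = LinearRegression().fit(X[:, list(S)], y)
            v[S] = float(model.predict(x[list(S)].reshape(1, -1))[0])
    phi = np.zeros(n)
    for i in range(n):
        for S, val in v.items():
            if i in S:
                continue
            w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi[i] += w * (v[tuple(sorted(S + (i,)))] - val)
    return phi

# Toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=200)
print(srv_attributions(X, y, X[0]))
```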
#### Shapley Sampling Values
In follow up work, @strumbelj_efficient_2010 proposed a method that
simulates feature removal using a product of uniform distributions, where the
bounds for each uniform distribution are determined based on the minimum and
maximum values in the training data. The resulting Shapley values are referred
to as **Shapley Sampling Values** (SSV).
Let $\mathcal{U}(X_i)$ refer to the uniform distribution associated with feature $i$:
$$
\hat{v}(S) = E_{\Pi_{i\in D} \mathcal{U}(X_i)}[f(\mathbf{x}_S, \mathbf{X}_{\bar{S}})]
$$ {#eq-shapley-sampling-values}
The estimation procedure for SSV solves two problems with SRV: it does not
require retraining an exponential number of models and does not require full
access to the training data. However, since SSV relies on sampling to estimate
the Shapley values, it is important – under our evaluation framework – to
assess the variability of the resulting Shapley values.
@merrick_explanation_2020 propose a method for generating confidence
intervals for Shapley values that could be used. Methods from the uncertainty
quantification literature are also relevant for this effort. The computational
benefits of SSV also come at the cost of generating Shapley values that do not
align with the intended value function (observational conditional) unless the
features are independent. Therefore, the resulting explanations should not be
used to answer associative world-level explanatory questions unless this
assumption is validated. However, because the estimation procedure yields
values that are unbiased with respect to an unconditional value function, they
can be used (without additional assumptions) to address associative,
interventional, and counterfactual model-level explanatory questions. The
ability to do this comes from the fact that the model itself serves as the
structural causal model required to answer such questions.
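A minimal sketch of the value function in @eq-shapley-sampling-values is shown
below, assuming the explained model `f` accepts a two-dimensional array of
inputs; the removed features are drawn independently from uniform
distributions bounded by the training minima and maxima, and the sample size
is an illustrative choice.
```python
import numpy as np

def ssv_value(f, x, S, X_train, n_samples=1000, rng=None):
    """SSV-style value function: draw each removed feature independently
    from a uniform distribution bounded by its training minimum and
    maximum, keeping the explained values x_S fixed."""
    rng = rng or np.random.default_rng()
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    Xs = rng.uniform(lo, hi, size=(n_samples, X_train.shape[1]))
    Xs[:, S] = x[S]
    return f(Xs).mean()
```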
@strumbelj_explaining_2014 proposed additional improvements to the
approximation algorithm using quasi-random and adaptive sampling. Since the
primary contribution is an efficiency improvement in the approximation
algorithm that relies on the same assumptions, we do not consider this a new
method and the same considerations around quantifying the uncertainty of the
estimates and interpreting the values correctly applies.
#### KernelSHAP
@lundberg_unified_2017 introduced **KernelSHAP**, a new method for estimating
Shapley values defined using an observational conditional value function.
KernelSHAP uses weighted linear regression to estimate the Shapley values and
simulates feature removal using a joint marginal distribution.
$$
\hat{v}(S) = E[f(\mathbf{x}_S, \mathbf{X}_{\bar{S}})]
$$ {#eq-kernel-shap}
KernelSHAP, like SSV, is a sampling-based estimator that requires an
independence assumption for the resulting values to be unbiased with respect to
the defined value function. Therefore, the same considerations around
quantifying uncertainty and interpreting the values as explanations apply.
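The estimation idea can be sketched as follows: sample coalitions, estimate
each coalition's payout by replacing removed features with reference rows (a
joint marginal replacement), weight coalitions with the Shapley kernel, and
solve a weighted least-squares problem whose coefficients approximate the
Shapley values. This sketch simplifies the original algorithm, for example by
sampling coalitions naively and dropping the exact efficiency constraint.
```python
import numpy as np
from math import comb

def kernel_shap(f, x, X_ref, n_coalitions=2048, rng=None):
    """Simplified KernelSHAP-style estimator (illustrative, not the
    reference implementation)."""
    rng = rng or np.random.default_rng()
    d = x.shape[0]
    base = f(X_ref).mean()                             # payout of the empty coalition
    Z = rng.integers(0, 2, size=(n_coalitions, d))     # candidate coalitions
    Z = Z[(Z.sum(axis=1) > 0) & (Z.sum(axis=1) < d)]   # drop empty and full sets
    sizes = Z.sum(axis=1)
    # Shapley kernel: pi(z) = (d - 1) / (C(d, |z|) * |z| * (d - |z|))
    w = (d - 1) / (np.array([comb(d, int(s)) for s in sizes]) * sizes * (d - sizes))
    v = np.empty(len(Z))
    for j, z in enumerate(Z):                          # joint marginal replacement
        Xs = X_ref.copy()
        Xs[:, z == 1] = x[z == 1]
        v[j] = f(Xs).mean()
    sw = np.sqrt(w)
    phi, *_ = np.linalg.lstsq(sw[:, None] * Z, sw * (v - base), rcond=None)
    return base, phi
```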
The authors also proposed the term Shapley Additive Explanations (SHAP), which
has subsequently been used to refer to different concepts in the literature.
The original paper uses the term to refer to the collection of methods that
define the target Shapley value in terms of a conditional observational value
function. By this definition, all of the methods we have discussed thus far
(SRV, SSV, and KernelSHAP) are SHAP methods. However, the term has also been
used to refer to the class of additive feature attribution methods – methods
whose attributions sum to the model’s output. All of the methods that fall
under the SHAP umbrella are additive feature attribution methods, however,
there are other methods (both Shapley-based and otherwise) that fall under this
more general category. In other cases, SHAP is used to refer to the KernelSHAP
estimation procedure or the [SHAP python
package](https://github.com/slundberg/shap), which includes multiple estimation
procedures. Although it is counter to current practice, we recommend against
using the term SHAP because of its multiple meanings and because, by the
original definition, it is redundant with simply defining the value function
associated with a Shapley-based method.
#### Conditional KernelSHAP
@aas_explaining_2020 developed an extension to KernelSHAP that estimates
Shapley values corresponding to an observational conditional value function
rather than an unconditional one. The authors propose four different ways to
approximate the required conditional distributions more efficiently than SRV:
1. Multivariate Gaussian distribution
2. Gaussian copula
3. Empirical conditional distribution
4. Combination
The first option is best when the features are approximately normally
distributed. When the features themselves are not normally distributed, but the
dependence structure between them is well described by a normal distribution,
the second option can be used. When neither the features nor their dependence
structure can be described by a normal distribution, then the third option can
be used. This option is similar to the smoothing-based approach suggested
in @sundararajan_many_2020 and involves taking an expectation over
similar data points. The final option is to use one of the previous three
alternatives depending on the number of features whose removal is simulated via
conditioning. They note that using the empirical conditional distribution works
well for a small number of conditioning variables, but one of the other three
methods should be used otherwise.
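As an illustration of the first option, the sketch below uses the standard
multivariate Gaussian conditioning formulas to sample the removed features
given $\mathbf{x}_S$. This is a generic Gaussian-conditioning sketch under an
assumed fitted mean and covariance, not the authors' implementation.
```python
import numpy as np

def sample_gaussian_conditional(x, S, mu, Sigma, n_samples=1000, rng=None):
    """Sample the removed features conditionally on x_S under a fitted
    multivariate Gaussian: mu_{c|S} = mu_c + Sigma_cS Sigma_SS^{-1} (x_S - mu_S)
    and Sigma_{c|S} = Sigma_cc - Sigma_cS Sigma_SS^{-1} Sigma_Sc, where c
    denotes the removed (complement) features. S is a list of indices."""
    rng = rng or np.random.default_rng()
    c = [i for i in range(len(mu)) if i not in S]
    Sigma_ss = Sigma[np.ix_(S, S)]
    Sigma_cs = Sigma[np.ix_(c, S)]
    Sigma_cc = Sigma[np.ix_(c, c)]
    cond_mu = mu[c] + Sigma_cs @ np.linalg.solve(Sigma_ss, x[S] - mu[S])
    cond_cov = Sigma_cc - Sigma_cs @ np.linalg.solve(Sigma_ss, Sigma_cs.T)
    draws = rng.multivariate_normal(cond_mu, cond_cov, size=n_samples)
    samples = np.tile(x, (n_samples, 1))
    samples[:, c] = draws
    return samples
```
The conditional value function is then approximated by averaging the model's
predictions over the returned samples, i.e. `f(samples).mean()`.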
The authors provide an empirical evaluation of the different alternatives using
simulated data and find that for modest levels of correlation between features
($\rho = 0.05$), all of their proposed extension methods provide better
approximations than KernelSHAP. The authors use this result to claim that
explanations arising from KernelSHAP can be very wrong when features are
dependent. Although it is not explicitly stated, their implicit definition of
correctness is about how closely the method used to estimate the Shapley value
approximates the values as defined.
In our view, it is better to treat KernelSHAP and Conditional KernelSHAP as
approximating different value functions rather than as better and worse
approximations of the same value function. Both estimators yield Shapley values
that can be used to provide correct model explanations provided they are used
to address the correct types of target explanatory questions. For Conditional
KernelSHAP, the resulting Shapley values form the basis for explanations that
can address world-level associative explanatory questions only. Like KernelSHAP
and SRV, Conditional KernelSHAP relies on sampling, so the same considerations
around uncertainty quantification apply. Conditional KernelSHAP requires one
additional consideration: the degree to which the distributional assumption
associated with the approximation technique is valid.
#### BShap
@sundararajan_many_2020 were the first to explore Shapley-based
explanations as a class of methods and discuss the apparent problems with an
observational conditional value function. They show empirically how different
methods that define the value function in the same way yield different Shapley
values for a given feature, rendering the "uniqueness" result of the Shapley
value practically meaningless. They solve this problem by proposing an
alternative axiomatization that lends itself to a truly unique solution known
as **Baseline Shapley** (BShap). This new axiomatization adds three
new axioms (affine scale invariance, demand monotonicity, and proportionality)
to the original three (dummy, symmetry, linearity) required to derive the
original Shapley value.
BShap simulates feature removal by replacing the removed features' values with
the corresponding values from some fixed baseline ($\mathbf{x}'$).
$$
\hat{v}(S) = f(\mathbf{x}_S, \mathbf{x}'_{\bar{S}})
$$
The authors also introduce an extension to BShap called **Random Baseline
Shapley** (RBShap) that takes an expectation over a collection of baseline
values drawn according to some distribution $\mathcal{D}$.
$$
\hat{v}(S) = E_{\mathcal{D}}[f(\mathbf{x}_S, \mathbf{x}'_{\bar{S}})]
$$
They show that various Shapley-based methods can be subsumed under RBShap
depending on the choice of $\mathcal{D}$ and for this reason, we treat RBShap
as a unification approach rather than a separate Shapley-based method.
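Because the BShap value function requires only model evaluations at
combinations of $\mathbf{x}$ and the baseline $\mathbf{x}'$, exact attributions
can be computed directly when the number of features is small. The sketch
below enumerates all subsets; the assumption that the model accepts a
two-dimensional array is illustrative.
```python
import numpy as np
from itertools import combinations
from math import factorial

def bshap(f, x, baseline):
    """Exact Baseline Shapley for a small number of features: v(S) is the
    model's prediction with features in S taken from x and the remaining
    features taken from the fixed baseline x'."""
    d = len(x)

    def v(S):
        z = baseline.copy()
        z[list(S)] = x[list(S)]
        return float(f(z.reshape(1, -1))[0])

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in combinations(others, k):
                w = factorial(k) * factorial(d - k - 1) / factorial(d)
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi
```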
The authors were also the first in the Shapley-based local explanation
literature to explore the theoretical and practical problems with defining the
value function using an observational conditional expectation. First,
computing the necessary conditional expectations is computationally challenging
and fraught with additional complications. For example, using the training data
to approximate the conditional distributions can be problematic due to
sparsity. Conditioning on the "removed" features can be seen as filtering the
training data down to those observations that agree with the instance being
explained and then taking the expectation over the remaining observations.
Especially when continuous variables are involved, the number of training data
observations that match the instance to be explained is likely small. This
sparsity problem means that the conditional expectation must, practically
speaking, be estimated using some other approximation technique (e.g. one of
the four alternatives noted in @aas_explaining_2020). However, they note
that these techniques either involve additional assumptions or computational
complexity. Second, @sundararajan_many_2020 showed that using an
observational conditional expectation can lead to attributions that, under
certain conditions, violate the Shapley axioms. In particular, they demonstrate
that when features are correlated, a feature that is not functionally used by
the model can receive a non-zero attribution, which violates the dummy axiom.
As we saw earlier, the same argument was previously made in the Shapley-based
global explanation literature, and is directly related to the indirect
influence debate.
### Causal Methods
In contrast to the previous section, the following methods all explicitly
incorporate causal reasoning into the explanation-generating process. The
primary differentiator between these methods is the causal assumptions and
auxiliary causal information required in order to generate the Shapley values.
#### Causal Quantitative Input Influence (QII)
@datta_algorithmic_2016 introduced a family of measures for quantifying how
much the inputs to a system influence the outputs. The ultimate goal is to use
these measures to generate a “transparency report” for individuals subjected to
an automated decision. Reminiscent of trends (both earlier and
contemporaneously) in the relative importance literature, they were interested
in providing measures of influence that take the correlation between inputs
into account. However, they were the first, in both the global and local
Shapley-based explanation literature, to frame this objective in explicitly
causal terms with their **causal QII** method. As the emphasis on treating
the model as an input-output system makes clear, they were concerned with
model-level explanations. They define causal QII using an unconditional value
function and simulate feature removal using the product of the marginal
distributions of removed features.
$$
v(S) = \hat{v}(S) = E_{\Pi_{i \in \bar{S}} p(X_i)}[f(\mathbf{x}_S, \mathbf{X}_{\bar{S}})]
$$
Like other Shapley-based methods that approximate an unconditional value
function, explanations derived from causal QII are able to address model-level
counterfactual (rung 3) questions. Although causal QII is limited to
model-level explanations, it does not require any auxiliary information. There
are two main differences between QII and other methods that approximate an
unconditional value function. First, causal QII uses a different
marginalization method than either KernelSHAP or SSV. Second, causal QII is
motivated by taking an explicitly-causal perspective rather than assuming
feature independence in order to simplify computing values associated with an
observational conditional value function. It is this second difference that
@janzing_feature_2019 are homing in on when they argue that the use of an
unconditional value function, as implicitly argued by the creators of causal
QII, is justified by a causal perspective.
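The first difference can be illustrated as follows: joint marginal replacement
substitutes whole reference rows for the removed features, while the
product-of-marginals replacement used by causal QII draws each removed feature
independently from its own empirical marginal, breaking the dependence among
the removed features as well. Function and variable names in this sketch are
ours and purely illustrative.
```python
import numpy as np

def joint_marginal_value(f, x, S, X_ref):
    """Replace removed features with whole reference rows (joint marginal)."""
    Xs = X_ref.copy()
    Xs[:, S] = x[S]
    return f(Xs).mean()

def product_marginal_value(f, x, S, X_ref, rng=None):
    """Causal-QII-style replacement: draw each removed feature independently
    from its own empirical marginal (here, by permuting each column)."""
    rng = rng or np.random.default_rng()
    Xs = X_ref.copy()
    for j in range(X_ref.shape[1]):
        if j not in S:
            Xs[:, j] = rng.permutation(X_ref[:, j])
    Xs[:, S] = x[S]
    return f(Xs).mean()
```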
#### Asymmetric Shapley Values
@frye_asymmetric_2020 introduced **Asymmetric Shapley Values** (ASV), the
first method that leverages auxiliary causal information to generate
Shapley-based explanations. They define the value function using an
observational conditional expectation
(@eq-observational-value-function). Recognizing the difficulty of
providing (and defending) a full graphical causal model, ASV requires only a
partial causal ordering of the features. For example, given a set of features
$X_1, X_2, ..., X_n$, a practitioner may provide the ordering $\{X_1, X_2\}$
indicating that $X_1$ is a causal ancestor of $X_2$. In **Distal
Asymmetric Shapley Values** (d-ASV), this causal information is included by
assigning zero weight to any permutations
(@eq-permutation-shapley-formula) for which $X_2$ precedes $X_1$.
They argue that this aligns with explanations that attribute effects to root
causes. Alternatively, **Proximate Asymmetric Shapley Values** (p-ASV)
assigns zero weight to any permutations for which $X_1$ precedes $X_2$ such
that attributions favor immediate causes. The non-uniform weighting of the
permutations results in values that violate the symmetry axiom, which
corresponds to a quasivalue from cooperative game theory.
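A minimal sketch of the d-ASV weighting scheme is given below, assuming a
generic value function `v` defined over (sorted) tuples of feature indices:
permutations that violate the supplied ancestor-descendant ordering receive
zero weight and the remaining permutations are weighted uniformly, with
`distal=False` yielding the proximate variant. Enumerating all permutations is
only practical for a small number of features.
```python
import numpy as np
from itertools import permutations

def asymmetric_shapley(v, d, ancestor_pairs, distal=True):
    """Asymmetric Shapley values: average marginal contributions only over
    permutations consistent with the partial causal ordering. Each pair
    (a, b) in `ancestor_pairs` declares a a causal ancestor of b."""
    allowed = []
    for perm in permutations(range(d)):
        pos = {feat: idx for idx, feat in enumerate(perm)}
        ok = all(pos[a] < pos[b] if distal else pos[a] > pos[b]
                 for a, b in ancestor_pairs)
        if ok:
            allowed.append(perm)
    phi = np.zeros(d)
    for perm in allowed:
        for idx, i in enumerate(perm):
            S = tuple(sorted(perm[:idx]))
            phi[i] += v(tuple(sorted(S + (i,)))) - v(S)
    return phi / len(allowed)
```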
```{dot}
//| label: fig-asv-alt
//| fig-cap: Alternative Graphical Causal Model with $\{X_1, X_2\}$
//| file: figures/asv_alt.dot
```
One problem with ASV is that a single causal ordering is consistent with
multiple graphical causal models. For example, the ordering $\{X_1, X_2\}$ is
consistent with both (@fig-chain) and (@fig-asv-alt). If the goal is
to generate attributions that recover the causal relationship between $X_1$
and $Y$, these differences in the underlying graphical causal models are
relevant (they require different conditioning variables to satisfy the backdoor
criterion),
but are not accounted for in ASV. As a result, @heskes_causal_2020 note
that even though ASV incorporates causal information, it can sometimes lead to
improper (i.e. counter-intuitive) causal explanations.
The Shapley values generated by ASV do not map cleanly onto the types of
explanatory questions as we have defined them based on Pearl-style causality. ASV is
not able to address model-level explanations because the estimated Shapley
values do not align with an unconditional value function. When any partial
causal ordering is provided, ASV is not able to address associative world-level
questions because the weights assigned to different permutations lead to
Shapley values that differ from those where a uniform weighting is applied.
Similarly, these weights lead to values that also do not, in general, match
values based on an interventional conditional value function
@heskes_causal_2020. Although ASV-based Shapley values may provide
valuable insights in other contexts, under our evaluation framework, they
should not be used to provide model explanations.
#### Causal Shapley Values
@heskes_causal_2020 introduced **Causal Shapley Values** (CSV), an extension to
ASV that leads to Shapley values with a proper causal interpretation. Like
@janzing_feature_2019, they define the target Shapley values using an
interventional conditional value function
(@eq-interventional-value-function). However, their approach is more closely
aligned with @frye_asymmetric_2020 in that they are interested in generating
world-level explanations without requiring a full causal graph. Their key idea
is to use a partial causal ordering of groups of features, along with
information about whether the features within a group share a common ancestor
or mutually interact, to generate a DAG of components that is used in lieu of a
GCM (also a DAG). As a result, the practitioner does not need to provide a full
GCM, but only a causal chain graph, which has a well-defined interventional
formula that they derive using Pearl's do-calculus.
The authors note that one of the main benefits of CSV is that the resulting
explanations are able to differentiate between “direct” and “indirect” causal
effects. These ideas are directly related to the indirect influence debate and
our notion of levels of explanation. A direct causal effect is the causal
effect of a feature on the model’s output. Shapley values based on an
unconditional value function are only able to estimate these direct effects. As
we noted previously, this means that features that are not functionally used by
the model have zero direct effect. In contrast, a feature that is not
functionally used may still have a non-zero indirect causal effect. We prefer
to view these as providing different levels of explanation: a direct causal
effect corresponds to a model-level explanation and an indirect causal effect
corresponds to a world-level explanation.
Another contribution of their work is to clarify that whether a Shapley value
is symmetric or asymmetric is a choice that can be made independently of how
the value function is specified. While this may be obvious from examining
@eq-permutation-shapley-formula and is well-known in the cooperative
game theory literature (asymmetric values are known as quasivalues), it had not
been surfaced previously in the Shapley-based model explanation literature.
CSV has two practical limitations: it requires the explainer to provide
substantial auxiliary causal information and requires approximating conditional
distributions. The first is problematic as this type of auxiliary causal
information simply may not be available because neither the explainer nor the
explainee has sufficient domain expertise to provide the necessary
information. The second limitation is not unique to CSV, but is still relevant
to assessing the correctness of the resulting explanations. One of the
alternatives for approximating the necessary conditional distributions proposed
by @aas_explaining_2020 can be used; however, the considerations discussed
earlier for applying these approaches still apply. When
the necessary causal information is available and the required conditional
distributions can be approximated, then CSV is a compelling option because it
is able to generate explanations that address all types of model-level
questions as well as associative and interventional world-level questions.
#### Shapley Flow
@wang_shapley_2021 develop **Shapley Flow** (SF), which extends the
set-based Shapley axioms to arbitrary graphs. Like ASV and CSV, they are
interested in an approach that is able to generate world-level explanations,
and like CSV, SF is able to generate both world and model-level explanations.
One of the motivations for SF is that CSV divides the credit between a feature
and its causal descendants, which they view as a counter-intuitive attribution
policy. For example, in a chain (see @fig-chain), their critique is that CSV
splits the credit that should be assigned to $X_1$ between $X_1$ and $X_2$. To
avoid this issue, they use a rather idiosyncratic game
formulation that requires the explainer to provide a structural causal model.
The graphical causal model associated with the SCM contains nodes for each
feature that is causally-related to the output, whether or not it is
functionally used by the model. The edges in the GCM represent one of two
things: a functional relationship between the feature and the model output, or
a causal relationship between the features whether or not they are used by the
model.
Shapley Flow departs from the typical game formulation, treating source-to-sink
paths in the provided SCM as the players in the game and a partial ordering of
these paths as the coalitions. Attributions are assigned to edges, whereas
other methods assign credit to individual nodes. The attribution for an
individual feature can be computed by summing the attributions of all incoming
edges. The importance of each edge is computed by considering how much the
model output changes when the edge is added. To simulate edge removal, they
introduce the notion of active versus inactive edges. The foreground value is
passed when the edge is active and the background value is passed when it is
inactive. This foreground value is computed using the equation specified by the
SCM. A background value can be a single value or a distribution of values.
These background values are similar to BShap/RBShap @sundararajan_many_2020 and
single reference games and reference distributions from
@merrick_explanation_2020. Using this setup, SF is capable of generating both
model and world-level explanations.
The authors introduce the notion of a “boundary of explanation,” which is a
more flexible way of framing the distinction between model and world-level
explanations. To make things concrete, consider @fig-janzing-model-vs-world.
One boundary of explanation treats $\hat{Y}$ as the sink node and includes the
edges $\{(X_1, \hat{Y}), (X_2, \hat{Y})\}$. This boundary leads to model-level
explanations. Alternatively, the edges $\{(\tilde{X_1}, X_1), (\tilde{X_2},
X_2)\}$ lead to world-level explanations. One of the Shapley Flow axioms is
boundary consistency, which ensures that the attribution for a given edge is
the same across different explanation boundaries. To satisfy this axiom, they
assign zero weight to certain orderings, which is part of the reason for the
idiosyncratic game formulation.
In principle, the SF framework is capable of generating explanations that
address both model and world-level explanatory questions of all types
(associative, interventional, and counterfactual). However, this power comes at
the cost of requiring a structural causal model. As we saw earlier, an SCM is
composed of a GCM as well as the functional (mathematical) equations governing
the relationships between features. While challenging, it is conceivable that
the explainer or explainee may be able to provide a defensible GCM, that is, one
that is consistent with the data as well as their domain expertise. However,
the further practical problem of identifying the functional equations between
features still remains. In their examples provided as part of the appendix,
@wang_shapley_2021 approximate these functional relationships by training
additional models. Each auxiliary model uses an endogenous feature from the GCM
as the outcome and the parents of that feature as the inputs. The number of
auxiliary models that must be trained is equal to the number of endogenous
variables in the proposed GCM. Although SF is quite powerful, these practical
considerations likely make the method infeasible for many use cases.
#### Recursive Shapley Values
@singal_flow-based_2021 introduced an alternative flow-based solution to
the attribution problem using Shapley values called **Recursive Shapley
Values** (RSV). RSV requires a graphical model as well as the functional
relationships between variables; however, the authors note that the
relationships do not need to be causal, allowing them to capture arbitrary
computation (e.g. a neural network).
RSV shares some similarities with SF, but differs in how the game is formulated
and how the Shapley values are computed. Like SF, RSV treats the provided graph
as a message-passing system (e.g. foreground and background values), assigns
attributions to edges, and derives from a set of flow-based axioms that mirror
the Shapley axioms. The players are the edges and coalitions are sets of edges,
which more closely mirrors node-based approaches where features are players and
sets of features constitute the coalitions. The final attributions are computed
by combining the Shapley values from a sequence of games defined recursively in
a top-down fashion starting with source nodes. In their view, this is a more
natural way to formulate the game than the idiosyncratic way introduced by
@wang_shapley_2021.
RSV can be used to provide either model or world-level explanations. To
generate model-level explanations, the explainer provides a graph where the
model features are the source nodes and the model output is the sink node.
World-level explanations can be generated by providing a structural causal
model, which may include features not functionally used by the model. RSV
suffers from the same practical limitations as SF, but is also potentially more
computationally expensive. Because RSV is defined recursively, Shapley values
must be computed for a sequence of games, rather than for a single game.