Update vignettes ++ #421

Merged: 14 commits, Nov 22, 2024
10 changes: 6 additions & 4 deletions R/approach_copula.R
@@ -63,6 +63,7 @@ prepare_data.copula <- function(internal, index_features, ...) {

S <- internal$iter_list[[iter]]$S[index_features, , drop = FALSE]


if (causal_sampling) {
# Causal Shapley values (either symmetric or asymmetric)

@@ -73,11 +74,12 @@ prepare_data.copula <- function(internal, index_features, ...) {
prepare_copula <- ifelse(causal_first_step, prepare_data_copula_cpp, prepare_data_copula_cpp_caus)

# Set if we have to reshape the output of the prepare_gauss function
reshape_prepare_gauss_output <- ifelse(causal_first_step, TRUE, FALSE)
reshape_prepare_copula_output <- ifelse(causal_first_step, TRUE, FALSE)

# When not in the first step, the number of MC samples for causal Shapley values is n_explain, see prepare_data_causal
n_MC_samples_updated <- ifelse(causal_first_step, n_MC_samples, n_explain)


# Update data when not in the first causal sampling step, see prepare_data_causal for explanations
if (!causal_first_step) {
# Update the `copula.x_explain_gaussian_mat`
@@ -93,12 +95,12 @@ prepare_data.copula <- function(internal, index_features, ...) {
} else {
# Regular Shapley values (either symmetric or asymmetric)

# Set which copula data generating function to use
prepare_copula <- prepare_data_copula_cpp

# Set if we have to reshape the output of the prepare_copula function
reshape_prepare_copula_output <- TRUE

# Set which copula data generating function to use
prepare_copula <- prepare_data_copula_cpp

# Set that the number of updated MC samples, only used when sampling from N(0, 1)
n_MC_samples_updated <- n_MC_samples
}
4 changes: 2 additions & 2 deletions R/shapley_setup.R
@@ -126,8 +126,8 @@ shapley_setup <- function(internal) {
S_causal_steps_unique <- unique(S_causal_unlist[grepl("\\.S(?!bar)", names(S_causal_unlist), perl = TRUE)]) # Get S
S_causal_steps_unique <- S_causal_steps_unique[!sapply(S_causal_steps_unique, is.null)] # Remove NULLs
S_causal_steps_unique <- S_causal_steps_unique[lengths(S_causal_steps_unique) > 0] # Remove extra integer(0)
S_causal_steps_unique <- c(list(integer(0)), S_causal_steps_unique, list(seq(n_shapley_values)))
S_causal_steps_unique_S <- coalition_matrix_cpp(coalitions = S_causal_steps_unique, m = n_shapley_values)
S_causal_steps_unique <- c(list(integer(0)), S_causal_steps_unique, list(seq(n_features)))
S_causal_steps_unique_S <- coalition_matrix_cpp(coalitions = S_causal_steps_unique, m = n_features)

# Insert into the internal list
internal$iter_list[[iter]]$S_causal_steps <- S_causal_steps
67 changes: 46 additions & 21 deletions README.Rmd
@@ -28,24 +28,40 @@ knitr::opts_chunk$set(

## Brief NEWS

This is `shapr` version 1.0.0, which provides a full suite of new functionality.
See the [NEWS](https://github.com/NorskRegnesentral/shapr/blob/master/NEWS.md) for details
This is `shapr` version 1.0.0 (released on GitHub Nov 2024), which provides a full restructuring of the code base, and
provides a full suite of new functionality, including:

### Breaking change (June 2023)
* A long list of approaches for estimating the contribution/value function $v(S)$ (briefly recapped after this list), including Variational Autoencoders
and regression-based methods
* Iterative Shapley value estimation with convergence detection
* Parallelized computations with progress updates
* Reweighted Kernel SHAP for faster convergence
* New function `explain_forecast()` for explaining forecasts
* Several other methodological, computational and user-experience improvements
* Python wrapper making the core functionality of `shapr` available in Python
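
As background (not part of this PR), the quantities above are the standard ones for conditional Shapley values: the contribution function of a coalition $S$ is the conditional expectation of the prediction given the features in $S$, and the Shapley value of feature $j$ is its weighted average marginal contribution over coalitions:

$$
v(S) = \mathrm{E}\left[f(\boldsymbol{x}) \mid \boldsymbol{x}_S = \boldsymbol{x}_S^*\right],
\qquad
\phi_j = \sum_{S \subseteq \mathcal{M} \setminus \{j\}} \frac{|S|! \, (M - |S| - 1)!}{M!} \left( v(S \cup \{j\}) - v(S) \right),
$$

where $\mathcal{M}$ denotes the set of all $M$ features and $\boldsymbol{x}^*$ is the observation being explained.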

As of version 0.2.3.9000, the development version of shapr (master branch on GitHub from June 2023) has been severely restructured, introducing a new syntax for explaining models, and thereby introducing a range of breaking changes. This essentially amounts to using a single function (`explain()`) instead of two functions (`shapr()` and `explain()`).
The CRAN version of `shapr` (v0.2.2) still uses the old syntax.
Below we provide a brief overview of the breaking changes.
See the [NEWS](https://github.com/NorskRegnesentral/shapr/blob/master/NEWS.md) for the full list of details.

### Breaking changes

The new syntax for explaining models essentially amounts to using a single function (`explain()`) instead of two functions (`shapr()` and `explain()`).
In addition, custom models are now explained by passing the prediction function directly to `explain()`,
some input arguments got new names, and a few functions for edge cases were removed to simplify the code base.

Note that the CRAN version of `shapr` (v0.2.2) still uses the old syntax.
The examples below use the new syntax.
[Here](https://github.com/NorskRegnesentral/shapr/blob/cranversion_0.2.2/README.md) is a version of this README with the syntax of the CRAN version (v0.2.2).
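
To make the change concrete, here is a minimal sketch contrasting the two syntaxes. The object names (`x_train`, `x_explain`, `p0`) are placeholders, and the name of the baseline-prediction argument in the new syntax (`phi0`, previously `prediction_zero`) should be checked against the installed version.

```{r, eval = FALSE}
# Old syntax (CRAN v0.2.2): a two-step workflow
explainer <- shapr(x_train, model)
explanation <- explain(
  x_explain,
  approach = "empirical",
  explainer = explainer,
  prediction_zero = p0
)

# New syntax (v1.0.0): a single call to explain()
explanation <- explain(
  model = model,
  x_explain = x_explain,
  x_train = x_train,
  approach = "empirical",
  phi0 = p0
)
```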

### Python wrapper

As of version 0.2.3.9100 (master branch on GitHub from June 2023), we provide a Python wrapper (`shaprpy`) which allows explaining python models with the methodology implemented in `shapr`, directly from Python. The wrapper is available [here](https://github.com/NorskRegnesentral/shapr/tree/master/python). See also details in the [NEWS](https://github.com/NorskRegnesentral/shapr/blob/master/NEWS.md).
We now also provide a Python wrapper (`shaprpy`) which allows explaining Python models with the methodology implemented in `shapr`, directly from Python.
The wrapper is available [here](https://github.com/NorskRegnesentral/shapr/tree/master/python).


## The package

The shapr R package implements an enhanced version of the KernelSHAP method, for approximating Shapley values,
The `shapr` R package implements an enhanced version of the Kernel SHAP method, for approximating Shapley values,
with a strong focus on conditional Shapley values.
The core idea is to remain completely model-agnostic while offering a variety of methods for estimating contribution
functions, enabling accurate computation of conditional Shapley values across different feature types, dependencies,
@@ -62,39 +78,48 @@ for details and further examples.

## Installation

To install the current stable release from CRAN (note, using the old explanation syntax), use

```{r, eval = FALSE}
install.packages("shapr")
```

To install the current development version (with the new explanation syntax), use
We highly recommend installing the development version of `shapr` (with the new explanation syntax and all functionality):

```{r, eval = FALSE}
remotes::install_github("NorskRegnesentral/shapr")
```

If you would like to install all packages of the models we currently support, use
To also install all dependencies, use

```{r, eval = FALSE}
remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE)
```

**The CRAN version of `shapr` (NOT RECOMMENDED) can be installed with**

If you would also like to build and view the vignette locally, use
```{r, eval = FALSE}
remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE, build_vignettes = TRUE)
vignette("understanding_shapr", "shapr")
install.packages("shapr")
```

You can always check out the latest version of the vignette [here](https://norskregnesentral.github.io/shapr/articles/understanding_shapr.html).

## Example
`shapr` supports computation of Shapley values with any predictive model which takes a set of numeric features and produces a numeric outcome.
`shapr` supports computation of Shapley values with any predictive model which takes a set of numeric features and
produces a numeric outcome.

The following example shows how a simple `xgboost` model is trained using the *airquality* dataset, and how `shapr` explains the individual predictions.
The following example shows how a simple `xgboost` model is trained using the *airquality* dataset, and how `shapr`
explains the individual predictions.

We first enable parallel computation and progress updates with the following code chunk.
These are optional, but recommended for improved performance and user friendliness,
particularly for problems with many features.

```{r init_no_eval,eval = FALSE}
# Enable parallel computation
# Requires the future and future.apply packages
future::plan("multisession", workers = 2) # Increase the number of workers for increased performance with many features

# Enable progress updates of the v(S)-computations
# Requires the progressr package
progressr::handlers(global = TRUE)
handlers("cli") # Using the cli package as backend (recommended for the estimates of the remaining time)
```

Here is the actual example:
```{r basic_example, warning = FALSE}
library(xgboost)
library(shapr)
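
# NOTE: the example chunk above is collapsed in this diff view. The following
# is a self-contained, hedged sketch of a typical workflow of this kind, not
# necessarily the exact code in the PR; the argument name `phi0` (the baseline
# prediction) is assumed from shapr 1.0.0 and may differ in other versions.
library(xgboost)
library(shapr)
library(data.table)

data("airquality")
data <- as.data.table(airquality)
data <- data[complete.cases(data), ]

x_var <- c("Solar.R", "Wind", "Temp", "Month")
y_var <- "Ozone"

# Explain the first six observations; train on the rest
ind_x_explain <- 1:6
x_train <- data[-ind_x_explain, ..x_var]
y_train <- data[-ind_x_explain][[y_var]]
x_explain <- data[ind_x_explain, ..x_var]

# Fit a basic xgboost model to the training data
model <- xgboost(
  data = as.matrix(x_train),
  label = y_train,
  nrounds = 20,
  verbose = FALSE
)

# Baseline prediction (phi_0): the expected prediction without any features
p0 <- mean(y_train)

# Conditional Shapley values using the empirical approach
explanation <- explain(
  model = model,
  x_explain = x_explain,
  x_train = x_train,
  approach = "empirical",
  phi0 = p0
)

# Inspect or plot the estimated Shapley values
plot(explanation)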
90 changes: 55 additions & 35 deletions README.md
@@ -18,37 +18,51 @@ MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.or

## Brief NEWS

This is `shapr` version 1.0.0, which provides a full suite of new
functionality. See the
This is `shapr` version 1.0.0 (released on GitHub Nov 2024), which
provides a full restructuring of the code base, and provides a full
suite of new functionality, including:

- A long list of approaches for estimating the contribution/value
function $v(S)$, including Variational Autoencoders, and
regression-based methods
- Iterative Shapley value estimation with convergence detection
- Parallelized computations with progress updates
- Reweighted Kernel SHAP for faster convergence
- New function `explain_forecast()` for explaining forecasts
- Several other methodological, computational and user-experience
improvements
- Python wrapper making the core functionality of `shapr` available in
Python

Below we provide a brief overview of the breaking changes. See the
[NEWS](https://github.com/NorskRegnesentral/shapr/blob/master/NEWS.md)
for details
for the full list of details.

### Breaking change (June 2023)
### Breaking changes

As of version 0.2.3.9000, the development version of shapr (master
branch on GitHub from June 2023) has been severely restructured,
introducing a new syntax for explaining models, and thereby introducing
a range of breaking changes. This essentially amounts to using a single
function (`explain()`) instead of two functions (`shapr()` and
`explain()`). The CRAN version of `shapr` (v0.2.2) still uses the old
The new syntax for explaining models essentially amounts to using a
single function (`explain()`) instead of two functions (`shapr()` and
`explain()`). In addition, custom models are now explained by passing
the prediction function directly to `explain()`, some input arguments
got new names, and a few functions for edge cases were removed to
simplify the code base.

Note that the CRAN version of `shapr` (v0.2.2) still uses the old
syntax. The examples below use the new syntax.
[Here](https://github.com/NorskRegnesentral/shapr/blob/cranversion_0.2.2/README.md)
is a version of this README with the syntax of the CRAN version
(v0.2.2).

### Python wrapper

As of version 0.2.3.9100 (master branch on GitHub from June 2023), we
provide a Python wrapper (`shaprpy`) which allows explaining python
models with the methodology implemented in `shapr`, directly from
We now also provide a Python wrapper (`shaprpy`) which allows explaining
Python models with the methodology implemented in `shapr`, directly from
Python. The wrapper is available
[here](https://github.com/NorskRegnesentral/shapr/tree/master/python).
See also details in the
[NEWS](https://github.com/NorskRegnesentral/shapr/blob/master/NEWS.md).

## The package

The shapr R package implements an enhanced version of the KernelSHAP
The `shapr` R package implements an enhanced version of the Kernel SHAP
method, for approximating Shapley values, with a strong focus on
conditional Shapley values. The core idea is to remain completely
model-agnostic while offering a variety of methods for estimating
@@ -68,37 +82,25 @@ for details and further examples.

## Installation

To install the current stable release from CRAN (note, using the old
explanation syntax), use

``` r
install.packages("shapr")
```

To install the current development version (with the new explanation
syntax), use
We highly recommend installing the development version of `shapr` (with
the new explanation syntax and all functionality):

``` r
remotes::install_github("NorskRegnesentral/shapr")
```

If you would like to install all packages of the models we currently
support, use
To also install all dependencies, use

``` r
remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE)
```

If you would also like to build and view the vignette locally, use
**The CRAN version of `shapr` (NOT RECOMMENDED) can be installed with**

``` r
remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE, build_vignettes = TRUE)
vignette("understanding_shapr", "shapr")
install.packages("shapr")
```

You can always check out the latest version of the vignette
[here](https://norskregnesentral.github.io/shapr/articles/understanding_shapr.html).

## Example

`shapr` supports computation of Shapley values with any predictive model
Expand All @@ -108,6 +110,24 @@ The following example shows how a simple `xgboost` model is trained
using the *airquality* dataset, and how `shapr` explains the individual
predictions.

We first enable parallel computation and progress updates with the
following code chunk. These are optional, but recommended for improved
performance and user friendliness, particularly for problems with many
features.

``` r
# Enable parallel computation
# Requires the future and future.apply packages
future::plan("multisession", workers = 2) # Increase the number of workers for increased performance with many features

# Enable progress updates of the v(S)-computations
# Requires the progressr package
progressr::handlers(global = TRUE)
handlers("cli") # Using the cli package as backend (recommended for the estimates of the remaining time)
```

Here is the actual example:

``` r
library(xgboost)
library(shapr)
@@ -158,14 +178,14 @@ explanation <- explain(
#> max_n_coalitions is NULL or larger than or 2^n_features = 16,
#> and is therefore set to 2^n_features = 16.
#>
#> ── Starting `shapr::explain()` at 2024-10-23 19:31:59 ──────────────────────────
#> ── Starting `shapr::explain()` at 2024-11-20 12:23:18 ──────────────────────────
#> • Model class: <xgb.Booster>
#> • Approach: empirical
#> • Iterative estimation: FALSE
#> • Number of feature-wise Shapley values: 4
#> • Number of observations to explain: 6
#> • Computations (temporary) saved at:
#> '/tmp/Rtmp6d4Iza/shapr_obj_3be21200fd9e8.rds'
#> '/tmp/Rtmp4yBCHY/shapr_obj_17459f7fdc4b8f.rds'
#>
#> ── Main computation started ──
#>