Update vignettes ++ #421

Merged: 14 commits, Nov 22, 2024
10 changes: 6 additions & 4 deletions R/approach_copula.R
@@ -63,6 +63,7 @@ prepare_data.copula <- function(internal, index_features, ...) {

S <- internal$iter_list[[iter]]$S[index_features, , drop = FALSE]


if (causal_sampling) {
# Causal Shapley values (either symmetric or asymmetric)

@@ -73,11 +74,12 @@ prepare_data.copula <- function(internal, index_features, ...) {
prepare_copula <- ifelse(causal_first_step, prepare_data_copula_cpp, prepare_data_copula_cpp_caus)

# Set if we have to reshape the output of the prepare_gauss function
reshape_prepare_gauss_output <- ifelse(causal_first_step, TRUE, FALSE)
reshape_prepare_copula_output <- ifelse(causal_first_step, TRUE, FALSE)

# When not in the first step, the number of MC samples for causal Shapley values is n_explain, see prepare_data_causal
n_MC_samples_updated <- ifelse(causal_first_step, n_MC_samples, n_explain)


# Update data when not in the first causal sampling step, see prepare_data_causal for explanations
if (!causal_first_step) {
# Update the `copula.x_explain_gaussian_mat`
@@ -93,12 +95,12 @@ prepare_data.copula <- function(internal, index_features, ...) {
} else {
# Regular Shapley values (either symmetric or asymmetric)

# Set which copula data generating function to use
prepare_copula <- prepare_data_copula_cpp

# Set if we have to reshape the output of the prepare_copula function
reshape_prepare_copula_output <- TRUE

# Set which copula data generating function to use
prepare_copula <- prepare_data_copula_cpp

# Set that the number of updated MC samples, only used when sampling from N(0, 1)
n_MC_samples_updated <- n_MC_samples
}
4 changes: 2 additions & 2 deletions R/shapley_setup.R
@@ -126,8 +126,8 @@ shapley_setup <- function(internal) {
S_causal_steps_unique <- unique(S_causal_unlist[grepl("\\.S(?!bar)", names(S_causal_unlist), perl = TRUE)]) # Get S
S_causal_steps_unique <- S_causal_steps_unique[!sapply(S_causal_steps_unique, is.null)] # Remove NULLs
S_causal_steps_unique <- S_causal_steps_unique[lengths(S_causal_steps_unique) > 0] # Remove extra integer(0)
S_causal_steps_unique <- c(list(integer(0)), S_causal_steps_unique, list(seq(n_shapley_values)))
S_causal_steps_unique_S <- coalition_matrix_cpp(coalitions = S_causal_steps_unique, m = n_shapley_values)
S_causal_steps_unique <- c(list(integer(0)), S_causal_steps_unique, list(seq(n_features)))
S_causal_steps_unique_S <- coalition_matrix_cpp(coalitions = S_causal_steps_unique, m = n_features)

# Insert into the internal list
internal$iter_list[[iter]]$S_causal_steps <- S_causal_steps
67 changes: 46 additions & 21 deletions README.Rmd
@@ -28,24 +28,40 @@ knitr::opts_chunk$set(

## Brief NEWS

This is `shapr` version 1.0.0, which provides a full suite of new functionality.
See the [NEWS](https://github.com/NorskRegnesentral/shapr/blob/master/NEWS.md) for details
This is `shapr` version 1.0.0 (released on GitHub Nov 2024), which provides a full restructuring of the code base, and
provides a full suite of new functionality, including:

### Breaking change (June 2023)
* A long list of approaches for estimating the contribution/value function $v(S)$ (briefly recapped after this list), including Variational Autoencoders
and regression-based methods
* Iterative Shapley value estimation with convergence detection
* Parallelized computations with progress updates
* Reweighted Kernel SHAP for faster convergence
* New function `explain_forecast()` for explaining forecasts
* Several other methodological, computational and user-experience improvements
* Python wrapper making the core functionality of `shapr` available in Python
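
As background (not part of this PR), the quantities above are the standard ones for conditional Shapley values: the contribution function of a coalition $S$ is the conditional expectation of the prediction given the features in $S$, and the Shapley value of feature $j$ is its weighted average marginal contribution over coalitions:

$$
v(S) = \mathrm{E}\left[f(\boldsymbol{x}) \mid \boldsymbol{x}_S = \boldsymbol{x}_S^*\right],
\qquad
\phi_j = \sum_{S \subseteq \mathcal{M} \setminus \{j\}} \frac{|S|! \, (M - |S| - 1)!}{M!} \left( v(S \cup \{j\}) - v(S) \right),
$$

where $\mathcal{M}$ denotes the set of all $M$ features and $\boldsymbol{x}^*$ is the observation being explained.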

As of version 0.2.3.9000, the development version of shapr (master branch on GitHub from June 2023) has been severely restructured, introducing a new syntax for explaining models, and thereby introducing a range of breaking changes. This essentially amounts to using a single function (`explain()`) instead of two functions (`shapr()` and `explain()`).
The CRAN version of `shapr` (v0.2.2) still uses the old syntax.
Below we provide a brief overview of the breaking changes.
See the [NEWS](https://github.com/NorskRegnesentral/shapr/blob/master/NEWS.md) for the full list of details.

### Breaking changes

The new syntax for explaining models essentially amounts to using a single function (`explain()`) instead of two functions (`shapr()` and `explain()`).
In addition, custom models are now explained by passing the prediction function directly to `explain()`,
some input arguments got new names, and a few functions for edge cases were removed to simplify the code base.

Note that the CRAN version of `shapr` (v0.2.2) still uses the old syntax.
The examples below use the new syntax.
[Here](https://github.com/NorskRegnesentral/shapr/blob/cranversion_0.2.2/README.md) is a version of this README with the syntax of the CRAN version (v0.2.2).
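
To make the change concrete, here is a minimal sketch contrasting the two syntaxes. The object names (`x_train`, `x_explain`, `p0`) are placeholders, and the name of the baseline-prediction argument in the new syntax (`phi0`, previously `prediction_zero`) should be checked against the installed version.

```{r, eval = FALSE}
# Old syntax (CRAN v0.2.2): a two-step workflow
explainer <- shapr(x_train, model)
explanation <- explain(
  x_explain,
  approach = "empirical",
  explainer = explainer,
  prediction_zero = p0
)

# New syntax (v1.0.0): a single call to explain()
explanation <- explain(
  model = model,
  x_explain = x_explain,
  x_train = x_train,
  approach = "empirical",
  phi0 = p0
)
```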

### Python wrapper

As of version 0.2.3.9100 (master branch on GitHub from June 2023), we provide a Python wrapper (`shaprpy`) which allows explaining python models with the methodology implemented in `shapr`, directly from Python. The wrapper is available [here](https://github.com/NorskRegnesentral/shapr/tree/master/python). See also details in the [NEWS](https://github.com/NorskRegnesentral/shapr/blob/master/NEWS.md).
We now also provide a Python wrapper (`shaprpy`) which allows explaining Python models with the methodology implemented in `shapr`, directly from Python.
The wrapper is available [here](https://github.com/NorskRegnesentral/shapr/tree/master/python).


## The package

The shapr R package implements an enhanced version of the KernelSHAP method, for approximating Shapley values,
The `shapr` R package implements an enhanced version of the Kernel SHAP method, for approximating Shapley values,
with a strong focus on conditional Shapley values.
The core idea is to remain completely model-agnostic while offering a variety of methods for estimating contribution
functions, enabling accurate computation of conditional Shapley values across different feature types, dependencies,
@@ -62,39 +78,48 @@ for details and further examples.

## Installation

To install the current stable release from CRAN (note, using the old explanation syntax), use

```{r, eval = FALSE}
install.packages("shapr")
```

To install the current development version (with the new explanation syntax), use
We highly recommend installing the development version of `shapr` (with the new explanation syntax and all functionality):

```{r, eval = FALSE}
remotes::install_github("NorskRegnesentral/shapr")
```

If you would like to install all packages of the models we currently support, use
To also install all dependencies, use

```{r, eval = FALSE}
remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE)
```

**The CRAN version of `shapr` (NOT RECOMMENDED) can be installed with**

If you would also like to build and view the vignette locally, use
```{r, eval = FALSE}
remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE, build_vignettes = TRUE)
vignette("understanding_shapr", "shapr")
install.packages("shapr")
```

You can always check out the latest version of the vignette [here](https://norskregnesentral.github.io/shapr/articles/understanding_shapr.html).

## Example
`shapr` supports computation of Shapley values with any predictive model which takes a set of numeric features and produces a numeric outcome.
`shapr` supports computation of Shapley values with any predictive model which takes a set of numeric features and
produces a numeric outcome.

The following example shows how a simple `xgboost` model is trained using the *airquality* dataset, and how `shapr` explains the individual predictions.
The following example shows how a simple `xgboost` model is trained using the *airquality* dataset, and how `shapr`
explains the individual predictions.

We first enable parallel computation and progress updates with the following code chunk.
These are optional, but recommended for improved performance and user friendliness,
particularly for problems with many features.

```{r init_no_eval,eval = FALSE}
# Enable parallel computation
# Requires the future and future.apply packages
future::plan("multisession", workers = 2) # Increase the number of workers for increased performance with many features

# Enable progress updates of the v(S)-computations
# Requires the progressr package
progressr::handlers(global = TRUE)
handlers("cli") # Using the cli package as backend (recommended for the estimates of the remaining time)
```

Here is the actual example:
```{r basic_example, warning = FALSE}
library(xgboost)
library(shapr)
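
# NOTE: the example chunk above is collapsed in this diff view. The following
# is a self-contained, hedged sketch of a typical workflow of this kind, not
# necessarily the exact code in the PR; the argument name `phi0` (the baseline
# prediction) is assumed from shapr 1.0.0 and may differ in other versions.
library(xgboost)
library(shapr)
library(data.table)

data("airquality")
data <- as.data.table(airquality)
data <- data[complete.cases(data), ]

x_var <- c("Solar.R", "Wind", "Temp", "Month")
y_var <- "Ozone"

# Explain the first six observations; train on the rest
ind_x_explain <- 1:6
x_train <- data[-ind_x_explain, ..x_var]
y_train <- data[-ind_x_explain][[y_var]]
x_explain <- data[ind_x_explain, ..x_var]

# Fit a basic xgboost model to the training data
model <- xgboost(
  data = as.matrix(x_train),
  label = y_train,
  nrounds = 20,
  verbose = FALSE
)

# Baseline prediction (phi_0): the expected prediction without any features
p0 <- mean(y_train)

# Conditional Shapley values using the empirical approach
explanation <- explain(
  model = model,
  x_explain = x_explain,
  x_train = x_train,
  approach = "empirical",
  phi0 = p0
)

# Inspect or plot the estimated Shapley values
plot(explanation)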
90 changes: 55 additions & 35 deletions README.md
@@ -18,37 +18,51 @@ MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.or

## Brief NEWS

This is `shapr` version 1.0.0, which provides a full suite of new
functionality. See the
This is `shapr` version 1.0.0 (released on GitHub Nov 2024), which
provides a full restructuring of the code base, and provides a full
suite of new functionality, including:

- A long list of approaches for estimating the contribution/value
function $v(S)$, including Variational Autoencoders, and
regression-based methods
- Iterative Shapley value estimation with convergence detection
- Parallelized computations with progress updates
- Reweighted Kernel SHAP for faster convergence
- New function `explain_forecast()` for explaining forecasts
- Several other methodological, computational and user-experience
improvements
- Python wrapper making the core functionality of `shapr` available in
Python

Below we provide a brief overview of the breaking changes. See the
[NEWS](https://github.com/NorskRegnesentral/shapr/blob/master/NEWS.md)
for details
for the full list of details.

### Breaking change (June 2023)
### Breaking changes

As of version 0.2.3.9000, the development version of shapr (master
branch on GitHub from June 2023) has been severely restructured,
introducing a new syntax for explaining models, and thereby introducing
a range of breaking changes. This essentially amounts to using a single
function (`explain()`) instead of two functions (`shapr()` and
`explain()`). The CRAN version of `shapr` (v0.2.2) still uses the old
The new syntax for explaining models essentially amounts to using a
single function (`explain()`) instead of two functions (`shapr()` and
`explain()`). In addition, custom models are now explained by passing
the prediction function directly to `explain()`, some input arguments
got new names, and a few functions for edge cases were removed to
simplify the code base.

Note that the CRAN version of `shapr` (v0.2.2) still uses the old
syntax. The examples below use the new syntax.
[Here](https://github.com/NorskRegnesentral/shapr/blob/cranversion_0.2.2/README.md)
is a version of this README with the syntax of the CRAN version
(v0.2.2).

### Python wrapper

As of version 0.2.3.9100 (master branch on GitHub from June 2023), we
provide a Python wrapper (`shaprpy`) which allows explaining python
models with the methodology implemented in `shapr`, directly from
We now also provide a Python wrapper (`shaprpy`) which allows explaining
Python models with the methodology implemented in `shapr`, directly from
Python. The wrapper is available
[here](https://github.com/NorskRegnesentral/shapr/tree/master/python).
See also details in the
[NEWS](https://github.com/NorskRegnesentral/shapr/blob/master/NEWS.md).

## The package

The shapr R package implements an enhanced version of the KernelSHAP
The `shapr` R package implements an enhanced version of the Kernel SHAP
method, for approximating Shapley values, with a strong focus on
conditional Shapley values. The core idea is to remain completely
model-agnostic while offering a variety of methods for estimating
@@ -68,37 +82,25 @@ for details and further examples.

## Installation

To install the current stable release from CRAN (note, using the old
explanation syntax), use

``` r
install.packages("shapr")
```

To install the current development version (with the new explanation
syntax), use
We highly recommend installing the development version of `shapr` (with
the new explanation syntax and all functionality):

``` r
remotes::install_github("NorskRegnesentral/shapr")
```

If you would like to install all packages of the models we currently
support, use
To also install all dependencies, use

``` r
remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE)
```

If you would also like to build and view the vignette locally, use
**The CRAN version of `shapr` (NOT RECOMMENDED) can be installed with**

``` r
remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE, build_vignettes = TRUE)
vignette("understanding_shapr", "shapr")
install.packages("shapr")
```

You can always check out the latest version of the vignette
[here](https://norskregnesentral.github.io/shapr/articles/understanding_shapr.html).

## Example

`shapr` supports computation of Shapley values with any predictive model
Expand All @@ -108,6 +110,24 @@ The following example shows how a simple `xgboost` model is trained
using the *airquality* dataset, and how `shapr` explains the individual
predictions.

We first enable parallel computation and progress updates with the
following code chunk. These are optional, but recommended for improved
performance and user friendliness, particularly for problems with many
features.

``` r
# Enable parallel computation
# Requires the future and future.apply packages
future::plan("multisession", workers = 2) # Increase the number of workers for increased performance with many features

# Enable progress updates of the v(S)-computations
# Requires the progressr package
progressr::handlers(global = TRUE)
handlers("cli") # Using the cli package as backend (recommended for the estimates of the remaining time)
```

Here is the actual example:

``` r
library(xgboost)
library(shapr)
@@ -158,14 +178,14 @@ explanation <- explain(
#> max_n_coalitions is NULL or larger than or 2^n_features = 16,
#> and is therefore set to 2^n_features = 16.
#>
#> ── Starting `shapr::explain()` at 2024-10-23 19:31:59 ──────────────────────────
#> ── Starting `shapr::explain()` at 2024-11-20 12:23:18 ──────────────────────────
#> • Model class: <xgb.Booster>
#> • Approach: empirical
#> • Iterative estimation: FALSE
#> • Number of feature-wise Shapley values: 4
#> • Number of observations to explain: 6
#> • Computations (temporary) saved at:
#> '/tmp/Rtmp6d4Iza/shapr_obj_3be21200fd9e8.rds'
#> '/tmp/Rtmp4yBCHY/shapr_obj_17459f7fdc4b8f.rds'
#>
#> ── Main computation started ──
#>