xregion - Optimally Informative and Robust Explanations for Boosted Trees

1. Introduction

xregion is a tool which computes an optimal explanation for a prediction for a Boosted Tree. More specifically, given an input instance x with predicted class c, output an explanation in the form of a set of intervals on each input feature such that any other instance y which has feature values within the intervals of the explanation is always classified as c. This is the implementation of the procedures outlined in the Honours thesis.

Currently, xregions supports XGBoost trees trained with the binary::logistic and multi::softprob objectives. Categorical features are not supported.

2. Usage

First, install all requirements. A working installation of the Z3 theorem prover is required.

Save the desired XGBoost Tree as a .json file using the save_model method of the classifier and put it into the ./models folder. To define the minimum and maximum values of each feature's domain, create a .lims file with the same name as the model.json file and put it into the models folder. Each line of the .lims file should have the format <feature_id>, <domain_min>, <domain_max>. The feature_id field should start from zero and be sequential with no numbers missing.

To compute an optimal explanation for an instance, use the command:

python xregions.py -m <model_name> -e ([v1, v2, ..., vm]|random) --seed-gen <seed_generation_method>

where v1, v2, ... etc are the feature values of the input instance. To compute the optimal explanation for an arbitrary instance, use "random" instead of inputting an array.

The seed region generation method specifies which algorithm is used to generate regions in each iteration. This can take the values "ucs", "maxsat", or "incrmaxsat". The details of these approaches are provided in the thesis.

The following command performs the experiment performed in the thesis. All models in the ./models folder are benchmarked in a multiprocessed manner.

python xregions.py -m <any> --benchmark-all

Note that the -m flag has no effect on the result. All models are benchmarked.

3. References

Liffiton, M.H., Previti, A., Malik, A. et al. Fast, flexible MUS enumeration. Constraints 21, 223–250 (2016). https://doi.org/10.1007/s10601-015-9183-0

Ignatiev, A., Izza, Y., Stuckey, P. J., & Marques-Silva, J. (2022). Using MaxSAT for Efficient Explanations of Tree Ensembles. Proceedings of the AAAI Conference on Artificial Intelligence, 36(4), 3776-3785. https://doi.org/10.1609/aaai.v36i4.20292

Ignatiev, A., Narodytska, N., Asher, N., & Marques-Silva, J. (2020). On Relating 'Why?' and 'Why Not?' Explanations. ArXiv, abs/2012.11067.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
benchmark		benchmark
models		models
src		src
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
train_models.py		train_models.py
xregions.py		xregions.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

xregion - Optimally Informative and Robust Explanations for Boosted Trees

1. Introduction

2. Usage

3. References

About

Releases

Packages

Languages

License

ruiwynt/optimal-explanations

Folders and files

Latest commit

History

Repository files navigation

xregion - Optimally Informative and Robust Explanations for Boosted Trees

1. Introduction

2. Usage

3. References

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages