This repository accompanies the AISTATS'24 paper: "DAGnosis: Localized Identification of Data Inconsistencies using Structures".
We suggest creating and activating a new environment before using the code, e.g. with:
conda create --name dagnosis python=3.10
conda activate dagnosis
We can then install the package from source:
pip install .
We illustrate how to use DAGnosis in a synthetic setup, via the files in the folder experiments/synthetic.
The bash scripts run_linear.sh and run_mlp.sh run the full pipeline: generate the data, train the conformal estimators, and test the conformal estimators, for linear and MLP SEMs respectively. The bash commands for these must be run from inside the experiments/synthetic directory.
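To make the three pipeline stages concrete, here is a minimal sketch of the general idea, not the code used in this repository. It assumes a hypothetical two-node linear SEM (X1 -> X2), a least-squares regressor of X2 on its parent, and standard split conformal prediction with absolute residuals as scores. All function names, coefficients, and the corruption scheme are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_linear_sem(n, rng):
    # Hypothetical 2-node linear SEM: X1 -> X2 with Gaussian noise.
    x1 = rng.normal(size=n)
    x2 = 2.0 * x1 + 0.1 * rng.normal(size=n)
    return np.column_stack([x1, x2])

# 1) Generate the data (train / calibration / test splits).
train = sample_linear_sem(1000, rng)
calib = sample_linear_sem(1000, rng)
test = sample_linear_sem(200, rng)
test[:20, 1] += 3.0  # assumed corruption: shift X2 in the first 20 test rows

# 2) Train: regress X2 on its parent X1, then calibrate a residual threshold.
design = np.column_stack([train[:, 0], np.ones(len(train))])
coef, *_ = np.linalg.lstsq(design, train[:, 1], rcond=None)

def predict(x1):
    return coef[0] * x1 + coef[1]

alpha = 0.1  # target miscoverage level
scores = np.sort(np.abs(calib[:, 1] - predict(calib[:, 0])))
n = len(scores)
rank = min(n, int(np.ceil((n + 1) * (1 - alpha))))  # split-conformal rank
threshold = scores[rank - 1]

# 3) Test: flag samples whose X2 falls outside the conformal interval.
flags = np.abs(test[:, 1] - predict(test[:, 0])) > threshold
print("flagged among corrupted:", flags[:20].mean())
print("flagged among clean:", flags[20:].mean())
```

The point of conditioning each feature only on its parents in the graph is that a flag can be attributed to a specific feature; the sketch shows this for a single child node.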
To compute the inconsistency detection metrics (F1, Precision, Recall), go to the folder experiments/synthetic and run:
python compute_metrics.py PATH_SAVE_METRIC=path_metrics
where path_metrics denotes the folder where the metrics are saved.
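For reference, the three metrics follow the usual definitions for binary detection. The sketch below is not taken from compute_metrics.py; the function name `detection_metrics` and the toy arrays are illustrative assumptions. It treats each sample as either truly inconsistent or not, and either flagged or not.

```python
import numpy as np

def detection_metrics(is_inconsistent, is_flagged):
    """Precision, recall and F1 for binary inconsistency flags."""
    truth = np.asarray(is_inconsistent, dtype=bool)
    flagged = np.asarray(is_flagged, dtype=bool)
    tp = int(np.sum(truth & flagged))    # inconsistent and flagged
    fp = int(np.sum(~truth & flagged))   # clean but flagged
    fn = int(np.sum(truth & ~flagged))   # inconsistent but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy example: 3 truly inconsistent samples, 3 flagged, 2 in common.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
flags = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = detection_metrics(truth, flags)
print(precision, recall, f1)
```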
Similarly, you can reproduce the sensitivity experiment by going to the folder experiments/synthetic/sensitivity and using the script run.sh, followed by:
python compute_metrics.py PATH_SAVE_METRIC=path_metrics
To run the experiments on the UCI Adult Income dataset, go to the folder experiments/adult.
To train and test the conformal estimators, run:
python train_test_adult.py
The artifacts will be saved in the folder artifacts_adult.
Then, the results can be obtained by executing:
python proportion_flagging.py
which will print the list of downstream accuracies and proportions of samples flagged (Figures 3a and 3b).
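How proportion_flagging.py aggregates its numbers is not described here. As a rough sketch of the two quantities named above, the example below assumes that downstream accuracy is evaluated separately on flagged and unflagged samples. The function name `flagging_summary` and the toy labels are hypothetical.

```python
import numpy as np

def flagging_summary(y_true, y_pred, is_flagged):
    """Proportion of flagged samples and accuracy on each subset."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    flagged = np.asarray(is_flagged, dtype=bool)
    correct = y_true == y_pred
    proportion = float(flagged.mean())
    # Accuracy is undefined (NaN) on an empty subset.
    acc_flagged = float(correct[flagged].mean()) if flagged.any() else float("nan")
    acc_unflagged = float(correct[~flagged].mean()) if (~flagged).any() else float("nan")
    return proportion, acc_flagged, acc_unflagged

# Toy example with 8 samples, 3 of them flagged.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
flags = [0, 0, 1, 0, 0, 1, 1, 0]
proportion, acc_flagged, acc_unflagged = flagging_summary(y_true, y_pred, flags)
print(proportion, acc_flagged, acc_unflagged)
```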
If you use this software, please cite the original paper:
TODO