\begin{center}
\large
Response Letter
\end{center}
We thank the reviewers for their useful comments and feedback, which helped improve the manuscript and its presentation. In the following, we summarize the changes reflected in the revised manuscript.
\paragraph{Distribution shifts in life-long learning.} We propose Hamming distance heuristics in {\imli} under the assumption that the distribution remains the same across batches. One way to account for distribution shifts is to consider the last $ p $ ($ p > 1 $) batches, instead of only the previous batch, in the objective function of mini-batch learning. For a feature variable $ B^i_j $, we consider its majority assignment in the last $ p $ classification rules and encode it as a soft clause to retain the majority assignment in the current batch. Moreover, we can reweigh the soft clause by prioritizing assignments of $ B^i_j $ in more recent batches. We mention this in Section~\ref{interpretability_imli_sec:incremental_learning}, page 28.
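For illustration, one possible reweighing scheme (a sketch; the decay factor $ \gamma $ and indicator notation are ours here, not fixed in the manuscript) assigns the soft clause retaining the majority assignment $ b^i_j $ of $ B^i_j $ the weight
\begin{equation*}
	w(B^i_j) = \sum_{k=1}^{p} \gamma^{\,k-1} \cdot \mathbf{1}\!\left[ B^i_j = b^i_j \text{ in batch } t-k \right], \qquad \gamma \in (0,1],
\end{equation*}
so that agreement with more recent batches contributes more weight; $ \gamma = 1 $ recovers a plain majority vote over the last $ p $ batches.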
\paragraph{Discussions of related works in experiments.} {\imli} scales better than existing interpretable rule-based classifiers because of its incremental solving approach. Most existing works rely on mining potential classification rules followed by an optimization algorithm, such as Bayesian optimization or branch-and-bound. In the experiments, we observe the incremental learning of {\imli} to be more scalable than the state-of-the-art methods. We elaborate on this in Section~\ref{interpretability_imli_sec:experiments_scalability}, page 39.
\paragraph{How does {\justicia} outperform a direct approach by learning a Bayesian Network on limited samples?} We agree with the reviewer that learning a Bayesian network on limited samples does not always make {\justicia} more robust (e.g., yield a lower standard deviation in the estimation of fairness metrics) than the direct approach of estimating fairness on the dataset.
More precisely, in Chapter~\ref{chapter:justicia}, Figure~\ref{fairness_justicia_fig:sample-size}, {\justicia} demonstrates higher robustness than the direct approach, where we consider a specific distribution of non-sensitive features conditioned only on the sensitive features. Thus, Figure~\ref{fairness_justicia_fig:sample-size} does not involve experiments with a Bayesian network capturing the correlations of all features, which we indeed introduce later in Chapter~\ref{chapter:fvgm}. In our revised experiment, we observe that {\justicia} with a Bayesian network, called {\fvgm}, exhibits lower robustness due to the Bayesian network learning step.
\paragraph{Clarification on Fairness Influence Function (FIF).} Our additive axiom for FIFs is based on the idea of decomposing the total unfairness of the classifier among different subsets of features~\cite{begley2020explainability,lundberg2020explaining}. We require that the sum of the FIFs of all subsets of non-sensitive features equals \textit{the resultant unfairness of the classifier}, where unfairness is a real number in $ [0,1] $, such as statistical parity, equalized odds, or predictive parity.
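To state the additive axiom compactly (a sketch; the symbols $ w_{\mathbf{S}} $ for the FIF of a subset $ \mathbf{S} $ and $ \epsilon $ for the resultant unfairness are our notation here):
\begin{equation*}
	\sum_{\mathbf{S} \subseteq \mathbf{X}} w_{\mathbf{S}} = \epsilon, \qquad \epsilon \in [0,1],
\end{equation*}
where $ \mathbf{X} $ denotes the set of non-sensitive features and $ \epsilon $ instantiates, for example, to statistical parity.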
\paragraph{Clarification on Bayesian Network.} We consider a Bayesian network as an input distribution to express the conditional dependencies and independencies among features. In Chapter~\ref{chapter:CNF_feature_correlation}, we demonstrate the encoding of probabilistic inference into SSAT via additional Boolean variables and clauses~\cite{chavira2008probabilistic}. We also discuss the complexity of the SSAT encoding in terms of the complexity of the Bayesian network (Section~\ref{chapter_fairness_preliminaries_BN}, page 15).
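As a simplified illustration of such an encoding (a sketch only; the exact construction appears in Chapter~\ref{chapter:CNF_feature_correlation}): for an edge $ A \rightarrow B $ with $ \Pr[B = 1 \mid A = 1] = \theta_1 $ and $ \Pr[B = 1 \mid A = 0] = \theta_0 $, one can introduce two randomized-quantified auxiliary variables $ r_1 $ and $ r_0 $, true with probabilities $ \theta_1 $ and $ \theta_0 $ respectively, and add the clauses encoding
\begin{equation*}
	B \leftrightarrow \big( (A \wedge r_1) \vee (\neg A \wedge r_0) \big),
\end{equation*}
so that, conditioned on $ A $, the variable $ B $ is true with the intended conditional probability.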