Implementations of Training Data Attribution (TDA) methods in PyTorch, built on torch.func.
TDA methods assign each training point a score that reflects how influential it is for the model's prediction on a given test point.
For now, as a warm-up exercise, I have only implemented the simple gradient similarity proposed in [4], which is also a key element of TracIn [2]. I plan to work towards an implementation of influence functions [1] that uses Arnoldi iteration to efficiently approximate the inverse Hessian, as done in [3].
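As a rough illustration of the gradient-similarity idea, here is a minimal sketch that computes per-sample loss gradients with torch.func and scores each (test, train) pair by the dot product of their gradients. The function names (`flat_per_sample_grads`, `gradient_similarity`), the toy model, and the random data are all hypothetical and are not this repository's actual API; the use of loss gradients and the optional cosine normalization are assumptions made for the example.

```python
import torch
import torch.nn.functional as F
from torch import nn
from torch.func import functional_call, grad, vmap


def flat_per_sample_grads(model, loss_fn, inputs, targets):
    """Return a (batch, n_params) matrix of per-sample loss gradients."""
    # Detached copies of the parameters, passed to the model functionally.
    params = {name: p.detach() for name, p in model.named_parameters()}

    def sample_loss(p, x, y):
        # vmap strips the batch dimension, so re-add it for the model call.
        out = functional_call(model, p, (x.unsqueeze(0),))
        return loss_fn(out, y.unsqueeze(0))

    # grad differentiates w.r.t. the first argument (the parameter dict);
    # vmap maps over the samples while sharing the parameters.
    grads = vmap(grad(sample_loss), in_dims=(None, 0, 0))(params, inputs, targets)
    return torch.cat([g.flatten(start_dim=1) for g in grads.values()], dim=1)


def gradient_similarity(model, loss_fn, train_x, train_y, test_x, test_y,
                        normalize=False):
    """Return an (n_test, n_train) matrix of gradient dot products.

    With normalize=True the scores become cosine similarities.
    """
    g_train = flat_per_sample_grads(model, loss_fn, train_x, train_y)
    g_test = flat_per_sample_grads(model, loss_fn, test_x, test_y)
    if normalize:
        g_train = F.normalize(g_train, dim=1)
        g_test = F.normalize(g_test, dim=1)
    return g_test @ g_train.T


# Toy usage with a hypothetical model and random data.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3))
train_x, train_y = torch.randn(5, 4), torch.randint(0, 3, (5,))
test_x, test_y = torch.randn(2, 4), torch.randint(0, 3, (2,))

scores = gradient_similarity(model, F.cross_entropy,
                             train_x, train_y, test_x, test_y)
print(scores.shape)  # one row per test point, one column per training point
```

A large positive score suggests that a gradient step on that training point would also reduce the loss on the test point, while a negative score suggests the opposite. This sketch scores a single set of parameters; a TracIn-style score would, roughly, accumulate such dot products over several training checkpoints.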
[1] Koh, P. W., & Liang, P. (2017, July). Understanding black-box predictions via influence functions. In International conference on machine learning (pp. 1885-1894). PMLR.
[2] Pruthi, G., Liu, F., Kale, S., & Sundararajan, M. (2020). Estimating training data influence by tracing gradient descent. Advances in Neural Information Processing Systems, 33, 19920-19930.
[3] Schioppa, A., Zablotskaia, P., Vilar, D., & Sokolov, A. (2022, June). Scaling up influence functions. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 8, pp. 8179-8186).
[4] Charpiat, G., Girard, N., Felardos, L., & Tarabalka, Y. (2019). Input similarity from the neural network perspective. Advances in Neural Information Processing Systems, 32.