This project is a Python application for missing-value imputation and for reproducing the experiments from the publication "kNNSampler: Stochastic Imputations for Recovering Missing Value Distributions". It is joint work by SAP SE and Eurecom, with funding support from the ANRT.
knnSampler is a kNN-based method for missing-value imputation with support for multiple imputation and uncertainty quantification. It aims to preserve the underlying data distribution when imputing missing values (see the publication for more details).
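To make the idea concrete, here is a minimal NumPy sketch of kNN-based stochastic imputation as we understand it from the publication: for a unit with a missing response, find the k most similar units with observed responses (by covariates) and draw one of their responses at random. This is not the project's implementation or API. The function name `knn_sample_impute`, its parameters, the brute-force Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def knn_sample_impute(x_obs, y_obs, x_mis, k=5, seed=None):
    """Hypothetical helper (not part of this project's API).

    For each row of x_mis, draw one observed response uniformly at random
    from the k nearest observed units in covariate space.
    """
    rng = np.random.default_rng(seed)
    y_obs = np.asarray(y_obs)
    # Accept 1-D covariates by reshaping to (n_units, n_features).
    x_obs = np.asarray(x_obs, dtype=float).reshape(len(y_obs), -1)
    x_mis = np.asarray(x_mis, dtype=float).reshape(-1, x_obs.shape[1])
    if not 1 <= k <= len(y_obs):
        raise ValueError("k must be between 1 and the number of observed units")

    # Brute-force squared Euclidean distances, shape (n_missing, n_observed).
    d2 = ((x_mis[:, None, :] - x_obs[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest observed units for each missing unit (unordered).
    neighbors = np.argpartition(d2, k - 1, axis=1)[:, :k]
    # Pick one neighbor per missing unit and return its observed response.
    pick = rng.integers(0, k, size=len(x_mis))
    return y_obs[neighbors[np.arange(len(x_mis)), pick]]


# Toy data (assumed for illustration): noisy sine curve, two missing responses.
data_rng = np.random.default_rng(0)
x = data_rng.uniform(0.0, 1.0, size=200)
y = np.sin(2 * np.pi * x) + data_rng.normal(0.0, 0.1, size=200)
x_missing = np.array([0.25, 0.75])

# Multiple imputation: repeat the stochastic draw with different seeds.
draws = np.stack(
    [knn_sample_impute(x, y, x_missing, k=10, seed=s) for s in range(20)]
)
print("mean over imputations:", draws.mean(axis=0))
print("spread over imputations:", draws.std(axis=0))
```

Because each call returns an observed neighbor value instead of a neighbor average, repeated draws vary, and their spread gives a rough indication of imputation uncertainty. For actual use, run the project's pipeline described in this README.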
If you use knnSampler, please cite the original publication:
@misc{pashmchi2025knnsamplerstochasticimputationsrecovering,
title={kNNSampler: Stochastic Imputations for Recovering Missing Value Distributions},
author={Parastoo Pashmchi and Jerome Benoit and Motonobu Kanagawa},
year={2025},
eprint={2509.08366},
archivePrefix={arXiv},
primaryClass={stat.ML},
url={https://arxiv.org/abs/2509.08366},
}
This project requires Python 3.12+ and Poetry 2+.
- Clone the repository
git clone <repository_url>
- Navigate to the project directory
cd <repository_directory>
- Install dependencies
poetry install --no-root
The project uses a self-documenting configuration file, assets/config.conf.
poetry run task main
Runs the main imputation pipeline using assets/config.conf.
Note: the benchmarking scripts do not have dedicated configuration files. To change benchmark settings, edit the settings at the top of benchmark_all.py and benchmark_knnsampler.py.
To compare imputation algorithms with each other:
poetry run task benchmark_all
To get detailed knnSampler results over different parameter ranges:
poetry run task benchmark_knnsampler
This project is open to feature requests and bug reports via GitHub issues. Contributions and feedback are welcome. See CONTRIBUTING.md for details.
For development, install the dev dependencies, then run the formatter, linter, and tests, and install the pre-commit hooks:
poetry install --no-root --with dev
poetry run task format
poetry run task lint
poetry run task test
poetry run pre-commit install
If you find a bug that may pose a security issue, follow our security policy instructions to report it. Do not open GitHub issues for security reports.
By participating in this project, you agree to follow our Code of Conduct.
Copyright 2025 SAP SE or an SAP affiliate company and knnSampler contributors. See LICENSE for details. Detailed third-party licensing information is available via the REUSE tool.