A kNN-based machine learning imputation method that recovers the distribution of missing values. It can also be used for multiple imputation and to provide uncertainty quantification.

SAP/knn-sampler

Missing-value imputation

This project is a Python application for missing-value imputation and for reproducing the experiments from the publication kNNSampler: Stochastic Imputations for Recovering Missing Value Distributions. It is joint work by SAP SE and Eurecom, with funding support from the ANRT.

knnSampler imputation algorithm

knnSampler is a kNN-based method for missing-value imputation with support for multiple imputation and uncertainty quantification. It aims to preserve the underlying data distribution when imputing missing values (see the publication for more details).
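The idea can be illustrated with a minimal sketch. It is not the repository's implementation: it assumes, following the publication's description, that a missing response is imputed by drawing at random from the observed responses of the k nearest units in covariate space. The names knn_sample_impute and multiple_impute, the Euclidean distance, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def knn_sample_impute(X, y, k=5, rng=None):
    """Hypothetical helper: fill NaNs in y by sampling one observed
    response from the k nearest observed neighbours in X."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)  # shape (n_samples, n_features)
    y = np.asarray(y, dtype=float)  # shape (n_samples,), NaN = missing
    missing = np.isnan(y)
    X_obs, y_obs = X[~missing], y[~missing]

    y_imputed = y.copy()
    for i in np.flatnonzero(missing):
        # Euclidean distance to every unit with an observed response
        # (the distance choice is an assumption of this sketch).
        dists = np.linalg.norm(X_obs - X[i], axis=1)
        neighbours = np.argsort(dists)[:k]
        # Draw one neighbour's response instead of averaging them, so the
        # imputed values keep the spread of the observed responses.
        y_imputed[i] = rng.choice(y_obs[neighbours])
    return y_imputed


def multiple_impute(X, y, k=5, n_draws=20, seed=0):
    """Hypothetical helper: repeat the stochastic imputation n_draws times.
    Returns an array of shape (n_draws, n_samples)."""
    rng = np.random.default_rng(seed)
    return np.stack(
        [knn_sample_impute(X, y, k=k, rng=rng) for _ in range(n_draws)]
    )


# Toy data: y depends on X with noise; about 30% of responses are removed.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.uniform(0.0, 1.0, size=(200, 1))
y_demo = np.sin(2 * np.pi * X_demo[:, 0]) + demo_rng.normal(0.0, 0.3, size=200)
y_demo[demo_rng.random(200) < 0.3] = np.nan

draws = multiple_impute(X_demo, y_demo, k=5, n_draws=20, seed=0)
# The per-unit standard deviation across draws is a simple uncertainty estimate.
print(draws.shape, float(draws.std(axis=0).max()))
```

Because each call draws a neighbour's response rather than the neighbours' mean, repeated calls give different completed datasets. The spread across those datasets is what makes the sketch usable for multiple imputation. Observed entries are never modified.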

How to cite

If you use knnSampler, please cite the original publication:

@misc{pashmchi2025knnsamplerstochasticimputationsrecovering,
      title={kNNSampler: Stochastic Imputations for Recovering Missing Value Distributions}, 
      author={Parastoo Pashmchi and Jerome Benoit and Motonobu Kanagawa},
      year={2025},
      eprint={2509.08366},
      archivePrefix={arXiv},
      primaryClass={stat.ML},
      url={https://arxiv.org/abs/2509.08366}, 
}

Running the project

Prerequisites

This project requires Python 3.12+ and Poetry 2+.

  1. Clone the repository
git clone <repository_url>
  2. Navigate to the project directory
cd <repository_directory>
  3. Install dependencies
poetry install --no-root

Run algorithms

The project is configured through a self-documenting configuration file, assets/config.conf.

poetry run task main

Runs the main imputation pipeline using assets/config.conf.

Benchmark algorithms

Note: the benchmarking scripts do not have dedicated configuration files. To change benchmark settings, edit the top of benchmark_all.py and benchmark_knnsampler.py.

Benchmark all algorithms

Compares the imputation algorithms against each other.

poetry run task benchmark_all

Benchmark knnSampler

Produces detailed knnSampler results over different parameter ranges.

poetry run task benchmark_knnsampler

Contributing

This project is open to feature requests and bug reports via GitHub issues. Contributions and feedback are welcome. See CONTRIBUTING.md for details.

Install development dependencies

poetry install --no-root --with dev

Code formatting

poetry run task format

Code linting

poetry run task lint

Code testing

poetry run task test

Install code-quality Git hooks

poetry run pre-commit install

Security / Disclosure

If you find a bug that may pose a security issue, follow the instructions in our security policy to report it. Do not open GitHub issues for security reports.

Code of Conduct

By participating in this project, you agree to follow our Code of Conduct.

License

Copyright 2025 SAP SE or an SAP affiliate company and knnSampler contributors. See LICENSE for details. Detailed third-party licensing information is available via the REUSE tool.
