A Python library for generating missing values in complete datasets (i.e., amputation) and exploring incomplete datasets.
Check out the documentation and find examples!
Amputation is the opposite of imputation: the generation of missing values in complete datasets. This is useful for evaluating the effect of missing values on your model, mostly in experimental settings, but also as a preprocessing step in developing models.
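To make the idea concrete, here is a minimal sketch of the simplest kind of amputation: removing cells completely at random using only NumPy. It is not pyampute's procedure, and the helper name `ampute_mcar` and its `prop` and `seed` parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def ampute_mcar(X, prop=0.3, seed=None):
    """Return a copy of X in which each cell is set to NaN with probability `prop`.

    Hypothetical helper for illustration only; it is not part of pyampute.
    """
    rng = np.random.default_rng(seed)
    X_out = np.array(X, dtype=float)        # float copy, so NaN can be stored
    mask = rng.random(X_out.shape) < prop   # True marks a cell to remove
    X_out[mask] = np.nan
    return X_out

X_complete = np.arange(20, dtype=float).reshape(5, 4)
X_amputed = ampute_mcar(X_complete, prop=0.3, seed=0)
print(np.isnan(X_amputed).mean())  # fraction of cells that were removed
```

Because the complete data is kept alongside the amputed copy, the output of a model or an imputation method on `X_amputed` can be compared against the known truth in `X_complete`.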
Our MultivariateAmputation class is compatible with the scikit-learn-style fit and transform paradigm and can be used in a scikit-learn Pipeline.
The underlying methodology was proposed by Schouten, Lugtig and Vink (2018) and has also been implemented as an R function: mice::ampute. Compared to ampute, pyampute's parameters are easier to specify and allow for more variation. See this blogpost to learn more.
import numpy as np
from pyampute.ampute import MultivariateAmputation
n = 1000  # number of rows (samples)
m = 10  # number of columns (features)
rng = np.random.default_rng()
X_compl = rng.standard_normal((n, m))
ma = MultivariateAmputation()
X_incompl = ma.fit_transform(X_compl)
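After amputation it is worth checking how much of the data is actually missing. The sketch below uses plain NumPy; `missingness_summary` is a hypothetical helper, not a pyampute function, and `X_demo` is a small hand-made stand-in so the snippet runs on its own. It should apply to `X_incompl` in the same way, assuming that is a numeric array with missing values encoded as NaN.

```python
import numpy as np

def missingness_summary(X):
    """Return (share of rows with at least one NaN, share of NaN per column)."""
    is_missing = np.isnan(np.asarray(X, dtype=float))
    incomplete_rows = is_missing.any(axis=1).mean()
    missing_per_column = is_missing.mean(axis=0)
    return incomplete_rows, missing_per_column

# Small hand-made stand-in for an amputed dataset
X_demo = np.array([
    [1.0, np.nan, 3.0],
    [4.0, 5.0, 6.0],
    [np.nan, np.nan, 9.0],
    [10.0, 11.0, 12.0],
])
row_share, col_share = missingness_summary(X_demo)
print(row_share)  # share of incomplete rows
print(col_share)  # share of missing values per column
```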
We also provide an mdPatterns class, which displays the missing data patterns in incomplete datasets.
from pyampute.exploration.md_patterns import mdPatterns
mdp = mdPatterns()
patterns = mdp.get_patterns(X_incompl)
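For intuition about what a missing data pattern is, the following sketch tabulates the patterns with NumPy alone. It does not reproduce the output of mdPatterns; the helper `count_missing_patterns` is hypothetical, and coding observed cells as 1 and missing cells as 0 is a choice made for this example.

```python
import numpy as np

def count_missing_patterns(X):
    """Return the unique row patterns (1 = observed, 0 = missing) and their counts."""
    observed = (~np.isnan(np.asarray(X, dtype=float))).astype(int)
    unique_patterns, counts = np.unique(observed, axis=0, return_counts=True)
    return unique_patterns, counts

# Small hand-made incomplete dataset
X_demo = np.array([
    [1.0, np.nan, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, np.nan, 9.0],
    [10.0, 11.0, 12.0],
    [np.nan, 14.0, 15.0],
])
unique_patterns, counts = count_missing_patterns(X_demo)
for pattern, count in zip(unique_patterns, counts):
    print(pattern, count)  # one line per pattern, with the number of rows that share it
```

Each distinct row of the 0/1 matrix is one pattern, so rows that are missing the same set of columns are grouped together.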
pip install pyampute
git clone https://github.com/RianneSchouten/pyampute.git
pip install ./pyampute
BSD 3-Clause License
@misc{schouten_rianne_m_2022_6946887,
author = {Schouten, Rianne M and
Zamanzadeh, Davina and
Singh, Prabhant},
title = {pyampute: a Python library for data amputation},
month = aug,
year = 2022,
publisher = {Zenodo},
doi = {10.25080/majora-212e5952-03e},
url = {https://doi.org/10.25080/majora-212e5952-03e}
}
@article{Schouten2018,
title={Generating missing values for simulation purposes: {A} multivariate amputation procedure},
author={Schouten, Rianne M. and Lugtig, Peter and Vink, Gerko},
journal={Journal of Statistical Computation and Simulation},
volume={88},
number={15},
pages={2909--2930},
year={2018}
}
Watch our SciPy'22 presentation here.
If you have questions or comments, or would like to contribute, please do not hesitate to contact us. You can find our contact details here.
Cheers,