Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updated setup and readme. #21

Merged
merged 1 commit into from
Oct 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 30 additions & 21 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,60 +1,69 @@
![logo](https://github.com/rvandewater/ReciPys/blob/development/docs/figures/recipys_logo.png?raw=true)

# 🥧ReciPys🐍

[![CI](https://github.com/rvandewater/recipys/actions/workflows/ci.yml/badge.svg)](https://github.com/rvandewater/recipys/actions/workflows/ci.yml)
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
![Platform](https://img.shields.io/badge/platform-linux--64%20|%20win--64%20|%20osx--64-lightgrey)
[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)
[![PyPI version shields.io](https://img.shields.io/pypi/v/recipies.svg)](https://pypi.python.org/pypi/recipies/)
[![arXiv](https://img.shields.io/badge/arXiv-2306.05109-b31b1b.svg)](http://arxiv.org/abs/2306.05109)

The ReciPys package is a preprocessing framework operating on Pandas dataframes.
The ReciPys package is a preprocessing framework operating on [Polars](https://github.com/pola-rs/polars)
and [Pandas](https://github.com/pandas-dev/pandas) dataframes. The backend can be chosen by the user.
The operation of this package is inspired by the R-package [recipes](https://recipes.tidymodels.org/).
This package allows the user to apply a number of extensible operations for imputation, feature generation/extraction,
scaling, and encoding.
This package allows the user to apply a number of extensible operations for imputation, feature generation/extraction,
scaling, and encoding.
It operates on modified Dataframe objects from the established data science package Pandas.
## Installation

## Installation

You can install ReciPys from pip using:

```
pip install recipies
```

> Note that the package is called `recipies` and not `recipys` on pip due to a name clash with an existing package.
>
>
You can install ReciPys from source to ensure you have the latest version:

```
conda env update -f environment.yml
conda activate recipys
pip install -e .
```

> Note that the last command installs the package called `recipies`.

## Usage
To define preprocessing operations, one has to supply _roles_ to the different columns of the Dataframe.

To define preprocessing operations, one has to supply _roles_ to the different columns of the Dataframe.
This allows the user to create groups of columns which have a particular function.
Then, we provide several "steps" that can be applied to the datasets, among which: Historical accumulation,
Resampling the time resolution, A number of imputation methods, and a wrapper for any
Then, we provide several "steps" that can be applied to the datasets, among which: Historical accumulation,
Resampling the time resolution, A number of imputation methods, and a wrapper for any
[Scikit-learn](https://github.com/scikit-learn/scikit-learn) preprocessing step.
We believe to have covered any basic preprocessing needs for prepared datasets.
Any missing step can be added by following the step interface.

# 📄Paper

If you use this code in your research, please cite the following publication:
If you use this code in your research, please cite the following publication (a standalone paper is in preparation):

```
@article{vandewaterYetAnotherICUBenchmark2023,
title = {Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML},
shorttitle = {Yet Another ICU Benchmark},
url = {http://arxiv.org/abs/2306.05109},
language = {en},
urldate = {2023-06-09},
publisher = {arXiv},
author = {van de Water, Robin and Schmidt, Hendrik and Elbers, Paul and Thoral, Patrick and Arnrich, Bert and Rockenschaub, Patrick},
month = jun,
year = {2023},
note = {arXiv:2306.05109 [cs]},
keywords = {Computer Science - Machine Learning},
@inproceedings{vandewaterYetAnotherICUBenchmark2024,
title = {Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML},
shorttitle = {Yet Another ICU Benchmark},
booktitle = {The Twelfth International Conference on Learning Representations},
author = {van de Water, Robin and Schmidt, Hendrik Nils Aurel and Elbers, Paul and Thoral, Patrick and Arnrich, Bert and Rockenschaub, Patrick},
year = {2024},
month = oct,
urldate = {2024-02-19},
langid = {english},
}

```

This paper can also be found on arxiv: https://arxiv.org/pdf/2306.05109.pdf


Expand Down
34 changes: 34 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
[build-system]
requires = ["setuptools>=42", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "recipies"
version = "1.0"
description = "A modular preprocessing package for Pandas Dataframe"
readme = "README.md"
license = { text = "MIT license" }
authors = [
{ name = "Robin van de Water", email = "[email protected]" }
]
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Natural Language :: English",
"Programming Language :: Python :: 3.12"
]
keywords = ["recipies", "pandas", "dataframe", "polars", "preprocessing", "recipys"]
dependencies = [
"coverage>=7.0.0",
"numpy>=1.24.0",
"pandas>=2.0.0",
"polars[all]>=1.0.0",
"scikit-learn>=1.4.0"
]

[project.scripts]
recipys = "recipys.recipe:Recipe"

[tool.pytest.ini_options]
testpaths = ["tests"]
86 changes: 0 additions & 86 deletions setup.py

This file was deleted.

Loading