datamop

datamop is a data cleaning and wrangling package designed to streamline the preprocessing of datasets. Whether you are dealing with missing values, inconsistent categorical columns, or numeric columns that need scaling, datamop provides a simple and consistent way to automate these repetitive tasks. The core functions of this package are:

sweep_nulls(): Handles missing values by imputation or removal, based on user preference.
column_encoder(): Encodes categorical columns using either one-hot or ordinal encoding, based on user preference.
column_scaler(): Scales numerical columns using either Min-Max scaling or Z-score standardization, based on user preference.

datamop fits into the Python data preprocessing ecosystem by offering a more lightweight and user-friendly alternative to larger libraries such as pandas and scikit-learn. It focuses specifically on handling missing values, encoding categorical columns, and scaling numerical columns. datamop covers tasks otherwise performed by scikit-learn classes such as SimpleImputer, OneHotEncoder, OrdinalEncoder, and StandardScaler, with fewer steps and easier customization. Similar functionality can be found in:

pandas (fillna(), etc.): pandas documentation
scikit-learn (SimpleImputer, OneHotEncoder, LabelEncoder, MinMaxScaler, etc.): scikit-learn preprocessing

Contributors

The authors of this project are Sepehr Heydarian, Ximin Xu, and Essie Zhang.

Installation

$ pip install datamop

Usage

datamop can be used to encode columns in a DataFrame using one-hot or ordinal encoding as follows:

import pandas as pd
import datamop

df = pd.DataFrame({
    'Sport': ['Tennis', 'Basketball', 'Football', 'Badminton'],
    'Level': ['A', 'B', 'C', 'D']
})

encoded_df_onehot = datamop.column_encoder(df, columns=['Sport'], method='one-hot')
encoded_df_ordinal = datamop.column_encoder(df, columns=['Level'], method='ordinal', order={'Level': ['A', 'B', 'C', 'D']})
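
To make the two encoding methods concrete, here is a minimal pure-Python sketch of what one-hot and ordinal encoding do to the columns above. It is not datamop's implementation: the helper names `one_hot` and `ordinal` are hypothetical, and the 0-based ordinal codes are an assumption, since the values datamop assigns are not stated here.

```python
def one_hot(values):
    """Return {category: indicator list}, one 0/1 entry per input row."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


def ordinal(values, order):
    """Map each label to its position in `order` (0-based: an assumption)."""
    rank = {label: i for i, label in enumerate(order)}
    return [rank[v] for v in values]


sports = ['Tennis', 'Basketball', 'Football', 'Badminton']
levels = ['A', 'B', 'C', 'D']

# One indicator column per distinct sport.
sport_onehot = one_hot(sports)
# Integer codes that follow the user-supplied ordering.
level_ordinal = ordinal(levels, order=['A', 'B', 'C', 'D'])
```

One-hot encoding suits categories with no natural order, such as Sport, while ordinal encoding preserves a ranking, such as Level.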

This package can also handle missing values, through imputation or removal based on user preference, as follows:

import numpy as np
df = pd.DataFrame({
    'a': [10, np.nan, 30],
    'b': [1.5, 2.5, np.nan],
    'c': ['x', np.nan, 'z']
})
cleaned = datamop.sweep_nulls(df, strategy='mean')
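
As a rough illustration of the arithmetic behind mean imputation, the sketch below fills the gaps in the numeric columns 'a' and 'b' from the example using only the standard library. The helper `impute_mean` is hypothetical and is not part of datamop. How sweep_nulls treats the non-numeric column 'c' under strategy='mean' is not stated here, so that column is left out.

```python
import math


def impute_mean(column):
    """Replace NaN entries with the mean of the non-missing entries."""
    present = [v for v in column if not math.isnan(v)]
    mean = sum(present) / len(present)
    return [mean if math.isnan(v) else v for v in column]


nan = float('nan')
a_filled = impute_mean([10, nan, 30])    # missing entry becomes (10 + 30) / 2
b_filled = impute_mean([1.5, 2.5, nan])  # missing entry becomes (1.5 + 2.5) / 2
```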

Additionally, this package can scale numerical columns as follows:

df = pd.DataFrame({"price": [25, 50, 75]})
df_scaled = datamop.column_scaler(df, column='price', method='minmax', new_min=0, new_max=1)
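
For reference, the formulas behind the two scaling methods can be sketched in plain Python. This is not datamop's implementation: the helpers `minmax_scale` and `zscore` are hypothetical, and the use of the population standard deviation for the Z-score is an assumption, since the package may use the sample version instead.

```python
from statistics import mean, pstdev


def minmax_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so min maps to new_min and max to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max scale a constant column")
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def zscore(values):
    """Centre on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population (not sample) deviation
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


prices = [25, 50, 75]
scaled = minmax_scale(prices, new_min=0, new_max=1)
standardized = zscore(prices)
```

With the prices above, Min-Max scaling yields 0.0, 0.5, and 1.0, and the Z-scores are symmetric around zero because 50 is the mean.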

Contributing

Interested in contributing? Check out the contributing guidelines in the CONTRIBUTING.md file. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

datamop was created by Sepehr Heydarian, Ximin Xu, and Essie Zhang. It is licensed under the terms of the MIT license. See the LICENSE file.

Credits

datamop was created with cookiecutter and the py-pkgs-cookiecutter template.
