snakepot

snakepot is a Snakemake workflow that trains and evaluates a binary classifier using the TPOT AutoML library.

Quick Start

Install snakepot (requires conda)
Edit parameters in config.json
Run snakemake
View outputs in the new output directory

Features

I developed snakepot during my elective at the William Harvey Research Institute. We used snakepot to quickly train a baseline model on a variety of gene-phenotype datasets. The workflow takes the following steps:

Clean the data set (simple N/A drop by rows and explicit drop by columns)
Split data into train/test/validate sets
Call the TPOT automated machine learning algorithm to train a classifier
Save the classifier and re-run it on the holdout/validation data
Evaluate the classifier on the holdout set
Call the classifier for predictions on the unlabelled data
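
The cleaning, labelling, and splitting steps above can be sketched in plain Python. This is a rough illustration under stated assumptions, not snakepot's actual code: the function names (clean_rows, split_by_target, holdout_split), the toy rows, the treatment of empty strings as N/A, and the reading of perc_split as a 0-100 percentage are all hypothetical. The TPOT training step is left out.

```python
import random

def clean_rows(rows, drop_columns):
    """Drop the named columns (ignoring any that are absent), then drop rows with missing values."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in drop_columns}
        # Assumption: None or an empty string marks a missing value.
        if all(v not in (None, "") for v in kept.values()):
            cleaned.append(kept)
    return cleaned

def split_by_target(rows, target_column, target_1, target_0, to_predict):
    """Relabel target_1/target_0 rows as 1/0; collect to_predict rows as unlabelled."""
    labelled, unlabelled = [], []
    for row in rows:
        value = row[target_column]
        if value == target_1:
            labelled.append({**row, target_column: 1})
        elif value == target_0:
            labelled.append({**row, target_column: 0})
        elif value == to_predict:
            unlabelled.append(row)
    return labelled, unlabelled

def holdout_split(labelled, perc_split, seed=0):
    """Shuffle the labelled rows and set aside perc_split percent as a holdout set."""
    shuffled = labelled[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * perc_split / 100)
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Hypothetical toy data: one row has a missing score, one row is unlabelled.
rows = [
    {"id": "a", "score": 0.1, "label": "yes"},
    {"id": "b", "score": 0.9, "label": "no"},
    {"id": "c", "score": "", "label": "yes"},
    {"id": "d", "score": 0.5, "label": "unknown"},
    {"id": "e", "score": 0.7, "label": "yes"},
    {"id": "f", "score": 0.3, "label": "no"},
]

cleaned = clean_rows(rows, drop_columns=["id", "not_a_column"])
labelled, unlabelled = split_by_target(cleaned, "label", "yes", "no", "unknown")
train, holdout = holdout_split(labelled, perc_split=25)
```

In this sketch the training rows would go to TPOT, the holdout rows would be used for evaluation, and the unlabelled rows would receive the final predictions.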

An example dataset (/test/data.csv) and config file (config.json) are provided.

Setup

# Build conda environment
conda env create -f environment.yaml
conda activate snakepot
# Install python helper scripts to path
pip install . 
# Run the workflow in Snakefile using config.json
snakemake

config.json

Parameter	Description
directory	Output directory for new files
input	Input CSV file. All data must be encoded as binary or continuous variables
drop_columns	Features to drop from the data. Skipped if not found
encode_columns	Categorical features to encode. Skipped if not found
target_column	The name of the target variable
target_1	Target variable value to label as '1'
target_0	Target variable value to label as '0'
to_predict	Target variable value for final predictions
perc_split	Percentage of training data (target '1' or '0') to split for holdout set
TPOT_max_time	Maximum time to run TPOT in minutes
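
To show how the parameters fit together, the sketch below builds a config with every key from the table and serialises it as JSON. All values are made-up placeholders (the column names, target values, and numbers are not taken from the shipped config.json), and find_missing_keys is a hypothetical helper, not part of snakepot.

```python
import json

# Keys taken from the parameter table above.
REQUIRED_KEYS = [
    "directory", "input", "drop_columns", "encode_columns", "target_column",
    "target_1", "target_0", "to_predict", "perc_split", "TPOT_max_time",
]

# Placeholder values for illustration only.
example_config = {
    "directory": "output",
    "input": "test/data.csv",
    "drop_columns": ["sample_id"],
    "encode_columns": ["tissue"],
    "target_column": "label",
    "target_1": "yes",
    "target_0": "no",
    "to_predict": "unknown",
    "perc_split": 20,
    "TPOT_max_time": 5,
}

def find_missing_keys(config):
    """Return the required keys that are absent from the config."""
    return [key for key in REQUIRED_KEYS if key not in config]

missing = find_missing_keys(example_config)
config_text = json.dumps(example_config, indent=2)
print(config_text)
```

Checking for missing keys before running snakemake is one way to catch a mistyped parameter name early.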
