Machine Learning Template

Repository Structure

├── Makefile
├── README.md
├── config
│   └── environment.yaml
├── data
│   ├── interim
│   ├── processed
│   └── raw
├── notebooks
├── requirements.txt
├── saved
│   ├── figures
│   └── models
├── scripts
│   ├── eval.py
│   ├── process.py
│   └── train.py
└── src
    ├── __init__.py
    ├── model
    │   └── __init__.py
    ├── test
    └── utils

The Makefile should be used to set up the repository (e.g., install required OS libraries) or to save complex commands and workflows, such as training or running experiments with many command-line arguments.
The config folder is versioned using DVC and should include configuration files related to your environment or experiments.
The data folder is versioned using DVC and includes subfolders for interim, raw, and processed data.
The notebooks folder should be used for data exploration or in other situations where data visualization may be useful.
The requirements.txt file should be used to keep track of the project's required dependencies and should ideally pin specific package versions for future reproducibility.
The saved folder is versioned using DVC and includes subfolders for figures and models.
The scripts folder contains Python scripts for evaluation (eval.py), data processing (process.py), and training (train.py).
The src package should contain source code and tests.
- The model subfolder should include source code related to model architecture and usage.
- The test subfolder should include unit tests to verify code integrity.
- The utils subfolder should include utility functions used by notebooks and script files.
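
As a rough illustration of the layout described above, the following sketch recreates the folders and placeholder files from the tree using only the Python standard library. The `create_scaffold` helper and the `TEMPLATE_DIRS` / `TEMPLATE_FILES` names are hypothetical and not part of the template; the paths themselves are taken from the tree.

```python
from pathlib import Path

# Folders taken from the repository tree above.
TEMPLATE_DIRS = (
    "config",
    "data/interim",
    "data/processed",
    "data/raw",
    "notebooks",
    "saved/figures",
    "saved/models",
    "scripts",
    "src/model",
    "src/test",
    "src/utils",
)

# Python files listed in the tree above, created empty as placeholders.
TEMPLATE_FILES = (
    "scripts/eval.py",
    "scripts/process.py",
    "scripts/train.py",
    "src/__init__.py",
    "src/model/__init__.py",
)


def create_scaffold(root):
    """Create the template folders and placeholder files under `root`.

    Hypothetical helper for illustration. Existing folders and files
    are left untouched, so calling it twice is harmless.
    """
    root = Path(root)
    for rel in TEMPLATE_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    for rel in TEMPLATE_FILES:
        (root / rel).touch(exist_ok=True)
    return root
```

Calling `create_scaffold("my-project")` would create the structure in a new `my-project` folder; the Makefile, README.md, requirements.txt, and environment.yaml are left for you to add.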

Getting Started

Installing Dependencies

python3 -m venv .env
source .env/bin/activate
pip install -r requirements.txt
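
Since requirements.txt should ideally pin specific versions, it can help to check for unpinned entries before installing. The sketch below is one possible check, assuming simple `name==version` lines; `unpinned_requirements` and `PIN_PATTERN` are hypothetical names, and URL or editable requirements are not handled.

```python
import re

# Matches an exact pin such as "numpy==1.26.4", optionally with extras.
PIN_PATTERN = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text):
    """Return requirement lines that do not pin an exact version with `==`."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if not PIN_PATTERN.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    # Illustrative input only; the package names and versions are examples.
    print(unpinned_requirements("numpy==1.26.4\ntorch\n"))  # prints ['torch']
```

To run it against the real file, pass `Path("requirements.txt").read_text()` (with `from pathlib import Path`) instead of the sample string.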

Initialize Data Version Control

If DVC is already initialized, skip to step 4 (Add AWS Credentials to DVC Remote).

dvc init
dvc config core.autostage true
git commit -m "Initialize DVC"

git rm -r --cached 'config'
dvc add config
git commit -m "Add config folder to DVC"

git rm -r --cached 'data'
dvc add data
git commit -m "Add data folder to DVC"

git rm -r --cached 'saved'
dvc add saved
git commit -m "Add saved folder to DVC"

git push
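
The three folder blocks above repeat the same pattern, so they could be scripted. The sketch below builds the same `git rm`, `dvc add`, and `git commit` commands for each folder and only runs them when `dry_run` is turned off. It assumes `dvc init` and the autostage setting have already been applied; `tracking_commands` and `track_folders` are hypothetical helper names.

```python
import subprocess

# Folders the template versions with DVC.
DVC_FOLDERS = ("config", "data", "saved")


def tracking_commands(folder):
    """Return the three commands used above to hand `folder` over to DVC."""
    return [
        ["git", "rm", "-r", "--cached", folder],
        ["dvc", "add", folder],
        ["git", "commit", "-m", f"Add {folder} folder to DVC"],
    ]


def track_folders(folders=DVC_FOLDERS, dry_run=True):
    """Print each command, and run it unless `dry_run` is set."""
    handled = []
    for folder in folders:
        for cmd in tracking_commands(folder):
            print(" ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)  # stop at the first failure
            handled.append(cmd)
    return handled
```

Running `track_folders()` first prints the commands for review; `track_folders(dry_run=False)` executes them, after which a final `git push` is still needed.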

Create DVC Remote

Follow the tutorial here to create an S3 bucket.

dvc remote add -d remote s3://<insert bucket name here>/dvc

dvc remote modify remote region <insert bucket region here>

git add .dvc/config
git commit -m "Added S3 Remote"
git push

Add AWS Credentials to DVC Remote

dvc remote modify --local remote access_key_id <insert AWS_ACCESS_KEY_ID here>

dvc remote modify --local remote secret_access_key <insert AWS_SECRET_ACCESS_KEY here>
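
To avoid typing the keys into the shell by hand, the two commands above could be built from environment variables. This is a minimal sketch that assumes the keys are exported as `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` and that the remote is named `remote`, as in the steps above; `credential_commands` is a hypothetical helper.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def credential_commands(remote="remote", env=None):
    """Build the two `dvc remote modify --local` commands from `env`.

    `env` defaults to the process environment; a plain dict can be
    passed instead (an assumption made here to keep the sketch testable).
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return [
        ["dvc", "remote", "modify", "--local", remote,
         "access_key_id", env["AWS_ACCESS_KEY_ID"]],
        ["dvc", "remote", "modify", "--local", remote,
         "secret_access_key", env["AWS_SECRET_ACCESS_KEY"]],
    ]
```

Each returned command can be passed to `subprocess.run(cmd, check=True)`. Keeping the `--local` flag matters because it stores the keys in a local DVC config file rather than in the `.dvc/config` file that is committed to Git.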

Push to Assembla

If you are using Assembla for code versioning,

git remote remove origin
git remote add origin [email protected]:<insert .git link here>
git push --set-upstream origin main

Using DVC

Adding Files

To add new files to DVC or version existing data changes,

dvc add <name of file/folder>

If you are adding new data,

git commit -m "Added <name of file/folder> to dvc"

If you are updating existing data,

git commit -m "Updated <name of file/folder> with <data changes>"

git tag -a "<version>" -m "<name of file/folder> version <version>"
git push --follow-tags
dvc push
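
The add, commit, tag, and push sequence above differs only in the commit message, depending on whether the data is new or updated. The sketch below captures that branching as a command builder. `release_commands` is a hypothetical helper, and it uses `git push --follow-tags` as an assumption so that the annotated tag is pushed along with the commit, since a plain `git push` does not push tags by default.

```python
def release_commands(target, version, changes=None):
    """Return the commands for versioning `target` as `version`.

    `changes` is a short description of the data changes. Leave it as
    None when the data is being added for the first time.
    """
    if changes is None:
        message = f"Added {target} to dvc"
    else:
        message = f"Updated {target} with {changes}"
    return [
        ["dvc", "add", target],
        ["git", "commit", "-m", message],
        ["git", "tag", "-a", version, "-m", f"{target} version {version}"],
        # Assumption: --follow-tags pushes the annotated tag created above.
        ["git", "push", "--follow-tags"],
        ["dvc", "push"],
    ]
```

For example, `release_commands("data", "v1.1", "new labels")` yields a commit message of "Updated data with new labels" and a tag message of "data version v1.1"; the target, version, and description here are illustrative values only.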

Checkout Data

To check out a specific version of a file or folder, check out its .dvc pointer file from the tag (Git tracks the .dvc file, not the data itself),

git checkout tags/<version> <name of file/folder>.dvc
dvc pull

To go back to the current data version,

git checkout <current branch> <name of file/folder>.dvc
dvc pull

Jupyter Notebook

To use Jupyter notebooks within your venv environment,

pip install jupyter ipykernel
python -m ipykernel install --user --name=.env
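
Once the kernel is registered, a quick way to confirm that a notebook is really running inside the venv is to run a check like the one below in a cell. This is a small sketch using only the standard library; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(sys.executable)   # path of the interpreter backing the kernel
print(in_virtualenv())  # True when the kernel uses the venv
```

If the printed path does not point into the .env folder, the notebook is likely using a different kernel than the one installed above.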


justinwaltrip/ml-template
