We recommend trying out Snakemake as a replacement for GNU Make for managing workflows with several parts, such as:
- download raw data
- process raw data into clean data
- split the dataset into train and test sets
- train models:
  - train model A
  - train model B
- make the result figure (using results from A and B)
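A workflow manager treats a list like this as a dependency graph: each step runs only after the steps whose outputs it needs. Models A and B depend only on the train/test split and not on each other, so they can run in parallel. The Python sketch below illustrates that ordering using the standard library's `graphlib`. It is not Snakemake's API, and the step names are hypothetical labels for the steps listed above.

```python
from graphlib import TopologicalSorter

# Hypothetical step names for the workflow above; each step maps to the
# set of steps whose outputs it consumes.
WORKFLOW = {
    "download_raw": set(),
    "clean_data": {"download_raw"},
    "split_train_test": {"clean_data"},
    "train_model_A": {"split_train_test"},
    "train_model_B": {"split_train_test"},
    "make_result_figure": {"train_model_A", "train_model_B"},
}


def execution_order(graph):
    """Return one valid run order: every step comes after all of its inputs."""
    return list(TopologicalSorter(graph).static_order())


def ready_batches(graph):
    """Group steps into batches; steps in the same batch are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all inputs of these steps are done
        batches.append(ready)
        sorter.done(*ready)  # mark them finished to unlock downstream steps
    return batches


if __name__ == "__main__":
    for step in execution_order(WORKFLOW):
        print(step)
    print(ready_batches(WORKFLOW))
```

The batch containing both `train_model_A` and `train_model_B` is where a workflow manager could run two jobs at once.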
Snakemake lets you manage such data science workflows in a Python-like syntax: each step is written as a rule that declares its inputs, its outputs, and how to produce them.
Tutorial: https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html
Example workflows: https://github.com/tufts-ml/mastre-predict-and-downselect (request access from Mike)
See especially the toy data workflow and the 'movie reviews' workflows.