This is a boilerplate repo for a machine-learning project involving Time Series forecasting.
In particular:
- It provides a cross-validation framework to ensure models are tested thoroughly and without data leakage
- It is agnostic to the type of model involved
- It is well suited for short research projects, typical of few-week coding bootcamps such as Le Wagon DataScience
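Here is a minimal sketch of how leakage-free cross-validation on a time series can work. It assumes the clean data is a 2D NumPy array with timesteps on axis 0. The series is cut into folds along the time axis, never shuffled, and each fold is split chronologically, so the model is never trained on timesteps that come after the ones it is tested on. The function names `get_folds` and `split_fold` and their parameters are hypothetical and are not necessarily the boilerplate's own API.

```python
import numpy as np


def get_folds(data: np.ndarray, fold_length: int, fold_stride: int) -> list:
    """Cut a 2D array (timesteps, features) into consecutive folds along axis 0.

    Hypothetical helper. Folds are never shuffled: fold k always starts
    earlier than fold k + 1. Folds overlap when fold_stride < fold_length.
    """
    folds = []
    for start in range(0, len(data) - fold_length + 1, fold_stride):
        folds.append(data[start:start + fold_length])
    return folds


def split_fold(fold: np.ndarray, train_ratio: float):
    """Split one fold chronologically: every test timestep comes after every train timestep."""
    split = int(len(fold) * train_ratio)
    return fold[:split], fold[split:]


# Usage sketch on random data shaped (length, n_targets + n_covariates)
data = np.random.rand(100, 3)
for fold in get_folds(data, fold_length=40, fold_stride=20):
    train, test = split_fold(fold, train_ratio=0.75)
    print(train.shape, test.shape)  # a real run would fit on `train` and score on `test`
```

Averaging the test scores over all folds gives a more robust estimate than a single train/test split, because the model is evaluated on several distinct periods of the series.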
- `ts_boilerplate` package
    - `main.py` comprises the main routes to be called from the CLI (`train` and `cross-validate`)
    - `params.py` contains project-level global variables to be set manually
- `data` folder contains the `raw` and `clean` subfolders
    - `clean` should contain 2D arrays `data`, with axis 0 representing timesteps and axis 1 the columns containing targets and covariates, as per picture: `data.shape = (length, n_targets + n_covariates)`
    - `Xy` may persist your tuple `(X, y)` of 3D arrays to be fed to your models: `X.shape = (n_samples, input_length, n_covariates)` and `y.shape = (n_samples, output_length, n_targets)`
- `notebooks`
    - `tutorial_ts_forecasting.ipynb` is a recommended read before diving into this project. It contains visuals that will help you fill in the global project params and understand the naming conventions
    - `create_dummy_tests.ipynb` will help you understand how the tests have been built
- `tests` folder, detailed below
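The step from the 2D `data` array to the 3D `(X, y)` tuple is usually a sliding window over the time axis. The sketch below assumes you know which columns are targets and which are covariates. The function `get_Xy` and its parameters are hypothetical, and the `stride` argument (the number of timesteps between consecutive samples) is an illustrative assumption.

```python
import numpy as np


def get_Xy(data, input_length, output_length, target_cols, covariate_cols, stride=1):
    """Slide a window over a 2D array to build 3D (X, y) arrays (hypothetical helper).

    data: shape (length, n_targets + n_covariates), axis 0 = timesteps
    X:    shape (n_samples, input_length, len(covariate_cols))
    y:    shape (n_samples, output_length, len(target_cols))
    Each y window starts right after its X window, so every sample uses
    only past values to predict future ones.
    """
    X, y = [], []
    last_start = len(data) - input_length - output_length
    for start in range(0, last_start + 1, stride):
        split = start + input_length
        X.append(data[start:split, covariate_cols])
        y.append(data[split:split + output_length, target_cols])
    return np.array(X), np.array(y)


# Usage sketch: column 0 is the target, columns 1-3 are covariates
data = np.random.rand(100, 4)
X, y = get_Xy(data, input_length=14, output_length=7, target_cols=[0], covariate_cols=[1, 2, 3])
print(X.shape, y.shape)  # (80, 14, 3) (80, 7, 1)
```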
Run this in your terminal from the root project folder to check your code:
- `pytest` to run all tests
- `pytest -m "not optional"` to only check mandatory tests
- `pytest -m "not optional and not slow"` to also skip tests that may be slow (those involving fitting your model)
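`optional` and `slow` are custom pytest markers. If your copy of the repo does not already register them, a hypothetical `conftest.py` at the project root could declare them through pytest's `pytest_configure` hook, which keeps pytest from emitting `PytestUnknownMarkWarning`. Tests are then tagged with `@pytest.mark.optional` or `@pytest.mark.slow`. The marker descriptions below are assumptions.

```python
# conftest.py at the project root (hypothetical sketch; the repo may already register these markers)


def pytest_configure(config):
    """Register the custom markers used to filter tests with `pytest -m`."""
    config.addinivalue_line("markers", "optional: tests that are not mandatory to pass")
    config.addinivalue_line("markers", "slow: tests that may be slow, e.g. because they fit the model")
```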
These tests require `ts_boilerplate/params.py` to be filled in according to your actual project specificities.
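To illustrate why the params matter, the sketch below shows what filled-in params might look like, together with a shape check similar in spirit to what a model test could verify. The variable names `INPUT_LENGTH`, `OUTPUT_LENGTH`, `N_TARGETS` and `N_COVARIATES` and the helper `assert_Xy_shapes` are hypothetical. Check the real names in `ts_boilerplate/params.py`.

```python
import numpy as np

# Hypothetical params.py content: the real variable names live in ts_boilerplate/params.py
INPUT_LENGTH = 14   # timesteps in each input window
OUTPUT_LENGTH = 7   # timesteps to forecast (> 1 for sequence forecasting)
N_TARGETS = 1       # > 1 for multivariate forecasting
N_COVARIATES = 3    # features in each input timestep


def assert_Xy_shapes(X: np.ndarray, y: np.ndarray) -> None:
    """Fail loudly if (X, y) do not match the project params (hypothetical shape test)."""
    assert X.ndim == 3 and y.ndim == 3, "X and y must be 3D arrays"
    assert X.shape[0] == y.shape[0], "X and y must have the same number of samples"
    assert X.shape[1:] == (INPUT_LENGTH, N_COVARIATES), f"unexpected X.shape {X.shape}"
    assert y.shape[1:] == (OUTPUT_LENGTH, N_TARGETS), f"unexpected y.shape {y.shape}"


# Usage: dummy arrays with the expected shapes pass the check
assert_Xy_shapes(np.zeros((10, INPUT_LENGTH, N_COVARIATES)), np.zeros((10, OUTPUT_LENGTH, N_TARGETS)))
```

If the params do not describe your real data, checks like this one fail even when your model code is correct. This is why the params must be filled in before running the tests.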
Roadmap:
- Basic training & cross-val routes, with tests
- Make the model fit well for univariate, multivariate (n_targets > 1) & sequence (output_sequence_length > 1) forecasting
- Make tests pass for stride > 1
- Add tests about the model (shape of predictions, etc.)
- Add backtesting as a main route. This is a very important concept to teach to students
- Refacto `model.py`
    - Rename it `pipeline.py`, because it may comprise the pre-processing, such as scaling, etc.
    - Turn it into a class `TsPipeline()` instead of pure functions
- Rename
- Add tests for future covariates
- Create a Makefile
- Include a DAG of the project
- Publish to the Le Wagon community
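Here is one possible shape for the `TsPipeline()` refactor in the roadmap. It bundles pre-processing and the model behind `fit`/`predict` while staying agnostic of the model type. Everything here is an assumption rather than the repo's current code: the method names, the per-feature mean/std scaling, and the placeholder `MeanBaseline` model are all illustrative. Note that the scaling statistics are learned in `fit()` only, so test data never influences them.

```python
import numpy as np


class TsPipeline:
    """Hypothetical sketch: bundle scaling and any model behind fit/predict.

    `model` can be any object exposing fit(X, y) and predict(X) on 3D arrays.
    Scaling statistics are learned in fit() only, i.e. on training data, to avoid leakage.
    """

    def __init__(self, model, eps: float = 1e-8):
        self.model = model
        self.eps = eps  # avoids division by zero for constant features

    def fit(self, X: np.ndarray, y: np.ndarray) -> "TsPipeline":
        # One mean/std per feature, computed over samples and timesteps (axes 0 and 1)
        self.X_mean = X.mean(axis=(0, 1), keepdims=True)
        self.X_std = X.std(axis=(0, 1), keepdims=True) + self.eps
        self.y_mean = y.mean(axis=(0, 1), keepdims=True)
        self.y_std = y.std(axis=(0, 1), keepdims=True) + self.eps
        self.model.fit((X - self.X_mean) / self.X_std, (y - self.y_mean) / self.y_std)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        y_scaled = self.model.predict((X - self.X_mean) / self.X_std)
        return y_scaled * self.y_std + self.y_mean  # back to the original target units


class MeanBaseline:
    """Hypothetical placeholder model: predicts the average training target window."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "MeanBaseline":
        self.mean_ = y.mean(axis=0, keepdims=True)  # shape (1, output_length, n_targets)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.repeat(self.mean_, len(X), axis=0)


# Usage sketch on random 3D arrays
X_train, y_train = np.random.rand(50, 14, 3), np.random.rand(50, 7, 1)
pipe = TsPipeline(MeanBaseline()).fit(X_train, y_train)
print(pipe.predict(X_train[:5]).shape)  # (5, 7, 1)
```

Wrapping scaling and model in one object means cross-validation can refit the whole pipeline on each fold. The scaler is then re-learned per fold instead of being computed once on the full series.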