The code can be installed via pip in editable mode in a virtual environment with the following commands:
git clone https://github.com/mie-lab/geospatial_optimal_transport
cd geospatial_optimal_transport
conda create -n otenv python=3.9
conda activate otenv
pip install -e .
This installs the package called geoemd in your virtual environment, together with all dependencies. The code was tested on macOS and Linux.
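After activating the environment, a quick sanity check like the one below can confirm that the interpreter version matches the one used in `conda create` and that geoemd and darts are importable. The helper `check_environment` is hypothetical and not part of geoemd. Note that `importlib.util.find_spec` only locates a top-level package; it does not execute the package's own imports.

```python
import importlib.util
import sys


def check_environment(packages=("geoemd", "darts"), expected_python=(3, 9)):
    """Hypothetical helper: report the Python version match and which packages can be found.

    find_spec locates a top-level package without importing it, so an error raised
    inside the package at import time is not detected here.
    """
    report = {"python_ok": sys.version_info[:2] == tuple(expected_python)}
    for name in packages:
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    print(check_environment())
```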
We provide all preprocessed data in the data_submission folder.
If you want to reproduce the preprocessing, please use the following instructions:
1) Bike sharing data
Public bike sharing data from Montreal were downloaded here. A script is provided to read all data and to convert it into the hourly number of bike pick-ups per station:
python preprocessing/bikes_montreal.py --path path/to/downloaded/folder
The script will output the preprocessed data into the same folder.
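The conversion from individual trips to hourly pick-ups per station can be sketched as follows. This is a minimal standard-library illustration, not the code in preprocessing/bikes_montreal.py; the column names `start_date` and `start_station_code` and the timestamp format are assumptions about the raw Montreal files.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def hourly_pickups(rows, time_col="start_date", station_col="start_station_code",
                   time_format="%Y-%m-%d %H:%M"):
    """Count bike pick-ups per (station, hour).

    rows: iterable of dicts, e.g. from csv.DictReader. The column names and the
    timestamp format are assumptions, not taken from the repository.
    """
    counts = Counter()
    for row in rows:
        ts = datetime.strptime(row[time_col], time_format)
        hour = ts.replace(minute=0, second=0, microsecond=0)  # floor the trip start to the hour
        counts[(row[station_col], hour)] += 1
    return counts


if __name__ == "__main__":
    sample = io.StringIO(
        "start_date,start_station_code\n"
        "2019-04-14 07:55,6001\n"
        "2019-04-14 07:10,6001\n"
        "2019-04-14 08:02,6002\n"
    )
    for (station, hour), n in sorted(hourly_pickups(csv.DictReader(sample)).items()):
        print(station, hour, n)
```

Only station-hours with at least one pick-up appear in the counter; a complete time series per station would additionally fill the missing hours with zeros.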
2) Charging occupancy data
The charging station occupancy data were downloaded [here](https://gitlab.com/smarter-mobility-data-challenge/tutorials). They are taken from the Smarter Mobility Data Challenge (see paper).
We first ran the data through the preprocessing notebook from the winning team of the challenge. We then saved the intermediate data and ran it through the following script for further preprocessing (paths are hard-coded in the script):
python preprocessing/charging.py
3) Traffic forecasting data
The data were downloaded here. We saved the data in the folder data/traffic and then preprocessed it with the following script:
python preprocessing/traffic.py
NOTE: The purpose of this repo is to demonstrate an evaluation framework. The goal is not to develop new superior models for bike sharing or other applications. Thus, the predictions themselves do not matter much and were mainly created to demonstrate the framework on realistic data. We provide our outputs here. The following instructions are only necessary to retrain the models.
We use the darts library for time series prediction. Training and testing are run with the following script:
python scripts/train_test.py [-h] [-d DATA_PATH] [-s STATION_PATH] [-o OUT_PATH] [-c CONFIG] [-m MODEL] [--multi_vs_ind MULTI_VS_IND] [-r RECONCILE] [-x HIERARCHY] [-l LAGS] [--output_chunk_length OUTPUT_CHUNK_LENGTH] [--n_epochs N_EPOCHS] [--x_loss_function X_LOSS_FUNCTION] [--x_scale X_SCALE] [--num_stacks NUM_STACKS] [--lags_past_covariates LAGS_PAST_COVARIATES] [--y_clustermethod Y_CLUSTERMETHOD] [--y_cluster_k Y_CLUSTER_K] [--model_path MODEL_PATH] [--load_model_name LOAD_MODEL_NAME] [--ordered_samples] [--optimize_optuna]
For example, we ran it with:
python scripts/train_test.py -d data_submission/data_raw/bikes_data.csv -s data_submission/data_raw/bikes_stations.csv -o outputs/bikes --model_path trained_models/bikes --model nhits --n_epochs 100 --x_loss_function emdpartialspatial
This trains the NHiTS model with a Sinkhorn loss (unbalanced OT), saves the trained model in the folder trained_models/bikes, and saves the outputs in outputs/bikes.
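To illustrate what an unbalanced OT loss computes, here is a minimal pure-Python sketch of the standard scaling iterations for entropic OT with KL-relaxed marginals, using the Euclidean distance between station coordinates as ground cost. It is not the implementation behind `--x_loss_function emdpartialspatial`; the function name and the values of `eps` (entropic regularisation) and `rho` (marginal penalty) are assumptions chosen for the example. A training loss would operate on differentiable tensors and, for small `eps`, in the log domain to avoid underflow.

```python
import math


def unbalanced_sinkhorn(a, b, coords_a, coords_b, eps=0.05, rho=1.0, n_iter=500):
    """Entropic unbalanced OT between two nonnegative histograms (illustrative sketch).

    a, b: masses (e.g. observed and predicted pick-ups) at the points coords_a, coords_b.
    eps: entropic regularisation strength.
    rho: weight of the KL penalty on each marginal; rho -> infinity recovers balanced OT.
    Returns (plan, cost) with cost = sum_ij plan[i][j] * distance(i, j).
    """
    n, m = len(a), len(b)
    dist = [[math.dist(p, q) for q in coords_b] for p in coords_a]  # Euclidean ground cost
    kernel = [[math.exp(-d / eps) for d in row] for row in dist]  # Gibbs kernel
    fi = rho / (rho + eps)  # exponent that relaxes the marginal constraints
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Scaling updates; assumes the kernel sums stay > 0 (no underflow for this eps).
        u = [(a[i] / sum(kernel[i][j] * v[j] for j in range(m))) ** fi for i in range(n)]
        v = [(b[j] / sum(kernel[i][j] * u[i] for i in range(n))) ** fi for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    cost = sum(plan[i][j] * dist[i][j] for i in range(n) for j in range(m))
    return plan, cost
```

With a very large `rho`, e.g. `unbalanced_sinkhorn([1.0, 0.0], [0.0, 1.0], [(0, 0), (1, 0)], [(0, 0), (1, 0)], rho=1e6)`, the cost is close to 1.0, since one unit of mass must move a distance of 1. With a small `rho` the marginals are only softly enforced, so much less mass is transported: mismatched predictions are penalised through the KL terms rather than by forcing all mass to move.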
Our evaluation script is applied to a whole folder containing the outputs of several models and saves the results in a folder with the same name plus the suffix "_plots", e.g., outputs/bikes_plots:
python scripts/evaluate.py -n bikes --redo
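The training and evaluation steps above can be chained in a small driver script. This is a hedged sketch: the helper names are hypothetical, only flags that appear in the usage line and the example command are used, and it assumes it is run from the repository root. The second training run omits `--x_loss_function` to fall back to the script's default loss; whether runs sharing `trained_models/bikes` overwrite each other is not documented here.

```python
import shlex
import subprocess
import sys


def train_command(data, stations, out, model_path, model="nhits", n_epochs=100, loss=None):
    """Build a scripts/train_test.py call using flags from its usage line (hypothetical helper)."""
    cmd = [sys.executable, "scripts/train_test.py",
           "-d", data, "-s", stations, "-o", out,
           "--model_path", model_path, "--model", model,
           "--n_epochs", str(n_epochs)]
    if loss is not None:  # omit the flag to use the script's default loss
        cmd += ["--x_loss_function", loss]
    return cmd


def evaluate_command(name):
    """Evaluate all model outputs for one application (results end up in a *_plots folder)."""
    return [sys.executable, "scripts/evaluate.py", "-n", name, "--redo"]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    data = "data_submission/data_raw/bikes_data.csv"
    stations = "data_submission/data_raw/bikes_stations.csv"
    steps = [
        train_command(data, stations, "outputs/bikes", "trained_models/bikes",
                      loss="emdpartialspatial"),
        train_command(data, stations, "outputs/bikes", "trained_models/bikes"),
        evaluate_command("bikes"),
    ]
    run_all(steps, dry_run=True)  # set dry_run=False to actually execute the steps
```

Keeping `dry_run=True` as the default means the command list can be checked before starting a long training run.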
We provide notebooks to reproduce all figures and tables from the manuscript. These notebooks can be run without any previous steps, using only the data in data_submission.
- synthetic: Reproduces the experiments on synthetic data, including Figure 3 and Figure 4.
- case study bike sharing: Reproduces the case study in which the evaluation framework is applied to bike sharing data (Figure 5 and Figure 6).
- case study spatial regression: Reproduces the case study on spatial regression (Table 3 and Figure 9).
- scales: Reproduces the analysis of spatial and temporal scales (Figure 7 and Table 2).
- sinkhorn loss: Reproduces the results of training with the Sinkhorn loss (Table 4 and Figure 11) as well as the analysis in Appendix D.
To reproduce the experiments on spatial regression (Section 5), first run:
python scripts/benchmark_with_OT.py
Then, the figures and tables can be reproduced with the spatial regression notebook.
- The XGB (XGBoost) model within the darts library sometimes causes import issues. Since XGB is not used in this repo, a quick fix is to remove its import from site-packages/darts/models/__init__.py.