mie-lab/geospatial_optimal_transport

GeOT: A spatially explicit framework for evaluating spatio-temporal predictions

Installation:

The code can be installed via pip in editable mode inside a conda environment with the following commands:

git clone https://github.com/mie-lab/geospatial_optimal_transport
cd geospatial_optimal_transport
conda create -n otenv python=3.9
conda activate otenv
pip install -e .

This installs the geoemd package into your environment, together with all dependencies. The code was tested on macOS and Linux.
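As a quick sanity check after installation, the following minimal sketch tests whether a package can be found in the active environment. The helper name `is_installed` is hypothetical and not part of this repo; only the package name geoemd comes from the instructions above.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be located in the active environment."""
    # find_spec returns None when a top-level package cannot be found.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "geoemd" is the package name installed by `pip install -e .`
    status = "found" if is_installed("geoemd") else "NOT found"
    print(f"geoemd: {status}")
```

If the package is reported as not found, check that the otenv environment is activated before running the script.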

Data download and preprocessing:

We provide all preprocessed data in the data_submission folder.

If you want to reproduce the preprocessing, please use the following instructions:

1) Bike sharing data

Public bike sharing data from Montreal were downloaded here. A script is provided to read all data and to convert it into the hourly number of bike pick-ups per station:

python preprocessing/bikes_montreal.py --path path/to/downloaded/folder

The script will output the preprocessed data into the same folder.
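To illustrate what "hourly number of bike pick-ups per station" means, here is a minimal standard-library sketch of the aggregation. It is not the code in preprocessing/bikes_montreal.py; the function name, the input layout (pairs of start time and station ID), and the example trips are all made up for illustration.

```python
from collections import Counter
from datetime import datetime


def hourly_pickups(trips):
    """Count pick-ups per (station_id, hour).

    `trips` is an iterable of (start_time, station_id) pairs, where
    start_time is an ISO-formatted string (assumed layout, for illustration).
    """
    counts = Counter()
    for start_time, station_id in trips:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(start_time).replace(
            minute=0, second=0, microsecond=0
        )
        counts[(station_id, hour)] += 1
    return counts


# Toy trips, invented for this example only.
example_trips = [
    ("2019-05-01 08:05:00", "A"),
    ("2019-05-01 08:40:00", "A"),
    ("2019-05-01 09:10:00", "A"),
    ("2019-05-01 08:15:00", "B"),
]
counts = hourly_pickups(example_trips)
for (station, hour), n in sorted(counts.items()):
    print(station, hour.isoformat(), n)
```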

2) Charging occupancy data

The charging station occupancy data were downloaded from https://gitlab.com/smarter-mobility-data-challenge/tutorials. They are taken from the Smarter Mobility Data Challenge (see paper).

We first ran the data through the preprocessing notebook from the winning team of the challenge. We then saved the intermediate data and ran it through the following script for further preprocessing steps (paths are hard-coded in the script):

python preprocessing/charging.py

3) Traffic forecasting data

The data were downloaded here. We save them in the folder data/traffic and then preprocess them with the following script:

python preprocessing/traffic.py

Training models


NOTE: The purpose of this repo is to demonstrate an evaluation framework. The goal is not to develop new superior models for bike sharing or other applications. Thus, the predictions themselves do not matter much and were mainly created to demonstrate the framework on realistic data. We provide our outputs here. The following instructions are only necessary to retrain the models.


We use the darts library for time series prediction.

python scripts/train_test.py [-h] [-d DATA_PATH] [-s STATION_PATH] [-o OUT_PATH] [-c CONFIG] [-m MODEL] [--multi_vs_ind MULTI_VS_IND] [-r RECONCILE] [-x HIERARCHY] [-l LAGS] [--output_chunk_length OUTPUT_CHUNK_LENGTH] [--n_epochs N_EPOCHS] [--x_loss_function X_LOSS_FUNCTION] [--x_scale X_SCALE] [--num_stacks NUM_STACKS] [--lags_past_covariates LAGS_PAST_COVARIATES] [--y_clustermethod Y_CLUSTERMETHOD] [--y_cluster_k Y_CLUSTER_K] [--model_path MODEL_PATH] [--load_model_name LOAD_MODEL_NAME] [--ordered_samples] [--optimize_optuna]

For example, we ran it with:

python scripts/train_test.py -d data_submission/data_raw/bikes_data.csv -s data_submission/data_raw/bikes_stations.csv -o outputs/bikes --model_path trained_models/bikes --model nhits --n_epochs 100 --x_loss_function emdpartialspatial

This trains the NHiTS model with a Sinkhorn loss (unbalanced OT), saves the model in the folder trained_models/bikes, and writes the outputs to outputs/bikes.
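For readers unfamiliar with the Sinkhorn algorithm, the following minimal pure-Python sketch computes an entropy-regularised OT cost between two demand histograms on a line of stations. It is a simplified illustration under stated assumptions: it is the balanced variant (both histograms have equal total mass), not the unbalanced loss selected by emdpartialspatial, and it is not the implementation used in this repo. The function name, the regularisation value, and the toy station layout are all hypothetical.

```python
import math


def sinkhorn_cost(a, b, cost, reg=0.5, n_iters=200):
    """Entropy-regularised OT cost between histograms `a` and `b`.

    Balanced Sinkhorn: `a` and `b` must have the same total mass.
    `cost[i][j]` is the cost of moving one unit of mass from location i to j.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v); the returned value is <P, C>.
    return sum(
        u[i] * K[i][j] * v[j] * cost[i][j] for i in range(n) for j in range(m)
    )


# Four stations on a line at positions 0, 1, 2, 3 (toy layout).
positions = [0.0, 1.0, 2.0, 3.0]
cost = [[abs(p - q) for q in positions] for p in positions]

observed = [1.0, 0.0, 0.0, 0.0]   # all demand at station 0
pred_near = [0.0, 1.0, 0.0, 0.0]  # predicted at the neighbouring station
pred_far = [0.0, 0.0, 0.0, 1.0]   # predicted at the farthest station


def squared_error(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


print("squared error:", squared_error(observed, pred_near),
      squared_error(observed, pred_far))
print("OT cost:", sinkhorn_cost(observed, pred_near, cost),
      sinkhorn_cost(observed, pred_far, cost))
```

Both predictions have the same squared error, but the OT cost is about 1 for the nearby station and about 3 for the distant one, which is the kind of spatial sensitivity an OT-based loss or metric adds.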

Evaluation

Our evaluation script is applied to a whole folder containing the outputs of several models and saves the results in a folder with the same name plus the suffix "_plots", e.g., outputs/bikes_plots:

python scripts/evaluate.py -n bikes --redo

Reproduce results from the paper

We provide notebooks to reproduce all figures and tables from the manuscript. These notebooks can be run without any previous steps, just using the data in data_submission.

  • synthetic: Reproduces the experiments on synthetic data, including Figure 3 and Figure 4.
  • case study bike sharing: Reproduces the case study in which the evaluation framework is applied to bike sharing data (Figure 5 and Figure 6).
  • case study spatial regression: Reproduces the case study on spatial regression (Table 3 and Figure 9).
  • scales: Reproduces the analysis on spatial and temporal scales (Figure 7 and Table 2).
  • sinkhorn loss: Reproduces the results of training with the Sinkhorn loss (Table 4 and Figure 11) as well as the analysis in Appendix D.

To reproduce the experiments on spatial regression (Section 5), first run:

python scripts/benchmark_with_OT.py

The figures and tables can then be reproduced with the spatial regression notebook.

Troubleshooting

  • The XGB import within the darts library sometimes causes issues. XGB is not used in this repo, so as a hotfix, the XGB import in site-packages/darts/models/__init__.py can be removed to work around the issue.
