GeOT: A spatially explicit framework for evaluating spatio-temporal predictions

Installation:

The code can be installed via pip in editable mode in a virtual environment with the following commands:

git clone https://github.com/mie-lab/geospatial_optimal_transport
cd geospatial_optimal_transport
conda create -n otenv python=3.9
conda activate otenv
pip install -e .

This installs the geoemd package in your virtual environment, together with all dependencies. The code was tested on macOS and Linux.
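As a quick sanity check after installation, the following minimal sketch tests whether the geoemd package is visible to the active interpreter. The helper name is_installed is hypothetical and not part of the repository; it only relies on the Python standard library.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the top-level package can be found (without importing it)."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "geoemd" is the package name stated above; the helper itself is hypothetical.
    status = "found" if is_installed("geoemd") else "not found"
    print(f"geoemd: {status}")
```

If the package is reported as not found, the most likely cause is that the otenv environment is not activated.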

Data download and preprocessing:

We provide all preprocessed data in the data_submission folder.

If you want to reproduce the preprocessing, please use the following instructions:

1) Bike sharing data

Public bike sharing data from Montreal were downloaded here. A script is provided to read all data and convert them into the hourly number of bike pick-ups per station:

python preprocessing/bikes_montreal.py --path path/to/downloaded/folder

The script will output the preprocessed data into the same folder.
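To illustrate what "hourly number of bike pick-ups per station" means, here is a minimal sketch of such an aggregation. It is not the code in preprocessing/bikes_montreal.py: the input format (pairs of an ISO timestamp string and a station ID) and the function name hourly_pickups are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def hourly_pickups(trips):
    """Count pick-ups per (station, hour) from (start_time, station_id) pairs."""
    counts = Counter()
    for start_time, station_id in trips:
        # Truncate the trip start time to the beginning of its hour.
        hour = datetime.fromisoformat(start_time).replace(
            minute=0, second=0, microsecond=0
        )
        counts[(station_id, hour)] += 1
    return counts


# Hypothetical toy trips: three pick-ups at station "A", one at station "B".
trips = [
    ("2019-05-01 08:05:00", "A"),
    ("2019-05-01 08:40:00", "A"),
    ("2019-05-01 09:10:00", "A"),
    ("2019-05-01 08:15:00", "B"),
]
counts = hourly_pickups(trips)
for (station, hour), n in sorted(counts.items()):
    print(station, hour.isoformat(), n)
```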

2) Charging occupancy data

The charging station occupancy data were downloaded from https://gitlab.com/smarter-mobility-data-challenge/tutorials. They are taken from the Smarter Mobility Data Challenge (see paper).

We first ran the data through the preprocessing notebook from the winning team of the challenge. We then saved the intermediate data and ran them through the following script for further preprocessing steps (paths are hard-coded in the script):

python preprocessing/charging.py

3) Traffic forecasting data

The data were downloaded here and saved in the folder data/traffic. They are then preprocessed with the following script:

python preprocessing/traffic.py

Training models

NOTE: The purpose of this repo is to demonstrate an evaluation framework. The goal is not to develop new superior models for bike sharing or other applications. Thus, the predictions themselves do not matter much and were mainly created to demonstrate the framework on realistic data. We provide our outputs here. The following instructions are only necessary to retrain the models.

We use the darts library for time series prediction.

python scripts/train_test.py [-h] [-d DATA_PATH] [-s STATION_PATH] [-o OUT_PATH] [-c CONFIG] [-m MODEL] [--multi_vs_ind MULTI_VS_IND] [-r RECONCILE] [-x HIERARCHY] [-l LAGS] [--output_chunk_length OUTPUT_CHUNK_LENGTH] [--n_epochs N_EPOCHS] [--x_loss_function X_LOSS_FUNCTION] [--x_scale X_SCALE] [--num_stacks NUM_STACKS] [--lags_past_covariates LAGS_PAST_COVARIATES] [--y_clustermethod Y_CLUSTERMETHOD] [--y_cluster_k Y_CLUSTER_K] [--model_path MODEL_PATH] [--load_model_name LOAD_MODEL_NAME] [--ordered_samples] [--optimize_optuna]

For example, we ran:

python scripts/train_test.py -d data_submission/data_raw/bikes_data.csv -s data_submission/data_raw/bikes_stations.csv -o outputs/bikes --model_path trained_models/bikes --model nhits --n_epochs 100 --x_loss_function emdpartialspatial

This trains an NHiTS model with a Sinkhorn loss (unbalanced OT), saves the model in the folder trained_models/bikes, and saves the output in outputs/bikes.
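For readers unfamiliar with the Sinkhorn loss, here is a minimal pure-Python sketch of entropy-regularized optimal transport between two histograms over stations. It is not the loss implemented in this repo: it covers only the balanced case (both histograms have the same total mass), whereas the training command above uses unbalanced OT. The function name sinkhorn_cost, the regularization value, and the toy cost matrix are all assumptions for illustration.

```python
import math


def sinkhorn_cost(a, b, cost, reg=0.1, n_iter=200):
    """Approximate the OT cost between histograms a and b (equal total mass)."""
    n, m = len(a), len(b)
    # Gibbs kernel derived from the pairwise cost matrix.
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Total cost of the transport plan diag(u) * kernel * diag(v).
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j] for i in range(n) for j in range(m)
    )


# Toy example (hypothetical): three stations on a line, cost = station distance.
cost = [[abs(i - j) for j in range(3)] for i in range(3)]
observed = [1.0, 0.0, 0.0]   # all demand at station 0
near_miss = [0.0, 1.0, 0.0]  # predicted at the neighbouring station
far_miss = [0.0, 0.0, 1.0]   # predicted two stations away

print(sinkhorn_cost(observed, near_miss, cost))
print(sinkhorn_cost(observed, far_miss, cost))
```

Both toy predictions have the same squared error with respect to the observation, but the transport cost of the far miss is roughly twice that of the near miss, which is the kind of spatial sensitivity an OT-based loss provides.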

Evaluation

Our evaluation script is applied to a whole folder containing the outputs from several models and saves the results in a folder with the same name plus the suffix "_plots", e.g., outputs/bikes_plots:

python scripts/evaluate.py -n bikes --redo
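The naming convention for the results folder can be sketched as follows. The helper plots_dir_for is hypothetical and only illustrates how the results folder relates to the outputs folder; it is not taken from scripts/evaluate.py.

```python
from pathlib import Path


def plots_dir_for(output_dir: str) -> Path:
    """Return the sibling folder named like output_dir plus the "_plots" suffix."""
    out = Path(output_dir)
    return out.with_name(out.name + "_plots")


print(plots_dir_for("outputs/bikes").as_posix())
```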

Reproduce results from the paper

We provide notebooks to reproduce all figures and tables from the manuscript. These notebooks can be run without any previous steps, just using the data in data_submission.

synthetic: Reproduces the experiments on synthetic data, including Figure 3 and Figure 4.
case study bike sharing: Reproduces the case study in which the evaluation framework is applied to bike sharing data (Figure 5 and Figure 6).
case study spatial regression: Reproduces the case study on spatial regression (Table 3 and Figure 9).
scales: Reproduces the analysis of spatial and temporal scales (Figure 7 and Table 2).
sinkhorn loss: Reproduces the results of training with the Sinkhorn loss (Table 4 and Figure 11) as well as the analysis in Appendix D.

To reproduce the experiments on spatial regression (Section 5), first run

python scripts/benchmark_with_OT.py

Then the figures and tables can be reproduced with the spatial regression notebook.

Troubleshooting

The XGB import within the darts library sometimes causes issues. XGB is not used in this repo, so as a hotfix, the import in site-packages/darts/models/__init__.py can be removed to work around the issue.
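Because the location of site-packages differs between environments, the following minimal sketch prints the full path of the file to edit. The helper locate_package_file is hypothetical; it uses only the standard library and looks up the package without importing it, so it should also work when importing darts itself fails.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def locate_package_file(package: str, relative_path: str) -> Optional[Path]:
    """Find a file inside an installed top-level package without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, spec.origin points at its own __init__.py.
    return Path(spec.origin).parent / relative_path


if __name__ == "__main__":
    # Prints None if darts is not installed in the active environment.
    print(locate_package_file("darts", "models/__init__.py"))
```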
