This work is under review.
We introduce a large-scale spatial-temporal dataset for traffic forecasting, DynST, encompassing 20.35 billion data points covering approximately 20 years. Unlike traditional datasets that accumulate data over time on a fixed road network, ours provides rich spatial information with dynamically evolving road networks, alongside diverse temporal data.
This is the official repository for DynST.
You can check the basic temporal statistical analysis results in `charts/`.
Because the official PEMS website is often down, we provide a snapshot of its regulations webpage.
- Number of Records: DynST records 12,220 sensors over 20 years, totaling 20.35 billion data points.
- Features: The main feature data consists of two files, `data` and `stations`. `data` holds the processed data, and `stations` holds the sensor IDs. They are stored in `np.float16` and `np.int32` format, respectively.
- Format: The main feature data is compressed in `npz` format. The adjacency matrix and the metadata are stored as `csv` files. All of them are packaged as zip archives.
- Size: About 79 GB in total.
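To make the stated dtypes concrete, the sketch below writes a tiny synthetic archive and reads it back. The shapes, the sensor IDs, and the assumption that `data` and `stations` live as two arrays inside one `.npz` file are ours for illustration only; check the real files for their actual layout.

```python
import io

import numpy as np

# Hypothetical shapes: 288 time steps (one day at an assumed 5-minute
# resolution) for 4 sensors. Real DynST arrays will differ.
T, N = 288, 4
rng = np.random.default_rng(0)

# Build an in-memory .npz that mirrors the documented dtypes.
buf = io.BytesIO()
np.savez_compressed(
    buf,
    data=rng.uniform(0, 500, size=(T, N)).astype(np.float16),  # feature values
    stations=np.array([400001, 400017, 400030, 400040], dtype=np.int32),  # made-up sensor IDs
)
buf.seek(0)

with np.load(buf) as archive:
    data = archive["data"].astype(np.float32)  # upcast before computing anything
    stations = archive["stations"]

print(data.dtype, data.shape, stations.dtype)
```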
You can download the dataset from the following OneDrive link. The password is `BJTUcocacola`.
The directory tree is:
```
├── dataset/
│ ├── DynamicVersion/ # Dynamic version of DynST
│ │ ├── D03.zip # Data of 9 districts, one zip package per district
│ │ ├── D04.zip
│ │ ├── DXX.zip
│ ├── Metadata.zip # Metadata file
│ └── AdjacencyMatrix.zip # Adjacency matrix file
```
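The district archives can also be extracted programmatically. The sketch below is a hypothetical helper (`extract_districts` is our name, not part of this repository) that unpacks each `D*.zip` under `DynamicVersion/` into its own folder; that per-district output layout is an assumption, since the contents of each zip are not described here.

```python
import zipfile
from pathlib import Path


def extract_districts(dynamic_dir, out_dir):
    """Extract every D*.zip in dynamic_dir into out_dir/<archive stem>/.

    Hypothetical helper; returns the sorted list of extracted district names.
    """
    dynamic_dir, out_dir = Path(dynamic_dir), Path(out_dir)
    extracted = []
    for archive in sorted(dynamic_dir.glob("D*.zip")):
        target = out_dir / archive.stem  # e.g. out_dir/D03
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(archive.stem)
    return extracted
```

For example, `extract_districts("dataset/DynamicVersion", "dataset/unzipped")`. Giving each district its own folder avoids name collisions if several archives contain identically named files.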
Download the dataset from OneDrive and unzip it. If you want to use DynST in a traditional setting, e.g., the Target-only and Transfer-static settings described in our paper, run `merge.py` to obtain the static version of DynST.
Basic usage of the dataset is as follows:
```python
import pandas as pd
import numpy as np

metadata = pd.read_csv('metadata.csv')
# The feature data is stored in np.float16; cast it to np.float32 before use.
data = np.load("data.npz")['data'].astype(np.float32)
```
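The cast to `np.float32` matters for aggregates: `np.float16` cannot represent values above 65,504, so sums over many readings can overflow. The toy example below uses made-up values, not DynST data, to illustrate this.

```python
import warnings

import numpy as np

# Toy example: 1,000 readings of 100 each (made-up values).
flows16 = np.full(1000, 100, dtype=np.float16)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # float16 overflow emits a warning
    total16 = flows16.sum()  # result dtype is float16, and 100,000 > 65,504

total32 = flows16.astype(np.float32).sum()  # exact in float32

print(total16, total32)  # total16 is inf, total32 is 100000.0
```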
Run `merge.py`:

```shell
python merge.py -i d03
```
Generate the adjacency table:

```shell
cd gen_adj_table
python gen.py --dataset D05
```
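For readers who want a dense matrix from an edge list, here is a minimal sketch. It is not `gen.py`: the helper `edges_to_matrix` and the column names `from`, `to`, and `cost` are hypothetical, so adjust them to the actual CSV header. Because the road network evolves over time, edges whose endpoints are missing from the current sensor list are skipped.

```python
import numpy as np
import pandas as pd


def edges_to_matrix(edges, station_ids):
    """Turn an edge list into a dense N x N matrix (hypothetical helper).

    Assumes columns 'from', 'to', 'cost'; pairs without an edge stay 0.
    """
    index = {int(sid): i for i, sid in enumerate(station_ids)}
    adj = np.zeros((len(station_ids), len(station_ids)), dtype=np.float32)
    for src, dst, cost in edges[["from", "to", "cost"]].itertuples(index=False):
        if src in index and dst in index:  # skip sensors absent from this snapshot
            adj[index[src], index[dst]] = cost
    return adj


stations = np.array([400001, 400017, 400030], dtype=np.int32)  # made-up IDs
edges = pd.DataFrame({"from": [400001, 400017, 400099],
                      "to": [400017, 400030, 400001],
                      "cost": [0.8, 1.5, 2.0]})
print(edges_to_matrix(edges, stations))
```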
DynST is released under the CC BY-NC 4.0 International License. Our code is released under the MIT License. Please comply with the PEMS regulations.
To help users better understand the temporal characteristics of the dataset, we have conducted several statistical analyses and generated corresponding charts. Each chart is linked below:
For each sub-dataset, we compute the spatial average of the data (the mean over all sensors) at each time frame; the analyses below build on these averages.
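A minimal sketch of the spatial average, assuming the feature array has shape `(T, N)` or `(T, N, C)` with sensors on axis 1 and missing readings encoded as NaN. Both the layout and the NaN handling are assumptions, not documented properties of DynST.

```python
import numpy as np


def spatial_average(data):
    """Mean over sensors (axis 1) at each time frame, skipping NaN readings."""
    return np.nanmean(np.asarray(data, dtype=np.float32), axis=1)


# Toy array: 2 time frames x 3 sensors, one missing reading.
toy = np.array([[10.0, 20.0, np.nan],
                [30.0, 30.0, 60.0]], dtype=np.float32)
print(spatial_average(toy))  # [15. 40.]
```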
Annual variations in average traffic flow: we first calculate the daily average for each time segment and then derive the annual average from these daily averages.
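One reading of this two-step procedure, sketched with pandas: average each day's time frames into a daily value, then average those daily values within each calendar year. The helper name `annual_average` and the 5-minute toy resolution are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def annual_average(series):
    """Daily means first, then the mean of those daily means per calendar year.

    `series` is a spatially averaged signal indexed by a DatetimeIndex.
    """
    daily = series.resample("D").mean()            # one value per day
    return daily.groupby(daily.index.year).mean()  # one value per year


# Toy signal: 5-minute steps spanning the last day of 2020 and the first of 2021.
idx = pd.date_range("2020-12-31", "2021-01-01 23:55", freq="5min")
values = np.where(idx.year == 2020, 100.0, 200.0)
print(annual_average(pd.Series(values, index=idx)))
```

Averaging daily values first gives every day equal weight in the annual figure, even if some days have fewer recorded time frames.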
| Chart Link |
|---|
| Annual Patterns of Average Traffic Flow |
| Annual Patterns of Average Occupancy |
| Annual Patterns of Average Speed |
Difference between weekdays and weekends