cocacola-lab/DynST

DynST: Large-Scale Spatial-Temporal Dataset for Traffic Forecasting with Dynamic Road Networks

This work is under review.

Introduction

We introduce a large-scale spatial-temporal dataset for traffic forecasting, DynST, encompassing 20.35 billion data points covering approximately 20 years. Unlike traditional datasets that accumulate data over time on a fixed road network, ours provides rich spatial information with dynamically evolving road networks, alongside diverse temporal data.

This is the official repository for DynST.

You can check the basic temporal statistical analysis results in charts/.

Because the official PEMS website is often unavailable, we provide a snapshot of its regulations webpage.

Dataset Description

  • Number of Records: DynST covers 12,220 sensors over approximately 20 years, totaling 20.35 billion data points.
  • Features: The main feature data contains two entries, data and stations. data holds the processed measurements and stations holds the sensor IDs. They are stored as np.float16 and np.int32, respectively.
  • Format: The main feature data is compressed in npz format. The adjacency matrix and the metadata are stored as CSV files. Everything is packaged as zip archives.
  • Size: About 79 GB in total.
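
To make the storage layout above concrete, here is a minimal sketch that writes and reads back a small stand-in archive with the same two entry names and dtypes. The array shapes, the (time, sensor) layout, and the sensor IDs are assumptions made for illustration only; they are not taken from the real files.

```python
import io

import numpy as np

# Build a tiny stand-in archive so the example runs without the real files.
# Shapes, values, and sensor IDs are made up; only the entry names and
# dtypes follow the dataset description.
rng = np.random.default_rng(0)
fake_data = rng.uniform(0, 100, size=(12, 4)).astype(np.float16)  # assumed (time, sensor) layout
fake_stations = np.array([400001, 400002, 400003, 400004], dtype=np.int32)

buffer = io.BytesIO()
np.savez_compressed(buffer, data=fake_data, stations=fake_stations)
buffer.seek(0)

# np.load returns a lazy archive object; entries are read when indexed.
with np.load(buffer) as archive:
    print(archive.files)
    data = archive["data"]
    stations = archive["stations"]

print(data.dtype, data.shape)
print(stations.dtype, stations.shape)
```

With the real archives, replacing the in-memory buffer with the path to an extracted npz file should be the only change needed, provided the entry names match the description.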

Download Link

You can download the dataset from the following OneDrive link. The password is BJTUcocacola.

The directory tree is:

├── dataset/
│   ├── DynamicVersion/       # Dynamic version of DynST
│   │   ├── D03.zip           # 9 districts data compiled as zip package
│   │   ├── D04.zip
│   │   ├── DXX.zip
│   ├── Metadata.zip            # Metadata file
│   └── AdjacencyMatrix.zip     # Adjacency matrix file

Usage

Download the dataset from OneDrive and unzip it. To use DynST in a traditional setting, e.g., the Target-only and Transfer-static settings described in our paper, run merge.py to generate the static version of DynST.
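
If you prefer to unzip everything in one pass, the following sketch extracts every zip package under a download folder using only the standard library. The helper name extract_all and the output folder layout are hypothetical and not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_all(archive_dir, output_dir):
    """Extract every .zip under archive_dir into its own subfolder of output_dir."""
    archive_dir, output_dir = Path(archive_dir), Path(output_dir)
    extracted = []
    for zip_path in sorted(archive_dir.rglob("*.zip")):
        # Use the archive name without ".zip" (e.g. "D03") as the subfolder name.
        target = output_dir / zip_path.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted


# Example call, assuming the download was saved under ./dataset:
# extract_all("dataset", "dataset_unzipped")
```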

Basic usage of the dataset is as follows:

import numpy as np
import pandas as pd

# Load the metadata CSV
metadata = pd.read_csv('metadata.csv')

# The feature data is stored as np.float16; cast to np.float32 before further computation
data = np.load("data.npz")['data'].astype(np.float32)
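
The cast to np.float32 matters as soon as values are aggregated, because np.float16 cannot represent anything above 65504. The sketch below uses made-up readings to show a sum overflowing in half precision while the float32 sum stays exact.

```python
import numpy as np

# 1000 made-up readings of 100.0 each; the true total is 100000.
flow16 = np.full(1000, 100.0, dtype=np.float16)

# Summing a float16 array returns a float16 result, whose largest
# finite value is 65504, so the total overflows.
with np.errstate(over="ignore"):
    total16 = flow16.sum()

# Casting first keeps the total within range.
total32 = flow16.astype(np.float32).sum()

print(total16, total32)
```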

Run merge.py:

python merge.py -i d03

Generate the adjacency table:

cd gen_adj_table
python gen.py --dataset D05

License

DynST is released under a CC BY-NC 4.0 International License. Our code implementation is released under the MIT License. Please comply with the PEMS regulations.

Charts and Figures

Temporal Characteristics

To help users better understand the temporal characteristics of the dataset, we have conducted several statistical analyses and generated corresponding charts. Each chart is linked below:

For each sub-dataset, we calculate the spatial average of the data at each time frame for further use.
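
As a rough illustration of this spatial averaging, the sketch below averages over sensors at each time frame. The (time, sensor) array layout and the use of NaN for missing readings are assumptions made for this example, not documented properties of the dataset.

```python
import numpy as np

# Hypothetical layout: rows are time frames, columns are sensors.
# NaN marks a missing reading (an assumption for this sketch).
readings = np.array(
    [
        [10.0, 20.0, np.nan],
        [30.0, np.nan, 60.0],
        [5.0, 15.0, 25.0],
    ],
    dtype=np.float16,
)

# Cast to float32 first, then average across sensors while ignoring NaNs.
spatial_mean = np.nanmean(readings.astype(np.float32), axis=1)

print(spatial_mean)  # one value per time frame
```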

Year View

Annual variations in average traffic flow. We first calculated the daily average for each time segment and then derived the annual average from those daily values.

Chart Link
Annual Patterns of Average Traffic Flow
Annual Patterns of Average Occupancy
Annual Patterns of Average Speed
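
One possible reading of the two-step averaging behind the Year View is sketched below with pandas: average the spatially averaged series per day, then average the daily values per year. The hourly synthetic series is invented for illustration, and the actual scripts behind the charts may differ.

```python
import numpy as np
import pandas as pd

# Synthetic spatially averaged series: 100.0 throughout 2001, 200.0 throughout 2002.
index = pd.date_range("2001-01-01", "2002-12-31 23:00", freq=pd.Timedelta(hours=1))
series = pd.Series(np.where(index.year == 2001, 100.0, 200.0), index=index)

# Step 1: daily average.
daily = series.resample("D").mean()

# Step 2: annual average derived from the daily averages.
annual = daily.groupby(daily.index.year).mean()

print(annual)
```

Averaging per day first gives every day equal weight in the annual figure, even when some days have fewer valid time frames than others.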

Week View

Differences between weekday and weekend traffic patterns, one chart per year.

Chart Link
2001 WeekView
2002 WeekView
2003 WeekView
2004 WeekView
2005 WeekView
2006 WeekView
2007 WeekView
2008 WeekView
2009 WeekView
2010 WeekView
2011 WeekView
2012 WeekView
2013 WeekView
2014 WeekView
2015 WeekView
2016 WeekView
2017 WeekView
2018 WeekView
2019 WeekView
2020 WeekView
2021 WeekView
2022 WeekView
2023 WeekView
2024 WeekView
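
A weekday versus weekend comparison like the one in the Week View charts can be sketched as follows. The synthetic hourly series and the grouping by hour of day are assumptions made for illustration; the charts themselves may use a different time resolution.

```python
import numpy as np
import pandas as pd

# Two synthetic weeks of hourly values: 100.0 on weekdays, 50.0 on weekends.
index = pd.date_range("2024-01-01", periods=14 * 24, freq=pd.Timedelta(hours=1))
is_weekend = index.dayofweek >= 5  # Monday=0, ..., Saturday=5, Sunday=6
series = pd.Series(np.where(is_weekend, 50.0, 100.0), index=index)

# Label each time frame, then average per label and hour of day.
labels = np.where(is_weekend, "weekend", "weekday")
profile = series.groupby([labels, series.index.hour]).mean().unstack(level=0)

print(profile.head())
```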
