Authors:
Xiaowei Gao [📩 Email: [email protected]] (SpacetimeLab, UCL, UK)
Jinshuai Ma [📩 Email: [email protected]] (LSE Data Science Institute, UK)
Supervisors:
Dr. James Haworth, Associate Professor in Spatio-temporal Analytics, SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, UCL
Dr. Stephen Law, Associate Professor in Urban Data Science, Department of Geography, UCL
Prof. Tao Cheng, Professor in GeoInformatics, SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, UCL
🚸 py-stats19 is a Python package developed to support digital twin applications for spatio-temporal urban crash analysis. Inspired by the R stats19 package, this package provides a Python version tool to download and format the Road Safety Data from the official Road Safety Database published by the Department for Transport, UK, since 1979. Additionally, py-stats19 enhances the spatio-temporal data analysis by incorporating extra temporal information
and geometric details
.
The whole data set contains three tables: casualty
, collision
, and vehicle
. The data set is updated annually and contains detailed information about road traffic accidents in Great Britain.
🧰 The current py-stats19 package is under development and testing stages. It is available as a beta version for early access.
Install using pip:
$ pip install pystats19
Alternatively, download and install the latest release from Github, e.g. pystats19-0.1.0-py3-none-any.whl.
$ pip install pystats19-0.1.0-py3-none-any.whl
list_files()
can list all available stats19 dataset files, which can be simply filtered by passing year
and table
arguments.
Here, you could specify the table name as casualty
, collision
, or vehicle
. Those files could be merged by the accident_index
key.
from pystats19.read import list_files
# List all files contain year 2021 data
list_files(year=2021)
# ['dft-road-casualty-statistics-casualty-1979-latest-published-year.csv',
# 'dft-road-casualty-statistics-casualty-2021.csv',
# 'dft-road-casualty-statistics-casualty-last-5-years.csv',
# 'dft-road-casualty-statistics-collision-1979-latest-published-year.csv',
# 'dft-road-casualty-statistics-collision-2021.csv',
# 'dft-road-casualty-statistics-collision-last-5-years.csv',
# 'dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv',
# 'dft-road-casualty-statistics-vehicle-2021.csv',
# 'dft-road-casualty-statistics-vehicle-e-scooter-2020-Latest-Published-Year.csv',
# 'dft-road-casualty-statistics-vehicle-last-5-years.csv']
# List all files contain year 2021 and table vehicle data
list_files(year=2021, table="vehicle")
# ['dft-road-casualty-statistics-vehicle-1979-latest-published-year.csv',
# 'dft-road-casualty-statistics-vehicle-2021.csv',
# 'dft-road-casualty-statistics-vehicle-last-5-years.csv']
pull()
requires filename
parameter, downloading the data file. filename
should be obtained using list_files()
.
Optionally, data_dir
can specify the location where the file will be stored.
from pystats19.source import pull_file
pull_file('dft-road-casualty-statistics-vehicle-2019.csv', data_dir="./data")
Data directory can also be configured globally by setting an environment variable PYSTATS19_DOWNLOAD_DIRECTORY
$ export PYSTATS19_DOWNLOAD_DIRECTORY=~/my_pystats19_data
load()
loads the data file as a pandas.DataFrame
or geopandas.GeoDataFrame
. Set auto_download=True
to automatically download the file if not exists.
Optionally,
set convert_code_to_label=True
to convert categorical data codes to text labels.
set add_temporal_info=True
to format datetime
and time
and add additional time information.
set add_geo_info=True
to add geo information. This will return a geopandas.GeoDataFrame
.
from pystats19.read import load
load(
'dft-road-casualty-statistics-collision-2021.csv',
auto_download=True,
convert_code_to_label=True,
add_temporal_info=True,
add_geo_info=True
)
# Removed 17 records due to missing Latitude or Longitude.
#
# accident_index ... geometry
# 0 2021010287148 ... POINT (521509.659 193079.41)
# 1 2021010287149 ... POINT (535380.824 180783.228)
# 2 2021010287151 ... POINT (529702.828 170398.085)
# 3 2021010287155 ... POINT (525313.658 178385.183)
# 4 2021010287157 ... POINT (512145.497 171526.072)
# ... ... ... ...
# 101082 2021991196247 ... POINT (325545.894 674547.399)
# 101083 2021991196607 ... POINT (271195.339 558271.954)
# 101084 2021991197944 ... POINT (357296.909 860766.24)
# 101085 2021991200639 ... POINT (326935.908 675924.391)
# 101086 2021991201030 ... POINT (270574.351 556367.939)
# [101070 rows x 39 columns]
Reference:
@article{gao2024sma,
title={SMA-Hyper: Spatiotemporal Multi-View Fusion Hypergraph Learning for Traffic Accident Prediction},
author={Gao, Xiaowei and Haworth, James and Ilyankou, Ilya and Zhang, Xianghui and Cheng, Tao and Law, Stephen and Chen, Huanfa},
journal={arXiv preprint arXiv:2407.17642},
year={2024}
}