
A look at using pandas and numpy to see whether or not higher attendance Blue Jays home games cause increased accidents around the Rogers Centre post-game


kcarmonamurphy/blue-jays-home-games


Blue Jays Home Games

A Data Science Project deliverable for the SCS3250-027 - Introduction to Data Science course at the University of Toronto (Summer 2019)

Table of Contents

  1. Hypothesis
  2. Datasets
  3. Methodology
  4. Findings
  5. Jupyter Notebooks

Hypothesis

Does higher attendance at Blue Jays home games in Toronto translate into more collisions in which someone is killed or seriously injured (KSI) in the immediate vicinity of the Rogers Centre? It turns out the answer isn't so straightforward. Here I present a basic data science analysis of this question using the numpy and pandas libraries along with a few publicly available datasets.

Datasets

  1. The Toronto Police Service publishes a Killed and Seriously Injured (KSI) dataset, a compilation of all recorded incidents on Toronto roads from 2008 to 2018. It records the vehicle types involved in each incident, whether injuries were fatal or non-fatal, locations, and dates and times.

  2. Sports Reference provides a historically accurate list of all regularly scheduled Toronto Blue Jays games from the 1980s onwards, including dates, game durations, whether each game was played during the day or at night, and attendance.
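Before the two datasets can be joined, each home game needs an estimated end time and each KSI record needs a single timestamp. The following is a minimal pandas sketch, not the author's notebook code. The column names (`Date`, `HomeAway`, `DayNight`, `Duration`, `Attendance` for the schedule; `DATE` and an `hhmm` integer `TIME` for KSI) are assumptions. Because the schedule only records day or night, the first-pitch times in `ASSUMED_START` are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical nominal first-pitch times: the schedule only says whether a
# game was played during the day ("D") or at night ("N").
ASSUMED_START = {"D": "13:07", "N": "19:07"}


def prepare_games(games: pd.DataFrame) -> pd.DataFrame:
    """Keep home games and estimate when each one ended.

    Assumed columns: Date ("YYYY-MM-DD"), HomeAway ("@" marks an away game),
    DayNight ("D"/"N"), Duration ("H:MM") and numeric Attendance.
    """
    home = games[games["HomeAway"] != "@"].copy()
    start = pd.to_datetime(home["Date"] + " " + home["DayNight"].map(ASSUMED_START))
    # "3:05" -> "3:05:00" so pandas parses it as hours:minutes:seconds
    duration = pd.to_timedelta(home["Duration"] + ":00")
    home["GameEnd"] = start + duration
    return home[["GameEnd", "Attendance"]].reset_index(drop=True)


def prepare_ksi(ksi: pd.DataFrame) -> pd.DataFrame:
    """Combine assumed DATE and TIME (hhmm integer, e.g. 1645) columns into one timestamp."""
    ksi = ksi.copy()
    hhmm = ksi["TIME"].astype(int)
    ksi["Timestamp"] = (
        pd.to_datetime(ksi["DATE"])
        + pd.to_timedelta(hhmm // 100, unit="h")
        + pd.to_timedelta(hhmm % 100, unit="m")
    )
    return ksi
```

Keeping only `GameEnd` and `Attendance` leaves the schedule in the minimal shape needed for the time-window join.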

Methodology

Using both datasets, I filtered the KSI data on two criteria: a) a specific time window after each Toronto Blue Jays home game at the Rogers Centre, and b) distance from the Rogers Centre. The filtered incidents were then grouped by attendance quintile. Dataframes for different values of criteria a) and b) were created and plotted to test whether the data supports or contradicts the hypothesis.
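The filtering and grouping steps can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the KSI frame is assumed to have `LATITUDE`, `LONGITUDE` and `Timestamp` columns, the games frame `GameEnd` and `Attendance`, and the stadium coordinates are approximate. Distance uses the haversine formula, and `pd.merge_asof` pairs each incident with the most recent game that ended no more than `hours` earlier. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd

# Approximate Rogers Centre location (assumed; verify before relying on it).
ROGERS_CENTRE = (43.6414, -79.3894)
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat, lon, lat0=ROGERS_CENTRE[0], lon0=ROGERS_CENTRE[1]):
    """Great-circle distance in km from (lat0, lon0); works on numpy/pandas arrays."""
    lat, lon, lat0, lon0 = (np.radians(v) for v in (lat, lon, lat0, lon0))
    a = np.sin((lat - lat0) / 2) ** 2 + np.cos(lat) * np.cos(lat0) * np.sin((lon - lon0) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def ksi_after_games(games, ksi, hours, km):
    """Match each KSI incident within `km` of the stadium to the most recent
    game that ended at most `hours` before it; unmatched incidents are dropped."""
    nearby = ksi[haversine_km(ksi["LATITUDE"], ksi["LONGITUDE"]) <= km]
    matched = pd.merge_asof(
        nearby.sort_values("Timestamp"),
        games.sort_values("GameEnd"),
        left_on="Timestamp",
        right_on="GameEnd",
        direction="backward",
        tolerance=pd.Timedelta(hours=hours),
    )
    return matched.dropna(subset=["GameEnd"])


def counts_by_quintile(games, ksi, hours, km):
    """Number of matched incidents per attendance quintile (1 = lowest attendance)."""
    games = games.copy()
    games["Quintile"] = pd.qcut(games["Attendance"], 5, labels=[1, 2, 3, 4, 5])
    matched = ksi_after_games(games, ksi, hours, km)
    # observed=False keeps quintiles with zero incidents in the result
    return matched.groupby("Quintile", observed=False).size()


def sweep(games, ksi, combos=((5, 5), (5, 3), (2, 3))):
    """One column per (hours, km) combination, one row per quintile."""
    return pd.DataFrame(
        {f"{h} h x {k} km": counts_by_quintile(games, ksi, h, k) for h, k in combos}
    )
```

Because `pd.qcut` places roughly the same number of games in each quintile, raw incident counts per quintile can be compared directly. The default `combos` mirror the three configurations reported under Findings.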

Findings

5 hours and 5 kilometres

[Figure: 5 hours x 5 km]

  • Here the data broadly agrees with the stated hypothesis: as attendance increases, the number of accidents increases.

5 hours and 3 kilometres

[Figure: 5 hours x 3 km]

  • When the radius around the Rogers Centre is reduced, an interesting trend emerges, but only for games with high attendance: the number of accidents drops. This is likely because so much post-game traffic floods the nearby streets that the resulting congestion slows vehicles down and makes the roads safer.

2 hours and 3 kilometres

[Figure: 2 hours x 3 km]

  • Narrowing the post-game time window shows the opposite trend from the first graph. This is consistent with the congestion explanation proposed for the second graph, since congestion should be heaviest in the hours immediately after a game.
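Each finding above describes the direction of a trend across five attendance quintiles. One hypothetical way to summarise that direction as a single number, which the original analysis does not report, is a Spearman rank correlation between quintile order and incident count. With only five points per configuration it is a descriptive summary, not a significance test. The function name `trend_direction` is illustrative.

```python
import pandas as pd


def trend_direction(counts: pd.Series) -> float:
    """Spearman rank correlation between quintile order (1..n) and incident count.

    Computed as the Pearson correlation of the ranks, so scipy is not needed.
    Close to +1: counts rise with attendance; close to -1: counts fall.
    Returns NaN if every count is identical.
    """
    order = pd.Series(range(1, len(counts) + 1), index=counts.index, dtype=float)
    return order.rank().corr(counts.astype(float).rank())
```

Applied to the per-quintile counts of each time-window and radius combination, it yields one number per graph, which makes a sign reversal between configurations easy to spot.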

Jupyter Notebooks

  1. Part 1 - Prepare Blue Jays Data Set
  2. Part 2 - Analyse KSI & Draw Conclusions
