A basic analysis of crash data for Bloomington, IN, covering the years 2003 to 2015.
The data were obtained from the following site: https://catalog.data.gov/dataset/traffic-data/resource/e46a5cc5-ed4d-4b8d-b750-18e6c9ec570e EDIT: This link apparently no longer works, but you're free to download the data from this repository :)
The accompanying PDF explains the analyses I performed; the code for them is in the .py file in this repository.
I found that October tended to be the most dangerous month to drive in Bloomington. It consistently had the highest number of crashes year after year. The principal cause was failure to yield the right of way, followed by following too closely. Additionally, the overwhelming majority of crashes occurred near the university, particularly in the downtown area surrounding the university entrance. I speculate that two factors may contribute to the rise in crashes that begins in September and peaks in October: the long stretch with no significant break across September and October, and the potential for Bloomington's snowy season to begin in October.
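As a rough illustration of how a "which month has the most crashes each year" question can be answered in pandas, here is a minimal sketch. The column names `Year` and `Month` and the tiny sample frame are assumptions made for the example; they are not the real dataset, and this is not necessarily how the .py file does it.

```python
import pandas as pd

# Tiny made-up sample (hypothetical column names, not the real data).
crashes = pd.DataFrame({
    "Year":  [2003, 2003, 2003, 2004, 2004, 2004, 2004],
    "Month": [9, 10, 10, 10, 10, 9, 1],
})

# Count crashes per (Year, Month), then pivot into a Year x Month table.
# Months with no crashes in a given year are filled with 0.
monthly_counts = (
    crashes.groupby(["Year", "Month"]).size().unstack(fill_value=0)
)

# For each year, pick the month (column label) with the highest count.
peak_month_by_year = monthly_counts.idxmax(axis=1)

print(monthly_counts)
print(peak_month_by_year)
```

With real data, a peak month that repeats across most years in `peak_month_by_year` would be the kind of pattern described above.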
Aside from the insights from the data, this project gave me practice working with pandas dataframes, especially extracting and slicing data (e.g., looking at the data as a whole versus only the crashes in October). I also got a bit of practice performing a basic ANOVA on the crash data and building a variety of graphics, which can be seen in the PDF. Mainly, this project served two purposes: getting the hang of pandas with real data, and practicing exploring a new dataset with essentially zero direction. At the beginning of the .py file I list a few questions that came to mind as I saw which variables were available. As I explored those largely directionless questions, the pieces came together into the final recommendations I could make based on the data.
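For readers new to these two techniques, the sketch below shows one way to subset a dataframe to October and run a one-way ANOVA. Everything in it is an assumption for illustration: the column names `Year`, `Month`, and `Crashes`, the made-up numbers, the choice of month as the ANOVA factor, and the use of `scipy.stats.f_oneway`. The actual analysis in the .py file may differ.

```python
import pandas as pd
from scipy.stats import f_oneway

# Made-up monthly crash totals (hypothetical columns, not the real figures).
monthly = pd.DataFrame({
    "Year":    [2003, 2004, 2005] * 3,
    "Month":   [9] * 3 + [10] * 3 + [11] * 3,
    "Crashes": [100, 110, 105, 130, 140, 135, 90, 95, 100],
})

# Subset with a boolean mask: keep only the October rows.
october = monthly[monthly["Month"] == 10]

# One-way ANOVA: does the mean crash count differ between months?
# Each group is the array of yearly totals for one month.
groups = [g["Crashes"].to_numpy() for _, g in monthly.groupby("Month")]
f_stat, p_value = f_oneway(*groups)

print(october)
print(f_stat, p_value)
```

A small p-value here would only suggest that at least one month's mean differs from the others; it does not say which month, so a follow-up comparison would be needed for that.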