-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.markdown_github
73 lines (45 loc) · 2.95 KB
/
README.markdown_github
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
DC\_parking\_violations
================
`{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE)`
Analysis plan
-------------
Parking tickets. The dread of city dwellers everywhere. The Washington DC government has published a wealth of data about the tickets it hands out, which provides an interesting opportunity for data-minded people.
- Lots of data cleaning to get all of the files together
- Moran's I (it'll definitely be significant)
- Most common states by month
- Ticket type by time of day
- Ticket volume by time of day
- Tickets by time of month
- Most common places where tickets are issued
- Most common places where tickets are issued to non-DMV cars
<http://opendata.dc.gov/datasets/977602b156f74e41ae2dabbfaca42e20_3> <http://opendata.dc.gov/datasets/825f603d4cfc41fe9cf9ee74c359f893_2> <http://opendata.dc.gov/datasets/150cf72502b344448d900a4f1d779b3a_1> <http://opendata.dc.gov/datasets/7e688a52e65d49c0beef48289860f465_0> <http://opendata.dc.gov/datasets/2e967e9053144a309680fccea0f7b4e1_11> <http://opendata.dc.gov/datasets/9b040759c7264e59b8943fea0f081725_10>
Analysis
--------
When working with geospatial data, many people think every analysis should be a map. That approach can be useful, but shapefiles often contain valuable data.
\`\`\`{r setup} library(magrittr) library(ggplot2) library(scales) library(ggthemes) library(dplyr) library(rgdal) library(choroplethr)
library(lubridate)
Unzip the files
===============
unzip("./data/Parking\_Violations\_in\_April\_2016.zip", exdir = "./data", overwrite = TRUE)
Read shapefiles
===============
april16 <- readOGR(dsn = "./data" , layer = "Parking\_Violations\_in\_April\_2016")
\`\`\`
Let's dive into the data! Each parking ticket records the state of the offending car. As you'd expect, most ticketed cars come from the DC area, but nearby Maryland eclipses DC for first place. \`\`\`{r states}
count(april16$RP\_PLATE\_S) %>% tbl\_df %>% setNames(c("State", "Tickets")) %>% dplyr::arrange(desc(Tickets)) %>% top\_n(3) %>% ggplot(., aes(x = State, y = Tickets)) + geom\_bar(stat="identity") + ggtitle("Most frequent states") + theme\_bw()
state.name\[match(april16$RP\_PLATE\_S, state.abb)\] %>% tolower %>% table %>% data.frame %>% setNames(c("region", "value")) %>% state\_choropleth
\`\`\`
Default language (will delete later)
====================================
GitHub Documents
----------------
This is an R Markdown format used for publishing markdown documents to GitHub. When you click the **Knit** button all R code chunks are run and a markdown file (.md) suitable for publishing to GitHub is generated.
Including Code
--------------
You can include R code in the document as follows:
`{r cars} summary(cars)`
Including Plots
---------------
You can also embed plots, for example:
`{r pressure, echo=TRUE} plot(pressure)`
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.