-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathData user guide.Rmd
358 lines (316 loc) · 16.5 KB
/
Data user guide.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
---
title: "Data user guide"
author: "Jenna"
date: "December 16, 2019"
output: rmarkdown::github_document
---
```{r setup, include=FALSE}
library(rmarkdown)
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
library(here)
source(here("data pull.R"))
library(RColorBrewer)
library(kableExtra)
```
### Objective: Describe the data required for the Census 2020 planning maps in order to facilitate decisionmaking around presentation of the data.
#### Low Response Score
The Low Response Score (LRS) is an ordinarly least-squares regression model fitted on 25 variables that:
1. best predicted 2010 mail no-response rate, and
2. have practical interpretation for outreach (e.g., include population of single mothers, exclude MOEs).
Point (1) is especially important, as it means the LRS does not account for internet response. One potential risk in using this data for the use case of outreach is misappropriating resources to neighborhoods of highly internet-literate residents like young, mobile renters, who would be considered HTC according to mail return rate, but may not in fact be HTC with the advent of the internet census.
The LRS is a predicted mail non-response rate, so "an LRS value of 17.7 should be interpreted as 17.7% of households in that census tract are predicted to NOT self-respond to the decennial census." Percentage of renters, percentage of people aged 18-24, and percentage of households headed by unmarried females are the strongest predictors.
##### Threshold
I have not been able to find a recommended threshold for what LRS is most important in the research paper, FAQ, ROAM user guide, or NNIP partners' publications. A map in an article titles "Identifying Hard-to-Survey Populations Using Low Response Scores by Census Tract" on the Census website uses the breakdown of LRS 0 - 17.9, 18 - 21.9, 22 - 24.9, 25 - 28.9, 29 - 46.2.
The national non-response rate in 2010 was 20.7 percent, of course with large geographic variation. Even so, that number could serve as a good cutoff for determining HTC.
Below is a map of New Orleans tract-level LRS broken into the same groups the CB shows.
```{r}
LRS.plotdata <- LRS %>%
mutate(bin = factor(cut(LRS$Low_Response_Score, c(0, 18, 22, 25, 29, 46.2)), labels = c("0 - 17.9", "18 - 21.9", "22 - 24.9", "25 - 28.9", "29 - 46.2")))
df.polygon.LRS <- tracts.la %>%
mutate(GEOID = as.character(GEOID10)) %>%
left_join(LRS.plotdata, by="GEOID")
df.polygon.LRS %>%
filter(COUNTYFP10 %in% c("071")) %>%
ggplot() +
geom_sf(aes(fill = bin), color="white")+ # This is the main feature layer for this map
geom_sf(data = water.051, fill = "white", color="white") +
#geom_sf(data = water.057, fill = NA) +
geom_sf(data = water.071, fill = "white", color="white") +
#geom_sf(data = water.075, fill = NA) +
#geom_sf(data = water.087, fill = NA) +
#geom_sf(data = water.089, fill = NA) +
scale_fill_brewer(palette = "YlGnBu") + # color palette
#scale_fill_viridis_d(direction=-1) + # alternate palette
#coord_sf(xlim = c(-90.085683,-90.051281), ylim = c(29.946286, 29.986772)) + # manually specify the extent
#coord_sf(crs=sf::st_crs(2163)) + # manually specify the coordinate system
theme_minimal() +
theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_line(colour = "white"),
axis.title = element_blank(),
panel.grid.major = element_line(colour = "white"),
panel.grid.minor = element_line(colour = "white"),
panel.background = element_blank(),
legend.title= element_blank(),
legend.position = "top") +
labs(title = "Low response score, Orleans Parish") +
ggsave(filename = "map_example.pdf", device = cairo_pdf, width = 24, height = 16, dpi = 300)
```
```{r}
LRS.plotdata <- LRS %>%
mutate(bin = factor(cut(LRS$Low_Response_Score, c(0, 18, 22, 25, 29, 46.2)), labels = c("0 - 17.9", "18 - 21.9", "22 - 24.9", "25 - 28.9", "29 - 46.2")))
df.polygon.LRS <- tracts.la %>%
mutate(GEOID = as.character(GEOID10)) %>%
left_join(LRS.plotdata, by="GEOID")
df.polygon.LRS %>%
filter(COUNTYFP10 == "051") %>%
ggplot() +
geom_sf(aes(fill = bin), color="white")+ # This is the main feature layer for this map
scale_fill_brewer(palette = "YlGnBu") + # color palette
#scale_fill_viridis_d(direction=-1) + # alternate palette
#coord_sf(xlim = c(-90.085683,-90.051281), ylim = c(29.946286, 29.986772)) + # manually specify the extent
#coord_sf(crs=sf::st_crs(2163)) + # manually specify the coordinate system
theme_minimal() +
theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_line(colour = "white"),
axis.title = element_blank(),
panel.grid.major = element_line(colour = "white"),
panel.grid.minor = element_line(colour = "white"),
panel.background = element_blank(),
legend.title= element_blank()) +
labs(title = "Low response score")
```
The map below shows that a large majority of tracts in New Orleans are predicted to have a lower response rate in 2020 than the national response rate in 2010.
```{r}
df.polygon.LRS %>%
filter(COUNTYFP10 %in% c("071", "051") & Low_Response_Score > 25) %>%
ggplot() +
geom_sf(color="blue")+ # This is the main feature layer for this map
#
#scale_fill_brewer(palette = "YlGnBu") + # color palette
#scale_fill_viridis_d(direction=-1) + # alternate palette
#coord_sf(xlim = c(-90.085683,-90.051281), ylim = c(29.946286, 29.986772)) + # manually specify the extent
#coord_sf(crs=sf::st_crs(2163)) + # manually specify the coordinate system
theme_minimal() +
theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_line(colour = "white"),
axis.title = element_blank(),
panel.grid.major = element_line(colour = "white"),
panel.grid.minor = element_line(colour = "white"),
panel.background = element_blank(),
legend.title= element_blank()) +
labs(title = "Tracts with LRS above 25")
```
```{r}
df.polygon.LRS %>%
filter(COUNTYFP10 == "087" & Low_Response_Score > 25) %>%
ggplot() +
geom_sf(color="blue")+ # This is the main feature layer for this map
#
#scale_fill_brewer(palette = "YlGnBu") + # color palette
#scale_fill_viridis_d(direction=-1) + # alternate palette
#coord_sf(xlim = c(-90.085683,-90.051281), ylim = c(29.946286, 29.986772)) + # manually specify the extent
#coord_sf(crs=sf::st_crs(2163)) + # manually specify the coordinate system
theme_minimal() +
theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_line(colour = "white"),
axis.title = element_blank(),
panel.grid.major = element_line(colour = "white"),
panel.grid.minor = element_line(colour = "white"),
panel.background = element_blank(),
legend.title= element_blank()) +
labs(title = "Tracts with LRS above 25")
```
#### Young children
Below is a map of the same tracts mapped above (i.e. those predicted to have a lower response rate in 2020 than the national response rate in 2010), colored according to the number of children 4 and younger living there as of ACS 2013-2017.
Not sure if we're interested in using a threshold on these other data points as well or just providing the data for the HTC tracts. Thinking about what threshold would work well for choosing what tracts to target, I'm not sure there's a clear answer. Our conversations thus far have focused on the raw number of children in the tract. As far as I've seen, there is no specific intervention that happens once a tract has a certain number of children, for example. If we switched the variable to "households with two or more children 4 or under" or "children to households ratio," that may do more to identify what we're looking for, i.e. tracts whose children may be harder to count than other children. However, I think the original rationale of "more children = more hard-to-count people" makes plenty of sense.
```{r}
df.polygon.age <- tracts.la %>%
mutate(GEOID = as.character(GEOID10)) %>%
left_join(age, by="GEOID") %>%
mutate(bin = factor(cut(age$t4less, c(0, 150, 300, 450, 600, 700)), labels = c("0 - 150", "150 - 300", "300 - 450", "450 - 600", "600+"))) %>%
left_join(LRS, by = "GEOID")
df.polygon.age %>%
filter(COUNTYFP10 == "071" & Low_Response_Score >= 20.7) %>%
ggplot() +
geom_sf(aes(fill = bin), color="white")+ # This is the main feature layer for this map
scale_fill_brewer(palette = "YlGnBu") + # color palette
#scale_fill_viridis_c(direction=-1) + # alternate palette
#coord_sf(xlim = c(-90.085683,-90.051281), ylim = c(29.946286, 29.986772)) + # manually specify the extent
#coord_sf(crs=sf::st_crs(2163)) + # manually specify the coordinate system
theme_minimal() +
theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_line(colour = "white"),
axis.title = element_blank(),
panel.grid.major = element_line(colour = "white"),
panel.grid.minor = element_line(colour = "white"),
panel.background = element_blank(),
legend.title= element_blank()) +
labs(title = "Number of children 4 and under")
```
Below is a map of the same tracts mapped above, colored according to the percent of children 4 and younger living there as of ACS 2013-2017.
```{r}
df.polygon.age %>%
filter(COUNTYFP10 == "071"& Low_Response_Score >= 20.7) %>%
ggplot() +
geom_sf(aes(fill = t4lesspct), color="white")+ # This is the main feature layer for this map
#scale_fill_brewer(palette = "YlGnBu") + # color palette
#scale_fill_viridis_c(direction=-1) + # alternate palette
#coord_sf(xlim = c(-90.085683,-90.051281), ylim = c(29.946286, 29.986772)) + # manually specify the extent
#coord_sf(crs=sf::st_crs(2163)) + # manually specify the coordinate system
theme_minimal() +
theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_line(colour = "white"),
axis.title = element_blank(),
panel.grid.major = element_line(colour = "white"),
panel.grid.minor = element_line(colour = "white"),
panel.background = element_blank(),
legend.title= element_blank()) +
labs(title = "Percent of children 4 and under")
```
#### English as a barrier
The most straighforward measure of English ability is the English ability question from the ACS.
A paper published at the Census Bureau reports "that the English-ability question, despite being a self-assessment, does a good job of measuring English ability."
In Neighborhood Profiles, we group categories of English ability according to the following logic: "We’ve combined the categories of people who speak only English at home with those who speak another language at home but report that they speak English 'well.' This way we can focus on data about those individuals for whom speaking English is a barrier."
```{r}
df.polygon.lang <- tracts.la %>%
mutate(GEOID = as.character(GEOID10)) %>%
left_join(lang, by="GEOID")%>%
mutate(bin = factor(cut(lang$engnotwelltot, c(0, 100, 200, 300, 400)), labels = c("0 - 100", "100 - 200", "200 - 300", "400+"))) %>%
left_join(LRS, by = "GEOID")
df.polygon.lang %>%
filter(COUNTYFP10 == "071"& Low_Response_Score >= 20.7) %>%
ggplot() +
geom_sf(aes(fill = bin), color="white")+ # This is the main feature layer for this map
scale_fill_brewer(palette = "YlGnBu") + # color palette
#scale_fill_viridis_c(direction=-1) + # alternate palette
#coord_sf(xlim = c(-90.085683,-90.051281), ylim = c(29.946286, 29.986772)) + # manually specify the extent
#coord_sf(crs=sf::st_crs(2163)) + # manually specify the coordinate system
theme_minimal() +
theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_line(colour = "white"),
axis.title = element_blank(),
panel.grid.major = element_line(colour = "white"),
panel.grid.minor = element_line(colour = "white"),
panel.background = element_blank(),
legend.title= element_blank()) +
labs(title = "Number of people for whom English is a barrier")
```
```{r}
df.polygon.lang %>%
filter(COUNTYFP10 == "071"& Low_Response_Score >= 20.7) %>%
ggplot() +
geom_sf(aes(fill = engnotwellpct), color="white")+ # This is the main feature layer for this map
#scale_fill_brewer(palette = "YlGnBu") + # color palette
#scale_fill_viridis_c(direction=-1) + # alternate palette
#coord_sf(xlim = c(-90.085683,-90.051281), ylim = c(29.946286, 29.986772)) + # manually specify the extent
#coord_sf(crs=sf::st_crs(2163)) + # manually specify the coordinate system
theme_minimal() +
theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_line(colour = "white"),
axis.title = element_blank(),
panel.grid.major = element_line(colour = "white"),
panel.grid.minor = element_line(colour = "white"),
panel.background = element_blank(),
legend.title= element_blank()) +
labs(title = "Percent of people for whom English is a barrier")
```
The plot & table below are to get a feel for how many tracts are home to people for whom English is a barrier & what proportion of tract residents have English as a barrier. Excluding tracts that have 0% of their populations with English as a barrier, on average city-wide, 2.35% of tract residents have English as a barrier. The horizontal line on the plot represents that average.
```{r}
mn.ennw <- lang %>%
filter(parish == "071", is.nan(engnotwellpct) == FALSE, engnotwellpct != 0) %>%
summarise(mean.engnotwellpct = mean(engnotwellpct), med.engnotwellpct = median(engnotwellpct))
lang %>%
filter(parish == "071", is.nan(engnotwellpct) == FALSE, engnotwellpct != 0) %>%
mutate(`% w Eng as barrier` = round(engnotwellpct*100, digits = 2)) %>%
dplyr::select(tract, `% w Eng as barrier`) %>%
arrange(desc(`% w Eng as barrier`)) %>%
kable()
lang %>%
filter(parish == "071") %>%
ggplot(aes(x=tract, y = engnotwellpct*100)) +
geom_point() +
geom_hline(yintercept = 2.35) +
labs(title = "% of residents for whom English is a barrier, by tract",
y = "%")
```
The map below shows only the tracts for which greater than 2.35% of the residents have english as a barrier.
```{r}
df.polygon.lang %>%
filter(COUNTYFP10 == "071"& Low_Response_Score >= 20.7) %>%
filter(engnotwellpct > .07) %>%
ggplot() +
geom_sf(aes(fill = engnotwellpct*100), color="white")+ # This is the main feature layer for this map
#scale_fill_brewer(palette = "YlGnBu") + # color palette
#scale_fill_viridis_c(direction=-1) + # alternate palette
#coord_sf(xlim = c(-90.085683,-90.051281), ylim = c(29.946286, 29.986772)) + # manually specify the extent
#coord_sf(crs=sf::st_crs(2163)) + # manually specify the coordinate system
theme_minimal() +
theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_line(colour = "white"),
axis.title = element_blank(),
panel.grid.major = element_line(colour = "white"),
panel.grid.minor = element_line(colour = "white"),
panel.background = element_blank(),
legend.title= element_blank()) +
labs(title = "Percent of people for whom English is a barrier")
```
```{r}
df.polygon.lang %>%
filter(COUNTYFP10 == "087") %>%
filter(engnotwellpct > .02) %>%
ggplot() +
geom_sf(aes(fill = engnotwellpct*100), color="white")+ # This is the main feature layer for this map
#scale_fill_brewer(palette = "YlGnBu") + # color palette
#scale_fill_viridis_c(direction=-1) + # alternate palette
#coord_sf(xlim = c(-90.085683,-90.051281), ylim = c(29.946286, 29.986772)) + # manually specify the extent
#coord_sf(crs=sf::st_crs(2163)) + # manually specify the coordinate system
theme_minimal() +
theme(
axis.text = element_blank(),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid = element_line(colour = "white"),
axis.title = element_blank(),
panel.grid.major = element_line(colour = "white"),
panel.grid.minor = element_line(colour = "white"),
panel.background = element_blank(),
legend.title= element_blank()) +
labs(title = "Percent of people for whom English is a barrier")
```