-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iss414 #447
base: version2025
Are you sure you want to change the base?
Iss414 #447
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks great. I've only reviewed the county code so far, but I don't believe I had any comments about logic errors--purely style suggestions and opportunities to reorganize content, along with a couple minor things for reproducibility (making sure folders exist in a freshly-cloned repo).
Two high-level comments:
- I checked quickly--it'd be worth double-checking--but I believe the codes for the needed variables are consistent over time (including in 2014) for the detailed tables. I would consider using the appropriate detailed table, rather than the data profile table, to both make this more robust for future updates and to reduce the amount of variable code- variable name - data year crosswalking. I would also consider shifting to
library(tidycensus)
if you go this route, which IMO has a much cleaner interface for ACS data. - I haven't significantly reviewed the city-level script, but I agree that this seems ripe for consolidation. I'd consider specifying a variable at the top of the file called
geography
, and using that to control any differential logic between county-level and place-level analyses. Then you could have a single script, with the great majority of the code encompassing workflows that apply to both geographic levels. This comes with some associated downsides (code's a bit more complex, and the combined script is longer than either of the two stand-alone scripts), but I'd argue it's well worth it. It's by no means perfect, but an example of that approach is here: https://github.com/UI-Research/mobility-from-poverty/blob/f0d79cf3ec3785c4602f94f02b35d6be44cadfaf/02_housing/ratio_housing_affordable_available.qmd.
message: false | ||
--- | ||
|
||
# Racial/ethnic Exposure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This intro/overview is excellent.
source(here::here("functions", "testing", "evaluate_final_data.R")) | ||
``` | ||
|
||
## 2021 ACS 5-Year Estimates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## 2021 ACS 5-Year Estimates | |
## ACS 5-Year Estimates |
knitr::include_graphics(here::here("06_neighborhoods", "www", "images", "race-ethnicity.png")) | ||
``` | ||
|
||
We pull all of the race/ethnicity counts for 2021 using `library(censusapi)`. **Note:** This will require a [Census API key](https://api.census.gov/data/key_signup.html). Add the key to `census_api_key-template.R` and then delete then delete "template". It is sourced above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We pull all of the race/ethnicity counts for 2021 using `library(censusapi)`. **Note:** This will require a [Census API key](https://api.census.gov/data/key_signup.html). Add the key to `census_api_key-template.R` and then delete then delete "template". It is sourced above. | |
We pull all of the race/ethnicity counts using `library(censusapi)`. **Note:** This will require a [Census API key](https://api.census.gov/data/key_signup.html). Add the key to `census_api_key-template.R` and then delete then delete "template". It is sourced above. |
|
||
The variable codes for population by race/ethnicity are different for different years. Therefore, we construct a tribble with the variable-year combinations | ||
|
||
```{r} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you wrap this in logic that checks if the data's already available locally, and only if not queries the API?
```{r} | ||
#| label: load-tract-data | ||
|
||
# Tribble with year-specific ACS codes and corresponding labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something obvious, but the detailed table codes are consistent over time (including for 2014, I believe)--perhaps use those instead? Table B03002
.
I'd also consider using library(tidycensus)
if we're working with 5-year ACS estimates--it'll just afford a simpler/cleaner workflow than library(censusapi)
, though understand if it's not worth the hassle of switching.
#' | ||
#' @return A numeric data quality flag | ||
#' | ||
set_quality <- function(cv, exposure) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
return(quality) | ||
} | ||
|
||
county_data <- county_data %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
across()
|
||
```{r} | ||
#| label: collapse-race-ethnicity-categories | ||
tracts <- tracts %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My de-rigeur comment about not naming over objects; prefer tracts2 = tracts1 %>% mutate(...)
``` | ||
|
||
```{r} | ||
tracts <- tracts %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do? / can you negative-select columns more concisely, if that's the goal?
facet_wrap(. ~ year) | ||
``` | ||
|
||
### 6. Add Data Quality Flags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense in isolation, but I'm not sure why we suppress based on counts here but then don't suppress based on CVs in the subsequent section. Or, perhaps more consistently, why we don't uniformly suppress based on CV, which should address most/all of the observations with very small counts. (I know that's the guidance, but wonder if it makes sense to check about suppressing above a certain CV threshold?)
Mobility metric pull request template
Please include the following points in your PR:
A link to the issue that this PR relates to. You can bring up a list of suggested issues and pull requests within the repository by typing Update racial diversity data for 2023, consider confidence intervals, and backfill years #414 .
A description of the content in this pull request.
The following 12 places are missing from the places level metric file: