
Iss414 #447 (Draft)

wants to merge 2 commits into base: version2025
Conversation

malcalakovalski (Contributor):

Mobility metric pull request template

Please include the following points in your PR:

  1. A link to the issue that this PR relates to: Update racial diversity data for 2023, consider confidence intervals, and backfill years #414.

  2. A description of the content in this pull request.

  • What was changed?
    • Added 2016 and 2023 data. Since the ACS codes for the race/ethnicity population variables changed across years, I added a function that uses year-specific ACS codes instead.
    • Added a crosswalk for the 2016 and 2018 data in the places file
    • Added testing on the final files
  • What should the reviewer be focusing on?
    • Focus on whether the crosswalks in the place file seem correct
    • Recommendations on how to consolidate both files or reduce redundancies (there's a significant amount of code overlap in both files)
    • Check whether 2014 could be added (I couldn't find codes for non-Hispanic other alone or non-Hispanic two or more races, but a double-check would be great)
    • There are some missing places in the places file, detailed below
  • Is there a logical order to review the files in?
    • I recommend starting with the county-level script, which is simpler, and then turning to the place-level script
  3. Detail on any issues or flags that the metric reviewer/data team should be aware of.

The following 12 places are missing from the places level metric file:

| year | state_name  | place_name         |
|------|-------------|--------------------|
| 2016 | California  | Jurupa Valley city |
| 2016 | Georgia     | Macon-Bibb County  |
| 2018 | California  | Jurupa Valley city |
| 2018 | Georgia     | Macon-Bibb County  |
| 2018 | Georgia     | South Fulton city  |
| 2023 | Connecticut | Bridgeport city    |
| 2023 | Connecticut | Danbury city       |
| 2023 | Connecticut | Hartford city      |
| 2023 | Connecticut | New Haven city     |
| 2023 | Connecticut | Norwalk city       |
| 2023 | Connecticut | Stamford city      |
| 2023 | Connecticut | Waterbury city     |

cdsolari requested a review from ridhi96 on January 16, 2025
malcalakovalski changed the base branch from main to version2025 on January 16, 2025
cdsolari requested a review from wcurrangroome and removed the request for ridhi96 on January 23, 2025
cdsolari assigned wcurrangroome and unassigned ridhi96 on January 23, 2025
wcurrangroome (Collaborator) left a comment:

This all looks great. I've only reviewed the county code so far, but I don't believe I had any comments about logic errors--purely style suggestions and opportunities to reorganize content, along with a couple minor things for reproducibility (making sure folders exist in a freshly-cloned repo).

Two high-level comments:

  • I checked quickly (it'd be worth double-checking), but I believe the codes for the needed variables are consistent over time, including in 2014, for the detailed tables. I would consider using the appropriate detailed table, rather than the data profile table, both to make this more robust for future updates and to reduce the amount of variable code / variable name / data year crosswalking. I would also consider shifting to library(tidycensus) if you go this route, which IMO has a much cleaner interface for ACS data.
  • I haven't significantly reviewed the city-level script, but I agree that this seems ripe for consolidation. I'd consider specifying a variable at the top of the file called geography, and using that to control any differential logic between county-level and place-level analyses. Then you could have a single script, with the great majority of the code encompassing workflows that apply to both geographic levels. This comes with some associated downsides (code's a bit more complex, and the combined script is longer than either of the two stand-alone scripts), but I'd argue it's well worth it. It's by no means perfect, but an example of that approach is here: https://github.com/UI-Research/mobility-from-poverty/blob/f0d79cf3ec3785c4602f94f02b35d6be44cadfaf/02_housing/ratio_housing_affordable_available.qmd.
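
A minimal sketch of the geography-switch pattern described in the bullet above, using hypothetical object names (`geography`, `acs_region`, `raw_counts`) and an illustrative B03002 pull rather than the repo's actual code (it assumes a `CENSUS_KEY` environment variable is set for `censusapi`):

```r
# Hypothetical sketch, not the repo's code: one script driven by a single
# `geography` switch, with level-specific logic isolated in one place.
geography <- "county"  # or "place"

# Only the pieces that genuinely differ by geographic level live here
acs_region <- switch(
  geography,
  county = "county:*",
  place  = "place:*"
)

# Everything downstream is shared between the two levels
raw_counts <- censusapi::getCensus(
  name    = "acs/acs5",
  vintage = 2023,
  vars    = c("B03002_001E", "B03002_003E"),  # total; non-Hispanic white alone
  region  = acs_region
)
```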

message: false
---

# Racial/ethnic Exposure
Collaborator:

This intro/overview is excellent.

source(here::here("functions", "testing", "evaluate_final_data.R"))
```

## 2021 ACS 5-Year Estimates
Collaborator:

Suggested change
## 2021 ACS 5-Year Estimates
## ACS 5-Year Estimates

knitr::include_graphics(here::here("06_neighborhoods", "www", "images", "race-ethnicity.png"))
```

We pull all of the race/ethnicity counts for 2021 using `library(censusapi)`. **Note:** This will require a [Census API key](https://api.census.gov/data/key_signup.html). Add the key to `census_api_key-template.R` and then delete "template" from the file name. It is sourced above.
Collaborator:

Suggested change
We pull all of the race/ethnicity counts for 2021 using `library(censusapi)`. **Note:** This will require a [Census API key](https://api.census.gov/data/key_signup.html). Add the key to `census_api_key-template.R` and then delete "template" from the file name. It is sourced above.
We pull all of the race/ethnicity counts using `library(censusapi)`. **Note:** This will require a [Census API key](https://api.census.gov/data/key_signup.html). Add the key to `census_api_key-template.R` and then delete "template" from the file name. It is sourced above.


The variable codes for population by race/ethnicity differ across years, so we construct a tribble with the variable-year combinations.
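
As a rough illustration of that structure (the variable codes below are placeholders, not real DP05 IDs, and `acs_vars` is a hypothetical name):

```r
# Illustrative only: map each data year to that year's variable codes and labels
acs_vars <- tibble::tribble(
  ~year, ~variable,          ~label,
  2016,  "<2016 DP05 code>", "Hispanic or Latino",
  2021,  "<2021 DP05 code>", "Hispanic or Latino",
  2023,  "<2023 DP05 code>", "Hispanic or Latino"
)

# Filter to one year's codes before querying the API
vars_2023 <- acs_vars |>
  dplyr::filter(year == 2023) |>
  dplyr::pull(variable)
```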

```{r}
Collaborator:

Can you wrap this in logic that checks if the data's already available locally, and only if not queries the API?
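
One way to do that, sketched with a hypothetical file path and an illustrative single-state B03002 pull rather than the chunk's actual query:

```r
# Sketch: read a cached copy if it exists; otherwise query the API and cache it.
tract_path <- here::here("06_neighborhoods", "data", "raw", "tract_race_ethnicity_2023.csv")

if (file.exists(tract_path)) {
  tracts_raw <- readr::read_csv(tract_path)
} else {
  # Make sure the folder exists in a freshly cloned repo before writing
  dir.create(dirname(tract_path), recursive = TRUE, showWarnings = FALSE)

  tracts_raw <- censusapi::getCensus(
    name     = "acs/acs5",
    vintage  = 2023,
    vars     = c("B03002_001E", "B03002_003E"),
    region   = "tract:*",
    regionin = "state:01"  # illustrative: a single state
  )
  readr::write_csv(tracts_raw, tract_path)
}
```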

```{r}
#| label: load-tract-data

# Tribble with year-specific ACS codes and corresponding labels
Collaborator:

I might be missing something obvious, but the detailed table codes are consistent over time (including for 2014, I believe)--perhaps use those instead? Table B03002.

I'd also consider using library(tidycensus) if we're working with 5-year ACS estimates; it affords a simpler, cleaner workflow than library(censusapi), though I understand if it's not worth the hassle of switching.
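
A hedged sketch of that route, pulling detailed table B03002 for one illustrative state and year (it assumes a key has already been registered with `tidycensus::census_api_key()`):

```r
# Sketch only: tract-level counts from table B03002 via tidycensus
library(tidycensus)

tract_counts <- get_acs(
  geography = "tract",
  table     = "B03002",  # Hispanic or Latino Origin by Race; codes stable over time
  state     = "AL",      # illustrative; loop over states in practice
  year      = 2023,
  survey    = "acs5",
  output    = "wide"
)
```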

#'
#' @return A numeric data quality flag
#'
set_quality <- function(cv, exposure) {
Collaborator:

Nice

return(quality)
}

county_data <- county_data %>%
Collaborator:

across()
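
The shorthand above presumably points at `dplyr::across()`; a generic illustration with made-up column names, not the repo's actual variables:

```r
library(dplyr)

# Toy data standing in for the county-level shares
county_counts <- tibble::tibble(
  share_white_nh = 0.612345,
  share_black_nh = 0.215678,
  share_hispanic = 0.128912
)

# One across() call replaces a separate mutate() line per column
county_counts_rounded <- county_counts %>%
  mutate(across(starts_with("share_"), ~ round(.x, 4)))
```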


```{r}
#| label: collapse-race-ethnicity-categories
tracts <- tracts %>%
Collaborator:

My de rigueur comment about not naming over objects; prefer tracts2 = tracts1 %>% mutate(...)

```

```{r}
tracts <- tracts %>%
Collaborator:

What does this do? And can you negative-select columns more concisely, if that's the goal?
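
If dropping columns is the goal, a negative selection handles it in one expression; the column names here are made up:

```r
library(dplyr)

# Toy data with estimate and margin-of-error columns
tract_shares <- tibble::tibble(
  pop_est = 4200, pop_moe = 310,
  share_est = 0.37, share_moe = 0.02
)

# Drop every *_moe column at once instead of listing them individually
tract_shares_trimmed <- tract_shares %>%
  select(-ends_with("_moe"))
```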

facet_wrap(. ~ year)
```

### 6. Add Data Quality Flags
Collaborator:

This makes sense in isolation, but I'm not sure why we suppress based on counts here and then don't suppress based on CVs in the subsequent section. Or, perhaps more consistently, why we don't uniformly suppress based on CV, which should address most or all of the observations with very small counts. (I know that's the guidance, but I wonder if it makes sense to check on suppressing above a certain CV threshold?)
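
For concreteness, a CV-threshold rule along the lines raised here might look like the sketch below; the cutoff and flag breakpoints are illustrative, not the project's guidance:

```r
library(dplyr)

# Toy data standing in for the final metric file
metrics <- tibble::tibble(
  exposure = c(0.42, 0.13, 0.55),
  cv       = c(0.08, 0.45, 0.22)
)

cv_cutoff <- 0.40  # illustrative suppression threshold

metrics_flagged <- metrics %>%
  mutate(
    # Suppress the estimate entirely above the cutoff
    exposure = if_else(cv > cv_cutoff, NA_real_, exposure),
    # Flag breakpoints are illustrative only
    quality = case_when(
      cv > cv_cutoff ~ NA_real_,
      cv > 0.20      ~ 3,
      cv > 0.10      ~ 2,
      TRUE           ~ 1
    )
  )
```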
