Water Insecurity Data #798

Merged
merged 8 commits into from
Jan 19, 2025
1 change: 1 addition & 0 deletions README.md
@@ -34,6 +34,7 @@ Our over-arching goal for TidyTuesday is to provide real-world datasets so that
| 1 | `2025-01-07` | Bring your own data to start the year! | | |
| 2 | `2025-01-14` | [posit::conf talks](data/2025/2025-01-14/readme.md) | [posit::conf attendee portal 2023](https://reg.conf.posit.co/flow/posit/positconf23/attendee-portal/page/sessioncatalog), [posit::conf attendee portal 2024](https://reg.conf.posit.co/flow/posit/positconf24/attendee-portal/page/sessioncatalog) | [posit::conf(2025) in-person registration is now open!](https://posit.co/blog/positconf2025-in-person-registration-is-now-open/) |
| 3 | `2025-01-21` | [The History of Himalayan Mountaineering Expeditions](data/2025/2025-01-21/readme.md) | [The Himalayan Database](https://www.himalayandatabase.com/downloads.html) | [The Expedition Archives of Elizabeth Hawley](https://www.himalayandatabase.com/index.html) |
| 4 | `2025-01-28` | [Water Insecurity](data/2025/2025-01-28/readme.md) | [US Census Data from tidycensus](https://cran.r-project.org/package=tidycensus) | [Mapping water insecurity in R with tidycensus](https://waterdata.usgs.gov/blog/acs-maps/) |

***

19 changes: 19 additions & 0 deletions data/2025/2025-01-28/meta.yaml
@@ -0,0 +1,19 @@
title: "Water Insecurity"
article:
  title: "Mapping water insecurity in R with tidycensus"
  url: "https://waterdata.usgs.gov/blog/acs-maps/"
data_source:
  title: "US Census Data from tidycensus"
  url: "https://cran.r-project.org/package=tidycensus"
images:
  # Please include at least one image, and up to three images
  - file: "tidycensus-intro-banner.png"
    alt: >
      Three choropleth maps of the United States west of the Mississippi River, using 2022 U.S. Census Bureau Data, entitled Mapping water insecurity in R with tidycensus. The first choropleth is labeled Percent Hispanic, 2022, and shows the highest percentages of Hispanic people near the US-Mexico border, with scattered high percentages, such as in the state of Washington. The second choropleth is labeled Median gross rent, 2022, and shows the highest rents in California, Washington state, and Colorado. The third choropleth is labeled Average household size, 2022, and has scattered areas of large household size, with the highest averages in South Dakota, Utah, southern California, and southern Texas. The image also includes the hex logo of the tidycensus R package, with an indistinct choropleth map in shades of green.
credit:
  # We want to thank you for curating this dataset! If you do not want a
  # particular type of credit, please delete the related line.
  post: "Niha Pereira"
  bluesky: "https://bsky.app/profile/nnpereira"
  linkedin: "https://www.linkedin.com/in/niha-pereira"
  github: "https://github.com/nnpereira"
141 changes: 141 additions & 0 deletions data/2025/2025-01-28/readme.md
@@ -0,0 +1,141 @@
# Water Insecurity

This week we're exploring water insecurity data featured in the article [Mapping water insecurity in R with tidycensus](https://waterdata.usgs.gov/blog/acs-maps/)!

> Water insecurity can be influenced by a number of social vulnerability indicators, from demographic characteristics to living conditions and socioeconomic status, that vary spatially across the U.S. This blog shows how the tidycensus package for R can be used to access U.S. Census Bureau data, including the American Community Surveys, as featured in the “Unequal Access to Water” data visualization from the USGS Vizlab. It offers reproducible code examples demonstrating use of tidycensus for easy exploration and visualization of social vulnerability indicators in the Western U.S.

- How does the share of households lacking complete indoor plumbing compare between the 2022 and 2023 Census data?
- Which counties have the greatest percentage of households lacking plumbing?
- Are there differences in indoor plumbing availability between Western U.S. and Eastern U.S. counties?

Thank you to [Niha Pereira](https://github.com/nnpereira) for curating this week's dataset.

## The Data

```r
# Option 1: tidytuesdayR package
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2025-01-28')
## OR
tuesdata <- tidytuesdayR::tt_load(2025, week = 4)

water_insecurity_2022 <- tuesdata$water_insecurity_2022
water_insecurity_2023 <- tuesdata$water_insecurity_2023

# Option 2: Read directly from GitHub

water_insecurity_2022 <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-01-28/water_insecurity_2022.csv')
water_insecurity_2023 <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-01-28/water_insecurity_2023.csv')
```
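
Once both files are loaded, one way to approach a year-over-year comparison is to join the two tables on `geoid` and take the difference in `percent_lacking_plumbing`. The block below is only a sketch and is not part of the curated dataset or cleaning script: the helper name `compare_years()` is hypothetical, and the toy rows use made-up values so the example runs without a network connection.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: join two yearly tables on `geoid` and compute the
# change in `percent_lacking_plumbing` (2023 minus 2022).
compare_years <- function(df_2022, df_2023) {
  df_2022 |>
    select(geoid, name, pct_2022 = percent_lacking_plumbing) |>
    inner_join(
      df_2023 |> select(geoid, pct_2023 = percent_lacking_plumbing),
      by = "geoid"
    ) |>
    mutate(change = pct_2023 - pct_2022) |>
    arrange(desc(change))
}

# Toy rows with made-up values, standing in for the real files.
toy_2022 <- tibble(
  geoid = c("00001", "00002"),
  name = c("County A", "County B"),
  percent_lacking_plumbing = c(0.10, 0.30)
)
toy_2023 <- tibble(
  geoid = c("00001", "00002"),
  name = c("County A", "County B"),
  percent_lacking_plumbing = c(0.25, 0.20)
)

comparison <- compare_years(toy_2022, toy_2023)
print(comparison)
```

With the real data, the same call would be `compare_years(water_insecurity_2022, water_insecurity_2023)`. The inner join keeps only counties that appear in both years, which seems a reasonable default if the two files do not cover exactly the same set of counties.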

## How to Participate

- [Explore the data](https://r4ds.hadley.nz/), watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about **causation** in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a [shiny app](https://shiny.posit.co/), or some other piece of data-science-related output, using R or another programming language.
- [Share your output and the code used to generate it](../../../sharing.md) on social media with the #TidyTuesday hashtag.
- [Submit your own dataset!](../../../.github/pr_instructions.md)

### Data Dictionary

# `water_insecurity_2022.csv`

|variable |class |description |
|:------------------------|:----------------|:-------------------------------------|
|geoid |character |The U.S. Census Bureau ACS county id. |
|name |character |The U.S. Census Bureau ACS county name. |
|year |character |The year of U.S. Census Bureau ACS sample. |
|geometry |sfc_MULTIPOLYGON |The county geographic boundaries. |
|total_pop |double |The total population. |
|plumbing |double |The total owner-occupied households lacking plumbing facilities. |
|percent_lacking_plumbing |double |Owner-occupied households lacking plumbing facilities as a percent of total population (`plumbing / total_pop * 100`). |

# `water_insecurity_2023.csv`

|variable |class |description |
|:------------------------|:----------------|:-------------------------------------|
|geoid |character |The U.S. Census Bureau ACS county id. |
|name |character |The U.S. Census Bureau ACS county name. |
|year |character |The year of U.S. Census Bureau ACS sample. |
|geometry |sfc_MULTIPOLYGON |The county geographic boundaries. |
|total_pop |double |The total population. |
|plumbing |double |The total owner-occupied households lacking plumbing facilities. |
|percent_lacking_plumbing |double |Owner-occupied households lacking plumbing facilities as a percent of total population (`plumbing / total_pop * 100`). |
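
The columns above are enough to rank counties by `percent_lacking_plumbing`. As a rough sketch (the helper name `top_counties()` and the toy values are hypothetical, not part of the dataset), rows with a missing percentage are dropped first in case any estimates are absent:

```r
library(dplyr)
library(tibble)

# Hypothetical helper: return the `n` counties with the highest
# `percent_lacking_plumbing`, dropping rows where the value is missing.
top_counties <- function(df, n = 10) {
  df |>
    filter(!is.na(percent_lacking_plumbing)) |>
    slice_max(percent_lacking_plumbing, n = n, with_ties = FALSE) |>
    select(geoid, name, year, total_pop, plumbing, percent_lacking_plumbing)
}

# Toy rows with made-up values that follow the data dictionary's columns.
toy <- tibble(
  geoid = c("00001", "00002", "00003"),
  name = c("County A", "County B", "County C"),
  year = "2022",
  total_pop = c(100000, 200000, 80000),
  plumbing = c(50, 300, NA)
) |>
  mutate(percent_lacking_plumbing = (plumbing / total_pop) * 100)

top_two <- top_counties(toy, n = 2)
print(top_two)
```

Because the percentage divides a household count by a population count, it is probably best read as a relative ranking rather than as a literal share of households.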

### Cleaning Script

```r
# Clean data compiled from code referenced in article (https://waterdata.usgs.gov/blog/acs-maps/).
# Code was revised to pull data for all US counties for years 2022 - 2023.

# Load packages -----
library(tidycensus)
library(sf)
library(janitor)
library(tidyverse)

# Helper functions -----
# Pull ACS estimates with geometry, clean the column names, reproject,
# and tag each row with the survey year.
get_census_data <- function(geography, var_names, year, proj, survey_var) {
  df <- get_acs(
    geography = geography,
    variables = var_names,
    year = year,
    geometry = TRUE,
    survey = survey_var) |>
    clean_names() |>
    st_transform(proj) |>
    mutate(year = year)

  return(df)
}

# Grab relevant variables - B01003_001: total population,
# B25049_004: owner-occupied households lacking plumbing ----
vars <- c("B01003_001", "B25049_004")

# Pull data for 2023 and 2022 for all US counties ------
water_insecurity_2023 <- get_census_data(
  geography = 'county',
  var_names = vars,
  year = "2023",
  proj = "EPSG:5070",
  survey_var = "acs1"
) |>
  mutate(
    variable_long = case_when(
      variable == "B01003_001" ~ "total_pop",
      variable == "B25049_004" ~ "plumbing",
      .default = NA_character_
    )
  ) |>
  select(geoid, name, variable_long, estimate, geometry, year) |>
  pivot_wider(
    names_from = variable_long,
    values_from = estimate
  ) |>
  mutate(
    percent_lacking_plumbing = (plumbing / total_pop) * 100
  )

water_insecurity_2022 <- get_census_data(
  geography = 'county',
  var_names = vars,
  year = "2022",
  proj = "EPSG:5070",
  survey_var = "acs1"
) |>
  mutate(
    variable_long = case_when(
      variable == "B01003_001" ~ "total_pop",
      variable == "B25049_004" ~ "plumbing",
      .default = NA_character_
    )
  ) |>
  select(geoid, name, variable_long, estimate, geometry, year) |>
  pivot_wider(
    names_from = variable_long,
    values_from = estimate
  ) |>
  mutate(
    percent_lacking_plumbing = (plumbing / total_pop) * 100
  )
```
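
The reshaping step in the script can be tried without calling the Census API. The sketch below assumes a made-up long-format table (`toy_long`, hypothetical values, geometry omitted) shaped like the helper's output after `clean_names()`, with one row per county per variable, and applies the same recode, `pivot_wider()`, and percentage calculation:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up long-format rows: one row per county per ACS variable.
toy_long <- tibble(
  geoid = c("00001", "00001", "00002", "00002"),
  name = c("County A", "County A", "County B", "County B"),
  variable = c("B01003_001", "B25049_004", "B01003_001", "B25049_004"),
  estimate = c(100000, 50, 200000, 300),
  moe = c(NA, 20, NA, 90),
  year = "2022"
)

toy_wide <- toy_long |>
  mutate(
    variable_long = case_when(
      variable == "B01003_001" ~ "total_pop",
      variable == "B25049_004" ~ "plumbing",
      .default = NA_character_
    )
  ) |>
  # Dropping `variable` and `moe` matters: any leftover column that differs
  # between a county's two rows would keep pivot_wider() from merging them.
  select(geoid, name, variable_long, estimate, year) |>
  pivot_wider(names_from = variable_long, values_from = estimate) |>
  mutate(percent_lacking_plumbing = (plumbing / total_pop) * 100)

print(toy_wide)
```

The result has one row per county with `total_pop`, `plumbing`, and `percent_lacking_plumbing` as columns, matching the layout described in the data dictionary.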
Binary file added data/2025/2025-01-28/tidycensus-intro-banner.png