diff --git a/README.md b/README.md index 658362d..4126f30 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ module-2-data-cleaning - -### Question 3 - -Next, we’ll clean our data by renaming variables: - -``` r -snapCounties %>% - rename( - total_pop = B19058_001E, - total_pop_moe = B19058_001M, - snap = B19058_002E, - snap_moe = B19058_002M - ) -> snapCounties -``` - -## Part 2 - -### Question 4 - -Next, we’ll download the relevant ACS data for Medicaid using -`get_acs()`: - -``` r -medicaidCounties <- get_acs(geography = "county", year = 2019, state = 29, - variables = c("C27007_002", "C27007_012"), - output = "wide", geometry = FALSE) -``` - - ## Getting data from the 2015-2019 5-year ACS - -Now we have the number of male and female Medicaid recipients. - -### Question 5 - -Next, we’ll tidy up the demographic data, including by renaming -variables and summing our male and female Medicaid estimates: - -``` r -medicaidCounties %>% - rename( - medicaid_male = C27007_002E, - medicaid_male_moe = C27007_002M, - medicaid_female = C27007_012E, - medicaid_female_moe = C27007_012M - ) %>% - mutate(medicaid = medicaid_male + medicaid_female) %>% - select(-NAME) -> medicaidCounties -``` - -Now our data are ready to join with our SNAP recipiency data! - -## Part 3 - -### Question 6 - -Finally, we’ll combine our data: - -``` r -services <- left_join(snapCounties, medicaidCounties, by = "GEOID") -``` - -To make sure things went correctly, we’ll preview our data again: - -``` r -mapview(services, zcol = "medicaid") -``` - -![](lab-05_files/figure-gfm/preview-counties-1.png) - -Our data map correctly! 
diff --git a/assignments/lab-05-replication/docs/lab-05_files/figure-gfm/preview-counties-1.png b/assignments/lab-05-replication/docs/lab-05_files/figure-gfm/preview-counties-1.png deleted file mode 100644 index 04ffd1c..0000000 Binary files a/assignments/lab-05-replication/docs/lab-05_files/figure-gfm/preview-counties-1.png and /dev/null differ diff --git a/assignments/lab-05-replication/docs/lab-05_files/figure-gfm/preview-snap-counties-1.png b/assignments/lab-05-replication/docs/lab-05_files/figure-gfm/preview-snap-counties-1.png deleted file mode 100644 index 97640e0..0000000 Binary files a/assignments/lab-05-replication/docs/lab-05_files/figure-gfm/preview-snap-counties-1.png and /dev/null differ diff --git a/assignments/lab-05-replication/.gitignore b/assignments/lab-2-2-replication/.gitignore similarity index 100% rename from assignments/lab-05-replication/.gitignore rename to assignments/lab-2-2-replication/.gitignore diff --git a/assignments/lab-05-replication/README.md b/assignments/lab-2-2-replication/README.md similarity index 100% rename from assignments/lab-05-replication/README.md rename to assignments/lab-2-2-replication/README.md diff --git a/assignments/lab-05-replication/data/MO_SNAP_Households/MO_SNAP_Households.dbf b/assignments/lab-2-2-replication/data/MO_SNAP_Households/MO_SNAP_Households.dbf similarity index 100% rename from assignments/lab-05-replication/data/MO_SNAP_Households/MO_SNAP_Households.dbf rename to assignments/lab-2-2-replication/data/MO_SNAP_Households/MO_SNAP_Households.dbf diff --git a/assignments/lab-05-replication/data/MO_SNAP_Households/MO_SNAP_Households.prj b/assignments/lab-2-2-replication/data/MO_SNAP_Households/MO_SNAP_Households.prj similarity index 100% rename from assignments/lab-05-replication/data/MO_SNAP_Households/MO_SNAP_Households.prj rename to assignments/lab-2-2-replication/data/MO_SNAP_Households/MO_SNAP_Households.prj diff --git 
a/assignments/lab-05-replication/data/MO_SNAP_Households/MO_SNAP_Households.shp b/assignments/lab-2-2-replication/data/MO_SNAP_Households/MO_SNAP_Households.shp similarity index 100% rename from assignments/lab-05-replication/data/MO_SNAP_Households/MO_SNAP_Households.shp rename to assignments/lab-2-2-replication/data/MO_SNAP_Households/MO_SNAP_Households.shp diff --git a/assignments/lab-05-replication/data/MO_SNAP_Households/MO_SNAP_Households.shx b/assignments/lab-2-2-replication/data/MO_SNAP_Households/MO_SNAP_Households.shx similarity index 100% rename from assignments/lab-05-replication/data/MO_SNAP_Households/MO_SNAP_Households.shx rename to assignments/lab-2-2-replication/data/MO_SNAP_Households/MO_SNAP_Households.shx diff --git a/assignments/lab-05-replication/data/STL_SNAP_Households/STL_SNAP_Households.dbf b/assignments/lab-2-2-replication/data/STL_SNAP_Households/STL_SNAP_Households.dbf similarity index 100% rename from assignments/lab-05-replication/data/STL_SNAP_Households/STL_SNAP_Households.dbf rename to assignments/lab-2-2-replication/data/STL_SNAP_Households/STL_SNAP_Households.dbf diff --git a/assignments/lab-05-replication/data/STL_SNAP_Households/STL_SNAP_Households.prj b/assignments/lab-2-2-replication/data/STL_SNAP_Households/STL_SNAP_Households.prj similarity index 100% rename from assignments/lab-05-replication/data/STL_SNAP_Households/STL_SNAP_Households.prj rename to assignments/lab-2-2-replication/data/STL_SNAP_Households/STL_SNAP_Households.prj diff --git a/assignments/lab-05-replication/data/STL_SNAP_Households/STL_SNAP_Households.shp b/assignments/lab-2-2-replication/data/STL_SNAP_Households/STL_SNAP_Households.shp similarity index 100% rename from assignments/lab-05-replication/data/STL_SNAP_Households/STL_SNAP_Households.shp rename to assignments/lab-2-2-replication/data/STL_SNAP_Households/STL_SNAP_Households.shp diff --git a/assignments/lab-05-replication/data/STL_SNAP_Households/STL_SNAP_Households.shx 
b/assignments/lab-2-2-replication/data/STL_SNAP_Households/STL_SNAP_Households.shx similarity index 100% rename from assignments/lab-05-replication/data/STL_SNAP_Households/STL_SNAP_Households.shx rename to assignments/lab-2-2-replication/data/STL_SNAP_Households/STL_SNAP_Households.shx diff --git a/assignments/lab-05-replication/docs/lab-05.Rmd b/assignments/lab-2-2-replication/docs/lab-2-2.Rmd similarity index 94% rename from assignments/lab-05-replication/docs/lab-05.Rmd rename to assignments/lab-2-2-replication/docs/lab-2-2.Rmd index 566a289..65c6ab2 100644 --- a/assignments/lab-05-replication/docs/lab-05.Rmd +++ b/assignments/lab-2-2-replication/docs/lab-2-2.Rmd @@ -1,10 +1,11 @@ --- -title: "Lab-05 Replication Notebook" +title: "Lab 2-2 Replication Notebook" author: "Christopher Prener, Ph.D." date: '(`r format(Sys.time(), "%B %d, %Y")`)' output: github_document: default html_notebook: default +always_allow_html: true --- ```{r setup} @@ -12,7 +13,7 @@ knitr::opts_chunk$set(cache = FALSE) ``` ## Introduction -This is the replication notebook for Lab-05 from the course SOC 4650/5650: Introduction to GISc. +This is the replication notebook for Lab 2-2 from the course SOC 4650/5650: Introduction to GISc. ## Load Dependencies The following code loads the package dependencies for our analysis: diff --git a/assignments/lab-2-2-replication/docs/lab-2-2.md b/assignments/lab-2-2-replication/docs/lab-2-2.md new file mode 100644 index 0000000..251d0f3 --- /dev/null +++ b/assignments/lab-2-2-replication/docs/lab-2-2.md @@ -0,0 +1,178 @@ +Lab 2-2 Replication Notebook +================ +Christopher Prener, Ph.D. +(February 21, 2022) + +``` r +knitr::opts_chunk$set(cache = FALSE) +``` + +## Introduction + +This is the replication notebook for Lab 2-2 from the course SOC +4650/5650: Introduction to GISc. 
+ +## Load Dependencies + +The following code loads the package dependencies for our analysis: + +``` r +# tidyverse packages +library(dplyr) # data wrangling +``` + + ## + ## Attaching package: 'dplyr' + + ## The following objects are masked from 'package:stats': + ## + ## filter, lag + + ## The following objects are masked from 'package:base': + ## + ## intersect, setdiff, setequal, union + +``` r +# spatial packages +library(mapview) # preview spatial data +library(sf) # spatial data tools +``` + + ## Linking to GEOS 3.8.1, GDAL 3.2.1, PROJ 7.2.1 + +``` r +library(tidycensus) # data wrangling +library(tigris) # data wrangling +``` + + ## To enable + ## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile. + + ## + ## Attaching package: 'tigris' + + ## The following object is masked from 'package:tidycensus': + ## + ## fips_codes + +``` r +# other packages +library(here) # file path tools +``` + + ## here() starts at /Users/prenercg/GitHub/slu-soc5650/module-2-combine-sources/assignments/lab-2-2-replication + +## Part 1 + +### Question 1 + +First, we’ll download and preview the variables using the +`load_variables()` function from `tidycensus`. + +``` r +acs <- load_variables(2019, "acs5", cache = TRUE) +``` + +The variables we need represent: + +- `"PUBLIC ASSISTANCE INCOME OR FOOD STAMPS/SNAP IN THE PAST 12 MONTHS FOR HOUSEHOLDS"` +- `"MEDICAID/MEANS-TESTED PUBLIC COVERAGE BY SEX BY AGE"` + +### Question 2 + +First, we’ll download the relevant ACS data using `get_acs()`. We get +the data for all counties by specifying `"county"` as the geography: + +``` r +snapCounties <- get_acs(geography = "county", year = 2019, state = 29, + variables = c("B19058_001", "B19058_002"), + output = "wide", geometry = TRUE) +``` + + ## Getting data from the 2015-2019 5-year ACS + + ## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`. 
+ +We can preview our geometric data with `mapview`: + +``` r +mapview(snapCounties) +``` + + ## PhantomJS not found. You can install it with webshot::install_phantomjs(). If it is installed, please make sure the phantomjs executable can be found via the PATH variable. + +
+ + +### Question 3 + +Next, we’ll clean our data by renaming variables: + +``` r +snapCounties %>% + rename( + total_pop = B19058_001E, + total_pop_moe = B19058_001M, + snap = B19058_002E, + snap_moe = B19058_002M + ) -> snapCounties +``` + +## Part 2 + +### Question 4 + +Next, we’ll download the relevant ACS data for Medicaid using +`get_acs()`: + +``` r +medicaidCounties <- get_acs(geography = "county", year = 2019, state = 29, + variables = c("C27007_002", "C27007_012"), + output = "wide", geometry = FALSE) +``` + + ## Getting data from the 2015-2019 5-year ACS + +Now we have the number of male and female Medicaid recipients. + +### Question 5 + +Next, we’ll tidy up the demographic data, including by renaming +variables and summing our male and female Medicaid estimates: + +``` r +medicaidCounties %>% + rename( + medicaid_male = C27007_002E, + medicaid_male_moe = C27007_002M, + medicaid_female = C27007_012E, + medicaid_female_moe = C27007_012M + ) %>% + mutate(medicaid = medicaid_male + medicaid_female) %>% + select(-NAME) -> medicaidCounties +``` + +Now our data are ready to join with our SNAP recipiency data! + +## Part 3 + +### Question 6 + +Finally, we’ll combine our data: + +``` r +services <- left_join(snapCounties, medicaidCounties, by = "GEOID") +``` + +To make sure things went correctly, we’ll preview our data again: + +``` r +mapview(services, zcol = "medicaid") +``` + + + + +Our data map correctly! diff --git a/assignments/lab-05-replication/docs/lab-05.nb.html b/assignments/lab-2-2-replication/docs/lab-2-2.nb.html similarity index 83% rename from assignments/lab-05-replication/docs/lab-05.nb.html rename to assignments/lab-2-2-replication/docs/lab-2-2.nb.html index 96258cb..08feed4 100644 --- a/assignments/lab-05-replication/docs/lab-05.nb.html +++ b/assignments/lab-2-2-replication/docs/lab-2-2.nb.html @@ -12,19 +12,28 @@ -We rarely want to download these one at a time. Instead, we want to download them at one time into a single data frame. 
The table number for these data is P003
- we take the first four characters from the name
variable.
cityRace00 <- get_decennial(geography = "tract", year = 2000, state = 29,
+ county = "510", table = "P003", output = "wide")
+
We’ve used the FIPS codes for both Missouri (29
) and St. Louis City (510, which combines with the state code to form the county GEOID 29510
) here - you can find a full list of Missouri counties here.
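As a rough illustration of how these codes fit together, here is a sketch in base R that is not part of the original notebook (the object names are made up for this example). It assumes a county GEOID is the two-digit state FIPS code followed by the three-digit county code:

```r
# hypothetical values for illustration: Missouri and St. Louis City
state_fips  <- 29
county_fips <- 510

# pad each piece to its fixed width, then concatenate into a 5-digit GEOID
geoid <- paste0(sprintf("%02d", state_fips), sprintf("%03d", county_fips))
geoid
```

Padding matters for states and counties with low code numbers, where a leading zero would otherwise be dropped.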
The tidycensus
package also includes tools for downloading the geometries for these data. For instance, we can add geometric data to our previous call for City of St. Louis tract-level data on race by adding the geometry = TRUE
argument:
## download
+cityRace00 <- get_decennial(geography = "tract", year = 2000, state = 29,
+ county = "510", table = "P003", output = "wide",
+ geometry = TRUE)
+
+
+Getting data from the 2000 decennial Census
+Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
+Loading SF1 variables for 2000 from table P003. To cache this dataset for faster access to Census tables in the future, run this function with `cache_table = TRUE`. You only need to do this once per Census dataset.
+Using Census Summary File 1
+Using Census Summary File 1
+
+
+## preview
+mapview(cityRace00, zcol = "P003005")
+
Notice how I used the zcol
argument for mapview()
to preview a specific set of data as a thematic layer on the map! These data are not normalized, but we do get a quick preview of the distribution of Asian residents in St. Louis City.
To get a preview of variables available in the get_acs()
function, we can use the load_variables()
function again. We’ll use "acs5"
for our dataset and, for this example, we’ll pull from the most recent 2019 ACS year:
census <- load_variables(year = 2019, dataset = "acs5")
+
Try searching for the table B19013
, the median household income table.
We’ll illustrate get_acs()
by using the data in table B19019
. First, we’ll download these data as a full table for all counties in Missouri:
## download
+countyIncome <- get_acs(geography = "county", year = 2019, state = 29,
+ table = "B19019", output = "wide", geometry = TRUE)
+
+
+Getting data from the 2015-2019 5-year ACS
+Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
+Loading ACS5 variables for 2019 from table B19019. To cache this dataset for faster access to ACS tables in the future, run this function with `cache_table = TRUE`. You only need to do this once per ACS dataset.
+
+
+## preview
+mapview(countyIncome, zcol = "B19019_001E")
+
Notice how we needed to specify _001E
for zcol
. That references the specific variable we want to map - the estimate for variable 1 in the table (suffix E
). The M
values refer to the margin of error - the ACS reports these at a 90 percent confidence level, so we expect the true value to fall within +/- this amount of the estimate.
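To make the estimate and margin-of-error columns concrete, here is a small sketch using made-up numbers rather than real ACS values (the data frame and the `lower` and `upper` column names are assumptions for illustration). It builds the interval implied by each estimate and its margin of error:

```r
library(dplyr)

# made-up values standing in for the E (estimate) and M (margin of error) columns
income <- tibble(
  GEOID = c("29001", "29003"),
  B19019_001E = c(45000, 52000),
  B19019_001M = c(3000, 4500)
)

# the interval runs from estimate - MOE to estimate + MOE
income <- income %>%
  mutate(
    lower = B19019_001E - B19019_001M,
    upper = B19019_001E + B19019_001M
  )
```

Wide intervals relative to the estimate are a signal to treat that county's value with extra caution when mapping.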
We can also download a specific column, like the median income for one-person households (B19019_002
):
## download
+countyIncome <- get_acs(geography = "county", year = 2019, state = 29,
+ variables = "B19019_002", output = "wide",
+ geometry = TRUE)
+
+
+Getting data from the 2015-2019 5-year ACS
+Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
+
+
+## preview
+mapview(countyIncome, zcol = "B19019_002E")
+
@@ -359,11 +850,46 @@ Perhaps we have a range of data that we want to include. For this example, we’ll download data on median income and the proportion of women in tracts in Boone County, Missouri. We’ll download the income data with geometry = TRUE
and the sex data with geometry = FALSE
:
## download
+booneIncome <- get_acs(geography = "tract", year = 2019, state = 29,
+ county = "019", variables = "B19019_001",
+ output = "wide", geometry = TRUE) %>%
+ rename(median_income = B19019_001E) %>%
+ select(GEOID, median_income)
+
+
+Getting data from the 2015-2019 5-year ACS
+Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
+
+
+
+## download
+booneSex <- get_acs(geography = "tract", year = 2019, state = 29,
+ county = "019", variables = c("B01001_001", "B01001_026"),
+ output = "wide") %>%
+ mutate(pct_women = B01001_026E/B01001_001E*100) %>%
+ select(GEOID, pct_women)
+
+
+Getting data from the 2015-2019 5-year ACS
+
To combine these data, we’ll use `left_join()` from `dplyr`. Our `sf` object should always be the first object in the join (the `x` data) and our non-sf data should be the second (the `y` data), so that the result keeps its `sf` class and geometry:
boone <- left_join(booneIncome, booneSex, by = "GEOID")
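After a join like this, it can be worth confirming that every row found a match. The sketch below uses two small, hypothetical data frames (plain data frames rather than `sf` objects, with made-up values) and `anti_join()` from `dplyr` to list the rows in the first table that have no partner in the second.

``` r
library(dplyr)

# hypothetical stand-ins for the income (x) and sex (y) tables
incomeToy <- data.frame(
  GEOID = c("29019000100", "29019000200", "29019000300"),
  median_income = c(41000, 52000, 38000)
)
sexToy <- data.frame(
  GEOID = c("29019000100", "29019000200"),
  pct_women = c(51.2, 49.8)
)

# left_join() keeps every row of x and fills unmatched rows with NA
joined <- left_join(incomeToy, sexToy, by = "GEOID")

# anti_join() returns the rows of x that have no match in y
unmatched <- anti_join(incomeToy, sexToy, by = "GEOID")

nrow(joined) == nrow(incomeToy)
unmatched
```

An empty `unmatched` table suggests the identifiers line up; any rows it does contain are the ones that will show `NA` values after the join.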
+
Three common issues arise:
@@ -381,6 +907,18 @@ We can download a generalized version, which smooths out state boundaries so that the overall image is both smaller on disk and (sometimes) easier to read. This is particularly helpful if you are making small scale maps of the entire United States. We’ll get these data at the “20m” resolution using the `states()` function:
states <- states(cb = TRUE, resolution = "20m")
+
@@ -389,6 +927,32 @@ Now, we’ll get more detailed data - all of the county boundaries for Missouri. We’ll use the `counties()` function with a slightly less generalized resolution, “5m”:
moCounties <- counties(state = 29, cb = TRUE, resolution = "5m")
+
@@ -397,13 +961,147 @@ Now, we’ll get even more detailed data - all of the tract boundaries for St. Charles County, Missouri. We’ll use the `tracts()` function, leaving `cb` at its default of `FALSE`:
stCharlesTracts <- tracts(state = 29, county = 183)
+
-knitr::opts_chunk$set(cache = FALSE)
+
+
+
+This notebook illustrates data access through both `tigris` and `tidycensus` as well as joins using `dplyr`.
This notebook requires the following packages:
+# tidyverse packages
+library(dplyr) # data wrangling
+
+
+
+Attaching package: ‘dplyr’
+
+The following objects are masked from ‘package:stats’:
+
+ filter, lag
+
+The following objects are masked from ‘package:base’:
+
+ intersect, setdiff, setequal, union
+
+
+# spatial packages
+library(mapview) # preview geometric data
+
+
+Registered S3 method overwritten by 'htmlwidgets':
+ method from
+ print.htmlwidget tools:rstudio
+
+
+library(sf) # spatial tools
+
+
+Linking to GEOS 3.8.1, GDAL 3.2.1, PROJ 7.2.1
+
+
+library(tidycensus) # demographic data
+library(tigris) # tiger/line data
+
+
+To enable
+caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
+
+Attaching package: ‘tigris’
+
+The following object is masked from ‘package:tidycensus’:
+
+ fips_codes
+
+
+# other packages
+library(here) # file path management
+
+
+here() starts at /Users/prenercg/GitHub/slu-soc5650/module-2-combine-sources
+
+
+
+Before using `tidycensus`, you need to install a Census API key. Use the syntax below, copied into your console, to install the key you received via email.
census_api_key("KEY", install = TRUE)
+This is not a code chunk you will need in each notebook. As long as `install = TRUE`, you will only have to do this once!
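If you are unsure whether the key is already installed, one quick check is to look for it in your environment. This sketch relies on `tidycensus` storing the installed key in the `CENSUS_API_KEY` environment variable; the messages are only illustrative, and you may need to restart R after installing the key before it shows up.

``` r
# read the environment variable that tidycensus uses for the installed key
key <- Sys.getenv("CENSUS_API_KEY")

# nzchar() is TRUE when the string is not empty
if (nzchar(key)) {
  message("A Census API key is available in this session.")
} else {
  message("No Census API key found - run census_api_key() and restart R.")
}
```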
To get a preview of variables available in the `get_decennial()` function, we can use the `load_variables()` function:
census <- load_variables(year = 2000, dataset = "sf1")
+
+
+
+I find it useful to assign the output of this function to an object so that I can search through it. Try searching for the variable `P001001`, the total population of a geographic unit, in the `census` object.
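One way to search is with `filter()` from `dplyr`. The sketch below uses a small, hypothetical data frame with the same three columns that `load_variables()` returns (`name`, `label`, and `concept`); the rows are illustrative stand-ins, so substitute the real `census` object when you try this yourself.

``` r
library(dplyr)

# hypothetical stand-in for the object returned by load_variables()
censusToy <- data.frame(
  name = c("P001001", "P003001", "P003005"),
  label = c("Total", "Total", "Total!!Asian alone"),
  concept = c("P1. TOTAL POPULATION [1]", "P3. RACE [8]", "P3. RACE [8]")
)

# exact match on a variable name
totalPop <- filter(censusToy, name == "P001001")

# pattern match on the concept column to find a whole table
raceVars <- filter(censusToy, grepl("RACE", concept))

totalPop
raceVars
```

In RStudio, opening the object with `View()` and using its search box is another option.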
To download data, we can use the `get_decennial()` function to access, for example, population by state in 2000:
popStates <- get_decennial(geography = "state", year = 2000, variables = "P001001")
+
+
+
+A full list of the geographies available in `tidycensus` can be found here.
Most variables in the decennial census are actually a part of a table. There are individual variables, for example, for race:
+census %>%
+ filter(concept == "P3. RACE [8]")
+
+
+We rarely want to download these one at a time. Instead, we want to download them all at once into a single data frame. The table number for these data is `P003` - we take the first four characters from the `name` variable.
cityRace00 <- get_decennial(geography = "tract", year = 2000, state = 29,
+ county = "510", table = "P003", output = "wide")
+
+
+
+We’ve used the FIPS codes for both Missouri (`29`) and St. Louis City (`510`, or `29510` with the state prefix) here - you can find a full list of Missouri counties here.
The `tidycensus` package also includes tools for downloading the geometries for these data. For instance, we can add geometric data to our previous call for City of St. Louis tract-level data on race by adding the `geometry = TRUE` argument:
## download
+cityRace00 <- get_decennial(geography = "tract", year = 2000, state = 29,
+ county = "510", table = "P003", output = "wide",
+ geometry = TRUE)
+
+
+Getting data from the 2000 decennial Census
+Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
+Loading SF1 variables for 2000 from table P003. To cache this dataset for faster access to Census tables in the future, run this function with `cache_table = TRUE`. You only need to do this once per Census dataset.
+Using Census Summary File 1
+Using Census Summary File 1
+
+
+## preview
+mapview(cityRace00, zcol = "P003005")
+
+Notice how I used the `zcol` argument for `mapview()` to preview a specific set of data as a thematic layer on the map! These data are not normalized, but we do get a quick preview of the distribution of Asian residents in St. Louis City.
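Normalizing these counts would mean dividing each tract’s count by its total population. Here is a minimal sketch, assuming `P003001` holds the table’s total and using a hypothetical two-row data frame with made-up counts in place of `cityRace00`:

``` r
library(dplyr)

# hypothetical counts standing in for the wide-format cityRace00 data
raceToy <- data.frame(
  GEOID = c("tract-a", "tract-b"),
  P003001 = c(4000, 2500),  # total population
  P003005 = c(200, 50)      # Asian residents
)

# convert the raw count into a percentage of the total population
raceToy <- mutate(raceToy, pct_asian = P003005 / P003001 * 100)

raceToy
```

The same `mutate()` call should work on the `sf` object itself, after which `pct_asian` could be passed to `zcol` in place of the raw count.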
To get a preview of variables available in the `get_acs()` function, we can use the `load_variables()` function again. We’ll use `"acs5"` for our dataset and, for this example, we’ll pull from the most recent 2019 ACS year:
census <- load_variables(year = 2019, dataset = "acs5")
+
+
+
+Try searching for the table `B19013`, the median household income table.
We’ll illustrate `get_acs()` by using the data in table `B19019`. First, we’ll download these data as a full table for all counties in Missouri:
## download
+countyIncome <- get_acs(geography = "county", year = 2019, state = 29,
+ table = "B19019", output = "wide", geometry = TRUE)
+
+
+Getting data from the 2015-2019 5-year ACS
+Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
+Loading ACS5 variables for 2019 from table B19019. To cache this dataset for faster access to ACS tables in the future, run this function with `cache_table = TRUE`. You only need to do this once per ACS dataset.
+
+
+## preview
+mapview(countyIncome, zcol = "B19019_001E")
+
+Notice how we needed to specify `_001E` for `zcol`. That suffix references the specific variable we want to map: variable 1 in the table, as an estimate (`E`). The `M` values are the margin of error: we expect the true value to fall within +/- this amount of the estimate (the ACS publishes these at a 90 percent confidence level).
We can also download a specific column, like the median income for one-person households (`B19019_002`):
## download
+countyIncome <- get_acs(geography = "county", year = 2019, state = 29,
+ variables = "B19019_002", output = "wide",
+ geometry = TRUE)
+
+
+Getting data from the 2015-2019 5-year ACS
+Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
+
+
+## preview
+mapview(countyIncome, zcol = "B19019_002E")
+
+Perhaps we have a range of data that we want to include. For this example, we’ll download data on median income and the proportion of women in tracts in Boone County, Missouri. We’ll download the income data with `geometry = TRUE` and the sex data with `geometry = FALSE`:
## download
+booneIncome <- get_acs(geography = "tract", year = 2019, state = 29,
+ county = "019", variables = "B19019_001",
+ output = "wide", geometry = TRUE) %>%
+ rename(median_income = B19019_001E) %>%
+ select(GEOID, median_income)
+
+
+Getting data from the 2015-2019 5-year ACS
+Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
+
+
+## download
+booneSex <- get_acs(geography = "tract", year = 2019, state = 29,
+ county = "019", variables = c("B01001_001", "B01001_026"),
+ output = "wide") %>%
+ mutate(pct_women = B01001_026E/B01001_001E*100) %>%
+ select(GEOID, pct_women)
+
+
+Getting data from the 2015-2019 5-year ACS
+
+
+
+To combine these data, we’ll use `left_join()` from `dplyr`. Our `sf` object should always be the first object in the join (the `x` data) and our non-sf data should be the second (the `y` data), so that the result keeps its `sf` class and geometry:
boone <- left_join(booneIncome, booneSex, by = "GEOID")
+
+
+
+Three common issues arise:
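+1. The ID variables have different names: `by = c("GEOID" = "geoid")`
+2. The ID variables have different types: `booneIncome <- mutate(booneIncome, GEOID = as.numeric(GEOID))`
+3. Both data sets are `sf` objects: `st_geometry(booneSex) <- NULL`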
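The first two issues can be sketched with toy data. The data frames below are hypothetical (plain data frames with made-up values, not the Boone County downloads): the key is named `GEOID` and stored as character in one table, but named `geoid` and stored as numeric in the other.

``` r
library(dplyr)

# hypothetical tables whose key columns differ in both name and type
incomeToy <- data.frame(
  GEOID = c("29019000100", "29019000200"),
  median_income = c(41000, 52000)
)
sexToy <- data.frame(
  geoid = c(29019000100, 29019000200),
  pct_women = c(51.2, 49.8)
)

# make the types agree by converting the character key to numeric
incomeToy <- mutate(incomeToy, GEOID = as.numeric(GEOID))

# map the differently named keys onto each other with a named vector
joined <- left_join(incomeToy, sexToy, by = c("GEOID" = "geoid"))

joined
```

Converting to numeric mirrors the approach in the list above. Identifiers that begin with a zero would lose it this way, so converting the numeric key to character instead may be the safer choice for those.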
To get data from the TIGER/Line database, we can use the `tigris` package. You can see a full list of the data available here.
We can download a generalized version, which smooths out state boundaries so that the overall image is both smaller on disk and (sometimes) easier to read. This is particularly helpful if you are making small scale maps of the entire United States. We’ll get these data at the “20m” resolution using the `states()` function:
states <- states(cb = TRUE, resolution = "20m")
+
+
+Now, we’ll get more detailed data - all of the county boundaries for Missouri. We’ll use the `counties()` function with a slightly less generalized resolution, “5m”:
moCounties <- counties(state = 29, cb = TRUE, resolution = "5m")
+
+
+Now, we’ll get even more detailed data - all of the tract boundaries for St. Charles County, Missouri. We’ll use the `tracts()` function, leaving `cb` at its default of `FALSE`:
stCharlesTracts <- tracts(state = 29, county = 183)
+
+
+knitr::opts_chunk$set(cache = FALSE)
-
-
-
-This notebook illustrates data access through both tigris and tidycensus, as well as joins using dplyr.
This notebook requires the following packages:
-# tidyverse packages
-library(dplyr) # data wrangling
-
-# spatial packages
-library(mapview) # preview geometric data
-library(sf) # spatial tools
-library(tidycensus) # demographic data
-library(tigris) # tiger/line data
-
-# other packages
-library(here) # file path management
-
-
-
-Before using tidycensus, you need to install a Census API key. Copy the syntax below into your console to install the key you received via email.
census_api_key("KEY", install = TRUE)
-This is not a code chunk you will need in each notebook. As long as install = TRUE, you will only have to do this once!
To get a preview of variables available in the get_decennial() function, we can use the load_variables() function:
census <- load_variables(year = 2000, dataset = "sf1")
-
-
-
-I find it useful to assign the output of this function to an object so that I can search through it. Try searching for the variable P001001, the total population of a geographic unit, in the census object.
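As a minimal sketch of that search, the chunk below filters a small stand-in data frame rather than the real census object. The name, label, and concept columns match what load_variables() returns, but census_demo and the label text in its rows are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the output of load_variables(); the real object
# has the same three columns but many more rows. Labels are illustrative.
census_demo <- data.frame(
  name = c("P001001", "P003001", "P003003"),
  label = c("Total", "Total", "White alone"),
  concept = c("P1. TOTAL POPULATION [1]", "P3. RACE [8]", "P3. RACE [8]")
)

# exact match on the variable name
total_pop <- filter(census_demo, name == "P001001")

# partial, case-insensitive match on the concept
race_vars <- filter(census_demo, grepl("race", concept, ignore.case = TRUE))
```

The same two filter() calls should work on the real census object, since only the number of rows differs.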
To download data, we can use the get_decennial() function to access, for example, population by state in 2000:
popStates <- get_decennial(geography = "state", year = 2000, variable = "P001001")
-
-
-
-A full list of the geographies available in tidycensus can be found here.
Most variables in the decennial census are actually a part of a table. There are individual variables, for example, for race:
-We rarely want to download these one at a time. Instead, we want to download them all at once into a single data frame. The table number for these data is P003 - we take the first four characters from the name variable.
We’ve used the FIPS codes for both Missouri (29) and St. Louis City (29510) here - you can find a full list of Missouri counties here.
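To make the structure of these codes concrete, here is a small sketch of how a five-digit county GEOID is assembled from a two-digit state code and a three-digit county code. The object names are hypothetical and nothing here calls the Census API.

```r
# FIPS codes are stored as text so that leading zeros survive
state_fips <- "29"    # Missouri
county_fips <- "510"  # St. Louis City

county_geoid <- paste0(state_fips, county_fips)

# sprintf() pads numeric codes back out to their full width,
# here for Boone County (county code 019)
boone_geoid <- sprintf("%02d%03d", 29L, 19L)
```

Keeping these values as character strings avoids silently dropping leading zeros, which matters when the codes are later used as join keys.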
The tidycensus package also includes tools for downloading the geometries for these data. For instance, we can add geometric data to our previous call for City of St. Louis tract-level data on race by adding the geometry = TRUE argument:
Notice how I used the zcol argument for mapview() to preview a specific set of data as a thematic layer on the map! These data are not normalized, but we do get a quick preview of the distribution of Asian residents in St. Louis City.
To get a preview of variables available in the get_acs() function, we can use the load_variables() function again. We’ll use "acs5" for our dataset and, for this example, we’ll pull from the most recent 2019 ACS year:
Try searching for the table B19013, the median household income table.
We’ll illustrate get_acs() by using the data in table B19019. First, we’ll download these data as a full table for all counties in Missouri:
Notice how we needed to specify _001E for zcol. That references the specific variable we want to map - the estimate (or E) for variable 1 in the table. The M values refer to the margin of error - we expect this estimate to be off by some amount within +/- this value.
We can also download a specific column, like the median income for one-person households (B19019_002):
Perhaps we have a range of data that we want to include. For this example, we’ll download data on median income and the proportion of women in tracts in Boone County, Missouri. We’ll download the income data with geometry = TRUE and the sex data with geometry = FALSE:
To combine these data, we’ll use left_join() from dplyr. Our sf object should always be the first object in the join (the x data) and our non-sf data should be the second (the y data):
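Three common issues arise:
-The ID variables have different names: by = c("GEOID" = "geoid")
The ID variables have different types: booneIncome <- mutate(booneIncome, GEOID = as.numeric(GEOID))
Both objects are sf objects: st_geometry(booneSex) <- NULL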
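The first two issues can be sketched with plain data frames. The objects below are hypothetical stand-ins for booneIncome and booneSex with made-up values, and the sf case is left out so that the chunk runs with dplyr alone.

```r
library(dplyr)

# Hypothetical stand-ins: the key is character in one table and numeric in
# the other, and the two key columns are named differently
income_demo <- data.frame(GEOID = c("29019000200", "29019000300"),
                          median_income = c(41000, 52000))
sex_demo <- data.frame(geoid = c(29019000200, 29019000300),
                       pct_women = c(51.2, 49.8))

# align the key's type first...
income_demo <- mutate(income_demo, GEOID = as.numeric(GEOID))

# ...then name both keys explicitly in the join
boone_demo <- left_join(income_demo, sex_demo, by = c("GEOID" = "geoid"))
```

Converting the numeric key to character instead would also work, and it is often the safer choice for GEOID values that begin with a zero.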
To get data from the TIGER/line database, we can use the tigris package. You can see a full list of the data available here.
We can download a generalized version, which smooths out state boundaries so that the overall image is both smaller in disk size and (sometimes) easier to read. This is particularly helpful if you are making small-scale maps of the entire United States. We’ll get these data at the “20m” resolution using the states() function:
Now, we’ll get more detailed data - all of the county boundaries for Missouri. We’ll use the counties() function with a slightly less generalized resolution, “5m”:
Now, we’ll get even more detailed data - all of the tract boundaries for St. Charles County, Missouri. We’ll use the tracts() function, which sets cb = FALSE by default:
This notebook illustrates data access through both tigris and tidycensus, as well as joins using dplyr.
This notebook requires the following packages:
-# tidyverse packages
-library(dplyr) # data wrangling
-
-# spatial packages
-library(mapview) # preview geometric data
-library(sf) # spatial tools
-library(tidycensus) # demographic data
-library(tigris) # tiger/line data
-
-# other packages
-library(here) # file path management
-
-
-
-Before using tidycensus, you need to install a Census API key. Copy the syntax below into your console to install the key you received via email.
census_api_key("KEY", install = TRUE)
-This is not a code chunk you will need in each notebook. As long as install = TRUE, you will only have to do this once!
To get a preview of variables available in the get_decennial() function, we can use the load_variables() function:
I find it useful to assign the output of this function to an object so that I can search through it. Try searching for the variable P001001, the total population of a geographic unit, in the census object.
To download data, we can use the get_decennial() function to access, for example, population by state in 2000:
A full list of the geographies available in tidycensus can be found here.
Most variables in the decennial census are actually a part of a table. There are individual variables, for example, for race:
-census %>%
- filter(concept == "P3. RACE [8]")
-
-
-
-We rarely want to download these one at a time. Instead, we want to download them all at once into a single data frame. The table number for these data is P003 - we take the first four characters from the name variable.
We’ve used the FIPS codes for both Missouri (29) and St. Louis City (29510) here - you can find a full list of Missouri counties here.
The tidycensus package also includes tools for downloading the geometries for these data. For instance, we can add geometric data to our previous call for City of St. Louis tract-level data on race by adding the geometry = TRUE argument:
Notice how I used the zcol argument for mapview() to preview a specific set of data as a thematic layer on the map! These data are not normalized, but we do get a quick preview of the distribution of Asian residents in St. Louis City.
To get a preview of variables available in the get_acs() function, we can use the load_variables() function again. We’ll use "acs5" for our dataset and, for this example, we’ll pull from the most recent 2019 ACS year:
Try searching for the table B19013, the median household income table.
We’ll illustrate get_acs() by using the data in table B19019. First, we’ll download these data as a full table for all counties in Missouri:
## download
-
-
-## preview
-mapview(countyIncome, zcol = "")
-
-
-
-Notice how we needed to specify _001E for zcol. That references the specific variable we want to map - the estimate (or E) for variable 1 in the table. The M values refer to the margin of error - we expect this estimate to be off by some amount within +/- this value.
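To show how the E and M columns fit together, here is a small sketch using one made-up row in the wide layout that get_acs() returns. ACS margins of error are published at the 90 percent confidence level, so dividing by 1.645 recovers an approximate standard error. The county_demo object and its values are hypothetical.

```r
# Hypothetical estimate and margin of error for one county (values made up)
county_demo <- data.frame(
  GEOID = "29019",
  B19019_001E = 55000,
  B19019_001M = 2500
)

# bounds of the 90 percent interval around the estimate
county_demo$lower <- county_demo$B19019_001E - county_demo$B19019_001M
county_demo$upper <- county_demo$B19019_001E + county_demo$B19019_001M

# coefficient of variation (percent): standard error relative to the estimate
county_demo$cv <- (county_demo$B19019_001M / 1.645) /
  county_demo$B19019_001E * 100
```

A large coefficient of variation is a signal that an estimate is noisy relative to its size, which is common for small geographies like tracts.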
We can also download a specific column, like the median income for one-person households (B19019_002):
## download
-
-
-## preview
-mapview(countyIncome, zcol = "")
-
-
-
-Perhaps we have a range of data that we want to include. For this example, we’ll download data on median income and the proportion of women in tracts in Boone County, Missouri. We’ll download the income data with geometry = TRUE and the sex data with geometry = FALSE:
## download
-booneIncome <- get_acs(geography = "tract", year = 2019, state = 29,
- county = "019", variables = "B19019_001",
- output = "wide", geometry = TRUE) %>%
- rename(median_income = B19019_001E) %>%
- select(GEOID, median_income)
-
-## download
-booneSex <- get_acs(geography = "tract", year = 2019, state = 29,
- county = "019", variables = c("B01001_001", "B01001_026"),
- output = "wide") %>%
- mutate(pct_women = B01001_026E/B01001_001E*100) %>%
- select(GEOID, pct_women)
-
-
-
-To combine these data, we’ll use left_join() from dplyr. Our sf object should always be the first object in the join (the x data) and our non-sf data should be the second (the y data):
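Three common issues arise:
-The ID variables have different names: by = c("GEOID" = "geoid")
The ID variables have different types: booneIncome <- mutate(booneIncome, GEOID = as.numeric(GEOID))
Both objects are sf objects: st_geometry(booneSex) <- NULL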
To get data from the TIGER/line database, we can use the tigris package. You can see a full list of the data available here.
We can download a generalized version, which smooths out state boundaries so that the overall image is both smaller in disk size and (sometimes) easier to read. This is particularly helpful if you are making small-scale maps of the entire United States. We’ll get these data at the “20m” resolution using the states() function:
Now, we’ll get more detailed data - all of the county boundaries for Missouri. We’ll use the counties() function with a slightly less generalized resolution, “5m”:
Now, we’ll get even more detailed data - all of the tract boundaries for St. Charles County, Missouri. We’ll use the tracts() function, which sets cb = FALSE by default: