This project contains the scripts used to create environment scores for research participants in the Penn BBL. After importing a Penn BBL address data set, in which each row contains the permanent and local address of a research subject at a given point in time, environment risk scores were calculated using the procedure below.
The steps were completed separately for permanent and local addresses:
- Link a given address to latitude and longitude coordinates using the geo function from the tidygeocoder package. Remove any rows for which no latitude/longitude pair was returned (some people provide only a permanent address and no local address, so there is no address to link to geographical coordinates). Also remove an address if 2 or more of the following fields are blank: address, city, state, and zip code. This rule counteracts how "aggressive" the geo function is: given only the letters "PA" for Pennsylvania, it will still return a guess at the desired coordinates.
- Use the call_geolocator_latlon function from the tigris package to link each latitude/longitude coordinate pair to its Census block from the 2010 Census (Census blocks haven't changed since 2010). If the same address appears multiple times within the same year, remove the repeats. This step ensures that there are no duplicate rows when the Census data is merged with the addresses.
- After obtaining the Census block and year for every viable address in the original data set, link the Census block and year to data from the 5-year ACS using the get_acs function in the tidycensus package. The 5-year ACS data sets used here end in 2013 through 2019, so the data set is centered on the provided year to the greatest extent possible. For instance, if a person submitted their address in 2015, the 2013-2017 ACS is used. However, if an address is from 2010, the 2009-2013 data set is used, since that is the earliest data available. After linking each viable address to ACS data, an environment score is calculated.
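
The three filtering and selection rules in the steps above can be sketched in a few lines. This is a minimal, language-neutral illustration written in Python, not the project's actual code. The field names (address, city, state, zip, year), the function names, and the whitespace/case normalization used when comparing addresses are all assumptions made for the example. Clamping late years to the 2015-2019 data set is also an assumption, inferred from "to the greatest extent possible".

```python
# Hypothetical field names; the project's real column names are not given.
ADDRESS_FIELDS = ("address", "city", "state", "zip")

# End years of the 5-year ACS data sets described above
# (2009-2013 through 2015-2019).
EARLIEST_ACS_END_YEAR = 2013
LATEST_ACS_END_YEAR = 2019


def _clean(value):
    """Treat None and whitespace-only values as blank (assumed normalization)."""
    return str(value or "").strip()


def has_enough_fields(row):
    """Return True unless 2 or more of the address fields are blank."""
    blank = sum(1 for field in ADDRESS_FIELDS if not _clean(row.get(field)))
    return blank < 2


def drop_repeats(rows):
    """Keep the first row for each (address, year) combination."""
    seen = set()
    kept = []
    for row in rows:
        # Case-insensitive comparison is an assumption of this sketch.
        key = tuple(_clean(row.get(f)).lower() for f in ADDRESS_FIELDS)
        key += (row["year"],)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def acs_end_year(address_year):
    """End year of the 5-year ACS window centered on address_year,
    clamped to the range of available data sets."""
    centered_end = address_year + 2  # e.g. 2015 -> 2013-2017
    return max(EARLIEST_ACS_END_YEAR, min(LATEST_ACS_END_YEAR, centered_end))
```

For example, `acs_end_year(2015)` gives 2017 (the 2013-2017 data set), while `acs_end_year(2010)` gives 2013 (the 2009-2013 data set), matching the two cases described above. A row containing only a state, such as `{"state": "PA"}`, fails `has_enough_fields` and would be dropped before geocoding results are trusted.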