Initial political participation metric (voter turnout) updates for 2016 with questions (added as comments in the markdown file) #433
base: version2025
Conversation
@ridhi96 Overall, I think this update is mostly there. A few major items:
- I put in a recommendation for the quality issue (allocation). For 2016 I don't believe there were any major allocation issues like there were in 2020 due to COVID, but double-check this. My suggestion is to set all allocation quality to 1.
- MIT vs. VEST comparison: there is a comment in the 2020 code that suggests flagging counties in the top quartile of difference from the MIT counts. If time permits, I think this would be a good addition to our quality method, but you would have to make the update to both years.
- When you write this file out, make sure you update the years to 2016, and please run the evaluation function.
```{r}
### TO DELETE
```
I'm assuming this was just a test - please delete during this round of review.
The total number of votes reported in this data is 136,519,876. This is 149,361 votes less than [the national vote total in 2016 of 136,669,237](https://en.wikipedia.org/wiki/2016_United_States_presidential_election#Results_by_state).

```{r compare-votes, message = FALSE}
# total number of votes is 136,519,876
# QUESTION (Ridhi): The difference in votes is concerning
```
Took a quick read through VEST's technical article on how they put this data together. It seems like some discrepancies are expected due to data-privacy protection in certain states:

> "Sometimes discrepancies are by design. States may censor small vote tallies to protect voters' confidentiality and the secret ballot. The North Carolina State Board of Elections adds a small amount of noise to their state's precinct results per state law whenever a candidate receives one hundred percent of the vote within a reporting unit and voters' choices would be revealed."
```{r save-data}
joined_data %>%
  mutate(year = 2020) %>%
```
Needs to be updated for 2016
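A minimal sketch of the fix, assuming the rest of the chunk stays as it was in 2020:

```{r}
# 2016, not 2020: this file covers the 2016 election
joined_data %>%
  mutate(year = 2016)
```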
Prior to saving the data, please test this dataframe using the final evaluation function.
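Something like the following, where `evaluate_final_data()` is a hypothetical stand-in for whatever the repo's final evaluation function is actually called:

```{r}
# Hypothetical name: substitute the repo's actual final evaluation function
joined_data %>%
  evaluate_final_data()
```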
Compare the VEST precinct-level election returns used in our interpolation to the MIT Election Lab precinct-level returns

```{r}
# QUESTION (Ridhi): Should I remove this section?
```
Yes I think this is fine to delete
```{r}
if (!file.exists(mit_data)) {
  download.file(
```
I'm having trouble with the way R is downloading this file. The CSV that results from this download leads to a fatal error that makes R abort when I use read_csv. Not sure if this is just an issue with my version of R.
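One common cause of a corrupted download like this, particularly on Windows, is `download.file()` defaulting to a text-mode transfer; forcing binary mode may help. A sketch, with `mit_url` standing in for the actual URL used in this chunk:

```{r}
# mit_url is a placeholder for the actual MIT Election Lab URL in this chunk.
# mode = "wb" forces a binary-mode download, which avoids line-ending
# corruption that can make read_csv choke on the resulting file.
if (!file.exists(mit_data)) {
  download.file(mit_url, destfile = mit_data, mode = "wb")
}
```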
## 2. The process used to create VEST data

QUESTION (Ridhi): There were a lot of precincts/counties where votes were apportioned using weights, boundary conflicts, etc. I am not sure how to take this into account for quality here. The text below is for the 2020 metric:
I looked around at the information VEST provides online and found nothing about significant interpolation happening in 2016 the way it did for the 2020 data due to COVID. Maybe double-check VEST resources to see if you can find anything, but I think `allocation_quality` in the 2016 data would all be 1.
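If that holds, the update could be a one-liner (a sketch, not the project's actual code; `joined_data` is the dataframe saved later in this file):

```{r}
# No 2016-specific allocation issues found, so flag everything as quality 1
joined_data %>%
  mutate(allocation_quality = 1)
```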
```{r}
st_read(tempfile) %>%
  dplyr::mutate(total_votes = across(starts_with("G16PRE")) %>% rowSums) %>%
  # updated code: added "NAME", "COUNTYFP", "STATEFP" to reflect columns in 2016 data
  # QUESTION (Ridhi): DO I NEED AN IDENTIFYING COLUMN FOR EACH STATE FILE?
```
I don't think you do; it looks like the identifying columns are appended later in the code using spatial joins.
```{r}
# # Kalawao County, HI is missing
# quantile(test_result$pct_diff, na.rm = TRUE)
#
# # XX Maybe identify counties in the top quartile of percentage difference and flag each place in those counties for quality? Will be annoying because places cross county lines
```
This would be one major area for improvement if you have time. I believe the idea here is to compare MIT county-level votes with VEST aggregates and take the top quartile of the difference in counts (though play around with what makes sense). Then identify which places are in those counties and give them a 3 for quality. This would be an addition to our quality method. Again, I don't think this is crucial, but if you have the time it could be a nice change.
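A rough sketch of what that could look like, assuming `test_result` has one row per county with the `pct_diff` column from the commented code above, plus a county identifier that places can be matched on (`county_geoid` and the `quality` column names here are hypothetical):

```{r}
library(dplyr)

# Counties in the top quartile of VEST-vs-MIT percentage difference
diff_threshold <- quantile(test_result$pct_diff, probs = 0.75, na.rm = TRUE)

flagged_counties <- test_result %>%
  filter(pct_diff >= diff_threshold) %>%
  pull(county_geoid)

# Flag every place that falls in one of those counties with a 3 for quality
joined_data <- joined_data %>%
  mutate(quality = if_else(county_geoid %in% flagged_counties, 3, quality))
```

Places that cross county lines would need a rule on top of this, e.g. flagging a place if any of its counties is flagged.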
```{r}
# Total precinct area is 2,924,296,049,648 m^2
total_precinct_area <- precinct_sf_area %>%
  summarize(total_precinct = sum(precinct_original_area)) %>%
  # QUESTION (Ridhi): What is the purpose of `total_precinct_meters` when area is already in square meters?
```
It looks like this step converts the area calculation into square feet (using the international foot definition, i.e., 1 foot = 0.3048 m exactly). I'm not sure what the purpose is, but maybe update that variable name to reflect feet instead of meters.
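For reference, under the international foot definition 1 m^2 = 1 / 0.3048^2 (about 10.7639) ft^2, so a renamed version might look like this sketch (`total_precinct_sqft` is a suggested name, not the existing one):

```{r}
# 1 ft = 0.3048 m exactly, so 1 m^2 = 1 / 0.3048^2 (~10.7639) ft^2
total_precinct_sqft <- total_precinct_area %>%
  mutate(total_precinct_sqft = total_precinct / 0.3048^2)
```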
In the 2020 update, NJ and KY were flagged as having data quality issues. I am unsure how to interpret the VEST data and figure out which states don't have reliable data for 2016. My bigger concern is the gap between the total presidential votes calculated from the VEST data and the total votes reported on Wikipedia.