
Initial political participation metric (voter turnout) updates for 2016 with questions (added as comments in the markdown file) #433

Draft: wants to merge 1 commit into base `version2025`
Conversation


@ridhi96 ridhi96 commented Dec 5, 2024

In the 2020 update, NJ and KY were flagged as having data quality issues. I am unsure how to interpret the VEST data to figure out which states don't have reliable data for 2016. I also have concerns about the total presidential votes calculated from the VEST data versus the total votes reported on Wikipedia.

@ridhi96 changed the title Initial updates for 2016 with questions (added as comments in the markdown file) Initial political participation metric (voter turnout) updates for 2016 with questions (added as comments in the markdown file) Dec 5, 2024
@ridhi96 assigned ridhi96 and jwalsh28 and unassigned jwalsh28 and ridhi96 Dec 5, 2024
@ridhi96 requested a review from jwalsh28 December 5, 2024 08:37
@ridhi96 marked this pull request as draft December 11, 2024 16:28
@jwalsh28 (Collaborator) left a comment:

@ridhi96 Overall I think this update is mostly there. A few major items:

  1. I put a recommendation for the quality issue (allocation). For 2016, I don't believe there were any major allocation issues like there were in 2020 due to COVID, but double-check this. My suggestion is to set all allocation quality values to 1.
  2. MIT vs VEST comparison - there is a comment from the 2020 code that suggests flagging counties in the top quartile of difference from the MIT counts. If time permits I think this would be a good addition to our quality method but you would have to make the update to both years.
  3. Make sure that when you read this file out you update the years to 2016, and please run the evaluation function.
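A minimal sketch of item 1, with toy data; `joined_data` and `allocation_quality` are assumed/illustrative names, not necessarily the PR's actual ones:

```r
# Sketch only: set the allocation quality flag to 1 for every row, since no
# major 2016 allocation issues were found. Toy stand-in dataframe.
library(dplyr)

joined_data <- data.frame(state = c("NJ", "KY", "IL"))
joined_data <- joined_data %>%
  mutate(allocation_quality = 1L)
```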


```{r}
### TO DELETE
```
Collaborator:

I'm assuming this was just a test - please delete during this round of review.

The total number of votes reported in this data is 136,519,876. This is 149,361 votes less than [the national vote total in 2016 of 136,669,237](https://en.wikipedia.org/wiki/2016_United_States_presidential_election#Results_by_state).
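For scale, the gap can be expressed as a share of the national total; a quick base-R check using the two totals quoted above:

```r
# Difference between the VEST-derived total and the Wikipedia national total.
vest_total     <- 136519876
national_total <- 136669237

diff_votes  <- national_total - vest_total   # 149,361 votes
share_short <- diff_votes / national_total   # roughly 0.11% of the total
```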
```{r compare-votes, message = FALSE}
# total number of votes is 136,519,876
# QUESTION (Ridhi): The difference in votes is concerning
```
Collaborator:

Took a quick read through VEST's technical article on how they put this data together. It seems like some discrepancies are expected due to data-privacy protections in certain states:

> Sometimes discrepancies are by design. States may censor small vote tallies to protect voters’ confidentiality and the secret ballot. The North Carolina State Board of Elections adds a small amount of noise to their state’s precinct results per state law whenever a candidate receives one hundred percent of the vote within a reporting unit and voters’ choices would be revealed.


```{r save-data}
joined_data %>%
  mutate(year = 2020) %>%
```

Collaborator:

Needs to be updated for 2016.

Collaborator:

Prior to saving the data, please test this dataframe using the final evaluation function.

Compare the VEST precinct-level election returns used in our interpolation to the MIT Election Lab precinct-level returns

```{r}
# QUESTION (Ridhi): Should I remove this section?
```
Collaborator:

Yes, I think this is fine to delete.


```{r}
if (!file.exists(mit_data)) {

  download.file(
```
Collaborator:

I'm having trouble with the way R is downloading this file. The CSV that results from this download leads to a fatal error that makes R abort when I use read_csv. Not sure if this is just an issue with my version of R.


## 2. The process used to create VEST data

QUESTION (Ridhi): There were a lot of precincts/counties where votes were apportioned using weights, boundary conflicts, etc. I am not sure how to take this into account for quality here. The text below is for the 2020 metric:
Collaborator:

I looked around at the information VEST provides online and found nothing about significant interpolation happening in 2016 the way it did for the 2020 data due to COVID. Maybe double-check VEST resources to see if you can find anything, but I think allocation_quality in the 2016 data should all be 1.

```{r}
st_read(tempfile) %>%
  dplyr::mutate(total_votes = rowSums(across(starts_with("G16PRE")))) %>%
  # updated code: added "NAME", "COUNTYFP", "STATEFP" to reflect columns in 2016 data
  # QUESTION (Ridhi): DO I NEED AN IDENTIFYING COLUMN FOR EACH STATE FILE?
```
Collaborator:

I don't think you do; it looks like the identifying columns are appended later in the code using spatial joins.

```{r}
# # Kalawao County, HI is missing
# quantile(test_result$pct_diff, na.rm = TRUE)
#
# # XX Maybe identify counties in the top quartile of percentage difference and flag each place in those counties for quality? Will be annoying because places cross county lines
```
Collaborator:

This would be one major area for improvement if you have time. I believe the idea here is to compare MIT county-level vote counts with VEST aggregates and take the top quartile of the difference in counts (though play around with what makes sense). Then identify which places are in those counties and give them a 3 for quality. This would be an addition to our quality method. Again, I don't think this is crucial, but if you have the time it could be a nice change.
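A minimal sketch of that idea with toy data; the column names (`vest_votes`, `mit_votes`, `vote_quality`) and the 0.75 cutoff follow the comment above but are otherwise assumptions:

```r
# Sketch: flag counties whose VEST-vs-MIT vote difference falls in the top
# quartile of percentage difference. Toy numbers; real code would join the
# actual county-level aggregates from both sources.
library(dplyr)

county_compare <- data.frame(
  county_fips = c("17031", "36061", "06037", "48201"),
  vest_votes  = c(2000, 1500, 3200, 1200),
  mit_votes   = c(1900, 1510, 3000, 1205)
)

flagged <- county_compare %>%
  mutate(pct_diff = abs(vest_votes - mit_votes) / mit_votes) %>%
  mutate(vote_quality = if_else(
    pct_diff >= quantile(pct_diff, 0.75, na.rm = TRUE), 3L, 1L
  ))
```

Places could then be joined to the flagged counties and inherit `vote_quality = 3`; the 0.75 cutoff is the "play around with what makes sense" knob.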

```{r}
# Total precinct area is 2,924,296,049,648 m^2
total_precinct_area <- precinct_sf_area %>%
  summarize(total_precinct = sum(precinct_original_area)) %>%
  # QUESTION (Ridhi): What is the purpose of `total_precinct_meters` when the area is already in square meters?
```
Collaborator:

It looks like this step converts the area calculation into square feet, using the international foot definition (1 foot = 0.3048 meters exactly). I'm not sure what the purpose is, but maybe update that variable name to reflect feet instead of meters.
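For reference, the square-meters-to-square-feet conversion under the international foot definition; the function name here is illustrative, not from the PR:

```r
# Convert an area in square meters to square feet
# (international foot: 1 ft = 0.3048 m exactly).
sq_m_to_sq_ft <- function(area_m2) area_m2 / (0.3048^2)

sq_m_to_sq_ft(1)  # about 10.7639 square feet per square meter
```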

2 participants