Initial political participation metric (voter turnout) updates for 2016 with questions (added as comments in the markdown file) #433
base: version2025
Conversation
@ridhi96 Overall, I think this update is mostly there. A few major items:
- I put in a recommendation for the quality issue (allocation). For 2016 I don't believe there were any major allocation issues like there were in 2020 due to COVID, but double-check this. My suggestion is to set all allocation quality to 1.
- MIT vs. VEST comparison: there is a comment in the 2020 code that suggests flagging counties in the top quartile of difference from the MIT counts. If time permits, I think this would be a good addition to our quality method, but you would have to make the update to both years.
- When you write this file out, make sure you update the years to 2016, and please run the evaluation function.
```{r}
### TO DELETE
```
I'm assuming this was just a test - please delete during this round of review.
The total number of votes reported in this data is 136,519,876. This is 149,361 votes less than [the national vote total in 2016 of 136,669,237](https://en.wikipedia.org/wiki/2016_United_States_presidential_election#Results_by_state).

```{r compare-votes, message = FALSE}
# total number of votes is 136,519,876
# QUESTION (Ridhi): The difference in votes is concerning
```
Took a quick read through VEST's technical article on how they put this data together. It seems like some discrepancies are expected due to data-privacy protection in certain states:

> "Sometimes discrepancies are by design. States may censor small vote tallies to protect voters' confidentiality and the secret ballot. The North Carolina State Board of Elections adds a small amount of noise to their state's precinct results per state law whenever a candidate receives one hundred percent of the vote within a reporting unit and voters' choices would be revealed."
```{r save-data}
joined_data %>%
  mutate(year = 2020) %>%
```
Needs to be updated for 2016
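A minimal sketch of the fix, assuming the rest of the chunk stays as it was in 2020:

```{r}
# 2016, not 2020: this file covers the 2016 election
joined_data %>%
  mutate(year = 2016)
```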
Prior to saving the data, please test this dataframe using the final evaluation function.
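Something like the following, where `evaluate_final_data()` is a hypothetical stand-in for whatever the repo's final evaluation function is actually called:

```{r}
# Hypothetical name: substitute the repo's actual final evaluation function
joined_data %>%
  evaluate_final_data()
```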
Compare the VEST precinct-level election returns used in our interpolation to the MIT Election Lab precinct-level returns

```{r}
# QUESTION (Ridhi): Should I remove this section?
```
Yes I think this is fine to delete
```{r}
if (!file.exists(mit_data)) {
  download.file(
```
I'm having trouble with the way R is downloading this file. The CSV that results from this download leads to a fatal error that makes R abort when I use read_csv. Not sure if this is just an issue with my version of R.
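One common cause of a corrupted download like this, particularly on Windows, is `download.file()` defaulting to a text-mode transfer; forcing binary mode may help. A sketch, with `mit_url` standing in for the actual URL used in this chunk:

```{r}
# mit_url is a placeholder for the actual MIT Election Lab URL in this chunk.
# mode = "wb" forces a binary-mode download, which avoids line-ending
# corruption that can make read_csv choke on the resulting file.
if (!file.exists(mit_data)) {
  download.file(mit_url, destfile = mit_data, mode = "wb")
}
```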
## 2. The process used to create VEST data

QUESTION (Ridhi): There were a lot of precincts/counties where votes were apportioned using weights, boundary conflicts, etc. I am not sure how to take this into account for quality here. The text below is for the 2020 metric:
I looked around at the information VEST provides online and found nothing about significant interpolation happening in 2016 the way it did for the 2020 data due to COVID. Maybe double-check VEST resources to see if you can find anything, but I think `allocation_quality` in the 2016 data would all be 1.
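If that holds, the update could be a one-liner (a sketch, not the project's actual code; `joined_data` is the dataframe saved later in this file):

```{r}
# No 2016-specific allocation issues found, so flag everything as quality 1
joined_data %>%
  mutate(allocation_quality = 1)
```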
```{r}
st_read(tempfile) %>%
  dplyr::mutate(total_votes = across(starts_with("G16PRE")) %>% rowSums) %>%
  # updated code: added "NAME", "COUNTYFP", "STATEFP" to reflect columns in 2016 data
  # QUESTION (Ridhi): DO I NEED AN IDENTIFYING COLUMN FOR EACH STATE FILE?
```
I don't think you do; it looks like the identifying columns are appended later in the code using spatial joins.
```{r}
# # Kalawao County, HI is missing
# quantile(test_result$pct_diff, na.rm = TRUE)
#
# # XX Maybe identify counties in the top quartile of percentage difference and flag each place in those counties for quality? Will be annoying because places cross county lines
```
This would be one major area for improvement if you have time. I believe the idea here is to compare MIT county-level votes with VEST aggregates and take the top quartile of the difference in counts (though play around with what makes sense). Then identify which places are in those counties and give them a 3 for quality. This would be an addition to our quality method. Again, I don't think this is crucial, but if you have the time it could be a nice change.
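A rough sketch of what that could look like, assuming `test_result` has one row per county with the `pct_diff` column from the commented code above, plus a county identifier that places can be matched on (`county_geoid` and the `quality` column names here are hypothetical):

```{r}
library(dplyr)

# Counties in the top quartile of VEST-vs-MIT percentage difference
diff_threshold <- quantile(test_result$pct_diff, probs = 0.75, na.rm = TRUE)

flagged_counties <- test_result %>%
  filter(pct_diff >= diff_threshold) %>%
  pull(county_geoid)

# Flag every place that falls in one of those counties with a 3 for quality
joined_data <- joined_data %>%
  mutate(quality = if_else(county_geoid %in% flagged_counties, 3, quality))
```

Places that cross county lines would need a rule on top of this, e.g. flagging a place if any of its counties is flagged.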
```{r}
# Total precinct area is 2,924,296,049,648 m^2
total_precinct_area <- precinct_sf_area %>%
  summarize(total_precinct = sum(precinct_original_area)) %>%
  # QUESTION (Ridhi): What is the purpose of `total_precinct_meters` when area is already in square meters?
```
It looks like this step converts the area calculation into square feet (using the international foot definition, i.e., 1 foot = 0.3048 m exactly). I'm not sure what the purpose is, but maybe update that variable name to reflect feet instead of meters.
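For reference, under the international foot definition 1 m^2 = 1 / 0.3048^2 (about 10.7639) ft^2, so a renamed version might look like this sketch (`total_precinct_sqft` is a suggested name, not the existing one):

```{r}
# 1 ft = 0.3048 m exactly, so 1 m^2 = 1 / 0.3048^2 (~10.7639) ft^2
total_precinct_sqft <- total_precinct_area %>%
  mutate(total_precinct_sqft = total_precinct / 0.3048^2)
```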
In the 2020 update, NJ and KY were flagged as having data quality issues. I am unsure how to interpret the VEST data and figure out which states don't have reliable data for 2016. My bigger concern is the gap between the total presidential votes calculated from the VEST data and the total votes reported on Wikipedia.