Colorado Governor and US Senate district is 000, not STATEWIDE #5

Open
NickCrews opened this issue Mar 4, 2024 · 2 comments

NickCrews commented Mar 4, 2024

The rest of the states use the value STATEWIDE to encode a statewide election.

NickCrews changed the title from "Colorado Governors district is 000, not STATEWIDE" to "Colorado Governor and US Senate district is 000, not STATEWIDE" on Mar 4, 2024
sbaltzmit (Contributor) commented

Thanks for the feedback! We haven't yet standardized across states how the 2022 precinct data represents an office that is not district-based. Note that the representation should be consistent across offices/candidates/precincts within a state, but it is not yet consistent across states.

In previous years we used the designator STATEWIDE for statewide offices, and null for non-statewide non-district-based offices, but I've come to believe that this convention is more confusing than helpful. For the 2022 data, my intention is to make the district always null for all non-district-based offices in every state. I agree that 0 is a somewhat odd way to designate that the district does not exist. I think it is just a quirk of how we cleaned Colorado in particular this year, and likely a data conversion error from inadvertently coercing an empty string into an integer.

Our cross-state standardization effort, which will culminate in migrating the full data to the Harvard Dataverse, is on my schedule for roughly April 1 through May 31, so by the end of May hopefully all non-district-based offices will have an empty district field.

My personal read at the moment is that this issue in Colorado has very little risk of really screwing anything up: the value is the same for every row within the office (right?), it is in an office where the district field should just be ignored, and it is not misleadingly the real value of an actual numbered district. So I am going to make a note to put it at the front of the queue when I start checking the standardization across states next month, rather than bumping anything else down the priorities list to take on district field standardization right now. But if you disagree, let me know; I'm happy to consider bumping it up in the order of priorities if there's a real risk it is messing up anyone's analysis. And if I don't follow up in this issue by the end of May, please feel free to ping me.
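As a rough illustration of the target convention described here (an empty district for every non-district-based office), the following is a minimal Python sketch. The office names, the set of placeholder values, and the function name `normalize_district` are assumptions made for illustration; this is not the project's actual cleaning code.

```python
from typing import Optional

# Assumed list of offices that are not district-based; the real list would
# come from the dataset's own office vocabulary.
NON_DISTRICT_OFFICES = {"GOVERNOR", "US SENATE"}

# Placeholder values mentioned in this thread for "no district".
PLACEHOLDERS = {"", "0", "000", "STATEWIDE"}


def normalize_district(office: str, district: Optional[str]) -> Optional[str]:
    """Return None when a non-district-based office carries a placeholder;
    otherwise return the district unchanged."""
    if office.strip().upper() in NON_DISTRICT_OFFICES:
        value = "" if district is None else district.strip().upper()
        if value in PLACEHOLDERS:
            return None
    return district


# Hypothetical (office, district) pairs.
rows = [
    ("GOVERNOR", "000"),
    ("US SENATE", "STATEWIDE"),
    ("STATE HOUSE", "007"),
]
normalized = [(office, normalize_district(office, d)) for office, d in rows]
```

Restricting the rewrite to offices known to be non-district-based avoids touching a district-based office whose district value happens to look like a placeholder.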

NickCrews (Author) commented

First, thanks for all the work on this dataset. There is a huge need for this sort of thing.

> In previous years we used the designator STATEWIDE for statewide offices, and null for non-statewide non-district-based offices, but I've come to believe that this convention is more confusing than helpful

I think this makes sense, and I think it is what I would expect. What was the confusing part? E.g., for GOVERNOR races, did people expect NULL but actually get STATEWIDE? I would like to interpret "district" as "the geography/constituency that this person represents", and I think that implementation is consistent with that definition?

> my intention is to make it always null for all non-district-based offices in every state

How would we then encode missing data (e.g. #7)? If we can make it so there is NO missing data, then this sounds good to me, but I don't know how feasible that is.
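To spell out the concern: if null means "no district", a plain null can no longer also mean "district unknown". One hypothetical way to keep the two cases apart is a distinct marker for missing values. The marker `MISSING` and the helper below are illustrative assumptions, not anything the dataset currently defines.

```python
from typing import Optional

# Hypothetical marker for "district-based office, but the value is unknown".
MISSING = "MISSING"


def describe_district(district: Optional[str]) -> str:
    """Distinguish 'no district applies' from 'district value is missing'."""
    if district is None:
        return "not district-based"
    if district == MISSING:
        return "district unknown"
    return f"district {district}"


labels = [describe_district(d) for d in (None, MISSING, "007")]
```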

> it is the same for every row within the office (right?)

Yes, this is true. However, what I am trying to do is "for every state senate and state house race, compare that race to the top-of-ticket race". This requires me to find the "top-of-ticket" race for every precinct/district. So I need the statewide races to be encoded more consistently.
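To make the use case concrete, here is a minimal sketch of that lookup in Python. The field names (`precinct`, `office`, `district`, `votes`), the sample rows, and the choice of GOVERNOR as the top-of-ticket office are all assumptions for illustration. The lookup relies on one predictable marker for a statewide race (here `None`); a mix of `000`, `STATEWIDE`, and null would need per-state special cases.

```python
from collections import defaultdict

# Hypothetical precinct-level rows; real data would have one row per candidate.
rows = [
    {"precinct": "P1", "office": "GOVERNOR", "district": None, "votes": 500},
    {"precinct": "P2", "office": "GOVERNOR", "district": None, "votes": 300},
    {"precinct": "P1", "office": "STATE HOUSE", "district": "007", "votes": 450},
    {"precinct": "P2", "office": "STATE HOUSE", "district": "007", "votes": 250},
]

TOP_OF_TICKET = "GOVERNOR"  # assumed top-of-ticket office


def votes_by_precinct(rows, office):
    """Sum votes per precinct for one office."""
    totals = defaultdict(int)
    for r in rows:
        if r["office"] == office:
            totals[r["precinct"]] += r["votes"]
    return dict(totals)


def compare_to_top_of_ticket(rows, office, district):
    """Return (votes in the district race, top-of-ticket votes in the same precincts)."""
    # Statewide rows are identified by a single consistent marker: district is None.
    statewide = [r for r in rows if r["district"] is None]
    top = votes_by_precinct(statewide, TOP_OF_TICKET)
    race = [r for r in rows if r["office"] == office and r["district"] == district]
    race_votes = sum(r["votes"] for r in race)
    precincts = {r["precinct"] for r in race}
    top_votes = sum(top.get(p, 0) for p in precincts)
    return race_votes, top_votes


result = compare_to_top_of_ticket(rows, "STATE HOUSE", "007")
```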
