Representation of overvoted / invalid ballots #52

danwallach · 2021-01-13T20:50:34Z

Consider the following: if a voter using an optical-scan system casts a ballot with an overvote, what's supposed to happen next?

In at least some cases, including central tabulation scanners, the answer is apparently that the cast vote record (CVR) for that ballot includes the overvote. That means the proper way to tabulate those CVRs isn't simply to add up each column! This raises some really unpleasant questions about how ElectionGuard should represent those ballots.
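
To make the tabulation problem concrete, here is a minimal plaintext sketch. It assumes a hypothetical CVR layout in which each row holds one 0/1 mark per candidate for a single contest. This is not the format of any particular scanner. Summing columns naively credits every mark in an overvoted row. A proper tally instead treats any row that exceeds the limit as an undervote.

```python
from typing import List

# Hypothetical CVR rows for one contest: one 0/1 mark per candidate column.
Cvr = List[int]


def naive_column_sums(rows: List[Cvr]) -> List[int]:
    """Add up each column, ignoring the selection limit (wrong for overvotes)."""
    return [sum(col) for col in zip(*rows)]


def proper_tally(rows: List[Cvr], votes_allowed: int) -> List[int]:
    """Count a row only if it respects the limit; overvotes count as undervotes."""
    totals = [0] * len(rows[0])
    for row in rows:
        if sum(row) > votes_allowed:
            continue  # overvoted: no candidate in this contest gets credit
        for i, mark in enumerate(row):
            totals[i] += mark
    return totals


rows = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],  # overvote in a vote-for-1 contest
]
print(naive_column_sums(rows))  # [2, 2, 0]
print(proper_tally(rows, 1))    # [1, 1, 0]
```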

Options?

1. Error. Reject the ballot. Force some human process to deal with it.
2. "Interpret" the overvoted contest as an undervote (which is how it should properly be tabulated), and make that the resulting ciphertext.
3. "Interpret" the overvoted contest, as above, but keep some additional metadata (perhaps conventionally encrypted) that records the voter's "actual" intent.
4. Make the cryptography way more complicated, so that the contest accumulates as if it were an undervote but decrypts as if it were an overvote.

I'm kinda leaning toward one of the "interpret" solutions, possibly with auxiliary metadata that would only show up if you did an individual decryption.
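
A minimal sketch of the "interpret" approach follows. It assumes plain 0/1 marks keyed by hypothetical selection IDs. An overvoted contest is converted to an undervote for what gets encrypted and tallied. The voter's original marks are serialized separately; in the metadata variant, that serialized copy would be conventionally encrypted. The function and field names are illustrative and are not ElectionGuard's API.

```python
import json
from typing import Dict, Tuple

Marks = Dict[str, int]  # selection ID -> 0/1 as read by the scanner (hypothetical)


def interpret_contest(marks: Marks, votes_allowed: int) -> Tuple[Marks, str]:
    """Return (marks to encrypt and tally, JSON of the voter's original marks)."""
    original = json.dumps(marks, sort_keys=True)
    if sum(marks.values()) > votes_allowed:
        # Overvote: tally this contest as though it were left blank.
        interpreted = {sel_id: 0 for sel_id in marks}
    else:
        interpreted = dict(marks)
    return interpreted, original


interpreted, aux = interpret_contest({"selection1": 1, "selection2": 1}, votes_allowed=1)
print(interpreted)  # {'selection1': 0, 'selection2': 0}
print(aux)          # {"selection1": 1, "selection2": 1}
```

Under the plain "interpret" option, `aux` would simply be discarded. Under the metadata option, it would travel with the ballot as the auxiliary, conventionally encrypted record.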

beausmith · 2021-01-13T21:02:42Z

Various jurisdictions may have laws which influence implementation here.

The "Error" option would accommodate the widest range of implementations.

With the "Error" scenario, adjudication boards could create a replacement ballot — with the overvotes omitted or corrected — to submit for scanning.

rc-ms · 2021-01-22T17:56:59Z

Right now ElectionGuard errors out by design: an overvoted ballot is no longer well-formed, so the proofs are undermined. We rely on the implementing system to handle overvotes.

The "default" setting for ElectionGuard is the end-to-end verifiability use case, which assumes precinct scan, not central tabulation. In that context, the ballot is either held or rejected, and the voter is given the choice to correct the ballot.

If we were to assume a central tabulation use case, I'd favor the interpretation route as well. By default, though, I would not want those votes included in the initial tabulation, since there is likely a need for adjudication or other separate processes.

danwallach (Author) · 2021-01-22T18:16:09Z

We've had the idea floating around for a while now that we need some auxiliary "conventionally encrypted" arbitrary text as a way to handle write-in votes. My proposal is that we generalize this a bit and make it a JSON structure of some sort that we can haggle over. For now, I'm assuming one text blob per contest. So for a contest with three normal selections and one write-in selection, with IDs selection1 through selection4, the text field might be:

{
    "selection4": "Lizard People"
}

Or, it could be something fancier, like:

{
    "selection1": 1,
    "selection2": 0,
    "selection3": 1,
    "selection4": "Lizard People"
}

The latter case supposes a voter who filled in bubbles for two normal candidates as well as the write-in candidate, and then put some text in the write-in field which the scanner somehow magically recognized. Of course, we could make this even fancier in several dimensions. Similarly, we might decide that we want one auxiliary text blob for the entire ballot, for every contest, or even for every selection. And we could fancy it up even more by representing the full generality of the election markup. I'm not particularly sure I'd recommend that initially, but using a JSON dictionary as the base datatype allows for future extensibility.

Anyway, for the arlo-e2e use case, we're dealing with central tabulators which just directly output these overvotes, so this isn't just a fun hypothetical. It's a concrete problem we need to solve. Similarly, write-in votes on touch-screen voting machines are a very real problem. A general-purpose JSON structure, "conventionally encrypted", would seem to address all these needs.

But how do you do the conventional encryption so it doesn't leak anything by virtue of its length?

We can certainly create a "session key", encrypt it with the election's public key, and then use that session key with conventional AES-GCM (or an equivalent). To mitigate variable-length JSON strings, we can decide up front that write-in fields must be a specific number of characters. That may mean we output "Lizard People" followed by padding whitespace, along with a rule for truncating longer strings.
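
The length-hiding step can be sketched with the standard library alone. The sketch assumes a hypothetical fixed write-in width of 32 characters and a fixed blob size of 256 bytes; both are arbitrary choices for illustration. The padded bytes would then be encrypted with AES-GCM under the session key, which is not shown here. GCM ciphertext length tracks plaintext length, so every blob has to be padded to the same size before encryption.

```python
import json
from typing import Dict, Union

WRITE_IN_WIDTH = 32  # hypothetical fixed width for write-in text, in characters
BLOB_SIZE = 256      # hypothetical fixed size of every auxiliary blob, in bytes

AuxValue = Union[int, str]


def fix_write_in(text: str) -> str:
    """Truncate or right-pad with spaces so every write-in has the same length."""
    return text[:WRITE_IN_WIDTH].ljust(WRITE_IN_WIDTH)


def pad_blob(aux: Dict[str, AuxValue]) -> bytes:
    """Serialize the per-contest dictionary and pad it to exactly BLOB_SIZE bytes."""
    normalized = {k: fix_write_in(v) if isinstance(v, str) else v for k, v in aux.items()}
    encoded = json.dumps(normalized, sort_keys=True).encode("utf-8")
    if len(encoded) > BLOB_SIZE:
        raise ValueError("auxiliary data exceeds the fixed blob size")
    # json.dumps escapes control characters, so a raw NUL byte can only be padding.
    return encoded + b"\x00" * (BLOB_SIZE - len(encoded))


def unpad_blob(blob: bytes) -> Dict[str, AuxValue]:
    return json.loads(blob.rstrip(b"\x00").decode("utf-8"))


short = pad_blob({"selection4": "Lizard People"})
longer = pad_blob({"selection1": 1, "selection2": 0, "selection3": 1, "selection4": "Lizard People"})
print(len(short), len(longer))  # 256 256
print(unpad_blob(short)["selection4"].rstrip())  # Lizard People
```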

rc-ms Jan 22, 2021

Agreed. We've been knocking something similar around. Are you around next week for a discussion? Calling Dr. @benaloh.

benaloh (Collaborator) · 2021-01-22T18:59:15Z

Yuck. My first thought was to mark the ballot as spoiled. We currently say that all ballots -- including spoiled ballots -- should pass validation. But it would be OK to have invalid spoiled ballots in the system. However, it seems that this is not sufficient since I presume that a cast ballot with an overvote in one contest should still be counted normally in other contests.

My next thought is that we might need to add a one-bit flag to each contest on each ballot which indicates whether or not there is an overvote in the contest. Perhaps a more elegant way to do this would be to add an integer value to each contest on each ballot which indicates the total number of selections made. In either case, since this flag/integer would not be encrypted, the tallying logic could simply say that encrypted votes are not tallied when the number of votes in a contest exceeds the limit. This allows an overvote ballot to pass validation and still have the overvote not be tallied. The unfortunate thing is that this does require some code changes.
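
The following toy sketch illustrates the plaintext-count idea using exponential ElGamal. It uses a deliberately tiny group (p = 2039, q = 1019, g = 4) so it runs instantly; real deployments use far larger parameters, and all names here are hypothetical. Each contest record carries an unencrypted selection count. The tally skips any record whose count exceeds the limit and combines the rest by ordinary homomorphic multiplication.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Toy parameters only: p = 2q + 1 is a safe prime and g = 4 generates the order-q subgroup.
P, Q, G = 2039, 1019, 4
Ciphertext = Tuple[int, int]


def encrypt(m: int, public: int) -> Ciphertext:
    """Exponential ElGamal: (g^r, g^m * K^r) mod p."""
    r = random.randrange(1, Q)
    return pow(G, r, P), pow(G, m, P) * pow(public, r, P) % P


def add(a: Ciphertext, b: Ciphertext) -> Ciphertext:
    """Componentwise product of ciphertexts encrypts the sum of plaintexts."""
    return a[0] * b[0] % P, a[1] * b[1] % P


def decrypt(c: Ciphertext, secret: int, bound: int = 100) -> int:
    g_m = c[1] * pow(c[0], P - 1 - secret, P) % P  # strip the mask, leaving g^m
    for m in range(bound + 1):  # brute-force discrete log over small tallies
        if pow(G, m, P) == g_m:
            return m
    raise ValueError("plaintext outside search bound")


@dataclass
class ContestRecord:
    selections: List[Ciphertext]  # one encrypted 0/1 per selection
    selection_count: int          # hypothetical unencrypted count, per the proposal


def tally(records: List[ContestRecord], votes_allowed: int) -> List[Ciphertext]:
    totals: List[Ciphertext] = [(1, 1)] * len(records[0].selections)  # encryptions of 0
    for rec in records:
        if rec.selection_count > votes_allowed:
            continue  # overvoted contest: skipped, though still published
        totals = [add(t, c) for t, c in zip(totals, rec.selections)]
    return totals


secret = random.randrange(1, Q)
public = pow(G, secret, P)
ballots = [[1, 0], [0, 1], [1, 1]]  # third ballot overvotes a vote-for-1 contest
records = [ContestRecord([encrypt(v, public) for v in b], sum(b)) for b in ballots]
print([decrypt(c, secret) for c in tally(records, votes_allowed=1)])  # [1, 1]
```

Because `selection_count` is public, anyone reading the records can see which ballots had their contest excluded.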

danwallach (Author) Jan 22, 2021

You're right that we might get one contest with an overvote while the rest are fine.

I don't really like a plaintext flag, since that potentially violates ballot secrecy in a bunch of nasty ways. I much prefer some sort of encrypted metadata, reflecting the original "uninterpreted" ballot, which then allows the homomorphic tallying to continue without any special cases.

danwallach (Author) · 2021-01-22T18:59:28Z

Also, one more thing:

Consider the case where there are enough write-in candidates that we have at least the possibility that a write-in candidate is among the winners. (Murkowski famously won this way: https://www.reuters.com/article/us-usa-elections-murkowski/senator-lisa-murkowski-wins-alaska-write-in-campaign-idUSTRE6AG51C20101118)

A fancy thing we could do is a reencryption mixnet of some sort, allowing us to shuffle up all the JSON strings while provably not damaging any of them. Then decrypt conventionally. Then tally conventionally. @benaloh should chime in here, but I think this would mean that we can't use conventional encryption any more, but rather that we need to shoehorn the write-in string into ElementModQ, limiting us to 31 bytes of plaintext.
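
The 31-byte figure follows from packing bytes into an integer below q. The sketch below assumes ElectionGuard 1.0's q = 2^256 - 189, and its helper names are hypothetical. Any string of at most 31 bytes, read as a big-endian integer, is below 2^248 and therefore below q. By contrast, 32 arbitrary bytes can exceed q.

```python
Q = 2**256 - 189  # assumed: the subgroup order q from ElectionGuard 1.0
MAX_BYTES = 31    # 31 bytes = 248 bits, always below q


def string_to_mod_q(text: str) -> int:
    """Pack a short UTF-8 string into an integer that is a valid element mod q."""
    raw = text.encode("utf-8")
    if len(raw) > MAX_BYTES:
        raise ValueError("write-in longer than 31 bytes")
    return int.from_bytes(raw, "big")  # note: leading NUL characters would be lost


def mod_q_to_string(value: int) -> str:
    length = (value.bit_length() + 7) // 8
    return value.to_bytes(length, "big").decode("utf-8")


encoded = string_to_mod_q("Lizard People")
print(encoded < Q, mod_q_to_string(encoded))  # True Lizard People
print(int.from_bytes(b"\xff" * 32, "big") < Q)  # False: 32 arbitrary bytes can exceed q
```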

So it's entirely possible that we should consider a per-write-in ElementModP to hold the write-in result, which would be suitable for mixnet reencryption. So far as I can tell, this issue is orthogonal to the need for a general-purpose JSON structure, which we'd use as part of risk-limiting audits (RLAs).

benaloh (Collaborator) · 2021-01-22T19:19:16Z

We've thus far chosen to punt on write-ins beyond just counting the number of write-ins. Doing this right involves a lot of new code to implement a MixNet -- as well as new verification steps to verify correct mixing.

We would need an entirely different encryption for write-ins. We couldn't even shoehorn things into the 31-32 bytes available mod q, because we're using exponential ElGamal and need to compute discrete logs to decrypt. It's a bit strange, but even though we have a (nearly) 32-byte space to work with, we can't reasonably use more than about 4 bytes. This is ample for holding an integer tally, but not much more.
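
The sketch below shows why exponential ElGamal limits the usable plaintext. It is a toy decryption in a tiny group (p = 2039, q = 1019, g = 4), for illustration only. Decryption recovers g^m, not m, so the last step is a search for the discrete log. That search is cheap when m is a tally bounded by the number of ballots. Its cost grows with the plaintext range, though: linearly for this naive search, and still about 2^128 steps for a 32-byte value even with baby-step giant-step.

```python
import random
from typing import Tuple

P, Q, G = 2039, 1019, 4  # toy safe-prime group; g = 4 has order q


def encrypt(m: int, public: int) -> Tuple[int, int]:
    r = random.randrange(1, Q)
    return pow(G, r, P), pow(G, m, P) * pow(public, r, P) % P


def decrypt_to_group_element(c: Tuple[int, int], secret: int) -> int:
    """Undo the ElGamal mask; what remains is g^m, not m itself."""
    alpha, beta = c
    return beta * pow(alpha, P - 1 - secret, P) % P


def discrete_log(g_m: int, bound: int) -> int:
    """Linear search for m in [0, bound]; cost is proportional to bound."""
    acc = 1
    for m in range(bound + 1):
        if acc == g_m:
            return m
        acc = acc * G % P
    raise ValueError("plaintext exceeds the search bound")


secret = random.randrange(1, Q)
public = pow(G, secret, P)
tally = encrypt(37, public)
print(discrete_log(decrypt_to_group_element(tally, secret), bound=100))  # 37
```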

At one point, I started including details on how to use ordinary (non-exponential) ElGamal to convey a 32-byte key share in the key generation phase. It became increasingly painful and after several pages of ugly documentation, we decided to pull it all out and just use RSA.

While exponential ElGamal works really nicely, ordinary integer ElGamal is so painful that I couldn't find any instances of its use. Ordinary elliptic-curve ElGamal is used widely, but that also requires lots more code and even more pain for verifiers.
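
Part of the pain with ordinary (non-exponential) integer ElGamal is that the message must itself be a group element. The toy sketch below uses a tiny safe-prime group (p = 2039, q = 1019). A message m in 1..q is mapped into the order-q subgroup of quadratic residues by taking m or p - m, whichever is a residue. This is a standard encoding for safe primes, shown only for illustration; it is not what ElectionGuard does. Decryption then needs no discrete log, but every message must be encoded and decoded, and a verifier would have to check those steps as well.

```python
import random
from typing import Tuple

P, Q, G = 2039, 1019, 4  # toy safe prime p = 2q + 1; g = 4 generates the residues


def encode(m: int) -> int:
    """Map m in [1, q] to a quadratic residue mod p (a member of the order-q subgroup)."""
    if not 1 <= m <= Q:
        raise ValueError("message out of range")
    # p = 3 (mod 4), so exactly one of m and p - m is a residue.
    return m if pow(m, Q, P) == 1 else P - m


def decode(x: int) -> int:
    return x if x <= Q else P - x


def encrypt(m: int, public: int) -> Tuple[int, int]:
    r = random.randrange(1, Q)
    return pow(G, r, P), encode(m) * pow(public, r, P) % P


def decrypt(c: Tuple[int, int], secret: int) -> int:
    alpha, beta = c
    return decode(beta * pow(alpha, P - 1 - secret, P) % P)


secret = random.randrange(1, Q)
public = pow(G, secret, P)
print(decrypt(encrypt(1000, public), secret))  # 1000
```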

danwallach (Author) Jan 22, 2021

So how about this? We acknowledge that it's possible for write-ins to win, but unlikely.

That means that we have a conventionally encrypted version of the write-in text available to us, if necessary, but we expect to never actually need it. "Break glass in case of emergency."

This keeps the code really simple, and pushes the complexity onto the election official. To some extent, this is necessary, regardless, due to most states' requirement for understanding the "intent of the voter", which means that humans are going to need to map from whatever the user entered to known human beings.

Fun fact: in Texas, you must register if you wish to stand for a race as a write-in candidate. If no candidate does so, then the election officials don't have to give the write-in line to the voters. The bar for registration is much lower, but it still requires getting some signatures, doing paperwork, etc.

danwallach (Author) Jan 22, 2021

And, now that I think about it, one way an election official could deal with "oy vey, a write-in might have won" is to bust out all the human readable ballots and have lots of humans just make piles for each expected candidate, with an "other" pile for all the Lizard People votes. As such, they're highly unlikely to ever need to decrypt all of the write-in fields. They'll only want to have them around for RLAs, where conventional crypto is just fine.

benaloh (Collaborator) · 2021-01-22T19:31:16Z

Yes. There is no harm in having a conventional encryption that is opened only if necessary. In that case, E2E verification covers only the number of write-ins, not their contents.

For the Texas case, we could create an entry for each registered write-in. This doesn't imply that write-in candidates need to be included in the UI, but it gives us a place to mark a named write-in candidate in our structure whenever it is recognized as a voter selection without our having to do anything special with the EG code.

danwallach (Author) Jan 22, 2021

We discussed this at some point in the STAR-Vote meetings, and I believe the decision was that you could never put it anywhere in the UI, because politics.

"Interpretation" from a typed string to a registered write-in candidate requires human involvement, because misspellings and the like are guaranteed to be all over the place, and in fact, the courts may get involved. (Hypothetical: if a voter enters "LBJ", is that a valid expression of voter intent for "Lyndon Baines Johnson"?)

Since interpretation requires humans, and it's only going to happen on relatively rare occasions, I'm going to argue that we should make the engineering decision to punt that interpretation to something that happens external to ElectionGuard. We've got the data. We can decrypt it and get it to you if you really need it. But then it's up to you to figure out what to do with it.

rc-ms Jan 22, 2021

I agree, for a variety of reasons (not just the interpretation and overvoting use cases, but also future use cases and implementations of EG). Certainly right now, though, we only want to capture the data and NOT interpret it as part of the core tally functionality.

benaloh (Collaborator) Jan 22, 2021

I'm happy to punt this as well. FWIW, I note that some BMD/DRE devices require write-ins to be entered electronically. I think that there are at least some Hart devices with a wheel for entering alphanumerics. In a case like this, it would be possible for a vendor to say, "If write-in value = XYZZY, then mark previously registered candidate XYZZY". However, this has nothing to do with EG code, so we can pass.
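
A vendor-side shortcut like the one described here could look like the sketch below, kept entirely outside ElectionGuard. It assumes a hypothetical table mapping registered write-in names to selection IDs. Matching happens only after trimming whitespace and case-folding, and that normalization is itself an assumption. Anything that does not match is left for human adjudication, consistent with punting interpretation.

```python
from typing import Dict, Optional

# Hypothetical: registered write-in candidates and their selection IDs.
REGISTERED_WRITE_INS: Dict[str, str] = {
    "XYZZY": "selection5",
}


def normalize(text: str) -> str:
    """Collapse whitespace and case-fold; nothing fancier (no fuzzy matching)."""
    return " ".join(text.split()).casefold()


_LOOKUP = {normalize(name): sel_id for name, sel_id in REGISTERED_WRITE_INS.items()}


def map_write_in(text: str) -> Optional[str]:
    """Return the registered selection ID on an exact normalized match, else None."""
    return _LOOKUP.get(normalize(text))


print(map_write_in("  xyzzy "))  # selection5
print(map_write_in("LBJ"))       # None: needs a human to judge voter intent
```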

rc-ms Jan 22, 2021

Capturing the data from DREs/BMDs was certainly contemplated for electronic voting; there's also the potential for image scans and other types of data entirely (or at least pointers to them). Being able to access a ballot's metadata without perturbing the ballot itself in any way is a foundational capability for any kind of ballot or data linking.

AddressXception · 2021-02-03T18:47:34Z

Currently we have the following on the ContestDescription object:

    # Number of candidates that are elected in the contest ("n" of n-of-m).
    # Note: a referendum is considered a specific case of 1-of-m in ElectionGuard
    number_elected: int

    # Maximum number of votes/write-ins per voter in this contest. Used in cumulative voting
    # to indicate how many total votes a voter can spread around. In n-of-m elections, this will
    # be None.
    votes_allowed: Optional[int]

and the following in encrypt_contest:

    # TODO: ISSUE #33: support other cases such as cumulative voting
    # (individual selections being an encryption of > 1)
    if (
        contest_description.votes_allowed is not None
        and selection_count < contest_description.votes_allowed
    ):
        log_warning(
            "mismatching selection count: only n-of-m style elections are currently supported"
        )

Perhaps one solution here would be to generate two sets of proofs: one for number_elected (which is what we do now) and another for votes_allowed. In the current code, we would expect the selection-limit proof to fail but the votes-allowed proof to pass for the use cases mentioned above. It would be up to the consuming application to decide how to interpret this.
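
Stated in plaintext, the two proposed proofs would check two different predicates over the same selection values. The sketch below evaluates those predicates on unencrypted values to show which one a cumulative-voting contest would fail. It is not a zero-knowledge proof. The function name is hypothetical, and so is the exact form of the votes-allowed predicate (non-negative values summing to at most votes_allowed). It uses the number_elected and votes_allowed fields quoted above.

```python
from typing import Dict, List, Optional


def expected_proof_results(
    values: List[int], number_elected: int, votes_allowed: Optional[int]
) -> Dict[str, bool]:
    """Which constraint each proposed proof would assert holds for these plaintexts."""
    total = sum(values)
    # In n-of-m contests votes_allowed is None, so fall back to number_elected.
    limit = votes_allowed if votes_allowed is not None else number_elected
    return {
        "selection_limit": all(v in (0, 1) for v in values) and total <= number_elected,
        "votes_allowed": all(v >= 0 for v in values) and total <= limit,
    }


# Cumulative voting: one seat, 3 votes allowed, voter gives 2 votes to one candidate.
print(expected_proof_results([2, 1, 0], number_elected=1, votes_allowed=3))
# {'selection_limit': False, 'votes_allowed': True}
```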

In any case, we'll need more thought around this if we are going to support other types of voting that allow values other than 0 or 1 for an individual selection.

Food for thought.

benaloh (Collaborator) · 2021-02-03T20:05:47Z

So there may be a pretty easy solution if we take a step up. If a voter overvotes a contest, we can encode that as though the voter left that contest blank. Everything else will go just fine as long as it's clear to everyone that this is the expected behavior and not ElectionGuard somehow deleting votes.

In the alternative, it is certainly possible to encode the ballot as it is presented and count only votes in contests where the selection limit is not exceeded. But doing this without publicly revealing which ballots were excluded requires substantially more complex zero-knowledge proofs and computations. The issue is that right now everything is linear: we're just adding (encrypted) ballots. If we introduce what amounts to an encrypted flag or integer that determines whether or not a vote is counted, we are introducing multiplication. (Instead of computing V1+V2+V3+..., we'd be computing C1V1+C2V2+C3V3+..., where each C is either zero or one to indicate whether or not the corresponding V should be counted.) This would be A LOT of work, perhaps even requiring an entirely different method of encryption.
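
The linear-versus-multiplicative distinction can be written out for exponential ElGamal. Here an encryption is written as (g^R, g^V K^R), with generator g, public key K, and nonce R; this notation is chosen for the derivation. Multiplying ciphertexts adds plaintexts. Raising a ciphertext to a public exponent multiplies its plaintext by that exponent, so a plaintext C_i is easy to apply. No ciphertext operation, however, turns Enc(C_i) and Enc(V_i) into Enc(C_i V_i) when C_i is itself encrypted.

```latex
\begin{aligned}
\mathrm{Enc}(V_i) &= \left(g^{R_i},\; g^{V_i} K^{R_i}\right) \\
\prod_i \mathrm{Enc}(V_i) &= \left(g^{\sum_i R_i},\; g^{\sum_i V_i} K^{\sum_i R_i}\right) = \mathrm{Enc}\Big(\sum_i V_i\Big) \\
\mathrm{Enc}(V_i)^{C_i} &= \left(g^{C_i R_i},\; g^{C_i V_i} K^{C_i R_i}\right) = \mathrm{Enc}(C_i V_i) \quad \text{only if } C_i \text{ is public} \\
\sum_i C_i V_i &\quad \text{with encrypted } C_i \text{ requires multiplying two plaintexts}
\end{aligned}
```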

danwallach (Author) Feb 3, 2021

After all this discussion, I'm increasingly coming down to my third option (up at the top of the thread), which is to "interpret" the ballot into the ciphertext but have the original plaintext "conventionally encrypted", allowing for cases like an RLA where you truly need to see the original voter intent. Of course, an RLA would also verify that the interpretation was correct for every ballot it touches.

nickboucher Feb 4, 2021

Here's a (potentially crazy) thought: what if contests containing an overvote were included in the homomorphic summation, but then subsequently "subtracted" away by dividing the resulting exponential ElGamal ciphertexts? This would of course introduce a divide-by-zero risk, but it would be statistically very unlikely, and perhaps some creative thinking could find a way to protect against it. This approach would avoid the non-linear behavior that @josh mentioned.
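
In exponential ElGamal, "dividing" ciphertexts means multiplying by the componentwise modular inverse, which subtracts plaintexts. The toy sketch below uses a tiny group (p = 2039, q = 1019, g = 4), and its function names are hypothetical. It tallies every ballot, then divides out the ciphertext of one overvoted contest. In this toy group every ciphertext component is a nonzero residue mod a prime, so `pow(x, -1, P)` (Python 3.8+) always has an inverse. The sketch takes the list of overvoted ciphertexts as given and leaves open how they would be identified.

```python
import random
from typing import Tuple

P, Q, G = 2039, 1019, 4  # toy safe-prime group for illustration only
Ciphertext = Tuple[int, int]


def encrypt(m: int, public: int) -> Ciphertext:
    r = random.randrange(1, Q)
    return pow(G, r, P), pow(G, m, P) * pow(public, r, P) % P


def multiply(a: Ciphertext, b: Ciphertext) -> Ciphertext:
    """Enc(m1) * Enc(m2) = Enc(m1 + m2)."""
    return a[0] * b[0] % P, a[1] * b[1] % P


def divide(a: Ciphertext, b: Ciphertext) -> Ciphertext:
    """Enc(m1) / Enc(m2) = Enc(m1 - m2), via modular inverses."""
    return a[0] * pow(b[0], -1, P) % P, a[1] * pow(b[1], -1, P) % P


def decrypt(c: Ciphertext, secret: int, bound: int = 100) -> int:
    g_m = c[1] * pow(c[0], P - 1 - secret, P) % P
    acc = 1
    for m in range(bound + 1):  # small discrete-log search
        if acc == g_m:
            return m
        acc = acc * G % P
    raise ValueError("plaintext outside search bound")


secret = random.randrange(1, Q)
public = pow(G, secret, P)
# One candidate's encrypted marks on three ballots; the third ballot overvoted.
marks = [encrypt(v, public) for v in (1, 0, 1)]
total: Ciphertext = (1, 1)  # identity ciphertext, an encryption of 0 with nonce 0
for c in marks:
    total = multiply(total, c)
adjusted = divide(total, marks[2])
print(decrypt(total, secret), decrypt(adjusted, secret))  # 2 1
```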
