Version 2.1.6! New custom serializer. #11

Merged
57 changes: 47 additions & 10 deletions README.md
@@ -244,7 +244,9 @@ nicknames for these methods, respectively:
`ballots` should be an iterable containing individual ballots.
A ballot is a `dict` mapping the candidate to that ballot's
score for that candidate. The candidate can be any hashable
-Python value; the score must be an `int`.
+Python value; the score must be an `int`. (Note that
+tiebreakers may add additional restrictions to the candidate
+and score values.)

`maximum_score` specifies the maximum score allowed for
any vote on any ballot.
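
As an illustration of the ballot shape described in this hunk, here is a minimal sketch. The helper `check_ballots` is hypothetical and is not part of **starvote**; the lower bound of `0` and the default `maximum_score=5` are assumptions made for the example.

```python
def check_ballots(ballots, maximum_score=5):
    """Hypothetical helper: check ballots against the rules in the README text."""
    for index, ballot in enumerate(ballots):
        if not isinstance(ballot, dict):
            raise TypeError(f"ballots[{index}] must be a dict")
        for candidate, score in ballot.items():
            # Candidates are dict keys, so they are already hashable.
            if not isinstance(score, int):
                raise TypeError(f"score for {candidate!r} must be an int")
            # Assumption for this sketch: scores run from 0 to maximum_score.
            if not (0 <= score <= maximum_score):
                raise ValueError(f"score for {candidate!r} is out of range")


# Each ballot maps a candidate to that ballot's score for the candidate.
ballots = [
    {"Amy": 5, "Brian": 3, "Chuck": 0},
    {"Amy": 1, "Brian": 5, "Chuck": 2},
]
check_ballots(ballots, maximum_score=5)
```

Using `str` candidates, as here, also satisfies the stricter requirement of the default `hashed_ballots_tiebreaker` serializer.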
@@ -261,9 +263,12 @@ or an instance of a tiebreaker class. See the
tiebreakers.

`verbosity` specifies how much output you want.
-The current supported values are `0` (no output)
-and `1` (output); values higher than `1` are
-currently equivalent to `1`.
+In most contexts, the supported values are `0`
+(no output) and `1` (output); some contexts
+support higher verbosity values to mean
+"print more information", e.g.
+`hashed_ballots_tiebreaker` produces incremental
+output for verbosity levels `2` and `3`.

`print` lets you specify your own printing function.
By default `election` will use `builtins.print`;
@@ -355,29 +360,37 @@ in `candidates=None`.

#### `hashed_ballots_tiebreaker`

-The preferred tiebreaker for **starvote** is
-`hashed_ballots_tiebreaker`.
+**starvote**'s preferred--and default--tiebreaker
+is `hashed_ballots_tiebreaker`.
This is a class; you should instantiate it
and pass in the instance as the `tiebreaker`
argument when you run the election.

`hashed_ballots_tiebreaker` is the preferred
-tiebreaker for **starvote** because it is
+tiebreaker for **starvote** because it's

* impossible to usefully control externally,
* impossible to predict, yet
* completely deterministic.

-Here's how it works. At initialization time,
+Note that the default serializer used by
+`hashed_ballots_tiebreaker` requires all candidates
+to be `str` objects and all votes to be `int`
+objects. If you don't change any defaults, you must
+restrict yourself to these types (which you probably
+were already doing anyway).
+
+Here's how `hashed_ballots_tiebreaker`
+works. At initialization time,
this tiebreaker:

* computes a list of all candidates, then
* sorts the list of candidates and stores this list, then
* sorts each ballot, then
* sorts a list of all the sorted ballots, then
* converts this sorted list of sorted ballots
-  into a binary string (using `marshal.dumps`
-  by default).
+  into a binary string (using a custom binary
+  serializer by default).
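
The initialization steps listed above can be sketched in a few lines. This is a rough illustration, not **starvote**'s code: the function name `canonical_ballots` is hypothetical, and the sketch stops before the serialization step.

```python
def canonical_ballots(ballots):
    """Hypothetical sketch of the sorting steps, stopping before serialization."""
    # Compute a list of all candidates, then sort it.
    candidates = sorted({candidate for ballot in ballots for candidate in ballot})
    # Sort each ballot (as a list of (candidate, score) pairs),
    # then sort the list of all the sorted ballots.
    sorted_ballots = sorted(sorted(ballot.items()) for ballot in ballots)
    return candidates, sorted_ballots


first = [{"Brian": 3, "Amy": 5}, {"Amy": 1, "Brian": 4}]
# Same ballots, cast in a different order and with keys inserted differently.
second = [{"Brian": 4, "Amy": 1}, {"Amy": 5, "Brian": 3}]

print(canonical_ballots(first) == canonical_ballots(second))  # True
```

Because both levels are sorted, the result depends only on which ballots were cast, not on the order they arrived in. That is what lets the bytes fed to the hash be reproducible.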

Then, when it's asked to break a tie, it

@@ -1084,6 +1097,30 @@ or otherwise freely redistributable.

## Changelog

+**2.1.6** - *2024/12/13*
+
+* Bugfix: previously, `hashed_ballots_tiebreaker` used
+  `marshal.dumps` as its binary serializer, because I
+  assumed given identical objects it would always
+  produce an identical bytes string. This is not true!
+  (And thanks to Petr Viktorin for pointing it out!)
+  We also apparently can't rely on `pickle.dumps`
+  to be deterministic, for similar reasons.
+
+  So, **starvote** now has its own bespoke--and
+  completely deterministic--simple binary serializer,
+  called `starvote_custom_serializer`. It's tailor-made
+  for the needs of **starvote** and isn't useful for
+  anybody else. But it does guarantee that
+  `hashed_ballots_tiebreaker` will now produce
+  identical results across all supported Python
+  versions, across all architectures, regardless of
+  optimization level.
+
+  (There's also a matching deserializer, naturally called
+  `starvote_custom_deserializer`. You shouldn't need
+  to use it either.)
+
**2.1.5** - *2024/11/22*

* New tiebreaker: The `hashed_ballots_tiebreaker`.
1 change: 1 addition & 0 deletions pyproject.toml
@@ -20,6 +20,7 @@ classifiers = [
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
+    "Programming Language :: Python :: 3.13",
]
dynamic = [
    'version',
227 changes: 220 additions & 7 deletions starvote/__init__.py
@@ -26,7 +26,7 @@

__doc__ = "An election tabulator for the STAR electoral system, and others"

-__version__ = "2.1.5"
+__version__ = "2.1.6"

__all__ = [
    'Allocated_Score_Voting', # Method
@@ -92,7 +92,7 @@
import enum
import hashlib
import itertools
-import marshal
+import io
from math import floor, log10
import os
import pathlib
@@ -430,6 +430,220 @@ def __call__(self, options, tie, desired, exception):
        return result


+_start_of_heading = b'\x01'
+_start_of_text = b'\x02'
+_end_of_text = b'\x03'
+_control_character_escape = b'\x1a'
+_group_separator = b'\x1d'
+_record_separator = b'\x1e'
+_unit_separator = b'\x1f'
+
+_int_marker = 'int'
+_serialized_int_marker = b'int'
+
+_ballot_marker = 'ballots'
+_serialized_ballot_marker = b'ballots'
+
+class _writer(list):
+
+    def __call__(self, o):
+        self.append(o)
+
+    def write(self, o):
+        if isinstance(o, bytes):
+            self.append(o)
+            return
+
+        if isinstance(o, int):
+            b = str(o).encode('ascii')
+            self.append(b)
+            return
+
+        assert isinstance(o, str)
+        for c in o:
+            c = c.encode('utf-8')
+            if c[0] < 32:
+                self.append(b'\x1a')
+            self.append(c)
+
+    def render(self):
+        return b''.join(self)
+
+
+def starvote_custom_serializer(o):
+    """
+    starvote's custom binary serializer for objects.
+    Only used by the "hashed ballot" tiebreaker.
+
+    Only knows how to serialize two types of objects:
+    * an int, or
+    * a sorted list of sorted ballot lists.
+
+    (A "sorted ballot list" is a ballot dict, converted
+    to a list via list(ballot_dict.items()), and sorted.)
+
+    Returns a binary string containing the serialized form of o.
+    """
+
+    buffer = _writer()
+    write = buffer.write
+
+    write(_start_of_heading)
+
+    if isinstance(o, int):
+        i = o
+
+        write(_serialized_int_marker)
+        write(_start_of_text)
+        write(i)
+
+    else:
+        ballots = o
+
+        if not isinstance(ballots, list):
+            raise TypeError("ballots must be a list")
+
+        write(_serialized_ballot_marker)
+        write(_unit_separator)
+        write(int(len(ballots)))
+        write(_start_of_text)
+
+        for ballot_number, ballot in enumerate(ballots):
+            if ballot_number:
+                write(_group_separator)
+
+            if not isinstance(ballot, list):
+                raise TypeError(f"each ballot in ballots must be a list, ballots[{ballot_number}] is type {type(ballot)}")
+
+            for entry_number, t in enumerate(ballot):
+                if not (isinstance(t, tuple) and (len(t) == 2)):
+                    raise TypeError(f"each vote in each ballot in ballots must be a tuple of length 2, ballots[{ballot_number}][{entry_number}] is type {type(t)}")
+                candidate, vote = t
+                if not isinstance(candidate, str):
+                    raise TypeError(f"candidate must be str, but {candidate!r} is {type(candidate)}")
+                if not isinstance(vote, int):
+                    raise TypeError(f"vote must be int, but {vote!r} is {type(vote)}")
+
+                if entry_number:
+                    write(_record_separator)
+
+                write(candidate)
+                write(_unit_separator)
+
+                write(vote)
+
+    write(_end_of_text)
+
+    return buffer.render()
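
To make the byte layout produced by the serializer above easier to see, here is a condensed, standalone restatement of the same framing. It is a sketch for reading along, not the code from this PR: `sketch_serialize` and `escape` are hypothetical names, and the type checks are omitted.

```python
SOH, STX, ETX = b"\x01", b"\x02", b"\x03"            # start of heading / text, end of text
ESC, GS, RS, US = b"\x1a", b"\x1d", b"\x1e", b"\x1f"  # escape, group / record / unit separators


def escape(text):
    """Encode a str as UTF-8, prefixing each control character with ESC."""
    out = bytearray()
    for ch in text:
        encoded = ch.encode("utf-8")
        if encoded[0] < 32:
            out += ESC
        out += encoded
    return bytes(out)


def sketch_serialize(o):
    """Hypothetical condensed version of the framing; no type checking."""
    if isinstance(o, int):
        return SOH + b"int" + STX + str(o).encode("ascii") + ETX
    body = GS.join(
        RS.join(escape(candidate) + US + str(vote).encode("ascii")
                for candidate, vote in ballot)
        for ballot in o
    )
    return SOH + b"ballots" + US + str(len(o)).encode("ascii") + STX + body + ETX


print(sketch_serialize(1))
# b'\x01int\x021\x03'
print(sketch_serialize([[("Amy", 5), ("Brian", 3)], [("Amy", 1)]]))
# b'\x01ballots\x1f2\x02Amy\x1f5\x1eBrian\x1f3\x1dAmy\x1f1\x03'
```

Every field is written as plain ASCII digits or escaped UTF-8 text between fixed control bytes, so the output depends only on the values being serialized.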
+
+
+class _reader(io.BytesIO):
+
+    def __init__(self, b):
+        super().__init__(b)
+        self.waiting = None
+
+    def __next__(self):
+        """
+        Returns one byte from the stream.
+        """
+        if self.waiting:
+            c = self.waiting
+            self.waiting = None
+            return c
+        x = self.read1(1)
+        return x
+
+    def __call__(self):
+        return self.__next__()
+
+    def read_str(self):
+        buffer = []
+        append = buffer.append
+        while True:
+            c = self()
+            if c < b' ':
+                if c == _control_character_escape:
+                    c = self()
+                else:
+                    self.waiting = c
+                    break
+            append(c)
+        s = b''.join(buffer).decode('utf-8')
+        return s
+
+    def read_int(self):
+        s = self.read_str()
+        return int(s)
+
+    def read_marker(self, c):
+        "Reads a byte from self, which must be c."
+        got = self()
+        if got != c:
+            raise ValueError(f"expected {c!r}, got {got!r}")
+        return True
+
+
+def starvote_custom_deserializer(b):
+    """
+    A deserializer for starvote_custom_serializer.
+    Only used by starvote's test suite.
+
+    Only knows how to deserialize the two types of objects
+    supported by starvote_custom_serializer.
+    """
+
+    r = _reader(b)
+
+    r.read_marker(_start_of_heading)
+
+    s = r.read_str()
+
+    if s == _int_marker:
+        r.read_marker(_start_of_text)
+        i = r.read_int()
+        r.read_marker(_end_of_text)
+        return i
+
+    assert s == _ballot_marker
+
+    o = ballots = []
+
+    r.read_marker(_unit_separator)
+    expected_ballots = r.read_int()
+
+    i = 0
+    ballot = None
+
+    while True:
+        marker = r()
+
+        if marker in (_end_of_text, _group_separator):
+            assert ballot
+            ballots.append(ballot)
+            i += 1
+
+            if marker == _end_of_text:
+                assert i == expected_ballots
+                break
+
+            ballot = []
+        elif marker == _start_of_text:
+            assert i == 0
+            ballot = []
+        elif marker != _record_separator:
+            raise ValueError(f"expected start of text, end of text, group separator, or record separator, got {marker!r}")
+
+        candidate = r.read_str()
+        r.read_marker(_unit_separator)
+
+        vote = r.read_int()
+
+        ballot.append((candidate, vote))
+
+    return ballots
+
+
@_add_tiebreaker
class hashed_ballots_tiebreaker(Tiebreaker):
    """
@@ -444,8 +658,8 @@ class hashed_ballots_tiebreaker(Tiebreaker):
    * sorts each ballot, then
    * sorts a list of all the sorted ballots, then
    * converts this sorted list of sorted ballots
-      into a binary string (using "marshal.dumps"
-      by default), then
+      into a binary string (using a custom binary
+      serializer by default), then
    * hashes a serialized monotonically increasing counter
      (1 by default, incremented after every tiebreaker)
      followed by that binary string, using a
@@ -475,7 +689,7 @@ class hashed_ballots_tiebreaker(Tiebreaker):
    """
    def __init__(self, *,
        counter=1, hash='sha3_512', Random=random.Random,
-        serializer=marshal.dumps, shuffles=3,
+        serializer=starvote_custom_serializer, shuffles=3,
        ):
        self.counter = counter
        self.hash = hash
@@ -530,8 +744,7 @@ def __call__(self, options, tie, desired, exception):
        c = self.serializer(self.counter)

        digester = hashlib.new(self.hash)
-        for o in (c, self.serialized_ballots):
-            b = self.serializer(o)
+        for b in (c, self.serialized_ballots):
            digester.update(b)
        seed = digester.digest()
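
For readers following the hunk above, here is a standalone sketch of how a digest computed this way could drive a deterministic choice. Only the hashing of the serialized counter followed by the serialized ballots comes from the diff. The function name `sketch_break_tie`, and the steps of seeding `random.Random` with the digest, shuffling, and taking the first candidate, are assumptions based on the `Random` and `shuffles` parameters and may differ from what **starvote** actually does.

```python
import hashlib
import random


def sketch_break_tie(tied, serialized_counter, serialized_ballots,
                     hash="sha3_512", shuffles=3):
    """Hypothetical sketch: pick one of the tied candidates deterministically."""
    # From the diff: hash the serialized counter, then the serialized ballots.
    digester = hashlib.new(hash)
    for b in (serialized_counter, serialized_ballots):
        digester.update(b)
    seed = digester.digest()

    # Assumed from here on: seed a private PRNG with the digest,
    # shuffle a sorted copy of the tied candidates, and take the first.
    rng = random.Random(seed)
    order = sorted(tied)
    for _ in range(shuffles):
        rng.shuffle(order)
    return order[0]


counter = b"\x01int\x021\x03"  # serialized counter value 1
ballots = b"\x01ballots\x1f1\x02Amy\x1f5\x1eBrian\x1f5\x03"
winner = sketch_break_tie(["Brian", "Amy"], counter, ballots)
print(winner == sketch_break_tie(["Amy", "Brian"], counter, ballots))  # True
```

The choice is a pure function of the bytes fed to the hash, which is why the serializer's output has to be identical on every platform and Python version.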
