
Guaranteeing Mux statistics / behavior #81

Closed
ejhumphrey opened this issue Apr 12, 2017 · 17 comments
@ejhumphrey
Collaborator

tl;dr: I think our Mux tests need to be more rigorous to make sure we're getting back the statistics one would expect from the parameters used to initialize the object.

I was looking into #79, and decided to test that this issue was resolved in the midst of a broader refactor. While things mostly worked, I observed some behavior I didn't immediately understand. Basically, if I ended up with a bad RNG state, I'd end up with two duplicate streams of one sub-mux, and nothing from the other.

Okay, user error -- increasing the oversampling (k=2 -> 10, max_iter=10->100) "fixes" it, but this is a dumb hack and so I tried to do the correct thing, i.e. follow the docstring:

If with_replacement is False, setting revive=True will re-insert previously exhausted streams into the candidate set. This configuration allows a stream to be active at most once at any time.

However, this also didn't give me the behavior I wanted (at least, not always). Checking the tests, it seems that test_mux_revive doesn't inspect the stream it gets back, and I'm suspicious that Mux might not always be doing the thing we expect.

I'm going to explore with more testing, but in parallel -- is this perhaps surprising or consistent with others' observations? @bmcfee, @cjacoby?

@ejhumphrey
Collaborator Author

Okay, a few quick pre-bedtime findings:

  1. Mux-of-Mux issues seem to be caused by iterating a sub-mux before it is embedded in the parent.

For example, this causes problems:

import pescador

abc = pescador.Streamer('abc')
xyz = pescador.Streamer('xyz')
mux1 = pescador.Mux([abc, xyz], k=2, rate=None, revive=True,
                    with_replacement=False, prune_empty_streams=False)
samples1 = list(mux1.iterate(max_iter=1000))  # Fine

n123 = pescador.Streamer('123')
n456 = pescador.Streamer('456')
mux2 = pescador.Mux([n123, n456], k=2, rate=None, revive=True,
                    with_replacement=False, prune_empty_streams=False)
samples2 = list(mux2.iterate(max_iter=1000))  # Fine

mux3 = pescador.Mux([mux1, mux2], k=2, rate=None,
                    with_replacement=False, revive=True,
                    prune_empty_streams=False)
samples3 = list(mux3.iterate(max_iter=1000))  # ...hangs?

But this is fine:

abc = pescador.Streamer('abc')
xyz = pescador.Streamer('xyz')
mux1 = pescador.Mux([abc, xyz], k=2, rate=None, revive=True,
                    with_replacement=False, prune_empty_streams=False)

n123 = pescador.Streamer('123')
n456 = pescador.Streamer('456')
mux2 = pescador.Mux([n123, n456], k=2, rate=None, revive=True,
                    with_replacement=False, prune_empty_streams=False)

mux3 = pescador.Mux([mux1, mux2], k=2, rate=None,
                    with_replacement=False, revive=True,
                    prune_empty_streams=False)
samples3 = list(mux3.iterate(max_iter=1000000))  # success!

In lieu of fixing this (or identifying what's up), it seems like best practice is "don't iterate over streamers prematurely."

  2. Global distributions seem fine.
>>> import collections
>>> print(collections.Counter(samples3))
Counter({'5': 83404, '4': 83404, '6': 83403, 'z': 83343, 'y': 83343, 'x': 83343, 
'a': 83305, 'b': 83305, 'c': 83304, '1': 83282, '3': 83282, '2': 83282})
>>> print(samples3[:10])
['1', '4', 'x', 'a', '2', 'y', 'z', '5', '3', '6']

There are within-bag correlations, but note that this is because there's no shuffling in the iterables used to initialize the underlying streamers.
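
One way to remove those correlations -- a sketch, not code from this thread, though it mirrors the _choice helper that shows up in a later comment -- is to back each streamer with a generator that samples its values at random instead of yielding them in a fixed order:

import random
import pescador

def _choice(vals):
    # Sample uniformly at random, forever, rather than in fixed 'abc' order.
    while True:
        yield random.choice(vals)

abc = pescador.Streamer(_choice, 'abc')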

  3. Mux is pretty flexible, and we'll want some good examples of how to use (and not use) the parameters. I (mostly) know what I'm doing, and it still seems easy to configure the mux with statistically bad settings (one hypothetical example below).
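
As one hypothetical illustration of a statistically bad setting (mine, not taken from the discussion): with k=1 and rate=None over infinite streams, the single active stream is never swapped out, so the mux only ever emits samples from whichever streamer happened to be activated first.

import pescador

def forever(char):
    while True:
        yield char

a = pescador.Streamer(forever, 'a')
b = pescador.Streamer(forever, 'b')

# k=1 active stream, no rate limit: the active stream plays indefinitely.
bad_mux = pescador.Mux([a, b], k=1, rate=None,
                       with_replacement=False, revive=True)

print(list(bad_mux.iterate(max_iter=20)))  # all 'a' or all 'b', never a mix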

@bmcfee
Collaborator

bmcfee commented Apr 12, 2017

Basically, if I ended up with a bad RNG state, I'd end up with two duplicate streams of one sub-mux,

This should only happen if you have with_replacement=True, in which case, this is expected behavior, not bad state.

As for your second post, it's gonna take me a while to digest.

@cjacoby
Collaborator

cjacoby commented Apr 15, 2017

I definitely have seen problems like this in the past month or so. 👍 to identifying the problem with tests and then fixing the behavior.

Ditto what @bmcfee said on digesting. Gonna have to review that a couple times.

@ejhumphrey
Collaborator Author

ejhumphrey commented Apr 17, 2017

Continuing my inquiry into the behavior of mux: I've added a test in a PR, but I'll reproduce the guts of it here for ease of reference:

import collections
import random

import pescador

def _choice(vals):
    while True:
        yield random.choice(vals)

ab = pescador.Streamer(_choice, 'ab')
cd = pescador.Streamer(_choice, 'cd')
ef = pescador.Streamer(_choice, 'ef')
mux1 = pescador.Mux([ab, cd, ef], k=1, rate=2,
                    with_replacement=False, revive=True)

gh = pescador.Streamer(_choice, 'gh')
ij = pescador.Streamer(_choice, 'ij')
kl = pescador.Streamer(_choice, 'kl')

mux2 = pescador.Mux([gh, ij, kl], k=1, rate=2,
                    with_replacement=False, revive=True)

mux3 = pescador.Mux([mux1, mux2], k=2, rate=None,
                    with_replacement=False, revive=True)
samples = list(mux3.iterate(max_iter=10000))
count = collections.Counter(samples)
max_count, min_count = max(count.values()), min(count.values())
assert (max_count - min_count) / max_count < 0.2
print(count)
assert set('abcdefghijkl') == set(count.keys())

Interestingly, the counter after 10k iterations looks like this:

Counter({'c': 880, 'g': 875, 'j': 874, 'h': 854, 'e': 842, 'l': 837, 
         'i': 835, 'k': 814, 'f': 809, 'b': 804, 'd': 791, 'a': 785})

which has a max discrepancy ((max - min) / max) of just around 10%. As we'd hope, the sampling becomes more uniform as max_iter increases ... empirically, the bias goes down to 3% and 1% for 100k and 1M iterations, respectively.

I'm feeling more confident about this, but curious what folks think?

@cjacoby
Collaborator

cjacoby commented Apr 17, 2017

👍

A while back (a month or two ago), I saw some weird behavior related to this, but then I lost track of it and haven't run into it since. I made some experimental tests much like this at the time and saw basically the same result. Generally speaking, I'd be in favor of having tests like this in the codebase (possibly with fixed random seeds for reproducibility).

Not sure if I'm helping with your question.

@bmcfee
Collaborator

bmcfee commented Apr 18, 2017

So... is the consensus then that we're fine here? I don't quite have the bandwidth to think through all the details, but it seems good to me.

EDIT: for comparison purposes, it would be handy to have a baseline that uses a flat mux, all seeds active, no replacement, and no rate parameter, so that it's really just sampling uniformly from the entire pool. That way we can check the discrepancy in the convergence to uniform between the hierarchical mux and the flat mux.
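
For concreteness, a minimal sketch of that flat baseline (my reading of it, reusing the _choice streamers from the test above): all twelve seeds in a single mux, all active, no replacement, no rate parameter, so every draw is uniform over the whole pool.

import collections
import pescador

# Reuse the _choice generator from the test above.
streamers = [pescador.Streamer(_choice, pair)
             for pair in ('ab', 'cd', 'ef', 'gh', 'ij', 'kl')]

# Flat baseline: every seed active, no replacement, no rate parameter.
flat_mux = pescador.Mux(streamers, k=len(streamers), rate=None,
                        with_replacement=False, revive=False)

flat_counts = collections.Counter(flat_mux.iterate(max_iter=10000))
# Compare the (max - min) / max discrepancy here against the hierarchical mux3 above.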

@ejhumphrey
Collaborator Author

I would say we're mostly there as of #88, but we're missing the test you describe (which I don't fully grok?) and one that runs multiple streams to exhaustion, verifying that all samples are recovered.
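
A rough sketch of that exhaustion test (mine, assuming a Mux with with_replacement=False and revive=False stops once every finite stream is spent, and that iterate() with no max_iter runs until then):

import collections
import pescador

# Finite streamers: each yields its characters exactly once.
streamers = [pescador.Streamer(s) for s in ('abc', 'def', 'ghi')]

mux = pescador.Mux(streamers, k=3, rate=None,
                   with_replacement=False, revive=False)

# No max_iter: run until every stream is exhausted.
samples = list(mux.iterate())
assert collections.Counter(samples) == collections.Counter('abcdefghi')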

@bmcfee
Collaborator

bmcfee commented Apr 19, 2017

Oh, I wasn't proposing a test, just a simplified comparison like the one you described above but with fewer moving parts.

@ejhumphrey ejhumphrey added this to the 1.1.0 milestone Jun 13, 2017
@ejhumphrey ejhumphrey self-assigned this Jun 13, 2017
@ejhumphrey
Collaborator Author

"Definition of done" includes state-based testing, and inspecting internal mux states

@bmcfee
Collaborator

bmcfee commented Aug 21, 2017

After merging #100 , is there anything left for this wrt internal states? Or is that blocked by #94 ?

@ejhumphrey
Collaborator Author

Ya, I think we've gotten as far as we can without #94.

@bmcfee
Collaborator

bmcfee commented Aug 21, 2017

Ok, so should we close this out? Punt it up to 2.0?

@bmcfee bmcfee modified the milestones: 2.0.0, 1.1.0 Aug 24, 2017
@bmcfee
Collaborator

bmcfee commented Aug 24, 2017

Made the executive decision to punt the issue upward.

@bmcfee
Collaborator

bmcfee commented Jan 19, 2018

Update wrt #106: I cooked up a notebook here to empirically measure the statistics (entropy) produced by a poissonmux. Everything seems to behave as expected -- I don't think we necessarily need to add unit tests here. What do y'all think?
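
For reference, one generic way to compute that empirical entropy from a run of samples (a sketch, not the notebook's actual code):

import collections
import math

def empirical_entropy(samples):
    # Shannon entropy (in bits) of the empirical sample distribution.
    counts = collections.Counter(samples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# For a 12-symbol pool sampled uniformly, this should approach log2(12), about 3.58 bits.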

@cjacoby
Collaborator

cjacoby commented Jan 19, 2018

Perhaps we could create another repo in the pescadores team to contain things like this (regression tests, kinda)? Perhaps have it save actual numbers to a .json file as well, so you can compare current vs saved.
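
A sketch of what that save-and-compare step might look like (hypothetical file name and tolerance, not an existing utility):

import json

def save_counts(counts, path='mux_stats_baseline.json'):
    with open(path, 'w') as fh:
        json.dump(dict(counts), fh, indent=2, sort_keys=True)

def compare_counts(counts, path='mux_stats_baseline.json', tol=0.05):
    with open(path) as fh:
        baseline = json.load(fh)
    total, base_total = sum(counts.values()), sum(baseline.values())
    for key, base_count in baseline.items():
        observed = counts.get(key, 0) / total
        expected = base_count / base_total
        assert abs(observed - expected) <= tol, (key, observed, expected)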

@bmcfee
Collaborator

bmcfee commented Jan 24, 2018

Since @ejhumphrey has chimed in on #94, and clarified that the desired functionality is really about modifying the weights in-flight on StochasticMux, I don't think #94 is a blocker on this anymore.

Revisiting why this issue is still open, I think the paper in progress #32 and the tests merged with #100 sufficiently address the points. The tests could always be expanded -- I don't think we need/want a separate repo for that -- but I don't think that should block this issue. Any objection to closing?

@cjacoby
Collaborator

cjacoby commented Jan 24, 2018

No objection

@bmcfee bmcfee closed this as completed Jan 24, 2018