
Guaranteeing Mux statistics / behavior #81

Closed
ejhumphrey opened this issue Apr 12, 2017 · 17 comments
@ejhumphrey
Collaborator

tl;dr: I think our Mux tests need to be more rigorous to make sure we're getting back the statistics one would expect from the parameters used to initialize the object.

I was looking into #79, and decided to test that this issue was resolved in the midst of a broader refactor. While things mostly worked, I observed some behavior I didn't immediately understand. Basically, if I ended up with a bad RNG state, I'd end up with two duplicate streams of one sub-mux, and nothing from the other.

Okay, user error -- increasing the oversampling (k=2 -> 10, max_iter=10->100) "fixes" it, but this is a dumb hack and so I tried to do the correct thing, i.e. follow the docstring:

If with_replacement is False, setting revive=True will re-insert previously exhausted streams into the candidate set. This configuration allows a stream to be active at most once at any time.

However, this also didn't give me the behavior I wanted (at least, not always). Checking the tests, it seems that test_mux_revive doesn't inspect the stream it gets back, and I'm suspicious that Mux might not always be doing the thing we expect.

I'm going to explore with more testing, but in parallel -- is this perhaps surprising or consistent with others' observations? @bmcfee, @cjacoby?

@ejhumphrey
Collaborator Author

Okay, a few quick pre-bedtime findings:

  1. Mux-of-Mux issues seem to be caused by iterating a sub-mux before it is embedded in the parent.

For example, this causes problems:

import pescador

abc = pescador.Streamer('abc')
xyz = pescador.Streamer('xyz')
mux1 = pescador.Mux([abc, xyz], k=2, rate=None, revive=True,
                    with_replacement=False, prune_empty_streams=False)
samples1 = list(mux1.iterate(max_iter=1000))  # Fine

n123 = pescador.Streamer('123')
n456 = pescador.Streamer('456')
mux2 = pescador.Mux([n123, n456], k=2, rate=None, revive=True,
                    with_replacement=False, prune_empty_streams=False)
samples2 = list(mux2.iterate(max_iter=1000))  # Fine

mux3 = pescador.Mux([mux1, mux2], k=2, rate=None,
                    with_replacement=False, revive=True,
                    prune_empty_streams=False)
samples3 = list(mux3.iterate(max_iter=1000))  # ...hangs?

But this is fine:

abc = pescador.Streamer('abc')
xyz = pescador.Streamer('xyz')
mux1 = pescador.Mux([abc, xyz], k=2, rate=None, revive=True,
                    with_replacement=False, prune_empty_streams=False)

n123 = pescador.Streamer('123')
n456 = pescador.Streamer('456')
mux2 = pescador.Mux([n123, n456], k=2, rate=None, revive=True,
                    with_replacement=False, prune_empty_streams=False)

mux3 = pescador.Mux([mux1, mux2], k=2, rate=None,
                    with_replacement=False, revive=True,
                    prune_empty_streams=False)
samples3 = list(mux3.iterate(max_iter=1000000))  # success!

In lieu of fixing this (or identifying what's up), it seems like best practice is "don't iterate over streamers prematurely."

  2. Global distributions seem fine.
>>> import collections
>>> print(collections.Counter(samples3))
Counter({'5': 83404, '4': 83404, '6': 83403, 'z': 83343, 'y': 83343, 'x': 83343, 
'a': 83305, 'b': 83305, 'c': 83304, '1': 83282, '3': 83282, '2': 83282})
>>> print(samples3[:10])
['1', '4', 'x', 'a', '2', 'y', 'z', '5', '3', '6']

There are within-bag correlations, but note that this is because there's no shuffling in the iterables used to initialize the underlying streamers.
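
One way to remove those correlations -- a sketch, not code from this thread, though it mirrors the _choice helper that shows up in a later comment -- is to back each streamer with a generator that samples its values at random instead of yielding them in a fixed order:

import random
import pescador

def _choice(vals):
    # Sample uniformly at random, forever, rather than in fixed 'abc' order.
    while True:
        yield random.choice(vals)

abc = pescador.Streamer(_choice, 'abc')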

  3. Mux is pretty flexible, and we'll want some good examples of how to use (and not use) the parameters. I (mostly) know what I'm doing, and it still seems easy to configure the mux with statistically bad settings (one hypothetical example below).
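
As one hypothetical illustration of a statistically bad setting (mine, not taken from the discussion): with k=1 and rate=None over infinite streams, the single active stream is never swapped out, so the mux only ever emits samples from whichever streamer happened to be activated first.

import pescador

def forever(char):
    while True:
        yield char

a = pescador.Streamer(forever, 'a')
b = pescador.Streamer(forever, 'b')

# k=1 active stream, no rate limit: the active stream plays indefinitely.
bad_mux = pescador.Mux([a, b], k=1, rate=None,
                       with_replacement=False, revive=True)

print(list(bad_mux.iterate(max_iter=20)))  # all 'a' or all 'b', never a mix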

@bmcfee
Collaborator

bmcfee commented Apr 12, 2017

Basically, if I ended up with a bad RNG state, I'd end up with two duplicate streams of one sub-mux,

This should only happen if you have with_replacement=True, in which case, this is expected behavior, not bad state.

As for your second post, it's gonna take me a while to digest.

@cjacoby
Collaborator

cjacoby commented Apr 15, 2017

I definitely have seen problems like this in the past month or so. 👍 to identifying the problem with tests and then fixing the behavior.

Ditto what @bmcfee said on digesting. Gonna have to review that a couple times.

@ejhumphrey
Collaborator Author

ejhumphrey commented Apr 17, 2017

Continuing my inquiry into the behavior of mux: I've added a test in a PR, but I'll reproduce the guts of it here for ease of reference:

import collections
import random

import pescador

def _choice(vals):
    while True:
        yield random.choice(vals)

ab = pescador.Streamer(_choice, 'ab')
cd = pescador.Streamer(_choice, 'cd')
ef = pescador.Streamer(_choice, 'ef')
mux1 = pescador.Mux([ab, cd, ef], k=1, rate=2,
                    with_replacement=False, revive=True)

gh = pescador.Streamer(_choice, 'gh')
ij = pescador.Streamer(_choice, 'ij')
kl = pescador.Streamer(_choice, 'kl')

mux2 = pescador.Mux([gh, ij, kl], k=1, rate=2,
                    with_replacement=False, revive=True)

mux3 = pescador.Mux([mux1, mux2], k=2, rate=None,
                    with_replacement=False, revive=True)
samples = list(mux3.iterate(max_iter=10000))
count = collections.Counter(samples)
max_count, min_count = max(count.values()), min(count.values())
assert (max_count - min_count) / max_count < 0.2
print(count)
assert set('abcdefghijkl') == set(count.keys())

Interestingly, the counter after 10k iterations looks like this:

Counter({'c': 880, 'g': 875, 'j': 874, 'h': 854, 'e': 842, 'l': 837, 
         'i': 835, 'k': 814, 'f': 809, 'b': 804, 'd': 791, 'a': 785})

which has a max discrepancy ((max - min) / max) of just around 10%. As we'd hope, the sampling becomes more uniform as max_iter increases ... empirically, the bias goes down to 3% and 1% for 100k and 1M iterations, respectively.

I'm feeling more confident about this, but curious what folks think?

@cjacoby
Collaborator

cjacoby commented Apr 17, 2017

👍

A while back (a month or two ago), I saw some weird behavior related to this, but then I lost track of it and haven't run into it since. I made some experimental tests much like this at the time and saw basically the same result. Generally speaking, I'd be in favor of having tests like this in the codebase (possibly with fixed random seeds for reproducibility).

Not sure if I'm helping with your question.

@bmcfee
Collaborator

bmcfee commented Apr 18, 2017

So... is the consensus then that we're fine here? I don't quite have the bandwidth to think through all the details, but it seems good to me.

EDIT: for comparison purposes, it would be handy to have a baseline that uses a flat mux, all seeds active, no replacement, and no rate parameter, so that it's really just sampling uniformly from the entire pool. That way we can check the discrepancy in the convergence to uniform between the hierarchical mux and the flat mux.
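
For concreteness, a minimal sketch of that flat baseline (my reading of it, reusing the _choice streamers from the test above): all twelve seeds in a single mux, all active, no replacement, no rate parameter, so every draw is uniform over the whole pool.

import collections
import pescador

# Reuse the _choice generator from the test above.
streamers = [pescador.Streamer(_choice, pair)
             for pair in ('ab', 'cd', 'ef', 'gh', 'ij', 'kl')]

# Flat baseline: every seed active, no replacement, no rate parameter.
flat_mux = pescador.Mux(streamers, k=len(streamers), rate=None,
                        with_replacement=False, revive=False)

flat_counts = collections.Counter(flat_mux.iterate(max_iter=10000))
# Compare the (max - min) / max discrepancy here against the hierarchical mux3 above.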

@ejhumphrey
Collaborator Author

I would say we're mostly there as of #88, but we're missing the test you describe (which I don't fully grok?) and one that runs multiple streams to exhaustion, verifying that all samples are recovered.
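
A rough sketch of that exhaustion test (mine, assuming a Mux with with_replacement=False and revive=False stops once every finite stream is spent, and that iterate() with no max_iter runs until then):

import collections
import pescador

# Finite streamers: each yields its characters exactly once.
streamers = [pescador.Streamer(s) for s in ('abc', 'def', 'ghi')]

mux = pescador.Mux(streamers, k=3, rate=None,
                   with_replacement=False, revive=False)

# No max_iter: run until every stream is exhausted.
samples = list(mux.iterate())
assert collections.Counter(samples) == collections.Counter('abcdefghi')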

@bmcfee
Collaborator

bmcfee commented Apr 19, 2017

Oh, I wasn't proposing a test, just a simplified comparison like the one you described above but with fewer moving parts.

@ejhumphrey ejhumphrey added this to the 1.1.0 milestone Jun 13, 2017
@ejhumphrey ejhumphrey self-assigned this Jun 13, 2017
@ejhumphrey
Collaborator Author

"Definition of done" includes state-based testing, and inspecting internal mux states

@bmcfee
Collaborator

bmcfee commented Aug 21, 2017

After merging #100 , is there anything left for this wrt internal states? Or is that blocked by #94 ?

@ejhumphrey
Collaborator Author

Ya, I think we've gotten as far as we can without #94.

@bmcfee
Collaborator

bmcfee commented Aug 21, 2017

Ok, so should we close this out? Punt it up to 2.0?

@bmcfee bmcfee modified the milestones: 2.0.0, 1.1.0 Aug 24, 2017
@bmcfee
Collaborator

bmcfee commented Aug 24, 2017

Made the executive decision to punt the issue upward.

@bmcfee
Collaborator

bmcfee commented Jan 19, 2018

Update wrt #106: I cooked up a notebook here to empirically measure the statistics (entropy) produced by a poissonmux. Everything seems to behave as expected -- I don't think we necessarily need to add unit tests here. What do y'all think?
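
For reference, one generic way to compute that empirical entropy from a run of samples (a sketch, not the notebook's actual code):

import collections
import math

def empirical_entropy(samples):
    # Shannon entropy (in bits) of the empirical sample distribution.
    counts = collections.Counter(samples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# For a 12-symbol pool sampled uniformly, this should approach log2(12), about 3.58 bits.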

@cjacoby
Collaborator

cjacoby commented Jan 19, 2018

Perhaps we could create another repo in the pescadores team to contain things like this (regression tests, kinda)? Perhaps have it save actual numbers to a .json file as well, so you can compare current vs saved.
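
A sketch of what that save-and-compare step might look like (hypothetical file name and tolerance, not an existing utility):

import json

def save_counts(counts, path='mux_stats_baseline.json'):
    with open(path, 'w') as fh:
        json.dump(dict(counts), fh, indent=2, sort_keys=True)

def compare_counts(counts, path='mux_stats_baseline.json', tol=0.05):
    with open(path) as fh:
        baseline = json.load(fh)
    total, base_total = sum(counts.values()), sum(baseline.values())
    for key, base_count in baseline.items():
        observed = counts.get(key, 0) / total
        expected = base_count / base_total
        assert abs(observed - expected) <= tol, (key, observed, expected)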

@bmcfee
Collaborator

bmcfee commented Jan 24, 2018

Since @ejhumphrey has chimed in on #94, and clarified that the desired functionality is really about modifying the weights in-flight on StochasticMux, I don't think #94 is a blocker on this anymore.

Revisiting why this issue is still open, I think the paper in progress #32 and the tests merged with #100 sufficiently address the points. The tests could always be expanded -- I don't think we need/want a separate repo for that -- but I don't think that should block this issue. Any objection to closing?

@cjacoby
Collaborator

cjacoby commented Jan 24, 2018

No objection

@bmcfee bmcfee closed this as completed Jan 24, 2018