-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Guaranteeing Mux statistics / behavior #81
Comments
kay, few quick pre-bedtime findings:
For example, this causes problems: abc = pescador.Streamer('abc')
xyz = pescador.Streamer('xyz')
mux1 = pescador.Mux([abc, xyz], k=2, rate=None, revive=True,
with_replacement=False, prune_empty_streams=False)
samples1 = mux1.iterate(max_iter=1000) # Fine
n123 = pescador.Streamer('123')
n456 = pescador.Streamer('456')
mux2 = pescador.Mux([n123, n456], k=2, rate=None, revive=True,
with_replacement=False, prune_empty_streams=False)
samples2 = mux2.iterate(max_iter=1000) # Fine
mux3 = pescador.Mux([mux1, mux2], k=2, rate=None,
with_replacement=False, revive=True,
prune_empty_streams=False)
samples3 = mux3.iterate(max_iter=1000) # ...hangs? But this is fine: abc = pescador.Streamer('abc')
xyz = pescador.Streamer('xyz')
mux1 = pescador.Mux([abc, xyz], k=2, rate=None, revive=True,
with_replacement=False, prune_empty_streams=False)
n123 = pescador.Streamer('123')
n456 = pescador.Streamer('456')
mux2 = pescador.Mux([n123, n456], k=2, rate=None, revive=True,
with_replacement=False, prune_empty_streams=False)
mux3 = pescador.Mux([mux1, mux2], k=2, rate=None,
with_replacement=False, revive=True,
prune_empty_streams=False)
samples3 = mux3.iterate(max_iter=1000000) # success! In lieu of fixing this (or identifying what's up), it seems like best practice is "don't iterate over streamers prematurely."
>>> import collections
>>> print(collections.Counter(samples3))
Counter({'5': 83404, '4': 83404, '6': 83403, 'z': 83343, 'y': 83343, 'x': 83343,
'a': 83305, 'b': 83305, 'c': 83304, '1': 83282, '3': 83282, '2': 83282})
>>> print(samples[:10])
['1', '4', 'x', 'a', '2', 'y', 'z', '5', '3', '6'] There are within-bag correlations, but note that this is because there's no shuffling in the iterables used to initialize the underlying streamers.
|
This should only happen if you have As for your second post, it's gonna take me a while to digest. |
I definitely have seen problems like this in the past month or so. 👍 to identifying the problem with tests and then fixing the behavior. Ditto what @bmcfee said on digesting. Gonna have to review that a couple times. |
continuing on with my inquiry into the behavior of mux. I've added a test in a PR, but I'll reproduce the guts of it here for ease of reference: def _choice(vals):
while True:
yield random.choice(vals)
ab = pescador.Streamer(_choice, 'ab')
cd = pescador.Streamer(_choice, 'cd')
ef = pescador.Streamer(_choice, 'ef')
mux1 = pescador.Mux([ab, cd, ef], k=1, rate=2,
with_replacement=False, revive=True)
gh = pescador.Streamer(_choice, 'gh')
ij = pescador.Streamer(_choice, 'ij')
kl = pescador.Streamer(_choice, 'kl')
mux2 = pescador.Mux([gh, ij, kl], k=1, rate=2,
with_replacement=False, revive=True)
mux3 = pescador.Mux([mux1, mux2], k=2, rate=None,
with_replacement=False, revive=True)
samples = list(mux3.iterate(max_iter=10000))
count = collections.Counter(samples)
max_count, min_count = max(count.values()), min(count.values())
assert (max_count - min_count) / max_count < 0.2
print(count)
assert set('abcdefghijkl') == set(count.keys()) Interestingly, the counter after 10k iterations looks like this: Counter({'c': 880, 'g': 875, 'j': 874, 'h': 854, 'e': 842, 'l': 837,
'i': 835, 'k': 814, 'f': 809, 'b': 804, 'd': 791, 'a': 785}) which has a max discrepancy ( I'm feeling more confident about this, but curious what folks think? |
👍 A while back (a month or two), I saw some weird behavior wrt this, but then I lost it, and I haven't run into it since. I made some experimental tests much like this at that time, and saw basically the same result, but I generally speaking would be pro having tests like this in the codebase. (Possibly with random seeds for reproducibility). Not sure if I'm helping with your question. |
so.... is the consensus then that we're fine here? I don't quite have the bandwidth to think about all the details here, but it seems good to me. EDIT: for comparison purposes, it would be handy to have a baseline that uses a flat mux, all seeds active, no replacement, and no rate parameter, so that it's really just sampling uniformly from the entire pool. That way we can check the discrepancy in the convergence to uniform between the hierarchical mux and the flat mux. |
I would say we're mostly there as of #88, but missing a the test you describe (which I don't fully grok?) and one that runs multiple streams run to exhaustion, verifying that all samples recovered. |
Oh, I wasn't proposing a test, just a simplified comparison like the one you described above but with fewer moving parts. |
"Definition of done" includes state-based testing, and inspecting internal mux states |
ya, i think we've gotten as far as we can without #94. |
Ok, so should we close this out? Punt it up to 2.0? |
Made the executive decision to punt the issue upward. |
Perhaps we could create another repo in the |
Since @ejhumphrey has chimed in on #94, and clarified that the desired functionality is really about modifying the weights in-flight on Revisiting why this issue is still open, I think the paper in progress #32 and the tests merged with #100 sufficiently address the points. The tests could always be expanded -- i don't think we need/want a separate repo for that -- but i don't think that should block this issue. Any objection to closing? |
No objection |
tl;dr: I think our
Mux
tests need to be more rigorous to make sure we're getting back the statistics one would expect from the parameters used to initialize the object.I was looking into #79, and decided to test that this issue was resolved in the midst of a broader refactor. While things mostly worked, I observed some behavior I didn't immediately understand. Basically, if I ended up with a bad RNG state, I'd end up with two duplicate streams of one sub-mux, and nothing from the other.
Okay, user error -- increasing the oversampling (
k=2 -> 10, max_iter=10->100
) "fixes" it, but this is a dumb hack and so I tried to do the correct thing, i.e. follow the docstring:However, this also didn't give me the behavior I wanted (at least, not always). Checking the tests, it seems that
test_mux_revive
doesn't inspect the stream it gets back, and I'm suspicious that Mux might not always be doing the thing we expect.I'm going to explore with more testing, but in parallel -- is this perhaps surprising or consistent with others' observations? @bmcfee, @cjacoby?
The text was updated successfully, but these errors were encountered: