Tests for Mux statistics #100
addresses #81
tests/test_mux.py (outdated):

```python
    samples2 = list(flat_mux.iterate(max_iter=max_iter))
    count1 = collections.Counter(samples1)
    count2 = collections.Counter(samples2)
    print(count1, count2)
```
remove this print
Or, make it a logger.debug() / logger.info(). But definitely not print.
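For illustration, a minimal sketch of what the suggested swap could look like (the logger setup and the sample values here are assumptions, not the project's actual test configuration):

```python
import collections
import logging

logger = logging.getLogger(__name__)

# Hypothetical sampled outputs; in the test these come from the muxes' iterate()
samples1 = list('abcabd')
samples2 = list('badcab')

count1 = collections.Counter(samples1)
count2 = collections.Counter(samples2)

# Silent by default, visible when the test run enables DEBUG logging
logger.debug('count1=%s count2=%s', count1, count2)
```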
gone
tests/test_mux.py (outdated):

```python
    ef = pescador.Streamer(_choice, 'ef')
    mux1 = pescador.Mux([ab, cd, ef], k=2, rate=2,
                        with_replacement=False, revive=True)
```
these muxes should be seeded to avoid random test failure.
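A sketch of the seeded version, assuming the same `random_state` keyword that the existing tests already pass to `Mux` (e.g. `random_state=1234` in the diff hunk below); `_choice` is a stand-in for the test module's helper, and the seed value is arbitrary:

```python
import random

import pescador


def _choice(vals):
    # Stand-in for the test module's helper: yield random picks from vals forever
    while True:
        yield random.choice(vals)


ab = pescador.Streamer(_choice, 'ab')
cd = pescador.Streamer(_choice, 'cd')
ef = pescador.Streamer(_choice, 'ef')

# Fixing random_state makes the mux's stream selection deterministic,
# so the test cannot fail intermittently.
mux1 = pescador.Mux([ab, cd, ef], k=2, rate=2,
                    with_replacement=False, revive=True,
                    random_state=10)
```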
👍
done
Looking good so far, thanks for this! I think we can make it more rigorous without much effort though (see review comments).
Just some non-binding nice-to-haves. Otherwise LGTM
tests/test_mux.py (outdated):

```python
    ef = pescador.Streamer(_choice, 'ef')
    mux1 = pescador.Mux([ab, cd, ef], k=2, rate=2,
                        with_replacement=False, revive=True)
```
👍
```
@@ -310,3 +312,35 @@ def test_mux_inf_loop():
                       with_replacement=False, random_state=1234)

    assert len(list(mux(max_iter=100))) == 0


def test_mux_stacked_uniform_convergence():
```
A 1-sentence summary docstring here might be nice. I like these for more complex tests (like this one).
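For example, something as small as this (the wording is just a placeholder):

```python
def test_mux_stacked_uniform_convergence():
    """A mux of uniform muxes should itself sample all items uniformly."""
    ...
```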
done
i can haz final signoff? plznthx
> i can haz final signoff? plznthx

It looks like te […]

> i can haz final signoff? plznthx

I think you didn't address my questions about the statistical tests? https://github.com/pescadores/pescador/pull/100/files#diff-9afbc8da2d9eee8d9778ba62a79a821dR345
tests/test_mux.py (outdated):

```python
    assert set('abcdefghijkl') == set(count1.keys()) == set(count2.keys())
    c1, c2 = [list(c.values()) for c in (count1, count2)]
    np.testing.assert_almost_equal(
        np.std(c1) / max_iter, np.std(c2) / max_iter, decimal=2)
```
This test doesn't seem quite right to me, for a couple of reasons:

- Why compare the standard deviation of all value counts?
- Line 344 discards the keys and only keeps the values, which makes me nervous since `Counter` doesn't ensure sorting across python versions.
- Why divide by `max_iter`?
- Why compare stacked to flat? I think we can do better by checking the distribution of each against a reference distribution, which I assume would be uniform. This would probably mean splitting this test into two, computing some statistic (e.g. chi^2) against uniform for each.
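For illustration, a sketch of what the per-mux check described above could look like; the helper name, the p-value threshold, and the use of `scipy.stats.chisquare` are illustrative choices, not part of the PR:

```python
import collections

import numpy as np
import scipy.stats


def _assert_counts_uniform(mux, chars, max_iter=50000, min_pvalue=0.05):
    # Draw a fixed number of samples and tally them
    samples = list(mux.iterate(max_iter=max_iter))
    counter = collections.Counter(samples)

    # Every expected symbol should show up
    assert set(chars) == set(counter.keys())

    # Index counts explicitly so we never depend on Counter's iteration order
    counts = np.array([counter[c] for c in chars])

    # chisquare defaults to uniform expected frequencies, so a large p-value
    # means the observed counts are indistinguishable from uniform sampling
    assert scipy.stats.chisquare(counts).pvalue >= min_pvalue


# Usage: run the same check separately for the stacked and the flat mux,
# instead of comparing the two against each other:
# _assert_counts_uniform(stacked_mux, 'abcdefghijkl')
# _assert_counts_uniform(flat_mux, 'abcdefghijkl')
```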
- the two distributions should have similar statistics, i.e. approx equal means and variances.
- order of keys shouldn't matter; the `N` characters should occur with similar frequency, and I'm just checking the distributions.
- normalizing out `max_iter`... otherwise it grows with count, no?
- I happened to fix (or at least change) this since you've commented. Now it's a single uniformly sampled stream. Lemme know if you still take issue with it.
In reverse order:

- Yeah, I think I still take issue (sorry! 😬). The problem I have is that you're comparing two potentially unknown quantities (flat and nested muxes) to each other, rather than comparing each one to a known reference with the desired properties.
- `std(X)/N` just seems like a weird quantity for count data. I think a better test would be to normalize the count distribution first, so you have a vector of observation frequencies, and then compare it to the uniform distribution, e.g. by chi^2 or KL-divergence or whatever. Aggregating all counts before comparison seems wrong to me.
- See above: test should be coordinate-wise, not in aggregate.
- mean and std are fine for gaussian data, but not generally sufficient for arbitrary distributions.
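As a concrete illustration of the "normalize, then compare to a uniform reference" idea (the counts and tolerance below are made up):

```python
import numpy as np
import scipy.stats

# Hypothetical per-symbol counts for 12 symbols after 50000 draws
counts = np.array([4210, 4175, 4150, 4188, 4120, 4205,
                   4190, 4160, 4172, 4148, 4135, 4147], dtype=float)

# Normalize to observation frequencies
freqs = counts / counts.sum()

# Uniform reference over the same support
uniform = np.full_like(freqs, 1.0 / len(freqs))

# scipy.stats.entropy(p, q) is the KL divergence KL(p || q);
# values near zero mean the observed frequencies are close to uniform
kl = scipy.stats.entropy(freqs, uniform)
assert kl < 1e-3
```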
kay, will fix
looks like we hit a race condition on edits versus comments...
tests/test_mux.py (outdated):

```python
    counts = np.array(counter.values())
    exp_count = float(max_iter / len(chars))
    max_error = np.max(np.abs(counts - exp_count) / exp_count)
```
I think this would be cleaner using `scipy.stats`:

```python
counts = np.array(counter.values())
assert scipy.stats.chisquare(counts).pvalue >= 0.99
```

(default for `chisquare` is to compare against a uniform distribution)
thx for the protip, pub time in the UK now though...
I took the liberty of fixing this build error; should be good to merge? I'm not gonna be a stickler for the statistical tests.
I noticed some stochastic failures on this one, since the […]. I think this is all now in order, and ready to […]
👍 lgtm
Adds a test that stacked bootstrapped streams are approximately uniform at large N.
Note: looking over the Mux tests, we've accumulated pretty decent coverage that the sampling statistics of the Mux are behaving as advertised. I think some of the sanding and polish that's been thrown at pescador in the last several weeks has really ironed out some of the subtle kinks that were empirically felt / observed. This PR is more symbolic as a seal of approval that we're confident in our work.