Tests for Mux statistics #100

Merged: 5 commits merged into master on Aug 21, 2017
Conversation

ejhumphrey (Collaborator):

Adds a test that stacked bootstrapped streams produce approximately uniform samples at large N.

Note: looking over the Mux tests, we've accumulated pretty decent coverage that the sampling statistics of the Mux are behaving as advertised. I think some of the sanding and polish that's been thrown at pescador in the last several weeks has really ironed out some of the subtle kinks that were empirically felt / observed. This PR is more symbolic, as a seal of approval that we're confident in our work.
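
For context, a minimal sketch of the kind of convergence check this PR adds, simplified to a single mux. The _choice helper, test name, and tolerance here are hypothetical stand-ins, not the diff's exact code:

import collections

import numpy as np
import pescador


def _choice(chars, seed=1234):
    # Hypothetical stand-in for the test helper in this diff:
    # yield characters uniformly at random, forever.
    rng = np.random.RandomState(seed)
    while True:
        yield rng.choice(list(chars))


def test_mux_approx_uniform():
    streamers = [pescador.Streamer(_choice, pair)
                 for pair in ('ab', 'cd', 'ef')]
    mux = pescador.Mux(streamers, k=2, rate=2,
                       with_replacement=False, revive=True,
                       random_state=1234)
    max_iter = 6000
    counts = collections.Counter(mux.iterate(max_iter=max_iter))
    freqs = np.array([counts[c] for c in 'abcdef']) / max_iter
    # At large max_iter, each of the six characters should appear
    # with roughly equal frequency (1/6).
    np.testing.assert_allclose(freqs, 1. / 6, atol=0.05)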

ejhumphrey requested review from cjacoby and bmcfee on July 6, 2017 09:34
ejhumphrey (Collaborator, Author):

addresses #81

samples2 = list(flat_mux.iterate(max_iter=max_iter))
count1 = collections.Counter(samples1)
count2 = collections.Counter(samples2)
print(count1, count2)
Collaborator:

remove this print

Collaborator:

Or, make it a logger.debug()/logger.info(). But definitely not print
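
For instance, a tiny self-contained sketch of that suggestion (names and counts hypothetical):

import collections
import logging

logger = logging.getLogger(__name__)

count1 = collections.Counter('aabbcc')
count2 = collections.Counter('abcabc')
# Emitted only when the test runner enables DEBUG logging,
# instead of always writing to stdout.
logger.debug("count1=%s count2=%s", count1, count2)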

Collaborator (Author):

gone

ef = pescador.Streamer(_choice, 'ef')
mux1 = pescador.Mux([ab, cd, ef], k=2, rate=2,
with_replacement=False, revive=True)

Collaborator:

these muxes should be seeded to avoid random test failure.
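
For instance, a sketch reusing the _choice helper from this diff; the seed value mirrors the one used elsewhere in these tests:

import pescador

ab = pescador.Streamer(_choice, 'ab')
cd = pescador.Streamer(_choice, 'cd')
ef = pescador.Streamer(_choice, 'ef')

# Fixing random_state pins the mux's stream-selection RNG, so the
# test cannot fail intermittently on an unlucky draw.
mux1 = pescador.Mux([ab, cd, ef], k=2, rate=2,
                    with_replacement=False, revive=True,
                    random_state=1234)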

Collaborator:

👍

Collaborator (Author):

done

bmcfee (Collaborator) commented Jul 6, 2017:

Looking good so far, thanks for this! I think we can make it more rigorous without much effort though (see review comments).

bmcfee added this to the 1.1.0 milestone on Jul 6, 2017
cjacoby (Collaborator) left a comment:

Just some non-binding nice-to-haves. Otherwise LGTM

@@ -310,3 +312,35 @@ def test_mux_inf_loop():
with_replacement=False, random_state=1234)

assert len(list(mux(max_iter=100))) == 0


def test_mux_stacked_uniform_convergence():
Collaborator:

A 1-sentence summary docstring here might be nice. I like these for more complex tests (like this one).
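
Something like this, say (wording hypothetical):

def test_mux_stacked_uniform_convergence():
    """Stacked muxes over uniform streams should still sample all
    items approximately uniformly at large max_iter."""
    ...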

Collaborator (Author):

done

ejhumphrey (Collaborator, Author):

i can haz final signoff? plznthx

bmcfee (Collaborator) left a comment:

> i can haz final signoff? plznthx

I think you didn't address my questions about the statistical tests? https://github.com/pescadores/pescador/pull/100/files#diff-9afbc8da2d9eee8d9778ba62a79a821dR345

assert set('abcdefghijkl') == set(count1.keys()) == set(count2.keys())
c1, c2 = [list(c.values()) for c in (count1, count2)]
np.testing.assert_almost_equal(
np.std(c1) / max_iter, np.std(c2) / max_iter, decimal=2)
Collaborator:

This test doesn't seem quite right to me, for a couple of reasons:

  1. Why compare the standard deviation of all value counts?
  2. Line 344 discards the keys and only keeps the values, which makes me nervous since Counter doesn't ensure sorting across python versions.
  3. Why divide by max_iter?
  4. Why compare stacked to flat? I think we can do better by checking the distribution of each against a reference distribution, which I assume would be uniform. This would probably mean splitting this test into two, computing some statistic (eg chi^2) against uniform for each.

Collaborator (Author):

  1. the two distributions should have similar statistics, i.e. approx equal means and variances.
  2. order of keys shouldn't matter; the N characters should occur with similar frequency, and I'm just checking the distributions.
  3. normalizing out max_iter ... otherwise it grows with count, no?
  4. I happened to fix (or at least change) this since you've commented. Now it's a single uniformly sampled stream. Lemme know if you still take issue with it.

Collaborator:

In reverse order:

  1. Yeah, I think I still take issue (sorry! 😬) The problem I have is that you're comparing two potentially unknown quantities (flat and nested muxes) to each other, rather than comparing each one to a known reference with the desired properties.

  2. std(X)/N just seems like a weird quantity for count data. I think a better test would be to normalize the count distribution first, so you have a vector of observation frequencies, and then compare it to the uniform distribution, e.g. by chi^2 or KL-divergence or whatever (see the sketch after this list). Aggregating all counts before comparison seems wrong to me.

  3. See above: test should be coordinate-wise, not in aggregate.

  4. mean and std are fine for gaussian data, but not generally sufficient for arbitrary distributions
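
A sketch of that coordinate-wise comparison, with hypothetical counts:

import numpy as np
import scipy.stats

counts = np.array([1005, 980, 1010, 995, 1020, 990])  # hypothetical counts
freqs = counts / counts.sum()                   # observation frequencies
uniform = np.full_like(freqs, 1.0 / len(freqs))

# KL divergence D(freqs || uniform) is near zero when the observed
# distribution is close to uniform.
assert scipy.stats.entropy(freqs, uniform) < 1e-3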

Collaborator (Author):

kay, will fix

ejhumphrey (Collaborator, Author):

looks like we hit a race condition on edits versus comments...


counts = np.array(counter.values())
exp_count = float(max_iter / len(chars))
max_error = np.max(np.abs(counts - exp_count) / exp_count)
Collaborator:

I think this would be cleaner using scipy.stats:

counts = np.array(counter.values())
assert scipy.stats.chisquare(counts).pvalue >= 0.99

(default for chisquare is to compare against a uniform distribution)
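
A self-contained version of that suggestion (hypothetical counts; note that under Python 3, counter.values() needs list() before np.array() yields a numeric array):

import collections

import numpy as np
import scipy.stats

counter = collections.Counter({'a': 100, 'b': 98, 'c': 103, 'd': 99})
counts = np.array(list(counter.values()))  # list() needed on Python 3

# With no f_exp argument, chisquare tests against the uniform
# distribution; a large p-value means uniformity cannot be rejected.
assert scipy.stats.chisquare(counts).pvalue >= 0.95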

ejhumphrey (Collaborator, Author) commented Jul 7, 2017 via email

bmcfee (Collaborator) commented Aug 18, 2017:

I took the liberty of fixing this build error; should be good to merge? I'm not gonna be a stickler for the statistical tests.

bmcfee (Collaborator) commented Aug 21, 2017:

I noticed some stochastic failures on this one, since the _choice helper function was not seeded. I fixed the seed, but then noticed that it was failing the 0.5% error test, so I went back and replaced it with a chi^2 test at p-value >= 0.95.

I think this is all now in order, and ready to :shipit:

ejhumphrey (Collaborator, Author):

👍 lgtm

bmcfee merged commit df01c72 into master on Aug 21, 2017
bmcfee deleted the ejh_20170706_mux_stats branch on January 18, 2018