Tests for Mux statistics #100

Merged: 5 commits merged into master on Aug 21, 2017
Conversation

ejhumphrey (Collaborator):

Adds a test that stacked bootstrapped streams produce approximately uniform samples at large N.

Note: looking over the Mux tests, we've accumulated pretty decent coverage that the sampling statistics of the Mux are behaving as advertised. I think some of the sanding and polish that's been thrown at pescador in the last several weeks has really ironed out some of the subtle kinks that were empirically felt / observed. This PR is more symbolic, as a seal of approval that we're confident in our work.
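
For context, a minimal sketch of the kind of convergence check this PR adds, simplified to a single mux. The _choice helper, test name, and tolerance here are hypothetical stand-ins, not the diff's exact code:

import collections

import numpy as np
import pescador


def _choice(chars, seed=1234):
    # Hypothetical stand-in for the test helper in this diff:
    # yield characters uniformly at random, forever.
    rng = np.random.RandomState(seed)
    while True:
        yield rng.choice(list(chars))


def test_mux_approx_uniform():
    streamers = [pescador.Streamer(_choice, pair)
                 for pair in ('ab', 'cd', 'ef')]
    mux = pescador.Mux(streamers, k=2, rate=2,
                       with_replacement=False, revive=True,
                       random_state=1234)
    max_iter = 6000
    counts = collections.Counter(mux.iterate(max_iter=max_iter))
    freqs = np.array([counts[c] for c in 'abcdef']) / max_iter
    # At large max_iter, each of the six characters should appear
    # with roughly equal frequency (1/6).
    np.testing.assert_allclose(freqs, 1. / 6, atol=0.05)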

ejhumphrey requested review from cjacoby and bmcfee on July 6, 2017 09:34
ejhumphrey (Collaborator, Author):

addresses #81

samples2 = list(flat_mux.iterate(max_iter=max_iter))
count1 = collections.Counter(samples1)
count2 = collections.Counter(samples2)
print(count1, count2)
Collaborator:

remove this print

Collaborator:

Or, make it a logger.debug()/logger.info(). But definitely not print
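
For instance, a tiny self-contained sketch of that suggestion (names and counts hypothetical):

import collections
import logging

logger = logging.getLogger(__name__)

count1 = collections.Counter('aabbcc')
count2 = collections.Counter('abcabc')
# Emitted only when the test runner enables DEBUG logging,
# instead of always writing to stdout.
logger.debug("count1=%s count2=%s", count1, count2)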

Collaborator (Author):

gone

ef = pescador.Streamer(_choice, 'ef')
mux1 = pescador.Mux([ab, cd, ef], k=2, rate=2,
with_replacement=False, revive=True)

Collaborator:

these muxes should be seeded to avoid random test failure.
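
For instance, a sketch reusing the _choice helper from this diff; the seed value mirrors the one used elsewhere in these tests:

import pescador

ab = pescador.Streamer(_choice, 'ab')
cd = pescador.Streamer(_choice, 'cd')
ef = pescador.Streamer(_choice, 'ef')

# Fixing random_state pins the mux's stream-selection RNG, so the
# test cannot fail intermittently on an unlucky draw.
mux1 = pescador.Mux([ab, cd, ef], k=2, rate=2,
                    with_replacement=False, revive=True,
                    random_state=1234)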

Collaborator:

👍

Collaborator (Author):

done

bmcfee (Collaborator) commented Jul 6, 2017:

Looking good so far, thanks for this! I think we can make it more rigorous without much effort though (see review comments).

bmcfee added this to the 1.1.0 milestone on Jul 6, 2017
cjacoby (Collaborator) left a comment:

Just some non-binding nice-to-haves. Otherwise LGTM

@@ -310,3 +312,35 @@ def test_mux_inf_loop():
with_replacement=False, random_state=1234)

assert len(list(mux(max_iter=100))) == 0


def test_mux_stacked_uniform_convergence():
Collaborator:

A 1-sentence summary docstring here might be nice. I like these for more complex tests (like this one).
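
Something like this, say (wording hypothetical):

def test_mux_stacked_uniform_convergence():
    """Stacked muxes over uniform streams should still sample all
    items approximately uniformly at large max_iter."""
    ...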

Collaborator (Author):

done

ejhumphrey (Collaborator, Author):

i can haz final signoff? plznthx

bmcfee (Collaborator) left a comment:

> i can haz final signoff? plznthx

I think you didn't address my questions about the statistical tests? https://github.com/pescadores/pescador/pull/100/files#diff-9afbc8da2d9eee8d9778ba62a79a821dR345

assert set('abcdefghijkl') == set(count1.keys()) == set(count2.keys())
c1, c2 = [list(c.values()) for c in (count1, count2)]
np.testing.assert_almost_equal(
np.std(c1) / max_iter, np.std(c2) / max_iter, decimal=2)
Collaborator:

This test doesn't seem quite right to me, for a couple of reasons:

  1. Why compare the standard deviation of all value counts?
  2. Line 344 discards the keys and only keeps the values, which makes me nervous since Counter doesn't ensure sorting across python versions.
  3. Why divide by max_iter?
  4. Why compare stacked to flat? I think we can do better by checking the distribution of each against a reference distribution, which I assume would be uniform. This would probably mean splitting this test into two, computing some statistic (eg chi^2) against uniform for each.

Collaborator (Author):

  1. the two distributions should have similar statistics, i.e. approx equal means and variances.
  2. order of keys shouldn't matter; the N characters should occur with similar frequency, and I'm just checking the distributions.
  3. normalizing out max_iter ... otherwise it grows with count, no?
  4. I happened to fix (or at least change) this since you've commented. Now it's a single uniformly sampled stream. Lemme know if you still take issue with it.

Collaborator:

In reverse order:

  1. Yeah, I think I still take issue (sorry! 😬) The problem I have is that you're comparing two potentially unknown quantities (flat and nested muxes) to each other, rather than comparing each one to a known reference with the desired properties.

  2. std(X)/N just seems like a weird quantity for count data. I think a better test would be to normalize the count distribution first, so you have a vector of observation frequencies, and then compare it to the uniform distribution, e.g. by chi^2 or KL-divergence or whatever (see the sketch after this list). Aggregating all counts before comparison seems wrong to me.

  3. See above: test should be coordinate-wise, not in aggregate.

  4. mean and std are fine for gaussian data, but not generally sufficient for arbitrary distributions
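
A sketch of that coordinate-wise comparison, with hypothetical counts:

import numpy as np
import scipy.stats

counts = np.array([1005, 980, 1010, 995, 1020, 990])  # hypothetical counts
freqs = counts / counts.sum()                   # observation frequencies
uniform = np.full_like(freqs, 1.0 / len(freqs))

# KL divergence D(freqs || uniform) is near zero when the observed
# distribution is close to uniform.
assert scipy.stats.entropy(freqs, uniform) < 1e-3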

Collaborator (Author):

kay, will fix

ejhumphrey (Collaborator, Author):

looks like we hit a race condition on edits versus comments...


counts = np.array(counter.values())
exp_count = float(max_iter / len(chars))
max_error = np.max(np.abs(counts - exp_count) / exp_count)
Collaborator:

I think this would be cleaner using scipy.stats:

counts = np.array(counter.values())
assert scipy.stats.chisquare(counts).pvalue >= 0.99

(default for chisquare is to compare against a uniform distribution)
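
A self-contained version of that suggestion (hypothetical counts; note that under Python 3, counter.values() needs list() before np.array() yields a numeric array):

import collections

import numpy as np
import scipy.stats

counter = collections.Counter({'a': 100, 'b': 98, 'c': 103, 'd': 99})
counts = np.array(list(counter.values()))  # list() needed on Python 3

# With no f_exp argument, chisquare tests against the uniform
# distribution; a large p-value means uniformity cannot be rejected.
assert scipy.stats.chisquare(counts).pvalue >= 0.95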

ejhumphrey (Collaborator, Author) commented Jul 7, 2017 via email

bmcfee (Collaborator) commented Aug 18, 2017:

I took the liberty of fixing this build error; should be good to merge? I'm not gonna be a stickler for the statistical tests.

bmcfee (Collaborator) commented Aug 21, 2017:

I noticed some stochastic failures on this one, since the _choice helper function was not seeded. I fixed the seed, but then noticed that it was failing the 0.5% error test, so I went back and replaced it with a chi^2 test at p-value >= 0.95.

I think this is all now in order, and ready to :shipit:

ejhumphrey (Collaborator, Author):

👍 lgtm

bmcfee merged commit df01c72 into master on Aug 21, 2017
bmcfee deleted the ejh_20170706_mux_stats branch on January 18, 2018