Alternate distributions for StochasticMux #148
Quick summary of the analysis: say we have uniform weights over the streamers […]. These expectations match exactly the case where the number of samples is deterministic.

In exhaustive mode, we want to know how long until we hit all the streamers. We can think of this as each streamer slot being replaced […]. Finally, we'd want to know what the chance of a streamer not being replaced is. This can be calculated numerically using the negative binomial survival function (which is ugly AF), but we could also be lazy and use some concentration inequalities. If we think of […]
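As a rough illustration of the "chance of not being replaced" computation, here is a toy sketch under the simplifying assumption of uniform weights, where each draw from the mux replaces a given slot with probability 1/n_active; the parameter values are arbitrary.

```python
# Toy calculation (my sketch, not the original analysis): probability that
# one particular streamer slot has not been replaced after k draws, assuming
# uniform weights so each draw replaces a given slot with probability 1/n_active.
import numpy as np
from scipy import stats

n_active = 16   # assumed number of simultaneously active streamers
k = 100         # number of draws from the mux

p = 1.0 / n_active

# Negative-binomial survival function: P(more than k-1 failures before the
# first "replacement" success), i.e. the slot is still alive after k draws.
survive_nbinom = stats.nbinom.sf(k - 1, 1, p)

# Same quantity in closed form, and a lazy concentration-style upper bound.
survive_exact = (1 - p) ** k
survive_bound = np.exp(-p * k)

print(survive_nbinom, survive_exact, survive_bound)
```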
Cool! So is pescador going to be renamed to pascalador then?*

*(https://en.wikipedia.org/wiki/Binomial_distribution#History)
Negative.
👍 💯 |
Quick update: I implemented this and compared poisson/binomial/constant using the notebook linked from #104. This basically computes the rate at which the muxed distribution approaches the full distribution. The idealized case with […]

TL;DR: there's not much difference across the different schemes, but the scalloping effects (#132) are obscured by error regions in these plots. They'll be more pronounced for binomial and constant; less so for poisson. The randomized offset idea mentioned at the end of #132 could be a nice solution to that, though it is not tested here.

Results: [plots not shown]
Would anyone object to me implementing this and pushing out a 2.2 release, rather than waiting for 3.x? |
No objection; available for code review.
-C
One irritating snag here: the binomial distribution explodes on us if n_active=1, because the rate parameter gets scaled up by n_active / (n_active - 1). This is a weird edge case that we never had to worry about before, since the poisson model didn't mix in the active set size.

For n_active > 1, the binomial model has the invariant that the expected number of samples per streamer is always rate, and the variance scales like rate / n_active. So large active sets mean low variance in the number of samples per streamer; small active sets have high variance in the number of samples per streamer. When n_active=1, maintaining this would require a random variable with mean = variance = rate, e.g. a poisson variable.

Does it make sense to fall back on poisson for this case? Or should we simply require n_active >= 2 for dist='binomial'? The latter side-steps the issue, but makes the API somewhat inelegant.
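For what it's worth, the mean/variance invariant is easy to check numerically; below is a throwaway sketch (mine, not pescador code) with arbitrary values of rate and n_active.

```python
# Quick numerical check of the invariant described above: with
# n = rate * n_active / (n_active - 1) trials and success probability
# p = 1 - 1/n_active, the draw has mean ~ rate and variance ~ rate / n_active,
# and the trial count is undefined at n_active = 1.
import numpy as np

rng = np.random.default_rng(0)
rate = 16

for n_active in (2, 4, 16, 64):
    n = rate * n_active / (n_active - 1)   # blows up as n_active -> 1
    p = 1 - 1 / n_active
    # Rounding n to an integer perturbs the mean slightly.
    draws = rng.binomial(int(round(n)), p, size=200_000)
    print(n_active, draws.mean(), draws.var(), rate / n_active)
```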
I'm fine with falling back to poisson for n_active == 1, as long as it is clearly documented in the docstring.

My gut is that restricting n_active >= 2 is a worse user experience than falling back to poisson, but I'm not confident in that feeling right now. I suppose we could also raise a warning to make it clear to the end user that this has happened.
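A minimal sketch of the warn-and-fall-back idea (the resolve_dist helper is hypothetical, not part of pescador):

```python
# Sketch of the fallback-plus-warning idea; hypothetical helper, not
# pescador internals.
import warnings

def resolve_dist(dist, n_active):
    """Substitute poisson for binomial when the binomial model is undefined."""
    if dist == "binomial" and n_active == 1:
        warnings.warn(
            "dist='binomial' is undefined for n_active=1; "
            "falling back to dist='poisson'.",
            UserWarning,
        )
        return "poisson"
    return dist
```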
I agree with that.

I went and redid the analysis allowing for non-uniform streamer weights, and it came out a bit more nuanced. The end story is the same, though: convergence to poisson makes sense when we only have one active streamer and still want a randomized refresh.

In implementing this, I ran into some snags with the unit tests. In the poisson rate model, we hacked a bias of +1 into the sampler to avoid having streamers that start and stop without generating data. This is mostly a desirable property in practice, but it messes with the analysis pretty badly. I'd like to avoid that in the binomial model. However, this means that the binomial model does need to handle the case where a streamer is activated but draws zero samples.
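To put a number on the zero-sample case, here is a small sketch (arbitrary parameter values) of how often a freshly activated streamer would draw zero samples once the +1 bias is dropped.

```python
# Probability that a newly activated streamer draws zero samples (my sketch;
# parameter values are arbitrary). With the old 1 + Poisson(rate) hack this
# probability is exactly zero; without it, it is small but non-negligible.
from scipy import stats

rate, n_active = 2, 2

p_zero_poisson = stats.poisson.pmf(0, rate)              # exp(-2) ~ 0.135
n = rate * n_active / (n_active - 1)                     # rescaled trial count
p_zero_binomial = stats.binom.pmf(0, int(round(n)), 1 - 1 / n_active)  # 0.0625

print(p_zero_poisson, p_zero_binomial)
```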
Quick update: the problem had to do with pruning empty streams.
To clarify my earlier comment: the issue is that if we set n_samples_per_stream = 0 (randomly selected from binomial or poisson), then a stream can be activated, never produce samples, and then be deactivated and removed from the pool, even though it *could* produce new data.

What do you (all) think about removing the stream pruning behavior? Or at least, changing the default behavior to not prune. I think it's ultimately inconsistent with our design principles that streams should be restartable (that being a distinguishing point compared to iterables).
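From the user's side, a non-pruning default might look roughly like the sketch below; the prune_empty_streams keyword and the rest of the StochasticMux signature shown here are assumptions about the current API rather than a confirmed interface.

```python
# Rough usage sketch; keyword names are assumptions about the current
# StochasticMux signature, not a confirmed API.
from pescador import Streamer
from pescador.mux import StochasticMux

def counter(limit):
    # Trivial finite generator for illustration; restartable via Streamer.
    for i in range(limit):
        yield {"x": i}

streamers = [Streamer(counter, 5) for _ in range(30)]

mux = StochasticMux(
    streamers,
    n_active=10,
    rate=8,
    prune_empty_streams=False,   # keep streams restartable instead of pruning
)
```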
I am pro changing the default behavior in this way.

I think it would be nice to get some more input from other users on how they typically use Pescador. *I*, at least, hardly used the pruning behavior, so to me this is fine, but maybe there are people who do?
Since I'm finally getting around to doing some cleanup here for a 3.0 release, I'll quickly summarize how I think this should all go:
[…]
I've been doing a bit of analysis of pescador's sampling behavior, and the Poisson distribution is a bit unwieldy. However, for a specific choice of parameters, drawing n_samples_to_stream from a binomial distribution leads to a clean solution in terms of the total replacement rate of streamers.

More generally, I think it would be useful to extend StochasticMux to have a dist= parameter that works as follows:

- constant: each streamer gets exactly (at most) rate samples.
- binomial (proposed new default): each streamer gets a binomial draw from Binomial(rate, 1 - 1/n_active).
- poisson: current behavior, n_samples ~ 1 + Poisson(rate).
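Taken literally, the proposed options could be sketched as below (illustrative only; the helper name is hypothetical, and elsewhere in the thread the binomial's rate argument gets rescaled by n_active / (n_active - 1) so its mean stays at rate).

```python
# Minimal sketch of the proposed dist= options; the helper name is
# hypothetical and the parameterization follows the issue text verbatim.
import numpy as np

def n_samples_to_stream(rng, dist, rate, n_active):
    """Number of samples to pull from a newly activated streamer."""
    if dist == "constant":
        return rate
    if dist == "binomial":
        return rng.binomial(rate, 1 - 1 / n_active)
    if dist == "poisson":
        return 1 + rng.poisson(rate)      # current behavior
    raise ValueError(f"Unknown dist={dist!r}")

rng = np.random.default_rng(0)
for dist in ("constant", "binomial", "poisson"):
    print(dist, n_samples_to_stream(rng, dist, rate=8, n_active=10))
```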