Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revised distributions for stochasticmux #171

Merged
merged 10 commits into from
Mar 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -103,11 +103,10 @@ jobs:
run: pytest

- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
uses: codecov/codecov-action@v4
with:
token: ${{ secrets.CODECOV_TOKEN }}
files: ./coverage.xml
directory: ./coverage/reports/
flags: unittests
env_vars: OS,PYTHON
name: codecov-umbrella
Expand Down
8 changes: 8 additions & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,14 @@ Advanced examples

auto_examples/index

***********************
Stochastic mux analysis
***********************
.. toctree::
:maxdepth: 2

muxanalysis

*************
API Reference
*************
Expand Down
262 changes: 262 additions & 0 deletions docs/muxanalysis.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,262 @@
.. _muxanalysis:

Analysis of Stochastic Mux
==========================

:ref:`mux` objects (*mux* for short, *muxen* for plural) allow multiple :ref:`Streamer` objects to
be combined into a single stream by selectively sampling from each constituent stream.
The different kinds of mux objects provide different behaviors, but among them, and among
them, ``StochasticMux`` is the most complex.
This section provides an in-depth analysis of ``StochasticMux``'s behavior.



Stream activation and replacement
---------------------------------

``StochasticMux`` differs from other muxen (``ShuffledMux``, ``RoundRobinMux``, etc.) by
maintaining an **active set** of streamers from the full collection it is multiplexing.
At any given time, samples are drawn only from the active set, while the remaining streamers are
**inactive**.
Each active streamer is limited to produce a (possibly random) number of samples, after which, it is removed from
the active set and replaced by a new streamer selected at random; hence the name **StochasticMux**.

A key quantity to understand when using ``StochasticMux`` is the streamer replacement rate: how
often should we expect streamers to be replaced from the active set, as a function of samples
generated by the mux?
This quantity is important for a couple of reasons:

* If we care about the distribution of samples produced by ``StochasticMux`` being a good
approximation of what you would get if all streamers were active simultaneously (i.e.,
``ShuffledMux`` behavior), then the streamer replacement rate should be small.
* If we have large startup costs involved with activating a streamer (e.g., loading data
from disk), then streamer replacement should be infrequent to ensure high throughput.
What's more, replacement events should be spread out among the active set, to avoid having several replacement events in a short period of time.

In the following sections, we'll analyze replacement rates for the different choices of rate
distributions (`constant`, `poisson`, and `binomial`).
We'll focus the analysis on a single (active) streamer at a time.
The question we'll analyze is specifically: how many samples :math:`N` must we generate (in
expectation) before a specific streamer is deactivated and replaced?
Understanding the distribution of `N` (its mean and variance) will help us understand how often
we should expect to see streamer replacement events.


Notation
--------

Let :math:`A` denote the size of the active set, let :math:`r` denote the number of samples
generated by a particular streamer, and let :math:`p` denote the probability of selecting the
active streamer in question.
We'll make the simplifying assumption that the ``weights`` attached to all streamers are
uniform, i.e., :math:`p = 1/A`.


Constant distribution
---------------------

When using the ``constant`` distribution, the sample limit :math:`r` is fixed in advance.
Our question about the number of samples generated by StochasticMux can then be rephrased
slightly:
how many samples :math:`K` must we draw from *all other active streamers* before drawing the
:math:`r`\ th sample from the streamer under analysis?

This number :math:`K` is a random variable, modeled by the `negative binomial distribution <https://en.wikipedia.org/wiki/Negative_binomial_distribution>`_:

.. math::

\text{Pr}[K = k] = {k + r - 1 \choose k} {(1-p)^k p^r}


It has expected value

.. math::

\text{E}[K] = r \times \frac{1-p}{p},

and variance

.. math::

\text{Var}[K] = r \times \frac{1-p}{p^2}.


The total number of samples produced by the mux before the streamer is replaced is now a random
variable :math:`N = K + r`.
We can use linearity of expectation to compute its expected value as

.. math::

\text{E}[N] = \text{E}[K] + r = r \times\frac{1-p}{p} + r = \frac{r}{p}.


Since :math:`N` and :math:`K` differ only by a constant (:math:`r`), they have the same
variance:

.. math::

\text{Var}[N] = \text{Var}[K].


If we apply the simplifying assumption that streamers are selected uniformly at random (:math:`p
= 1/A`), then we get the following:

* :math:`\text{E}[N] = r \times A`, and
* :math:`\text{Var}[N] = r \times A \times (A-1)`.

In plain language, this says that the streamer replacement rate scales like the product of the size of the active set and the number of samples per streamer.
Making either of these values large implies that we should expect to wait longer to replace an active streamer.
However, the variance of replacement event times is approximately **quadratic** in the size of the active set.
This means that making the active set larger will increase the dispersion of replacement events away from the expected value.


Poisson distribution
--------------------

In pescador version 2 and earlier, the sample limit :math:`r` was not a constant value, but a
random variable :math:`R` drawn from a Poisson distribution with rate parameter :math:`\lambda`.
In this case, the mean and variance of :math:`R` are simply :math:`\text{E}[R] =
\text{Var}[R] = \lambda`.

However, this does not lead to a closed form expression for :math:`\text{E}[N]` or :math:`\text{Var}[N]` because we must now marginalize over :math:`R`:

.. math::

\text{Pr}[N=n] &= \sum_{r=0}^{\infty} \text{Pr}[K=n-r, R = r]\\
&= \sum_{r=0}^{\infty} \text{Pr}[K=n-r ~|~ R = r] \times \text{Pr}[R=r]\\
&= \sum_{r=0}^{\infty} {n - 1 \choose n-r} {(1-p)^{n-r} p^r} \times \frac{\lambda^r e^{-\lambda}}{r!}


While this distribution is still supported, it has been replaced as the default by a binomial
distribution mode which is more amenable to analysis.


.. note::
In pescador ≥ 3.0, poisson mode is actually implemented as :math:`R \sim 1 +
\text{Poisson}(\lambda - 1)`. This maintains the expected value of
:math:`\lambda`, with slightly reduced variance (:math:`\lambda - 1`),
but it ensures that at least one sample is produced from active streamers before deactivation.

This logic is also applied to the binomial mode described below, but omitted
from the analysis here for simplicity.

Binomial distribution
---------------------

In the binomial distribution mode, :math:`R` is a random variable governed by a binomial
distribution with parameters :math:`(m, q)`:

.. math::

\text{Pr}[R=r] = {m \choose r} q^r (1-q)^{m-r}.

(We will come back to determining values for :math:`(m, q)` later.)

This distribution can be integrated with the negative binomial distribution above to yield a
straightforward computation of :math:`\text{Pr}[N]`.

.. math::

\text{Pr}[N=n] &= \sum_{r=0}^{\infty} \text{Pr}[K=n-r ~|~ R= r] \times \text{Pr}[R=r]\\
&= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^{n-r} p^r \times {m \choose r} q^r {(1-q)}^{m-r}.

If we set :math:`q = 1-p`, this simplifies as follows:

.. math::

\text{Pr}[N=n] &= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^{n-r} p^r \times {m \choose r} {(1-p)}^r p^{m-r}\\
&= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^n p^m {m \choose r}\\
&= {\left(1-p\right)}^n p^m {n + m - 1\choose n}.

This distribution again has the form of a negative binomial with parameters :math:`(m, 1-p)`.
If we further set

.. math::

m = \frac{\lambda}{1-p}

for an expected rate parameter :math:`\lambda > 0` (as in the Poisson case above), then the
distribution :math:`\text{Pr}[N=n]` is

.. math::

N \sim \text{NB}\left(\frac{\lambda}{1-p}, 1-p\right),

where NB denotes the probability mass function of the negative binomial distribution.
This yields:

- :math:`\text{E}[R] = \lambda`: each streamer generates :math:`\lambda` samples
on average,
- :math:`\text{Var}[R] = \lambda \times p`,
- :math:`\text{E}[N] = \lambda / p`, and
- :math:`\text{Var}[N] = \lambda \frac{1-p}{p^2}`.

These match the analysis of the constant-mode case above, except that the number of samples per
streamer is now a random variable with expectation :math:`\lambda`.
Again, in the special case where :math:`p=1/A`, we recover

- :math:`\text{E}[N] = \lambda \times A`, and
- :math:`\text{Var}[N] = \lambda \times A \times (A-1)`.

In short, binomial mode ``StochasticMux`` exhibits the same stream replacement
characteristics as the constant-mode case, but relaxes the need for each streamer to
generate an identical number of samples.


Limiting case :math:`p=1`
-------------------------

As defined above, the binomial mode is ill-defined when :math:`p=1` due to a
division-by-zero in the parametrization.
This situation does occur in practice with some configurations of ``StochasticMux``,
e.g. when operating in **exhaustive** mode so that streamers are activated without
replacement and are discarded after deactivation.
In this case, the size of the active set :math:`A` can eventually decay, and the
probability of choosing the last active streamer :math:`p \rightarrow 1`.

To circumvent this issue, ``StochasticMux`` detects this situation automatically and
falls back on a Poisson distribution for :math:`R`.
This is justified by the `Poisson limit theorem <https://en.wikipedia.org/wiki/Poisson_limit_theorem>`_ if we take the product :math:`\lambda/(1-p) \times (1-p) = \lambda` as the limit value as :math:`p \rightarrow 1`.


Discussion
----------

The above analysis tells us, on average, how long we should expect to wait before a
given streamer is exhausted and replaced.
Because this distribution applies equally to all streamers, the variance of this
distribution tells us how dispersed these replacement events are likely to be.

Qualitatively, there are a few things we can observe from the above analysis.

First, for large active set sizes :math:`A`, binomial mode will behave similarly to constant mode because :math:`\text{Var}[R]` will be inversely proportional to :math:`A`.
For small active sets, binomial mode will behave more similarly to Poisson mode.

Second, Poisson mode will exhibit the highest variance of sample limit values
:math:`\text{Var}[R] = \lambda` upper-bounds that of the binomial mode
:math:`\text{Var}[R] = \lambda \times p`.
We can therefore expect that the replacement event distribution :math:`\text{Pr}[N]`
under Poisson mode will also exhibit slightly higher variance in general.

Third, the binomial mode provides a controlled interaction between the stream
replacement rate and the size of the active set, which is difficult to achieve with
Poisson mode.

Finally, we should emphasize that the analysis in this section represents a common,
if simplified application of ``StochasticMux``, and there are many other variables
at play that may alter the mux's behavior, including:

- whether streamers are activated with or without replacement,
- whether streamers are used exhaustively or replaced after deactivation,
- whether streamer weights are uniform or non-uniform,
- whether streamers self-limit instead of relying on the mux for deactivation.

The final point is subtle: remember that streamers encapsulate arbitrary generator
code, and there's nothing stopping a generator from determining its own maximum
number of samples to produce.
If this number is smaller than the value assigned by the mux, the streamer will act
as if it has been exhausted and the mux will replace it immediately.
This situation would both reduce the replacement time average and variance.
(If a streamer self-limits at a number larger than the mux's limit, the mux will
terminate it first and the analysis above still holds.)

Loading