.. _muxanalysis:

Analysis of Stochastic Mux
==========================

:ref:`mux` objects (*mux* for short, *muxen* for plural) allow multiple :ref:`Streamer` objects to
be combined into a single stream by selectively sampling from each constituent stream.
The different kinds of mux objects provide different behaviors, and among them,
``StochasticMux`` is the most complex.
This section provides an in-depth analysis of ``StochasticMux``'s behavior.


Stream activation and replacement
---------------------------------

``StochasticMux`` differs from other muxen (``ShuffledMux``, ``RoundRobinMux``, etc.) by
maintaining an **active set** of streamers from the full collection it is multiplexing.
At any given time, samples are drawn only from the active set, while the remaining streamers are
**inactive**.
Each active streamer is limited to producing a (possibly random) number of samples, after which it is removed from
the active set and replaced by a new streamer selected at random; hence the name **StochasticMux**.

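The activation and replacement cycle described above can be sketched in a few lines of plain Python.
This is a toy illustration only, not pescador's implementation: the function ``toy_stochastic_mux`` and its parameters are hypothetical names, the sample limit is a fixed constant, and replacement streams are chosen with replacement from the full collection.

```python
import itertools
import random


def toy_stochastic_mux(streams, n_active, limit, rng):
    """Toy active-set mux (hypothetical helper, NOT pescador's code).

    streams  -- list of zero-argument callables, each returning a fresh iterator
    n_active -- size of the active set
    limit    -- fixed number of samples each activated stream may produce
    rng      -- a random.Random instance
    """
    def activate():
        # Choose a stream at random and start a fresh iterator for it
        idx = rng.randrange(len(streams))
        return [idx, streams[idx](), 0]  # [stream index, iterator, samples produced]

    active = [activate() for _ in range(n_active)]
    while True:
        slot = rng.randrange(n_active)   # pick an active stream uniformly at random
        entry = active[slot]
        yield entry[0], next(entry[1])
        entry[2] += 1
        if entry[2] >= limit:            # limit reached: deactivate and replace
            active[slot] = activate()


# Five hypothetical infinite streams, each counting upward from 0
streams = [itertools.count for _ in range(5)]
mux = toy_stochastic_mux(streams, n_active=2, limit=3, rng=random.Random(0))
samples = list(itertools.islice(mux, 12))
print(samples)  # (stream index, value) pairs
```

Because every activation starts a fresh iterator and is retired after ``limit`` samples, no value in the output exceeds ``limit - 1``.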
A key quantity to understand when using ``StochasticMux`` is the streamer replacement rate: how
often should we expect streamers in the active set to be replaced, as a function of the number of samples
generated by the mux?
This quantity is important for a couple of reasons:

* If we want the distribution of samples produced by ``StochasticMux`` to be a good
  approximation of what we would get if all streamers were active simultaneously (i.e.,
  ``ShuffledMux`` behavior), then the streamer replacement rate should be small.
* If activating a streamer involves large startup costs (e.g., loading data
  from disk), then streamer replacement should be infrequent to ensure high throughput.
  What's more, replacement events should be spread out among the active set, to avoid having several replacement events in a short period of time.

In the following sections, we'll analyze replacement rates for the different choices of rate
distributions (``constant``, ``poisson``, and ``binomial``).
We'll focus the analysis on a single (active) streamer at a time.
Specifically, the question is: how many samples :math:`N` must we generate (in
expectation) before a specific streamer is deactivated and replaced?
Understanding the distribution of :math:`N` (its mean and variance) will help us understand how often
we should expect to see streamer replacement events.


Notation
--------

Let :math:`A` denote the size of the active set, let :math:`r` denote the number of samples
generated by a particular streamer before it is replaced (its sample limit), and let :math:`p` denote the probability of selecting the
active streamer in question on any single draw.
We'll make the simplifying assumption that the ``weights`` attached to all streamers are
uniform, i.e., :math:`p = 1/A`.


Constant distribution
---------------------

When using the ``constant`` distribution, the sample limit :math:`r` is fixed in advance.
Our question about the number of samples generated by ``StochasticMux`` can then be rephrased
slightly:
how many samples :math:`K` must we draw from *all other active streamers* before drawing the
:math:`r`\ th sample from the streamer under analysis?

This number :math:`K` is a random variable, modeled by the `negative binomial distribution <https://en.wikipedia.org/wiki/Negative_binomial_distribution>`_:

.. math::

    \text{Pr}[K = k] = {k + r - 1 \choose k} {(1-p)^k p^r}

It has expected value

.. math::

    \text{E}[K] = r \cdot \frac{1-p}{p},

and variance

.. math::

    \text{Var}[K] = r \cdot \frac{1-p}{p^2}.

The total number of samples produced by the mux before the streamer is replaced is now a random
variable :math:`N = K + r`.
We can use linearity of expectation to compute its expected value as

.. math::

    \text{E}[N] = \text{E}[K] + r = r \cdot\frac{1-p}{p} + r = \frac{r}{p}.

Since :math:`N` and :math:`K` differ only by a constant (:math:`r`), they have the same
variance:

.. math::

    \text{Var}[N] = \text{Var}[K].

If we apply the simplifying assumption that streamers are selected uniformly at random
(:math:`p = 1/A`), then we get the following:

* :math:`\text{E}[N] = r \cdot A`, and
* :math:`\text{Var}[N] = r \cdot A \cdot (A-1)`.

In plain language, this says that the expected number of samples before a streamer is replaced scales like the product of the size of the active set and the number of samples per streamer.
Making either of these values large implies that we should expect to wait longer to replace an active streamer.
However, the variance of replacement event times is approximately **quadratic** in the size of the active set.
This means that making the active set larger will increase the dispersion of replacement events away from the expected value.


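The constant-mode formulas can be checked with a small Monte Carlo experiment.
The sketch below does not use pescador; it simply draws uniformly among :math:`A` active slots until one designated slot has been selected :math:`r` times. The helper name ``samples_until_replaced`` and the values ``A = 4``, ``r = 8`` are arbitrary choices made for illustration.

```python
import random
import statistics


def samples_until_replaced(n_active, limit, rng):
    """Count mux samples drawn until streamer 0 has produced `limit` samples."""
    total = 0
    hits = 0
    while hits < limit:
        total += 1
        if rng.randrange(n_active) == 0:  # streamer 0 is selected with p = 1/A
            hits += 1
    return total


rng = random.Random(0)
A, r = 4, 8
draws = [samples_until_replaced(A, r, rng) for _ in range(20000)]

emp_mean = statistics.mean(draws)
emp_var = statistics.variance(draws)
print("empirical mean:", emp_mean, "predicted:", r * A)
print("empirical var: ", emp_var, "predicted:", r * A * (A - 1))
```

With enough trials, the empirical mean and variance should land close to :math:`rA` and :math:`rA(A-1)`, up to sampling noise.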
Poisson distribution
--------------------

In pescador version 2 and earlier, the sample limit :math:`r` was not a constant value, but a
random variable :math:`R` drawn from a Poisson distribution with rate parameter :math:`\lambda`.
The analysis above can mostly be carried over to handle this case, though the resulting
distribution no longer has a simple closed form, because we must now
marginalize over the variable :math:`R`:

.. math::

    \text{Pr}[K=k] &= \sum_{r=0}^{\infty} \text{Pr}[K=k, R = r]\\
    &= \sum_{r=0}^{\infty} \text{Pr}[K=k ~|~ R = r] \times \text{Pr}[R=r]\\
    &= \sum_{r=0}^{\infty} {k + r - 1 \choose k} {(1-p)^k p^r} \times \frac{\lambda^r e^{-\lambda}}{r!}

While this distribution is still supported, it has been replaced as the default by a binomial
distribution mode, which is more amenable to analysis.

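Although the marginal has no simple closed form, it is easy to evaluate numerically by truncating the sum over :math:`r`.
The following sketch assumes illustrative values ``p = 0.25`` and ``lam = 5.0``; the function name ``pr_k_poisson`` and the truncation point ``r_max`` are hypothetical choices, not part of pescador.

```python
import math


def pr_k_poisson(k, p, lam, r_max=60):
    """Approximate Pr[K = k] by truncating the sum over r at r_max."""
    total = 0.0
    for r in range(r_max + 1):
        if r == 0:
            # A limit of zero samples means nothing is drawn: K = 0 with certainty
            nb = 1.0 if k == 0 else 0.0
        else:
            nb = math.comb(k + r - 1, k) * (1 - p) ** k * p ** r
        # Poisson pmf computed in log space to avoid huge factorials
        poisson = math.exp(-lam + r * math.log(lam) - math.lgamma(r + 1))
        total += nb * poisson
    return total


p, lam = 0.25, 5.0
pmf = [pr_k_poisson(k, p, lam) for k in range(300)]
total_mass = sum(pmf)
mean_k = sum(k * w for k, w in enumerate(pmf))
print("total probability mass:", total_mass)
print("numerical mean of K:   ", mean_k)
```

The total mass should be very close to 1, which is a quick sanity check that the truncation points are large enough for the chosen parameters.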
Binomial distribution
---------------------

In the binomial distribution mode, :math:`R` is a random variable governed by a binomial
distribution with parameters :math:`(m, q)`:

.. math::

    \text{Pr}[R=r] = {m \choose r} q^r (1-q)^{m-r}.

(We will come back to determining values for :math:`(m, q)` later.)

This distribution can be combined with the negative binomial distribution above to yield a
straightforward computation of :math:`\text{Pr}[N=n]`.

.. math::

    \text{Pr}[N=n] &= \sum_{r=0}^{\infty} \text{Pr}[K=n-r ~|~ R= r] \times \text{Pr}[R=r]\\
    &= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^{n-r} p^r \cdot {m \choose r} q^r {(1-q)}^{m-r}.

If we set :math:`q = 1-p`, this simplifies as follows:

.. math::

    \text{Pr}[N=n] &= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^{n-r} p^r \cdot {m \choose r} {(1-p)}^r p^{m-r}\\
    &= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^n p^m {m \choose r}\\
    &= {\left(1-p\right)}^n p^m {n + m - 1\choose n}.

The last step uses Vandermonde's identity,
:math:`\sum_{r} {n-1 \choose n-r} {m \choose r} = {n+m-1 \choose n}`.
This distribution again has the form of a negative binomial, now with parameters :math:`(m, p)`
in the parameterization used for :math:`K` above.
If we further set

.. math::

    m = \frac{\lambda}{1-p}

for an expected rate parameter :math:`\lambda > 0` (as in the Poisson case above), then the
distribution of :math:`N` is

.. math::

    N \sim \text{NB}\left(\frac{\lambda}{1-p}, p\right),

where NB denotes the negative binomial distribution.
This yields:

- :math:`\text{E}[R] = \lambda`,
- :math:`\text{E}[N] = \lambda / p`, and
- :math:`\text{Var}[N] = \lambda / p^2`.

The expectation matches the constant-mode case above, with the fixed sample limit :math:`r`
replaced by its expectation :math:`\lambda`.
The variance exceeds the constant-mode value :math:`\lambda (1-p) / p^2` by :math:`\lambda / p`,
which accounts for the additional randomness in :math:`R`.
In the special case where :math:`p=1/A`, we obtain

- :math:`\text{E}[N] = \lambda A`, and
- :math:`\text{Var}[N] = \lambda A^2`.


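The simplification for :math:`q = 1-p` can be verified numerically by comparing the explicit sum over :math:`r` against the closed-form expression :math:`(1-p)^n p^m {n+m-1 \choose n}`.
This is a minimal sketch with illustrative values ``m = 8`` and ``p = 0.25`` (so :math:`\lambda = m(1-p) = 6`); the function names are hypothetical and not part of pescador.

```python
import math


def pr_n_mixture(n, m, p):
    """Sum over r of Pr[K = n - r | R = r] * Pr[R = r], with q = 1 - p."""
    total = 0.0
    # r cannot exceed m (binomial support) or n (since K = n - r >= 0)
    for r in range(min(n, m) + 1):
        if r == 0:
            nb = 1.0 if n == 0 else 0.0  # zero-sample limit: N = 0 with certainty
        else:
            nb = math.comb(n - 1, n - r) * (1 - p) ** (n - r) * p ** r
        binom = math.comb(m, r) * (1 - p) ** r * p ** (m - r)
        total += nb * binom
    return total


def pr_n_closed_form(n, m, p):
    """Closed form obtained after simplifying the sum."""
    return math.comb(n + m - 1, n) * (1 - p) ** n * p ** m


m, p = 8, 0.25
lam = m * (1 - p)
max_gap = max(abs(pr_n_mixture(n, m, p) - pr_n_closed_form(n, m, p))
              for n in range(400))
mean_n = sum(n * pr_n_closed_form(n, m, p) for n in range(400))
print("largest pmf discrepancy:", max_gap)
print("numerical E[N]:", mean_n, "predicted lambda / p:", lam / p)
```

The two expressions should agree up to floating-point error, and the numerical mean should match :math:`\lambda / p`.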
Limiting case :math:`p=1`
-------------------------


Discussion and recommendations
------------------------------
