diff --git a/docs/index.rst b/docs/index.rst
index 247402e..c6badbc 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -96,6 +96,14 @@ Advanced examples
 
     auto_examples/index
 
+***********************
+Stochastic mux analysis
+***********************
+.. toctree::
+    :maxdepth: 2
+
+    muxanalysis
+
 *************
 API Reference
 *************
diff --git a/docs/muxanalysis.rst b/docs/muxanalysis.rst
new file mode 100644
index 0000000..4cd4cb9
--- /dev/null
+++ b/docs/muxanalysis.rst
@@ -0,0 +1,195 @@
+.. _muxanalysis:
+
+Analysis of Stochastic Mux
+==========================
+
+:ref:`mux` objects (*mux* for short, *muxen* for plural) allow multiple :ref:`Streamer` objects to
+be combined into a single stream by selectively sampling from each constituent stream.
+The different kinds of mux objects provide different behaviors; among them,
+``StochasticMux`` is the most complex.
+This section provides an in-depth analysis of ``StochasticMux``'s behavior.
+
+
+
+Stream activation and replacement
+---------------------------------
+
+``StochasticMux`` differs from other muxen (``ShuffledMux``, ``RoundRobinMux``, etc.) by
+maintaining an **active set** of streamers from the full collection it is multiplexing.
+At any given time, samples are drawn only from the active set, while the remaining streamers are
+**inactive**.
+Each active streamer is limited to producing a (possibly random) number of samples, after which it is removed from
+the active set and replaced by a new streamer selected at random; hence the name **StochasticMux**.
+
+A key quantity to understand when using ``StochasticMux`` is the streamer replacement rate: how
+often should we expect streamers to be replaced from the active set, as a function of samples
+generated by the mux?
+This quantity is important for a couple of reasons:
+
+    * If we care about the distribution of samples produced by ``StochasticMux`` being a good
+      approximation of what we would get if all streamers were active simultaneously (i.e.,
+      ``ShuffledMux`` behavior), then the streamer replacement rate should be small.
+    * If we have large startup costs involved with activating a streamer (e.g., loading data
+      from disk), then streamer replacement should be infrequent to ensure high throughput. 
+      What's more, replacement events should be spread out among the active set, to avoid having several replacement events in a short period of time.
+
+In the following sections, we'll analyze replacement rates for the different choices of rate
+distributions (``constant``, ``poisson``, and ``binomial``).
+We'll focus the analysis on a single (active) streamer at a time.
+The question we'll analyze is specifically: how many samples :math:`N` must we generate (in
+expectation) before a specific streamer is deactivated and replaced?
+Understanding the distribution of :math:`N` (its mean and variance) will help us understand how often
+we should expect to see streamer replacement events.
+
+
+Notation
+--------
+
+Let :math:`A` denote the size of the active set, let :math:`r` denote the number of samples
+generated by a particular streamer, and let :math:`p` denote the probability of selecting the
+active streamer in question.
+We'll make the simplifying assumption that the ``weights`` attached to all streamers are
+uniform, i.e., :math:`p = 1/A`.
+
+
+Constant distribution
+---------------------
+
+When using the ``constant`` distribution, the sample limit :math:`r` is fixed in advance.
+Our question about the number of samples generated by ``StochasticMux`` can then be rephrased
+slightly:
+how many samples :math:`K` must we draw from *all other active streamers* before drawing the
+:math:`r`\ th sample from the streamer under analysis?
+
+This number :math:`K` is a random variable, modeled by the `negative binomial distribution <https://en.wikipedia.org/wiki/Negative_binomial_distribution>`_:
+
+.. math::
+
+   \text{Pr}[K = k] = {k + r - 1 \choose k} {(1-p)^k p^r}
+
+
+It has expected value
+
+.. math::
+
+   \text{E}[K] = r \cdot \frac{1-p}{p},
+
+and variance
+
+.. math::
+
+   \text{Var}[K] = r \cdot \frac{1-p}{p^2}.
+
+
+The total number of samples produced by the mux before the streamer is replaced is now a random
+variable :math:`N = K + r`.
+We can use linearity of expectation to compute its expected value as
+
+.. math::
+
+   \text{E}[N] = \text{E}[K] + r = r \cdot\frac{1-p}{p} + r = \frac{r}{p}.
+
+
+Since :math:`N` and :math:`K` differ only by a constant (:math:`r`), they have the same
+variance:
+
+.. math::
+
+   \text{Var}[N] = \text{Var}[K].
+
+
+If we apply the simplifying assumption that streamers are selected uniformly at random (:math:`p
+= 1/A`), then we get the following:
+
+    * :math:`\text{E}[N] = r \cdot A`, and 
+    * :math:`\text{Var}[N] = r \cdot A \cdot (A-1)`.
+
+In plain language, this says that the expected number of samples generated before a streamer is replaced scales like the product of the size of the active set and the number of samples per streamer.
+Making either of these values large implies that we should expect to wait longer to replace an active streamer.
+However, the variance of replacement event times is approximately **quadratic** in the size of the active set.
+This means that making the active set larger will increase the dispersion of replacement events away from the expected value.
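+
+As a worked illustration of these formulas (the specific numbers are hypothetical, chosen only for
+arithmetic convenience), take :math:`r = 100` samples per streamer and compare active sets of
+size :math:`A = 10` and :math:`A = 20`:
+
+.. math::
+
+   \text{E}[N] &= 100 \cdot 10 = 1000, \qquad \text{Var}[N] = 100 \cdot 10 \cdot 9 = 9000 \qquad (A = 10)\\
+   \text{E}[N] &= 100 \cdot 20 = 2000, \qquad \text{Var}[N] = 100 \cdot 20 \cdot 19 = 38000 \qquad (A = 20)
+
+Doubling the active set doubles the expected wait but multiplies the variance by about 4.2, so the
+standard deviation grows from roughly 95 to roughly 195 samples.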
+
+
+Poisson distribution
+--------------------
+
+In pescador version 2 and earlier, the sample limit :math:`r` was not a constant value, but a
+random variable :math:`R` drawn from a Poisson distribution with rate parameter :math:`\lambda`.
+The analysis above can mostly be carried over to handle this case, though it does not lead to a
+simple closed-form expression for the distribution of :math:`K` (and hence of :math:`N`) because we must now
+marginalize over the variable :math:`R`:
+
+.. math::
+
+   \text{Pr}[K=k]   &= \sum_{r=0}^{\infty} \text{Pr}[K=k, R = r]\\
+                    &= \sum_{r=0}^{\infty} \text{Pr}[K=k ~|~ R = r] \times \text{Pr}[R=r]\\
+                    &= \sum_{r=0}^{\infty} {k + r - 1 \choose k} {(1-p)^k p^r} \times \frac{\lambda^r e^{-\lambda}}{r!}
+
+
+While this distribution is still supported, it has been replaced as the default by a binomial
+distribution mode which is more amenable to analysis.
+
+Binomial distribution
+---------------------
+
+In the binomial distribution mode, :math:`R` is a random variable governed by a binomial
+distribution with parameters :math:`(m, q)`:
+
+.. math::
+
+   \text{Pr}[R=r] = {m \choose r} q^r (1-q)^{m-r}.
+
+(We will come back to determining values for :math:`(m, q)` later.)
+
+This distribution can be combined with the negative binomial distribution above to yield a
+straightforward computation of :math:`\text{Pr}[N=n]`:
+
+.. math::
+
+   \text{Pr}[N=n] &= \sum_{r=0}^{\infty} \text{Pr}[K=n-r ~|~ R= r] \times \text{Pr}[R=r]\\
+   &= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^{n-r} p^r \cdot {m \choose r} q^r {(1-q)}^{m-r}.
+
+If we set :math:`q = 1-p`, this simplifies as follows:
+
+.. math::
+
+   \text{Pr}[N=n] &= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^{n-r} p^r \cdot {m \choose r} {(1-p)}^r p^{m-r}\\
+   &= \sum_{r=0}^{\infty} {n-1 \choose n-r} {\left(1-p\right)}^n p^m {m \choose r}\\
+   &= {\left(1-p\right)}^n p^m {n + m - 1\choose n}.
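+
+The final equality above is an instance of Vandermonde's identity, restated here for reference
+(terms with :math:`r > m` or :math:`r > n` vanish, so the infinite sum is effectively finite):
+
+.. math::
+
+   \sum_{r=0}^{\infty} {n-1 \choose n-r} {m \choose r} = {n + m - 1 \choose n}.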
+
+This distribution again has the form of a negative binomial with parameters :math:`(m, 1-p)`.
+If we further set
+
+.. math::
+
+   m = \frac{\lambda}{1-p}
+
+for an expected rate parameter :math:`\lambda > 0` (as in the Poisson case above), then the
+distribution :math:`\text{Pr}[N=n]` is
+
+.. math::
+
+   N \sim \text{NB}\left(\frac{\lambda}{1-p}, 1-p\right),
+
+where NB denotes the negative binomial distribution.
+This yields:
+
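+    - :math:`\text{E}[R] = \lambda`,
+    - :math:`\text{E}[N] = \lambda / p`, and
+    - :math:`\text{Var}[N] = \lambda / p^2`.
+
+The expectation matches the analysis of the constant-mode case above, except that the number of samples per
+streamer is now a random variable with expectation :math:`\lambda`.
+The variance exceeds the constant-mode value :math:`\lambda (1-p) / p^2` by :math:`\lambda / p`, which is the
+contribution of the randomness in :math:`R` (namely :math:`\text{Var}[R] / p^2`, with :math:`\text{Var}[R] = \lambda p`).
+In the special case where :math:`p=1/A`, we obtain
+
+    - :math:`\text{E}[N] = \lambda A`, and
+    - :math:`\text{Var}[N] = \lambda A^2`.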
+
+
+Limiting case :math:`p=1`
+-------------------------
+
+
+Discussion and recommendations
+------------------------------
+