diff --git a/vignettes/epichains.Rmd b/vignettes/epichains.Rmd index 2acfa562..9b96b448 100644 --- a/vignettes/epichains.Rmd +++ b/vignettes/epichains.Rmd @@ -1,6 +1,6 @@ --- title: "Getting started with epichains" -author: "James Azam" +author: "James M. Azam and Sebastian Funk" output: bookdown::html_vignette2: fig_caption: yes @@ -27,4 +27,363 @@ knitr::opts_chunk$set( ) ``` -WIP +_epichains_ provides methods to analyse and simulate the size and length +of branching processes with an arbitrary offspring distribution. These +can be used, for example, to analyse the distribution of chain sizes +or length of infectious disease outbreaks, as discussed in @farrington2003 and +@blumberg2013. + +```{r} +library("epichains") +``` + +## Chain likelihoods + +### [`likelihood()`](https://epiverse-trace.github.io/epichains/reference/likelihood.html) + +This function calculates the likelihood/loglikelihood of observing a vector of outbreak summaries obtained from transmission chains. "Outbreak summaries" here refer to transmission chain sizes or lengths/durations. + +`likelihood()` requires a vector of chain summaries (sizes or lengths), +`chains`, the corresponding statistic to calculate, `statistic`, the offspring +distribution, `offspring_dist` and its associated parameters. `offspring_dist` +is specified as the "base name" of the inbuilt distributions in R, for example, +"pois", "nbinom", etc. `likelihood()` also requires `nsim_obs`, which is the +number of simulations to run if the likelihoods do not have a closed-form +solution and must be simulated. This argument will be explained further in +the next section. + +By default, the result is a log-likelihood but if `log = FALSE`, then +likelihoods are returned. + +Let's look at the following example where we estimate the log-likelihood of +observing `chain_sizes`. +```{r} +set.seed(121) +# example of observed chain sizes +# randomly generate 20 chains of size between 1 to 10 +chain_sizes <- sample(1:10, 20, replace = TRUE) +chain_sizes +``` + +```{r} +# estimate loglikelihood of the observed chain sizes +likelihood_eg <- likelihood( + chains = chain_sizes, + statistic = "size", + offspring_dist = "pois", + nsim_obs = 100, + lambda = 0.5 +) +# Print the estimate +likelihood_eg +``` + +### Joint and individual log-likelihoods + +`likelihood()`, by default, returns the joint log-likelihood. If instead, +the individual log-likelihoods are required, then the `individual` argument +must be set to `TRUE`. To return likelihoods instead, set `log = TRUE`. +```{r} +set.seed(121) +# example of observed chain sizes +# randomly generate 20 chains of size between 1 to 10 +chain_sizes <- sample(1:10, 20, replace = TRUE) +chain_sizes +``` + +```{r} +# estimate loglikelihood of the observed chain sizes +likelihood_ind_eg <- likelihood( + chains = chain_sizes, + statistic = "size", + offspring_dist = "pois", + nsim_obs = 100, + lambda = 0.5, + individual = TRUE +) +# Print the estimate +likelihood_ind_eg +``` + +### How `likelihood()` works + +*epichains* ships with functions for the analytical solutions of some +transmission chain "size" and "length" distributions. For the size +distributions, we provide the `poisson`, `negative binomial`, and `gamma-borel` +mixture. For the length distribution, we provide the `poisson` and `geometric` +distributions. These can be used with `likelihood()` based on what is specified +for `offspring_dist` and `statistic`. + +If an analytical solution does not exist, we provide `offspring_ll()`, which +uses simulations to approximate the probability distributions +([using a linear approximation to the cumulative +distribution](https://en.wikipedia.org/wiki/Empirical_distribution_function) +for unobserved sizes/lengths). In this case, an extra argument `nsim_offspring` +must be passed to `likelihood()` to specify the number of simulations to be +used for this approximation. + +For example, let's look at an example where `chain_sizes` is observed and we +want to calculate the likelihood of this being drawn from a binomial +distribution with probability `prob = 0.9`. +```{r} +set.seed(121) +# example of observed chain sizes; randomly generate 20 chains of size 1 to 10 +chain_sizes <- sample(1:10, 20, replace = TRUE) +# get their likelihood +liks <- likelihood( + chains = chain_sizes, + offspring_dist = "binom", + statistic = "size", + size = 1, + prob = 0.9, + nsim_offspring = 250 +) +liks +``` + +### Observation probability + +`likelihood()` uses the argument `obs_prob` to model the observation +probability. + +By default, it assumes perfect observation, where `obs_prob = 1` +(See `?likelihood`), meaning that all transmission events are observed and +recorded in the data. + +If observations are imperfect, the `obs_prob` must be +less than 1. In the case of imperfect observation, "true" chain sizes +or lengths are simulated `nsim_obs` times, and the likelihood calculated for +each of the simulations. + +For example, if the probability of observing each case is `obs_prob = 0.30`, +we use + +```{r} +set.seed(121) +# example of observed chain sizes; randomly generate 20 chains of size 1 to 10 +chain_sizes <- sample(1:10, 20, replace = TRUE) +# get their likelihood +liks <- likelihood( + chains = chain_sizes, + statistic = "size", + offspring_dist = "pois", + obs_prob = 0.3, + lambda = 0.5, + nsim_obs = 10 +) +liks +``` + +This returns `10` likelihood values (because `nsim_obs = 10`), which can be +averaged to come up with an overall likelihood estimate. + +To find out about the usage of the `likelihood()` function, you can run +`?likelihood` to access its `R` help file. + +## Chain simulation + +There are three simulation functions, herein referred to collectively as the `simulate_*()` functions. + +### [`simulate_tree()`](https://epiverse-trace.github.io/epichains/reference/simulate_tree.html) + +`simulate_tree()` simulates an outbreak from a given number of infections. +It retains and returns information on infectors (ancestors), infectees, the generation of infection, and the time, if a serial distribution is specified. + +Let's look at an example where we simulate the transmission trees of $10$ initial infections/chains. We +assume a poisson offspring distribution with mean, $\text{lambda} = 0.9$, and a serial interval of $3$ days: +```{r} +set.seed(123) + +sim_tree_eg <- simulate_tree( + nchains = 10, + statistic = "size", + offspring_dist = "pois", + stat_max = 10, + serials_dist = function(x) 3, + lambda = 0.9 +) + +head(sim_tree_eg) +``` + +### [`simulate_summary()`](https://epiverse-trace.github.io/epichains/reference/simulate_summary.html) + +`simulate_summary()` is basically `simulate_tree()` except that it does not retain +information on each infector and infectee. It returns the eventual size or length/duration of each transmission chain. + +Here is an example to simulate the previous examples without intervention, +returning the size of each of the $10$ chains. It assumes a poisson offspring distribution with +mean of $0.9$. +```{r} +set.seed(123) + +simulate_summary_eg <- simulate_summary( + nchains = 10, + statistic = "size", + offspring_dist = "pois", + stat_max = 10, + lambda = 0.9 +) + +# Print the results +simulate_summary_eg +``` + +### [`simulate_tree_from_pop()`](https://epiverse-trace.github.io/epichains/reference/simulate_tree_from_pop.html) + +`simulate_tree_from_pop()` simulates outbreaks based on a specified population size and pre-existing immunity until the susceptible pool runs out. + +Here is a quick example where we simulate an outbreak in a population of size $1000$. We assume individuals have a poisson offspring distribution with mean, $\text{lambda} = 1$, and serial interval of $3$: +```{r} +set.seed(7) + +sim_tree_from_pop_eg <- simulate_tree_from_pop( + pop = 1000, + offspring_dist = "pois", + lambda = 1, + serials_dist = function(x) {3} + ) + +head(sim_tree_from_pop_eg) +``` + +#### Simulating chains with interventions + +All the `simulate_*()` functions can model interventions that reduce the $R_0$, +using the `intvn_mean_reduction` argument. In general, these can be +interpreted as population-level interventions. + +To illustrate this, we will use the previous examples for each function and specify +a population-level intervention that reduces $R_0$ by $50\%$. + +Using `simulate_tree()`, we can specify an initial number of cases +and a population level intervention, `intvn_mean_reduction`, that reduces $R_0$ by $50\%$. + +```{r} +set.seed(123) + +sim_tree_intvn_eg <- simulate_tree( + nchains = 10, + statistic = "size", + offspring_dist = "pois", + intvn_mean_reduction = 0.5, + stat_max = 10, + serials_dist = function(x) 3, + lambda = 0.9 +) + +head(sim_tree_intvn_eg) +``` + +Here is an example with `simulate_summary()`, modelling an intervention that reduces $R_0$ by $50\%$. +```{r} +simulate_summary_intvn_eg <- simulate_summary( + nchains = 10, + statistic = "size", + offspring_dist = "pois", + intvn_mean_reduction = 0.5, + stat_max = 10, + lambda = 0.9 +) + +# Print the results +simulate_summary_intvn_eg +``` + +Finally, let's use `simulate_tree_from_pop()`. +```{r} +set.seed(7) + +sim_tree_from_pop_intvn_eg <- simulate_tree_from_pop( + pop = 1000, + offspring_dist = "pois", + intvn_mean_reduction = 0.5, + lambda = 1, + serials_dist = function(x) {3} + ) + +head(sim_tree_from_pop_intvn_eg) +``` + +## Other functionalities + +### Summarising + +You can run `summary()` on `` objects to get useful summaries. +```{r include=TRUE,echo=TRUE} +# Example with simulate_tree() +set.seed(123) + +sim_tree_eg <- simulate_tree( + nchains = 10, + statistic = "size", + offspring_dist = "pois", + stat_max = 10, + serials_dist = function(x) 3, + lambda = 0.9 +) + +summary(sim_tree_eg) + +# Example with simulate_summary() +set.seed(123) + +simulate_summary_eg <- simulate_summary( + nchains = 10, + statistic = "size", + offspring_dist = "pois", + stat_max = 10, + lambda = 0.9 +) + +# Get summaries +summary(simulate_summary_eg) +``` + +### Aggregating + +You can aggregate `` objects returned by the `simulate_*()` functions into a time series, which is a `` with columns "cases" and either "generation" or "time", depending on the value of `grouping_var`. + +To aggregate over "time", you must have specified a serial interval distribution in the simulation step. +```{r include=TRUE,echo=TRUE} +# Example with simulate_tree() +set.seed(123) + +sim_tree_eg <- simulate_tree( + nchains = 10, + statistic = "size", + offspring_dist = "pois", + stat_max = 10, + serials_dist = function(x) 3, + lambda = 0.9 +) + +aggregate(sim_tree_eg, grouping_var = "time") +``` + +### Plotting + +Aggregated `` objects can easily be plotted using base R or `ggplot2` with little to no data manipulation. + +Here is an end-to-end example from simulation through aggregation to plotting. +```{r} +# Run simulation with simulate_tree() +set.seed(123) + +sim_tree_eg <- simulate_tree( + nchains = 10, + statistic = "size", + offspring_dist = "pois", + stat_max = 10, + serials_dist = function(x) 3, + lambda = 0.9 +) + +# Aggregate cases over time +sim_aggreg <- aggregate(sim_tree_eg, grouping_var = "time") + +# Plot cases over time +plot(sim_aggreg, type = "b") +``` + +## References