Skip to content

Commit

Permalink
Add Getting Started content
Browse files Browse the repository at this point in the history
  • Loading branch information
jamesmbaazam committed Oct 9, 2023
1 parent ddd2361 commit 1204b60
Showing 1 changed file with 361 additions and 2 deletions.
363 changes: 361 additions & 2 deletions vignettes/epichains.Rmd
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
title: "Getting started with epichains"
author: "James Azam"
author: "James M. Azam and Sebastian Funk"
output:
bookdown::html_vignette2:
fig_caption: yes
Expand All @@ -27,4 +27,363 @@ knitr::opts_chunk$set(
)
```

WIP
_epichains_ provides methods to analyse and simulate the size and length
of branching processes with an arbitrary offspring distribution. These
can be used, for example, to analyse the distribution of chain sizes
or length of infectious disease outbreaks, as discussed in @farrington2003 and
@blumberg2013.

```{r}
library("epichains")
```

## Chain likelihoods

### [`likelihood()`](https://epiverse-trace.github.io/epichains/reference/likelihood.html)

This function calculates the likelihood/loglikelihood of observing a vector of outbreak summaries obtained from transmission chains. "Outbreak summaries" here refer to transmission chain sizes or lengths/durations.

`likelihood()` requires a vector of chain summaries (sizes or lengths),
`chains`, the corresponding statistic to calculate, `statistic`, the offspring
distribution, `offspring_dist` and its associated parameters. `offspring_dist`
is specified as the "base name" of the inbuilt distributions in R, for example,
"pois", "nbinom", etc. `likelihood()` also requires `nsim_obs`, which is the
number of simulations to run if the likelihoods do not have a closed-form
solution and must be simulated. This argument will be explained further in
the next section.

By default, the result is a log-likelihood but if `log = FALSE`, then
likelihoods are returned.

Let's look at the following example where we estimate the log-likelihood of
observing `chain_sizes`.
```{r}
set.seed(121)
# example of observed chain sizes
# randomly generate 20 chains of size between 1 to 10
chain_sizes <- sample(1:10, 20, replace = TRUE)
chain_sizes
```

```{r}
# estimate loglikelihood of the observed chain sizes
likelihood_eg <- likelihood(
chains = chain_sizes,
statistic = "size",
offspring_dist = "pois",
nsim_obs = 100,
lambda = 0.5
)
# Print the estimate
likelihood_eg
```

### Joint and individual log-likelihoods

`likelihood()`, by default, returns the joint log-likelihood. If instead,
the individual log-likelihoods are required, then the `individual` argument
must be set to `TRUE`. To return likelihoods instead, set `log = TRUE`.
```{r}
set.seed(121)
# example of observed chain sizes
# randomly generate 20 chains of size between 1 to 10
chain_sizes <- sample(1:10, 20, replace = TRUE)
chain_sizes
```

```{r}
# estimate loglikelihood of the observed chain sizes
likelihood_ind_eg <- likelihood(
chains = chain_sizes,
statistic = "size",
offspring_dist = "pois",
nsim_obs = 100,
lambda = 0.5,
individual = TRUE
)
# Print the estimate
likelihood_ind_eg
```

### How `likelihood()` works

*epichains* ships with functions for the analytical solutions of some
transmission chain "size" and "length" distributions. For the size
distributions, we provide the `poisson`, `negative binomial`, and `gamma-borel`
mixture. For the length distribution, we provide the `poisson` and `geometric`
distributions. These can be used with `likelihood()` based on what is specified
for `offspring_dist` and `statistic`.

If an analytical solution does not exist, we provide `offspring_ll()`, which
uses simulations to approximate the probability distributions
([using a linear approximation to the cumulative
distribution](https://en.wikipedia.org/wiki/Empirical_distribution_function)
for unobserved sizes/lengths). In this case, an extra argument `nsim_offspring`
must be passed to `likelihood()` to specify the number of simulations to be
used for this approximation.

For example, let's look at an example where `chain_sizes` is observed and we
want to calculate the likelihood of this being drawn from a binomial
distribution with probability `prob = 0.9`.
```{r}
set.seed(121)
# example of observed chain sizes; randomly generate 20 chains of size 1 to 10
chain_sizes <- sample(1:10, 20, replace = TRUE)
# get their likelihood
liks <- likelihood(
chains = chain_sizes,
offspring_dist = "binom",
statistic = "size",
size = 1,
prob = 0.9,
nsim_offspring = 250
)
liks
```

### Observation probability

`likelihood()` uses the argument `obs_prob` to model the observation
probability.

By default, it assumes perfect observation, where `obs_prob = 1`
(See `?likelihood`), meaning that all transmission events are observed and
recorded in the data.

If observations are imperfect, the `obs_prob` must be
less than 1. In the case of imperfect observation, "true" chain sizes
or lengths are simulated `nsim_obs` times, and the likelihood calculated for
each of the simulations.

For example, if the probability of observing each case is `obs_prob = 0.30`,
we use

```{r}
set.seed(121)
# example of observed chain sizes; randomly generate 20 chains of size 1 to 10
chain_sizes <- sample(1:10, 20, replace = TRUE)
# get their likelihood
liks <- likelihood(
chains = chain_sizes,
statistic = "size",
offspring_dist = "pois",
obs_prob = 0.3,
lambda = 0.5,
nsim_obs = 10
)
liks
```

This returns `10` likelihood values (because `nsim_obs = 10`), which can be
averaged to come up with an overall likelihood estimate.

To find out about the usage of the `likelihood()` function, you can run
`?likelihood` to access its `R` help file.

## Chain simulation

There are three simulation functions, herein referred to collectively as the `simulate_*()` functions.

### [`simulate_tree()`](https://epiverse-trace.github.io/epichains/reference/simulate_tree.html)

`simulate_tree()` simulates an outbreak from a given number of infections.
It retains and returns information on infectors (ancestors), infectees, the generation of infection, and the time, if a serial distribution is specified.

Let's look at an example where we simulate the transmission trees of $10$ initial infections/chains. We
assume a poisson offspring distribution with mean, $\text{lambda} = 0.9$, and a serial interval of $3$ days:
```{r}
set.seed(123)
sim_tree_eg <- simulate_tree(
nchains = 10,
statistic = "size",
offspring_dist = "pois",
stat_max = 10,
serials_dist = function(x) 3,
lambda = 0.9
)
head(sim_tree_eg)
```

### [`simulate_summary()`](https://epiverse-trace.github.io/epichains/reference/simulate_summary.html)

`simulate_summary()` is basically `simulate_tree()` except that it does not retain
information on each infector and infectee. It returns the eventual size or length/duration of each transmission chain.

Here is an example to simulate the previous examples without intervention,
returning the size of each of the $10$ chains. It assumes a poisson offspring distribution with
mean of $0.9$.
```{r}
set.seed(123)
simulate_summary_eg <- simulate_summary(
nchains = 10,
statistic = "size",
offspring_dist = "pois",
stat_max = 10,
lambda = 0.9
)
# Print the results
simulate_summary_eg
```

### [`simulate_tree_from_pop()`](https://epiverse-trace.github.io/epichains/reference/simulate_tree_from_pop.html)

`simulate_tree_from_pop()` simulates outbreaks based on a specified population size and pre-existing immunity until the susceptible pool runs out.

Here is a quick example where we simulate an outbreak in a population of size $1000$. We assume individuals have a poisson offspring distribution with mean, $\text{lambda} = 1$, and serial interval of $3$:
```{r}
set.seed(7)
sim_tree_from_pop_eg <- simulate_tree_from_pop(
pop = 1000,
offspring_dist = "pois",
lambda = 1,
serials_dist = function(x) {3}
)
head(sim_tree_from_pop_eg)
```

#### Simulating chains with interventions

All the `simulate_*()` functions can model interventions that reduce the $R_0$,
using the `intvn_mean_reduction` argument. In general, these can be
interpreted as population-level interventions.

To illustrate this, we will use the previous examples for each function and specify
a population-level intervention that reduces $R_0$ by $50\%$.

Using `simulate_tree()`, we can specify an initial number of cases
and a population level intervention, `intvn_mean_reduction`, that reduces $R_0$ by $50\%$.

```{r}
set.seed(123)
sim_tree_intvn_eg <- simulate_tree(
nchains = 10,
statistic = "size",
offspring_dist = "pois",
intvn_mean_reduction = 0.5,
stat_max = 10,
serials_dist = function(x) 3,
lambda = 0.9
)
head(sim_tree_intvn_eg)
```

Here is an example with `simulate_summary()`, modelling an intervention that reduces $R_0$ by $50\%$.
```{r}
simulate_summary_intvn_eg <- simulate_summary(
nchains = 10,
statistic = "size",
offspring_dist = "pois",
intvn_mean_reduction = 0.5,
stat_max = 10,
lambda = 0.9
)
# Print the results
simulate_summary_intvn_eg
```

Finally, let's use `simulate_tree_from_pop()`.
```{r}
set.seed(7)
sim_tree_from_pop_intvn_eg <- simulate_tree_from_pop(
pop = 1000,
offspring_dist = "pois",
intvn_mean_reduction = 0.5,
lambda = 1,
serials_dist = function(x) {3}
)
head(sim_tree_from_pop_intvn_eg)
```

## Other functionalities

### Summarising

You can run `summary()` on `<epichains>` objects to get useful summaries.
```{r include=TRUE,echo=TRUE}
# Example with simulate_tree()
set.seed(123)
sim_tree_eg <- simulate_tree(
nchains = 10,
statistic = "size",
offspring_dist = "pois",
stat_max = 10,
serials_dist = function(x) 3,
lambda = 0.9
)
summary(sim_tree_eg)
# Example with simulate_summary()
set.seed(123)
simulate_summary_eg <- simulate_summary(
nchains = 10,
statistic = "size",
offspring_dist = "pois",
stat_max = 10,
lambda = 0.9
)
# Get summaries
summary(simulate_summary_eg)
```

### Aggregating

You can aggregate `<epichains>` objects returned by the `simulate_*()` functions into a time series, which is a `<data.frame>` with columns "cases" and either "generation" or "time", depending on the value of `grouping_var`.

To aggregate over "time", you must have specified a serial interval distribution in the simulation step.
```{r include=TRUE,echo=TRUE}
# Example with simulate_tree()
set.seed(123)
sim_tree_eg <- simulate_tree(
nchains = 10,
statistic = "size",
offspring_dist = "pois",
stat_max = 10,
serials_dist = function(x) 3,
lambda = 0.9
)
aggregate(sim_tree_eg, grouping_var = "time")
```

### Plotting

Aggregated `<epichains>` objects can easily be plotted using base R or `ggplot2` with little to no data manipulation.

Here is an end-to-end example from simulation through aggregation to plotting.
```{r}
# Run simulation with simulate_tree()
set.seed(123)
sim_tree_eg <- simulate_tree(
nchains = 10,
statistic = "size",
offspring_dist = "pois",
stat_max = 10,
serials_dist = function(x) 3,
lambda = 0.9
)
# Aggregate cases over time
sim_aggreg <- aggregate(sim_tree_eg, grouping_var = "time")
# Plot cases over time
plot(sim_aggreg, type = "b")
```

## References

0 comments on commit 1204b60

Please sign in to comment.