Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor likelihood functions #58

Merged
merged 57 commits into from
Sep 13, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
57 commits
Select commit Hold shift + click to select a range
18336c7
Delete duplicated return value in function doc
jamesmbaazam Sep 4, 2023
f31ee9b
Replace "likelihood" with "log-likelihood" to clarify that the latter…
jamesmbaazam Sep 6, 2023
a5ec3d7
Reword the docs of "offspring_ll"
jamesmbaazam Sep 6, 2023
202956d
Revert to returning log values to comform with function name
jamesmbaazam Sep 6, 2023
74fb973
Remove log argument
jamesmbaazam Sep 6, 2023
5936f31
Explicitly assign arguments to avoid positioning matching
jamesmbaazam Sep 6, 2023
8998bf1
Add a seealso tag and clean up examples
jamesmbaazam Sep 6, 2023
c7cdf8f
Fixed the comment tags in the examples
jamesmbaazam Sep 6, 2023
85e98e9
Give nsim_obs a default value
jamesmbaazam Sep 6, 2023
56c3fd6
Assign lambda a value less than 1 to prevent example outbreak from ov…
jamesmbaazam Sep 6, 2023
9c7adf8
Make documentation of likelihood function more consistent by using "l…
jamesmbaazam Sep 6, 2023
2c81a58
Improve error message
jamesmbaazam Sep 6, 2023
879dc95
Rename a variable
jamesmbaazam Sep 6, 2023
f1e62d4
Replace "likelihood" with "log-likelihood"
jamesmbaazam Sep 6, 2023
8505e33
Lint
jamesmbaazam Sep 6, 2023
54eeac1
Add exp transformation for when log=FALSE
jamesmbaazam Sep 6, 2023
206bc40
Add joint likelihood calculation for where individual=TRUE and depend…
jamesmbaazam Sep 6, 2023
0e61952
Replace "likelihood" with "log-likelihood" to clarify that the latter…
jamesmbaazam Sep 6, 2023
c8a8773
Reword the docs of "offspring_ll"
jamesmbaazam Sep 6, 2023
6c31e44
Add a seealso tag and clean up examples
jamesmbaazam Sep 6, 2023
d86a019
Fixed the comment tags in the examples
jamesmbaazam Sep 6, 2023
263166f
Assign lambda a value less than 1 to prevent example outbreak from ov…
jamesmbaazam Sep 6, 2023
890ff99
Add exp transformation for when log=FALSE
jamesmbaazam Sep 6, 2023
fdcd3fe
Add joint likelihood calculation for where individual=TRUE and depend…
jamesmbaazam Sep 6, 2023
2e4e02c
Linting
jamesmbaazam Sep 6, 2023
e5b4503
Linting
jamesmbaazam Sep 6, 2023
70247ba
Change type coersion to use explicit logical checks
jamesmbaazam Sep 7, 2023
1dd3303
Correct length distribution function doc
jamesmbaazam Sep 7, 2023
34865fb
Generate length distribution function man files
jamesmbaazam Sep 7, 2023
7c96092
Use a bigger vector of chains for example
jamesmbaazam Sep 7, 2023
0090a84
Remove old tests
jamesmbaazam Sep 7, 2023
c78b2d9
Add tests for likelihood function
jamesmbaazam Sep 7, 2023
2328143
Add tests for stat log-likelihood functions
jamesmbaazam Sep 7, 2023
d36f76c
Use the right stat vector for the log-likelihood calculation
jamesmbaazam Sep 7, 2023
b14e0e2
Lint
jamesmbaazam Sep 7, 2023
09b0013
Fix tests
jamesmbaazam Sep 7, 2023
7c6be4c
Validate the log argument
jamesmbaazam Sep 11, 2023
de5800d
Add more tests of the likelihood() function to improve coverage
jamesmbaazam Sep 11, 2023
de97065
Add a test for construct_offspring_ll_name()
jamesmbaazam Sep 11, 2023
c91681f
Fixed linting issues with the tests
jamesmbaazam Sep 11, 2023
bcd542a
Remove default value of nsim_obs argument
jamesmbaazam Sep 12, 2023
1e017b5
Reinstate control to overwrite stat_max when specified as Inf
jamesmbaazam Sep 12, 2023
ce01a87
Inherit dot params from offspring_ll instead of simulate_summary
jamesmbaazam Sep 12, 2023
5c43a84
Add a comment to example
jamesmbaazam Sep 12, 2023
0a4fd60
Replace !isTRUE to isFALSE
jamesmbaazam Sep 12, 2023
c29f0ff
Generate doc for removed default value of nsim_obs
jamesmbaazam Sep 12, 2023
3d4282b
Rename variables
jamesmbaazam Sep 12, 2023
d344a27
Revise documentation of likelihood function
jamesmbaazam Sep 12, 2023
72522fe
Rename chains to x
jamesmbaazam Sep 12, 2023
795e425
Document chains argument due to loss of inheritance
jamesmbaazam Sep 12, 2023
c467946
Linting
jamesmbaazam Sep 12, 2023
a45788c
Revise return value documentation
jamesmbaazam Sep 12, 2023
fe95b8f
Update offspring_ll calls to use x instead of chains
jamesmbaazam Sep 12, 2023
fa8a255
Add more tests
jamesmbaazam Sep 12, 2023
0e65987
Abbreviate code block with ifelse + vapply construct
jamesmbaazam Sep 12, 2023
564e2a9
Assign final value to return the right copy
jamesmbaazam Sep 12, 2023
9e601a1
Replace = with == for consitency.
jamesmbaazam Sep 13, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
117 changes: 75 additions & 42 deletions R/likelihood.R
Original file line number Diff line number Diff line change
@@ -1,30 +1,41 @@
#' Estimate the (log) likelihood for observed branching processes
#' Estimate the log-likelihood/likelihood for observed branching processes
#'
#' @inheritParams offspring_ll
#' @inheritParams simulate_summary
#' @param chains Vector of sizes/lengths of transmission chains.
#' @param nsim_obs Number of simulations if the likelihood is to be
#' approximated for imperfect observations.
#' @param log Logical; Should the results be log-transformed? (Defaults
#' to TRUE).
#' @param chains Vector of chain summaries (sizes/lengths)
#' @param nsim_obs Number of simulations if the log-likelihood/likelihood is to
#' be approximated for imperfect observations.
#' @param log Logical; Should the log-likelihoods be transformed to
jamesmbaazam marked this conversation as resolved.
Show resolved Hide resolved
#' likelihoods? (Defaults to TRUE).
#' @param obs_prob Observation probability (assumed constant)
#' @param exclude A vector of indices of the sizes/lengths to exclude from the
#' likelihood calculation.
#' @param individual If TRUE, a vector of individual (log)likelihood
#' contributions will be returned rather than the sum.
#' @param ... Parameters for the offspring distribution.
jamesmbaazam marked this conversation as resolved.
Show resolved Hide resolved
#' log-likelihood calculation.
#' @param individual If TRUE, a vector of individual log-likelihood/likelihood
#' contributions will be returned rather than the sum/product.
#' @return
#' * A log-likelihood, if \code{log = TRUE} (the default)
jamesmbaazam marked this conversation as resolved.
Show resolved Hide resolved
#' * A vector of log-likelihoods, if \code{log = TRUE} (the default) and
#' \code{obs_prob < 1}, or
#' * A list of individual log-likelihood contributions, if
#' \code{log = TRUE} (the default) and \code{individual = TRUE}.
#' else raw likelihoods, or vector of likelihoods
#' @seealso offspring_ll, pois_size_ll, nbinom_size_ll, gborel_size_ll,
#' pois_length_ll, geom_length_ll.
#' If \code{log = TRUE}
#'
#' * A joint log-likelihood (sum of individual log-likelihoods), if
#' \code{individual == FALSE} (default) and \code{obs_prob == 1} (default), or
#' * A list of individual log-likelihoods, if \code{individual == TRUE} and
#' \code{obs_prob == 1} (default), or
#' * A list of individual log-likelihoods (same length as `nsim_obs`), if
#' \code{individual == TRUE} and \code{0 <= obs_prob < 1}, or
#' * A vector of joint log-likelihoods (same length as `nsim_obs`), if
#' individual == FALSE and \code{0 <= obs_prob < 1} (imperfect observation).
#'
#' If \code{log = FALSE}, the same structure of outputs as above are returned,
#' except that likelihoods, instead of log-likelihoods, are calculated in all
#' cases. Moreover, the joint likelihoods are the product, instead of the sum,
#' of the individual likelihoods.
#' @seealso offspring_ll(), pois_size_ll(), nbinom_size_ll(), gborel_size_ll(),
#' pois_length_ll(), geom_length_ll()
#' @author Sebastian Funk
#' @examples
#' # example of observed chain sizes
#' chain_sizes <- c(1, 1, 4, 7)
#' set.seed(121)
#' # randomly generate 20 chains of size 1 to 10
#' chain_sizes <- sample(1:10, 20, replace = TRUE)
jamesmbaazam marked this conversation as resolved.
Show resolved Hide resolved
#' likelihood(
#' chains = chain_sizes, statistic = "size",
#' offspring_dist = "pois", nsim_obs = 100, lambda = 0.5
Expand All @@ -37,81 +48,103 @@ likelihood <- function(chains, statistic = c("size", "length"), offspring_dist,

## checks
check_offspring_valid(offspring_dist)
checkmate::assert_logical(
log,
any.missing = FALSE,
all.missing = FALSE,
len = 1
)

if (obs_prob <= 0 || obs_prob > 1) stop("'obs_prob' must be within (0,1]")
if (obs_prob <= 0 || obs_prob > 1) {
stop("'obs_prob' is a probability and must be between 0 and 1 inclusive")
}
if (obs_prob < 1) {
if (missing(nsim_obs)) {
stop("'nsim_obs' must be specified if 'obs_prob' is < 1")
}

sample_func <- get_statistic_func(statistic)
statistic_func <- get_statistic_func(statistic)

sampled_x <- replicate(nsim_obs, pmin(
sample_func(
stat_rep_list <- replicate(nsim_obs, pmin(
statistic_func(
length(chains),
chains, obs_prob
),
stat_max
), simplify = FALSE)
size_x <- unlist(sampled_x)
stat_rep_vect <- unlist(stat_rep_list)
if (!is.finite(stat_max)) {
stat_max <- max(size_x) + 1
stat_max <- max(stat_rep_vect) + 1
}
} else {
chains[chains >= stat_max] <- stat_max
size_x <- chains
sampled_x <- list(chains)
stat_rep_vect <- chains
stat_rep_list <- list(chains)
}

## determine for which sizes to calculate the likelihood (for true chain size)
if (any(size_x == stat_max)) {
## determine for which sizes to calculate the log-likelihood
## (for true chain size)
if (any(stat_rep_vect == stat_max)) {
calc_sizes <- seq_len(stat_max - 1)
} else {
calc_sizes <- unique(c(size_x, exclude))
calc_sizes <- unique(c(stat_rep_vect, exclude))
}

## get likelihood function as given by offspring_dist and statistic
## get log-likelihood function as given by offspring_dist and statistic
likelihoods <- vector(mode = "numeric")
ll_func <- construct_offspring_ll_name(offspring_dist, statistic)
pars <- as.list(unlist(list(...))) ## converts vectors to lists

## calculate likelihoods
## calculate log-likelihoods
if (exists(ll_func, where = asNamespace("epichains"), mode = "function")) {
func <- get(ll_func)
likelihoods[calc_sizes] <- do.call(func, c(list(x = calc_sizes), pars))
} else {
likelihoods[calc_sizes] <-
do.call(
offspring_ll,
c(list(
chains = calc_sizes, offspring_dist = offspring_dist,
statistic = statistic, stat_max = stat_max,
log = log
), pars)
c(
list(
x = calc_sizes,
offspring_dist = offspring_dist,
statistic = statistic,
stat_max = stat_max
),
pars
)
)
}

## assign probabilities to stat_max outbreak sizes
if (any(size_x == stat_max)) {
if (any(stat_rep_vect == stat_max)) {
likelihoods[stat_max] <- complementary_logprob(likelihoods)
}

if (!missing(exclude)) {
likelihoods <- likelihoods - log(-expm1(sum(likelihoods[exclude])))
likelihoods[exclude] <- -Inf

sampled_x <- lapply(sampled_x, function(y) {
stat_rep_list <- lapply(stat_rep_list, function(y) {
y[!(y %in% exclude)]
})
}

## assign likelihoods
chains_likelihood <- lapply(sampled_x, function(sx) {
chains_likelihood <- lapply(stat_rep_list, function(sx) {
likelihoods[sx[!(sx %in% exclude)]]
})

if (!individual) {
chains_likelihood <- vapply(chains_likelihood, sum, 0)
## transform log-likelihoods into likelihoods if required
if (isFALSE(log)) {
chains_likelihood <- lapply(chains_likelihood, exp)
}

## if individual == FALSE, return the joint log-likelihood
## (sum of the log-likelihoods), if log == TRUE, else
## multiply the likelihoods
if (isFALSE(individual)) {
summarise_func <- ifelse(log, sum, prod)
chains_likelihood <- vapply(chains_likelihood, summarise_func, 0)
}

return(chains_likelihood)
Expand Down
80 changes: 49 additions & 31 deletions R/stat_likelihoods.R
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
#' Likelihood of the size of chains with Poisson offspring distribution
#' Log-likelihood of the size of chains with Poisson offspring distribution
#'
#' @param x vector of sizes
#' @param lambda rate of the Poisson distribution
Expand All @@ -9,7 +9,7 @@ pois_size_ll <- function(x, lambda) {
(x - 1) * log(lambda) - lambda * x + (x - 2) * log(x) - lgamma(x)
}

#' Likelihood of the size of chains with Negative-Binomial offspring
#' Log-likelihood of the size of chains with Negative-Binomial offspring
#' distribution
#'
#' @param x vector of sizes
Expand All @@ -31,7 +31,7 @@ nbinom_size_ll <- function(x, size, prob, mu) {
(size * x + (x - 1)) * log(1 + mu / size)
}

#' Likelihood of the size of chains with gamma-Borel offspring distribution
#' Log-likelihood of the size of chains with gamma-Borel offspring distribution
#'
#' @param x vector of sizes
#' @param size the dispersion parameter (often called \code{k} in ecological
Expand All @@ -52,9 +52,9 @@ gborel_size_ll <- function(x, size, prob, mu) {
(x - 1) * log(x) - (size + x - 1) * log(x + size / mu)
}

#' Likelihood of the length of chains with Poisson offspring distribution
#' Log-likelihood of the length of chains with Poisson offspring distribution
#'
#' @param x vector of sizes
#' @param x vector of lengths
#' @param lambda rate of the Poisson distribution
#' @return log-likelihood values
#' @author Sebastian Funk
Expand All @@ -70,9 +70,9 @@ pois_length_ll <- function(x, lambda) {
log(Gk[x + 1] - Gk[x])
}

#' Likelihood of the length of chains with geometric offspring distribution
#' Log-likelihood of the length of chains with geometric offspring distribution
#'
#' @param x vector of sizes
#' @param x vector of lengths
#' @param prob probability of the geometric distribution with mean
#' \code{1/prob}
#' @return log-likelihood values
Expand All @@ -86,43 +86,61 @@ geom_length_ll <- function(x, prob) {
log(GkmGkm1)
}

#' Likelihood of the length of chains with generic offspring distribution
#' Log-likelihood of the summary (size/length) of chains with generic offspring
#' distribution
#'
#' The likelihoods are calculated with a crude approximation using simulated
#' chains by linearly approximating any missing values in the empirical
#' The log-likelihoods are calculated with a crude approximation using simulated
#' chain summaries by linearly approximating any missing values in the empirical
#' cumulative distribution function (ecdf).
#' @inheritParams likelihood
#' @inheritParams simulate_vec
#' @param chains Vector of sizes/lengths
#' @inheritParams simulate_summary
#' @param x Vector of chain summaries (sizes/lengths)
#' @param nsim_offspring Number of simulations of the offspring distribution
#' for approximating the statistic (size/length) distribution
#' @param log Logical; Should the results be log-transformed? (Defaults
#' to TRUE).
#' @param ... any parameters to pass to \code{\link{simulate_tree}}
#' @return If \code{log = TRUE} (the default), log-likelihood values,
#' else raw likelihoods
#' for approximating the distribution of the chain statistic summary
#' (size/length)
#' @param ... any parameters to pass to \code{\link{simulate_summary}}
#' @return log-likelihood values
#' @author Sebastian Funk
#' @export
offspring_ll <- function(chains, offspring_dist, statistic,
nsim_offspring = 100, log = TRUE, ...) {
#' @seealso [simulate_summary()] for simulating a summary of the transmission
#' chains statistic (without the tree of infections)
#' @examples
#' set.seed(123)
#' chain_size_ll <- offspring_ll(
#' x = c(1, 5, 6, 8, 7, 8, 10),
#' offspring_dist = "pois",
#' statistic = "size",
#' lambda = 0.82
#' )
offspring_ll <- function(x, offspring_dist, statistic,
nsim_offspring = 100, ...) {
# Simulate the chains
chains <- simulate_summary(
nsim_offspring, offspring_dist,
statistic, ...
dist <- simulate_summary(
nchains = nsim_offspring,
offspring_dist = offspring_dist,
statistic = statistic,
...
)

# Compute the empirical Cumulative Distribution Function of the
# simulated chains
chains_empirical_cdf <- stats::ecdf(chains)
chains_empirical_cdf <- stats::ecdf(dist)

# Perform a lagged linear interpolation of the points
acdf <-
diff(c(0, stats::approx(
unique(chains), chains_empirical_cdf(unique(chains)),
seq_len(max(chains[is.finite(chains)]))
)$y))
lik <- acdf[chains]
acdf <- diff(
c(
0,
stats::approx(
unique(dist),
chains_empirical_cdf(unique(dist)),
seq_len(
max(dist[is.finite(dist)])
)
)$y
)
)
lik <- acdf[x]
lik[is.na(lik)] <- 0
out <- ifelse(base::isTRUE(log), log(lik), lik)
out <- log(lik)
return(out)
}
4 changes: 2 additions & 2 deletions man/gborel_size_ll.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 3 additions & 3 deletions man/geom_length_ll.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading