Skip to content

Commit

Permalink
Closes #2142 Supersede get_summary_records() and enhance `derive_su…
Browse files Browse the repository at this point in the history
…mmary_records()` (#2158)

* feat: #2142 intiial superseding step

* rough draft without missing_values implementation

* rough draft of new enhancement

* feat: #2142 get a good clean slate

* feat: #2142 working enhanced function

* feat: #2142 add appropriate test suite

* feat: #2142 run styler, lintr, add news and roxygen documentation

* chore: #2142 spelling/grammar

* chore: #2142 fix test

* should we inform superseded

* retain deprecated arguments to pass cicd

* chore: #2142 add remotes for admiraldev for proper branching strategy

* min dev versioning

* Update DESCRIPTION

* chore: #2142 address feedback

* upversion our description page to match current version up on github

* docs: little note for running website versions

* feat: #2142 rename filter to filter_add

* Update R/derive_summary_records.R

Co-authored-by: Ben Straub <[email protected]>

* feat: #2142 get checks appropriately running

* roxygen stuff and vignettes

* chore: #2142 roxygen stuff

* get past check-templates

* finally get past templates

* feat: #2142 clear up missing_values usage

* chore: #2142 rename filter to filter_add internally in codebase too

* chore: #2142 adopt and address all other feedback

* missed a renaming

* feat: #2142 remove extra fluff for missing values

* update news blurb

* chore: #2142 update documentation based on feedback

---------

Co-authored-by: Zelos Zhu <[email protected]>
Co-authored-by: Daniel Sjoberg <[email protected]>
Co-authored-by: Jerry Johnson <[email protected]>
Co-authored-by: Ben Straub <[email protected]>
  • Loading branch information
5 people authored Nov 9, 2023
1 parent 708d232 commit 205e0d1
Show file tree
Hide file tree
Showing 32 changed files with 439 additions and 104 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -38,8 +38,9 @@ LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Depends: R (>= 4.0)
Remotes: pharmaverse/admiraldev
Imports:
admiraldev (>= 0.4.0),
admiraldev (>= 0.5.0.9000),
dplyr (>= 0.8.4),
hms (>= 0.5.3),
lifecycle (>= 0.1.0),
Expand Down
7 changes: 7 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,10 @@ character vector (`'--DTC'`), was imputed. (#2146)
were enhanced such that more than one summary variable can be derived, e.g.,
`AVAL` as the sum and `ADT` as the maximum of the contributing records. (#1792)

- `derive_summary_records()` was enhanced with the following optional arguments: `dataset_add`, `dataset_ref`, `missing_values`. These arguments respectively, generate summary variables from additional datasets, retain/add specific records from a reference dataset, and impute user-defined missing values. `derive_param_exposure()` was enhanced with `dataset_add` as well. (#2142)

- The argument `dataset` is now optional for `derive_summary_records()` and `derive_param_exposure()`. (#2142)

- The "joined" functions (`derive_vars_joined()`, `derive_var_joined_exist_flag()`,
`filter_joined()`, and `event_joined()`) were unified: (#2126)
- The `dataset_add` and `filter_add` arguments were added to
Expand All @@ -37,6 +41,7 @@ were enhanced such that more than one summary variable can be derived, e.g.,
allow more control of the selection of records. It creates a temporary variable
for the event number, which can be used in `order`. (#2140)


## Breaking Changes

- `derive_extreme_records()` the `dataset_add` argument is now mandatory. (#2139)
Expand All @@ -45,6 +50,8 @@ for the event number, which can be used in `order`. (#2140)
`analysis_var` and `summary_fun` were deprecated in favor of `set_values_to`.
(#1792)

- In `derive_summary_records()` and `derive_param_exposure()` the argument `filter` was renamed to `filter_add` (#2142)

- In `derive_var_merged_summary()` the arguments `new_var`, `analysis_var`, and
`summary_fun` were deprecated in favor of `new_vars`. (#1792)

Expand Down
70 changes: 56 additions & 14 deletions R/derive_param_exposure.R
Original file line number Diff line number Diff line change
Expand Up @@ -4,17 +4,43 @@
#' start (`ASTDT(M)`)and end date (`AENDT(M)`) as the minimum and maximum date by `by_vars`.
#'
#' @param dataset
#' `r roxygen_param_dataset(expected_vars = c("by_vars", "analysis_var"))`
#' `PARAMCD` is expected as well,
#' + Either `ASTDTM` and `AENDTM` or `ASTDT` and `AENDT` are also expected.
#' `r roxygen_param_dataset(expected_vars = c("by_vars"))`
#'
#' @param filter Filter condition
#' @param dataset_add Additional dataset
#'
#' The specified condition is applied to the input dataset before deriving the
#' new parameter, i.e., only observations fulfilling the condition are taken
#' into account.
#' The variables specified for `by_vars`, `analysis_var`, `PARAMCD`,
#' alongside either `ASTDTM` and `AENDTM` or `ASTDT` and `AENDT` are also expected.
#' Observations from the specified dataset are going to be used to calculate and added
#' as new records to the input dataset (`dataset`).
#'
#' *Permitted Values:* a condition
#'
#' @param filter
#'
#' `r lifecycle::badge("deprecated")` Please use `filter_add` instead.
#'
#' Filter condition as logical expression to apply during
#' summary calculation. By default, filtering expressions are computed within
#' `by_vars` as this will help when an aggregating, lagging, or ranking
#' function is involved.
#'
#' For example,
#'
#' + `filter = (AVAL > mean(AVAL, na.rm = TRUE))` will filter all `AVAL`
#' values greater than mean of `AVAL` with in `by_vars`.
#' + `filter = (dplyr::n() > 2)` will filter n count of `by_vars` greater
#' than 2.
#'
#' @param filter_add Filter condition as logical expression to apply during
#' summary calculation. By default, filtering expressions are computed within
#' `by_vars` as this will help when an aggregating, lagging, or ranking
#' function is involved.
#'
#' For example,
#'
#' + `filter_add = (AVAL > mean(AVAL, na.rm = TRUE))` will filter all `AVAL`
#' values greater than mean of `AVAL` with in `by_vars`.
#' + `filter_add = (dplyr::n() > 2)` will filter n count of `by_vars` greater
#' than 2.
#'
#' @param input_code Required parameter code
#'
Expand Down Expand Up @@ -95,6 +121,7 @@
#' # Cumulative dose
#' adex %>%
#' derive_param_exposure(
#' dataset_add = adex,
#' by_vars = exprs(USUBJID),
#' set_values_to = exprs(PARAMCD = "TDOSE", PARCAT1 = "OVERALL"),
#' input_code = "DOSE",
Expand All @@ -106,6 +133,7 @@
#' # average dose in w2-24
#' adex %>%
#' derive_param_exposure(
#' dataset_add = adex,
#' by_vars = exprs(USUBJID),
#' filter = VISIT %in% c("WEEK 2", "WEEK 24"),
#' set_values_to = exprs(PARAMCD = "AVDW224", PARCAT1 = "WEEK2-24"),
Expand All @@ -118,19 +146,22 @@
#' # Any dose adjustment?
#' adex %>%
#' derive_param_exposure(
#' dataset_add = adex,
#' by_vars = exprs(USUBJID),
#' set_values_to = exprs(PARAMCD = "TADJ", PARCAT1 = "OVERALL"),
#' input_code = "ADJ",
#' analysis_var = AVALC,
#' summary_fun = function(x) if_else(sum(!is.na(x)) > 0, "Y", NA_character_)
#' ) %>%
#' select(-ASTDTM, -AENDTM)
derive_param_exposure <- function(dataset,
derive_param_exposure <- function(dataset = NULL,
dataset_add,
by_vars,
input_code,
analysis_var,
summary_fun,
filter = NULL,
filter_add = NULL,
set_values_to = NULL) {
by_vars <- assert_vars(by_vars)
analysis_var <- assert_symbol(enexpr(analysis_var))
Expand All @@ -155,25 +186,36 @@ derive_param_exposure <- function(dataset,
)
}

assert_data_frame(dataset,
assert_data_frame(dataset, required_vars = by_vars, optional = TRUE)
assert_data_frame(dataset_add,
required_vars = expr_c(by_vars, analysis_var, exprs(PARAMCD), dates)
)
filter <- assert_filter_cond(enexpr(filter), optional = TRUE)

if (!missing(filter)) {
deprecate_warn(
"1.0.0",
I("derive_param_exposure(filter = )"),
"derive_param_exposure(filter_add = )"
)
filter_add <- assert_filter_cond(enexpr(filter), optional = TRUE)
}
filter_add <- assert_filter_cond(enexpr(filter_add), optional = TRUE)
assert_varval_list(set_values_to, required_elements = "PARAMCD")
assert_param_does_not_exist(dataset, set_values_to$PARAMCD)
assert_character_scalar(input_code)
params_available <- unique(dataset$PARAMCD)
assert_character_vector(input_code, values = params_available)
assert_s3_class(summary_fun, "function")

if (is.null(filter)) {
filter <- TRUE
if (is.null(filter_add)) {
filter_add <- TRUE
}

derive_summary_records(
dataset,
dataset_add,
by_vars = by_vars,
filter = PARAMCD == !!input_code & !!filter,
filter_add = PARAMCD == !!input_code & !!filter_add,
set_values_to = exprs(
!!analysis_var := {{ summary_fun }}(!!analysis_var),
!!!set_dtm,
Expand Down
132 changes: 116 additions & 16 deletions R/derive_summary_records.R
Original file line number Diff line number Diff line change
Expand Up @@ -11,13 +11,29 @@
#' retain those common values in the newly derived records. Otherwise new value
#' will be set to `NA`.
#'
#' @param dataset `r roxygen_param_dataset(expected_vars = c("by_vars", "analysis_var"))`
#' @param dataset `r roxygen_param_dataset(expected_vars = c("by_vars"))`
#'
#' @param dataset_add Additional dataset
#'
#' The variables specified for `by_vars` are expected.
#' Observations from the specified dataset are going to be used to calculate and added
#' as new records to the input dataset (`dataset`).
#'
#' @param dataset_ref Reference dataset
#'
#' The variables specified for `by_vars` are expected. For each
#' observation of the specified dataset a new observation is added to the
#' input dataset.
#'
#' @param by_vars Variables to consider for generation of groupwise summary
#' records. Providing the names of variables in [exprs()] will create a
#' groupwise summary and generate summary records for the specified groups.
#'
#' @param filter Filter condition as logical expression to apply during
#' @param filter
#'
#' `r lifecycle::badge("deprecated")` Please use `filter_add` instead.
#'
#' Filter condition as logical expression to apply during
#' summary calculation. By default, filtering expressions are computed within
#' `by_vars` as this will help when an aggregating, lagging, or ranking
#' function is involved.
Expand All @@ -29,6 +45,46 @@
#' + `filter = (dplyr::n() > 2)` will filter n count of `by_vars` greater
#' than 2.
#'
#' @param filter_add Filter condition as logical expression to apply during
#' summary calculation. By default, filtering expressions are computed within
#' `by_vars` as this will help when an aggregating, lagging, or ranking
#' function is involved.
#'
#' For example,
#'
#' + `filter_add = (AVAL > mean(AVAL, na.rm = TRUE))` will filter all `AVAL`
#' values greater than mean of `AVAL` with in `by_vars`.
#' + `filter_add = (dplyr::n() > 2)` will filter n count of `by_vars` greater
#' than 2.
#'
#' @param set_values_to Variables to be set
#'
#' The specified variables are set to the specified values for the new
#' observations.
#'
#' Set a list of variables to some specified value for the new records
#' + LHS refer to a variable.
#' + RHS refers to the values to set to the variable. This can be a string, a
#' symbol, a numeric value, an expression or NA. If summary functions are
#' used, the values are summarized by the variables specified for `by_vars`.
#'
#' For example:
#' ```
#' set_values_to = exprs(
#' AVAL = sum(AVAL),
#' DTYPE = "AVERAGE",
#' )
#' ```
#'
#' @param missing_values Values for missing summary values
#'
#' For observations of the reference dataset (`dataset_ref`) which do not have a
#' complete mapping defined by the summarization defined in `set_values_to`. Only variables
#' specified for `set_values_to` can be specified for `missing_values`.
#'
#' *Permitted Values*: named list of expressions, e.g.,
#' `exprs(AVAL = -9999)`
#'
#' @inheritParams get_summary_records
#'
#' @return A data frame with derived records appended to original dataset.
Expand Down Expand Up @@ -72,6 +128,7 @@
#' # Summarize the average of the triplicate ECG interval values (AVAL)
#' derive_summary_records(
#' adeg,
#' dataset_add = adeg,
#' by_vars = exprs(USUBJID, PARAM, AVISIT),
#' set_values_to = exprs(
#' AVAL = mean(AVAL, na.rm = TRUE),
Expand All @@ -83,6 +140,7 @@
#' # Derive more than one summary variable
#' derive_summary_records(
#' adeg,
#' dataset_add = adeg,
#' by_vars = exprs(USUBJID, PARAM, AVISIT),
#' set_values_to = exprs(
#' AVAL = mean(AVAL),
Expand Down Expand Up @@ -116,27 +174,36 @@
#' # by group
#' derive_summary_records(
#' adeg,
#' dataset_add = adeg,
#' by_vars = exprs(USUBJID, PARAM, AVISIT),
#' filter = n() > 2,
#' filter_add = n() > 2,
#' set_values_to = exprs(
#' AVAL = mean(AVAL, na.rm = TRUE),
#' DTYPE = "AVERAGE"
#' )
#' ) %>%
#' arrange(USUBJID, AVISIT)
derive_summary_records <- function(dataset,
derive_summary_records <- function(dataset = NULL,
dataset_add,
dataset_ref = NULL,
by_vars,
filter = NULL,
filter_add = NULL,
analysis_var,
summary_fun,
set_values_to) {
set_values_to,
missing_values = NULL) {
assert_vars(by_vars)
filter <- assert_filter_cond(enexpr(filter), optional = TRUE)
assert_data_frame(dataset, required_vars = by_vars, optional = TRUE)
assert_data_frame(dataset_add, required_vars = by_vars)
assert_data_frame(
dataset,
required_vars = by_vars
dataset_ref,
required_vars = by_vars,
optional = TRUE
)

assert_varval_list(set_values_to)
assert_expr_list(missing_values, named = TRUE, optional = TRUE)

if (!missing(analysis_var) || !missing(summary_fun)) {
deprecate_warn(
Expand All @@ -149,14 +216,47 @@ derive_summary_records <- function(dataset,
set_values_to <- exprs(!!analysis_var := {{ summary_fun }}(!!analysis_var), !!!set_values_to)
}

# Summarise the analysis value and bind to the original dataset
bind_rows(
dataset,
get_summary_records(
dataset,
by_vars = by_vars,
filter = !!filter,
set_values_to = set_values_to
if (!missing(filter)) {
deprecate_warn(
"1.0.0",
I("derive_summary_records(filter = )"),
"derive_summary_records(filter_add = )"
)
filter_add <- assert_filter_cond(enexpr(filter), optional = TRUE)
}
filter_add <- assert_filter_cond(enexpr(filter_add), optional = TRUE)

summary_records <- dataset_add %>%
group_by(!!!by_vars) %>%
filter_if(filter_add) %>%
summarise(!!!set_values_to) %>%
ungroup()

df_return <- bind_rows(
dataset,
summary_records
)

if (!is.null(dataset_ref)) {
add_vars <- colnames(dataset_add)
ref_vars <- colnames(dataset_ref)

new_ref_obs <- anti_join(
select(dataset_ref, intersect(add_vars, ref_vars)),
select(summary_records, !!!by_vars),
by = map_chr(by_vars, as_name)
)

if (!is.null(missing_values)) {
new_ref_obs <- new_ref_obs %>%
mutate(!!!missing_values)
}

df_return <- bind_rows(
df_return,
new_ref_obs
)
}

df_return
}
Loading

0 comments on commit 205e0d1

Please sign in to comment.