Closes #2142 Supersede get_summary_records() and enhance derive_summary_records() #2158

Merged
merged 43 commits into from
Nov 9, 2023
Changes from 39 commits
Commits
1b7e9ab
feat: #2142 intiial superseding step
Oct 6, 2023
c06bc8c
Merge branch 'main' into 2142_derive_summary_records_get_summary_reco…
zdz2101 Oct 6, 2023
6050dc9
rough draft without missing_values implementation
Oct 6, 2023
13cbdba
Merge branch '2142_derive_summary_records_get_summary_records_mods' o…
Oct 6, 2023
836cb0a
rough draft of new enhancement
Oct 6, 2023
e5a01a3
Merge branch 'main' into 2142_derive_summary_records_get_summary_reco…
zdz2101 Oct 13, 2023
0e6ec6c
feat: #2142 get a good clean slate
Oct 24, 2023
70836b5
Merge branch 'main' into 2142_derive_summary_records_get_summary_reco…
zdz2101 Oct 24, 2023
346f595
feat: #2142 working enhanced function
Oct 26, 2023
19001b5
Merge branch 'main' into 2142_derive_summary_records_get_summary_reco…
zdz2101 Oct 26, 2023
5a6263e
feat: #2142 add appropriate test suite
Oct 26, 2023
843d852
feat: #2142 run styler, lintr, add news and roxygen documentation
Oct 26, 2023
2ea2c50
chore: #2142 spelling/grammar
Oct 26, 2023
252e3c2
chore: #2142 fix test
Oct 26, 2023
bf9470f
should we inform superseded
Oct 26, 2023
1adf60c
retain deprecated arguments to pass cicd
Oct 26, 2023
7ba57b6
chore: #2142 add remotes for admiraldev for proper branching strategy
Oct 27, 2023
fc7e5f3
min dev versioning
Oct 27, 2023
ef2ac94
Update DESCRIPTION
ddsjoberg Oct 27, 2023
4210f7e
Merge branch 'main' into 2142_derive_summary_records_get_summary_reco…
zdz2101 Oct 31, 2023
053a419
chore: #2142 address feedback
Oct 31, 2023
0b4ef60
upversion our description page to match current version up on github
jerryekohe Oct 31, 2023
b61cc15
Merge branch 'main' into 2142_derive_summary_records_get_summary_reco…
zdz2101 Nov 2, 2023
d27bba1
Merge branch 'main' into 2142_derive_summary_records_get_summary_reco…
bms63 Nov 2, 2023
6ee8500
docs: little note for running website versions
bms63 Nov 2, 2023
19ced77
feat: #2142 rename filter to filter_add
zdz2101 Nov 3, 2023
48cbbc6
Update R/derive_summary_records.R
zdz2101 Nov 3, 2023
53bdf63
feat: #2142 get checks appropriately running
zdz2101 Nov 3, 2023
affe938
Merge branch '2142_derive_summary_records_get_summary_records_mods' o…
zdz2101 Nov 3, 2023
a15e58a
roxygen stuff and vignettes
zdz2101 Nov 3, 2023
00ae97a
Merge branch 'main' into 2142_derive_summary_records_get_summary_reco…
zdz2101 Nov 3, 2023
5c690f7
chore: #2142 roxygen stuff
zdz2101 Nov 3, 2023
01bc54d
get past check-templates
zdz2101 Nov 3, 2023
0f7a8ab
finally get past templates
zdz2101 Nov 3, 2023
eaf9892
feat: #2142 clear up missing_values usage
zdz2101 Nov 6, 2023
5c3748a
chore: #2142 rename filter to filter_add internally in codebase too
zdz2101 Nov 6, 2023
89a673e
chore: #2142 adopt and address all other feedback
zdz2101 Nov 6, 2023
3794c1d
Merge branch 'main' into 2142_derive_summary_records_get_summary_reco…
zdz2101 Nov 6, 2023
85f4f9e
missed a renaming
zdz2101 Nov 6, 2023
b3fc993
feat: #2142 remove extra fluff for missing values
zdz2101 Nov 7, 2023
2f24282
update news blurb
zdz2101 Nov 7, 2023
b6d61b6
Merge branch 'main' into 2142_derive_summary_records_get_summary_reco…
zdz2101 Nov 7, 2023
a0dc08a
chore: #2142 update documentation based on feedback
zdz2101 Nov 8, 2023
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -38,8 +38,9 @@ LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Depends: R (>= 4.0)
Remotes: pharmaverse/admiraldev
Imports:
admiraldev (>= 0.4.0),
admiraldev (>= 0.5.0.9000),
dplyr (>= 0.8.4),
hms (>= 0.5.3),
lifecycle (>= 0.1.0),
5 changes: 5 additions & 0 deletions NEWS.md
@@ -17,6 +17,8 @@ character vector (`'--DTC'`), was imputed. (#2146)
were enhanced such that more than one summary variable can be derived, e.g.,
`AVAL` as the sum and `ADT` as the maximum of the contributing records. (#1792)

- `derive_summary_records()` was enhanced with the following optional arguments: `dataset_add`, `dataset_ref`, `missing_values`. Respectively, these arguments generate summary variables from additional datasets, retain/add specific records from a reference dataset, and impute user-defined missing values. (#2142)

- The "joined" functions (`derive_vars_joined()`, `derive_var_joined_exist_flag()`,
`filter_joined()`, and `event_joined()`) were unified: (#2126)
- The `dataset_add` and `filter_add` arguments were added to
@@ -37,6 +39,7 @@ were enhanced such that more than one summary variable can be derived, e.g.,
allow more control of the selection of records. It creates a temporary variable
for the event number, which can be used in `order`. (#2140)


## Breaking Changes

- `derive_extreme_records()` the `dataset_add` argument is now mandatory. (#2139)
@@ -45,6 +48,8 @@ for the event number, which can be used in `order`. (#2140)
`analysis_var` and `summary_fun` were deprecated in favor of `set_values_to`.
(#1792)

- In `derive_summary_records()` and `derive_param_exposure()` the argument `filter` was renamed to `filter_add`. (#2142)

- In `derive_var_merged_summary()` the arguments `new_var`, `analysis_var`, and
`summary_fun` were deprecated in favor of `new_vars`. (#1792)
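
As a rough illustration of what the `dataset_ref` and `missing_values` arguments described in the NEWS entry above are meant to do, the following dplyr-only sketch adds a record for each by-group that appears in a reference dataset but has no summary record, then fills its summary value with a user-defined value. The toy data and the helper column `is_ref_only` are hypothetical, and this is not the admiral implementation itself.

```r
library(dplyr)

# Hypothetical summary records (one per subject that had data to summarise)
summary_records <- data.frame(
  USUBJID = c("01", "02"),
  AVAL = c(393, 406)
)

# Hypothetical reference dataset listing every subject that should get a record
dataset_ref <- data.frame(USUBJID = c("01", "02", "03"))

# By-groups present in the reference dataset but without a summary record;
# `is_ref_only` is a hypothetical temporary marker column
new_ref_obs <- anti_join(dataset_ref, summary_records, by = "USUBJID") %>%
  mutate(is_ref_only = TRUE)

# Append the reference-only records and impute the user-defined value
# (here -9999) where the summary value is missing
result <- bind_rows(summary_records, new_ref_obs) %>%
  mutate(
    AVAL = if_else(is.na(AVAL) & coalesce(is_ref_only, FALSE), -9999, AVAL)
  ) %>%
  select(-is_ref_only)
```

Restricting the imputation to the marked records mirrors the idea in the diff of only touching rows that came from the reference dataset, so genuine `NA` summaries are left alone.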

63 changes: 53 additions & 10 deletions R/derive_param_exposure.R
@@ -8,13 +8,40 @@
#' `PARAMCD` is expected as well,
#' + Either `ASTDTM` and `AENDTM` or `ASTDT` and `AENDT` are also expected.
#'
#' @param filter Filter condition
#' @param dataset_add Additional dataset
#'
#' The specified condition is applied to the input dataset before deriving the
#' new parameter, i.e., only observations fulfilling the condition are taken
#' into account.
#' The variables specified for `by_vars` are expected.
#' Observations from the specified dataset are used to calculate the summary
#' values, which are added as new records to the input dataset (`dataset`).
#'
#' *Permitted Values:* a condition
#'
#' @param filter
#'
#' `r lifecycle::badge("deprecated")` Please use `filter_add` instead.
#'
#' Filter condition as logical expression to apply during
#' summary calculation. By default, filtering expressions are computed within
#' `by_vars` as this will help when an aggregating, lagging, or ranking
#' function is involved.
#'
#' For example,
#'
#' + `filter = (AVAL > mean(AVAL, na.rm = TRUE))` will filter all `AVAL`
#' values greater than the mean of `AVAL` within `by_vars`.
#' + `filter = (dplyr::n() > 2)` will filter n count of `by_vars` greater
#' than 2.
#'
#' @param filter_add Filter condition as logical expression to apply during
#' summary calculation. By default, filtering expressions are computed within
#' `by_vars` as this will help when an aggregating, lagging, or ranking
#' function is involved.
#'
#' For example,
#'
#' + `filter_add = (AVAL > mean(AVAL, na.rm = TRUE))` will filter all `AVAL`
#' values greater than the mean of `AVAL` within `by_vars`.
#' + `filter_add = (dplyr::n() > 2)` will filter n count of `by_vars` greater
#' than 2.
#'
#' @param input_code Required parameter code
#'
@@ -95,6 +122,7 @@
#' # Cumulative dose
#' adex %>%
#' derive_param_exposure(
#' dataset_add = adex,
#' by_vars = exprs(USUBJID),
#' set_values_to = exprs(PARAMCD = "TDOSE", PARCAT1 = "OVERALL"),
#' input_code = "DOSE",
@@ -106,6 +134,7 @@
#' # average dose in w2-24
#' adex %>%
#' derive_param_exposure(
#' dataset_add = adex,
#' by_vars = exprs(USUBJID),
#' filter = VISIT %in% c("WEEK 2", "WEEK 24"),
#' set_values_to = exprs(PARAMCD = "AVDW224", PARCAT1 = "WEEK2-24"),
@@ -118,19 +147,22 @@
#' # Any dose adjustment?
#' adex %>%
#' derive_param_exposure(
#' dataset_add = adex,
#' by_vars = exprs(USUBJID),
#' set_values_to = exprs(PARAMCD = "TADJ", PARCAT1 = "OVERALL"),
#' input_code = "ADJ",
#' analysis_var = AVALC,
#' summary_fun = function(x) if_else(sum(!is.na(x)) > 0, "Y", NA_character_)
#' ) %>%
#' select(-ASTDTM, -AENDTM)
derive_param_exposure <- function(dataset,
derive_param_exposure <- function(dataset = NULL,
dataset_add,
by_vars,
input_code,
analysis_var,
summary_fun,
filter = NULL,
filter_add = NULL,
set_values_to = NULL) {
by_vars <- assert_vars(by_vars)
analysis_var <- assert_symbol(enexpr(analysis_var))
@@ -158,22 +190,33 @@ derive_param_exposure <- function(dataset,
assert_data_frame(dataset,
required_vars = expr_c(by_vars, analysis_var, exprs(PARAMCD), dates)
)
filter <- assert_filter_cond(enexpr(filter), optional = TRUE)
assert_data_frame(dataset_add, required_vars = by_vars)

if (!missing(filter)) {
deprecate_warn(
"1.0.0",
I("derive_param_exposure(filter = )"),
"derive_param_exposure(filter_add = )"
)
filter_add <- assert_filter_cond(enexpr(filter), optional = TRUE)
}
filter_add <- assert_filter_cond(enexpr(filter_add), optional = TRUE)
assert_varval_list(set_values_to, required_elements = "PARAMCD")
assert_param_does_not_exist(dataset, set_values_to$PARAMCD)
assert_character_scalar(input_code)
params_available <- unique(dataset$PARAMCD)
assert_character_vector(input_code, values = params_available)
assert_s3_class(summary_fun, "function")

if (is.null(filter)) {
filter <- TRUE
if (is.null(filter_add)) {
filter_add <- TRUE
}

derive_summary_records(
dataset,
dataset_add,
by_vars = by_vars,
filter = PARAMCD == !!input_code & !!filter,
filter_add = PARAMCD == !!input_code & !!filter_add,
set_values_to = exprs(
!!analysis_var := {{ summary_fun }}(!!analysis_var),
!!!set_dtm,
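
The `filter` to `filter_add` rename in the diff above follows a common pattern: keep the old argument, warn when it is used, and forward its expression to the new argument. A minimal sketch of that pattern is shown below. The function `filter_sketch()` is hypothetical, and base `warning()` stands in for `lifecycle::deprecate_warn()`; both expressions are captured before anything is reassigned so that neither capture depends on the other.

```r
library(rlang)

# Hypothetical helper, not part of admiral: filters `data` by a condition
# supplied through `filter_add`, still accepting the deprecated `filter`
filter_sketch <- function(data, filter = NULL, filter_add = NULL) {
  # Capture both unevaluated expressions up front
  filter_expr <- enexpr(filter)
  filter_add_expr <- enexpr(filter_add)

  # Forward the deprecated argument to the new one, with a warning
  if (!is.null(filter_expr)) {
    warning("`filter` is deprecated; use `filter_add` instead.", call. = FALSE)
    filter_add_expr <- filter_expr
  }

  # No condition supplied: return the data unchanged
  if (is.null(filter_add_expr)) {
    return(data)
  }

  # Evaluate the condition with the data columns in scope
  data[eval_tidy(filter_add_expr, data = data), , drop = FALSE]
}

df <- data.frame(x = 1:3)
new_way <- filter_sketch(df, filter_add = x > 1)
old_way <- suppressWarnings(filter_sketch(df, filter = x > 1))
```

Both calls select the same rows; only the deprecated spelling emits a warning.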
142 changes: 127 additions & 15 deletions R/derive_summary_records.R
@@ -11,13 +11,29 @@
#' retain those common values in the newly derived records. Otherwise new value
#' will be set to `NA`.
#'
#' @param dataset `r roxygen_param_dataset(expected_vars = c("by_vars", "analysis_var"))`
#' @param dataset `r roxygen_param_dataset(expected_vars = c("by_vars"))`
#'
#' @param dataset_add Additional dataset
#'
#' The variables specified for `by_vars` are expected.
#' Observations from the specified dataset are used to calculate the summary
#' values, which are added as new records to the input dataset (`dataset`).
#'
#' @param dataset_ref Reference dataset
#'
#' The variables specified for `by_vars` are expected. For each
#' observation of the specified dataset a new observation is added to the
#' input dataset.
#'
#' @param by_vars Variables to consider for generation of groupwise summary
#' records. Providing the names of variables in [exprs()] will create a
#' groupwise summary and generate summary records for the specified groups.
#'
#' @param filter Filter condition as logical expression to apply during
#' @param filter
#'
#' `r lifecycle::badge("deprecated")` Please use `filter_add` instead.
#'
#' Filter condition as logical expression to apply during
#' summary calculation. By default, filtering expressions are computed within
#' `by_vars` as this will help when an aggregating, lagging, or ranking
#' function is involved.
@@ -29,6 +45,46 @@
#' + `filter = (dplyr::n() > 2)` will filter n count of `by_vars` greater
#' than 2.
#'
#' @param filter_add Filter condition as logical expression to apply during
#' summary calculation. By default, filtering expressions are computed within
#' `by_vars` as this will help when an aggregating, lagging, or ranking
#' function is involved.
#'
#' For example,
#'
#' + `filter_add = (AVAL > mean(AVAL, na.rm = TRUE))` will filter all `AVAL`
#' values greater than the mean of `AVAL` within `by_vars`.
#' + `filter_add = (dplyr::n() > 2)` will filter n count of `by_vars` greater
#' than 2.
#'
#' @param set_values_to Variables to be set
#'
#' The specified variables are set to the specified values for the new
#' observations.
#'
#' Set a list of variables to some specified value for the new records
#' + LHS refers to a variable.
#' + RHS refers to the values to set to the variable. This can be a string, a
#' symbol, a numeric value, an expression or NA. If summary functions are
#' used, the values are summarized by the variables specified for `by_vars`.
#'
#' For example:
#' ```
#' set_values_to = exprs(
#' AVAL = sum(AVAL),
#' DTYPE = "AVERAGE",
#' )
#' ```
#'
#' @param missing_values Values for missing summary values
#'
#' For observations of the input dataset (`dataset`) or the additional dataset
#' (`dataset_add`) which do not have a complete mapping defined by the
#' summarization in `set_values_to`, the specified values are used. Only variables
#' specified for `set_values_to` can be specified for `missing_values`.
#'
#' *Permitted Values*: named list of expressions, e.g.,
#' `exprs(AVAL = -9999)`
#'
#' @inheritParams get_summary_records
#'
#' @return A data frame with derived records appended to original dataset.
@@ -72,6 +128,7 @@
#' # Summarize the average of the triplicate ECG interval values (AVAL)
#' derive_summary_records(
#' adeg,
#' dataset_add = adeg,
#' by_vars = exprs(USUBJID, PARAM, AVISIT),
#' set_values_to = exprs(
#' AVAL = mean(AVAL, na.rm = TRUE),
@@ -83,6 +140,7 @@
#' # Derive more than one summary variable
#' derive_summary_records(
#' adeg,
#' dataset_add = adeg,
#' by_vars = exprs(USUBJID, PARAM, AVISIT),
#' set_values_to = exprs(
#' AVAL = mean(AVAL),
@@ -116,6 +174,7 @@
#' # by group
#' derive_summary_records(
#' adeg,
#' dataset_add = adeg,
#' by_vars = exprs(USUBJID, PARAM, AVISIT),
#' filter = n() > 2,
#' set_values_to = exprs(
@@ -124,19 +183,27 @@
#' )
#' ) %>%
#' arrange(USUBJID, AVISIT)
derive_summary_records <- function(dataset,
derive_summary_records <- function(dataset = NULL,
dataset_add,
dataset_ref = NULL,
by_vars,
filter = NULL,
filter_add = NULL,
analysis_var,
summary_fun,
set_values_to) {
set_values_to,
missing_values = NULL) {
assert_vars(by_vars)
filter <- assert_filter_cond(enexpr(filter), optional = TRUE)
assert_data_frame(dataset, required_vars = by_vars)
assert_data_frame(dataset_add, required_vars = by_vars)
assert_data_frame(
dataset,
required_vars = by_vars
dataset_ref,
required_vars = by_vars,
optional = TRUE
)

assert_varval_list(set_values_to)
assert_expr_list(missing_values, named = TRUE, optional = TRUE)

if (!missing(analysis_var) || !missing(summary_fun)) {
deprecate_warn(
@@ -149,14 +216,59 @@ derive_summary_records <- function(dataset,
set_values_to <- exprs(!!analysis_var := {{ summary_fun }}(!!analysis_var), !!!set_values_to)
}

# Summarise the analysis value and bind to the original dataset
bind_rows(
dataset,
get_summary_records(
dataset,
by_vars = by_vars,
filter = !!filter,
set_values_to = set_values_to
if (!missing(filter)) {
deprecate_warn(
"1.0.0",
I("derive_summary_records(filter = )"),
"derive_summary_records(filter_add = )"
)
filter_add <- assert_filter_cond(enexpr(filter), optional = TRUE)
}
filter_add <- assert_filter_cond(enexpr(filter_add), optional = TRUE)

summary_records <- dataset_add %>%
group_by(!!!by_vars) %>%
filter_if(filter_add) %>%
summarise(!!!set_values_to) %>%
ungroup()

df_return <- bind_rows(
dataset,
summary_records
)

if (!is.null(dataset_ref)) {
add_vars <- colnames(dataset_add)
ref_vars <- colnames(dataset_ref)

new_ref_obs <- anti_join(
select(dataset_ref, intersect(add_vars, ref_vars)),
select(summary_records, !!!by_vars),
by = map_chr(by_vars, as_name)
)

tmp_ref_obs <- get_new_tmp_var(new_ref_obs, prefix = "tmp_ref_obs")

new_ref_obs <- new_ref_obs %>%
mutate(!!tmp_ref_obs := 1L)

df_return <- bind_rows(
df_return,
new_ref_obs
)
}

if (!is.null(missing_values)) {
update_missings <- map2(
syms(names(missing_values)),
missing_values,
~ expr(if_else(is.na(!!.x) & tmp_ref_obs_1 == 1, !!.y, !!.x))
)
names(update_missings) <- names(missing_values)
df_return <- df_return %>%
mutate(!!!update_missings)
}

df_return %>%
remove_tmp_vars()
}
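
The summary step in the new `derive_summary_records()` body above (group `dataset_add` by `by_vars`, apply `filter_add`, summarise, then bind to `dataset`) can be sketched with plain dplyr. This is a minimal illustration under assumptions: the toy `adeg` data are made up, `filter()` stands in for admiral's `filter_if()`, and the by variables are hard-coded rather than passed as `exprs()`.

```r
library(dplyr)

# Hypothetical toy data, not taken from the PR
adeg <- data.frame(
  USUBJID = c("01", "01", "01", "02", "02"),
  AVISIT = "Baseline",
  AVAL = c(385, 399, 396, 401, 412)
)

# Summarise the additional dataset by group, keeping only groups
# with more than 2 records (the role played by `filter_add`)
summary_records <- adeg %>%
  group_by(USUBJID, AVISIT) %>%
  filter(n() > 2) %>%
  summarise(AVAL = mean(AVAL), DTYPE = "AVERAGE", .groups = "drop")

# Append the summary records to the input dataset;
# columns missing on either side (here DTYPE) are filled with NA
result <- bind_rows(adeg, summary_records)
```

Here the input and additional datasets are the same object, matching the updated examples in the diff that pass `dataset_add = adeg` alongside `adeg`.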
12 changes: 9 additions & 3 deletions R/get_summary_records.R
@@ -1,6 +1,12 @@
#' Create Summary Records
#'
#' @description
#'
#' `r lifecycle::badge("superseded")`
#'
#' Development on `get_summary_records()` is complete, and for new code we recommend
#' switching to using the `dataset_add` argument in `derive_summary_records()`.
#'
#' It is not uncommon to have an analysis need whereby one needs to derive an
#' analysis value (`AVAL`) from multiple records. The ADaM basic dataset
#' structure variable `DTYPE` is available to indicate when a new derived
@@ -64,9 +70,9 @@
#'
#' @return A data frame of derived records.
#'
#' @family der_gen
#' @family superseded
#'
#' @keywords der_gen
#' @keywords superseded
#'
#' @seealso [derive_summary_records()], [derive_var_merged_summary()]
#'
@@ -154,7 +160,7 @@ get_summary_records <- function(dataset,
filter = NULL,
analysis_var,
summary_fun,
set_values_to) {
set_values_to = NULL) {
assert_vars(by_vars)
filter <- assert_filter_cond(enexpr(filter), optional = TRUE)
assert_data_frame(