Skip to content

Commit

Permalink
Closes #2590 improve_joined: make new code optional (#2624)
Browse files Browse the repository at this point in the history
* #2590 improve_joined: make new code optional

* #2590 improve_joined: style files

* #2590 improve_joined: add link to set_admiral_options()
  • Loading branch information
bundfussr authored Jan 7, 2025
1 parent e1ced33 commit a98db65
Show file tree
Hide file tree
Showing 17 changed files with 234 additions and 64 deletions.
4 changes: 3 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,9 @@ are not unique. (#2563)

- The functions `derive_vars_joined()`, `derive_var_joined_exist_flag()`,
`derive_extreme_event()`, and `filter_joined()` were updated to reduce their
memory consumption. (#2590)
memory consumption. As the new code increases the run-time, it is not used by
default. To enable it the new admiral option `save_memory` has to be set to
`TRUE`. (#2590)

- The function `compute_egfr()` updated to allow missing values for sex which result in missing values for output. (#2612)

Expand Down
3 changes: 2 additions & 1 deletion R/admiral_environment.R
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,8 @@ admiral_environment <- new.env(parent = emptyenv())
admiral_environment$admiral_options <- list(
# future_input = exprs(...), nolint
subject_keys = exprs(STUDYID, USUBJID),
signif_digits = 15
signif_digits = 15,
save_memory = FALSE
)

# To enhance features and add inputs as necessary
Expand Down
18 changes: 14 additions & 4 deletions R/admiral_options.R
Original file line number Diff line number Diff line change
Expand Up @@ -65,14 +65,15 @@ get_admiral_option <- function(option) {
# change cli `.val` to end with OR instead of AND
divid <- cli_div(theme = list(.val = list("vec-last" = ", or ", "vec_sep2" = " or ")))
# Return message otherwise, catch typos
cli_abort(c("Invalid function argument.",
cli_abort(c(
"Invalid function argument.",
"i" = "Select one of {.val {possible_inputs}}"
))
}

#' Set the Value of Admiral Options
#' Set the Value of admiral Options
#'
#' Set the Values of Admiral Options That Can Be Modified for Advanced Users.
#' Set the values of admiral options that can be modified for advanced users.
#'
#' @param subject_keys Variables to uniquely identify a subject, defaults to
#' `exprs(STUDYID, USUBJID)`. This option is used as default value for the
Expand All @@ -82,6 +83,11 @@ get_admiral_option <- function(option) {
#' defaults to `15`. This option is used as default value for the `signif_dig` argument in
#' admiral functions `derive_var_atoxgr_dir()` and `derive_var_anrind()`.
#'
#' @param save_memory If set to `TRUE`, an alternative algorithm is used in the
#' functions `derive_vars_joined()`, `derive_var_joined_exist_flag()`,
#' `derive_extreme_event()`, and `filter_joined()` which requires less memory
#' but more run-time.
#'
#' @details
#' Modify an admiral option, e.g `subject_keys`, such that it automatically affects downstream
#' function inputs where `get_admiral_option()` is called such as `derive_param_exist_flag()`.
Expand Down Expand Up @@ -153,7 +159,7 @@ get_admiral_option <- function(option) {
#'
#' derive_var_anrind(advs)
#'
set_admiral_options <- function(subject_keys, signif_digits) {
set_admiral_options <- function(subject_keys, signif_digits, save_memory) {
if (!missing(subject_keys)) {
assert_vars(subject_keys)
admiral_environment$admiral_options$subject_keys <- subject_keys
Expand All @@ -162,6 +168,10 @@ set_admiral_options <- function(subject_keys, signif_digits) {
assert_integer_scalar(signif_digits, subset = "positive")
admiral_environment$admiral_options$signif_digits <- signif_digits
}
if (!missing(save_memory)) {
assert_logical_scalar(save_memory)
admiral_environment$admiral_options$save_memory <- save_memory
}

# Add future input to function formals above
# if (!missing(future_input)) {
Expand Down
1 change: 1 addition & 0 deletions R/derive_extreme_event.R
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,7 @@
#' the selected observations.
#' 1. The observations are added to input dataset.
#'
#' `r roxygen_save_memory()`
#'
#' @return The input dataset with the best or worst observation of each by group
#' added as new observations.
Expand Down
84 changes: 50 additions & 34 deletions R/derive_joined.R
Original file line number Diff line number Diff line change
Expand Up @@ -198,6 +198,8 @@
#' Observations in the additional dataset which have no matching observation in
#' the input dataset are ignored.
#'
#' `r roxygen_save_memory()`
#'
#' @return The output dataset contains all observations and variables of the
#' input dataset and additionally the variables specified for `new_vars` from
#' the additional dataset (`dataset_add`).
Expand Down Expand Up @@ -814,43 +816,57 @@ get_joined_data <- function(dataset,
!!tmp_obs_nr_var
)

# split input dataset into smaller pieces and process them separately
# This reduces the memory consumption.
if (is.null(by_vars_left)) {
# create batches of about 1MB input data
obs_per_batch <- floor(1000000 / as.numeric(object.size(data) / nrow(data)))
tmp_batch_nr <- get_new_tmp_var(data, prefix = "tmp_batch_nr")
data_list <- data %>%
mutate(!!tmp_batch_nr := ceiling(row_number() / obs_per_batch)) %>%
group_by(!!tmp_batch_nr) %>%
group_split(.keep = FALSE)
data_add_list <- list(data_add_to_join)
if (get_admiral_option("save_memory")) {
# split input dataset into smaller pieces and process them separately
# This reduces the memory consumption.
if (is.null(by_vars_left)) {
# create batches of about 1MB input data
obs_per_batch <- floor(1000000 / as.numeric(object.size(data) / nrow(data)))
tmp_batch_nr <- get_new_tmp_var(data, prefix = "tmp_batch_nr")
data_list <- data %>%
mutate(!!tmp_batch_nr := ceiling(row_number() / obs_per_batch)) %>%
group_by(!!tmp_batch_nr) %>%
group_split(.keep = FALSE)
data_add_list <- list(data_add_to_join)
} else {
data_nest <- nest(data, data = everything(), .by = vars2chr(unname(by_vars_left)))
data_add_nest <- nest(data_add, data_add = everything(), .by = vars2chr(unname(by_vars_left)))
data_all_nest <- inner_join(data_nest, data_add_nest, by = vars2chr(by_vars_left))
data_list <- data_all_nest$data
data_add_list <- data_all_nest$data_add
}

joined_data <- map2(
data_list,
data_add_list,
function(x, y) {
get_joined_sub_data(
x,
y,
by_vars = by_vars_left,
tmp_obs_nr_var = tmp_obs_nr_var,
tmp_obs_nr_left = tmp_obs_nr_left,
join_type = join_type,
first_cond_upper = first_cond_upper,
first_cond_lower = first_cond_lower,
filter_join = filter_join
)
}
)
} else {
data_nest <- nest(data, data = everything(), .by = vars2chr(unname(by_vars_left)))
data_add_nest <- nest(data_add, data_add = everything(), .by = vars2chr(unname(by_vars_left)))
data_all_nest <- inner_join(data_nest, data_add_nest, by = vars2chr(by_vars_left))
data_list <- data_all_nest$data
data_add_list <- data_all_nest$data_add
joined_data <- get_joined_sub_data(
data,
dataset_add = data_add,
by_vars = by_vars_left,
tmp_obs_nr_var = tmp_obs_nr_var,
tmp_obs_nr_left = tmp_obs_nr_left,
join_type = join_type,
first_cond_upper = first_cond_upper,
first_cond_lower = first_cond_lower,
filter_join = filter_join
)
}

joined_data <- map2(
data_list,
data_add_list,
function(x, y) {
get_joined_sub_data(
x,
y,
by_vars = by_vars_left,
tmp_obs_nr_var = tmp_obs_nr_var,
tmp_obs_nr_left = tmp_obs_nr_left,
join_type = join_type,
first_cond_upper = first_cond_upper,
first_cond_lower = first_cond_lower,
filter_join = filter_join
)
}
)

bind_rows(joined_data) %>%
remove_tmp_vars() %>%
select(-!!tmp_obs_nr_var_join)
Expand Down
3 changes: 2 additions & 1 deletion R/derive_var_joined_exist_flag.R
Original file line number Diff line number Diff line change
Expand Up @@ -196,8 +196,9 @@
#' set to `true_value` for all observations which were selected in the
#' previous step. For the other observations it is set to `false_value`.
#'
#' @return The input dataset with the variable specified by `new_var` added.
#' `r roxygen_save_memory()`
#'
#' @return The input dataset with the variable specified by `new_var` added.
#'
#' @keywords der_gen
#' @family der_gen
Expand Down
2 changes: 2 additions & 0 deletions R/filter_joined.R
Original file line number Diff line number Diff line change
Expand Up @@ -205,6 +205,8 @@
#' The first observation of each group is selected and the `*.join` variables
#' are dropped.
#'
#' `r roxygen_save_memory()`
#'
#' @returns A subset of the observations of the input dataset. All variables of
#' the input dataset are included in the output dataset.
#'
Expand Down
10 changes: 10 additions & 0 deletions R/roxygen2.R
Original file line number Diff line number Diff line change
Expand Up @@ -47,3 +47,13 @@ roxygen_order_na_handling <- function() {
"[Sort Order](../articles/generic.html#sort_order)."
)
}

roxygen_save_memory <- function() {
paste(
"**Note:** This function creates temporary datasets which may be much bigger",
"than the input datasets. If this causes memory issues, please try setting",
"the admiral option `save_memory` to `TRUE` (see `set_admiral_options()`).",
"This reduces the memory consumption but increases the run-time.",
sep = "\n"
)
}
5 changes: 5 additions & 0 deletions man/derive_extreme_event.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions man/derive_var_joined_exist_flag.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions man/derive_vars_joined.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions man/filter_joined.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/get_admiral_option.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 8 additions & 3 deletions man/set_admiral_options.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions tests/testthat/_snaps/admiral_options.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# get_admiral_option Test 2: common typo gives error to select available options

Code
get_admiral_option("subject_key")
Condition
Error in `get_admiral_option()`:
! Invalid function argument.
i Select one of "subject_keys", "signif_digits", or "save_memory"

Loading

0 comments on commit a98db65

Please sign in to comment.