From 13644bd3a7457e84d9ad06cc588c1313b5480127 Mon Sep 17 00:00:00 2001 From: Ramiro Magno Date: Tue, 18 Jun 2024 02:46:47 +0100 Subject: [PATCH] if_then_else support via "conditioned" data frames (#55) * Basic support for "conditioned" data frames - Adds a new S3 class (cnd_df) for represented conditioned data frames, i.e. data frames that carry metadata about what records should be used for derivations - Adds support for basic pretty printing of cnd_df objects - Adds a user-facing function for creating such cnd_df objects: `condition_by` - Adds experimental "mutate"-version function for these conditioned data frames: `derive_by_condition()` * Basic support for conditioned data sets * Extensive support for conditioned tibbles - Joins by raw and target data sets are now aware of conditioned tibbles - Transformation functions, namely `assign_datetime()`, `hardcode*()` and `assign*` are also conditioned-tibble aware - Unit test coverage for most cases indicated at #54 I believe the essential components are here to support the if_then_else algorithm via conditioned tibbles. Now, further testing, assertions and documentation is needed. * Ramm's feedback integration - Move `tgt_dat` to the first position in the argument list for cleaner command pipes. - Rename `condition_by()` to `condition_add()`. - Export `oak_id_vars()` for direct user access. - Update tidyselections to align with the latest practices. * Update on conditioned data frames - Documentation - Examples - New article about cnd_df (WIP) * Styling fixes * Update linting and styling * Tidying up - No need for S3 methods to be exported - `condition_add()` now links to the appropriate article about conditioned data frames - Documentation tweaks - Version bump, NEWS update and pkgdown reference list update * Last tweaks - Add example for `condition_add()` - Re-export S3 methods for `cnd_df` - Update pkgdown reference list * Remove blank line * Tweaks to `%.>%` docs * Automatic renv profile update. --------- Authored-by: ramiromagno --- DESCRIPTION | 7 +- NAMESPACE | 8 + NEWS.md | 8 +- R/assertions.R | 16 + R/assign.R | 67 ++- R/assign_datetime.R | 47 +- R/cnd_df.R | 444 ++++++++++++++++++ R/ct.R | 4 +- R/hardcode.R | 73 ++- R/join.R | 37 ++ R/oak_id_vars.R | 8 +- R/pipe.R | 73 +++ R/sdtm.oak-package.R | 2 +- R/sdtm_join.R | 37 ++ _pkgdown.yml | 16 +- inst/WORDLIST | 3 + man/assign.Rd | 32 +- man/assign_datetime.Rd | 32 +- man/condition_add.Rd | 36 ++ man/contains_oak_id_vars.Rd | 2 +- man/ctl_new_rowid_pillar.cnd_df.Rd | 23 + man/dot_pipe.Rd | 64 +++ man/eval_conditions.Rd | 75 +++ man/figures/lifecycle-archived.svg | 21 + man/figures/lifecycle-defunct.svg | 21 + man/figures/lifecycle-deprecated.svg | 21 + man/figures/lifecycle-experimental.svg | 21 + man/figures/lifecycle-maturing.svg | 21 + man/figures/lifecycle-questioning.svg | 21 + man/figures/lifecycle-soft-deprecated.svg | 21 + man/figures/lifecycle-stable.svg | 29 ++ man/figures/lifecycle-superseded.svg | 21 + man/get_cnd_df_cnd.Rd | 33 ++ man/get_cnd_df_cnd_sum.Rd | 32 ++ man/harcode.Rd | 38 +- man/is_cnd_df.Rd | 32 ++ man/mutate.cnd_df.Rd | 68 +++ man/new_cnd_df.Rd | 44 ++ man/oak_id_vars.Rd | 5 +- man/rm_cnd_df.Rd | 31 ++ man/sdtm_assign.Rd | 20 +- man/sdtm_hardcode.Rd | 20 +- man/sdtm_join.Rd | 35 ++ man/tbl_sum.cnd_df.Rd | 28 ++ renv/profiles/4.4/renv.lock | 538 ---------------------- renv/profiles/4.4/renv/settings.json | 19 - tests/testthat/test-assign.R | 167 +++++++ tests/testthat/test-assign_datetime.R | 12 +- tests/testthat/test-cnd_df.R | 81 ++++ tests/testthat/test-condition_add.R | 86 ++++ tests/testthat/test-eval_conditions.R | 51 ++ tests/testthat/test-hardcode.R | 81 ++++ tests/testthat/test-mutate_cnd_df.R | 103 +++++ tests/testthat/test-pipe.R | 130 ++++++ vignettes/articles/cnd_df.Rmd | 109 +++++ 55 files changed, 2314 insertions(+), 760 deletions(-) create mode 100644 R/assertions.R create mode 100644 R/cnd_df.R create mode 100644 R/join.R create mode 100644 R/pipe.R create mode 100644 R/sdtm_join.R create mode 100644 man/condition_add.Rd create mode 100644 man/ctl_new_rowid_pillar.cnd_df.Rd create mode 100644 man/dot_pipe.Rd create mode 100644 man/eval_conditions.Rd create mode 100644 man/figures/lifecycle-archived.svg create mode 100644 man/figures/lifecycle-defunct.svg create mode 100644 man/figures/lifecycle-deprecated.svg create mode 100644 man/figures/lifecycle-experimental.svg create mode 100644 man/figures/lifecycle-maturing.svg create mode 100644 man/figures/lifecycle-questioning.svg create mode 100644 man/figures/lifecycle-soft-deprecated.svg create mode 100644 man/figures/lifecycle-stable.svg create mode 100644 man/figures/lifecycle-superseded.svg create mode 100644 man/get_cnd_df_cnd.Rd create mode 100644 man/get_cnd_df_cnd_sum.Rd create mode 100644 man/is_cnd_df.Rd create mode 100644 man/mutate.cnd_df.Rd create mode 100644 man/new_cnd_df.Rd create mode 100644 man/rm_cnd_df.Rd create mode 100644 man/sdtm_join.Rd create mode 100644 man/tbl_sum.cnd_df.Rd delete mode 100644 renv/profiles/4.4/renv.lock delete mode 100644 renv/profiles/4.4/renv/settings.json create mode 100644 tests/testthat/test-assign.R create mode 100644 tests/testthat/test-cnd_df.R create mode 100644 tests/testthat/test-condition_add.R create mode 100644 tests/testthat/test-eval_conditions.R create mode 100644 tests/testthat/test-hardcode.R create mode 100644 tests/testthat/test-mutate_cnd_df.R create mode 100644 tests/testthat/test-pipe.R create mode 100644 vignettes/articles/cnd_df.Rmd diff --git a/DESCRIPTION b/DESCRIPTION index ccbf6dc3..a56269d1 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,7 +1,7 @@ Package: sdtm.oak Type: Package Title: SDTM Data Transformation Engine -Version: 0.0.0.9004 +Version: 0.0.0.9005 Authors@R: c( person("Rammprasad", "Ganapathy", role = c("aut", "cre"), email = "ganapathy.rammprasad@gene.com"), @@ -46,9 +46,12 @@ Imports: tibble, vctrs, readr, - glue + glue, + pillar Suggests: knitr, + lifecycle, + magrittr, rmarkdown, spelling, testthat (>= 3.1.7) diff --git a/NAMESPACE b/NAMESPACE index e627383e..73351b8a 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -1,10 +1,14 @@ # Generated by roxygen2: do not edit by hand +S3method(ctl_new_rowid_pillar,cnd_df) S3method(print,iso8601) +S3method(tbl_sum,cnd_df) +export("%.>%") export(assign_ct) export(assign_datetime) export(assign_no_ct) export(clear_cache) +export(condition_add) export(create_iso8601) export(ct_map) export(ct_spec_example) @@ -15,11 +19,15 @@ export(domain_example) export(fmt_cmp) export(hardcode_ct) export(hardcode_no_ct) +export(oak_id_vars) export(problems) export(read_ct_spec) export(read_ct_spec_example) export(read_domain_example) export(sbj_vars) +importFrom(dplyr,mutate) +importFrom(pillar,ctl_new_rowid_pillar) +importFrom(pillar,tbl_sum) importFrom(rlang,"%||%") importFrom(rlang,":=") importFrom(rlang,.data) diff --git a/NEWS.md b/NEWS.md index 6e33105a..0f9a8602 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,6 +1,10 @@ -# sdtm.oak 0.0.0.9004 (development version) +# sdtm.oak 0.0.0.9005 (development version) -## New Features +* New function for creating conditioned data frames: `condition_add()`. +* New pipe operator: `%.>%` for explicit dot placeholder placement. +* `oak_id_vars()` is now an exported function. + +# sdtm.oak 0.0.0.9004 (development version) * New function: `derive_seq()` for deriving a sequence number variable. diff --git a/R/assertions.R b/R/assertions.R new file mode 100644 index 00000000..8228c4ee --- /dev/null +++ b/R/assertions.R @@ -0,0 +1,16 @@ +# Surprisingly, admiraldev doesn't provide `assert_logical_vector`. +assert_logical_vector <- function(arg, optional = FALSE) { + if (optional && is.null(arg)) { + return(invisible(arg)) + } + + if (!is.logical(arg)) { + err_msg <- sprintf( + "`arg` must be a logical vector but is %s.", + admiraldev::what_is_it(arg) + ) + rlang::abort(err_msg) + } + + invisible(arg) +} diff --git a/R/assign.R b/R/assign.R index a91eacaf..57f3b0ed 100644 --- a/R/assign.R +++ b/R/assign.R @@ -21,7 +21,7 @@ #' the variables indicated in `id_vars`. This parameter is optional, see #' section Value for how the output changes depending on this argument value. #' @param id_vars Key variables to be used in the join between the raw dataset -#' (`raw_dat`) and the target data set (`raw_dat`). +#' (`raw_dat`) and the target data set (`tgt_dat`). #' #' @returns The returned data set depends on the value of `tgt_dat`: #' - If no target dataset is supplied, meaning that `tgt_dat` defaults to @@ -35,12 +35,12 @@ #' #' @importFrom rlang := #' @keywords internal -sdtm_assign <- function(raw_dat, - raw_var, +sdtm_assign <- function(tgt_dat = NULL, tgt_var, + raw_dat, + raw_var, ct_spec = NULL, ct_clst = NULL, - tgt_dat = NULL, id_vars = oak_id_vars()) { admiraldev::assert_character_scalar(raw_var) admiraldev::assert_character_scalar(tgt_var) @@ -53,29 +53,18 @@ sdtm_assign <- function(raw_dat, assert_ct_spec(ct_spec, optional = TRUE) assert_ct_clst(ct_spec = ct_spec, ct_clst = ct_clst, optional = TRUE) - # Recode the raw variable following terminology. - tgt_val <- ct_map(raw_dat[[raw_var]], ct_spec = ct_spec, ct_clst = ct_clst) - - # Apply derivation by assigning `raw_var` to `tgt_var`. - # `der_dat`: derived dataset. - der_dat <- + join_dat <- raw_dat |> - dplyr::select(c(id_vars, raw_var)) |> - dplyr::mutate("{tgt_var}" := tgt_val) |> # nolint object_name_linter() - dplyr::select(-rlang::sym(raw_var)) + dplyr::select(dplyr::all_of(c(id_vars, raw_var))) |> + sdtm_join(tgt_dat = tgt_dat, id_vars = id_vars) - # If a target dataset is supplied, then join the so far derived dataset with - # the target dataset (`tgt_dat`), otherwise leave it be. - der_dat <- - if (!is.null(tgt_dat)) { - der_dat |> - dplyr::right_join(y = tgt_dat, by = id_vars) |> - dplyr::relocate(tgt_var, .after = dplyr::last_col()) - } else { - der_dat - } + # Recode the raw variable following terminology. + tgt_val <- ct_map(join_dat[[raw_var]], ct_spec = ct_spec, ct_clst = ct_clst) - der_dat + join_dat |> + mutate("{tgt_var}" := tgt_val) |> # nolint object_name_linter() + dplyr::select(-dplyr::any_of(setdiff(raw_var, tgt_var))) |> + dplyr::relocate(dplyr::all_of(tgt_var), .after = dplyr::last_col()) } #' Derive an SDTM variable @@ -127,9 +116,9 @@ sdtm_assign <- function(raw_dat, #' ) #' #' assign_no_ct( -#' raw_dat = md1, -#' raw_var = "MDIND", #' tgt_var = "CMINDC", +#' raw_dat = md1, +#' raw_var = "MDIND" #' ) #' #' cm_inter <- @@ -175,12 +164,12 @@ sdtm_assign <- function(raw_dat, #' (ct_spec <- read_ct_spec_example("ct-01-cm")) #' #' assign_ct( +#' tgt_dat = cm_inter, +#' tgt_var = "CMINDC", #' raw_dat = md1, #' raw_var = "MDIND", -#' tgt_var = "CMINDC", #' ct_spec = ct_spec, -#' ct_clst = "C66729", -#' tgt_dat = cm_inter +#' ct_clst = "C66729" #' ) #' #' @name assign @@ -189,10 +178,10 @@ NULL #' @order 1 #' @export #' @rdname assign -assign_no_ct <- function(raw_dat, - raw_var, +assign_no_ct <- function(tgt_dat = NULL, tgt_var, - tgt_dat = NULL, + raw_dat, + raw_var, id_vars = oak_id_vars()) { admiraldev::assert_character_scalar(raw_var) admiraldev::assert_character_scalar(tgt_var) @@ -204,10 +193,10 @@ assign_no_ct <- function(raw_dat, admiraldev::assert_data_frame(tgt_dat, required_vars = rlang::syms(id_vars), optional = TRUE) sdtm_assign( + tgt_dat = tgt_dat, + tgt_var = tgt_var, raw_dat = raw_dat, raw_var = raw_var, - tgt_var = tgt_var, - tgt_dat = tgt_dat, id_vars = id_vars ) } @@ -215,12 +204,12 @@ assign_no_ct <- function(raw_dat, #' @order 2 #' @export #' @rdname assign -assign_ct <- function(raw_dat, - raw_var, +assign_ct <- function(tgt_dat = NULL, tgt_var, + raw_dat, + raw_var, ct_spec, ct_clst, - tgt_dat = NULL, id_vars = oak_id_vars()) { admiraldev::assert_character_scalar(raw_var) admiraldev::assert_character_scalar(tgt_var) @@ -232,10 +221,10 @@ assign_ct <- function(raw_dat, admiraldev::assert_data_frame(tgt_dat, required_vars = rlang::syms(id_vars), optional = TRUE) sdtm_assign( + tgt_dat = tgt_dat, + tgt_var = tgt_var, raw_dat = raw_dat, raw_var = raw_var, - tgt_var = tgt_var, - tgt_dat = tgt_dat, id_vars = id_vars, ct_spec = ct_spec, ct_clst = ct_clst diff --git a/R/assign_datetime.R b/R/assign_datetime.R index 4622f579..2e635f2a 100644 --- a/R/assign_datetime.R +++ b/R/assign_datetime.R @@ -61,11 +61,11 @@ #' # indicating that these values are missing/unknown (unk). #' cm1 <- #' assign_datetime( +#' tgt_var = "CMSTDTC", #' raw_dat = md1, #' raw_var = "MDBDR", #' raw_fmt = "d-m-y", -#' raw_unk = c("UN", "UNK"), -#' tgt_var = "CMSTDTC" +#' raw_unk = c("UN", "UNK") #' ) #' #' cm1 @@ -120,11 +120,11 @@ #' # data set `cm_inter`. #' cm2 <- #' assign_datetime( +#' tgt_dat = cm_inter, +#' tgt_var = "CMSTDTC", #' raw_dat = md1, #' raw_var = "MDBDR", -#' raw_fmt = "d-m-y", -#' tgt_var = "CMSTDTC", -#' tgt_dat = cm_inter +#' raw_fmt = "d-m-y" #' ) #' #' cm2 @@ -137,11 +137,11 @@ #' # MDETM (correspondence is by positional matching). #' cm3 <- #' assign_datetime( +#' tgt_var = "CMSTDTC", #' raw_dat = md1, #' raw_var = c("MDEDR", "MDETM"), #' raw_fmt = c("d-m-y", "H:M:S"), -#' raw_unk = c("UN", "UNK"), -#' tgt_var = "CMSTDTC" +#' raw_unk = c("UN", "UNK") #' ) #' #' cm3 @@ -151,12 +151,12 @@ #' #' @export assign_datetime <- - function(raw_dat, + function(tgt_dat = NULL, + tgt_var, + raw_dat, raw_var, raw_fmt, - tgt_var, raw_unk = c("UN", "UNK"), - tgt_dat = NULL, id_vars = oak_id_vars(), .warn = TRUE) { admiraldev::assert_character_vector(raw_var) @@ -170,27 +170,20 @@ assign_datetime <- admiraldev::assert_character_vector(raw_unk) admiraldev::assert_logical_scalar(.warn) + join_dat <- + raw_dat |> + dplyr::select(dplyr::all_of(c(id_vars, raw_var))) |> + sdtm_join(tgt_dat = tgt_dat, id_vars = id_vars) + tgt_val <- - create_iso8601(!!!raw_dat[raw_var], + create_iso8601(!!!join_dat[raw_var], .format = raw_fmt, .na = raw_unk, .warn = .warn ) - der_dat <- - raw_dat |> - dplyr::select(c(id_vars, raw_var)) |> - dplyr::mutate("{tgt_var}" := tgt_val) |> # nolint object_name_linter() - dplyr::select(-raw_var) - - der_dat <- - if (!is.null(tgt_dat)) { - der_dat |> - dplyr::right_join(y = tgt_dat, by = id_vars) |> - dplyr::relocate(tgt_var, .after = dplyr::last_col()) - } else { - der_dat - } - - der_dat + join_dat |> + mutate("{tgt_var}" := tgt_val) |> # nolint object_name_linter() + dplyr::select(-dplyr::any_of(setdiff(raw_var, tgt_var))) |> + dplyr::relocate(dplyr::all_of(tgt_var), .after = dplyr::last_col()) } diff --git a/R/cnd_df.R b/R/cnd_df.R new file mode 100644 index 00000000..9d3dedb9 --- /dev/null +++ b/R/cnd_df.R @@ -0,0 +1,444 @@ +# ------------------------------------------------------------------------------ +# File: cnd_df.R +# Package: sdtm.oak +# Author: Ramiro Magno +# Created: 2024-05-23 +# Last Modified: 2024-05-15 +# ------------------------------------------------------------------------------ +# Description: +# +# This file contains functions and scripts related to the so-called conditioned +# data frames, i.e. those data frames extended with class `cnd_df`. +# +# Functions: +# +# - `new_cnd_df()`: Create a "conditioned data set" (class cnd_df). +# - `is_cnd_df()`: Assess whether the argument is a `cnd_df` data frame. +# - `get_cnd_df_cnd()`: Extract the attribute "cnd" from a `cnd_df` data frame. +# - `get_cnd_df_cnd_sum()`: Extract the attribute "cnd_sum" from a `cnd_df` data frame. +# - `rm_cnd_df()`: De-class a cnd_df data frame. +# - `tbl_sum.cnd_df()`: Print method for `cnd_df` tibbles. +# - `ctl_new_rowid_pillar.cnd_df()`: Print method for the row ids cnd_df` tibbles. +# - `eval_conditions()`: Find which rows match a set of conditions. +# - `condition_add()`: Create a conditioned data frame (user facing). +# - `mutate.cnd_df()`: Mutate a conditioned data frame. + +#' Create a data frame with filtering tags +#' +#' [new_cnd_df()] creates a _conditioned_ data frame, classed `cnd_df`, meaning +#' that this function extends the data frame passed as argument by storing a +#' logical vector `cnd` (as attribute) that marks rows for posterior conditional +#' transformation by methods that support _conditioned_ data frames. +#' +#' @param dat A data frame. +#' @param cnd A logical vector. Length must match the number of rows in `dat`. +#' @param .warn Whether to warn about creating a new _conditioned_ data frame +#' in case that `dat` already is one. +#' +#' @returns A data frame `dat` with the additional class `"cnd_df"` and the +#' following attributes: +#' +#' - `cnd`: The logical vector passed as argument `cnd`: `TRUE` values mark +#' rows in `dat` to be used for transformations; rows marked with `FALSE` are +#' not transformed; and `NA` mark rows whose transformations are to be applied +#' resulting in `NA`. +#' - `cnd_sum`: An integer vector of three elements providing the sum of `TRUE`, +#' `FALSE` and `NA` values in `cnd`, respectively. +#' +#' @seealso [is_cnd_df()], [get_cnd_df_cnd()], [get_cnd_df_cnd_sum()], +#' [rm_cnd_df()]. +#' +#' @examples +#' df <- data.frame(x = 1:3, y = letters[1:3]) +#' sdtm.oak:::new_cnd_df(dat = df, cnd = c(FALSE, NA, TRUE)) +#' +#' @keywords internal +new_cnd_df <- function(dat, cnd, .warn = TRUE) { + admiraldev::assert_data_frame(dat) + assert_logical_vector(cnd) + + if (!identical(nrow(dat), length(cnd))) { + msg <- c( + "Number of rows in `dat` must match length of `cnd`." + ) + rlang::abort(message = msg) + } + + is_cnd_df <- inherits(dat, "cnd_df") + if (.warn && is_cnd_df) { + msg <- "`dat` is already a conditioned data frame (`cnd_df`)." + rlang::warn(message = msg) + } + + if (!is_cnd_df) { + dat <- tibble::as_tibble(dat) + class(dat) <- c("cnd_df", class(dat)) + } + + n_true <- sum(cnd, na.rm = TRUE) + n_false <- sum(!cnd, na.rm = TRUE) + n_na <- length(cnd) - (n_true + n_false) + cnd_sum <- c(n_true = n_true, n_false = n_false, n_na = n_na) + + attr(dat, "cnd") <- cnd + attr(dat, "cnd_sum") <- cnd_sum + + return(dat) +} + +#' Check if a data frame is a conditioned data frame +#' +#' [is_cnd_df()] checks whether a data frame is a conditioned data frame, i.e. +#' of class `cnd_df`. +#' +#' @param dat A data frame. +#' @return `TRUE` if `dat` is a conditioned data frame (class `cnd_df`), +#' otherwise `FALSE`. +#' +#' @seealso [new_cnd_df()], [get_cnd_df_cnd()], [get_cnd_df_cnd_sum()], +#' [rm_cnd_df()]. +#' +#' @examples +#' df <- data.frame(x = 1:3, y = letters[1:3]) +#' sdtm.oak:::is_cnd_df(df) +#' +#' cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = c(FALSE, NA, TRUE)) +#' sdtm.oak:::is_cnd_df(cnd_df) +#' +#' @keywords internal +is_cnd_df <- function(dat) { + inherits(dat, "cnd_df") +} + +#' Get the conditioning vector from a conditioned data frame +#' +#' [get_cnd_df_cnd()] extracts the conditioning vector from a conditioned data +#' frame, i.e. from an object of class `cnd_df`. +#' +#' @param dat A conditioned data frame (`cnd_df`). +#' @return The conditioning vector (`cnd`) if `dat` is a conditioned data frame +#' (`cnd_df`), otherwise `NULL`, or `NULL` if `dat` is not a conditioned data +#' frame (`cnd_df`). +#' +#' @seealso [new_cnd_df()], [is_cnd_df()], [get_cnd_df_cnd_sum()], +#' [rm_cnd_df()]. +#' +#' @examples +#' df <- data.frame(x = 1:3, y = letters[1:3]) +#' sdtm.oak:::get_cnd_df_cnd(df) +#' +#' cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = c(FALSE, NA, TRUE)) +#' sdtm.oak:::get_cnd_df_cnd(cnd_df) +#' +#' @keywords internal +get_cnd_df_cnd <- function(dat) { + if (is_cnd_df(dat)) { + attr(dat, "cnd") + } else { + NULL + } +} + +#' Get the summary of the conditioning vector from a conditioned data frame +#' +#' [get_cnd_df_cnd_sum()] extracts the tally of the conditioning vector from a +#' conditioned data frame. +#' +#' @param dat A conditioned data frame (`cnd_df`). +#' @return A vector of three elements providing the sum of `TRUE`, `FALSE`, and +#' `NA` values in the conditioning vector (`cnd`), or `NULL` if `dat` is not +#' a conditioned data frame (`cnd_df`). +#' +#' @seealso [new_cnd_df()], [is_cnd_df()], [get_cnd_df_cnd()], [rm_cnd_df()]. +#' +#' @examples +#' df <- data.frame(x = 1:3, y = letters[1:3]) +#' sdtm.oak:::get_cnd_df_cnd_sum(df) +#' +#' cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = c(FALSE, NA, TRUE)) +#' sdtm.oak:::get_cnd_df_cnd_sum(cnd_df) +#' +#' @keywords internal +get_cnd_df_cnd_sum <- function(dat) { + if (is_cnd_df(dat)) { + attr(dat, "cnd_sum") + } else { + NULL + } +} + +#' Remove the `cnd_df` class from a data frame +#' +#' This function removes the `cnd_df` class, along with its attributes, if +#' applicable. +#' +#' @param dat A data frame. +#' @return The input `dat` without the `cnd_df` class and associated attributes. +#' +#' @seealso [new_cnd_df()], [is_cnd_df()], [get_cnd_df_cnd()], +#' [get_cnd_df_cnd_sum()]. +#' +#' @examples +#' df <- data.frame(x = 1:3, y = letters[1:3]) +#' cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = c(FALSE, NA, TRUE)) +#' +#' sdtm.oak:::is_cnd_df(cnd_df) +#' sdtm.oak:::is_cnd_df(sdtm.oak:::rm_cnd_df(cnd_df)) +#' +#' @keywords internal +rm_cnd_df <- function(dat) { + if (is_cnd_df(dat)) { + class(dat) <- class(dat)[class(dat) != "cnd_df"] + attr(dat, "cnd") <- NULL + attr(dat, "cnd_sum") <- NULL + } + return(dat) +} + +#' Conditioned tibble header print method +#' +#' Conditioned tibble header print method. This S3 method adds an extra line +#' in the header of a tibble that indicates the tibble is a conditioned tibble +#' (`# Cond. tbl:`) followed by the tally of the conditioning vector: number +#' of TRUE, FALSE and NA values: e.g., `1/1/1`. +#' +#' @param x A conditioned tibble of class `cnd_df`. +#' @param ... Additional arguments passed to the default print method. +#' +#' @importFrom pillar tbl_sum +#' +#' @seealso [ctl_new_rowid_pillar.cnd_df()]. +#' +#' @examples +#' df <- data.frame(x = c(1L, NA_integer_, 3L)) +#' (cnd_df <- condition_add(dat = df, x >= 2L)) +#' pillar::tbl_sum(cnd_df) +#' +#' @export +tbl_sum.cnd_df <- function(x, ...) { + default_header <- NextMethod() + + tally <- get_cnd_df_cnd_sum(x) + h2 <- sprintf("%d/%d/%d", tally[1L], tally[2L], tally[3L]) + c(default_header, "Cond. tbl" = h2) +} + +lgl_to_chr <- function(x) { + dplyr::case_match(x, TRUE ~ "T", FALSE ~ "F", NA ~ "-") +} + +#' Conditioned tibble pillar print method +#' +#' @inheritParams pillar::ctl_new_rowid_pillar +#' @importFrom pillar ctl_new_rowid_pillar +#' +#' @seealso [tbl_sum.cnd_df()]. +#' +#' @export +ctl_new_rowid_pillar.cnd_df <- function(controller, x, width, ...) { + out <- NextMethod() + n_row <- nrow(x) + idx <- seq_len(n_row) + i <- sprintf("%d", idx) + i_width <- nchar(as.character(i)) + i_max_width <- max(i_width) + max_width <- i_max_width + 2L + ws <- strrep(" ", max_width - i_width - 1L) + abb_lgl <- lgl_to_chr(attr(controller, "cnd")[idx]) + + row_ids <- paste0(i, ws, abb_lgl) + width <- max(nchar(as.character(row_ids))) + pillar::new_pillar( + list( + title = out$title, + type = out$type, + data = pillar::pillar_component( + pillar::new_pillar_shaft(list(row_ids = row_ids), + width = width, + class = "pillar_rif_shaft" + ) + ) + ), + width = width + ) +} + +#' Evaluate conditions +#' +#' @description +#' [eval_conditions()] evaluates a set of conditions in the context of a +#' data frame and an optional environment. +#' +#' The utility of this function is to provide an easy way to generate a logical +#' vector of matching records from a set of logical conditions involving +#' variables in a data frame (`dat`) and optionally in a supplementary +#' environment (`.env`). The set of logical conditions are provided as +#' expressions to be evaluated in the context of `dat` and `.env`. +#' +#' Variables are looked up in `dat`, then in `.env`, then in the calling +#' function's environment, followed by its parent environments. +#' +#' @param dat A data frame +#' @param ... A set of logical conditions, e.g. `y & z, x | z` (`x`, `y`, `z` +#' would have to exist either as columns in `dat` or in the environment +#' `.env`). If multiple expressions are included, they are combined with the +#' `&` operator. +#' @param .na Return value to be used when the conditions evaluate to `NA`. +#' @param .env An optional environment to look for variables involved in logical +#' expression passed in `...`. A data frame or a list can also be passed that +#' will be coerced to an environment internally. +#' +#' @returns A logical vector reflecting matching rows in `dat`. +#' +#' @examples +#' # Create a sample data frame +#' df <- data.frame( +#' x = c(1, 2, NA_integer_, 4, 5), +#' y = c(TRUE, FALSE, TRUE, FALSE, TRUE), +#' z = c("a", "b", "a", "b", "a") +#' ) +#' +#' # Simple condition on one column +#' sdtm.oak:::eval_conditions(df, x > 2) +#' +#' # Combined conditions on multiple columns +#' sdtm.oak:::eval_conditions(df, x > 2 & y) +#' sdtm.oak:::eval_conditions(df, x > 2, y) +#' +#' # Using conditions with NA handling +#' df_with_na <- data.frame( +#' x = c(1, 2, NA, 4, 5), +#' y = c(TRUE, FALSE, TRUE, FALSE, TRUE) +#' ) +#' sdtm.oak:::eval_conditions(df_with_na, x > 2, .na = FALSE) +#' +#' # The environment where `eval_conditions()` is called is also inspected +#' # when evaluating conditions in `...`. +#' w <- 1 +#' sdtm.oak:::eval_conditions(df, x > w) +#' +#' # Using an environment +#' env <- rlang::env(w = 2) +#' sdtm.oak:::eval_conditions(df, x > w, .env = env) +#' +#' # In place of an environment, you may alternatively pass a list or data frame. +#' sdtm.oak:::eval_conditions(df, x > w, .env = list(w = 3)) +#' sdtm.oak:::eval_conditions(df, x > w, .env = tibble::tibble(w = 4)) +#' +#' @keywords internal +eval_conditions <- function(dat, + ..., + .na = NA, + .env = rlang::caller_env()) { + conditions <- rlang::enexprs(...) + + # List (or data frame). + if (is.list(.env)) { + .env <- rlang::as_environment(.env, parent = rlang::caller_env()) + } + + lgl_vctrs <- + conditions |> + purrr::map(~ rlang::eval_tidy(.x, dat, env = .env)) |> + purrr::map(~ dplyr::if_else(is.na(.x), .na, .x)) + + cnd <- purrr::reduce(lgl_vctrs, `&`, .init = rep(TRUE, nrow(dat))) + + cnd +} + +#' Add filtering tags to a data set +#' +#' @description +#' `condition_add()` tags records in a data set, indicating which rows match the +#' specified conditions, resulting in a conditioned data frame. Learn how to +#' integrate conditioned data frames in your SDTM domain derivation in +#' `vignette("cnd_df")`. +#' +#' @param dat A data frame. +#' @param ... Conditions to filter the data frame. +#' @param .na Return value to be used when the conditions evaluate to `NA`. +#' @param .dat2 An optional environment to look for variables involved in +#' logical expression passed in `...`. A data frame or a list can also be +#' passed that will be coerced to an environment internally. +#' +#' @returns A conditioned data frame, meaning a tibble with an additional class +#' `cnd_df` and a logical vector attribute indicating matching rows. +#' +#' @examples +#' (df <- tibble::tibble(x = 1L:3L, y = letters[x])) +#' +#' # Mark rows for which `x` greater than `1` +#' (cnd_df <- condition_add(dat = df, x > 1L)) +#' +#' @export +condition_add <- function(dat, ..., .na = NA, .dat2 = rlang::env()) { + admiraldev::assert_data_frame(dat) + # TODO: assertion for `...` (perhaps with admiraldev::assert_filter_cond()?) + # TODO: assertion for `.na` + # TODO: assertion for `.dat2` + + if (is_cnd_df(dat)) { + rlang::warn( + c( + "`dat` is already a conditioned data frame (`cnd_df`).", + "The previous condition will be replaced by the new one." + ) + ) + } + .env <- .dat2 + + cnd <- eval_conditions(dat = dat, ..., .na = .na, .env = .env) + new_cnd_df(dat, cnd = cnd, .warn = FALSE) +} + +#' Mutate method for conditioned data frames +#' +#' [mutate.cnd_df()] is an S3 method to be dispatched by [mutate][dplyr::mutate] +#' generic on conditioned data frames. This function implements a conditional +#' mutate by only changing rows for which the condition stored in the +#' conditioned data frame is `TRUE`. +#' +#' @param .data A conditioned data frame. +#' @param .by Not used when `.data` is a conditioned data frame. +#' @param .before Not used, use `.after` instead. +#' @param .after Control where new columns should appear, i.e. after which +#' columns. +#' +#' @examples +#' df <- tibble::tibble(x = 1L:3L, y = letters[x]) +#' cnd_df <- condition_add(df, x > 1L, y %in% c("a", "b")) +#' +#' # Because `cnd_df` is a conditioned data frame, dplyr::mutate() generic +#' # dispatches this S3 method and mutates only the second row, as that is the +#' # only record that fulfills simultaneously `x > 1L` and `y %in% c("a", "b")`. +#' dplyr::mutate(cnd_df, z = "match") +#' +#' @inheritParams dplyr::mutate +#' @importFrom dplyr mutate +#' @keywords internal +mutate.cnd_df <- function(.data, + ..., + .by = NULL, + .keep = c("all", "used", "unused", "none"), + .before = NULL, + .after = NULL) { + if (!rlang::is_null(.by)) { + rlang::abort("`.by` is not supported on conditioned data frames.") + } + + if (!rlang::is_null(.before)) { + rlang::abort("`.before` is not supported on conditioned data frames, use `.after` instead.") + } + + cnd <- get_cnd_df_cnd(.data) # nolint object_name_linter() + dat <- rm_cnd_df(.data) # avoids recursive S3 method dispatch. + + derivations <- rlang::enquos(...) + derived_vars <- names(derivations) + + lst <- purrr::map(derivations, ~ rlang::expr(dplyr::if_else(!!cnd, !!.x, NA))) + lst <- rlang::set_names(lst, derived_vars) + + dplyr::mutate(dat, !!!lst, .by = NULL, .keep = .keep, .after = .after) +} diff --git a/R/ct.R b/R/ct.R index e66c2d6d..ab5efb00 100644 --- a/R/ct.R +++ b/R/ct.R @@ -232,10 +232,10 @@ ct_mappings <- function(ct_spec, from = ct_spec_vars("from"), to = ct_spec_vars( values_to = "from", names_to = "type" ) |> - dplyr::select(c("type", "from", "to")) |> + dplyr::select(dplyr::all_of(c("type", "from", "to"))) |> dplyr::mutate(type = factor(.data$type, levels = cols)) |> dplyr::arrange(.data$type) |> - dplyr::select(-"type") |> + dplyr::select(-dplyr::all_of("type")) |> tidyr::drop_na("from") |> dplyr::mutate(from = str_split(.data$from)) |> tidyr::unnest(from) |> diff --git a/R/hardcode.R b/R/hardcode.R index 31938689..51d263e8 100644 --- a/R/hardcode.R +++ b/R/hardcode.R @@ -23,7 +23,7 @@ #' the variables indicated in `id_vars`. This parameter is optional, see #' section Value for how the output changes depending on this argument value. #' @param id_vars Key variables to be used in the join between the raw dataset -#' (`raw_dat`) and the target data set (`raw_dat`). +#' (`raw_dat`) and the target data set (`tgt_dat`). #' #' @returns The returned data set depends on the value of `tgt_dat`: #' - If no target dataset is supplied, meaning that `tgt_dat` defaults to @@ -36,13 +36,13 @@ #' #' @importFrom rlang := #' @keywords internal -sdtm_hardcode <- function(raw_dat, - raw_var, +sdtm_hardcode <- function(tgt_dat = NULL, tgt_var, + raw_dat, + raw_var, tgt_val, ct_spec = NULL, ct_clst = NULL, - tgt_dat = NULL, id_vars = oak_id_vars()) { admiraldev::assert_character_scalar(raw_var) admiraldev::assert_character_scalar(tgt_var) @@ -56,29 +56,18 @@ sdtm_hardcode <- function(raw_dat, assert_ct_spec(ct_spec, optional = TRUE) assert_ct_clst(ct_spec = ct_spec, ct_clst = ct_clst, optional = TRUE) - # Recode the hardcoded value following terminology. - tgt_val <- ct_map(tgt_val, ct_spec = ct_spec, ct_clst = ct_clst) - - # Apply derivation of the hardcoded value. - # `der_dat`: derived dataset. - der_dat <- + join_dat <- raw_dat |> - dplyr::select(c(id_vars, raw_var)) |> - dplyr::mutate("{tgt_var}" := recode(x = !!rlang::sym(raw_var), to = tgt_val)) |> # nolint object_name_linter() - dplyr::select(-rlang::sym(raw_var)) + dplyr::select(dplyr::all_of(c(id_vars, raw_var))) |> + sdtm_join(tgt_dat = tgt_dat, id_vars = id_vars) - # If a target dataset is supplied, then join the so far derived dataset with - # the target dataset (`tgt_dat`), otherwise leave it be. - der_dat <- - if (!is.null(tgt_dat)) { - der_dat |> - dplyr::right_join(y = tgt_dat, by = id_vars) |> - dplyr::relocate(tgt_var, .after = dplyr::last_col()) - } else { - der_dat - } + # Recode the hardcoded value following terminology. + tgt_val <- ct_map(tgt_val, ct_spec = ct_spec, ct_clst = ct_clst) - der_dat + join_dat |> + mutate("{tgt_var}" := recode(x = !!rlang::sym(raw_var), to = tgt_val)) |> # nolint object_name_linter() + dplyr::select(-dplyr::any_of(setdiff(raw_var, tgt_var))) |> + dplyr::relocate(dplyr::all_of(tgt_var), .after = dplyr::last_col()) } #' Derive an SDTM variable with a hardcoded value @@ -133,10 +122,10 @@ sdtm_hardcode <- function(raw_dat, #' # Derive a new variable `CMCAT` by overwriting `MDRAW` with the #' # hardcoded value "GENERAL CONCOMITANT MEDICATIONS". #' hardcode_no_ct( +#' tgt_val = "GENERAL CONCOMITANT MEDICATIONS", #' raw_dat = md1, #' raw_var = "MDRAW", -#' tgt_var = "CMCAT", -#' tgt_val = "GENERAL CONCOMITANT MEDICATIONS" +#' tgt_var = "CMCAT" #' ) #' #' cm_inter <- @@ -153,11 +142,11 @@ sdtm_hardcode <- function(raw_dat, #' # hardcoded value "GENERAL CONCOMITANT MEDICATIONS" with a prior join to #' # `target_dataset`. #' hardcode_no_ct( +#' tgt_dat = cm_inter, +#' tgt_val = "GENERAL CONCOMITANT MEDICATIONS", #' raw_dat = md1, #' raw_var = "MDRAW", -#' tgt_var = "CMCAT", -#' tgt_val = "GENERAL CONCOMITANT MEDICATIONS", -#' tgt_dat = cm_inter +#' tgt_var = "CMCAT" #' ) #' #' # Controlled terminology specification @@ -167,13 +156,13 @@ sdtm_hardcode <- function(raw_dat, #' # involving terminology recoding. `NA` values in `MDRAW` are preserved in #' # `CMCAT`. #' hardcode_ct( +#' tgt_dat = cm_inter, +#' tgt_var = "CMCAT", #' raw_dat = md1, #' raw_var = "MDRAW", -#' tgt_var = "CMCAT", #' tgt_val = "GENERAL CONCOMITANT MEDICATIONS", #' ct_spec = ct_spec, -#' ct_clst = "C66729", -#' tgt_dat = cm_inter +#' ct_clst = "C66729" #' ) #' #' @name harcode @@ -181,11 +170,11 @@ NULL #' @export #' @rdname harcode -hardcode_no_ct <- function(raw_dat, +hardcode_no_ct <- function(tgt_dat = NULL, + tgt_val, + raw_dat, raw_var, tgt_var, - tgt_val, - tgt_dat = NULL, id_vars = oak_id_vars()) { admiraldev::assert_character_scalar(raw_var) admiraldev::assert_character_scalar(tgt_var) @@ -200,11 +189,11 @@ hardcode_no_ct <- function(raw_dat, admiraldev::assert_data_frame(tgt_dat, required_vars = rlang::syms(id_vars), optional = TRUE) sdtm_hardcode( + tgt_dat = tgt_dat, + tgt_val = tgt_val, raw_dat = raw_dat, raw_var = raw_var, tgt_var = tgt_var, - tgt_val = tgt_val, - tgt_dat = tgt_dat, id_vars = id_vars ) } @@ -212,13 +201,13 @@ hardcode_no_ct <- function(raw_dat, #' @export #' @rdname harcode hardcode_ct <- - function(raw_dat, + function(tgt_dat = NULL, + tgt_val, + raw_dat, raw_var, tgt_var, - tgt_val, ct_spec, ct_clst, - tgt_dat = NULL, id_vars = oak_id_vars()) { admiraldev::assert_character_scalar(raw_var) admiraldev::assert_character_scalar(tgt_var) @@ -239,13 +228,13 @@ hardcode_ct <- assert_ct_clst(ct_spec = ct_spec, ct_clst = ct_clst, optional = FALSE) sdtm_hardcode( + tgt_dat = tgt_dat, + tgt_val = tgt_val, raw_dat = raw_dat, raw_var = raw_var, tgt_var = tgt_var, - tgt_val = tgt_val, ct_spec = ct_spec, ct_clst = ct_clst, - tgt_dat = tgt_dat, id_vars = id_vars ) } diff --git a/R/join.R b/R/join.R new file mode 100644 index 00000000..d85949b9 --- /dev/null +++ b/R/join.R @@ -0,0 +1,37 @@ +#' SDTM join +#' +#' [sdtm_join()] is a special join between a raw data set and a target data +#' set. This function supports conditioned data frames. +#' +#' @param raw_dat The raw dataset: a dataframe or a conditioned data frame. Must +#' include the variables passed in `id_vars`. +#' @param tgt_dat Target dataset: a data frame or a conditioned data frame to be +#' merged against `raw_dat` by the variables indicated in `id_vars`. +#' @param id_vars Key variables to be used in the join between the raw dataset +#' (`raw_dat`) and the target data set (`raw_dat`). +#' +#' @returns A data frame, or a conditioned data frame if at least one of the +#' input data sets is a conditioned data frame. +#' +#' @keywords internal +#' @importFrom rlang %||% +sdtm_join <- function(raw_dat, + tgt_dat = NULL, + id_vars = oak_id_vars()) { + raw_dat_cnd <- get_cnd_df_cnd(raw_dat) %||% rep(TRUE, nrow(raw_dat)) + tgt_dat <- tgt_dat %||% raw_dat[id_vars] + tgt_dat_cnd <- get_cnd_df_cnd(tgt_dat) %||% rep(TRUE, nrow(tgt_dat)) + + # `rm_cnd_df()` prevents `mutate` from dispatching. + raw_dat <- dplyr::mutate(rm_cnd_df(raw_dat), `__raw_dat_cond__` = raw_dat_cnd) + tgt_dat <- dplyr::mutate(rm_cnd_df(tgt_dat), `__tgt_dat_cond__` = tgt_dat_cnd) + + res <- dplyr::right_join(raw_dat, y = tgt_dat, by = id_vars) + + cnd <- res$`__raw_dat_cond__` & res$`__tgt_dat_cond__` + res |> + dplyr::select(-dplyr::all_of(c( + "__raw_dat_cond__", "__tgt_dat_cond__" + ))) |> + new_cnd_df(cnd = cnd, .warn = FALSE) +} diff --git a/R/oak_id_vars.R b/R/oak_id_vars.R index 718d11da..c88007f8 100644 --- a/R/oak_id_vars.R +++ b/R/oak_id_vars.R @@ -13,11 +13,11 @@ #' as keys in raw datasets. #' #' @examples -#' sdtm.oak:::oak_id_vars() +#' oak_id_vars() #' -#' sdtm.oak:::oak_id_vars(extra_vars = "sample_id") +#' oak_id_vars(extra_vars = "sample_id") #' -#' @keywords internal +#' @export oak_id_vars <- function(extra_vars = NULL) { admiraldev::assert_character_vector(extra_vars, optional = TRUE) unique(c("oak_id", "raw_source", "patient_number", extra_vars)) @@ -37,7 +37,7 @@ oak_id_vars <- function(extra_vars = NULL) { #' # `oak_id_vars()` is the function that defines what are the minimal set of #' # oak keys. Hence, by definition, the following code should always return #' # `TRUE`. -#' sdtm.oak:::contains_oak_id_vars(sdtm.oak:::oak_id_vars()) +#' sdtm.oak:::contains_oak_id_vars(oak_id_vars()) #' #' # Returns `FALSE`. #' sdtm.oak:::contains_oak_id_vars(character()) diff --git a/R/pipe.R b/R/pipe.R new file mode 100644 index 00000000..ee5a561c --- /dev/null +++ b/R/pipe.R @@ -0,0 +1,73 @@ +#' Explicit Dot Pipe +#' +#' @description +#' `r lifecycle::badge("experimental")` +#' +#' This operator pipes an object forward into a function or call expression +#' using an explicit placement of the dot (`.`) placeholder. Unlike magrittr's +#' [%>%][magrittr::%>%] operator, `%.>%` does not automatically place the +#' left-hand side (`lhs`) as the first argument in the right-hand side (`rhs`) +#' call. This operator provides a simpler alternative to the use of braces with +#' magrittr, while achieving similar behavior. +#' +#' @param lhs A value to be piped forward. +#' @param rhs A function call that utilizes the dot (`.`) placeholder to specify +#' where `lhs` should be placed. +#' +#' @details +#' The `%.>%` operator is used to pipe the `lhs` value into the `rhs` function +#' call. Within the `rhs` expression, the placeholder `.` represents the +#' position where `lhs` will be inserted. This provides more control over where +#' the `lhs` value appears in the `rhs` function call, compared to the magrittr +#' pipe operator which always places `lhs` as the first argument of `rhs`. +#' +#' Unlike magrittr's pipe, which may require the use of braces to fully control +#' the placement of `lhs` in nested function calls, `%.>%` simplifies this by +#' directly allowing multiple usages of the dot placeholder without requiring +#' braces. For example, the following expression using magrittr's pipe and +#' braces: +#' +#' ```r +#' library(magrittr) +#' +#' 1:10 %>% { c(min(.), max(.)) } +#' ``` +#' +#' can be written as: +#' +#' ```r +#' 1:10 %.>% c(min(.), max(.)) +#' ``` +#' +#' without needing additional braces. +#' +#' ## Downside +#' +#' The disadvantage of `%.>%` is that you always need to use +#' the dot placeholder, even when piping to the first argument of the +#' right-hand side (`rhs`). +#' +#' @examples +#' +#' # Equivalent to `subset(head(iris), 1:nrow(head(iris)) %% 2 == 0)` +#' head(iris) %.>% subset(., 1:nrow(.) %% 2 == 0) +#' +#' # Equivalent to `c(min(1:10), max(1:10))` +#' 1:10 %.>% c(min(.), max(.)) +#' +#' @rdname dot_pipe +#' @export +`%.>%` <- function(lhs, rhs) { + rhs_expr <- rlang::enexpr(rhs) + if (!contains_dot(rhs_expr)) { + rlang::abort("The right-hand side (rhs) of `%.>%` must contain at least one dot (.) placeholder.") + } + + rlang::eval_tidy(rhs_expr, list(. = lhs), env = rlang::caller_env()) +} + +# Recursively find if an expression contains a dot. +contains_dot <- function(expr) { + rlang::is_symbol(expr) && identical(expr, rlang::sym(".")) || + rlang::is_call(expr) && purrr::some(as.list(expr), contains_dot) +} diff --git a/R/sdtm.oak-package.R b/R/sdtm.oak-package.R index 0ba23dc1..bf0c896a 100644 --- a/R/sdtm.oak-package.R +++ b/R/sdtm.oak-package.R @@ -2,8 +2,8 @@ "_PACKAGE" ## usethis namespace: start -#' @importFrom tibble tibble #' @importFrom rlang .data #' @importFrom stats na.omit +#' @importFrom tibble tibble ## usethis namespace: end NULL diff --git a/R/sdtm_join.R b/R/sdtm_join.R new file mode 100644 index 00000000..ad2b5a1c --- /dev/null +++ b/R/sdtm_join.R @@ -0,0 +1,37 @@ +#' SDTM join +#' +#' [sdtm_join()] is a special join between a raw data set and a target data +#' set. This function supports conditioned data frames. +#' +#' @param raw_dat The raw dataset: a dataframe or a conditioned data frame. Must +#' include the variables passed in `id_vars`. +#' @param tgt_dat Target dataset: a data frame or a conditioned data frame to be +#' merged against `raw_dat` by the variables indicated in `id_vars`. +#' @param id_vars Key variables to be used in the join between the raw dataset +#' (`raw_dat`) and the target data set (`tgt_dat`). +#' +#' @returns A data frame, or a conditioned data frame if, at least, one of the +#' input data sets is a conditioned data frame. +#' +#' @keywords internal +#' @importFrom rlang %||% +sdtm_join <- function(raw_dat, + tgt_dat = NULL, + id_vars = oak_id_vars()) { + raw_dat_cnd <- get_cnd_df_cnd(raw_dat) %||% rep(TRUE, nrow(raw_dat)) + tgt_dat <- tgt_dat %||% raw_dat[id_vars] + tgt_dat_cnd <- get_cnd_df_cnd(tgt_dat) %||% rep(TRUE, nrow(tgt_dat)) + + # `rm_cnd_df()` prevents `mutate` from dispatching. + raw_dat <- dplyr::mutate(rm_cnd_df(raw_dat), `__raw_dat_cond__` = raw_dat_cnd) + tgt_dat <- dplyr::mutate(rm_cnd_df(tgt_dat), `__tgt_dat_cond__` = tgt_dat_cnd) + + res <- dplyr::right_join(raw_dat, y = tgt_dat, by = id_vars) + + cnd <- res$`__raw_dat_cond__` & res$`__tgt_dat_cond__` + res |> + dplyr::select(-dplyr::all_of(c( + "__raw_dat_cond__", "__tgt_dat_cond__" + ))) |> + new_cnd_df(cnd = cnd, .warn = FALSE) +} diff --git a/_pkgdown.yml b/_pkgdown.yml index 878b8998..fd45e961 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -9,13 +9,22 @@ navbar: reference: - title: SDTM derivation - desc: Set of functions to perform SDTM derivations + desc: Toolkit for SDTM derivations contents: - assign - harcode - derive_seq - derive_study_day - assign_datetime + - oak_id_vars + - sbj_vars + +- title: Conditioned data frames + desc: Functions for conditioned data frames (`cnd_df`) + contents: + - condition_add + - ctl_new_rowid_pillar.cnd_df + - tbl_sum.cnd_df - title: SDTM examples desc: SDTM domain file examples @@ -37,9 +46,10 @@ reference: - dtc_formats - problems -- title: Utils +- title: Explicit dot pipe operator + desc: A simple alternative to `%>% {...}` contents: - - sbj_vars + - "%.>%" - title: Package global state contents: diff --git a/inst/WORDLIST b/inst/WORDLIST index e047f082..70830c44 100644 --- a/inst/WORDLIST +++ b/inst/WORDLIST @@ -25,6 +25,9 @@ AE AESTDY CMSTDY DM +ungrouped +magrittr +magrittr's Immunogenicity Pharmacokinetics iRECIST diff --git a/man/assign.Rd b/man/assign.Rd index ff7df056..2f53ff73 100644 --- a/man/assign.Rd +++ b/man/assign.Rd @@ -7,37 +7,37 @@ \title{Derive an SDTM variable} \usage{ assign_no_ct( + tgt_dat = NULL, + tgt_var, raw_dat, raw_var, - tgt_var, - tgt_dat = NULL, id_vars = oak_id_vars() ) assign_ct( + tgt_dat = NULL, + tgt_var, raw_dat, raw_var, - tgt_var, ct_spec, ct_clst, - tgt_dat = NULL, id_vars = oak_id_vars() ) } \arguments{ +\item{tgt_dat}{Target dataset: a data frame to be merged against \code{raw_dat} by +the variables indicated in \code{id_vars}. This parameter is optional, see +section Value for how the output changes depending on this argument value.} + +\item{tgt_var}{The target SDTM variable: a single string indicating the name +of variable to be derived.} + \item{raw_dat}{The raw dataset (dataframe); must include the variables passed in \code{id_vars} and \code{raw_var}.} \item{raw_var}{The raw variable: a single string indicating the name of the raw variable in \code{raw_dat}.} -\item{tgt_var}{The target SDTM variable: a single string indicating the name -of variable to be derived.} - -\item{tgt_dat}{Target dataset: a data frame to be merged against \code{raw_dat} by -the variables indicated in \code{id_vars}. This parameter is optional, see -section Value for how the output changes depending on this argument value.} - \item{id_vars}{Key variables to be used in the join between the raw dataset (\code{raw_dat}) and the target data set (\code{raw_dat}).} @@ -82,9 +82,9 @@ md1 <- ) assign_no_ct( - raw_dat = md1, - raw_var = "MDIND", tgt_var = "CMINDC", + raw_dat = md1, + raw_var = "MDIND" ) cm_inter <- @@ -130,12 +130,12 @@ cm_inter <- (ct_spec <- read_ct_spec_example("ct-01-cm")) assign_ct( + tgt_dat = cm_inter, + tgt_var = "CMINDC", raw_dat = md1, raw_var = "MDIND", - tgt_var = "CMINDC", ct_spec = ct_spec, - ct_clst = "C66729", - tgt_dat = cm_inter + ct_clst = "C66729" ) } diff --git a/man/assign_datetime.Rd b/man/assign_datetime.Rd index 9130e775..b3564ddf 100644 --- a/man/assign_datetime.Rd +++ b/man/assign_datetime.Rd @@ -5,17 +5,24 @@ \title{Derive an ISO8601 date-time variable} \usage{ assign_datetime( + tgt_dat = NULL, + tgt_var, raw_dat, raw_var, raw_fmt, - tgt_var, raw_unk = c("UN", "UNK"), - tgt_dat = NULL, id_vars = oak_id_vars(), .warn = TRUE ) } \arguments{ +\item{tgt_dat}{Target dataset: a data frame to be merged against \code{raw_dat} by +the variables indicated in \code{id_vars}. This parameter is optional, see +section Value for how the output changes depending on this argument value.} + +\item{tgt_var}{The target SDTM variable: a single string indicating the name +of variable to be derived.} + \item{raw_dat}{The raw dataset (dataframe); must include the variables passed in \code{id_vars} and \code{raw_var}.} @@ -30,16 +37,9 @@ element is taken as parsing format for each variable indicated in vector of formats. The first vector of formats is used for parsing the first variable in \code{raw_var}, and so on.} -\item{tgt_var}{The target SDTM variable: a single string indicating the name -of variable to be derived.} - \item{raw_unk}{A character vector of string literals to be regarded as missing values during parsing.} -\item{tgt_dat}{Target dataset: a data frame to be merged against \code{raw_dat} by -the variables indicated in \code{id_vars}. This parameter is optional, see -section Value for how the output changes depending on this argument value.} - \item{id_vars}{Key variables to be used in the join between the raw dataset (\code{raw_dat}) and the target data set (\code{tgt_dat}).} @@ -88,11 +88,11 @@ md1 <- # indicating that these values are missing/unknown (unk). cm1 <- assign_datetime( + tgt_var = "CMSTDTC", raw_dat = md1, raw_var = "MDBDR", raw_fmt = "d-m-y", - raw_unk = c("UN", "UNK"), - tgt_var = "CMSTDTC" + raw_unk = c("UN", "UNK") ) cm1 @@ -147,11 +147,11 @@ cm_inter <- # data set `cm_inter`. cm2 <- assign_datetime( + tgt_dat = cm_inter, + tgt_var = "CMSTDTC", raw_dat = md1, raw_var = "MDBDR", - raw_fmt = "d-m-y", - tgt_var = "CMSTDTC", - tgt_dat = cm_inter + raw_fmt = "d-m-y" ) cm2 @@ -164,11 +164,11 @@ problems(cm2$CMSTDTC) # MDETM (correspondence is by positional matching). cm3 <- assign_datetime( + tgt_var = "CMSTDTC", raw_dat = md1, raw_var = c("MDEDR", "MDETM"), raw_fmt = c("d-m-y", "H:M:S"), - raw_unk = c("UN", "UNK"), - tgt_var = "CMSTDTC" + raw_unk = c("UN", "UNK") ) cm3 diff --git a/man/condition_add.Rd b/man/condition_add.Rd new file mode 100644 index 00000000..5640ab71 --- /dev/null +++ b/man/condition_add.Rd @@ -0,0 +1,36 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/cnd_df.R +\name{condition_add} +\alias{condition_add} +\title{Add filtering tags to a data set} +\usage{ +condition_add(dat, ..., .na = NA, .dat2 = rlang::env()) +} +\arguments{ +\item{dat}{A data frame.} + +\item{...}{Conditions to filter the data frame.} + +\item{.na}{Return value to be used when the conditions evaluate to \code{NA}.} + +\item{.dat2}{An optional environment to look for variables involved in +logical expression passed in \code{...}. A data frame or a list can also be +passed that will be coerced to an environment internally.} +} +\value{ +A conditioned data frame, meaning a tibble with an additional class +\code{cnd_df} and a logical vector attribute indicating matching rows. +} +\description{ +\code{condition_add()} tags records in a data set, indicating which rows match the +specified conditions, resulting in a conditioned data frame. Learn how to +integrate conditioned data frames in your SDTM domain derivation in +\code{vignette("cnd_df")}. +} +\examples{ +(df <- tibble::tibble(x = 1L:3L, y = letters[x])) + +# Mark rows for which `x` greater than `1` +(cnd_df <- condition_add(dat = df, x > 1L)) + +} diff --git a/man/contains_oak_id_vars.Rd b/man/contains_oak_id_vars.Rd index c872bbbe..21bc97fb 100644 --- a/man/contains_oak_id_vars.Rd +++ b/man/contains_oak_id_vars.Rd @@ -21,7 +21,7 @@ variables --- these are defined by the return value of \code{\link[=oak_id_vars] # `oak_id_vars()` is the function that defines what are the minimal set of # oak keys. Hence, by definition, the following code should always return # `TRUE`. -sdtm.oak:::contains_oak_id_vars(sdtm.oak:::oak_id_vars()) +sdtm.oak:::contains_oak_id_vars(oak_id_vars()) # Returns `FALSE`. sdtm.oak:::contains_oak_id_vars(character()) diff --git a/man/ctl_new_rowid_pillar.cnd_df.Rd b/man/ctl_new_rowid_pillar.cnd_df.Rd new file mode 100644 index 00000000..eb469be6 --- /dev/null +++ b/man/ctl_new_rowid_pillar.cnd_df.Rd @@ -0,0 +1,23 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/cnd_df.R +\name{ctl_new_rowid_pillar.cnd_df} +\alias{ctl_new_rowid_pillar.cnd_df} +\title{Conditioned tibble pillar print method} +\usage{ +\method{ctl_new_rowid_pillar}{cnd_df}(controller, x, width, ...) +} +\arguments{ +\item{controller}{The object of class \code{"tbl"} currently printed.} + +\item{x}{A simple (one-dimensional) vector.} + +\item{width}{The available width, can be a vector for multiple tiers.} + +\item{...}{These dots are for future extensions and must be empty.} +} +\description{ +Conditioned tibble pillar print method +} +\seealso{ +\code{\link[=tbl_sum.cnd_df]{tbl_sum.cnd_df()}}. +} diff --git a/man/dot_pipe.Rd b/man/dot_pipe.Rd new file mode 100644 index 00000000..716b9fee --- /dev/null +++ b/man/dot_pipe.Rd @@ -0,0 +1,64 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/pipe.R +\name{\%.>\%} +\alias{\%.>\%} +\title{Explicit Dot Pipe} +\usage{ +lhs \%.>\% rhs +} +\arguments{ +\item{lhs}{A value to be piped forward.} + +\item{rhs}{A function call that utilizes the dot (\code{.}) placeholder to specify +where \code{lhs} should be placed.} +} +\description{ +\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}} + +This operator pipes an object forward into a function or call expression +using an explicit placement of the dot (\code{.}) placeholder. Unlike magrittr's +\link[magrittr:pipe]{\%>\%} operator, \verb{\%.>\%} does not automatically place the +left-hand side (\code{lhs}) as the first argument in the right-hand side (\code{rhs}) +call. This operator provides a simpler alternative to the use of braces with +magrittr, while achieving similar behavior. +} +\details{ +The \verb{\%.>\%} operator is used to pipe the \code{lhs} value into the \code{rhs} function +call. Within the \code{rhs} expression, the placeholder \code{.} represents the +position where \code{lhs} will be inserted. This provides more control over where +the \code{lhs} value appears in the \code{rhs} function call, compared to the magrittr +pipe operator which always places \code{lhs} as the first argument of \code{rhs}. + +Unlike magrittr's pipe, which may require the use of braces to fully control +the placement of \code{lhs} in nested function calls, \verb{\%.>\%} simplifies this by +directly allowing multiple usages of the dot placeholder without requiring +braces. For example, the following expression using magrittr's pipe and +braces: + +\if{html}{\out{
}}\preformatted{library(magrittr) + +1:10 \%>\% \{ c(min(.), max(.)) \} +}\if{html}{\out{
}} + +can be written as: + +\if{html}{\out{
}}\preformatted{1:10 \%.>\% c(min(.), max(.)) +}\if{html}{\out{
}} + +without needing additional braces. +\subsection{Downside}{ + +The disadvantage of \verb{\%.>\%} is that you always need to use +the dot placeholder, even when piping to the first argument of the +right-hand side (\code{rhs}). +} +} +\examples{ + +# Equivalent to `subset(head(iris), 1:nrow(head(iris)) \%\% 2 == 0)` +head(iris) \%.>\% subset(., 1:nrow(.) \%\% 2 == 0) + +# Equivalent to `c(min(1:10), max(1:10))` +1:10 \%.>\% c(min(.), max(.)) + +} diff --git a/man/eval_conditions.Rd b/man/eval_conditions.Rd new file mode 100644 index 00000000..cafa6f5d --- /dev/null +++ b/man/eval_conditions.Rd @@ -0,0 +1,75 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/cnd_df.R +\name{eval_conditions} +\alias{eval_conditions} +\title{Evaluate conditions} +\usage{ +eval_conditions(dat, ..., .na = NA, .env = rlang::caller_env()) +} +\arguments{ +\item{dat}{A data frame} + +\item{...}{A set of logical conditions, e.g. \verb{y & z, x | z} (\code{x}, \code{y}, \code{z} +would have to exist either as columns in \code{dat} or in the environment +\code{.env}). If multiple expressions are included, they are combined with the +\code{&} operator.} + +\item{.na}{Return value to be used when the conditions evaluate to \code{NA}.} + +\item{.env}{An optional environment to look for variables involved in logical +expression passed in \code{...}. A data frame or a list can also be passed that +will be coerced to an environment internally.} +} +\value{ +A logical vector reflecting matching rows in \code{dat}. +} +\description{ +\code{\link[=eval_conditions]{eval_conditions()}} evaluates a set of conditions in the context of a +data frame and an optional environment. + +The utility of this function is to provide an easy way to generate a logical +vector of matching records from a set of logical conditions involving +variables in a data frame (\code{dat}) and optionally in a supplementary +environment (\code{.env}). The set of logical conditions are provided as +expressions to be evaluated in the context of \code{dat} and \code{.env}. + +Variables are looked up in \code{dat}, then in \code{.env}, then in the calling +function's environment, followed by its parent environments. +} +\examples{ +# Create a sample data frame +df <- data.frame( + x = c(1, 2, NA_integer_, 4, 5), + y = c(TRUE, FALSE, TRUE, FALSE, TRUE), + z = c("a", "b", "a", "b", "a") +) + +# Simple condition on one column +sdtm.oak:::eval_conditions(df, x > 2) + +# Combined conditions on multiple columns +sdtm.oak:::eval_conditions(df, x > 2 & y) +sdtm.oak:::eval_conditions(df, x > 2, y) + +# Using conditions with NA handling +df_with_na <- data.frame( + x = c(1, 2, NA, 4, 5), + y = c(TRUE, FALSE, TRUE, FALSE, TRUE) +) +sdtm.oak:::eval_conditions(df_with_na, x > 2, .na = FALSE) + +# The environment where `eval_conditions()` is called is also inspected +# when evaluating conditions in `...`. +w <- 1 +sdtm.oak:::eval_conditions(df, x > w) + +# Using an environment +env <- rlang::env(w = 2) +sdtm.oak:::eval_conditions(df, x > w, .env = env) + +# In place of an environment, you may alternatively pass a list or data frame. +sdtm.oak:::eval_conditions(df, x > w, .env = list(w = 3)) +sdtm.oak:::eval_conditions(df, x > w, .env = tibble::tibble(w = 4)) + +} +\keyword{internal} diff --git a/man/figures/lifecycle-archived.svg b/man/figures/lifecycle-archived.svg new file mode 100644 index 00000000..745ab0c7 --- /dev/null +++ b/man/figures/lifecycle-archived.svg @@ -0,0 +1,21 @@ + + lifecycle: archived + + + + + + + + + + + + + + + lifecycle + + archived + + diff --git a/man/figures/lifecycle-defunct.svg b/man/figures/lifecycle-defunct.svg new file mode 100644 index 00000000..d5c9559e --- /dev/null +++ b/man/figures/lifecycle-defunct.svg @@ -0,0 +1,21 @@ + + lifecycle: defunct + + + + + + + + + + + + + + + lifecycle + + defunct + + diff --git a/man/figures/lifecycle-deprecated.svg b/man/figures/lifecycle-deprecated.svg new file mode 100644 index 00000000..b61c57c3 --- /dev/null +++ b/man/figures/lifecycle-deprecated.svg @@ -0,0 +1,21 @@ + + lifecycle: deprecated + + + + + + + + + + + + + + + lifecycle + + deprecated + + diff --git a/man/figures/lifecycle-experimental.svg b/man/figures/lifecycle-experimental.svg new file mode 100644 index 00000000..5d88fc2c --- /dev/null +++ b/man/figures/lifecycle-experimental.svg @@ -0,0 +1,21 @@ + + lifecycle: experimental + + + + + + + + + + + + + + + lifecycle + + experimental + + diff --git a/man/figures/lifecycle-maturing.svg b/man/figures/lifecycle-maturing.svg new file mode 100644 index 00000000..897370ec --- /dev/null +++ b/man/figures/lifecycle-maturing.svg @@ -0,0 +1,21 @@ + + lifecycle: maturing + + + + + + + + + + + + + + + lifecycle + + maturing + + diff --git a/man/figures/lifecycle-questioning.svg b/man/figures/lifecycle-questioning.svg new file mode 100644 index 00000000..7c1721d0 --- /dev/null +++ b/man/figures/lifecycle-questioning.svg @@ -0,0 +1,21 @@ + + lifecycle: questioning + + + + + + + + + + + + + + + lifecycle + + questioning + + diff --git a/man/figures/lifecycle-soft-deprecated.svg b/man/figures/lifecycle-soft-deprecated.svg new file mode 100644 index 00000000..9c166ff3 --- /dev/null +++ b/man/figures/lifecycle-soft-deprecated.svg @@ -0,0 +1,21 @@ + + lifecycle: soft-deprecated + + + + + + + + + + + + + + + lifecycle + + soft-deprecated + + diff --git a/man/figures/lifecycle-stable.svg b/man/figures/lifecycle-stable.svg new file mode 100644 index 00000000..9bf21e76 --- /dev/null +++ b/man/figures/lifecycle-stable.svg @@ -0,0 +1,29 @@ + + lifecycle: stable + + + + + + + + + + + + + + + + lifecycle + + + + stable + + + diff --git a/man/figures/lifecycle-superseded.svg b/man/figures/lifecycle-superseded.svg new file mode 100644 index 00000000..db8d757f --- /dev/null +++ b/man/figures/lifecycle-superseded.svg @@ -0,0 +1,21 @@ + + lifecycle: superseded + + + + + + + + + + + + + + + lifecycle + + superseded + + diff --git a/man/get_cnd_df_cnd.Rd b/man/get_cnd_df_cnd.Rd new file mode 100644 index 00000000..ae3b3d7c --- /dev/null +++ b/man/get_cnd_df_cnd.Rd @@ -0,0 +1,33 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/cnd_df.R +\name{get_cnd_df_cnd} +\alias{get_cnd_df_cnd} +\title{Get the conditioning vector from a conditioned data frame} +\usage{ +get_cnd_df_cnd(dat) +} +\arguments{ +\item{dat}{A conditioned data frame (\code{cnd_df}).} +} +\value{ +The conditioning vector (\code{cnd}) if \code{dat} is a conditioned data frame +(\code{cnd_df}), otherwise \code{NULL}, or \code{NULL} if \code{dat} is not a conditioned data +frame (\code{cnd_df}). +} +\description{ +\code{\link[=get_cnd_df_cnd]{get_cnd_df_cnd()}} extracts the conditioning vector from a conditioned data +frame, i.e. from an object of class \code{cnd_df}. +} +\examples{ +df <- data.frame(x = 1:3, y = letters[1:3]) +sdtm.oak:::get_cnd_df_cnd(df) + +cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = c(FALSE, NA, TRUE)) +sdtm.oak:::get_cnd_df_cnd(cnd_df) + +} +\seealso{ +\code{\link[=new_cnd_df]{new_cnd_df()}}, \code{\link[=is_cnd_df]{is_cnd_df()}}, \code{\link[=get_cnd_df_cnd_sum]{get_cnd_df_cnd_sum()}}, +\code{\link[=rm_cnd_df]{rm_cnd_df()}}. +} +\keyword{internal} diff --git a/man/get_cnd_df_cnd_sum.Rd b/man/get_cnd_df_cnd_sum.Rd new file mode 100644 index 00000000..b7182b18 --- /dev/null +++ b/man/get_cnd_df_cnd_sum.Rd @@ -0,0 +1,32 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/cnd_df.R +\name{get_cnd_df_cnd_sum} +\alias{get_cnd_df_cnd_sum} +\title{Get the summary of the conditioning vector from a conditioned data frame} +\usage{ +get_cnd_df_cnd_sum(dat) +} +\arguments{ +\item{dat}{A conditioned data frame (\code{cnd_df}).} +} +\value{ +A vector of three elements providing the sum of \code{TRUE}, \code{FALSE}, and +\code{NA} values in the conditioning vector (\code{cnd}), or \code{NULL} if \code{dat} is not +a conditioned data frame (\code{cnd_df}). +} +\description{ +\code{\link[=get_cnd_df_cnd_sum]{get_cnd_df_cnd_sum()}} extracts the tally of the conditioning vector from a +conditioned data frame. +} +\examples{ +df <- data.frame(x = 1:3, y = letters[1:3]) +sdtm.oak:::get_cnd_df_cnd_sum(df) + +cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = c(FALSE, NA, TRUE)) +sdtm.oak:::get_cnd_df_cnd_sum(cnd_df) + +} +\seealso{ +\code{\link[=new_cnd_df]{new_cnd_df()}}, \code{\link[=is_cnd_df]{is_cnd_df()}}, \code{\link[=get_cnd_df_cnd]{get_cnd_df_cnd()}}, \code{\link[=rm_cnd_df]{rm_cnd_df()}}. +} +\keyword{internal} diff --git a/man/harcode.Rd b/man/harcode.Rd index e38424a5..573a267e 100644 --- a/man/harcode.Rd +++ b/man/harcode.Rd @@ -7,26 +7,33 @@ \title{Derive an SDTM variable with a hardcoded value} \usage{ hardcode_no_ct( + tgt_dat = NULL, + tgt_val, raw_dat, raw_var, tgt_var, - tgt_val, - tgt_dat = NULL, id_vars = oak_id_vars() ) hardcode_ct( + tgt_dat = NULL, + tgt_val, raw_dat, raw_var, tgt_var, - tgt_val, ct_spec, ct_clst, - tgt_dat = NULL, id_vars = oak_id_vars() ) } \arguments{ +\item{tgt_dat}{Target dataset: a data frame to be merged against \code{raw_dat} by +the variables indicated in \code{id_vars}. This parameter is optional, see +section Value for how the output changes depending on this argument value.} + +\item{tgt_val}{The target SDTM value to be hardcoded into the variable +indicated in \code{tgt_var}.} + \item{raw_dat}{The raw dataset (dataframe); must include the variables passed in \code{id_vars} and \code{raw_var}.} @@ -36,13 +43,6 @@ raw variable in \code{raw_dat}.} \item{tgt_var}{The target SDTM variable: a single string indicating the name of variable to be derived.} -\item{tgt_val}{The target SDTM value to be hardcoded into the variable -indicated in \code{tgt_var}.} - -\item{tgt_dat}{Target dataset: a data frame to be merged against \code{raw_dat} by -the variables indicated in \code{id_vars}. This parameter is optional, see -section Value for how the output changes depending on this argument value.} - \item{id_vars}{Key variables to be used in the join between the raw dataset (\code{raw_dat}) and the target data set (\code{raw_dat}).} @@ -87,10 +87,10 @@ md1 <- # Derive a new variable `CMCAT` by overwriting `MDRAW` with the # hardcoded value "GENERAL CONCOMITANT MEDICATIONS". hardcode_no_ct( + tgt_val = "GENERAL CONCOMITANT MEDICATIONS", raw_dat = md1, raw_var = "MDRAW", - tgt_var = "CMCAT", - tgt_val = "GENERAL CONCOMITANT MEDICATIONS" + tgt_var = "CMCAT" ) cm_inter <- @@ -107,11 +107,11 @@ cm_inter <- # hardcoded value "GENERAL CONCOMITANT MEDICATIONS" with a prior join to # `target_dataset`. hardcode_no_ct( + tgt_dat = cm_inter, + tgt_val = "GENERAL CONCOMITANT MEDICATIONS", raw_dat = md1, raw_var = "MDRAW", - tgt_var = "CMCAT", - tgt_val = "GENERAL CONCOMITANT MEDICATIONS", - tgt_dat = cm_inter + tgt_var = "CMCAT" ) # Controlled terminology specification @@ -121,13 +121,13 @@ hardcode_no_ct( # involving terminology recoding. `NA` values in `MDRAW` are preserved in # `CMCAT`. hardcode_ct( + tgt_dat = cm_inter, + tgt_var = "CMCAT", raw_dat = md1, raw_var = "MDRAW", - tgt_var = "CMCAT", tgt_val = "GENERAL CONCOMITANT MEDICATIONS", ct_spec = ct_spec, - ct_clst = "C66729", - tgt_dat = cm_inter + ct_clst = "C66729" ) } diff --git a/man/is_cnd_df.Rd b/man/is_cnd_df.Rd new file mode 100644 index 00000000..3cc10af8 --- /dev/null +++ b/man/is_cnd_df.Rd @@ -0,0 +1,32 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/cnd_df.R +\name{is_cnd_df} +\alias{is_cnd_df} +\title{Check if a data frame is a conditioned data frame} +\usage{ +is_cnd_df(dat) +} +\arguments{ +\item{dat}{A data frame.} +} +\value{ +\code{TRUE} if \code{dat} is a conditioned data frame (class \code{cnd_df}), +otherwise \code{FALSE}. +} +\description{ +\code{\link[=is_cnd_df]{is_cnd_df()}} checks whether a data frame is a conditioned data frame, i.e. +of class \code{cnd_df}. +} +\examples{ +df <- data.frame(x = 1:3, y = letters[1:3]) +sdtm.oak:::is_cnd_df(df) + +cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = c(FALSE, NA, TRUE)) +sdtm.oak:::is_cnd_df(cnd_df) + +} +\seealso{ +\code{\link[=new_cnd_df]{new_cnd_df()}}, \code{\link[=get_cnd_df_cnd]{get_cnd_df_cnd()}}, \code{\link[=get_cnd_df_cnd_sum]{get_cnd_df_cnd_sum()}}, +\code{\link[=rm_cnd_df]{rm_cnd_df()}}. +} +\keyword{internal} diff --git a/man/mutate.cnd_df.Rd b/man/mutate.cnd_df.Rd new file mode 100644 index 00000000..a11b38fd --- /dev/null +++ b/man/mutate.cnd_df.Rd @@ -0,0 +1,68 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/cnd_df.R +\name{mutate.cnd_df} +\alias{mutate.cnd_df} +\title{Mutate method for conditioned data frames} +\usage{ +\method{mutate}{cnd_df}( + .data, + ..., + .by = NULL, + .keep = c("all", "used", "unused", "none"), + .before = NULL, + .after = NULL +) +} +\arguments{ +\item{.data}{A conditioned data frame.} + +\item{...}{<\code{\link[rlang:args_data_masking]{data-masking}}> Name-value pairs. +The name gives the name of the column in the output. + +The value can be: +\itemize{ +\item A vector of length 1, which will be recycled to the correct length. +\item A vector the same length as the current group (or the whole data frame +if ungrouped). +\item \code{NULL}, to remove the column. +\item A data frame or tibble, to create multiple columns in the output. +}} + +\item{.by}{Not used when \code{.data} is a conditioned data frame.} + +\item{.keep}{Control which columns from \code{.data} are retained in the output. Grouping +columns and columns created by \code{...} are always kept. +\itemize{ +\item \code{"all"} retains all columns from \code{.data}. This is the default. +\item \code{"used"} retains only the columns used in \code{...} to create new +columns. This is useful for checking your work, as it displays inputs +and outputs side-by-side. +\item \code{"unused"} retains only the columns \emph{not} used in \code{...} to create new +columns. This is useful if you generate new columns, but no longer need +the columns used to generate them. +\item \code{"none"} doesn't retain any extra columns from \code{.data}. Only the grouping +variables and columns created by \code{...} are kept. +}} + +\item{.before}{Not used, use \code{.after} instead.} + +\item{.after}{Control where new columns should appear, i.e. after which +columns.} +} +\description{ +\code{\link[=mutate.cnd_df]{mutate.cnd_df()}} is an S3 method to be dispatched by \link[dplyr:mutate]{mutate} +generic on conditioned data frames. This function implements a conditional +mutate by only changing rows for which the condition stored in the +conditioned data frame is \code{TRUE}. +} +\examples{ +df <- tibble::tibble(x = 1L:3L, y = letters[x]) +cnd_df <- condition_add(df, x > 1L, y \%in\% c("a", "b")) + +# Because `cnd_df` is a conditioned data frame, dplyr::mutate() generic +# dispatches this S3 method and mutates only the second row, as that is the +# only record that fulfills simultaneously `x > 1L` and `y \%in\% c("a", "b")`. +dplyr::mutate(cnd_df, z = "match") + +} +\keyword{internal} diff --git a/man/new_cnd_df.Rd b/man/new_cnd_df.Rd new file mode 100644 index 00000000..0de6282e --- /dev/null +++ b/man/new_cnd_df.Rd @@ -0,0 +1,44 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/cnd_df.R +\name{new_cnd_df} +\alias{new_cnd_df} +\title{Create a data frame with filtering tags} +\usage{ +new_cnd_df(dat, cnd, .warn = TRUE) +} +\arguments{ +\item{dat}{A data frame.} + +\item{cnd}{A logical vector. Length must match the number of rows in \code{dat}.} + +\item{.warn}{Whether to warn about creating a new \emph{conditioned} data frame +in case that \code{dat} already is one.} +} +\value{ +A data frame \code{dat} with the additional class \code{"cnd_df"} and the +following attributes: +\itemize{ +\item \code{cnd}: The logical vector passed as argument \code{cnd}: \code{TRUE} values mark +rows in \code{dat} to be used for transformations; rows marked with \code{FALSE} are +not transformed; and \code{NA} mark rows whose transformations are to be applied +resulting in \code{NA}. +\item \code{cnd_sum}: An integer vector of three elements providing the sum of \code{TRUE}, +\code{FALSE} and \code{NA} values in \code{cnd}, respectively. +} +} +\description{ +\code{\link[=new_cnd_df]{new_cnd_df()}} creates a \emph{conditioned} data frame, classed \code{cnd_df}, meaning +that this function extends the data frame passed as argument by storing a +logical vector \code{cnd} (as attribute) that marks rows for posterior conditional +transformation by methods that support \emph{conditioned} data frames. +} +\examples{ +df <- data.frame(x = 1:3, y = letters[1:3]) +sdtm.oak:::new_cnd_df(dat = df, cnd = c(FALSE, NA, TRUE)) + +} +\seealso{ +\code{\link[=is_cnd_df]{is_cnd_df()}}, \code{\link[=get_cnd_df_cnd]{get_cnd_df_cnd()}}, \code{\link[=get_cnd_df_cnd_sum]{get_cnd_df_cnd_sum()}}, +\code{\link[=rm_cnd_df]{rm_cnd_df()}}. +} +\keyword{internal} diff --git a/man/oak_id_vars.Rd b/man/oak_id_vars.Rd index af5550a3..86b9113b 100644 --- a/man/oak_id_vars.Rd +++ b/man/oak_id_vars.Rd @@ -22,9 +22,8 @@ oak_id, raw_source, and patient_number. Extra variable names may be indicated and passed in \code{extra_vars} which are appended to the default names. } \examples{ -sdtm.oak:::oak_id_vars() +oak_id_vars() -sdtm.oak:::oak_id_vars(extra_vars = "sample_id") +oak_id_vars(extra_vars = "sample_id") } -\keyword{internal} diff --git a/man/rm_cnd_df.Rd b/man/rm_cnd_df.Rd new file mode 100644 index 00000000..bd740d2c --- /dev/null +++ b/man/rm_cnd_df.Rd @@ -0,0 +1,31 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/cnd_df.R +\name{rm_cnd_df} +\alias{rm_cnd_df} +\title{Remove the \code{cnd_df} class from a data frame} +\usage{ +rm_cnd_df(dat) +} +\arguments{ +\item{dat}{A data frame.} +} +\value{ +The input \code{dat} without the \code{cnd_df} class and associated attributes. +} +\description{ +This function removes the \code{cnd_df} class, along with its attributes, if +applicable. +} +\examples{ +df <- data.frame(x = 1:3, y = letters[1:3]) +cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = c(FALSE, NA, TRUE)) + +sdtm.oak:::is_cnd_df(cnd_df) +sdtm.oak:::is_cnd_df(sdtm.oak:::rm_cnd_df(cnd_df)) + +} +\seealso{ +\code{\link[=new_cnd_df]{new_cnd_df()}}, \code{\link[=is_cnd_df]{is_cnd_df()}}, \code{\link[=get_cnd_df_cnd]{get_cnd_df_cnd()}}, +\code{\link[=get_cnd_df_cnd_sum]{get_cnd_df_cnd_sum()}}. +} +\keyword{internal} diff --git a/man/sdtm_assign.Rd b/man/sdtm_assign.Rd index 676979dc..14b4ed03 100644 --- a/man/sdtm_assign.Rd +++ b/man/sdtm_assign.Rd @@ -5,25 +5,29 @@ \title{Derive an SDTM variable} \usage{ sdtm_assign( + tgt_dat = NULL, + tgt_var, raw_dat, raw_var, - tgt_var, ct_spec = NULL, ct_clst = NULL, - tgt_dat = NULL, id_vars = oak_id_vars() ) } \arguments{ +\item{tgt_dat}{Target dataset: a data frame to be merged against \code{raw_dat} by +the variables indicated in \code{id_vars}. This parameter is optional, see +section Value for how the output changes depending on this argument value.} + +\item{tgt_var}{The target SDTM variable: a single string indicating the name +of variable to be derived.} + \item{raw_dat}{The raw dataset (dataframe); must include the variables passed in \code{id_vars} and \code{raw_var}.} \item{raw_var}{The raw variable: a single string indicating the name of the raw variable in \code{raw_dat}.} -\item{tgt_var}{The target SDTM variable: a single string indicating the name -of variable to be derived.} - \item{ct_spec}{Study controlled terminology specification: a dataframe with a minimal set of columns, see \code{\link[=ct_spec_vars]{ct_spec_vars()}} for details. This parameter is optional, if left as \code{NULL} no controlled terminology recoding is applied.} @@ -32,12 +36,8 @@ optional, if left as \code{NULL} no controlled terminology recoding is applied.} terminology to apply in the derivation. This parameter is optional, if left as \code{NULL}, all possible recodings in \code{ct_spec} are attempted.} -\item{tgt_dat}{Target dataset: a data frame to be merged against \code{raw_dat} by -the variables indicated in \code{id_vars}. This parameter is optional, see -section Value for how the output changes depending on this argument value.} - \item{id_vars}{Key variables to be used in the join between the raw dataset -(\code{raw_dat}) and the target data set (\code{raw_dat}).} +(\code{raw_dat}) and the target data set (\code{tgt_dat}).} } \value{ The returned data set depends on the value of \code{tgt_dat}: diff --git a/man/sdtm_hardcode.Rd b/man/sdtm_hardcode.Rd index 5c3435b5..065d3942 100644 --- a/man/sdtm_hardcode.Rd +++ b/man/sdtm_hardcode.Rd @@ -5,26 +5,30 @@ \title{Derive an SDTM variable with a hardcoded value} \usage{ sdtm_hardcode( + tgt_dat = NULL, + tgt_var, raw_dat, raw_var, - tgt_var, tgt_val, ct_spec = NULL, ct_clst = NULL, - tgt_dat = NULL, id_vars = oak_id_vars() ) } \arguments{ +\item{tgt_dat}{Target dataset: a data frame to be merged against \code{raw_dat} by +the variables indicated in \code{id_vars}. This parameter is optional, see +section Value for how the output changes depending on this argument value.} + +\item{tgt_var}{The target SDTM variable: a single string indicating the name +of variable to be derived.} + \item{raw_dat}{The raw dataset (dataframe); must include the variables passed in \code{id_vars} and \code{raw_var}.} \item{raw_var}{The raw variable: a single string indicating the name of the raw variable in \code{raw_dat}.} -\item{tgt_var}{The target SDTM variable: a single string indicating the name -of variable to be derived.} - \item{tgt_val}{The target SDTM value to be hardcoded into the variable indicated in \code{tgt_var}.} @@ -36,12 +40,8 @@ optional, if left as \code{NULL} no controlled terminology recoding is applied.} terminology to apply in the derivation. This parameter is optional, if left as \code{NULL}, all possible recodings in \code{ct_spec} are attempted.} -\item{tgt_dat}{Target dataset: a data frame to be merged against \code{raw_dat} by -the variables indicated in \code{id_vars}. This parameter is optional, see -section Value for how the output changes depending on this argument value.} - \item{id_vars}{Key variables to be used in the join between the raw dataset -(\code{raw_dat}) and the target data set (\code{raw_dat}).} +(\code{raw_dat}) and the target data set (\code{tgt_dat}).} } \value{ The returned data set depends on the value of \code{tgt_dat}: diff --git a/man/sdtm_join.Rd b/man/sdtm_join.Rd new file mode 100644 index 00000000..1c7c02e7 --- /dev/null +++ b/man/sdtm_join.Rd @@ -0,0 +1,35 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/join.R, R/sdtm_join.R +\name{sdtm_join} +\alias{sdtm_join} +\title{SDTM join} +\usage{ +sdtm_join(raw_dat, tgt_dat = NULL, id_vars = oak_id_vars()) + +sdtm_join(raw_dat, tgt_dat = NULL, id_vars = oak_id_vars()) +} +\arguments{ +\item{raw_dat}{The raw dataset: a dataframe or a conditioned data frame. Must +include the variables passed in \code{id_vars}.} + +\item{tgt_dat}{Target dataset: a data frame or a conditioned data frame to be +merged against \code{raw_dat} by the variables indicated in \code{id_vars}.} + +\item{id_vars}{Key variables to be used in the join between the raw dataset +(\code{raw_dat}) and the target data set (\code{tgt_dat}).} +} +\value{ +A data frame, or a conditioned data frame if at least one of the +input data sets is a conditioned data frame. + +A data frame, or a conditioned data frame if, at least, one of the +input data sets is a conditioned data frame. +} +\description{ +\code{\link[=sdtm_join]{sdtm_join()}} is a special join between a raw data set and a target data +set. This function supports conditioned data frames. + +\code{\link[=sdtm_join]{sdtm_join()}} is a special join between a raw data set and a target data +set. This function supports conditioned data frames. +} +\keyword{internal} diff --git a/man/tbl_sum.cnd_df.Rd b/man/tbl_sum.cnd_df.Rd new file mode 100644 index 00000000..ad009f66 --- /dev/null +++ b/man/tbl_sum.cnd_df.Rd @@ -0,0 +1,28 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/cnd_df.R +\name{tbl_sum.cnd_df} +\alias{tbl_sum.cnd_df} +\title{Conditioned tibble header print method} +\usage{ +\method{tbl_sum}{cnd_df}(x, ...) +} +\arguments{ +\item{x}{A conditioned tibble of class \code{cnd_df}.} + +\item{...}{Additional arguments passed to the default print method.} +} +\description{ +Conditioned tibble header print method. This S3 method adds an extra line +in the header of a tibble that indicates the tibble is a conditioned tibble +(\verb{# Cond. tbl:}) followed by the tally of the conditioning vector: number +of TRUE, FALSE and NA values: e.g., \code{1/1/1}. +} +\examples{ +df <- data.frame(x = c(1L, NA_integer_, 3L)) +(cnd_df <- condition_add(dat = df, x >= 2L)) +pillar::tbl_sum(cnd_df) + +} +\seealso{ +\code{\link[=ctl_new_rowid_pillar.cnd_df]{ctl_new_rowid_pillar.cnd_df()}}. +} diff --git a/renv/profiles/4.4/renv.lock b/renv/profiles/4.4/renv.lock deleted file mode 100644 index ec45f2c2..00000000 --- a/renv/profiles/4.4/renv.lock +++ /dev/null @@ -1,538 +0,0 @@ -{ - "R": { - "Version": "4.4.0", - "Repositories": [ - { - "Name": "CRAN", - "URL": "https://packagemanager.posit.co/cran/latest" - } - ] - }, - "Packages": { - "R6": { - "Package": "R6", - "Version": "2.5.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R" - ], - "Hash": "470851b6d5d0ac559e9d01bb352b4021" - }, - "admiraldev": { - "Package": "admiraldev", - "Version": "1.0.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "dplyr", - "hms", - "lifecycle", - "lubridate", - "magrittr", - "purrr", - "rlang", - "stringr", - "tidyr", - "tidyselect" - ], - "Hash": "4ab0476ca36f502f6cdd2080f8d0f261" - }, - "assertthat": { - "Package": "assertthat", - "Version": "0.2.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "tools" - ], - "Hash": "50c838a310445e954bc13f26f26a6ecf" - }, - "bit": { - "Package": "bit", - "Version": "4.0.5", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R" - ], - "Hash": "d242abec29412ce988848d0294b208fd" - }, - "bit64": { - "Package": "bit64", - "Version": "4.0.5", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "bit", - "methods", - "stats", - "utils" - ], - "Hash": "9fe98599ca456d6552421db0d6772d8f" - }, - "cachem": { - "Package": "cachem", - "Version": "1.0.8", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "fastmap", - "rlang" - ], - "Hash": "c35768291560ce302c0a6589f92e837d" - }, - "cli": { - "Package": "cli", - "Version": "3.6.2", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "utils" - ], - "Hash": "1216ac65ac55ec0058a6f75d7ca0fd52" - }, - "clipr": { - "Package": "clipr", - "Version": "0.8.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "utils" - ], - "Hash": "3f038e5ac7f41d4ac41ce658c85e3042" - }, - "cpp11": { - "Package": "cpp11", - "Version": "0.4.7", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R" - ], - "Hash": "5a295d7d963cc5035284dcdbaf334f4e" - }, - "crayon": { - "Package": "crayon", - "Version": "1.5.2", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "grDevices", - "methods", - "utils" - ], - "Hash": "e8a1e41acf02548751f45c718d55aa6a" - }, - "dplyr": { - "Package": "dplyr", - "Version": "1.1.4", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "R6", - "cli", - "generics", - "glue", - "lifecycle", - "magrittr", - "methods", - "pillar", - "rlang", - "tibble", - "tidyselect", - "utils", - "vctrs" - ], - "Hash": "fedd9d00c2944ff00a0e2696ccf048ec" - }, - "fansi": { - "Package": "fansi", - "Version": "1.0.6", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "grDevices", - "utils" - ], - "Hash": "962174cf2aeb5b9eea581522286a911f" - }, - "fastmap": { - "Package": "fastmap", - "Version": "1.1.1", - "Source": "Repository", - "Repository": "CRAN", - "Hash": "f7736a18de97dea803bde0a2daaafb27" - }, - "generics": { - "Package": "generics", - "Version": "0.1.3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "methods" - ], - "Hash": "15e9634c0fcd294799e9b2e929ed1b86" - }, - "glue": { - "Package": "glue", - "Version": "1.7.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "methods" - ], - "Hash": "e0b3a53876554bd45879e596cdb10a52" - }, - "hms": { - "Package": "hms", - "Version": "1.1.3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "lifecycle", - "methods", - "pkgconfig", - "rlang", - "vctrs" - ], - "Hash": "b59377caa7ed00fa41808342002138f9" - }, - "lifecycle": { - "Package": "lifecycle", - "Version": "1.0.4", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "glue", - "rlang" - ], - "Hash": "b8552d117e1b808b09a832f589b79035" - }, - "lubridate": { - "Package": "lubridate", - "Version": "1.9.3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "generics", - "methods", - "timechange" - ], - "Hash": "680ad542fbcf801442c83a6ac5a2126c" - }, - "magrittr": { - "Package": "magrittr", - "Version": "2.0.3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R" - ], - "Hash": "7ce2733a9826b3aeb1775d56fd305472" - }, - "memoise": { - "Package": "memoise", - "Version": "2.0.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "cachem", - "rlang" - ], - "Hash": "e2817ccf4a065c5d9d7f2cfbe7c1d78c" - }, - "pillar": { - "Package": "pillar", - "Version": "1.9.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "cli", - "fansi", - "glue", - "lifecycle", - "rlang", - "utf8", - "utils", - "vctrs" - ], - "Hash": "15da5a8412f317beeee6175fbc76f4bb" - }, - "pkgconfig": { - "Package": "pkgconfig", - "Version": "2.0.3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "utils" - ], - "Hash": "01f28d4278f15c76cddbea05899c5d6f" - }, - "prettyunits": { - "Package": "prettyunits", - "Version": "1.2.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R" - ], - "Hash": "6b01fc98b1e86c4f705ce9dcfd2f57c7" - }, - "progress": { - "Package": "progress", - "Version": "1.2.3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "R6", - "crayon", - "hms", - "prettyunits" - ], - "Hash": "f4625e061cb2865f111b47ff163a5ca6" - }, - "purrr": { - "Package": "purrr", - "Version": "1.0.2", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "lifecycle", - "magrittr", - "rlang", - "vctrs" - ], - "Hash": "1cba04a4e9414bdefc9dcaa99649a8dc" - }, - "readr": { - "Package": "readr", - "Version": "2.1.5", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "R6", - "cli", - "clipr", - "cpp11", - "crayon", - "hms", - "lifecycle", - "methods", - "rlang", - "tibble", - "tzdb", - "utils", - "vroom" - ], - "Hash": "9de96463d2117f6ac49980577939dfb3" - }, - "renv": { - "Package": "renv", - "Version": "1.0.7", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "utils" - ], - "Hash": "397b7b2a265bc5a7a06852524dabae20" - }, - "rlang": { - "Package": "rlang", - "Version": "1.1.3", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "utils" - ], - "Hash": "42548638fae05fd9a9b5f3f437fbbbe2" - }, - "stringi": { - "Package": "stringi", - "Version": "1.8.4", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "stats", - "tools", - "utils" - ], - "Hash": "39e1144fd75428983dc3f63aa53dfa91" - }, - "stringr": { - "Package": "stringr", - "Version": "1.5.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "glue", - "lifecycle", - "magrittr", - "rlang", - "stringi", - "vctrs" - ], - "Hash": "960e2ae9e09656611e0b8214ad543207" - }, - "tibble": { - "Package": "tibble", - "Version": "3.2.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "fansi", - "lifecycle", - "magrittr", - "methods", - "pillar", - "pkgconfig", - "rlang", - "utils", - "vctrs" - ], - "Hash": "a84e2cc86d07289b3b6f5069df7a004c" - }, - "tidyr": { - "Package": "tidyr", - "Version": "1.3.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "cpp11", - "dplyr", - "glue", - "lifecycle", - "magrittr", - "purrr", - "rlang", - "stringr", - "tibble", - "tidyselect", - "utils", - "vctrs" - ], - "Hash": "915fb7ce036c22a6a33b5a8adb712eb1" - }, - "tidyselect": { - "Package": "tidyselect", - "Version": "1.2.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "glue", - "lifecycle", - "rlang", - "vctrs", - "withr" - ], - "Hash": "829f27b9c4919c16b593794a6344d6c0" - }, - "timechange": { - "Package": "timechange", - "Version": "0.3.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cpp11" - ], - "Hash": "c5f3c201b931cd6474d17d8700ccb1c8" - }, - "tzdb": { - "Package": "tzdb", - "Version": "0.4.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cpp11" - ], - "Hash": "f561504ec2897f4d46f0c7657e488ae1" - }, - "utf8": { - "Package": "utf8", - "Version": "1.2.4", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R" - ], - "Hash": "62b65c52671e6665f803ff02954446e9" - }, - "vctrs": { - "Package": "vctrs", - "Version": "0.6.5", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "glue", - "lifecycle", - "rlang" - ], - "Hash": "c03fa420630029418f7e6da3667aac4a" - }, - "vroom": { - "Package": "vroom", - "Version": "1.6.5", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "bit64", - "cli", - "cpp11", - "crayon", - "glue", - "hms", - "lifecycle", - "methods", - "progress", - "rlang", - "stats", - "tibble", - "tidyselect", - "tzdb", - "vctrs", - "withr" - ], - "Hash": "390f9315bc0025be03012054103d227c" - }, - "withr": { - "Package": "withr", - "Version": "3.0.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "grDevices", - "graphics" - ], - "Hash": "d31b6c62c10dcf11ec530ca6b0dd5d35" - } - } -} diff --git a/renv/profiles/4.4/renv/settings.json b/renv/profiles/4.4/renv/settings.json deleted file mode 100644 index 74c1d4bb..00000000 --- a/renv/profiles/4.4/renv/settings.json +++ /dev/null @@ -1,19 +0,0 @@ -{ - "bioconductor.version": null, - "external.libraries": [], - "ignored.packages": [], - "package.dependency.fields": [ - "Imports", - "Depends", - "LinkingTo" - ], - "ppm.enabled": null, - "ppm.ignored.urls": [], - "r.version": null, - "snapshot.type": "explicit", - "use.cache": true, - "vcs.ignore.cellar": true, - "vcs.ignore.library": true, - "vcs.ignore.local": true, - "vcs.manage.ignores": true -} diff --git a/tests/testthat/test-assign.R b/tests/testthat/test-assign.R new file mode 100644 index 00000000..0ac42317 --- /dev/null +++ b/tests/testthat/test-assign.R @@ -0,0 +1,167 @@ +test_that("assign_ct works as expected with a conditioned `tgt_dat`", { + vs_raw_dat <- tibble::tibble( + oak_id = 1L:5L, + raw_source = c("VS1", "VS2", "VS3", "VS4", "VS5"), + patient_number = c(101L, 102L, 103L, 104L, 105L), + TEMPLOC = c("Oral", "Axillary", "Rectal", "Tympanic", "Temporal") + ) + + vs_tgt_dat <- tibble::tibble( + oak_id = as.integer(rep(1L:5L, each = 4L)), + raw_source = rep(c("VS1", "VS2", "VS3", "VS4", "VS5"), each = 4L), + patient_number = as.integer(rep(c(101L, 102L, 103L, 104L, 105L), each = 4L)), + VSTESTCD = c( + "TEMP", "BPSYS", "BPDIAS", "HR", + "TEMP", "BPSYS", "BPDIAS", "HR", + "TEMP", "BPSYS", "BPDIAS", "HR", + "TEMP", "BPSYS", "BPDIAS", "HR", + "TEMP", "BPSYS", "BPDIAS", "HR" + ) + ) + + # vital signs' locations + vs_loc_raw <- c( + "Mouth", "Arm", "Arm", "Arm", "Armpit", "Arm", "Arm", "Arm", + "Rectum", "Arm", "Arm", "Arm", "auris", "Arm", "Arm", "Arm", "brow", "Arm", + "Arm", "Arm" + ) + + vs_loc_tgt <- c( + "ORAL", + rep(NA, 3L), + "AXILLA", + rep(NA, 3L), + "ANUS", + rep(NA, 3L), + "EAR", + rep(NA, 3L), + "FOREHEAD", + rep(NA, 3L) + ) + + ct_spec <- tibble::tibble( + codelist_code = "C74456", + term_code = c("C32141", "C12674", "C12394", "C89803", "C43362"), + CodedData = c("ARM", "AXILLA", "EAR", "FOREHEAD", "ANUS"), + term_value = c("ARM", "AXILLA", "EAR", "FOREHEAD", "ANUS"), + collected_value = c("Arm", "Armpit", "auris", "brow", "anus"), + term_synonyms = c("Arm", "Axillary", "Tympanic", "Temporal", "Rectal") + ) + + result <- + assign_ct( + tgt_dat = condition_add(vs_tgt_dat, VSTESTCD == "TEMP"), + tgt_var = "VSLOC", + raw_dat = vs_raw_dat, + raw_var = "TEMPLOC", + ct_spec = ct_spec, + ct_clst = "C74456" + ) + + expected_result <- + tibble::add_column( + vs_tgt_dat, + VSLOC = vs_loc_tgt + ) + + expect_identical(result, expected_result) +}) + + +test_that("assign_ct works as expected with both `raw_dat` and `tgt_dat` as conditioned data frames", { + ct_spec <- tibble::tibble( + codelist_code = "C78734", + term_code = c("C150895", "C12434", "C13275", "C89803", "C12801"), + CodedData = c("SWABBED MATERIAL", "BLOOD", "SALIVA", "URINE", "TISSUE"), + term_value = c("SWABBED MATERIAL", "BLOOD", "SALIVA", "URINE", "TISSUE"), + collected_value = c("Nasopharyngeal Swab", "blood", "drool", "urine sample", "tissue"), + term_synonyms = c("Swab", "Blood", "Spit", "urinary excretion", "tissue sample") + ) + + fa_raw_dat <- tibble::tibble( + oak_id = as.integer(1L:5L), + raw_source = c("FA1", "FA2", "FA3", "FA4", "FA5"), + patient_number = 101L:105L, + SPCNM = c("Nasopharyngeal Swab", "Blood", "Saliva", "Urine", "Tissue"), + SPECTYP = c(NA, NA, "Swab", NA, NA) + ) + + fa_tgt_dat <- tibble::tibble( + oak_id = 1L:5L, + raw_source = c("FA1", "FA2", "FA3", "FA4", "FA5"), + patient_number = 101L:105L, + FATESTCD = c("STATUS", "OTHER", "STATUS", "STATUS", "OTHER"), + FAOBJ = c( + "Severe Acute Resp Syndrome Coronavirus 2", + "Other Condition", + "Severe Acute Resp Syndrome Coronavirus 2", + "Severe Acute Resp Syndrome Coronavirus 2", + "Other Condition" + ) + ) + + result <- + assign_ct( + tgt_dat = condition_add( + fa_tgt_dat, + FATESTCD == "STATUS" & + FAOBJ == "Severe Acute Resp Syndrome Coronavirus 2" + ), + tgt_var = "FASPEC", + raw_dat = condition_add(fa_raw_dat, is.na(SPECTYP)), + raw_var = "SPCNM", + ct_spec = ct_spec, + ct_clst = "C78734" + ) + + expected_result <- + fa_tgt_dat |> + tibble::add_column(FASPEC = c("SWABBED MATERIAL", NA, NA, "URINE", NA)) + + expect_identical(result, expected_result) +}) + +test_that("assign_ct works as expected with conditions across both data sets", { + cm_raw_dat <- tibble::tibble( + oak_id = 1L:5L, + raw_source = paste0("MD", 1L:5L), + patient_number = 101L:105L, + CMMODIFY = c("ASPIRIN EC", "IBUPROFEN LYSINE", "PARACETAMOL", "DICLOFENAC", "NAPROXEN") + ) + + cm_tgt_dat <- tibble::tibble( + oak_id = 1L:5L, + raw_source = paste0("MD", 1L:5L), + patient_number = 101L:105L, + CMTRT = c("ASPIRIN", "IBUPROFEN", "PARACETAMOL", "DICLOFENAC", "NAPROXEN") + ) + + # This only works if the raw data set and the target data set have the same + # number of records, otherwise the comparison CMMODIFY != CMTRT is not + # meaningful. + result1 <- + assign_no_ct( + tgt_dat = condition_add(cm_tgt_dat, CMMODIFY != CMTRT, .dat2 = cm_raw_dat), + tgt_var = "CMMODIFY", + raw_dat = cm_raw_dat, + raw_var = "CMMODIFY" + ) + + # Because both data sets have to have the same number of records for the + # comparison to be meaningful, then we can just as well condition the + # raw data set itself. + result2 <- + assign_no_ct( + tgt_dat = cm_tgt_dat, + tgt_var = "CMMODIFY", + raw_dat = condition_add(cm_raw_dat, CMMODIFY != CMTRT, .dat2 = cm_tgt_dat), + raw_var = "CMMODIFY" + ) + + expected_result <- + cm_tgt_dat |> + tibble::add_column(CMMODIFY = c("ASPIRIN EC", "IBUPROFEN LYSINE", NA, NA, NA)) + + expect_identical(result1, expected_result) + expect_identical(result2, expected_result) +}) diff --git a/tests/testthat/test-assign_datetime.R b/tests/testthat/test-assign_datetime.R index 5cb6e42c..3e9a73bf 100644 --- a/tests/testthat/test-assign_datetime.R +++ b/tests/testthat/test-assign_datetime.R @@ -22,22 +22,22 @@ test_that("assign_datetime: date and time conversion", { r"{There were 12 parsing problems\. Run `problems\(\)` on parsed results for details\.}" expect_warning(rlang::with_interactive( assign_datetime( + tgt_var = "CMSTDTC", raw_dat = md1, raw_var = c("MDEDR", "MDETM"), raw_fmt = c("d-m-y", "H:M:S"), - raw_unk = c("UN", "UNK"), - tgt_var = "CMSTDTC" + raw_unk = c("UN", "UNK") ) ), regexp = warning_msg) # If not run interactively then warnings should not be raised. expect_silent( cm1 <- assign_datetime( + tgt_var = "CMSTDTC", raw_dat = md1, raw_var = c("MDEDR", "MDETM"), raw_fmt = c("d-m-y", "H:M:S"), - raw_unk = c("UN", "UNK"), - tgt_var = "CMSTDTC" + raw_unk = c("UN", "UNK") ) ) @@ -72,8 +72,8 @@ test_that("assign_datetime: date and time conversion", { expected <- cm1 |> - dplyr::select("oak_id", "raw_source", "patient_number") |> + dplyr::select(dplyr::all_of(c("oak_id", "raw_source", "patient_number"))) |> dplyr::bind_cols(tibble::tibble(CMSTDTC = cmstdtc)) - expect_equal(object = cm1, expected = expected) + expect_identical(object = cm1, expected = expected) }) diff --git a/tests/testthat/test-cnd_df.R b/tests/testthat/test-cnd_df.R new file mode 100644 index 00000000..1574c4e7 --- /dev/null +++ b/tests/testthat/test-cnd_df.R @@ -0,0 +1,81 @@ +test_that("new_cnd_df creates conditioned data frame correctly", { + df <- tibble(x = 1L:3L, y = letters[1L:3L]) + cnd <- c(FALSE, NA, TRUE) + cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = cnd) + + expect_s3_class(cnd_df, "cnd_df") + expect_identical(attr(cnd_df, "cnd"), cnd) + expect_identical(attr(cnd_df, "cnd_sum"), c(n_true = 1L, n_false = 1L, n_na = 1L)) +}) + +test_that("new_cnd_df gives warning if dat is already cnd_df", { + df <- tibble(x = 1L:3L, y = letters[1L:3L]) + cnd <- c(FALSE, NA, TRUE) + cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = cnd) + + expect_warning(sdtm.oak:::new_cnd_df(dat = cnd_df, cnd = cnd, .warn = TRUE)) +}) + +test_that("new_cnd_df errors when cnd length doesn't match dat rows", { + df <- tibble(x = 1L:3L, y = letters[1L:3L]) + cnd <- c(FALSE, TRUE) + + expect_error(sdtm.oak:::new_cnd_df(dat = df, cnd = cnd)) +}) + +test_that("is_cnd_df correctly identifies cnd_df class", { + df <- tibble(x = 1L:3L, y = letters[1L:3L]) + cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = c(FALSE, NA, TRUE)) + + expect_true(sdtm.oak:::is_cnd_df(cnd_df)) + expect_false(sdtm.oak:::is_cnd_df(df)) +}) + +test_that("get_cnd_df_cnd correctly extracts cnd attribute", { + df <- tibble(x = 1L:3L, y = letters[1L:3L]) + cnd <- c(FALSE, NA, TRUE) + cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = cnd) + + expect_identical(sdtm.oak:::get_cnd_df_cnd(cnd_df), cnd) + expect_null(sdtm.oak:::get_cnd_df_cnd(df)) +}) + +test_that("get_cnd_df_cnd_sum correctly extracts cnd_sum attribute", { + df <- tibble(x = 1L:3L, y = letters[1L:3L]) + cnd <- c(FALSE, NA, TRUE) + cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = cnd) + + expect_identical(sdtm.oak:::get_cnd_df_cnd_sum(cnd_df), c(n_true = 1L, n_false = 1L, n_na = 1L)) + expect_null(sdtm.oak:::get_cnd_df_cnd_sum(df)) +}) + +test_that("rm_cnd_df correctly removes cnd_df class and attributes", { + df <- tibble(x = 1L:3L, y = letters[1L:3L]) + cnd <- c(FALSE, NA, TRUE) + cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = cnd) + un_cnd_df <- sdtm.oak:::rm_cnd_df(cnd_df) + + expect_false(inherits(un_cnd_df, "cnd_df")) + expect_null(attr(un_cnd_df, "cnd")) + expect_null(attr(un_cnd_df, "cnd_sum")) +}) + +test_that("tbl_sum.cnd_df adds conditioning summary to tibble header", { + df <- tibble(x = 1L:3L, y = letters[1L:3L]) + cnd <- c(FALSE, NA, TRUE) + cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = cnd) + + sum_output <- tbl_sum(cnd_df) + expect_identical(sum_output["Cond. tbl"], c("Cond. tbl" = "1/1/1")) +}) + +test_that("ctl_new_rowid_pillar.cnd_df customizes row IDs with condition", { + df <- tibble(x = 1L:3L, y = letters[1L:3L]) + cnd <- c(FALSE, NA, TRUE) + cnd_df <- sdtm.oak:::new_cnd_df(dat = df, cnd = cnd) + + rowid_pillar <- ctl_new_rowid_pillar(controller = cnd_df, x = cnd_df, width = 10L) + + expect_s3_class(rowid_pillar, "pillar") + expect_identical(rowid_pillar$data[[1L]]$row_ids, c("1 F", "2 -", "3 T")) +}) diff --git a/tests/testthat/test-condition_add.R b/tests/testthat/test-condition_add.R new file mode 100644 index 00000000..387d745a --- /dev/null +++ b/tests/testthat/test-condition_add.R @@ -0,0 +1,86 @@ +test_that("condition_add tags records correctly with single condition", { + df <- tibble::tibble(x = 1L:3L, y = letters[1L:3L]) + + cnd_df <- condition_add(dat = df, x > 1L) + expect_true(is_cnd_df(cnd_df)) + expect_identical(get_cnd_df_cnd(cnd_df), c(FALSE, TRUE, TRUE)) + expect_identical(get_cnd_df_cnd_sum(cnd_df), c(n_true = 2L, n_false = 1L, n_na = 0L)) +}) + +test_that("condition_add tags records correctly with multiple conditions", { + df <- tibble::tibble(x = 1L:5L, y = c(1.1, 2.2, 3.3, 4.4, 5.5), z = factor(letters[1L:5L])) + + cnd_df <- condition_add(dat = df, x > 1L & y < 5.0) + cnd_df_multiple <- condition_add(dat = df, x > 1L, y < 5.0) + expect_true(is_cnd_df(cnd_df)) + expect_identical(get_cnd_df_cnd(cnd_df), c(FALSE, TRUE, TRUE, TRUE, FALSE)) + expect_identical(get_cnd_df_cnd_sum(cnd_df), c(n_true = 3L, n_false = 2L, n_na = 0L)) + + expect_identical(get_cnd_df_cnd(cnd_df_multiple), c(FALSE, TRUE, TRUE, TRUE, FALSE)) + expect_identical(get_cnd_df_cnd_sum(cnd_df_multiple), c(n_true = 3L, n_false = 2L, n_na = 0L)) +}) + +test_that("condition_add handles different data types correctly", { + df <- tibble::tibble(x = 1L:5L, y = c(1.1, 2.2, 3.3, 4.4, 5.5), z = letters[1L:5L], w = factor(letters[1L:5L])) + + cnd_df <- condition_add(dat = df, x > 2L & y < 5.0 & z %in% c("c", "d", "e") & w %in% c("c", "d", "e")) + expect_true(is_cnd_df(cnd_df)) + expect_identical(get_cnd_df_cnd(cnd_df), c(FALSE, FALSE, TRUE, TRUE, FALSE)) + expect_identical(get_cnd_df_cnd_sum(cnd_df), c(n_true = 2L, n_false = 3L, n_na = 0L)) +}) + +test_that("condition_add does not care about conditions' arguments being named", { + df <- tibble::tibble(x = 1L:5L, y = c(1.1, 2.2, 3.3, 4.4, 5.5)) + + cnd_df_named <- condition_add(dat = df, cond1 = x > 2L, cond2 = y < 5.0) + cnd_df_unnamed <- condition_add(dat = df, x > 2L, y < 5.0) + + expect_identical(cnd_df_named, cnd_df_unnamed) +}) + +test_that("condition_add handles empty data frames", { + df <- tibble::tibble(x = integer(0L), y = character(0L)) + + cnd_df <- condition_add(dat = df, x > 1L) + expect_true(is_cnd_df(cnd_df)) + expect_identical(nrow(cnd_df), 0L) + expect_identical(get_cnd_df_cnd(cnd_df), logical(0L)) + expect_identical(get_cnd_df_cnd_sum(cnd_df), c(n_true = 0L, n_false = 0L, n_na = 0L)) +}) + +test_that("condition_add gives warning if dat is already a conditioned data frame", { + df <- tibble::tibble(x = 1L:3L, y = letters[1L:3L]) + cnd_df <- new_cnd_df(dat = df, cnd = c(FALSE, NA, TRUE)) + + expect_warning(condition_add(dat = cnd_df, x > 1L), "The previous condition will be replaced by the new one.") +}) + +test_that("`condition_add()`: `dat` variables take precedence over variables in `.dat2`", { + df <- tibble::tibble(x = 1L:3L, y = letters[1L:3L]) + .dat2_env <- rlang::env(x = 2L) + .dat2_list <- list(x = 2L) + .dat2_df <- tibble::tibble(x = 2L) + + cnd_df_env <- condition_add(dat = df, x > 2L, .dat2 = .dat2_env) + cnd_df_list <- condition_add(dat = df, x > 2L, .dat2 = .dat2_list) + cnd_df_df <- condition_add(dat = df, x > 2L, .dat2 = .dat2_df) + + expect_identical(get_cnd_df_cnd(cnd_df_env), c(FALSE, FALSE, TRUE)) + expect_identical(get_cnd_df_cnd(cnd_df_list), c(FALSE, FALSE, TRUE)) + expect_identical(get_cnd_df_cnd(cnd_df_df), c(FALSE, FALSE, TRUE)) +}) + +test_that("condition_add handles .dat2 with additional variables", { + df <- tibble::tibble(x = 1L:3L, y = letters[1L:3L]) + .dat2_env <- rlang::env(z = 3L, w = 1L) + .dat2_list <- list(z = 3L, w = 1L) + .dat2_df <- tibble::tibble(z = 3L, w = 1L) + + cnd_df_env <- condition_add(dat = df, x > w & x < z, .dat2 = .dat2_env) + cnd_df_list <- condition_add(dat = df, x > w & x < z, .dat2 = .dat2_list) + cnd_df_df <- condition_add(dat = df, x > w & x < z, .dat2 = .dat2_df) + + expect_identical(get_cnd_df_cnd(cnd_df_env), c(FALSE, TRUE, FALSE)) + expect_identical(get_cnd_df_cnd(cnd_df_list), c(FALSE, TRUE, FALSE)) + expect_identical(get_cnd_df_cnd(cnd_df_df), c(FALSE, TRUE, FALSE)) +}) diff --git a/tests/testthat/test-eval_conditions.R b/tests/testthat/test-eval_conditions.R new file mode 100644 index 00000000..8cf0e4b2 --- /dev/null +++ b/tests/testthat/test-eval_conditions.R @@ -0,0 +1,51 @@ +test_that("`eval_conditions()` evaluates conditions correctly", { + df <- tibble::tibble( + x = c(1L, 2L, NA_integer_, 4L, 5L), + y = c(TRUE, FALSE, TRUE, FALSE, TRUE), + z = c("a", "b", "a", "b", "a") + ) + + # Tag records for which `x` is greater than 2. + expect_identical( + sdtm.oak:::eval_conditions(df, x > 2L), + c(FALSE, FALSE, NA, TRUE, TRUE) + ) + + # Tag records for which `x` is greater than 2 and `y` is TRUE. + expect_identical( + sdtm.oak:::eval_conditions(df, x > 2L, y), + c(FALSE, FALSE, NA, FALSE, TRUE) + ) + + # Tag records for which `x` is greater than 2 and convert resulting NAs into FALSE. + expect_identical( + sdtm.oak:::eval_conditions(df, x > 2L, .na = FALSE), + c(FALSE, FALSE, FALSE, TRUE, TRUE) + ) + + # Conditions may involve variables defined in the caller environment. + w <- 1L + expect_identical( + sdtm.oak:::eval_conditions(df, x > w), + c(FALSE, TRUE, NA, TRUE, TRUE) + ) + + # Conditions may look into variables defined in other scopes (e.g., in + # environments). + env <- rlang::env(w = 1L) + expect_identical( + sdtm.oak:::eval_conditions(df, x > w, .env = env), + c(FALSE, TRUE, NA, TRUE, TRUE) + ) + + # Other scopes are not restricted to environments but lists and tibbles also + # work as namespaces for look-up. + expect_identical( + sdtm.oak:::eval_conditions(df, x > w, .env = list(w = 3L)), + c(FALSE, FALSE, NA, TRUE, TRUE) + ) + expect_identical( + sdtm.oak:::eval_conditions(df, x > w, .env = tibble::tibble(w = 4L)), + c(FALSE, FALSE, NA, FALSE, TRUE) + ) +}) diff --git a/tests/testthat/test-hardcode.R b/tests/testthat/test-hardcode.R new file mode 100644 index 00000000..eb56cc7a --- /dev/null +++ b/tests/testthat/test-hardcode.R @@ -0,0 +1,81 @@ +# `aesos`: example raw data set. +aesos <- tibble::tribble( + ~oak_id, ~raw_source, ~patient_number, ~AESO, ~AESOSP, ~AESEV, ~AESER, ~AETERM, + 1L, "RS1", 101L, 0L, "Pain", "Mild", "No", "Headache", + 2L, "RS1", 102L, 0L, NA, "Severe", "Yes", "Dizziness", + 3L, "RS2", 103L, 1L, NA, "Moderate", "No", NA, + 4L, "RS2", 104L, 1L, NA, "Mild", "No", "Eye issues", + 5L, "RS3", 105L, 1L, "Nausea", "Severe", "Yes", "Food Poisoning" +) + +# `oe_inter`: example target data set. +oe_inter <- tibble::tribble( + ~oak_id, ~raw_source, ~patient_number, + 1L, "RS1", 101L, + 3L, "RS2", 103L, + 4L, "RS2", 104L, + 5L, "RS3", 105L +) + +test_that("hardcode_no_ct works as expected", { + aesos_cnd <- condition_add(aesos, AESO == 1L & !is.na(AESOSP)) + + result <- hardcode_no_ct( + raw_dat = aesos_cnd, + raw_var = "AESO", + tgt_var = "OEORRES", + tgt_val = "Y", + tgt_dat = oe_inter + ) + + expected_result <- tibble::tribble( + ~oak_id, ~raw_source, ~patient_number, ~OEORRES, + # NA because `aesos_cnd` is conditioned to be FALSE on this record. + 1L, "RS1", 101L, NA_character_, + # NA because `aesos_cnd` is conditioned to be FALSE on this record. + 3L, "RS2", 103L, NA_character_, + # NA because `aesos_cnd` is conditioned to be FALSE on this record. + 4L, "RS2", 104L, NA_character_, + # Successful derivation + 5L, "RS3", 105L, "Y" + ) + + expect_identical(result, expected_result) +}) + +test_that("hardcode_ct works as expected", { + aesos_cnd <- condition_add(aesos, AESO == 1L & is.na(AESOSP)) + ct_spec <- tibble::tibble( + codelist_code = "C117743", + term_code = "C178048", + CodedData = "HYPERMIA", + term_value = "HYPERMIA", + collected_value = "IOISYMPO", + term_synonyms = "IOISYMPO" + ) + + result <- + hardcode_ct( + raw_dat = aesos_cnd, + raw_var = "AETERM", + tgt_var = "OETESTCD", + tgt_val = "IOISYMPO", + ct_spec = ct_spec, + ct_clst = "C117743", + tgt_dat = oe_inter + ) + + expected_result <- tibble::tribble( + ~oak_id, ~raw_source, ~patient_number, ~OETESTCD, + # `NA` because `aesos_cnd` is conditioned to be FALSE for this record. + 1L, "RS1", 101L, NA_character_, + # `NA` because AETERM == NA for this record in `aesos_cnd`. + 3L, "RS2", 103L, NA_character_, + # Successful derivation: IOISYMPO -> HYPERMIA. + 4L, "RS2", 104L, "HYPERMIA", + # `NA` because `aesos_cnd` is conditioned to be FALSE for this record. + 5L, "RS3", 105L, NA_character_ + ) + + expect_identical(result, expected_result) +}) diff --git a/tests/testthat/test-mutate_cnd_df.R b/tests/testthat/test-mutate_cnd_df.R new file mode 100644 index 00000000..fc4488bd --- /dev/null +++ b/tests/testthat/test-mutate_cnd_df.R @@ -0,0 +1,103 @@ +test_that("mutate.cnd_df correctly mutates conditioned data frame", { + df <- tibble::tibble(x = 1L:3L, y = letters[1L:3L]) + cnd_df <- new_cnd_df(dat = df, cnd = c(FALSE, NA, TRUE)) + + mutated_df <- dplyr::mutate(cnd_df, z = x + 1L) + expect_true("z" %in% colnames(mutated_df)) + expect_identical(mutated_df$z, c(NA, NA, 4L)) +}) + +test_that("mutate.cnd_df handles multiple mutations", { + df <- tibble::tibble(x = 1L:3L, y = 1L:3L) + cnd_df <- new_cnd_df(dat = df, cnd = c(TRUE, FALSE, TRUE)) + + mutated_df <- dplyr::mutate(cnd_df, z = x + y, w = x * y) + expect_true(all(c("z", "w") %in% colnames(mutated_df))) + expect_identical(mutated_df$z, c(2L, NA, 6L)) + expect_identical(mutated_df$w, c(1L, NA, 9L)) +}) + +test_that("mutate.cnd_df retains original data for non-conditioned rows", { + df <- tibble::tibble(x = 1L:4L, y = 2L:5L) + cnd_df <- new_cnd_df(dat = df, cnd = c(TRUE, FALSE, TRUE, NA)) + + mutated_df <- dplyr::mutate(cnd_df, z = x + y) + expect_identical(mutated_df$z, c(3L, NA, 7L, NA)) + expect_identical(mutated_df$x, df$x) + expect_identical(mutated_df$y, df$y) +}) + +test_that("mutate.cnd_df works with different data types", { + df <- tibble::tibble(x = 1L:3L, y = c(1.1, 2.2, 3.3), z = c("a", "b", "c")) + cnd_df <- new_cnd_df(dat = df, cnd = c(TRUE, FALSE, TRUE)) + + mutated_df <- dplyr::mutate(cnd_df, w = x * y, v = paste0(z, x)) + expect_true(all(c("w", "v") %in% colnames(mutated_df))) + expect_equal(mutated_df$w, c(1.1, NA, 9.9), tolerance = 0.0001) + expect_identical(mutated_df$v, c("a1", NA, "c3")) +}) + +test_that("mutate.cnd_df handles empty data frames", { + df <- tibble::tibble(x = integer(0L), y = integer(0L)) + cnd_df <- new_cnd_df(dat = df, cnd = logical(0L)) + + mutated_df <- dplyr::mutate(cnd_df, z = x + y) + expect_identical(nrow(mutated_df), 0L) + expect_true("z" %in% colnames(mutated_df)) + expect_identical(mutated_df$z, integer(0L)) +}) + +test_that("mutate.cnd_df handles .keep parameter correctly", { + df <- tibble::tibble(x = 1L:3L, y = 1L:3L) + cnd_df <- new_cnd_df(dat = df, cnd = c(TRUE, FALSE, TRUE)) + + mutated_df_all <- dplyr::mutate(cnd_df, z = x + y, .keep = "all") + expect_true(all(c("x", "y", "z") %in% colnames(mutated_df_all))) + + mutated_df_used <- dplyr::mutate(cnd_df, z = 2L * x, .keep = "used") + expect_true(all(c("x", "z") %in% colnames(mutated_df_used))) + + mutated_df_unused <- dplyr::mutate(cnd_df, z = 2L * x, .keep = "unused") + expect_true(all(c("y", "z") %in% colnames(mutated_df_unused))) + + mutated_df_none <- dplyr::mutate(cnd_df, z = x + y, .keep = "none") + expect_identical(colnames(mutated_df_none), "z") + expect_false(any(c("x", "y") %in% colnames(mutated_df_none))) +}) + +test_that("mutate.cnd_df handles .after parameter correctly", { + df <- tibble::tibble(x = 1L:3L, y = 1L:3L) + cnd_df <- new_cnd_df(dat = df, cnd = c(TRUE, FALSE, TRUE)) + + mutated_df_after <- dplyr::mutate(cnd_df, z = x + y, .after = "x") + expect_identical(colnames(mutated_df_after), c("x", "z", "y")) +}) + +test_that("mutate.cnd_df works with named arguments", { + df <- tibble::tibble(x = 1L:3L, y = 1L:3L) + cnd_df <- new_cnd_df(dat = df, cnd = c(TRUE, FALSE, TRUE)) + + mutated_df_named <- dplyr::mutate(cnd_df, new_col = x + y) + expect_true("new_col" %in% colnames(mutated_df_named)) + expect_identical(mutated_df_named$new_col, c(2L, NA, 6L)) +}) + +test_that("mutate.cnd_df errors when .by is used", { + df <- tibble::tibble(x = 1L:3L, y = 1L:3L) + cnd_df <- new_cnd_df(dat = df, cnd = c(TRUE, FALSE, TRUE)) + + expect_error( + dplyr::mutate(cnd_df, z = x + y, .by = "y"), + regex = "`\\.by` is not supported on conditioned data frames." + ) +}) + +test_that("mutate.cnd_df errors when .before is used", { + df <- tibble::tibble(x = 1L:3L, y = 1L:3L) + cnd_df <- new_cnd_df(dat = df, cnd = c(TRUE, FALSE, TRUE)) + + expect_error( + dplyr::mutate(cnd_df, z = x + y, .before = "x"), + regex = "`\\.before` is not supported on conditioned data frames, use `\\.after` instead." + ) +}) diff --git a/tests/testthat/test-pipe.R b/tests/testthat/test-pipe.R new file mode 100644 index 00000000..70beaa8b --- /dev/null +++ b/tests/testthat/test-pipe.R @@ -0,0 +1,130 @@ +`%>%` <- magrittr::`%>%` + +test_that("Basic operations work correctly", { + expect_identical(5L %.>% (2L + .), 5L %>% + { + 2L + . + }) + expect_identical("hello" %.>% toupper(.), "hello" %>% + { + toupper(.) + }) +}) + +test_that("Nested operations work correctly", { + expect_identical(5L %.>% (2L + . + 3L), 5L %>% + { + 2L + . + 3L + }) + expect_identical("hello" %.>% paste(., "world"), "hello" %>% + { + paste(., "world") + }) +}) + +test_that("Piping with braces", { + mtcars2 <- mtcars %.>% { + .$cyl <- .$cyl * 2L + . + } + expect_identical(mtcars2$cyl, mtcars$cyl * 2L) +}) + +test_that("Dot used multiple times in rhs", { + expect_identical(5L %.>% (. * 2L + .), 5L %>% + { + . * 2L + . + }) + expect_identical("hello" %.>% paste(., toupper(.)), "hello" %>% + { + paste(., toupper(.)) + }) +}) + +test_that("Dot used in nested functions", { + expect_identical(mtcars %.>% subset(., seq_len(nrow(.)) %% 2L == 0L), mtcars %>% + { + subset(., seq_len(nrow(.)) %% 2L == 0L) + }) + expect_identical(1L:10L %.>% c(min(.), max(.)), 1L:10L %>% + { + c(min(.), max(.)) + }) +}) + +test_that("Error when dot is not used in rhs", { + expect_error(5L %.>% (2L + 2L)) + expect_error("hello" %.>% toupper) +}) + +test_that("Complex expressions work correctly", { + expect_identical(5L %.>% (2L + . + 3L + . * 2L), 5L %>% + { + 2L + . + 3L + . * 2L + }) + expect_identical(mtcars %.>% subset(., gear == 4L & mpg > mean(mpg)), mtcars %>% + { + subset(., gear == 4L & mpg > mean(mpg)) + }) + expect_identical(mtcars %.>% subset(., cyl == 6L) %.>% nrow(.), mtcars %>% + { + subset(., cyl == 6L) + } %>% + nrow()) +}) + +test_that("Functions returning functions", { + expect_identical(1L:5L %.>% (sapply(., function(x) x * 2L)), 1L:5L %>% + { + sapply(., function(x) x * 2L) + }) + expect_identical(mtcars %.>% (apply(., 2L, function(x) mean(x))), mtcars %>% + { + apply(., 2L, function(x) mean(x)) + }) +}) + +test_that("Dot used in custom functions", { + custom_function <- function(x) { + x + 1L + } + expect_identical(5L %.>% custom_function(.), 5L %>% + { + custom_function(.) + }) + expect_identical(mtcars %.>% head(.), mtcars %>% + { + head(.) + }) +}) + +test_that("Anonymous functions with \\(x)", { + expect_identical(1L:5L %.>% (purrr::map(., \(x) x * 2L)), 1L:5L %>% + { + purrr::map(., \(x) x * 2L) + }) +}) + +test_that("Anonymous functions with function(x)", { + expect_identical(1L:5L %.>% (purrr::map(., function(x) x * 2L)), 1L:5L %>% + { + purrr::map(., function(x) x * 2L) + }) +}) + +test_that("Piping with environment-dependent functions", { + env <- environment() + "x" %.>% assign(x = ., 100L, envir = env) + expect_identical(x, 100L) +}) + +test_that("`.` is restored", { + 1L %.>% identity(.) + expect_error(., "not found") + + . <- "foo" + 1L %.>% identity(.) + expect_identical(., "foo") +}) + +# TODO: Support for lazy-evaluation diff --git a/vignettes/articles/cnd_df.Rmd b/vignettes/articles/cnd_df.Rmd new file mode 100644 index 00000000..5f8ebdd5 --- /dev/null +++ b/vignettes/articles/cnd_df.Rmd @@ -0,0 +1,109 @@ +--- +title: "Conditioned Data Frames" +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE) +library(sdtm.oak) +library(tibble) +``` + +## Introduction + +Conditioned data frames, or `cnd_df`, are a powerful tool in the `{sdtm.oak}` +package designed to facilitate conditional transformations on data frames. This +article explains how to create and use conditioned data frames, particularly in +the context of SDTM domain derivations. + +## Creating Conditioned Data Frames + +A conditioned data frame is a regular data frame extended with a logical vector +`cnd` that marks rows for subsequent conditional transformations. The +`condition_add()` function is used to create these conditioned data frames. + +### Simple Example + +Consider a simple data frame `df`: + +```{r} +(df <- tibble(x = 1L:3L, y = letters[1L:3L])) +``` + +We can create a conditioned data frame where only rows where `x > 1` are marked: + +```{r} +(cnd_df <- condition_add(dat = df, x > 1L)) +``` + +Here, only the second and third rows are marked as `TRUE`. + +## Usage in SDTM Domain Derivations + +The real power of conditioned data frames manifests when they are used with +functions such as `assign_no_ct`, `assign_ct`, `hardcode_no_ct`, and +`hardcode_ct`. These functions perform derivations only for the records that +match the pattern of `TRUE` values in conditioned data frames. + +### Example with Concomitant Medications (CM) Domain + +Consider a simplified dataset of concomitant medications, where we want to +derive a new variable CMGRPID (Concomitant Medication Group ID) based on the +condition that the medication treatment (CMTRT) is `"BENADRYL"`. + +Here is a simplified raw Concomitant Medications data set (`cm_raw`): + +```{r} +cm_raw <- tibble::tibble( + oak_id = seq_len(14L), + raw_source = "ConMed", + patient_number = c(375L, 375L, 376L, 377L, 377L, 377L, 377L, 378L, 378L, 378L, 378L, 379L, 379L, 379L), + MDNUM = c(1L, 2L, 1L, 1L, 2L, 3L, 5L, 4L, 1L, 2L, 3L, 1L, 2L, 3L), + MDRAW = c( + "BABY ASPIRIN", "CORTISPORIN", "ASPIRIN", + "DIPHENHYDRAMINE HCL", "PARCETEMOL", "VOMIKIND", + "ZENFLOX OZ", "AMITRYPTYLINE", "BENADRYL", + "DIPHENHYDRAMINE HYDROCHLORIDE", "TETRACYCLINE", + "BENADRYL", "SOMINEX", "ZQUILL" + ) +) +cm_raw +``` + +To derive the `CMTRT` variable we use the `assign_no_ct()` function to map the +`MDRAW` variable to the `CMTRT` variable: + +```{r} +tgt_dat <- assign_no_ct( + tgt_var = "CMTRT", + raw_dat = cm_raw, + raw_var = "MDRAW" +) +tgt_dat +``` + +Then we create a conditioned data frame from the target data set (`tgt_dat`), +meaning we create a conditioned data frame where only rows with `CMTRT` equal to +`"BENADRYL"` are marked: + +```{r} +(cnd_tgt_dat <- condition_add(tgt_dat, CMTRT == "BENADRYL")) +``` + +Finally, we derive the `CMGRPID` variable conditionally. Using `assign_no_ct()`, +we derive `CMGRPID` which indicates the group ID for the medication, +based on the conditioned target data set: + +```{r} +derived_tgt_dat <- assign_no_ct( + tgt_dat = cnd_tgt_dat, + tgt_var = "CMGRPID", + raw_dat = cm_raw, + raw_var = "MDNUM" +) +derived_tgt_dat +``` + +Conditioned data frames in the `{sdtm.oak}` package provide a flexible way to +perform conditional transformations on data sets. By marking specific rows for +transformation, users can efficiently derive SDTM variables, ensuring that only +relevant records are processed.