-
Notifications
You must be signed in to change notification settings - Fork 13
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Implementation of
hardcode_no_ct()
, hardcode_ct()
, `assign_no_ct(…
…)` and `assign_ct()` (#41) * First mockup of `hardcode_no_ct()` * Update `hardcode_no_ct()` Update `hardcode_no_ct()` by allowing the rewriting of the `target_sdtm_variable` variable to preserve `NA` * Align `hardcode_no_ct()` code style with Ramm's expectations * Add `hardcode_*()` and `assign_*()` functions * hardcode_no_ct algorithm code changes (#45) * hardcode_no_ct algorithm code changes * harcode_ct working as expected * assign_ct and assign_no_ct works great. * address review comments * Add `oak_id_vars()` * Fix typo in `recode()` * Simplify `oak_id_vars()` docs * Update `assign_*` and `hardcode_*` implementations * Introduce memoisation of `ct_mappings()` * Update of README introductory paragraph * Update hardcode_* functions' interface * Add `contains_oak_id_vars()` function * Update `contains_oak_id_vars()` doc examples * Update `sdtm_harcode()` and dependant functions * Update `assign_*` and `hardcore_*` related functions * Automatic renv profile update. * Automatic renv profile update. * Make `ct` and `cl` parameters mandatory for `assign_ct()` * Add functions ct importing - Adds three new user facing ct-related functions: `read_ct_example()`, `ct_example()` and `read_ct()` - Provides a ct example file in inst/ct/ * Bring `hardcode*()` and `assign*()` related assertions closer to user calling functions * Add lagging behind Rd for `ct_example()` * Add `assert_ct()` * Add ct assertions * Remove R/.gitkeep As it is no longer needed. * Add unit tests for `ct_vars()` * Update dependencies * Export `ct_vars()` Export `ct_vars()` such that we can cross-reference it from other functions' documentation. * Update `assert_ct()` docs * Clarify `assign_ct()`/`assign_no_ct()` doc * Improve grammar in doc * Remove last empty line from ct example file * Add documentation to `sdtm_assign()` and ct-related unit tests Although we had discussed to keep assertions only at the user facing functions, I am getting the feeling we would miss assertions also at the internal function... because of several reasons: firstly, the internal function is more flexible having more optional parameters, which requires extra assertion logic, and also because eventually we will be checking code coverage and we will regret not having done this now. * Update hardcode-related fns * Changes to meet linter issues * Code reformatting * Code reflow * Improve `assert_cl()` docs * Update `read_ct()` docs * Automatic renv profile update. * Automatic renv profile update. * Add units tests for `recode()` * Remove `are_to_recode()` function Ended up not using this function. * Add units tests for `assert_ct()` * Add one more test for `assert_ct()` * Add a basic unit test for `ct_mappings()` * Fill in some doc details of ct-related functions * Remove leftover doc text in `assign` * Update website's reference * Styling update * Bump version and update NEWS * Fix a few lintr issues * Add examples to `ct_map()` doc * Fix typo in `problems()` doc * Fix typo * Remove lint issues * Replace `.data` usage in tidyselect expressions See tidyverse/tidyverse.org#600 for more details. * Variable renaming - `ct` to `ct_spec` (ct specification) - `cl` to `ct_cltc` (codelist code) * Finish pending renaming of variables * Rename code-list to codelist * Fix style * Fix style * Update `ct_map()` doc example * Make tibbles more readable in doc examples * Rename `ct_cltc` to `ct_clst` As per @rammprasad's suggestion. --------- Co-authored-by: Ram Ganapathy <[email protected]> Co-authored-by: ramiromagno <[email protected]>
- Loading branch information
1 parent
45863c3
commit 5fc61af
Showing
42 changed files
with
2,807 additions
and
12 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,3 +14,4 @@ | |
^data-raw$ | ||
^staged_dependencies.yaml$ | ||
^vignettes/articles$ | ||
^inst/ct/README.md$ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
Package: sdtm.oak | ||
Type: Package | ||
Title: SDTM Data Transformation Engine | ||
Version: 0.0.0.9001 | ||
Version: 0.0.0.9002 | ||
Authors@R: c( | ||
person("Rammprasad", "Ganapathy", role = c("aut", "cre"), | ||
email = "[email protected]"), | ||
|
@@ -38,11 +38,15 @@ Depends: R (>= 4.2) | |
Imports: | ||
admiraldev, | ||
dplyr (>= 1.0.0), | ||
memoise, | ||
assertthat, | ||
purrr (>= 0.3.3), | ||
rlang (>= 0.4.4), | ||
stringr (>= 1.4.0), | ||
tibble | ||
tibble, | ||
vctrs, | ||
readr, | ||
glue | ||
Suggests: | ||
knitr, | ||
rmarkdown, | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,22 @@ | ||
# Generated by roxygen2: do not edit by hand | ||
|
||
S3method(print,iso8601) | ||
export(assign_ct) | ||
export(assign_no_ct) | ||
export(clear_cache) | ||
export(create_iso8601) | ||
export(ct_map) | ||
export(ct_spec_example) | ||
export(ct_spec_vars) | ||
export(derive_study_day) | ||
export(fmt_cmp) | ||
export(hardcode_ct) | ||
export(hardcode_no_ct) | ||
export(problems) | ||
export(read_ct_spec) | ||
export(read_ct_spec_example) | ||
importFrom(rlang,"%||%") | ||
importFrom(rlang,":=") | ||
importFrom(rlang,.data) | ||
importFrom(stats,na.omit) | ||
importFrom(tibble,tibble) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,243 @@ | ||
#' Derive an SDTM variable | ||
#' | ||
#' @description | ||
#' [sdtm_assign()] is an internal function packing the same functionality as | ||
#' [assign_no_ct()] and [assign_ct()] together but aimed at developers only. | ||
#' As a user please use either [assign_no_ct()] or [assign_ct()]. | ||
#' | ||
#' @param raw_dat The raw dataset (dataframe); must include the | ||
#' variables passed in `id_vars` and `raw_var`. | ||
#' @param raw_var The raw variable: a single string indicating the name of the | ||
#' raw variable in `raw_dat`. | ||
#' @param tgt_var The target SDTM variable: a single string indicating the name | ||
#' of variable to be derived. | ||
#' @param ct_spec Study controlled terminology specification: a dataframe with a | ||
#' minimal set of columns, see [ct_spec_vars()] for details. This parameter is | ||
#' optional, if left as `NULL` no controlled terminology recoding is applied. | ||
#' @param ct_clst A codelist code indicating which subset of the controlled | ||
#' terminology to apply in the derivation. This parameter is optional, if left | ||
#' as `NULL`, all possible recodings in `ct_spec` are attempted. | ||
#' @param tgt_dat Target dataset: a data frame to be merged against `raw_dat` by | ||
#' the variables indicated in `id_vars`. This parameter is optional, see | ||
#' section Value for how the output changes depending on this argument value. | ||
#' @param id_vars Key variables to be used in the join between the raw dataset | ||
#' (`raw_dat`) and the target data set (`raw_dat`). | ||
#' | ||
#' @returns The returned data set depends on the value of `tgt_dat`: | ||
#' - If no target dataset is supplied, meaning that `tgt_dat` defaults to | ||
#' `NULL`, then the returned data set is `raw_dat`, selected for the variables | ||
#' indicated in `id_vars`, and a new extra column: the derived variable, as | ||
#' indicated in `tgt_var`. | ||
#' - If the target dataset is provided, then it is merged with the raw data set | ||
#' `raw_dat` by the variables indicated in `id_vars`, with a new column: the | ||
#' derived variable, as indicated in `tgt_var`. | ||
#' | ||
#' | ||
#' @importFrom rlang := | ||
#' @keywords internal | ||
sdtm_assign <- function(raw_dat, | ||
raw_var, | ||
tgt_var, | ||
ct_spec = NULL, | ||
ct_clst = NULL, | ||
tgt_dat = NULL, | ||
id_vars = oak_id_vars()) { | ||
admiraldev::assert_character_scalar(raw_var) | ||
admiraldev::assert_character_scalar(tgt_var) | ||
admiraldev::assert_character_vector(id_vars) | ||
assertthat::assert_that(contains_oak_id_vars(id_vars), | ||
msg = "`id_vars` must include the oak id vars." | ||
) | ||
admiraldev::assert_data_frame(raw_dat, required_vars = rlang::syms(c(id_vars, raw_var))) | ||
admiraldev::assert_data_frame(tgt_dat, required_vars = rlang::syms(id_vars), optional = TRUE) | ||
assert_ct_spec(ct_spec, optional = TRUE) | ||
assert_ct_clst(ct_spec = ct_spec, ct_clst = ct_clst, optional = TRUE) | ||
|
||
# Recode the raw variable following terminology. | ||
tgt_val <- ct_map(raw_dat[[raw_var]], ct_spec = ct_spec, ct_clst = ct_clst) | ||
|
||
# Apply derivation by assigning `raw_var` to `tgt_var`. | ||
# `der_dat`: derived dataset. | ||
der_dat <- | ||
raw_dat |> | ||
dplyr::select(c(id_vars, raw_var)) |> | ||
dplyr::mutate("{tgt_var}" := tgt_val) |> # nolint object_name_linter() | ||
dplyr::select(-rlang::sym(raw_var)) | ||
|
||
# If a target dataset is supplied, then join the so far derived dataset with | ||
# the target dataset (`tgt_dat`), otherwise leave it be. | ||
der_dat <- | ||
if (!is.null(tgt_dat)) { | ||
der_dat |> | ||
dplyr::right_join(y = tgt_dat, by = id_vars) |> | ||
dplyr::relocate(tgt_var, .after = dplyr::last_col()) | ||
} else { | ||
der_dat | ||
} | ||
|
||
der_dat | ||
} | ||
|
||
#' Derive an SDTM variable | ||
#' | ||
#' @description | ||
#' - [assign_no_ct()] maps a variable in a raw dataset to a target SDTM | ||
#' variable that has no terminology restrictions. | ||
#' | ||
#' - [assign_ct()] maps a variable in a raw dataset to a target SDTM variable | ||
#' following controlled terminology recoding. | ||
#' | ||
#' @param raw_dat The raw dataset (dataframe); must include the | ||
#' variables passed in `id_vars` and `raw_var`. | ||
#' @param raw_var The raw variable: a single string indicating the name of the | ||
#' raw variable in `raw_dat`. | ||
#' @param tgt_var The target SDTM variable: a single string indicating the name | ||
#' of variable to be derived. | ||
#' @param ct_spec Study controlled terminology specification: a dataframe with a | ||
#' minimal set of columns, see [ct_spec_vars()] for details. | ||
#' @param ct_clst A codelist code indicating which subset of the controlled | ||
#' terminology to apply in the derivation. | ||
#' @param tgt_dat Target dataset: a data frame to be merged against `raw_dat` by | ||
#' the variables indicated in `id_vars`. This parameter is optional, see | ||
#' section Value for how the output changes depending on this argument value. | ||
#' @param id_vars Key variables to be used in the join between the raw dataset | ||
#' (`raw_dat`) and the target data set (`raw_dat`). | ||
#' | ||
#' @returns The returned data set depends on the value of `tgt_dat`: | ||
#' - If no target dataset is supplied, meaning that `tgt_dat` defaults to | ||
#' `NULL`, then the returned data set is `raw_dat`, selected for the variables | ||
#' indicated in `id_vars`, and a new extra column: the derived variable, as | ||
#' indicated in `tgt_var`. | ||
#' - If the target dataset is provided, then it is merged with the raw data set | ||
#' `raw_dat` by the variables indicated in `id_vars`, with a new column: the | ||
#' derived variable, as indicated in `tgt_var`. | ||
#' | ||
#' @examples | ||
#' | ||
#' md1 <- | ||
#' tibble::tibble( | ||
#' oak_id = 1:14, | ||
#' raw_source = "MD1", | ||
#' patient_number = 101:114, | ||
#' MDIND = c( | ||
#' "NAUSEA", "NAUSEA", "ANEMIA", "NAUSEA", "PYREXIA", | ||
#' "VOMITINGS", "DIARHHEA", "COLD", | ||
#' "FEVER", "LEG PAIN", "FEVER", "COLD", "COLD", "PAIN" | ||
#' ) | ||
#' ) | ||
#' | ||
#' assign_no_ct( | ||
#' raw_dat = md1, | ||
#' raw_var = "MDIND", | ||
#' tgt_var = "CMINDC", | ||
#' ) | ||
#' | ||
#' cm_inter <- | ||
#' tibble::tibble( | ||
#' oak_id = 1:14, | ||
#' raw_source = "MD1", | ||
#' patient_number = 101:114, | ||
#' CMTRT = c( | ||
#' "BABY ASPIRIN", | ||
#' "CORTISPORIN", | ||
#' "ASPIRIN", | ||
#' "DIPHENHYDRAMINE HCL", | ||
#' "PARCETEMOL", | ||
#' "VOMIKIND", | ||
#' "ZENFLOX OZ", | ||
#' "AMITRYPTYLINE", | ||
#' "BENADRYL", | ||
#' "DIPHENHYDRAMINE HYDROCHLORIDE", | ||
#' "TETRACYCLINE", | ||
#' "BENADRYL", | ||
#' "SOMINEX", | ||
#' "ZQUILL" | ||
#' ), | ||
#' CMROUTE = c( | ||
#' "ORAL", | ||
#' "ORAL", | ||
#' NA, | ||
#' "ORAL", | ||
#' "ORAL", | ||
#' "ORAL", | ||
#' "INTRAMUSCULAR", | ||
#' "INTRA-ARTERIAL", | ||
#' NA, | ||
#' "NON-STANDARD", | ||
#' "RANDOM_VALUE", | ||
#' "INTRA-ARTICULAR", | ||
#' "TRANSDERMAL", | ||
#' "OPHTHALMIC" | ||
#' ) | ||
#' ) | ||
#' | ||
#' # Controlled terminology specification | ||
#' (ct_spec <- read_ct_spec_example("ct-01-cm")) | ||
#' | ||
#' assign_ct( | ||
#' raw_dat = md1, | ||
#' raw_var = "MDIND", | ||
#' tgt_var = "CMINDC", | ||
#' ct_spec = ct_spec, | ||
#' ct_clst = "C66729", | ||
#' tgt_dat = cm_inter | ||
#' ) | ||
#' | ||
#' @name assign | ||
NULL | ||
|
||
#' @order 1 | ||
#' @export | ||
#' @rdname assign | ||
assign_no_ct <- function(raw_dat, | ||
raw_var, | ||
tgt_var, | ||
tgt_dat = NULL, | ||
id_vars = oak_id_vars()) { | ||
admiraldev::assert_character_scalar(raw_var) | ||
admiraldev::assert_character_scalar(tgt_var) | ||
admiraldev::assert_character_vector(id_vars) | ||
assertthat::assert_that(contains_oak_id_vars(id_vars), | ||
msg = "`id_vars` must include the oak id vars." | ||
) | ||
admiraldev::assert_data_frame(raw_dat, required_vars = rlang::syms(c(id_vars, raw_var))) | ||
admiraldev::assert_data_frame(tgt_dat, required_vars = rlang::syms(id_vars), optional = TRUE) | ||
|
||
sdtm_assign( | ||
raw_dat = raw_dat, | ||
raw_var = raw_var, | ||
tgt_var = tgt_var, | ||
tgt_dat = tgt_dat, | ||
id_vars = id_vars | ||
) | ||
} | ||
|
||
#' @order 2 | ||
#' @export | ||
#' @rdname assign | ||
assign_ct <- function(raw_dat, | ||
raw_var, | ||
tgt_var, | ||
ct_spec, | ||
ct_clst, | ||
tgt_dat = NULL, | ||
id_vars = oak_id_vars()) { | ||
admiraldev::assert_character_scalar(raw_var) | ||
admiraldev::assert_character_scalar(tgt_var) | ||
admiraldev::assert_character_vector(id_vars) | ||
assertthat::assert_that(contains_oak_id_vars(id_vars), | ||
msg = "`id_vars` must include the oak id vars." | ||
) | ||
admiraldev::assert_data_frame(raw_dat, required_vars = rlang::syms(c(id_vars, raw_var))) | ||
admiraldev::assert_data_frame(tgt_dat, required_vars = rlang::syms(id_vars), optional = TRUE) | ||
|
||
sdtm_assign( | ||
raw_dat = raw_dat, | ||
raw_var = raw_var, | ||
tgt_var = tgt_var, | ||
tgt_dat = tgt_dat, | ||
id_vars = id_vars, | ||
ct_spec = ct_spec, | ||
ct_clst = ct_clst | ||
) | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
#' Clear `{sdtm.oak}` cache of memoised functions | ||
#' | ||
#' @description | ||
#' Some of `{sdtm.oak}` functions have their results cached for runtime | ||
#' efficiency. Use this function to reset the cache. | ||
#' | ||
#' Memoised functions: | ||
#' - [ct_mappings()] | ||
#' | ||
#' @return Returns a logical value, indicating whether the resetting of the | ||
#' cache was successful (`TRUE`) or not (`FALSE`). | ||
#' | ||
#' @examples | ||
#' clear_cache() | ||
#' | ||
#' @export | ||
clear_cache <- function() { | ||
memoise::forget(ct_mappings) | ||
} |
Oops, something went wrong.