Skip to content

Commit

Permalink
Implementation of hardcode_no_ct(), hardcode_ct(), `assign_no_ct(…
Browse files Browse the repository at this point in the history
…)` and `assign_ct()` (#41)

* First mockup of `hardcode_no_ct()`

* Update `hardcode_no_ct()`

Update `hardcode_no_ct()` by allowing the rewriting of the `target_sdtm_variable` variable to preserve `NA`

* Align `hardcode_no_ct()` code style with Ramm's expectations

* Add `hardcode_*()` and `assign_*()` functions

* hardcode_no_ct algorithm code changes (#45)

* hardcode_no_ct algorithm code changes

* harcode_ct working as expected

* assign_ct and assign_no_ct works great.

* address review comments

* Add `oak_id_vars()`

* Fix typo in `recode()`

* Simplify `oak_id_vars()` docs

* Update `assign_*` and `hardcode_*` implementations

* Introduce memoisation of `ct_mappings()`

* Update of README introductory paragraph

* Update hardcode_* functions' interface

* Add `contains_oak_id_vars()` function

* Update `contains_oak_id_vars()` doc examples

* Update `sdtm_harcode()` and dependant functions

* Update `assign_*` and `hardcore_*` related functions

* Automatic renv profile update.

* Automatic renv profile update.

* Make `ct` and `cl` parameters mandatory for `assign_ct()`

* Add functions ct importing

- Adds three new user facing ct-related functions: `read_ct_example()`, `ct_example()` and `read_ct()`
- Provides a ct example file in inst/ct/

* Bring `hardcode*()` and `assign*()` related assertions closer to user calling functions

* Add lagging behind Rd for `ct_example()`

* Add `assert_ct()`

* Add ct assertions

* Remove R/.gitkeep

As it is no longer needed.

* Add unit tests for `ct_vars()`

* Update dependencies

* Export `ct_vars()`

Export `ct_vars()` such that we can cross-reference it from other functions' documentation.

* Update `assert_ct()` docs

* Clarify `assign_ct()`/`assign_no_ct()` doc

* Improve grammar in doc

* Remove last empty line from ct example file

* Add documentation to `sdtm_assign()` and ct-related unit tests

Although we had discussed to keep assertions only at the user facing functions, I am getting the feeling we would miss assertions also at the internal function... because of several reasons: firstly, the internal function is more flexible having more optional parameters, which requires extra assertion logic, and also because eventually we will be checking code coverage and we will regret not having done this now.

* Update hardcode-related fns

* Changes to meet linter issues

* Code reformatting

* Code reflow

* Improve `assert_cl()` docs

* Update `read_ct()` docs

* Automatic renv profile update.

* Automatic renv profile update.

* Add units tests for `recode()`

* Remove `are_to_recode()` function

Ended up not using this function.

* Add units tests for `assert_ct()`

* Add one more test for `assert_ct()`

* Add a basic unit test for `ct_mappings()`

* Fill in some doc details of ct-related functions

* Remove leftover doc text in `assign`

* Update website's reference

* Styling update

* Bump version and update NEWS

* Fix a few lintr issues

* Add examples to `ct_map()` doc

* Fix typo in `problems()` doc

* Fix typo

* Remove lint issues

* Replace `.data` usage in tidyselect expressions

See tidyverse/tidyverse.org#600 for more details.

* Variable renaming

- `ct` to `ct_spec` (ct specification)
- `cl` to `ct_cltc` (codelist code)

* Finish pending renaming of variables

* Rename code-list to codelist

* Fix style

* Fix style

* Update `ct_map()` doc example

* Make tibbles more readable in doc examples

* Rename `ct_cltc` to `ct_clst`

As per @rammprasad's suggestion.

---------

Co-authored-by: Ram Ganapathy <[email protected]>
Co-authored-by: ramiromagno <[email protected]>
  • Loading branch information
3 people authored Apr 10, 2024
1 parent 45863c3 commit 5fc61af
Show file tree
Hide file tree
Showing 42 changed files with 2,807 additions and 12 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,4 @@
^data-raw$
^staged_dependencies.yaml$
^vignettes/articles$
^inst/ct/README.md$
8 changes: 6 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: sdtm.oak
Type: Package
Title: SDTM Data Transformation Engine
Version: 0.0.0.9001
Version: 0.0.0.9002
Authors@R: c(
person("Rammprasad", "Ganapathy", role = c("aut", "cre"),
email = "[email protected]"),
Expand Down Expand Up @@ -38,11 +38,15 @@ Depends: R (>= 4.2)
Imports:
admiraldev,
dplyr (>= 1.0.0),
memoise,
assertthat,
purrr (>= 0.3.3),
rlang (>= 0.4.4),
stringr (>= 1.4.0),
tibble
tibble,
vctrs,
readr,
glue
Suggests:
knitr,
rmarkdown,
Expand Down
13 changes: 13 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
@@ -1,9 +1,22 @@
# Generated by roxygen2: do not edit by hand

S3method(print,iso8601)
export(assign_ct)
export(assign_no_ct)
export(clear_cache)
export(create_iso8601)
export(ct_map)
export(ct_spec_example)
export(ct_spec_vars)
export(derive_study_day)
export(fmt_cmp)
export(hardcode_ct)
export(hardcode_no_ct)
export(problems)
export(read_ct_spec)
export(read_ct_spec_example)
importFrom(rlang,"%||%")
importFrom(rlang,":=")
importFrom(rlang,.data)
importFrom(stats,na.omit)
importFrom(tibble,tibble)
12 changes: 12 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,15 @@
# sdtm.oak 0.0.0.9002 (development version)

## New Features

* New function: `derive_study_day()` for study day calculation.

* New functions for basic SDTM derivations: ` assign_no_ct()`, `assign_ct()`,
`hardcode_no_ct()` and `hardcode_ct()`.

* New functions for handling controlled terminologies: `read_ct_spec()`,
`read_ct_spec_example()`, `ct_spec_example()` and `ct_map()`.

# sdtm.oak 0.0.0.9001 (development version)

## New Features
Expand Down
Empty file removed R/.gitkeep
Empty file.
243 changes: 243 additions & 0 deletions R/assign.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,243 @@
#' Derive an SDTM variable
#'
#' @description
#' [sdtm_assign()] is an internal function packing the same functionality as
#' [assign_no_ct()] and [assign_ct()] together but aimed at developers only.
#' As a user please use either [assign_no_ct()] or [assign_ct()].
#'
#' @param raw_dat The raw dataset (dataframe); must include the
#' variables passed in `id_vars` and `raw_var`.
#' @param raw_var The raw variable: a single string indicating the name of the
#' raw variable in `raw_dat`.
#' @param tgt_var The target SDTM variable: a single string indicating the name
#' of variable to be derived.
#' @param ct_spec Study controlled terminology specification: a dataframe with a
#' minimal set of columns, see [ct_spec_vars()] for details. This parameter is
#' optional, if left as `NULL` no controlled terminology recoding is applied.
#' @param ct_clst A codelist code indicating which subset of the controlled
#' terminology to apply in the derivation. This parameter is optional, if left
#' as `NULL`, all possible recodings in `ct_spec` are attempted.
#' @param tgt_dat Target dataset: a data frame to be merged against `raw_dat` by
#' the variables indicated in `id_vars`. This parameter is optional, see
#' section Value for how the output changes depending on this argument value.
#' @param id_vars Key variables to be used in the join between the raw dataset
#' (`raw_dat`) and the target data set (`raw_dat`).
#'
#' @returns The returned data set depends on the value of `tgt_dat`:
#' - If no target dataset is supplied, meaning that `tgt_dat` defaults to
#' `NULL`, then the returned data set is `raw_dat`, selected for the variables
#' indicated in `id_vars`, and a new extra column: the derived variable, as
#' indicated in `tgt_var`.
#' - If the target dataset is provided, then it is merged with the raw data set
#' `raw_dat` by the variables indicated in `id_vars`, with a new column: the
#' derived variable, as indicated in `tgt_var`.
#'
#'
#' @importFrom rlang :=
#' @keywords internal
sdtm_assign <- function(raw_dat,
raw_var,
tgt_var,
ct_spec = NULL,
ct_clst = NULL,
tgt_dat = NULL,
id_vars = oak_id_vars()) {
admiraldev::assert_character_scalar(raw_var)
admiraldev::assert_character_scalar(tgt_var)
admiraldev::assert_character_vector(id_vars)
assertthat::assert_that(contains_oak_id_vars(id_vars),
msg = "`id_vars` must include the oak id vars."
)
admiraldev::assert_data_frame(raw_dat, required_vars = rlang::syms(c(id_vars, raw_var)))
admiraldev::assert_data_frame(tgt_dat, required_vars = rlang::syms(id_vars), optional = TRUE)
assert_ct_spec(ct_spec, optional = TRUE)
assert_ct_clst(ct_spec = ct_spec, ct_clst = ct_clst, optional = TRUE)

# Recode the raw variable following terminology.
tgt_val <- ct_map(raw_dat[[raw_var]], ct_spec = ct_spec, ct_clst = ct_clst)

# Apply derivation by assigning `raw_var` to `tgt_var`.
# `der_dat`: derived dataset.
der_dat <-
raw_dat |>
dplyr::select(c(id_vars, raw_var)) |>
dplyr::mutate("{tgt_var}" := tgt_val) |> # nolint object_name_linter()
dplyr::select(-rlang::sym(raw_var))

# If a target dataset is supplied, then join the so far derived dataset with
# the target dataset (`tgt_dat`), otherwise leave it be.
der_dat <-
if (!is.null(tgt_dat)) {
der_dat |>
dplyr::right_join(y = tgt_dat, by = id_vars) |>
dplyr::relocate(tgt_var, .after = dplyr::last_col())
} else {
der_dat
}

der_dat
}

#' Derive an SDTM variable
#'
#' @description
#' - [assign_no_ct()] maps a variable in a raw dataset to a target SDTM
#' variable that has no terminology restrictions.
#'
#' - [assign_ct()] maps a variable in a raw dataset to a target SDTM variable
#' following controlled terminology recoding.
#'
#' @param raw_dat The raw dataset (dataframe); must include the
#' variables passed in `id_vars` and `raw_var`.
#' @param raw_var The raw variable: a single string indicating the name of the
#' raw variable in `raw_dat`.
#' @param tgt_var The target SDTM variable: a single string indicating the name
#' of variable to be derived.
#' @param ct_spec Study controlled terminology specification: a dataframe with a
#' minimal set of columns, see [ct_spec_vars()] for details.
#' @param ct_clst A codelist code indicating which subset of the controlled
#' terminology to apply in the derivation.
#' @param tgt_dat Target dataset: a data frame to be merged against `raw_dat` by
#' the variables indicated in `id_vars`. This parameter is optional, see
#' section Value for how the output changes depending on this argument value.
#' @param id_vars Key variables to be used in the join between the raw dataset
#' (`raw_dat`) and the target data set (`raw_dat`).
#'
#' @returns The returned data set depends on the value of `tgt_dat`:
#' - If no target dataset is supplied, meaning that `tgt_dat` defaults to
#' `NULL`, then the returned data set is `raw_dat`, selected for the variables
#' indicated in `id_vars`, and a new extra column: the derived variable, as
#' indicated in `tgt_var`.
#' - If the target dataset is provided, then it is merged with the raw data set
#' `raw_dat` by the variables indicated in `id_vars`, with a new column: the
#' derived variable, as indicated in `tgt_var`.
#'
#' @examples
#'
#' md1 <-
#' tibble::tibble(
#' oak_id = 1:14,
#' raw_source = "MD1",
#' patient_number = 101:114,
#' MDIND = c(
#' "NAUSEA", "NAUSEA", "ANEMIA", "NAUSEA", "PYREXIA",
#' "VOMITINGS", "DIARHHEA", "COLD",
#' "FEVER", "LEG PAIN", "FEVER", "COLD", "COLD", "PAIN"
#' )
#' )
#'
#' assign_no_ct(
#' raw_dat = md1,
#' raw_var = "MDIND",
#' tgt_var = "CMINDC",
#' )
#'
#' cm_inter <-
#' tibble::tibble(
#' oak_id = 1:14,
#' raw_source = "MD1",
#' patient_number = 101:114,
#' CMTRT = c(
#' "BABY ASPIRIN",
#' "CORTISPORIN",
#' "ASPIRIN",
#' "DIPHENHYDRAMINE HCL",
#' "PARCETEMOL",
#' "VOMIKIND",
#' "ZENFLOX OZ",
#' "AMITRYPTYLINE",
#' "BENADRYL",
#' "DIPHENHYDRAMINE HYDROCHLORIDE",
#' "TETRACYCLINE",
#' "BENADRYL",
#' "SOMINEX",
#' "ZQUILL"
#' ),
#' CMROUTE = c(
#' "ORAL",
#' "ORAL",
#' NA,
#' "ORAL",
#' "ORAL",
#' "ORAL",
#' "INTRAMUSCULAR",
#' "INTRA-ARTERIAL",
#' NA,
#' "NON-STANDARD",
#' "RANDOM_VALUE",
#' "INTRA-ARTICULAR",
#' "TRANSDERMAL",
#' "OPHTHALMIC"
#' )
#' )
#'
#' # Controlled terminology specification
#' (ct_spec <- read_ct_spec_example("ct-01-cm"))
#'
#' assign_ct(
#' raw_dat = md1,
#' raw_var = "MDIND",
#' tgt_var = "CMINDC",
#' ct_spec = ct_spec,
#' ct_clst = "C66729",
#' tgt_dat = cm_inter
#' )
#'
#' @name assign
NULL

#' @order 1
#' @export
#' @rdname assign
assign_no_ct <- function(raw_dat,
raw_var,
tgt_var,
tgt_dat = NULL,
id_vars = oak_id_vars()) {
admiraldev::assert_character_scalar(raw_var)
admiraldev::assert_character_scalar(tgt_var)
admiraldev::assert_character_vector(id_vars)
assertthat::assert_that(contains_oak_id_vars(id_vars),
msg = "`id_vars` must include the oak id vars."
)
admiraldev::assert_data_frame(raw_dat, required_vars = rlang::syms(c(id_vars, raw_var)))
admiraldev::assert_data_frame(tgt_dat, required_vars = rlang::syms(id_vars), optional = TRUE)

sdtm_assign(
raw_dat = raw_dat,
raw_var = raw_var,
tgt_var = tgt_var,
tgt_dat = tgt_dat,
id_vars = id_vars
)
}

#' @order 2
#' @export
#' @rdname assign
assign_ct <- function(raw_dat,
raw_var,
tgt_var,
ct_spec,
ct_clst,
tgt_dat = NULL,
id_vars = oak_id_vars()) {
admiraldev::assert_character_scalar(raw_var)
admiraldev::assert_character_scalar(tgt_var)
admiraldev::assert_character_vector(id_vars)
assertthat::assert_that(contains_oak_id_vars(id_vars),
msg = "`id_vars` must include the oak id vars."
)
admiraldev::assert_data_frame(raw_dat, required_vars = rlang::syms(c(id_vars, raw_var)))
admiraldev::assert_data_frame(tgt_dat, required_vars = rlang::syms(id_vars), optional = TRUE)

sdtm_assign(
raw_dat = raw_dat,
raw_var = raw_var,
tgt_var = tgt_var,
tgt_dat = tgt_dat,
id_vars = id_vars,
ct_spec = ct_spec,
ct_clst = ct_clst
)
}
19 changes: 19 additions & 0 deletions R/clear_cache.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#' Clear `{sdtm.oak}` cache of memoised functions
#'
#' @description
#' Some of `{sdtm.oak}` functions have their results cached for runtime
#' efficiency. Use this function to reset the cache.
#'
#' Memoised functions:
#' - [ct_mappings()]
#'
#' @return Returns a logical value, indicating whether the resetting of the
#' cache was successful (`TRUE`) or not (`FALSE`).
#'
#' @examples
#' clear_cache()
#'
#' @export
clear_cache <- function() {
memoise::forget(ct_mappings)
}
Loading

0 comments on commit 5fc61af

Please sign in to comment.