assign_datetime algorithm (#47)
* First mockup of `hardcode_no_ct()`

* Update `hardcode_no_ct()`

Update `hardcode_no_ct()` by allowing the rewriting of the `target_sdtm_variable` variable to preserve `NA`

* Align `hardcode_no_ct()` code style with Ramm's expectations

* Add `hardcode_*()` and `assign_*()` functions

* hardcode_no_ct algorithm code changes (#45)

* hardcode_no_ct algorithm code changes

* hardcode_ct working as expected

* assign_ct and assign_no_ct work great.

* address review comments

* Add `oak_id_vars()`

* Fix typo in `recode()`

* Simplify `oak_id_vars()` docs

* Update `assign_*` and `hardcode_*` implementations

* Introduce memoisation of `ct_mappings()`

* Update of README introductory paragraph

* Update hardcode_* functions' interface

* Add `contains_oak_id_vars()` function

* Update `contains_oak_id_vars()` doc examples

* Update `sdtm_harcode()` and dependent functions

* Update `assign_*` and `hardcode_*` related functions

* Automatic renv profile update.

* Automatic renv profile update.

* Make `ct` and `cl` parameters mandatory for `assign_ct()`

* Add functions for ct importing

- Adds three new user-facing ct-related functions: `read_ct_example()`, `ct_example()` and `read_ct()`
- Provides a ct example file in inst/ct/

* Bring `hardcode*()` and `assign*()` related assertions closer to user calling functions

* Add lagging behind Rd for `ct_example()`

* Add `assert_ct()`

* Add ct assertions

* Remove R/.gitkeep

As it is no longer needed.

* Add unit tests for `ct_vars()`

* Update dependencies

* Export `ct_vars()`

Export `ct_vars()` so that we can cross-reference it from other functions' documentation.

* Update `assert_ct()` docs

* Clarify `assign_ct()`/`assign_no_ct()` doc

* Improve grammar in doc

* Remove last empty line from ct example file

* Add documentation to `sdtm_assign()` and ct-related unit tests

Although we had discussed keeping assertions only in the user-facing functions, I feel we would miss them in the internal function too, for two reasons: first, the internal function is more flexible, with more optional parameters, which requires extra assertion logic; second, we will eventually be checking code coverage and would regret not having done this now.

* Update hardcode-related fns

* Changes to meet linter issues

* Code reformatting

* Code reflow

* Improve `assert_cl()` docs

* Update `read_ct()` docs

* Automatic renv profile update.

* Automatic renv profile update.

* Add unit tests for `recode()`

* Remove `are_to_recode()` function

Ended up not using this function.

* Add unit tests for `assert_ct()`

* Add one more test for `assert_ct()`

* Add a basic unit test for `ct_mappings()`

* Fill in some doc details of ct-related functions

* Remove leftover doc text in `assign`

* Update website's reference

* Styling update

* Bump version and update NEWS

* Fix a few lintr issues

* Add examples to `ct_map()` doc

* Fix typo in `problems()` doc

* Fix typo

* Initial mockup of `assign_datetime()`

* Add `.warn` parameter to `create_iso8601()` internals

* Remove lint issues

* Replace `.data` usage in tidyselect expressions

See tidyverse/tidyverse.org#600 for more details.

* Variable renaming

- `ct` to `ct_spec` (ct specification)
- `cl` to `ct_cltc` (codelist code)

* Finish pending renaming of variables

* Rename code-list to codelist

* Fix style

* Fix style

* Add assertions to `assign_datetime()`

* Add merge example to `assign_datetime()` doc

* Style changes

* Style changes (.Rd)

* Bump version and update news

* Update `ct_map()` doc example

* Make tibbles more readable in doc examples

* Rename `ct_cltc` to `ct_clst`

As per @rammprasad's suggestion.

* Fix bug in `assign_datetime`

- This bug affected input that is split across two different variables (date and time).
- A unit test was also added

* Linting

* Update styling

* Add example with date and time to `assign_datetime()` docs

* Avoid backslash hell (thank you)

Credit goes to @edgar-manukya for the expression

* Update `ct_spec_vars()` docs' examples

`ct_spec_vars()` used to be an internal function but is now exported, so `:::` is no longer needed.

---------

Co-authored-by: Ram Ganapathy <[email protected]>
Co-authored-by: ramiromagno <[email protected]>
3 people authored May 14, 2024
1 parent 5fc61af commit 4371c41
Showing 9 changed files with 473 additions and 11 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: sdtm.oak
Type: Package
Title: SDTM Data Transformation Engine
-Version: 0.0.0.9002
+Version: 0.0.0.9003
Authors@R: c(
person("Rammprasad", "Ganapathy", role = c("aut", "cre"),
email = "[email protected]"),
1 change: 1 addition & 0 deletions NAMESPACE
@@ -2,6 +2,7 @@

S3method(print,iso8601)
export(assign_ct)
export(assign_datetime)
export(assign_no_ct)
export(clear_cache)
export(create_iso8601)
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,3 +1,9 @@
# sdtm.oak 0.0.0.9003 (development version)

## New Features

* New function: `assign_datetime()` for deriving an ISO8601 date-time variable.

# sdtm.oak 0.0.0.9002 (development version)

## New Features
196 changes: 196 additions & 0 deletions R/assign_datetime.R
@@ -0,0 +1,196 @@
#' Derive an ISO8601 date-time variable
#'
#' [assign_datetime()] maps one or more variables with date/time components in a
#' raw dataset to a target SDTM variable following the ISO8601 format.
#'
#' @param raw_dat The raw dataset (dataframe); must include the
#' variables passed in `id_vars` and `raw_var`.
#' @param raw_var The raw variable(s): a character vector indicating the name(s)
#' of the raw variable(s) in `raw_dat` with date or time components to be
#' parsed into an ISO8601 format variable in `tgt_var`.
#' @param raw_fmt A date/time parsing format. Either a character vector or a
#' list of character vectors. If a character vector is passed then each
#' element is taken as parsing format for each variable indicated in
#' `raw_var`. If a list is provided, then each element must be a character
#' vector of formats. The first vector of formats is used for parsing the
#' first variable in `raw_var`, and so on.
#' @param tgt_var The target SDTM variable: a single string indicating the name
#' of the variable to be derived.
#' @param raw_unk A character vector of string literals to be regarded as
#' missing values during parsing.
#' @param tgt_dat Target dataset: a data frame to be merged against `raw_dat` by
#' the variables indicated in `id_vars`. This parameter is optional, see
#' section Value for how the output changes depending on this argument value.
#' @param id_vars Key variables to be used in the join between the raw dataset
#' (`raw_dat`) and the target data set (`tgt_dat`).
#' @param .warn Whether to warn about parsing failures.
#'
#' @returns The returned data set depends on the value of `tgt_dat`:
#' - If no target dataset is supplied, meaning that `tgt_dat` defaults to
#' `NULL`, then the returned data set is `raw_dat`, selected for the variables
#' indicated in `id_vars`, and a new extra column: the derived variable, as
#' indicated in `tgt_var`.
#' - If the target dataset is provided, then it is merged with the raw data set
#' `raw_dat` by the variables indicated in `id_vars`, with a new column: the
#' derived variable, as indicated in `tgt_var`.
#'
#' @examples
#' # `md1`: an example raw data set.
#' md1 <-
#' tibble::tribble(
#' ~oak_id, ~raw_source, ~patient_number, ~MDBDR, ~MDEDR, ~MDETM,
#' 1L, "MD1", 375, NA, NA, NA,
#' 2L, "MD1", 375, "15-Sep-20", NA, NA,
#' 3L, "MD1", 376, "17-Feb-21", "17-Feb-21", NA,
#' 4L, "MD1", 377, "4-Oct-20", NA, NA,
#' 5L, "MD1", 377, "20-Jan-20", "20-Jan-20", "10:00:00",
#' 6L, "MD1", 377, "UN-UNK-2019", "UN-UNK-2019", NA,
#' 7L, "MD1", 377, "20-UNK-2019", "20-UNK-2019", NA,
#' 8L, "MD1", 378, "UN-UNK-2020", "UN-UNK-2020", NA,
#' 9L, "MD1", 378, "26-Jan-20", "26-Jan-20", "07:00:00",
#' 10L, "MD1", 378, "28-Jan-20", "1-Feb-20", NA,
#' 11L, "MD1", 378, "12-Feb-20", "18-Feb-20", NA,
#' 12L, "MD1", 379, "10-UNK-2020", "20-UNK-2020", NA,
#' 13L, "MD1", 379, NA, NA, NA,
#' 14L, "MD1", 379, NA, "17-Feb-20", NA
#' )
#'
#' # Using the raw data set `md1`, derive the variable CMSTDTC from MDBDR using
#' # the parsing format (`raw_fmt`) `"d-m-y"` (day-month-year), while allowing
#' # for the presence of special date component values (e.g. `"UN"` or `"UNK"`),
#' # indicating that these values are missing/unknown (unk).
#' cm1 <-
#' assign_datetime(
#' raw_dat = md1,
#' raw_var = "MDBDR",
#' raw_fmt = "d-m-y",
#' raw_unk = c("UN", "UNK"),
#' tgt_var = "CMSTDTC"
#' )
#'
#' cm1
#'
#' # Inspect parsing failures associated with derivation of CMSTDTC.
#' problems(cm1$CMSTDTC)
#'
#' # `cm_inter`: an example target data set.
#' cm_inter <-
#' tibble::tibble(
#' oak_id = 1L:14L,
#' raw_source = "MD1",
#' patient_number = c(
#' 375, 375, 376, 377, 377, 377, 377, 378,
#' 378, 378, 378, 379, 379, 379
#' ),
#' CMTRT = c(
#' "BABY ASPIRIN",
#' "CORTISPORIN",
#' "ASPIRIN",
#' "DIPHENHYDRAMINE HCL",
#' "PARCETEMOL",
#' "VOMIKIND",
#' "ZENFLOX OZ",
#' "AMITRYPTYLINE",
#' "BENADRYL",
#' "DIPHENHYDRAMINE HYDROCHLORIDE",
#' "TETRACYCLINE",
#' "BENADRYL",
#' "SOMINEX",
#' "ZQUILL"
#' ),
#' CMINDC = c(
#' "NA",
#' "NAUSEA",
#' "ANEMIA",
#' "NAUSEA",
#' "PYREXIA",
#' "VOMITINGS",
#' "DIARHHEA",
#' "COLD",
#' "FEVER",
#' "LEG PAIN",
#' "FEVER",
#' "COLD",
#' "COLD",
#' "PAIN"
#' )
#' )
#'
#' # Same derivation as above but now involving the merging with the target
#' # data set `cm_inter`.
#' cm2 <-
#' assign_datetime(
#' raw_dat = md1,
#' raw_var = "MDBDR",
#' raw_fmt = "d-m-y",
#' tgt_var = "CMSTDTC",
#' tgt_dat = cm_inter
#' )
#'
#' cm2
#'
#' # Inspect parsing failures associated with derivation of CMSTDTC.
#' problems(cm2$CMSTDTC)
#'
#' # Derive CMSTDTC using both MDEDR and MDETM variables.
#' # Note that the format `"d-m-y"` is used for parsing MDEDR and `"H:M:S"` for
#' # MDETM (correspondence is by positional matching).
#' cm3 <-
#' assign_datetime(
#' raw_dat = md1,
#' raw_var = c("MDEDR", "MDETM"),
#' raw_fmt = c("d-m-y", "H:M:S"),
#' raw_unk = c("UN", "UNK"),
#' tgt_var = "CMSTDTC"
#' )
#'
#' cm3
#'
#' # Inspect parsing failures associated with derivation of CMSTDTC.
#' problems(cm3$CMSTDTC)
#'
#' @export
assign_datetime <-
  function(raw_dat,
           raw_var,
           raw_fmt,
           tgt_var,
           raw_unk = c("UN", "UNK"),
           tgt_dat = NULL,
           id_vars = oak_id_vars(),
           .warn = TRUE) {
    admiraldev::assert_character_vector(raw_var)
    admiraldev::assert_character_scalar(tgt_var)
    admiraldev::assert_character_vector(id_vars)
    assertthat::assert_that(contains_oak_id_vars(id_vars),
      msg = "`id_vars` must include the oak id vars."
    )
    admiraldev::assert_data_frame(raw_dat, required_vars = rlang::syms(c(id_vars, raw_var)))
    admiraldev::assert_data_frame(tgt_dat, required_vars = rlang::syms(id_vars), optional = TRUE)
    admiraldev::assert_character_vector(raw_unk)
    admiraldev::assert_logical_scalar(.warn)

    tgt_val <-
      create_iso8601(!!!raw_dat[raw_var],
        .format = raw_fmt,
        .na = raw_unk,
        .warn = .warn
      )

    der_dat <-
      raw_dat |>
      dplyr::select(c(id_vars, raw_var)) |>
      dplyr::mutate("{tgt_var}" := tgt_val) |> # nolint object_name_linter()
      dplyr::select(-raw_var)

    der_dat <-
      if (!is.null(tgt_dat)) {
        der_dat |>
          dplyr::right_join(y = tgt_dat, by = id_vars) |>
          dplyr::relocate(tgt_var, .after = dplyr::last_col())
      } else {
        der_dat
      }

    der_dat
  }
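
The two return shapes described under `@returns` can be sketched with dplyr alone. This is a minimal illustration under stated assumptions, not the package's code: `raw`, `tgt` and the hard-coded `tgt_val` vector are hypothetical stand-ins (the function above obtains `tgt_val` from `create_iso8601()`), and `dplyr::all_of()` is used here to select columns by character vector, which differs from the selection calls in the diff.

```r
library(dplyr)

# Hypothetical stand-ins for illustration; none of these objects come from the package.
id_vars <- c("oak_id", "raw_source", "patient_number")
raw_var <- "MDBDR"
tgt_var <- "CMSTDTC"

raw <- tibble(
  oak_id = 1:3,
  raw_source = "MD1",
  patient_number = c(375, 375, 376),
  MDBDR = c(NA, "15-Sep-20", "17-Feb-21")
)

# Assumed parsing result; the real function derives this with create_iso8601().
tgt_val <- c(NA, "2020-09-15", "2021-02-17")

# Shape 1 (no target dataset): id variables plus the derived column only.
der_dat <- raw |>
  select(all_of(c(id_vars, raw_var))) |>
  mutate("{tgt_var}" := tgt_val) |>
  select(-all_of(raw_var))

# Hypothetical target dataset sharing the same id variables.
tgt <- tibble(
  oak_id = 1:3,
  raw_source = "MD1",
  patient_number = c(375, 375, 376),
  CMTRT = c("BABY ASPIRIN", "CORTISPORIN", "ASPIRIN")
)

# Shape 2 (target dataset supplied): keep every row of `tgt` and move the
# derived column to the last position.
merged <- der_dat |>
  right_join(y = tgt, by = id_vars) |>
  relocate(all_of(tgt_var), .after = last_col())

names(der_dat)
names(merged)
```

Because the join is a right join onto the target dataset, rows of `tgt` with no match in the raw data would still appear in the result, with `NA` in the derived column.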
10 changes: 5 additions & 5 deletions R/ct.R
@@ -16,17 +16,17 @@
#' @examples
#' # These two calls are equivalent and return all required variables in a
#' # controlled terminology data set.
-#' sdtm.oak:::ct_spec_vars()
-#' sdtm.oak:::ct_spec_vars("all")
+#' ct_spec_vars()
+#' ct_spec_vars("all")
#'
#' # "Codelist code" variable name.
-#' sdtm.oak:::ct_spec_vars("ct_clst")
+#' ct_spec_vars("ct_clst")
#'
#' # "From" variables
-#' sdtm.oak:::ct_spec_vars("from")
+#' ct_spec_vars("from")
#'
#' # The "to" variable.
-#' sdtm.oak:::ct_spec_vars("to")
+#' ct_spec_vars("to")
#'
#' @keywords internal
#' @export
1 change: 1 addition & 0 deletions _pkgdown.yml
@@ -14,6 +14,7 @@ reference:
- assign
- harcode
- derive_study_day
- assign_datetime

- title: Controlled terminology
contents: