Implementation of hardcode_no_ct(), hardcode_ct(), assign_no_ct() and assign_ct() #41

Merged
merged 68 commits into from
Apr 10, 2024
Changes from 2 commits
1c4501f
First mockup of `hardcode_no_ct()`
ramiromagno Feb 8, 2024
faef0b1
Update `hardcode_no_ct()`
ramiromagno Feb 17, 2024
fd63b37
Align `hardcode_no_ct()` code style with Ramm's expectations
ramiromagno Feb 21, 2024
80d3943
Add `hardcode_*()` and `assign_*()` functions
ramiromagno Feb 22, 2024
ec5a9e4
hardcode_no_ct algorithm code changes (#45)
rammprasad Mar 13, 2024
0333d95
Add `oak_id_vars()`
ramiromagno Mar 14, 2024
7fd7716
Fix typo in `recode()`
ramiromagno Mar 14, 2024
802aacc
Simplify `oak_id_vars()` docs
ramiromagno Mar 14, 2024
7fd07b4
Update `assign_*` and `hardcode_*` implementations
ramiromagno Mar 14, 2024
9cc26d1
Introduce memoisation of `ct_mappings()`
ramiromagno Mar 14, 2024
329feaa
Update of README introductory paragraph
ramiromagno Mar 14, 2024
29be830
Merge from main
ramiromagno Mar 14, 2024
7720c05
Update hardcode_* functions' interface
ramiromagno Mar 24, 2024
e87ca66
Add `contains_oak_id_vars()` function
ramiromagno Mar 24, 2024
a5e61f0
Update `contains_oak_id_vars()` doc examples
ramiromagno Mar 24, 2024
a60ccd6
Update `sdtm_harcode()` and dependant functions
ramiromagno Mar 24, 2024
cd89804
Update `assign_*` and `hardcore_*` related functions
ramiromagno Mar 25, 2024
ae2da80
Automatic renv profile update.
ramiromagno Mar 25, 2024
30857e3
Automatic renv profile update.
ramiromagno Mar 25, 2024
73ebe2d
Make `ct` and `cl` parameters mandatory for `assign_ct()`
ramiromagno Mar 27, 2024
0eb4677
Add functions ct importing
ramiromagno Mar 27, 2024
dfd7710
Bring `hardcode*()` and `assign*()` related assertions closer to user…
ramiromagno Mar 27, 2024
6652aae
Add lagging behind Rd for `ct_example()`
ramiromagno Mar 27, 2024
59bcc71
Add `assert_ct()`
ramiromagno Mar 27, 2024
7f9f388
Add ct assertions
ramiromagno Mar 27, 2024
4c81ae1
Merge branch '0040_hardcode_no_ct' of github.com:pharmaverse/sdtm.oak…
ramiromagno Mar 27, 2024
4ed5c41
Remove R/.gitkeep
ramiromagno Apr 1, 2024
ca26d22
Add unit tests for `ct_vars()`
ramiromagno Apr 1, 2024
0456d55
Update dependencies
ramiromagno Apr 1, 2024
0e1eab4
Export `ct_vars()`
ramiromagno Apr 1, 2024
84a4f7d
Update `assert_ct()` docs
ramiromagno Apr 1, 2024
7cf1072
Clarify `assign_ct()`/`assign_no_ct()` doc
ramiromagno Apr 1, 2024
7dff0aa
Improve grammar in doc
ramiromagno Apr 1, 2024
cb2f2e8
Remove last empty line from ct example file
ramiromagno Apr 1, 2024
454b7d8
Add documentation to `sdtm_assign()` and ct-related unit tests
ramiromagno Apr 1, 2024
fafe01b
Update hardcode-related fns
ramiromagno Apr 1, 2024
3a4b355
Changes to meet linter issues
ramiromagno Apr 1, 2024
37575b2
Code reformatting
ramiromagno Apr 1, 2024
c176654
Code reflow
ramiromagno Apr 1, 2024
dafcfef
Improve `assert_cl()` docs
ramiromagno Apr 1, 2024
e128779
Update `read_ct()` docs
ramiromagno Apr 1, 2024
0895764
Automatic renv profile update.
ramiromagno Apr 1, 2024
339039e
Automatic renv profile update.
ramiromagno Apr 1, 2024
ab9db14
Add units tests for `recode()`
ramiromagno Apr 1, 2024
52c52fa
Remove `are_to_recode()` function
ramiromagno Apr 1, 2024
229c0bd
Add units tests for `assert_ct()`
ramiromagno Apr 1, 2024
c83bfdf
Add one more test for `assert_ct()`
ramiromagno Apr 1, 2024
a362578
Add a basic unit test for `ct_mappings()`
ramiromagno Apr 1, 2024
934a15c
Fill in some doc details of ct-related functions
ramiromagno Apr 2, 2024
0dcf0fc
Remove leftover doc text in `assign`
ramiromagno Apr 2, 2024
a44c865
Update website's reference
ramiromagno Apr 2, 2024
efb423f
Styling update
ramiromagno Apr 2, 2024
365fa09
Bump version and update NEWS
ramiromagno Apr 2, 2024
b267610
Fix a few lintr issues
ramiromagno Apr 2, 2024
cbd38eb
Merge branch '0040_hardcode_no_ct' of github.com:pharmaverse/sdtm.oak…
ramiromagno Apr 2, 2024
9cb23f5
Add examples to `ct_map()` doc
ramiromagno Apr 2, 2024
1bebdd8
Fix typo in `problems()` doc
ramiromagno Apr 2, 2024
a8f1bf5
Fix typo
ramiromagno Apr 2, 2024
5987684
Remove lint issues
ramiromagno Apr 3, 2024
2791ef0
Replace `.data` usage in tidyselect expressions
ramiromagno Apr 3, 2024
2a8dbf5
Variable renaming
ramiromagno Apr 4, 2024
a718207
Finish pending renaming of variables
ramiromagno Apr 4, 2024
8cc8dcb
Rename code-list to codelist
ramiromagno Apr 4, 2024
609b60e
Fix style
ramiromagno Apr 4, 2024
e8beefc
Fix style
ramiromagno Apr 4, 2024
42d4d5a
Update `ct_map()` doc example
ramiromagno Apr 10, 2024
66644eb
Make tibbles more readable in doc examples
ramiromagno Apr 10, 2024
bb2e0d2
Rename `ct_cltc` to `ct_clst`
ramiromagno Apr 10, 2024
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -3,6 +3,8 @@
S3method(print,iso8601)
export(create_iso8601)
export(fmt_cmp)
export(hardcode_no_ct)
export(problems)
importFrom(rlang,":=")
importFrom(rlang,.data)
importFrom(tibble,tibble)
69 changes: 69 additions & 0 deletions R/hardcode_no_ct.R
@@ -0,0 +1,69 @@
#' Derive an SDTM variable with a hardcoded value
#'
#' [hardcode_no_ct()] maps a hardcoded value to a target SDTM variable that has
#' no terminology restrictions.
#'
#' @param raw_dataset The raw dataset.
#' @param raw_variable The raw variable.
#' @param target_sdtm_variable The target SDTM variable.
#' @param target_hardcoded_value Hardcoded value.
#' @param target_dataset Target dataset. By default the same as `raw_dataset`.
#' @param merge_to_topic_by If `target_dataset` is different from `raw_dataset`,
#' then this parameter defines the keys to use in the join between
#' `raw_dataset` and `target_dataset`.
#'
#' @examples
#' MD1 <-
#' tibble::tribble(
#' ~oak_id, ~raw_source, ~patient_number, ~MDRAW,
#' 1L, "MD1", "PATNUM", "BABY ASPIRIN",
#' 2L, "MD1", "PATNUM", "CORTISPORIN",
#' 3L, "MD1", "PATNUM", NA_character_,
#' 4L, "MD1", "PATNUM", "DIPHENHYDRAMINE HCL"
#' )
#'
#' # Derive a new variable `CMCAT` by overwriting `MDRAW` with the
#' # hardcoded value "GENERAL CONCOMITANT MEDICATIONS".
#' hardcode_no_ct(
#' raw_dataset = MD1,
#' raw_variable = "MDRAW",
#' target_sdtm_variable = "CMCAT",
#' target_hardcoded_value = "GENERAL CONCOMITANT MEDICATIONS"
#' )
#'
#' CM_INTER <-
#' tibble::tribble(
#' ~oak_id, ~raw_source, ~patient_number, ~CMTRT, ~CMINDC,
#' 1L, "MD1", "PATNUM", "BABY ASPIRIN", NA,
#' 2L, "MD1", "PATNUM", "CORTISPORIN", "NAUSEA",
#' 3L, "MD1", "PATNUM", "ASPIRIN", "ANEMIA",
#' 4L, "MD1", "PATNUM", "DIPHENHYDRAMINE HCL", "NAUSEA",
#' 5L, "MD1", "PATNUM", "PARACETAMOL", "PYREXIA"
#' )
#'
#' # Derive a new variable `CMCAT` by overwriting `MDRAW` with the
#' # hardcoded value "GENERAL CONCOMITANT MEDICATIONS" with a prior join to
#' # `target_dataset`.
#'
#' hardcode_no_ct(
#' raw_dataset = MD1,
#' raw_variable = "MDRAW",
#' target_sdtm_variable = "CMCAT",
#' target_hardcoded_value = "GENERAL CONCOMITANT MEDICATIONS",
#' target_dataset = CM_INTER,
#' merge_to_topic_by = c("oak_id", "raw_source", "patient_number")
#' )
#'
#' @importFrom rlang :=
#' @export
hardcode_no_ct <- function(raw_dataset,
                           raw_variable,
                           target_sdtm_variable,
                           target_hardcoded_value,
                           target_dataset = raw_dataset,
                           merge_to_topic_by = NULL) {
  dplyr::right_join(x = raw_dataset, y = target_dataset, by = merge_to_topic_by) |>
    dplyr::mutate("{raw_variable}" := overwrite(!!rlang::sym(raw_variable), target_hardcoded_value)) |>
    # Column names held in character vectors are wrapped in `all_of()` so that
    # tidyselect does not treat them as ambiguous external vectors.
    dplyr::rename("{target_sdtm_variable}" := dplyr::all_of(raw_variable)) |>
    dplyr::relocate(dplyr::all_of(target_sdtm_variable), .after = dplyr::last_col())
}
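The behaviour documented above for the simple case (no separate `target_dataset`) can be illustrated with a minimal base-R sketch. This is not the package's implementation: `hardcode_sketch()` is a hypothetical helper introduced only for illustration, and it assumes the intended result is that the raw variable is replaced by a new column holding the hardcoded value wherever the raw value is present, and `NA` where it is missing.

```r
# Base-R illustration only; `hardcode_sketch()` is a hypothetical name and
# not part of sdtm.oak.
md1 <- data.frame(
  oak_id = 1:4,
  raw_source = "MD1",
  patient_number = "PATNUM",
  MDRAW = c("BABY ASPIRIN", "CORTISPORIN", NA, "DIPHENHYDRAMINE HCL"),
  stringsAsFactors = FALSE
)

hardcode_sketch <- function(data, raw_variable, target_variable, value) {
  # Drop the raw variable; the target column replaces it at the end.
  out <- data[, setdiff(names(data), raw_variable), drop = FALSE]
  # Hardcode `value` only where the raw variable is not missing.
  out[[target_variable]] <- ifelse(is.na(data[[raw_variable]]), NA, value)
  out
}

cm <- hardcode_sketch(
  md1,
  raw_variable = "MDRAW",
  target_variable = "CMCAT",
  value = "GENERAL CONCOMITANT MEDICATIONS"
)
print(cm)
```

In this sketch the third record, whose `MDRAW` is missing, ends up with `CMCAT` equal to `NA`, while the other three records receive the hardcoded category.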
148 changes: 148 additions & 0 deletions R/recode.R
@@ -0,0 +1,148 @@
#' Overwrite values
#'
#' @description
#' [overwrite()] recodes values in `x` to a new set of values provided in `to`;
#' the values in `to` are recycled to match the length of `x`. By default,
#' missing values remain `NA`.
#'
#' @param x An atomic vector.
#' @param to A vector of replacement values, recycled to the length of `x`.
#' @param .na New value for missing values in `x`. Defaults to `NA`.
#'
#' @returns A vector of the same length as `x` with new values matching those
#' in `to`.
#'
#' @examples
#' x <- c(letters[1:4], NA, NA)
#' # Recode all values to `"x"` but keep `NA`.
#' sdtm.oak:::overwrite(x, to = "x")
#'
#' # Recode all values to `"x"` but recode `NA` to a new value.
#' sdtm.oak:::overwrite(x, to = "x", .na = "x")
#' sdtm.oak:::overwrite(x, to = "x", .na = "Absent")
#'
#' # If `to` is not a scalar, it is recycled and matched by position for
#' # replacement.
#' sdtm.oak:::overwrite(x, to = c("x", "y"))
#'
#' # `x` can be of other types besides `character`, e.g. replace integers with a
#' # hard-coded new integer value.
#' sdtm.oak:::overwrite(x = 1:5, to = 0)
#'
#' # Example involving `logical` vectors
#' sdtm.oak:::overwrite(x = c(TRUE, FALSE), to = FALSE)
#'
#' # Returned type will be a type compatible with both the types of `to` and
#' # `.na`.
#' sdtm.oak:::overwrite(x = c("sdtm", "adam"), to = 0)
#' sdtm.oak:::overwrite(
#' x = c("sdtm", "adam"),
#' to = 0,
#' .na = NA_character_
#' )
#' sdtm.oak:::overwrite(
#' x = c("sdtm", "adam"),
#' to = TRUE,
#' .na = NA_real_
#' )
#'
#' @keywords internal
overwrite <- function(x, to, .na = NA) {
  # y <- rep_len(to, length(x))
  y <- rlang::rep_along(x, to)
  y[is.na(x)] <- .na

  y
}

#' Determine Indices for Rewriting
#'
#' [index_for_rewrite()] identifies the positions of elements in `x` that match
#' any of the values specified in the `from` vector. This function is primarily
#' used to facilitate the rewriting of values by pinpointing which elements in
#' `x` correspond to the `from` values and thus need to be replaced or updated.
#'
#' @param x A vector of values in which to search for matches.
#' @param from A vector of values to match against the elements in `x`.
#' @return An integer vector of the same length as `x`, containing the indices
#' of the matched values from the `from` vector. If an element in `x` does not
#' match any value in `from`, the corresponding position in the output will be
#' `NA`. This index information is critical for subsequent rewrite operations.
#' @examples
#' sdtm.oak:::index_for_rewrite(x = 1:5, from = c(2, 4))
#'
#' @keywords internal
index_for_rewrite <- function(x, from) {
  match(x, from)
}

#' Are values to be rewritten?
#'
#' `are_to_rewrite` is a helper function designed to determine which values
#' in a vector `x` match the specified `from` values, indicating they are
#' candidates for recoding or rewriting.
#'
#' @param x A vector of values that will be checked against the `from` vector.
#' @param from A vector of values that `x` will be checked for matches against.
#' @return A logical vector of the same length as `x`, where `TRUE` indicates
#' that the corresponding value in `x` matches a value in `from` and
#' should be rewritten, and `FALSE` otherwise. If `x` is empty, returns
#' an empty logical vector. This function is intended for internal use
#' and optimization in data transformation processes.
#' @examples
#' sdtm.oak:::are_to_rewrite(x = 1:5, from = c(2, 4))
#'
#' sdtm.oak:::are_to_rewrite(letters[1:3], from = c("a", "c"))
#'
#' @keywords internal
are_to_rewrite <- function(x, from) {
  # match(x, from, nomatch = 0) != 0
  !is.na(index_for_rewrite(x, from))
}

#' Rewrite values
#'
#' [rewrite()] recodes values in `x` by matching elements in `from` onto values
#' in `to`.
#'
#' @param x An atomic vector of values to be recoded.
#' @param from A vector of values to be matched in `x` for rewriting.
#' @param to A vector of values to be used as replacement for values in `from`.
#' @param .no_match Value to be used as replacement for values in `x` that are
#' not matched in `from`. Defaults to the original value in `x`.
#' @param .na Value to be used to recode missing values.
#'
#' @returns A vector of recoded values.
#'
#' @examples
#' x <- c("male", "female", "x", NA)
#' sdtm.oak:::rewrite(x,
#' from = c("male", "female"),
#' to = c("M", "F")
#' )
#' sdtm.oak:::rewrite(
#' x,
#' from = c("male", "female"),
#' to = c("M", "F"),
#' .no_match = "?"
#' )
#' sdtm.oak:::rewrite(
#' x,
#' from = c("male", "female"),
#' to = c("M", "F"),
#' .na = "missing"
#' )
#'
#' @keywords internal
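rewrite <- function(x,
                    from,
                    to,
                    .no_match = x,
                    .na = NA) {
  # `index` holds positions in `from`, so `to` must be recycled along `from`
  # (not `x`); otherwise `to` is truncated whenever `x` is shorter than `from`.
  to <- rlang::rep_along(from, to)
  index <- index_for_rewrite(x, from)
  y <- ifelse(!is.na(index), to[index], .no_match)
  y[is.na(x)] <- .na

  y
}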
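The recoding helpers in this file can be exercised without the package installed. The following is a base-R sketch only: the `_sketch` names are hypothetical stand-ins, `rep_len()` substitutes for `rlang::rep_along()`, and the sketch assumes that `to` is meant to pair element-wise with `from`.

```r
# Base-R stand-ins for illustration; the `_sketch` names are hypothetical and
# not part of sdtm.oak.
overwrite_sketch <- function(x, to, .na = NA) {
  y <- rep_len(to, length(x)) # recycle `to` to the length of `x`
  y[is.na(x)] <- .na          # missing inputs are set to `.na`
  y
}

rewrite_sketch <- function(x, from, to, .no_match = x, .na = NA) {
  to <- rep_len(to, length(from)) # one replacement per value in `from`
  index <- match(x, from)         # NA where `x` has no match in `from`
  y <- ifelse(!is.na(index), to[index], .no_match)
  y[is.na(x)] <- .na
  y
}

x <- c("male", "female", "x", NA)

# Matched values are recoded, unmatched values are kept, NA stays NA.
recoded <- rewrite_sketch(x, from = c("male", "female"), to = c("M", "F"))
print(recoded) # "M" "F" "x" NA

# An input shorter than `from` still finds its replacement.
short <- rewrite_sketch("female", from = c("male", "female"), to = c("M", "F"))
print(short) # "F"

# Every non-missing value is overwritten with the same hardcoded value.
over <- overwrite_sketch(c("a", "b", NA), to = "x")
print(over) # "x" "x" NA
```

The second call shows why the recycling target matters: the positions returned by `match()` index into `from`, so the replacement vector needs the length of `from` rather than the length of `x`.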
3 changes: 3 additions & 0 deletions inst/WORDLIST
@@ -8,3 +8,6 @@ funder
vectorized
ORCID
iso
hardcoded
CDISC
PMDA
32 changes: 32 additions & 0 deletions man/are_to_rewrite.Rd

77 changes: 77 additions & 0 deletions man/hardcode_no_ct.Rd

30 changes: 30 additions & 0 deletions man/index_for_rewrite.Rd
