Skip to content

Commit

Permalink
0015 derive seq (#53)
Browse files Browse the repository at this point in the history
* First mockup of `hardcode_no_ct()`

* Update `hardcode_no_ct()`

Update `hardcode_no_ct()` by allowing the rewriting of the `target_sdtm_variable` variable to preserve `NA`

* Align `hardcode_no_ct()` code style with Ramm's expectations

* Add `hardcode_*()` and `assign_*()` functions

* hardcode_no_ct algorithm code changes (#45)

* hardcode_no_ct algorithm code changes

* harcode_ct working as expected

* assign_ct and assign_no_ct works great.

* address review comments

* Add `oak_id_vars()`

* Fix typo in `recode()`

* Simplify `oak_id_vars()` docs

* Update `assign_*` and `hardcode_*` implementations

* Introduce memoisation of `ct_mappings()`

* Update of README introductory paragraph

* Update hardcode_* functions' interface

* Add `contains_oak_id_vars()` function

* Update `contains_oak_id_vars()` doc examples

* Update `sdtm_harcode()` and dependant functions

* Update `assign_*` and `hardcore_*` related functions

* Automatic renv profile update.

* Automatic renv profile update.

* Make `ct` and `cl` parameters mandatory for `assign_ct()`

* Add functions ct importing

- Adds three new user facing ct-related functions: `read_ct_example()`, `ct_example()` and `read_ct()`
- Provides a ct example file in inst/ct/

* Bring `hardcode*()` and `assign*()` related assertions closer to user calling functions

* Add lagging behind Rd for `ct_example()`

* Add `assert_ct()`

* Add ct assertions

* Remove R/.gitkeep

As it is no longer needed.

* Add unit tests for `ct_vars()`

* Update dependencies

* Export `ct_vars()`

Export `ct_vars()` such that we can cross-reference it from other functions' documentation.

* Update `assert_ct()` docs

* Clarify `assign_ct()`/`assign_no_ct()` doc

* Improve grammar in doc

* Remove last empty line from ct example file

* Add documentation to `sdtm_assign()` and ct-related unit tests

Although we had discussed to keep assertions only at the user facing functions, I am getting the feeling we would miss assertions also at the internal function... because of several reasons: firstly, the internal function is more flexible having more optional parameters, which requires extra assertion logic, and also because eventually we will be checking code coverage and we will regret not having done this now.

* Update hardcode-related fns

* Changes to meet linter issues

* Code reformatting

* Code reflow

* Improve `assert_cl()` docs

* Update `read_ct()` docs

* Automatic renv profile update.

* Automatic renv profile update.

* Add units tests for `recode()`

* Remove `are_to_recode()` function

Ended up not using this function.

* Add units tests for `assert_ct()`

* Add one more test for `assert_ct()`

* Add a basic unit test for `ct_mappings()`

* Fill in some doc details of ct-related functions

* Remove leftover doc text in `assign`

* Update website's reference

* Styling update

* Bump version and update NEWS

* Fix a few lintr issues

* Add examples to `ct_map()` doc

* Fix typo in `problems()` doc

* Fix typo

* Initial mockup of `assign_datetime()`

* Add `.warn` parameter to `create_iso8601()` internals

* Remove lint issues

* Replace `.data` usage in tidyselect expressions

See tidyverse/tidyverse.org#600 for more details.

* Variable renaming

- `ct` to `ct_spec` (ct specification)
- `cl` to `ct_cltc` (codelist code)

* Finish pending renaming of variables

* Rename code-list to codelist

* Fix style

* Fix style

* Add assertions to `assign_datetime()`

* Add merge example to `assign_datetime()` doc

* Style changes

* Style changes (.Rd)

* Bump version and update news

* Update `ct_map()` doc example

* Make tibbles more readable in doc examples

* Rename `ct_cltc` to `ct_clst`

As per @rammprasad's suggestion.

* Stash changes so far

* Fix bug in `assign_datetime`

- This bug is related to the support of input is in two different variables (date and time).
- A unit test was also added

* Linting

* Update styling

* Add renv profile for R version 4.4

* Fix a few issues in `domain_example()` docs

* Comment `derive_seq()`

Comment `derive_seq()` function for the moment for testing purposes with new R version.

* Add example with date and time to `assign_datetime()` docs

* Avoid backslash hell (մերսի)

Credit goes to @edgar-manukya for the expression

* Update `ct_spec_vars()` docs' examples

`ct_spec_vars()` used to be an internal function but not anymore: so no need for `:::`.

* Fix typo in `assign_datetime()` documentation

* Fix typo in `ct_map()` docs

* Reformating of docs of `derive_seq()`

* First mockup of `derive_seq()`

* Simplification of `derive_seq()` interface

`derive_seq()` no longer has a default value for parameter `id_vars` as per the meeting of 2024-05-15, so function's logic has been simplified accordingly.

* Bump dev version and update NEWS

* Automatic renv profile update.

* Update `derive_seq()`

- Two examples of domain data sets were added: vs and apsc
- An internal data set `domain_record_vars` provides the mapping between domain names and key variables (thank you @edgar-manukyan)
- `rec_vars()` was added to access those record-defining variables
- Likewise, `sbj_vars()` for subject-level defining rows

* Automatic renv profile update.

* Strip down the domain example data sets

* Remove `rec_vars()`

* Fix linting issues

* Linting issues (still)

* Fix code styling issues

* Addressing Edgar's review

- `derive_seq()` checks the name of `tgt_var`
- Added comment to clarify the code

* Add `sbj_vars()` to pkgdown reference list

---------

Co-authored-by: Ram Ganapathy <[email protected]>
Authored-by: ramiromagno <[email protected]>
  • Loading branch information
ramiromagno and rammprasad authored May 30, 2024
1 parent f9816fb commit e1aa479
Show file tree
Hide file tree
Showing 25 changed files with 1,504 additions and 381 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: sdtm.oak
Type: Package
Title: SDTM Data Transformation Engine
Version: 0.0.0.9003
Version: 0.0.0.9004
Authors@R: c(
person("Rammprasad", "Ganapathy", role = c("aut", "cre"),
email = "[email protected]"),
Expand Down
4 changes: 4 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,17 @@ export(create_iso8601)
export(ct_map)
export(ct_spec_example)
export(ct_spec_vars)
export(derive_seq)
export(derive_study_day)
export(domain_example)
export(fmt_cmp)
export(hardcode_ct)
export(hardcode_no_ct)
export(problems)
export(read_ct_spec)
export(read_ct_spec_example)
export(read_domain_example)
export(sbj_vars)
importFrom(rlang,"%||%")
importFrom(rlang,":=")
importFrom(rlang,.data)
Expand Down
6 changes: 6 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,9 @@
# sdtm.oak 0.0.0.9004 (development version)

## New Features

* New function: `derive_seq()` for deriving a sequence number variable.

# sdtm.oak 0.0.0.9003 (development version)

## New Features
Expand Down
97 changes: 97 additions & 0 deletions R/derive_seq.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
#' Derive the sequence number (`--SEQ`) variable
#'
#' @description
#' [derive_seq()] creates a new identifier variable: the sequence number
#' (`--SEQ`).
#'
#' This function adds a newly derived variable to `tgt_dat`, namely the sequence
#' number (`--SEQ`) whose name is the one provided in `tgt_var`. An integer
#' sequence is generated that uniquely identifies each record within the domain.
#'
#' Prior to the derivation of `tgt_var`, the data frame `tgt_dat` is sorted
#' according to grouping variables indicated in `rec_vars`.
#'
#' @param tgt_dat The target dataset, a data frame.
#' @param tgt_var The target SDTM variable: a single string indicating the name
#' of the sequence number (`--SEQ`) variable, e.g. `"DSSEQ"`. Note that
#' supplying a name not ending in `"SEQ"` will raise a warning.
#' @param rec_vars A character vector of record-level identifier variables.
#' @param sbj_vars A character vector of subject-level identifier variables.
#' @param start_at The sequence numbering starts at this value (default is `1`).
#'
#' @returns Returns the data frame supplied in `tgt_dat` with the newly derived
#' variable, i.e. the sequence number (`--SEQ`), whose name is that passed in
#' `tgt_var`. This variable is of type integer.
#'
#' @examples
#' # A VS raw data set example
#' (vs <- read_domain_example("vs"))
#'
#' # Derivation of VSSEQ
#' rec_vars <- c("STUDYID", "USUBJID", "VSTESTCD", "VSDTC", "VSTPTNUM")
#' derive_seq(tgt_dat = vs, tgt_var = "VSSEQ", rec_vars = rec_vars)
#'
#' # An APSC raw data set example
#' (apsc <- read_domain_example("apsc"))
#'
#' # Derivation of APSEQ
#' derive_seq(
#' tgt_dat = apsc,
#' tgt_var = "APSEQ",
#' rec_vars = c("STUDYID", "RSUBJID", "SCTESTCD"),
#' sbj_vars = c("STUDYID", "RSUBJID")
#' )
#' @export
derive_seq <-
function(tgt_dat,
tgt_var,
rec_vars,
sbj_vars = sdtm.oak::sbj_vars(),
start_at = 1L) {
admiraldev::assert_character_scalar(tgt_var)
if (!is_seq_name(tgt_var)) {
rlang::warn("Target variable name (`tgt_var`) should end in 'SEQ'.")
}

admiraldev::assert_character_vector(rec_vars)
admiraldev::assert_character_vector(sbj_vars)
admiraldev::assert_data_frame(tgt_dat,
required_vars = rlang::syms(rec_vars),
optional = FALSE
)

admiraldev::assert_integer_scalar(start_at, subset = "non-negative")

tgt_dat |>
# Ensure that no prior grouping exists that alters ordering and new
# grouping.
dplyr::ungroup() |>
dplyr::arrange(dplyr::across(.cols = dplyr::all_of(rec_vars))) |>
dplyr::group_by(dplyr::across(dplyr::all_of(sbj_vars))) |>
dplyr::mutate("{tgt_var}" := dplyr::row_number() + start_at - 1L) |> # nolint object_name_linter()
dplyr::ungroup()
}

#' Is it a --SEQ variable name
#'
#' [is_seq_name()] returns which variable names end in `"SEQ"`.
#'
#' @param x A character vector.
#'
#' @returns A logical vector.
#'
#' @examples
#' # A valid SEQ name.
#' sdtm.oak:::is_seq_name("AESEQ")
#'
#' # Not valid sequence number (`--SEQ`) variable names.
#' # Case matters.
#' sdtm.oak:::is_seq_name("AEseq")
#'
#' # A valid name has to end in "SEQ".
#' sdtm.oak:::is_seq_name("AESEQUENCE")
#'
#' @keywords internal
is_seq_name <- function(x) {
stringr::str_detect(x, "SEQ$")
}
129 changes: 129 additions & 0 deletions R/domain_example.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
#' Find the path to an example SDTM domain file
#'
#' @description
#' [domain_example()] resolves the local path to a SDTM domain example file. The
#' domain examples files were imported from
#' [pharmaversesdtm](https://cran.r-project.org/package=pharmaversesdtm). See
#' Details section for available datasets.
#'
#' @details
#' Datasets were obtained from
#' [pharmaversesdtm](https://cran.r-project.org/package=pharmaversesdtm) but are
#' originally sourced from the [CDISC pilot
#' project](https://github.com/cdisc-org/sdtm-adam-pilot-project) or have been
#' constructed ad-hoc by the
#' [admiral](https://cran.r-project.org/package=admiral) team. These datasets
#' are bundled with `{sdtm.oak}`, thus obviating a dependence on
#' `{pharmaversesdtm}`.
#'
#' ### Example SDTM domains
#'
#' \describe{
#' \item{`"ae_ophtha"`}{Ophthalmology Adverse Events Dataset.}
#' \item{`"ae"`}{Adverse Events Dataset-updated.}
#' \item{`"ce_vaccine"`}{Clinical Events Dataset for Vaccine Studies.}
#' \item{`"cm"`}{Concomitant Medication Dataset.}
#' \item{`"dm_vaccine"`}{Demographics Dataset for Vaccine Studies.}
#' \item{`"dm"`}{Demography Dataset.}
#' \item{`"ds"`}{Disposition Dataset-updated.}
#' \item{`"eg"`}{ Electrocardiogram Dataset.}
#' \item{`"ex_ophtha"`}{Ophthalmology Exposure Dataset.}
#' \item{`"ex_vaccine"`}{Exposures Dataset for Vaccine Studies.}
#' \item{`"ex"`}{Exposure Dataset.}
#' \item{`"face_vaccine"`}{Findings About Clinical Events Dataset for Vaccine Studies.}
#' \item{`"is_vaccine"`}{Immunogenicity Specimen Assessments Dataset for Vaccine Studies.}
#' \item{`"lb"`}{Laboratory Measurements Dataset.}
#' \item{`"mh"`}{Medical History Dataset-updated.}
#' \item{`"oe_ophtha"`}{Ophthalmology Adverse Events Dataset.}
#' \item{`"pc"`}{Pharmacokinetics Concentrations Dataset.}
#' \item{`"pp"`}{Pharmacokinetics Parameters Dataset.}
#' \item{`"qs_ophtha"`}{Ophthalmology Questionnaire Dataset.}
#' \item{`"rs_onco_irecist"`}{Disease Response Dataset (iRECIST).}
#' \item{`"rs_onco"`}{Disease Response Dataset.}
#' }
#'
#' @param example A string with either the basename, file name, or relative path
#' to a SDTM domain example file bundled with `{stdm.oak}`, e.g. `"cm"`
#' (Concomitant Medication) or `"ae"` (Adverse Events).
#'
#' @returns The local path to an example file if `example` is supplied, or a
#' character vector of example file names.
#'
#' @examples
#' # If no example is provided it returns a vector of possible choices.
#' domain_example()
#'
#' # Get the local path to the Concomitant Medication dataset file.
#' domain_example("cm")
#'
#' # Local path to the Adverse Events dataset file.
#' domain_example("ae")
#'
#' @source See \url{https://cran.r-project.org/package=pharmaversesdtm}.
#'
#' @seealso [read_domain_example()]
#' @export
domain_example <- function(example) {
# If no example is requested, then return all available files.
if (missing(example)) {
domain_path <- system.file("domain", package = "sdtm.oak", mustWork = TRUE)
domain_files <- list.files(domain_path, pattern = "*.rds")
domains <- tools::file_path_sans_ext(basename(domain_files))
return(domains)
}

# Otherwise, resolve the local path to the example requested.
admiraldev::assert_character_scalar(example, optional = TRUE)
base_name <- tools::file_path_sans_ext(basename(example))
path <- file.path("domain", paste0(base_name, ".rds"))
local_path <- system.file(path, package = "sdtm.oak")

if (identical(local_path, "")) {
stop(
glue::glue(
"'{example}' does not match any domain example files. Run `domain_example()` for options."
),
call. = FALSE
)
} else {
local_path <-
system.file(path, package = "sdtm.oak", mustWork = TRUE)
return(local_path)
}
}

#' Read an example SDTM domain
#'
#' [read_domain_example()] imports one of the bundled SDTM domain examples
#' as a [tibble][tibble::tibble-package] into R. See [domain_example()] for
#' possible choices.
#'
#' @param example The name of SDTM domain example, e.g. `"cm"` (Concomitant
#' Medication) or `"ae"` (Adverse Events). Run `read_domain_example()` for
#' available example files.
#'
#' @returns A [tibble][tibble::tibble-package] with an SDTM domain dataset, or a
#' character vector of example file names.
#'
#' @examples
#' # Leave the `example` parameter as missing for available example files.
#' read_domain_example()
#'
#' # Read the example Concomitant Medication domain.
#' read_domain_example("cm")
#'
#' # Read the example Adverse Events domain.
#' read_domain_example("ae")
#'
#' @seealso [domain_example()]
#' @export
read_domain_example <- function(example) {
if (missing(example)) {
return(domain_example())
} else {
admiraldev::assert_character_scalar(example)
}

path <- domain_example(example)
readr::read_rds(file = path)
}
14 changes: 14 additions & 0 deletions R/sbj_vars.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
#' Subject-level key variables
#'
#' [sbj_vars()] returns the set of variable names that uniquely define
#' a subject.
#'
#' @returns A character vector of variable names.
#'
#' @examples
#' sbj_vars()
#'
#' @export
sbj_vars <- function() {
c("STUDYID", "USUBJID")
}
11 changes: 11 additions & 0 deletions _pkgdown.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,9 +13,16 @@ reference:
contents:
- assign
- harcode
- derive_seq
- derive_study_day
- assign_datetime

- title: SDTM examples
desc: SDTM domain file examples
contents:
- domain_example
- read_domain_example

- title: Controlled terminology
contents:
- read_ct_spec
Expand All @@ -30,6 +37,10 @@ reference:
- dtc_formats
- problems

- title: Utils
contents:
- sbj_vars

- title: Package global state
contents:
- clear_cache
32 changes: 32 additions & 0 deletions data-raw/sdtm_domain_examples.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# Title: SDTM domain example datasets.

library(pharmaversesdtm)
library(readr)
library(here)

path <- here::here("inst/domain")

vs <- tibble::tribble(
~STUDYID, ~DOMAIN, ~USUBJID, ~VSSPID, ~VSTESTCD, ~VSDTC, ~VSTPTNUM,
"ABC123", "VS", "ABC123-375", "/F:VTLS1-D:9795532-R:2", "DIABP", "2020-09-01T13:31", NA,
"ABC123", "VS", "ABC123-375", "/F:VTLS1-D:9795532-R:2", "TEMP", "2020-09-01T13:31", NA,
"ABC123", "VS", "ABC123-375", "/F:VTLS2-D:9795533-R:2", "DIABP", "2020-09-28T11:00", 2L,
"ABC123", "VS", "ABC123-375", "/F:VTLS2-D:9795533-R:2", "TEMP", "2020-09-28T11:00", 2L,
"ABC123", "VS", "ABC123-376", "/F:VTLS1-D:9795591-R:1", "DIABP", "2020-09-20", NA,
"ABC123", "VS", "ABC123-376", "/F:VTLS1-D:9795591-R:1", "TEMP", "2020-09-20", NA
)

apsc <- tibble::tribble(
~STUDYID, ~RSUBJID, ~SCTESTCD, ~DOMAIN, ~SREL, ~SCCAT,
"ABC123", "ABC123-210", "LVSBJIND", "APSC", "FRIEND", "CAREGIVERSTUDY",
"ABC123", "ABC123-210", "EDULEVEL", "APSC", "FRIEND", "CAREGIVERSTUDY",
"ABC123", "ABC123-210", "TMSPPT", "APSC", "FRIEND", "CAREGIVERSTUDY",
"ABC123", "ABC123-211", "CAREDUR", "APSC", "SIBLING", "CAREGIVERSTUDY",
"ABC123", "ABC123-211", "LVSBJIND", "APSC", "SIBLING", "CAREGIVERSTUDY",
"ABC123", "ABC123-212", "JOBCLAS", "APSC", "SPOUSE", "CAREGIVERSTUDY"
)

readr::write_rds(x = pharmaversesdtm::ae, file = file.path(path, "ae.rds"), compress = "xz")
readr::write_rds(x = pharmaversesdtm::cm, file = file.path(path, "cm.rds"), compress = "xz")
readr::write_rds(x = vs, file = file.path(path, "vs.rds"), compress = "xz")
readr::write_rds(x = apsc, file = file.path(path, "apsc.rds"), compress = "xz")
4 changes: 4 additions & 0 deletions inst/WORDLIST
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,7 @@ AE
AESTDY
CMSTDY
DM
Immunogenicity
Pharmacokinetics
iRECIST
pharmaversesdtm
Binary file added inst/domain/ae.rds
Binary file not shown.
Binary file added inst/domain/apsc.rds
Binary file not shown.
Binary file added inst/domain/cm.rds
Binary file not shown.
Binary file added inst/domain/vs.rds
Binary file not shown.
Loading

0 comments on commit e1aa479

Please sign in to comment.