From a8fde23ca9b60115e675782a56892fd762fc4500 Mon Sep 17 00:00:00 2001
From: Ram Ganapathy
Date: Wed, 31 Jul 2024 10:45:33 -0700
Subject: [PATCH] CRAN release documentation update (#82)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Readme update

* Documentation updates

* Documentation updates

* Fix link

* whitelist words

* Fix Slack URL in README

* Whitelist word "GSK"

* Make name of the package hyperlinked in pkgdown

* Make VignetteIndexEntry agree with title

* Add authors' hyperlinks to pkgdown

* Fix inadvert double backticks

* Updates

* readme updates

* Update _pkgdown.yml

Co-authored-by: edgar-manukyan

* Update .lycheeignore

* Update .lycheeignore

---------

Co-authored-by: Ramiro Magno
Co-authored-by: Adam Foryś
Co-authored-by: edgar-manukyan
---
 .lycheeignore                 |   1 +
 NEWS.md                       |  72 ++++++++--------
 README.Rmd                    |  55 +++++++++++++
 README.md                     |  98 ++++++++++++++++++++++
 _pkgdown.yml                  |  10 +++
 inst/WORDLIST                 |   7 ++
 vignettes/study_sdtm_spec.Rmd | 149 +++++++++-------------------------
 7 files changed, 248 insertions(+), 144 deletions(-)

diff --git a/.lycheeignore b/.lycheeignore
index 2727d327..220f2fda 100644
--- a/.lycheeignore
+++ b/.lycheeignore
@@ -8,3 +8,4 @@ roxygen2@7.3.1
 roxygen2@7.3.2
 styler@1.10.2
 .*@users.noreply.github.com
+https://www.linkedin.com/*
diff --git a/NEWS.md b/NEWS.md
index 0f9a8602..c6e3af6f 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,33 +1,39 @@
-# sdtm.oak 0.0.0.9005 (development version)
-
-* New function for creating conditioned data frames: `condition_add()`.
-* New pipe operator: `%.>%` for explicit dot placeholder placement.
-* `oak_id_vars()` is now an exported function.
-
-# sdtm.oak 0.0.0.9004 (development version)
-
-* New function: `derive_seq()` for deriving a sequence number variable.
-
-# sdtm.oak 0.0.0.9003 (development version)
-
-## New Features
-
-* New function: `assign_datetime()` for deriving an ISO8601 date-time variable.
-
-# sdtm.oak 0.0.0.9002 (development version)
-
-## New Features
-
-* New function: `derive_study_day()` for study day calculation.
-
-* New functions for basic SDTM derivations: ` assign_no_ct()`, `assign_ct()`,
-`hardcode_no_ct()` and `hardcode_ct()`.
-
-* New functions for handling controlled terminologies: `read_ct_spec()`,
-`read_ct_spec_example()`, `ct_spec_example()` and `ct_map()`.
-
-# sdtm.oak 0.0.0.9001 (development version)
-
-## New Features
-
-* New function `create_iso8601()` for conversion of vectors of dates, times or date-times to ISO8601 format.
+# sdtm.oak V0.1.0
+
+With the V0.1.0 release of {sdtm.oak}, users can create the majority of the SDTM domains. Domains that are *NOT* in scope for the V0.1.0 release are DM, Trial Design Domains, SV, SE, RELREC, Associated Person domains, and EPOCH Variable across all domains.
+
+- Functions for commonly used SDTM mapping Algorithms
+  - `assign_no_ct()` to process assign_no_ct algorithm
+  - `assign_ct()` to process assign_ct algorithm
+  - `hardcode_no_ct()` to process hardcode_no_ct algorithm
+  - `hardcode_ct()` to process hardcode_ct algorithm
+  - `assign_datetime()` to process assign_datetime algorithm
+  - `condition_add()` to process condition_add algorithm (if/else conditions)
+- Functions for SDTM derived variables
+  - `derive_blfl()` to derive the Baseline Flag or Last Observation Before Exposure Flag
+  - `derive_seq()` to derive the sequence number (--SEQ) variable
+  - `derive_study_day()` to derive study day
+  - `create_iso8601()` for ISO8601 date, datetime conversion.
+- Functions to support {sdtm.oak}
+  - `generate_oak_id_vars()` to derive oak id variables
+  - `read_ct_spec()` to read the controlled terminology spec
+  - Functions to create conditioned dataframes to support if then else conditions in SDTM mappings
+- Articles
+  - Algorithms
+  - Creating an Events SDTM domain
+  - Creating a Findings SDTM domain
+  - Conditioned Data Frames
+  - Converting dates, times or date-times to ISO 8601
+  - Path to Automation
+
+## Further details on this Release
+
+- New function for creating conditioned data frames: `condition_add()`.
+- New pipe operator: `%.>%` for explicit dot placeholder placement.
+- `oak_id_vars()` is now an exported function.
+- New function: `derive_seq()` for deriving a sequence number variable.
+- New function: `assign_datetime()` for deriving an ISO8601 date-time variable.
+- New function: `derive_study_day()` for study day calculation.
+- New functions for basic SDTM derivations: `assign_no_ct()`, `assign_ct()`, `hardcode_no_ct()` and `hardcode_ct()`.
+- New functions for handling controlled terminologies: `read_ct_spec()`, `read_ct_spec_example()`, `ct_spec_example()` and `ct_map()`.
+- New function `create_iso8601()` for conversion of vectors of dates, times or date-times to ISO8601 format.
diff --git a/README.Rmd b/README.Rmd
index d24d7a47..7913631d 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -26,9 +26,64 @@ can automate SDTM creation based on the standard SDTM spec.
 
 ## Installation
 
+The package is available from CRAN and can be installed with:
+
+```r
+install.packages("sdtm.oak")
+```
+
 You can install the development version of `{sdtm.oak}` from [GitHub](https://github.com/pharmaverse/sdtm.oak/) with:
 
 ``` r
 # install.packages("remotes")
 remotes::install_github("pharmaverse/sdtm.oak")
 ```
+
+## Challenges with SDTM at the Industry Level
+
+* Raw Data Structure: Data from different EDC systems come in varying structures, with different variable names, dataset names, etc.
+
+* Varying Data Collection Standards: Despite the availability of CDASH, pharmaceutical companies still create different eCRFs using CDASH standards.
+
+Due to the differences in raw data structures and data collection standards, it may seem impossible to develop a common approach for programming SDTM datasets.
+
+## GOAL
+`{sdtm.oak}` aims to address this issue by providing an EDC-agnostic, standards-agnostic solution. It is an open-source R package that offers a framework for the modular programming of SDTM in R. With future releases, it will also strive to automate the creation of SDTM datasets based on the metadata-driven approach using standard SDTM specifications.
+
+## Scope
+Our goal is to use `{sdtm.oak}` to program most of the domains specified in SDTMIG (Study Data Tabulation Model Implementation Guide: Human Clinical Trials) and SDTMIG-AP (Study Data Tabulation Model Implementation Guide: Associated Persons). This R package is based on the core concept of `algorithms`, implemented as functions capable of carrying out the SDTM mappings for any domains listed in the CDISC SDTMIG and across different versions of SDTM IGs. The design of these functions allows users to specify a raw dataset and a variable name(s) as parameters, making it EDC (Electronic Data Capture) agnostic. As long as the raw dataset and variable name(s) exist, `{sdtm.oak}` will execute the SDTM mapping using the selected function. It's important to note that `{sdtm.oak}` may not handle sponsor-specific details related to managing metadata for LAB tests, unit conversions, and coding information, as many companies have unique business processes. With subsequent releases, it will strive to automate SDTM creation using a metadata-driven approach based on a standard SDTM specification format.
+
+## Road Map
+
+This Release: With the V0.1.0 release of `{sdtm.oak}`, users can create the majority of the SDTM domains. Domains that are NOT in scope for the V0.1.0 release are DM, Trial Design Domains, SV, SE, RELREC, Associated Person domains, and EPOCH Variable across all domains.
+
+Subsequent Releases:
+We are planning to develop the below features in the subsequent releases.\
+- Functions required to derive reference date variables in the DM domain.\
+- Metadata driven automation based on the standardized SDTM specification.\
+- Functions required to program the EPOCH Variable.\
+- Functions to derive standard units and results based on metadata.
+
+## References and Documentation
+
+* Please go to the [Algorithms](https://pharmaverse.github.io/sdtm.oak/articles/algorithms.html) article to learn about algorithms.
+* Please go to [Create Events Domain](https://pharmaverse.github.io/sdtm.oak/articles/events_domain.html) to learn about the step-by-step process to create an Events domain.
+* Please go to [Create Findings Domain](https://pharmaverse.github.io/sdtm.oak/articles/findings_domain.html) to learn about the step-by-step process to create a Findings domain.
+* Please go to [Path to Automation](https://pharmaverse.github.io/sdtm.oak/articles/study_sdtm_spec.html)
+  to learn about how the foundational release sets the stage for automation.
+
+## Feedback
+
+We ask users to follow the mentioned approach and try `{sdtm.oak}` to map any SDTM domains supported in this release. Users can also utilize the test data in the package to become familiar with the concepts before trying it on their own data. Please get in touch with us using one of the recommended approaches listed below:
+
+- [Slack](https://oakgarden.slack.com/)
+- [GitHub](https://github.com/pharmaverse/sdtm.oak/issues)
+
+## Acknowledgments
+
+We thank the contributors and authors of the package. We also thank the CDISC COSA for sponsoring `{sdtm.oak}`. Additionally, we would like to sincerely thank the volunteers from Roche, Pfizer, GSK, Vertex, and Merck for their valuable input as integral members of the CDISC COSA - OAK leadership team.
+
+
+
+
+
diff --git a/README.md b/README.md
index f3e1c0c0..8d563fe2 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,12 @@ standard SDTM spec.
 
 ## Installation
 
+The package is available from CRAN and can be installed with:
+
+``` r
+install.packages("sdtm.oak")
+```
+
 You can install the development version of `{sdtm.oak}` from
 [GitHub](https://github.com/pharmaverse/sdtm.oak/) with:
 
@@ -24,3 +30,95 @@ You can install the development version of `{sdtm.oak}` from
 # install.packages("remotes")
 remotes::install_github("pharmaverse/sdtm.oak")
 ```
+
+## Challenges with SDTM at the Industry Level
+
+- Raw Data Structure: Data from different EDC systems come in varying
+  structures, with different variable names, dataset names, etc.
+
+- Varying Data Collection Standards: Despite the availability of CDASH,
+  pharmaceutical companies still create different eCRFs using CDASH
+  standards.
+
+Due to the differences in raw data structures and data collection
+standards, it may seem impossible to develop a common approach for
+programming SDTM datasets.
+
+## GOAL
+
+`{sdtm.oak}` aims to address this issue by providing an EDC-agnostic,
+standards-agnostic solution. It is an open-source R package that offers
+a framework for the modular programming of SDTM in R. With future
+releases, it will also strive to automate the creation of SDTM datasets
+based on the metadata-driven approach using standard SDTM
+specifications.
+
+## Scope
+
+Our goal is to use `{sdtm.oak}` to program most of the domains specified
+in SDTMIG (Study Data Tabulation Model Implementation Guide: Human
+Clinical Trials) and SDTMIG-AP (Study Data Tabulation Model
+Implementation Guide: Associated Persons). This R package is based on
+the core concept of `algorithms`, implemented as functions capable of
+carrying out the SDTM mappings for any domains listed in the CDISC
+SDTMIG and across different versions of SDTM IGs. The design of these
+functions allows users to specify a raw dataset and a variable name(s)
+as parameters, making it EDC (Electronic Data Capture) agnostic. As long
+as the raw dataset and variable name(s) exist, `{sdtm.oak}` will execute
+the SDTM mapping using the selected function. It’s important to note
+that `{sdtm.oak}` may not handle sponsor-specific details related to
+managing metadata for LAB tests, unit conversions, and coding
+information, as many companies have unique business processes. With
+subsequent releases, it will strive to automate SDTM creation using a
+metadata-driven approach based on a standard SDTM specification format.
+
+## Road Map
+
+This Release: With the V0.1.0 release of `{sdtm.oak}`, users can create the
+majority of the SDTM domains. Domains that are NOT in scope for the
+V0.1.0 release are DM, Trial Design Domains, SV, SE, RELREC, Associated
+Person domains, and EPOCH Variable across all domains.
+
+Subsequent Releases: We are planning to develop the below features in
+the subsequent releases.
+- Functions required to derive reference date variables in the DM
+domain.
+- Metadata driven automation based on the standardized SDTM
+specification.
+- Functions required to program the EPOCH Variable.
+- Functions to derive standard units and results based on metadata.
+
+## References and Documentation
+
+- Please go to the
+  [Algorithms](https://pharmaverse.github.io/sdtm.oak/articles/algorithms.html)
+  article to learn about algorithms.
+- Please go to [Create Events
+  Domain](https://pharmaverse.github.io/sdtm.oak/articles/events_domain.html)
+  to learn about the step-by-step process to create an Events domain.
+- Please go to [Create Findings
+  Domain](https://pharmaverse.github.io/sdtm.oak/articles/findings_domain.html)
+  to learn about the step-by-step process to create a Findings domain.
+- Please go to [Path to
+  Automation](https://pharmaverse.github.io/sdtm.oak/articles/study_sdtm_spec.html)
+  to learn about how the foundational release sets the stage for
+  automation.
+
+## Feedback
+
+We ask users to follow the mentioned approach and try `{sdtm.oak}` to
+map any SDTM domains supported in this release. Users can also utilize
+the test data in the package to become familiar with the concepts before
+trying it on their own data. Please get in touch with us using one of
+the recommended approaches listed below:
+
+- [Slack](https://oakgarden.slack.com/)
+- [GitHub](https://github.com/pharmaverse/sdtm.oak/issues)
+
+## Acknowledgments
+
+We thank the contributors and authors of the package. We also thank the
+CDISC COSA for sponsoring `{sdtm.oak}`. Additionally, we would like
+to sincerely thank the volunteers from Roche, Pfizer, GSK, Vertex, and
+Merck for their valuable input as integral members of the CDISC COSA -
+OAK leadership team.
diff --git a/_pkgdown.yml b/_pkgdown.yml
index 3c6b239a..1bb0f092 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -57,3 +57,13 @@ reference:
 - title: Package global state
   contents:
   - clear_cache
+
+authors:
+  Ramiro Magno:
+    href: https://www.pattern.institute/team/rmagno/
+  Pattern Institute:
+    href: https://www.pattern.institute
+  Edgar Manukyan:
+    href: https://www.linkedin.com/in/edgar-manukyan-20987927
+  Shiyu Chen:
+    href: https://www.linkedin.com/in/shiyu-chen-55a55410a/
diff --git a/inst/WORDLIST b/inst/WORDLIST
index bd48eb64..d95f17e7 100644
--- a/inst/WORDLIST
+++ b/inst/WORDLIST
@@ -65,3 +65,10 @@ RFXSTDTC
 TPT
 xxTPT
 APSC
+CDASH
+COSA
+IGs
+RELREC
+SDTMIG
+SV
+GSK
diff --git a/vignettes/study_sdtm_spec.Rmd b/vignettes/study_sdtm_spec.Rmd
index 3f4715d2..c8423a1e 100644
--- a/vignettes/study_sdtm_spec.Rmd
+++ b/vignettes/study_sdtm_spec.Rmd
@@ -1,8 +1,8 @@
 ---
-title: "All about Metadata"
+title: "Path to Automation"
 output: rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{All about Metadata}
+  %\VignetteIndexEntry{Path to Automation}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
@@ -25,8 +25,17 @@ knitr::opts_chunk$set(
 options(rmarkdown.html_vignette.check_title = FALSE)
 ```
 
-{sdtm.oak} is a metadata-driven solution that is designed to be Electronic Data
-Capture (EDC) and standards agnostic. Throughout this article, the term "metadata"
+The initial release of {sdtm.oak} provides a framework for modular programming of SDTM in R and sets the stage for potential automation of SDTM creation following the standardized SDTM specification. In the future, the automation workflow could involve preparing specifications and then making automated function calls to generate SDTM domains.
+
+The future workflow for automation could look like:
+
+1. Prepare SDTM specification: Users can define the raw data source, target SDTM domain, target SDTM variables, and algorithms used for automation. A template is still under development; details are also provided in this article.
+2. Prepare SDTM-controlled Terminology: Users can define the SDTM-controlled terms applicable to the study. A template is still under development.
+3. An automated process that reads the specification and makes {sdtm.oak} function calls can generate either the code required to create the SDTM datasets or the datasets themselves.
+
+This article provides an overview of metadata and a draft version of the standard SDTM specification. We plan to demonstrate the creation of standard SDTM specs from the CDISC library in collaboration with CDISC COSA. Sponsors may need to establish the necessary tools to generate this SDTM specification from their MDR to utilize the automation features of {sdtm.oak}. It's worth mentioning that this concept draws inspiration from Roche's existing implementation of the SDTM automation process using OAK. Further development of this concept is still required.
+
+Throughout this article, the term "metadata"
 is used several times. In this context, "metadata" refers to the specific
 metadata used by {sdtm.oak}. This article aims to provide users with a more
 detailed understanding of the {sdtm.oak} metadata.
@@ -96,7 +105,6 @@ definition <- data.frame(
     "raw_variable_ordinal",
     "raw_variable_type",
     "raw_data_format",
-    "raw_codelist",
     "study_specific",
     "annotation_ordinal",
     "mapping_is_dataset",
@@ -115,28 +123,14 @@ definition <- data.frame(
     "sub_algorithm",
     "target_hardcoded_value",
     "target_term_value",
-    "target_term_code",
-    "condition_ordinal",
-    "condition_group_ordinal",
-    "condition_left_raw_dataset",
-    "condition_left_raw_variable",
-    "condition_left_sdtm_domain",
-    "condition_left_sdtm_variable",
-    "condition_operator",
-    "condition_right_text_value",
-    "condition_right_sdtm_domain",
-    "condition_right_sdtm_variable",
-    "condition_right_raw_dataset",
-    "condition_right_raw_variable",
-    "condition_next_logical_operator",
+    "condition_add_raw_dat",
+    "condition_add_tgt_dat",
     "merge_type",
     "merge_left",
     "merge_right",
     "merge_condition",
     "unduplicate_keys",
-    "groupby_keys",
-    "target_resource_dataset",
-    "target_resource_variable"
+    "groupby_keys"
   ),
   `Description_of_the_variable` = c(
     "Study Number",
@@ -152,10 +146,6 @@ definition <- data.frame(
     ),
     "Type of the Raw Variable",
     "Data format of the raw variable",
-    paste(
-      "Dictionary name which is assigned to the ",
-      "eCRF field or a eDT variable"
-    ),
     paste(
       "`TRUE` indicates that the source is study specific. ",
       "`FALSE` indicates that the raw variable is part of data standards"
@@ -189,40 +179,12 @@ definition <- data.frame(
       "hardcoded text"
     ),
     paste(
-      "NCI code or sponsor code of the hardcoded value"
-    ),
-    paste(
-      "Ordinal of a (sub)condition, increasing when there ",
-      "are more than one sub-conditions (e.g. X AND Y)"
-    ),
-    paste(
-      "Ordinal of a group of sub-conditions, used to ",
-      "disambiguate complex conditions such as (A AND B) OR C. ",
-      "The ordinal increases in each group and gives the final ",
-      "precedence of the logical operators."
-    ),
-    "Name of the raw dataset on the left part of the condition",
-    "Name of the raw variable on the left part of the condition",
-    "Name of the SDTM variable used in the left part of the condition.",
-    paste(
-      "Name of the SDTM domain of the variable that is used in ",
-      "the left part of the condition."
-    ),
-    "Operator between the left and right part of the condition",
-    paste(
-      "A text that applies to the right part of the condition as ",
-      "indicated per `condition_operator`."
-    ),
-    "Name of the SDTM variable used in the right part of the condition.",
-    paste(
-      "Name of the SDTM domain of the variable that is used ",
-      "in the right part of the condition."
+      "Condition that has to be applied to a raw dataset ",
+      " before applying a mapping. Can be a valid R filter statement."
     ),
-    "Name of the raw dataset on the right part of the condition",
-    "Name of the raw variable on the right part of the condition",
     paste(
-      "The logical operator that applies to the next ",
-      "sub-conditions, typically AND, OR"
+      "Condition that has to be applied to a target dataset ",
+      " before applying a mapping. Can be a valid R filter statement."
     ),
     "Specifies the type of join",
     "Specifies the left component of the merge",
@@ -239,16 +201,6 @@ definition <- data.frame(
     paste(
       "Raw Variables or aggregation functions (i.e. earliest, ",
       "latest) to group source data records before mapping to SDTM"
-    ),
-    paste(
-      "Raw dataset name of the raw variable. This will be used when ",
-      " values are assigned from a from a different source",
-      "other than the source the mapping is associated with"
-    ),
-    paste(
-      "Raw variable name. This will be used when ",
-      "values are assigned from a from a different source",
-      "other than the source the mapping is associated with"
     )
   ),
   Example_Values = c(
@@ -262,7 +214,6 @@ definition <- data.frame(
     "1, 2, 3, etc",
     "Text Box,<br>Date control",
     "$200,<br>dd MON YYYY",
-    "SEX, ETHNIC",
     "TRUE, FALSE",
     "1, 2, 3, etc",
     "TRUE, FALSE",
@@ -274,36 +225,27 @@ definition <- data.frame(
     "(AGEU)<br>ISO 8601<br>(SEX)",
     "1, 2, 3",
     "Derived,<br>Assigned,<br>Collected,<br>Predecessor",
-    "DATASET_LEVEL<br>ASSIGN_CT<br>AE_AEREL<br>HARDCODE_CT",
-    "ASSIGN_NO_CT<br>HARCODE_CT",
+    "condition_add<br>assign_ct<br>ae_aerel<br>hardcode_ct",
+    "assign_no_ct<br>hardcode_ct",
     "ALZHEIMER'S DISEASE HISTORY",
     "Y,<br>beats/min,<br>INFORMED CONSENT OBTAINED",
-    "C49488",
-    "1, 2, 3",
-    "1, 2, 3",
-    "VTLS1",
-    "POSITION",
-    "AE",
-    "AEENRTPT",
     paste(
-      "Checked<br>Not_checked<br>Is_null<br>Is_not_null",
-      "<br>Equals_to<br>Different_to<br>is_numeric<br>in",
-      "<br>not_in"
+      "Map qualifier CMSTRTPT Annotation text is If MDPRIOR == 1 ",
+      "then CM.CMSTRTPT = 'BEFORE'",
+      "raw_dat parameter as condition_add(cm_raw, MDPRIOR == 1)"
+    ),
+    paste(
+      "Map qualifier CMDOSFRQ Annotation text is If CMTRT is not null",
+      " then map the collected value in raw dataset cm_raw and",
+      "raw variable MDFRQ to CMDOSFRQ",
+      "tgt_dat parameter as condition_add(., !is.na(CMTRT))"
     ),
-    "('Not Recovered/Not Resolved','Recovering/Resolving')<br>HOSPITALIZATION",
-    "AE",
-    "AETERM",
-    "SMKHX",
-    "SUNAM",
-    "and, or",
     "left_join<br>right_join<br>full_join<br>visit_join<br>subject_join",
     "VTLS1",
     "VACREC",
     "VTLS1.SUBJECT = VACREC.SUBJECT,<br>MD1.MDNUM = VACREC.MDNUM",
     "VTLS1.SUBJECT,<br>VTLS1.DATAPAGEID",
-    "TXINF1.DATAPGID,<br>Earliest",
-    "AEDE",
-    "DATAPAGEID"
+    "TXINF1.DATAPGID,<br>Earliest"
   ),
   Association_with_mapping_Algorithms = c(
     "Generic Use",
@@ -316,7 +258,6 @@ definition <- data.frame(
     "Generic Use",
     "Required for all mapping algorithms",
     "Required for all mapping algorithms",
-    "Required for all mapping algorithms",
     "Generic Use",
     "Required for all mapping algorithms",
     "Required for all mapping algorithms",
@@ -329,31 +270,17 @@ definition <- data.frame(
     "Required for all mapping algorithms",
     "Used for define.xml",
     "Required for all mapping algorithms",
-    "Only when Mapping Algorithm is<br>IF_THEN_ELSE<br>DATASET_LEVEL",
-    "ASSIGN_NO_CT<br>HARDCODE_NO_CT",
-    "HARDCODE_CT",
-    "IF_THEN_ELSE",
-    "IF_THEN_ELSE",
-    "IF_THEN_ELSE",
-    "IF_THEN_ELSE",
-    "IF_THEN_ELSE",
-    "IF_THEN_ELSE",
-    "IF_THEN_ELSE",
-    "IF_THEN_ELSE",
-    "IF_THEN_ELSE",
-    "IF_THEN_ELSE",
-    "IF_THEN_ELSE",
-    "IF_THEN_ELSE",
-    "IF_THEN_ELSE",
-    "IF_THEN_ELSE",
+    "Only when Mapping Algorithm is<br>condition_add<br>dataset_level",
+    "assign_no_ct<br>hardcode_no_ct",
+    "hardcode_ct",
+    "condition_add",
+    "condition_add",
     "MERGE",
     "MERGE",
     "MERGE",
     "MERGE",
     "REMOVE_DUP",
-    "GROUP_BY",
-    "ASSIGN_NO_CT",
-    "ASSIGN_NO_CT"
+    "GROUP_BY"
   ),
   stringsAsFactors = TRUE
 )