if_then_else support via "conditioned" data frames #55

Merged: 13 commits, Jun 18, 2024

Collaborator

@ramiromagno ramiromagno commented May 24, 2024

  • Adds a new S3 class (cnd_df) for representing conditioned data frames, i.e. data frames that carry metadata about which records should be used for derivations

  • Adds support for basic pretty printing of cnd_df objects

  • Adds a user-facing function for creating such cnd_df objects: condition_by

  • Adds experimental "mutate"-version function for these conditioned data frames: derive_by_condition()
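
To make the idea of a conditioned data frame concrete, here is a minimal base R sketch. It is only an illustration under stated assumptions, not the sdtm.oak implementation: the names `new_demo_cnd_df()`, `demo_derive()` and the attribute `"cnd"` are hypothetical and were chosen for this example.

```r
# Hypothetical sketch of a "conditioned" data frame; all names are
# illustrative and do not reflect sdtm.oak internals.
new_demo_cnd_df <- function(dat, cnd) {
  stopifnot(is.data.frame(dat), is.logical(cnd), length(cnd) == nrow(dat))
  attr(dat, "cnd") <- cnd                     # which records a derivation should use
  class(dat) <- c("demo_cnd_df", class(dat))  # S3 subclass on top of data.frame
  dat
}

# Derive a new column only for records where the condition holds;
# all other records get NA. Returns a plain data frame.
demo_derive <- function(x, col, value) {
  stopifnot(inherits(x, "demo_cnd_df"))
  cnd <- attr(x, "cnd")
  out <- x
  attr(out, "cnd") <- NULL
  class(out) <- setdiff(class(out), "demo_cnd_df")
  out[[col]] <- ifelse(cnd, value, NA)
  out
}

cm <- data.frame(
  CMTRT = c("BABY ASPIRIN", "CORTISPORIN", "ASPIRIN"),
  stringsAsFactors = FALSE
)
cm_cnd <- new_demo_cnd_df(cm, cm$CMTRT == "BABY ASPIRIN")
res <- demo_derive(cm_cnd, "CMGRPID", 1:3)
```

In this sketch the condition travels with the data as metadata, so a downstream derivation can decide per record whether to assign a value, which is the if_then_else behaviour the PR title refers to.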

Thank you for your Pull Request! We have developed this task checklist from the
Development Process
Guide

to help with the final steps of the process. Completing the below tasks helps to
ensure our reviewers can maximize their time on your code as well as making sure
the oak codebase remains robust and consistent.

Please check off each taskbox as an acknowledgment that you completed the task
or check off that it is not relevant to your Pull Request. This checklist is
part of the GitHub Actions workflows and the Pull Request will not be merged into
the devel branch until you have checked off each task.

  • Place Closes #<insert_issue_number> into the beginning of your Pull
    Request Title (Use Edit button in top-right if you need to update)
  • Code is formatted according to the
    tidyverse style guide. Run
    styler::style_file() to style R and Rmd files
  • Updated relevant unit tests or have written new unit tests, which should
    consider realistic data scenarios and edge cases, e.g. empty datasets, errors,
    boundary cases etc. - See
    Unit Test Guide
  • If you removed/replaced any function and/or function parameters, did you
    fully follow the
    deprecation guidance?
  • Update to all relevant roxygen headers and examples, including keywords
    and families. Refer to the
    categorization of functions to tag appropriate keyword/family.
  • Run devtools::document() so all .Rd files in the man folder and the
    NAMESPACE file in the project root are updated appropriately
  • Address any updates needed for vignettes and/or templates
  • Update NEWS.md if the changes pertain to a user-facing function (i.e. it
    has an @export tag) or documentation aimed at users (rather than developers)
  • Build oak site pkgdown::build_site() and check that all affected
    examples are displayed correctly and that all new functions occur on the "Reference" page.
  • Address or fix all lintr warnings and errors - lintr::lint_package()
  • Run R CMD check locally and address all errors and warnings - devtools::check()
  • Link the issue in the Development Section on the right hand side.
  • Address all merge conflicts and resolve appropriately
  • Pat yourself on the back for a job well done! Much love to your accomplishment!

@ramiromagno ramiromagno self-assigned this May 24, 2024
@ramiromagno ramiromagno linked an issue May 24, 2024 that may be closed by this pull request

github-actions bot commented May 24, 2024

Code Coverage

Package Line Rate Health
sdtm.oak 88%
Summary 88% (736 / 836)

@ramiromagno
Collaborator Author

Still work in progress.

- Joins by raw and target data sets are now aware of conditioned tibbles
- Transformation functions, namely `assign_datetime()`, `hardcode*()` and `assign*` are also conditioned-tibble aware
- Unit test coverage for most cases indicated at #54

I believe the essential components are here to support the if_then_else algorithm via conditioned tibbles. Now, further testing, assertions and documentation are needed.
@ramiromagno ramiromagno changed the title Basic support for "conditioned" data frames if_then_else support via "conditioned" data frames May 29, 2024
@rammprasad
Collaborator

@ramiromagno -

A couple of items

  1. the condition_by is not working when applied at the target variable
  2. Also, can you show me how to apply condition_by when involving both source and target?

Here is the sample code.

study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv",
                                 package = "sdtm.oak"))

# Read in raw data

cm_raw <-  read.csv(system.file("cm_domain/cm_raw_data.csv",
                                package = "sdtm.oak")) %>%
  #derive oak_id as the row number
  dplyr::mutate(oak_id = structure(seq_len(nrow(.))),
                patient_number = PATNUM,
                raw_source = "ConMed") %>%  
  dplyr::select(oak_id_vars(), dplyr::everything())

# Create CM domain. The first step in creating CM domain is to create the topic variable

cm <-
  # Derive topic variable
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDRAW",
    tgt_var = "CMTRT"
  )  |>
  # Derive CMGRPID when CMTRT == "BABY ASPIRIN"
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID",
    # use the condition_by function and filter when CMTRT is "BABY ASPIRIN"
   ### This condition is not working. It results in an error
    tgt_dat = condition_by(.,CMTRT == "BABY ASPIRIN"),
    id_vars = oak_id_vars()
  )

**Error message**
Error in assign_no_ct(assign_no_ct(raw_dat = cm_raw, raw_var = "MDRAW",  : 
  unused argument (assign_no_ct(raw_dat = cm_raw, raw_var = "MDRAW", tgt_var = "CMTRT"))

Can you give an example of how to program this mapping?
  # Derive qualifier CMMODIFY -  If collected value in CMMODIFY
  # in cm_raw is different to CM domain CMTRT target variable then
  # assign the collected value to CMMODIFY in CM domain (CM.CMMODIFY)

@ramiromagno
Collaborator Author

ramiromagno commented Jun 7, 2024

Yeah, there are a couple of things at play here:

  1. The placeholder . is for use with magrittr's pipe %>% and _ for use with R native pipe |>.
  2. The usage of the placeholder in nested calls requires the use of braces, see https://magrittr.tidyverse.org/reference/pipe.html#using-the-dot-for-secondary-purposes. Not sure we want to surface this to the user.

So, when we want to condition on the target data set (the one being passed along the pipe), the best approach is perhaps to move the condition_by() call one level up in the chain.
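
As a package-independent illustration of the second point, here is a small sketch using a toy function (`n_args()` is hypothetical and only counts the arguments it receives). It shows that when the dot appears only inside a nested call, magrittr still inserts the left-hand side as the first argument, whereas braces suppress that insertion.

```r
library(magrittr)

# Toy function (hypothetical) that reports how many arguments it received.
n_args <- function(...) length(list(...))

# Dot used only inside a nested call: the left-hand side is STILL inserted
# as the first argument, so this is n_args(1:3, n = length(1:3)).
plain <- 1:3 %>% n_args(n = length(.))

# Braces suppress the first-argument insertion, so this is
# n_args(n = length(1:3)).
braced <- 1:3 %>% { n_args(n = length(.)) }
```

Here `plain` is 2 and `braced` is 1. The same mechanism would explain a piped data frame landing in an unintended argument of `assign_no_ct()` when `condition_by(., ...)` is used inline without braces.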

Here is a set of examples that hopefully illustrate the different variations:

library(sdtm.oak)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)

study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv", package = "sdtm.oak"))
cm_raw <-  read.csv(system.file("cm_domain/cm_raw_data.csv", package = "sdtm.oak")) %>%
  #derive oak_id as the row number
  dplyr::mutate(
    oak_id = structure(seq_len(nrow(.))),
    patient_number = PATNUM,
    raw_source = "ConMed"
  ) %>%
  dplyr::select(sdtm.oak:::oak_id_vars(), dplyr::everything())

# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  # Derive CMGRPID when CMTRT == "BABY ASPIRIN"
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID",
    tgt_dat = condition_by(., CMTRT == "BABY ASPIRIN")
  )
#> Error in `admiraldev::assert_character_vector()` at sdtm.oak/R/assign.R:189:3:
#> ! `id_vars` must be a character vector but is a data frame


# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  |>
  condition_by(CMTRT == "BABY ASPIRIN") |>
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID",
    tgt_dat = _
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA


# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  |>
  condition_by(CMTRT == "BABY ASPIRIN") %>%
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID",
    tgt_dat = .
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA


# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "MDNUM",
      tgt_var = "CMGRPID",
      tgt_dat = condition_by(dat = ., CMTRT == "BABY ASPIRIN")
    )
  }
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA


# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "MDNUM",
      tgt_var = "CMGRPID",
      tgt_dat = condition_by(., CMTRT == "BABY ASPIRIN")
    )
  }
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

Question 2

Regarding question 2 see: tests/testthat/test-assign.R towards the end.

@rammprasad
Collaborator

Thank you, @ramiromagno. I got it to work. Having it inline will make sense for the users. We will just put both options out there.

cm <-
  # Derive topic variable
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDRAW",
    tgt_var = "CMTRT"
  )  %>%
  # Derive CMGRPID
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "MDNUM",
      tgt_var = "CMGRPID",
      # use the condition_by function and filter when CMTRT is "BABY ASPIRIN"
      tgt_dat = condition_by(dat = ., CMTRT == "BABY ASPIRIN"),
      id_vars = oak_id_vars()
    )
  } %>%
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "CMMODIFY",
      tgt_var = "CMMODIFY",
      tgt_dat = condition_by(., CMMODIFY != CMTRT, .env = cm_raw),
      id_vars = oak_id_vars()
    )
  }

A couple of follow-up questions

  1. We need to explain in the documentation why we need {} when we use the condition_by function.
  2. Can we rename the .env parameter to something more meaningful? We are comparing two datasets in this case, so can we rename it to .dat2?
  {
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "CMMODIFY",
    tgt_var = "CMMODIFY",
    tgt_dat =  condition_by( dat = . , CMMODIFY != CMTRT,  dat2 = cm_raw),
    id_vars = oak_id_vars()
  )
  }
#option 2
  {
  assign_no_ct(
    raw_dat = condition_by(dat = cm_raw , CMMODIFY != CMTRT, dat2= .),
    raw_var = "CMMODIFY",
    tgt_var = "CMMODIFY",
    tgt_dat =  .,
    id_vars = oak_id_vars()
  )
  }

@ramiromagno
Collaborator Author

ramiromagno commented Jun 8, 2024

Hi @rammprasad:

  1. The reason why we need braces {...} is explained here: https://magrittr.tidyverse.org/reference/pipe.html#using-the-dot-for-secondary-purposes. We could get rid of this requirement if tgt_dat is the first parameter, and I think we should, as it would simplify the overall piping syntax, making it easier on the user.

  2. Renaming .env to .dat2 (I'm adding the dot) is fine but .env is more aligned with what is customary in the tidyverse ecosystem for scopes where to look for variables, be it an actual environment (env), or a data frame, tibble or simply a list.
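
To illustrate what such an `.env` argument could mean in practice, here is a base R sketch. It is an assumption for illustration only, not how `condition_by()` is implemented: `eval_cond()` is a hypothetical helper that evaluates a condition with the columns of `dat` taking precedence and falls back to `.env` (an environment, data frame or list) for names not found there.

```r
# Hypothetical helper (not sdtm.oak's API): evaluate `cond` using columns of
# `dat` first, then fall back to `.env` for anything not found in `dat`.
eval_cond <- function(dat, cond, .env = parent.frame()) {
  caller <- parent.frame()
  enclos <- if (is.environment(.env)) {
    .env
  } else {
    # Data frames and lists become an environment whose parent is the caller,
    # so functions and operators are still found.
    list2env(as.list(.env), parent = caller)
  }
  eval(substitute(cond), dat, enclos)
}

tgt <- data.frame(CMTRT = c("ASPIRIN", "BENADRYL"), stringsAsFactors = FALSE)
raw <- data.frame(
  CMMODIFY = c("ASPIRIN", "DIPHENHYDRAMINE"),
  stringsAsFactors = FALSE
)

# CMTRT is found in `tgt`; CMMODIFY is found in `raw` via `.env`.
flag <- eval_cond(tgt, CMMODIFY != CMTRT, .env = raw)
```

With this toy data `flag` is `c(FALSE, TRUE)`: only the second record has a CMMODIFY value that differs from CMTRT. The sketch assumes both data sets are row-aligned, which a real implementation would have to guarantee, e.g. through a join on the id variables.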

Collaborator

@rammprasad rammprasad left a comment

Great job, @ramiromagno. I have added my comments. We can add additional test cases and examples.
Also, it will be good to change the way we compare two datasets, by adding an extra argument as suggested.

@ramiromagno
Collaborator Author

ramiromagno commented Jun 10, 2024

Thanks @rammprasad. I am not sure what your take was on relocating tgt_dat and making it the first argument. If you agree, then I reckon it would also make sense to move tgt_var and make it the second argument.

@ramiromagno
Collaborator Author

I thought we could circumvent the need for braces when using magrittr's pipe placeholder in nested calls if we moved tgt_dat to being the first argument but seemingly we cannot.

So I think we are left only with three options:

  1. Move the condition_add() one level up and have it in the chain, i.e. not inline within the assign_no_ct()
  2. Add braces around the assign_no_ct() call and use the placeholder where needed
  3. Study the possibility of having another pipe operator whose behavior is equivalent to %>% {...}, but with the advantage of not needing the braces. I asked a question about this in tidyverse/magrittr#272 ("Question: defining a pipe operator with equivalent behavior to %>% {...}").

In my opinion, the simplest and easiest for the user is option 1, i.e. moving condition_add() to an earlier position in the chain of commands.

library(sdtm.oak)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)

study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv", package = "sdtm.oak"))
cm_raw <-  read.csv(system.file("cm_domain/cm_raw_data.csv", package = "sdtm.oak")) %>%
  #derive oak_id as the row number
  dplyr::mutate(
    oak_id = structure(seq_len(nrow(.))),
    patient_number = PATNUM,
    raw_source = "ConMed"
  ) %>%
  dplyr::select(oak_id_vars(), dplyr::everything())

# DOES NOT WORK (native pipe)
# assign_no_ct(raw_dat = cm_raw,
#              raw_var = "MDRAW",
#              tgt_var = "CMTRT")  |>
#   assign_no_ct(
#     tgt_dat = condition_add(_, CMTRT == "BABY ASPIRIN"),
#     tgt_var = "CMGRPID",
#     raw_dat = cm_raw,
#     raw_var = "MDNUM"
#   )

# DOES NOT WORK EITHER (magrittr's pipe)
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  assign_no_ct(
    tgt_dat = condition_add(., CMTRT == "BABY ASPIRIN"),
    tgt_var = "CMGRPID",
    raw_dat = cm_raw,
    raw_var = "MDNUM"
  )
#> Error in `admiraldev::assert_character_vector()` at sdtm.oak/R/assign.R:191:3:
#> ! `id_vars` must be a character vector but is a data frame

# WORKS (native pipe)
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  |>
  condition_add(CMTRT == "BABY ASPIRIN") |>
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID"
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

# WORKS (magrittr's pipe)
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  condition_add(CMTRT == "BABY ASPIRIN") |>
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID"
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

# DOES NOT WORK (native pipe)
# assign_no_ct(raw_dat = cm_raw,
#              raw_var = "MDRAW",
#              tgt_var = "CMTRT")  |>
#   {
#     assign_no_ct(
#       tgt_dat = condition_add(_, CMTRT == "BABY ASPIRIN"),
#       raw_dat = cm_raw,
#       raw_var = "MDNUM",
#       tgt_var = "CMGRPID"
#     )
#   }

# WORKS (magrittr's pipe)
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  {
  assign_no_ct(
    tgt_dat = condition_add(., CMTRT == "BABY ASPIRIN"),
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID"
  )
  }
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

- Move `tgt_dat` to the first position in the argument list for cleaner command pipes.

- Rename `condition_by()` to `condition_add()`.

- Export `oak_id_vars()` for direct user access.

- Update tidyselections to align with the latest practices.
@rammprasad
Collaborator

Thank you, @ramiromagno. Regarding the condition_add options, let's stick with options 1 and 2. We will add examples for both scenarios.

  1. Move the condition_add() one level up and have it in the chain, i.e. not inline within the assign_no_ct()
  2. Add braces around the assign_no_ct() call and use the placeholder where needed

@rammprasad
Collaborator

Let me know once it is ready for the final review.

@ramiromagno
Collaborator Author

I know we decided to go with options 1 and 2. But given that Lionel kindly answered so promptly, I am leaving his answer here for future reference:

library(sdtm.oak)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)

`%>>%` <- function(data, expr) {
  eval(substitute(expr), list(. = data), parent.frame())
}

study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv", package = "sdtm.oak"))
cm_raw <-  read.csv(system.file("cm_domain/cm_raw_data.csv", package = "sdtm.oak")) %>%
  #derive oak_id as the row number
  dplyr::mutate(
    oak_id = structure(seq_len(nrow(.))),
    patient_number = PATNUM,
    raw_source = "ConMed"
  ) %>%
  dplyr::select(oak_id_vars(), dplyr::everything())

assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>>%
  assign_no_ct(
    tgt_dat = condition_add(., CMTRT == "BABY ASPIRIN"),
    tgt_var = "CMGRPID",
    raw_dat = cm_raw,
    raw_var = "MDNUM"
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

Created on 2024-06-12 with reprex v2.1.0
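
For readers without the package data at hand, the behaviour of this operator can be checked with a package-free sketch. The operator definition is the one from the reprex above; the toy inputs are assumptions chosen for illustration.

```r
`%>>%` <- function(data, expr) {
  # Evaluate the unevaluated right-hand side with `.` bound to the left-hand
  # side; every other name is looked up in the caller's environment.
  eval(substitute(expr), list(. = data), parent.frame())
}

# The left-hand side is only available through `.`, even in nested calls;
# it is never inserted as a first argument.
total <- 1:3 %>>% sum(rev(.), 10)
label <- c("a", "b") %>>% paste("n =", length(.))
```

Here `total` is 16 and `label` is "n = 2". Unlike `%>%`, this operator never inserts the left-hand side implicitly, so a right-hand side that does not mention `.` simply ignores the piped value.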

@rammprasad
Collaborator

rammprasad commented Jun 13, 2024

Shall we add this as the third option? This looks sleeker than using {}

- Documentation
- Examples
- New article about cnd_df (WIP)
@ramiromagno
Collaborator Author

@rammprasad and @edgar-manukyan:

I think we are pretty close to having the code for conditioned data frames near completion.

  • A new pipe operator, %.>%, has been introduced; check the docs for the details. It should allow chains of commands with less clutter, namely without the braces (%>% {...} can now be replaced simply with %.>% ...).
  • I've added more documentation across functions and examples.
  • I also added a significant number of unit tests to most functions, but not all.
  • A new vignette has been created introducing conditioned data frames. It is currently incomplete because it has only one usage example with condition_add(). We should add those cases Ram mentioned where conditioning involves either the raw data set, the target, both independently, and both interdependently. For the most complicated case (both interdependently), I reckon the user will need to resort to calling sdtm_join() explicitly.

Take a look and give me your feedback!

@edgar-manukyan
Collaborator

edgar-manukyan commented Jun 17, 2024

Thanks so much @ramiromagno 🙏 I will start the review shortly since I believe @rammprasad is happy with the MR and will approve it shortly. Let's refrain from adding any new features in this MR and open a new issue/MR instead.

@edgar-manukyan
Collaborator

edgar-manukyan commented Jun 17, 2024

Simply brilliant @ramiromagno, thank you so much 🙏 🙏 🙏 for all your time and effort. I am sure the SDTM community is going to appreciate this. I also feel that admiral might grab your idea of conditioned data frames as well 😉

Huge thanks for the tests 💯 💯 💯

@rammprasad
Collaborator

It looks good to me. Let's merge this to main, and I can take care of the documentation updates.

@ramiromagno
Collaborator Author

@rammprasad and @edgar-manukyan : please do not merge yet as I am now doing styling and linting fixes.

ramiromagno and others added 7 commits June 18, 2024 01:12
- No need for S3 methods to be exported
- `condition_add()` now links to the appropriate article about conditioned data frames
- Documentation tweaks
- Version bump, NEWS update and pkgdown reference list update
- Add example for `condition_add()`
- Re-export S3 methods for `cnd_df`
- Update pkgdown reference list
Merge branch 'main' into 0054-condition-by

# Conflicts:
#	NAMESPACE
#	NEWS.md
#	_pkgdown.yml
#	inst/WORDLIST
#	renv/profiles/4.4/renv.lock
@ramiromagno ramiromagno merged commit 13644bd into main Jun 18, 2024
1 check passed
@ramiromagno ramiromagno deleted the 0054-condition-by branch June 18, 2024 01:46
Successfully merging this pull request may close these issues.

Feature Request: raw_filter and tgt_filter parameters
3 participants