if_then_else support via "conditioned" data frames #55

ramiromagno · 2024-05-24T00:02:12Z

Adds a new S3 class (cnd_df) for representing conditioned data frames, i.e. data frames that carry metadata about which records should be used for derivations
Adds support for basic pretty printing of cnd_df objects
Adds a user-facing function for creating such cnd_df objects: condition_by
Adds experimental "mutate"-version function for these conditioned data frames: derive_by_condition()


github-actions · 2024-05-24T00:07:08Z

Package	Line Rate	Health
sdtm.oak	88%	✔
Summary	88% (736 / 836)	✔

ramiromagno · 2024-05-24T00:11:29Z

Still work in progress.

- Joins by raw and target data sets are now aware of conditioned tibbles
- Transformation functions, namely `assign_datetime()`, `hardcode*()` and `assign*`, are also conditioned-tibble aware
- Unit test coverage for most cases indicated at #54

I believe the essential components are here to support the if_then_else algorithm via conditioned tibbles. Now, further testing, assertions and documentation are needed.

rammprasad · 2024-06-07T06:26:49Z

@ramiromagno -

A couple of items

1. condition_by is not working when applied to the target variable
2. Also, can you show me how to apply condition_by when it involves both source and target?

Here is the sample code.

study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv",
                                 package = "sdtm.oak"))

# Read in raw data

cm_raw <-  read.csv(system.file("cm_domain/cm_raw_data.csv",
                                package = "sdtm.oak")) %>%
  #derive oak_id as the row number
  dplyr::mutate(oak_id = structure(seq_len(nrow(.))),
                patient_number = PATNUM,
                raw_source = "ConMed") %>%  
  dplyr::select(oak_id_vars(), dplyr::everything())

# Create CM domain. The first step in creating CM domain is to create the topic variable

cm <-
  # Derive topic variable
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDRAW",
    tgt_var = "CMTRT"
  )  |>
  # Derive CMGRPID when CMTRT == "BABY ASPIRIN"
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID",
    # Use condition_by() to filter when CMTRT is "BABY ASPIRIN"
    ### This condition is not working. It results in an error
    tgt_dat = condition_by(.,CMTRT == "BABY ASPIRIN"),
    id_vars = oak_id_vars()
  )

**Error message**
Error in assign_no_ct(assign_no_ct(raw_dat = cm_raw, raw_var = "MDRAW",  : 
  unused argument (assign_no_ct(raw_dat = cm_raw, raw_var = "MDRAW", tgt_var = "CMTRT"))

Can you give an example of how to program this mapping?
  # Derive qualifier CMMODIFY -  If collected value in CMMODIFY
  # in cm_raw is different to CM domain CMTRT target variable then
  # assign the collected value to CMMODIFY in CM domain (CM.CMMODIFY)

ramiromagno · 2024-06-07T11:56:55Z

Yeah, there are a couple of things at play here:

- The placeholder . is for use with magrittr's pipe %>%, and _ is for use with R's native pipe |>.
- Using the placeholder in nested calls requires braces; see https://magrittr.tidyverse.org/reference/pipe.html#using-the-dot-for-secondary-purposes. Not sure we want to surface this to the user.

So the best approach might be this: if we want to condition on the target data set (the one being passed along the pipe), we move the condition_by() call one level up in the chain.
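To make the nested-placeholder point concrete, here is a minimal sketch of magrittr's documented rule for a dot that appears only inside a nested call. The helper count_args() is hypothetical and exists only for this illustration; it is not part of sdtm.oak.

```r
library(magrittr)

# Hypothetical helper: reports how many arguments it received.
count_args <- function(...) length(list(...))

# Dot only inside a nested call: the lhs is ALSO inserted as the first
# argument, so this is evaluated as count_args(., rev(.)).
n_nested <- 1:3 %>% count_args(rev(.))

# Dot as a direct argument: the lhs is passed only where the dot is.
n_direct <- 1:3 %>% count_args(.)

# Braces switch off the first-argument insertion.
n_braced <- 1:3 %>% {
  count_args(rev(.))
}

print(c(nested = n_nested, direct = n_direct, braced = n_braced))
```

The extra, unnamed first argument in the nested case is presumably what ends up matched to an unintended parameter when condition_by(., ...) is written inline without braces.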

Here is a set of examples that hopefully illustrate the different variations:

library(sdtm.oak)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)

study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv", package = "sdtm.oak"))
cm_raw <-  read.csv(system.file("cm_domain/cm_raw_data.csv", package = "sdtm.oak")) %>%
  #derive oak_id as the row number
  dplyr::mutate(
    oak_id = structure(seq_len(nrow(.))),
    patient_number = PATNUM,
    raw_source = "ConMed"
  ) %>%
  dplyr::select(sdtm.oak:::oak_id_vars(), dplyr::everything())

# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  # Derive CMGRPID when CMTRT == "BABY ASPIRIN"
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID",
    tgt_dat = condition_by(., CMTRT == "BABY ASPIRIN")
  )
#> Error in `admiraldev::assert_character_vector()` at sdtm.oak/R/assign.R:189:3:
#> ! `id_vars` must be a character vector but is a data frame


# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  |>
  condition_by(CMTRT == "BABY ASPIRIN") |>
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID",
    tgt_dat = _
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA


# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  |>
  condition_by(CMTRT == "BABY ASPIRIN") %>%
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID",
    tgt_dat = .
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA


# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "MDNUM",
      tgt_var = "CMGRPID",
      tgt_dat = condition_by(dat = ., CMTRT == "BABY ASPIRIN")
    )
  }
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA


# Derive topic variable
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "MDNUM",
      tgt_var = "CMGRPID",
      tgt_dat = condition_by(., CMTRT == "BABY ASPIRIN")
    )
  }
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

Question 2

Regarding question 2, see tests/testthat/test-assign.R, towards the end of the file.

rammprasad · 2024-06-08T01:32:55Z

Thank you, @ramiromagno. I got it to work. Having it inline will make sense for the users. We will just put both options out there.

cm <-
  # Derive topic variable
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDRAW",
    tgt_var = "CMTRT"
  )  %>%
  # Derive CMGRPID
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "MDNUM",
      tgt_var = "CMGRPID",
      # Use condition_by() to filter when CMTRT is "BABY ASPIRIN"
      tgt_dat = condition_by(dat = ., CMTRT == "BABY ASPIRIN"),
      id_vars = oak_id_vars()
    )
  } %>%
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "CMMODIFY",
      tgt_var = "CMMODIFY",
      tgt_dat = condition_by(., CMMODIFY != CMTRT, .env = cm_raw),
      id_vars = oak_id_vars()
    )
  }

A couple of follow-up questions

We need to explain in the documentation why we need {} when we use the condition_by function.
Can we rename the .env parameter to something more meaningful? We are comparing two datasets in this case, so can we rename it to .dat2?

# Option 1
  {
    assign_no_ct(
      raw_dat = cm_raw,
      raw_var = "CMMODIFY",
      tgt_var = "CMMODIFY",
      tgt_dat = condition_by(dat = ., CMMODIFY != CMTRT, dat2 = cm_raw),
      id_vars = oak_id_vars()
    )
  }
# Option 2
  {
    assign_no_ct(
      raw_dat = condition_by(dat = cm_raw, CMMODIFY != CMTRT, dat2 = .),
      raw_var = "CMMODIFY",
      tgt_var = "CMMODIFY",
      tgt_dat = .,
      id_vars = oak_id_vars()
    )
  }

ramiromagno · 2024-06-08T15:28:13Z

Hi @rammprasad:

- The reason why we need braces {...} is explained here: https://magrittr.tidyverse.org/reference/pipe.html#using-the-dot-for-secondary-purposes. We could get rid of this requirement if tgt_dat were the first parameter, and I think we should make that change, as it would simplify the overall piping syntax and make it easier on the user.
- Renaming .env to .dat2 (I'm adding the dot) is fine, but .env is more aligned with the tidyverse convention for naming the scope in which variables are looked up, be it an actual environment (env), a data frame, a tibble or simply a list.
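As an illustration of what such a fallback scope means, here is a base-R sketch in which a condition is evaluated against one data frame, and any variable not found there is looked up in a second one. The function eval_condition() is hypothetical and is not how sdtm.oak implements condition_by(); it only mirrors the idea behind the .env argument.

```r
# Hypothetical helper, NOT the sdtm.oak implementation: evaluate `cond`
# using the columns of `dat`, falling back to the contents of `.env`.
eval_condition <- function(dat, cond, .env) {
  expr <- substitute(cond)
  # Turn the fallback data frame (or list) into an environment whose
  # parent is the caller's environment.
  fallback <- list2env(as.list(.env), parent = parent.frame())
  eval(expr, envir = dat, enclos = fallback)
}

tgt <- data.frame(
  CMTRT = c("ASPIRIN", "BENADRYL", "ZQUILL"),
  stringsAsFactors = FALSE
)
raw <- data.frame(
  CMMODIFY = c("ASPIRIN", "BENADRYL HCL", "ZQUILL"),
  stringsAsFactors = FALSE
)

# CMTRT is found in `tgt`; CMMODIFY is only found in the fallback `raw`.
flags <- eval_condition(tgt, CMMODIFY != CMTRT, .env = raw)
print(flags)
```

Whatever the argument ends up being called, the mechanism is a variable lookup with a secondary scope, which is why a name like .env matches what tidyverse users may already expect.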

rammprasad

Great job, @ramiromagno. I have added my comments. We can add additional test cases and examples.
Also, it will be good to change the way we compare two datasets, by adding an extra argument as suggested.

R/cnd_df.R

R/sdtm_join.R

R/cnd_df.R

ramiromagno · 2024-06-10T22:51:34Z

Thanks @rammprasad. I am not sure what your take was on relocating tgt_dat to make it the first argument. If you agree, then moving tgt_var to make it the second argument would also make sense, I reckon.

ramiromagno · 2024-06-12T15:14:27Z

I thought we could circumvent the need for braces when using magrittr's pipe placeholder in nested calls if we moved tgt_dat to being the first argument but seemingly we cannot.

So I think we are left only with three options:

1. Move the condition_add() one level up and have it in the chain, i.e. not inline within the assign_no_ct()
2. Add braces around the assign_no_ct() call and use the placeholder where needed
3. Study the possibility of having another pipe operator whose behavior is equivalent to %>% {...}, but with the advantage of not needing the braces. I asked a question about this in tidyverse/magrittr#272 ("Question: defining a pipe operator with equivalent behavior to %>% {...}").

In my opinion, the simplest and easiest for the user is option 1, i.e. moving condition_add() to an earlier position in the chain of commands.

library(sdtm.oak)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)

study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv", package = "sdtm.oak"))
cm_raw <-  read.csv(system.file("cm_domain/cm_raw_data.csv", package = "sdtm.oak")) %>%
  #derive oak_id as the row number
  dplyr::mutate(
    oak_id = structure(seq_len(nrow(.))),
    patient_number = PATNUM,
    raw_source = "ConMed"
  ) %>%
  dplyr::select(oak_id_vars(), dplyr::everything())

# DOES NOT WORK (native pipe)
# assign_no_ct(raw_dat = cm_raw,
#              raw_var = "MDRAW",
#              tgt_var = "CMTRT")  |>
#   assign_no_ct(
#     tgt_dat = condition_add(_, CMTRT == "BABY ASPIRIN"),
#     tgt_var = "CMGRPID",
#     raw_dat = cm_raw,
#     raw_var = "MDNUM"
#   )

# DOES NOT WORK EITHER (magrittr's pipe)
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  assign_no_ct(
    tgt_dat = condition_add(., CMTRT == "BABY ASPIRIN"),
    tgt_var = "CMGRPID",
    raw_dat = cm_raw,
    raw_var = "MDNUM"
  )
#> Error in `admiraldev::assert_character_vector()` at sdtm.oak/R/assign.R:191:3:
#> ! `id_vars` must be a character vector but is a data frame

# WORKS (native pipe)
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  |>
  condition_add(CMTRT == "BABY ASPIRIN") |>
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID"
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

# WORKS (magrittr's pipe)
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  condition_add(CMTRT == "BABY ASPIRIN") |>
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID"
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

# DOES NOT WORK (native pipe)
# assign_no_ct(raw_dat = cm_raw,
#              raw_var = "MDRAW",
#              tgt_var = "CMTRT")  |>
#   {
#     assign_no_ct(
#       tgt_dat = condition_add(_, CMTRT == "BABY ASPIRIN"),
#       raw_dat = cm_raw,
#       raw_var = "MDNUM",
#       tgt_var = "CMGRPID"
#     )
#   }

# WORKS (magrittr's pipe)
assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>%
  {
  assign_no_ct(
    tgt_dat = condition_add(., CMTRT == "BABY ASPIRIN"),
    raw_dat = cm_raw,
    raw_var = "MDNUM",
    tgt_var = "CMGRPID"
  )
  }
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA
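For context on the commented-out native-pipe variants: R's native pipe accepts the _ placeholder only as a named argument passed directly to the right-hand-side call, and it does not accept a braced block as its right-hand side. One base-R way around this, not discussed in this thread and shown here only as a sketch, is an anonymous function. flag_rows() and count_flagged() are hypothetical stand-ins for condition_add() and assign_no_ct().

```r
# Hypothetical stand-ins, NOT sdtm.oak functions.
flag_rows <- function(dat, keep) {
  attr(dat, "keep") <- keep # record which rows satisfy the condition
  dat
}
count_flagged <- function(tgt_dat) sum(attr(tgt_dat, "keep"))

dat <- data.frame(
  CMTRT = c("BABY ASPIRIN", "ASPIRIN", "BENADRYL"),
  stringsAsFactors = FALSE
)

# Requires R >= 4.1: the lhs is bound to `d`, so it can be used freely
# inside nested calls on the right-hand side.
n_flagged <- dat |>
  (\(d) count_flagged(tgt_dat = flag_rows(d, d$CMTRT == "BABY ASPIRIN")))()

print(n_flagged)
```

This is arguably noisier than either of the working variants above, which is one more reason to prefer moving the conditioning step into the chain.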

- Move `tgt_dat` to the first position in the argument list for cleaner command pipes.
- Rename `condition_by()` to `condition_add()`.
- Export `oak_id_vars()` for direct user access.
- Update tidyselections to align with the latest practices.

rammprasad · 2024-06-12T15:43:47Z

Thank you, @ramiromagno. Regarding the condition_add options, let's stick with options 1 and 2. We will add examples for both scenarios.

Move the condition_add() one level up and have it in the chain, i.e. not inline within the assign_no_ct()
Add braces around the assign_no_ct() call and use the placeholder where needed

rammprasad · 2024-06-12T15:43:58Z

Let me know once it is ready for the final review.

ramiromagno · 2024-06-12T22:10:29Z

I know we decided to go with options 1 and 2. But given that Lionel kindly and promptly answered, I am leaving his suggestion here for future record:

library(sdtm.oak)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)

`%>>%` <- function(data, expr) {
  eval(substitute(expr), list(. = data), parent.frame())
}

study_ct <- read.csv(system.file("cm_domain/cm_sdtm_oak_ct.csv", package = "sdtm.oak"))
cm_raw <-  read.csv(system.file("cm_domain/cm_raw_data.csv", package = "sdtm.oak")) %>%
  #derive oak_id as the row number
  dplyr::mutate(
    oak_id = structure(seq_len(nrow(.))),
    patient_number = PATNUM,
    raw_source = "ConMed"
  ) %>%
  dplyr::select(oak_id_vars(), dplyr::everything())

assign_no_ct(raw_dat = cm_raw,
             raw_var = "MDRAW",
             tgt_var = "CMTRT")  %>>%
  assign_no_ct(
    tgt_dat = condition_add(., CMTRT == "BABY ASPIRIN"),
    tgt_var = "CMGRPID",
    raw_dat = cm_raw,
    raw_var = "MDNUM"
  )
#> # A tibble: 14 × 5
#>    oak_id raw_source patient_number CMTRT                         CMGRPID
#>     <int> <chr>               <int> <chr>                           <int>
#>  1      1 ConMed                375 BABY ASPIRIN                        1
#>  2      2 ConMed                375 CORTISPORIN                        NA
#>  3      3 ConMed                376 ASPIRIN                            NA
#>  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
#>  5      5 ConMed                377 PARCETEMOL                         NA
#>  6      6 ConMed                377 VOMIKIND                           NA
#>  7      7 ConMed                377 ZENFLOX OZ                         NA
#>  8      8 ConMed                378 AMITRYPTYLINE                      NA
#>  9      9 ConMed                378 BENADRYL                           NA
#> 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
#> 11     11 ConMed                378 TETRACYCLINE                       NA
#> 12     12 ConMed                379 BENADRYL                           NA
#> 13     13 ConMed                379 SOMINEX                            NA
#> 14     14 ConMed                379 ZQUILL                             NA

Created on 2024-06-12 with reprex v2.1.0
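To see what the %>>% operator defined in this reprex does in isolation, here is a base-R-only sketch: the right-hand side is evaluated unmodified, with . bound to the left-hand side, so nothing is inserted as a first argument. The helper add_label() is hypothetical and used only for illustration.

```r
# Same definition as in the reprex: evaluate the rhs expression as written,
# with `.` bound to the lhs value, in the caller's environment.
`%>>%` <- function(data, expr) {
  eval(substitute(expr), list(. = data), parent.frame())
}

# Hypothetical helper for the illustration.
add_label <- function(x, label = "none") paste(label, x, sep = ":")

# The dot can sit inside a nested call without braces.
out <- 1:3 %>>% add_label(x = rev(.), label = "n")

# Chains work too, because custom %op% operators associate left to right.
total <- 1:3 %>>% rev(.) %>>% sum(.)

print(out)
print(total)
```

The trade-off is that the dot becomes mandatory: unlike %>%, this operator never passes the left-hand side implicitly.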

rammprasad · 2024-06-13T06:12:15Z

Shall we add this as the third option? This looks sleeker than using {}

- Documentation
- Examples
- New article about cnd_df (WIP)

ramiromagno · 2024-06-16T22:56:11Z

@rammprasad and @edgar-manukyan:

I think we are pretty close to having the code for conditioned data frames near completion.

- A new pipe operator, %.>%, has been introduced; check the docs for the details. It should allow chains of commands with less clutter, namely without the braces (%>% {...} can now be replaced simply with %.>% ...).
- I've added more documentation across functions, along with examples.
- I also added a significant number of unit tests to most functions, but not all.
- A new vignette has been created introducing conditioned data frames. It is currently incomplete because it has only one usage example with condition_add(). We should add the cases Ram mentioned, where conditioning involves either the raw data set, the target, both independently, or both interdependently. For the most complicated case, both interdependently, I reckon the user will need to call sdtm_join() explicitly.

Take a look and give me your feedback!

edgar-manukyan · 2024-06-17T15:18:50Z

Thanks so much @ramiromagno 🙏 I will start the review shortly since I believe @rammprasad is happy with the MR and will approve it shortly. Let's refrain from adding any new features in this MR and open a new issue/MR instead.

edgar-manukyan · 2024-06-17T17:29:57Z

Simply brilliant @ramiromagno, thank you so much 🙏 🙏 🙏 for all your time and effort. I am sure the SDTM community is going to appreciate this. I also feel that admiral might grab your idea of conditioned data frames as well 😉

Huge thanks for the tests 💯 💯 💯

R/assertions.R

rammprasad · 2024-06-17T22:04:55Z

It looks good to me. Let's merge this to main, and I can take care of the documentation updates.

ramiromagno · 2024-06-17T22:43:47Z

@rammprasad and @edgar-manukyan : please do not merge yet as I am now doing styling and linting fixes.

- No need for S3 methods to be exported
- `condition_add()` now links to the appropriate article about conditioned data frames
- Documentation tweaks
- Version bump, NEWS update and pkgdown reference list update

- Add example for `condition_add()`
- Re-export S3 methods for `cnd_df`
- Update pkgdown reference list

Merge branch 'main' into 0054-condition-by

Conflicts:
- NAMESPACE
- NEWS.md
- _pkgdown.yml
- inst/WORDLIST
- renv/profiles/4.4/renv.lock

ramiromagno self-assigned this May 24, 2024

ramiromagno linked an issue May 24, 2024 that may be closed by this pull request

Feature Request: raw_filter and tgt_filter parameters #54

Closed

ramiromagno added 2 commits May 26, 2024 01:54

Basic support for conditioned data sets

d08794b

ramiromagno changed the title ~~Basic support for "conditioned" data frames~~ if_then_else support via "conditioned" data frames May 29, 2024

ramiromagno requested review from edgar-manukyan and rammprasad May 29, 2024 02:13

rammprasad reviewed Jun 10, 2024

Ramm's feedback integration

0d7861a

Update on conditioned data frames

fd722ba

edgar-manukyan approved these changes Jun 17, 2024

rammprasad approved these changes Jun 17, 2024

Styling fixes

09a3921

ramiromagno and others added 7 commits June 18, 2024 01:12

Update linting and styling

a7bb91a

Tidying up

173f020

Last tweaks

c8c581a

Merge from main

d40366f

Remove blank line

4350a38

Tweaks to %.>% docs

8148500

Automatic renv profile update.

d7352a5

ramiromagno merged commit 13644bd into main Jun 18, 2024
1 check passed

ramiromagno deleted the 0054-condition-by branch June 18, 2024 01:46
