
remove chunks + merge_expression_srv #77

Merged: 21 commits merged into main on Jul 1, 2022

Conversation

mhallal1 (Contributor):

Closes #61.
Replaces #75 to have the same branch name as insightsengineering/teal#674.

The dependency on datasets and chunks has been removed from the data merge module in the following way:

  1. The new modules merge_expression_module and merge_expression_srv are the new versions of data_merge_module and data_merge_srv. They have an additional join_keys argument, and their datasets argument accepts a named list of reactive or non-reactive data.frames. So we have a separation between the old modules, which accept FilteredData, and the new modules, which accept a named list of reactive/non-reactive data.frames (see the input sketch after this list).
  2. merge_datasets is modified to use a list of reactive data.frames instead of FilteredData, and chunks has been completely removed from it. It now returns a list of expr, columns_source, key and filter_info. No data or chunks are returned anymore. Be aware that expr is now a list of calls instead of a character vector.
  3. get_dplyr_call and get_merge_call have been modified accordingly.
  4. data_merge_srv has been modified to reuse the same merge_datasets, get_dplyr_call and get_merge_call scripts. It defines data and join_keys internally, calls merge_expression_srv, and appends data and chunks to the returned list.
  5. The join_keys argument replaces the keys argument of data_extract_XXX. This avoids the confusion of having to use join_keys (in data_merge_XXX) and keys (in data_extract_XXX) in the same module. The keys corresponding to every dataset are extracted internally from the join_keys input.
  6. Tests are added and modified as needed.
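
A minimal sketch of the inputs the new modules expect; the ADSL/ADLB data sets and key columns below are illustrative only (they mirror the documented join_keys example quoted further down in this thread):

library(shiny)  # for reactive()

# Named list of reactive (or plain) data.frames that replaces the FilteredData object.
data_list <- list(
  ADSL = reactive(ADSL),
  ADLB = reactive(ADLB)
)

# Nested named list of join keys: join_keys[["ADSL"]][["ADLB"]] gives the
# columns used to join ADSL with ADLB.
join_keys <- list(
  ADSL = list(
    ADSL = c(STUDYID = "STUDYID", USUBJID = "USUBJID"),
    ADLB = c(STUDYID = "STUDYID", USUBJID = "USUBJID")
  ),
  ADLB = list(
    ADLB = c(STUDYID = "STUDYID", USUBJID = "USUBJID", PARAMCD = "PARAMCD", AVISIT = "AVISIT"),
    ADSL = c(STUDYID = "STUDYID", USUBJID = "USUBJID")
  )
)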

Please test with the example apps in merge_expression_module.R and data_merge_module.R.

The design of the apps now looks like this:

server = function(input, output, session) {
  # initialize chunks
  chunks_h <- teal.code::chunks_new()

  # push data into chunks
  # ...

  # call data_merge
  merged_data <- merge_expression_srv(
    selector_list = selector_list,
    datasets = data_list,
    join_keys = join_keys,
    merge_function = "dplyr::left_join"
  )

  # push the data_merge code into chunks
  ch_merge <- reactive({
    ch <- teal.code::chunks_deep_clone(chunks_h)
    for (chunk in merged_data()$expr) teal.code::chunks_push(chunks = ch, expression = chunk)
    ch$eval()
    ch
  })

  # get output from chunks
  # ...
}
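
For comparison, the existing data_merge_srv wrapper described in point 4 keeps the old interface and handles chunks internally. A rough sketch only; the exact argument set is assumed from the description above, so treat it as illustrative:

# Old-style call in an app still built on FilteredData: data_merge_srv derives
# data and join_keys internally, calls merge_expression_srv, and appends
# data and chunks to the returned list.
merged_data <- data_merge_srv(
  selector_list = selector_list,
  datasets = datasets,  # FilteredData
  merge_function = "dplyr::left_join"
)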

@mhallal1 added the core label on Jun 29, 2022
github-actions bot commented on Jun 29, 2022

Unit Tests Summary

1 file, 24 suites, 52s ⏱️
195 tests: 194 passed ✔️, 1 skipped 💤, 0 failed
674 runs: 673 passed ✔️, 1 skipped 💤, 0 failed

Results for commit 06e0c06.


github-actions bot commented on Jun 29, 2022

Code Coverage Summary

Filename                          Stmts    Miss  Cover    Missing
------------------------------  -------  ------  -------  ---------------------------------------------------------------------
R/all_choices.R                       1       0  100.00%
R/call_utils.R                      156     124  20.51%   14-23, 64, 66, 68, 107-431
R/check_selector.R                   31       0  100.00%
R/choices_labeled.R                 202      61  69.80%   60, 71, 76, 87, 103, 221-225, 229-234, 264-277, 390-391, 393, 425-473
R/choices_selected.R                 81      11  86.42%   201-229, 260
R/column_functions.R                  3       3  0.00%    13-16
R/data_extract_datanames.R           32       8  75.00%   9-13, 64-66
R/data_extract_filter_module.R       92      11  88.04%   75-82, 84, 87-88, 141
R/data_extract_module.R             250      60  76.00%   3, 121, 126, 143, 146-151, 153, 172-175, 203-249, 445, 450, 481
R/data_extract_read_module.R        122      13  89.34%   28, 32-35, 102-107, 116, 133
R/data_extract_select_module.R       32      18  43.75%   31-48
R/data_extract_single_module.R       53       2  96.23%   29, 42
R/data_extract_spec.R                32       0  100.00%
R/data_merge_module.R                53      10  81.13%   101-113
R/filter_spec.R                     186       1  99.46%   373
R/format_data_extract.R              16       1  93.75%   49
R/get_dplyr_call.R                  299       0  100.00%
R/get_merge_call.R                  285      30  89.47%   29-36, 47, 210-219, 372, 388-400
R/include_css_js.R                    5       0  100.00%
R/input_checks.R                     11       2  81.82%   18-19
R/merge_data_utils.R                  2       0  100.00%
R/merge_datasets.R                  143       6  95.80%   72, 229-233
R/merge_expression_module.R          42       0  100.00%
R/resolve_delayed.R                  16       0  100.00%
R/resolve.R                         114      44  61.40%   229-312
R/select_spec.R                      49       8  83.67%   149, 213-220
R/utils.R                            15      11  26.67%   26-39
R/zzz.R                               2       2  0.00%    2-3
TOTAL                              2325     426  81.68%

Results for commit: cb426c7

Minimum allowed coverage is 80%


R/data_merge_module.R: review comment (outdated, resolved)
@nikolas-burkoff (Contributor) left a comment:

A few comments from me in-line, mostly for work after this PR.

Comment on lines +216 to +224
#' join_keys <- list(
#'   ADSL = list(
#'     ADSL = c(STUDYID = "STUDYID", USUBJID = "USUBJID"),
#'     ADLB = c(STUDYID = "STUDYID", USUBJID = "USUBJID")
#'   ),
#'   ADLB = list(
#'     ADLB = c(STUDYID = "STUDYID", USUBJID = "USUBJID", PARAMCD = "PARAMCD", AVISIT = "AVISIT"),
#'     ADSL = c(STUDYID = "STUDYID", USUBJID = "USUBJID")
#'   )
Contributor:

So this issue, insightsengineering/teal.slice#53, is really important I think; we're going to end up creating duplication, i.e. helper functions for creating join_keys embedded inside teal.data::cdisc_data, the same inside teal.slice, and then also here.

key_list <- sapply(X = datasets$datanames(), simplify = FALSE, FUN = function(x) {
  datasets$get_keys(dataname = x)
})
join_keys_list <- datasets$get_join_keys()
Contributor:

If you have created filtered_data not from teal.data, then the primary keys probably won't be in the join_keys:

# won't have join_keys
init_filtered_data(list(ADSL = ..., keys = c("USUBJID", "STUDYID")))
# will do
init_filtered_data(list(ADSL = ..., keys = c("USUBJID", "STUDYID"), join_keys = join_keys(...)))

Contributor (author):

Issue created in teal.slice as discussed:
insightsengineering/teal.slice#58

@@ -429,6 +428,12 @@ data_extract_srv.list <- function(id, datasets, data_extract_spec, keys = NULL,
"data_extract_srv.list initialized with datasets: { paste(names(datasets), collapse = ', ') }."
)

# get keys out of join_keys
is_nested_list <- any(sapply(join_keys, is.list))
Contributor:

Hmm, so it feels weird to need this sort of thing. If we actually used a JoinKeys object throughout instead of nested lists (which, to be fair, are just as painful to specify), then we wouldn't need to care whether something is a nested list or not; we could just call join_keys$get().

Contributor:

I guess this comes back to how we are splitting things up. Users need to specify the keys for merging/joining data.frames together; we then need to use this information throughout the code base, and we probably need helper functions to create it in a simpler way for standard cases such as CDISC.

This suggests we need a (possibly very lightweight) object to standardize creation and usage. At the moment we have both JoinKeys and a named nested list, which I think is confusing for everyone: for users, as there are two distinct ways of creating the information (both about equally complex), and for us, as there are two different ways of accessing it. Should we use JoinKeys throughout? Or should we abandon it everywhere in favour of a nested list and manually extract what's needed from it each time?

This sort of comment probably applies to other things as well: you can create a TealDataset, or just create a data.frame with some attributes, etc.
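
For reference, the two representations under discussion look roughly like this. A sketch only, assuming teal.data's join_key()/join_keys() constructors and the JoinKeys$get() accessor mentioned above, with example ADSL/ADLB key columns:

# Nested named list: join_keys_list[["ADSL"]][["ADLB"]] gives the columns
# used to join ADSL with ADLB.
join_keys_list <- list(
  ADSL = list(ADLB = c(STUDYID = "STUDYID", USUBJID = "USUBJID")),
  ADLB = list(ADSL = c(STUDYID = "STUDYID", USUBJID = "USUBJID"))
)

# JoinKeys object holding the same information, queried via $get().
join_keys_obj <- teal.data::join_keys(
  teal.data::join_key("ADSL", "ADLB", c(STUDYID = "STUDYID", USUBJID = "USUBJID"))
)
join_keys_obj$get("ADSL", "ADLB")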

Contributor (author):

I have just simplified this step; there is no longer a check for a nested list. We accept only the nested join_keys list format.

Contributor:

Yes, I have created an issue in teal.data to simplify JoinKeys and to add wrappers which ease the creation of the join_keys list.

insightsengineering/teal.data#78

@@ -258,7 +258,7 @@ check_data_extract_spec_react <- function(datasets, data_extract) {
#' @param datasets (`FilteredData` or `list` of `reactive` or non-`reactive` `data.frame`)\cr
#' object containing data either in the form of [teal.slice::FilteredData] or as a list of `data.frame`.
#' When passing a list of non-reactive `data.frame`s, they are converted to reactive `data.frame`s internally.
#' When passing a list of reactive or non-reactive `data.frame`s, the argument `keys` is required also.
#' When passing a list of reactive or non-reactive `data.frame`s, the argument `join_keys` is required also.
Contributor:

In the future you could imagine join_keys to be reactive...

Comment on lines +228 to +231
"ADSL_FILTERED <- ADSL",
"ADLB_FILTERED <- ADLB",
"ANL_1 <- ADSL_FILTERED %>% dplyr::select(STUDYID, USUBJID, AGE)",
"ANL_2 <- ADLB_FILTERED %>% dplyr::select(STUDYID, USUBJID, AVAL, CHG)",
Contributor:

So in the longer term: should the merge expression actually contain the filter call? The advantage is that it contains everything needed to recreate the data at this point; the disadvantage is that if you have multiple data merges in the same module you'll end up with the filtering expressed multiple times, and this could get very confusing (or wrong) if we are removing "_FILTERED" etc., as the datasets coming into a module may or may not have been filtered.

So I wonder whether the module developer should be in charge of what to do with the "pure" merge expression: if they wish to append it to some chunk which contains the filtered-data expression then they can, and maybe for a second merge they wish to do something different.

The dplyr::select calls live with the data_extract_spec, don't they? So you could imagine the merge call not actually returning them either (with the data_extract_spec doing so, further decoupling things), but I suspect that would be harder.

Contributor (author):

Decoupling the steps inside merge_datasets is necessary to avoid repeating expressions. I guess this would be part of the initiative to decouple data_extract from the data_merge modules. @gogonzo, will this be in action soon?

Contributor:

@nikolas-burkoff we need to continue the discussion about filtering in data_extract_spec/data_merge and find a solution for this. I made a comment in my PR which adds more to this equation.

I think it is nuts that we need to depend on a hardcoded _FILTERED suffix in data_merge_expr, but we have no alternative at this moment. We should rather keep the same name as in the reactive list (the environment should be consistent with the code).

Let's re-initialize discussions about data_merge soon.

Contributor:

Yup, this was definitely not to be addressed in this PR...

@gogonzo (Contributor) left a comment:

Everything works with insightsengineering/teal#674
Former tests return the same output 👍
Please add the NEWS entry.

R/data_merge_module.R: review comment (outdated, resolved)
@mhallal1 mentioned this pull request on Jul 1, 2022 (3 tasks)
@nikolas-burkoff (Contributor) left a comment:

A few comments; the ANL one is probably the key one.

NEWS.md: review comment (outdated, resolved)
R/data_extract_module.R: review comment (outdated, resolved)
R/data_extract_module.R: review comment (outdated, resolved)
R/data_merge_module.R: review comment (outdated, resolved)
R/data_merge_module.R: review comment (outdated, resolved)
@mhallal1 merged commit 2140417 into main on Jul 1, 2022
@cicdguy deleted the 654_pass_reactive_data@main branch on June 9, 2023 15:01

Successfully merging this pull request may close these issues.

data_merge - remove dependency on FilteredData and chunks.