Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tdata in tealdata@main #138

Closed
wants to merge 9 commits into from
Closed

Tdata in tealdata@main #138

wants to merge 9 commits into from

Conversation

mhallal1
Copy link
Contributor

@mhallal1 mhallal1 commented Feb 24, 2023

closes ##121
depends on insightsengineering/teal.slice#212 and insightsengineering/teal#808

-Moved tdata from teal to teal.data
-Added get_tdata method to TealData
-Updated downstreams to use TealData$get_tdata() output instead of a TealData object
-Updated tests

TODO:
-figure the code issue

@mhallal1 mhallal1 added the core label Feb 24, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Feb 24, 2023

badge

Code Coverage Summary

Filename                                 Stmts    Miss  Cover    Missing
-------------------------------------  -------  ------  -------  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
R/as_cdisc.R                                39       4  89.74%   107-110
R/Callable.R                                45       0  100.00%
R/CallableCode.R                            36       2  94.44%   26, 63
R/CallableFunction.R                        88       3  96.59%   160-162
R/CallablePythonCode.R                      58      58  0.00%    21-223
R/cdisc_data.R                              45       1  97.78%   79
R/CDISCTealDataConnector.R                  20       3  85.00%   31, 36, 49
R/CDISCTealDataset.R                        46      11  76.09%   108-115, 208-210
R/CDISCTealDatasetConnector.R               26       1  96.15%   116
R/CodeClass.R                              111       1  99.10%   157
R/data_label.R                              36      13  63.89%   35-39, 58-65, 105
R/deep_clone_r6.R                            9       0  100.00%
R/get_attrs.R                                2       2  0.00%    12-47
R/get_code.R                               173      19  89.02%   87, 140-143, 196-197, 207-208, 266, 297, 333, 337, 372, 381-385
R/get_dataname.R                             4       0  100.00%
R/get_dataset_label.R                        3       0  100.00%
R/get_dataset.R                             13       8  38.46%   41, 58, 85-91
R/get_datasets.R                            10       2  80.00%   86, 110
R/get_key_duplicates.R                      37       7  81.08%   42-48, 55-56
R/get_keys.R                                15       7  53.33%   72-73, 130-150
R/get_raw_data.R                            24      11  54.17%   161-174
R/include_css_js.R                           9       1  88.89%   20
R/is_pulled.R                                4       0  100.00%
R/JoinKeys.R                               183       7  96.17%   134, 233, 327, 330, 388-427
R/load_dataset.R                            25      18  28.00%   60-65, 89-222
R/MAETealDataset.R                         138      57  58.70%   53, 115, 153-208, 224-229, 236-245, 282, 323-339
R/mutate_dataset.R                          18       0  100.00%
R/set_args.R                                10       5  50.00%   42-46
R/tdata.R                                   43      43  0.00%    49-187
R/teal_data.R                               31       2  93.55%   44, 52
R/TealData.R                               243     128  47.33%   201-216, 229, 241-310, 353-360, 393-398, 400, 402-407, 409, 426-471
R/TealDataAbstract.R                       232      24  89.66%   72, 85-88, 97-106, 215-218, 429, 454-458, 480, 486
R/TealDataConnection.R                     297     180  39.39%   58-59, 64, 67, 70, 106-163, 183, 186-188, 194-200, 205-207, 233, 238, 254-277, 287, 300, 321, 325-330, 333-336, 358-360, 364-371, 374-377, 392-406, 425-426, 446-517, 535-543, 545, 549-564, 567-570, 602, 608-612, 626, 661-663, 672-674
R/TealDataConnector.R                      196     102  47.96%   167, 179, 183, 196, 199-208, 210, 218-227, 310-314, 372-477
R/TealDataset.R                            367      23  93.73%   141-151, 383-387, 443-452, 504
R/TealDatasetConnector_constructors.R      300      53  82.33%   176-212, 262, 830-835, 1033-1109
R/TealDatasetConnector.R                   326      90  72.39%   169, 237, 251, 256, 270, 433, 456-495, 525, 540-570, 660, 670, 679-686, 699, 714-741
R/to_relational_data.R                      54       7  87.04%   35-36, 40, 99, 106, 112, 127
R/topological_sort.R                        32       0  100.00%
R/utils.R                                   56       9  83.93%   22-23, 27, 76-83
R/validate_data_args.R                      32       0  100.00%
R/zzz.R                                      6       6  0.00%    4-12
TOTAL                                     3442     908  73.62%

Diff against main

Filename        Stmts    Miss  Cover
------------  -------  ------  --------
R/tdata.R         +43     +43  +100.00%
R/TealData.R      +15     +15  -3.11%
TOTAL             +58     +58  -1.26%

Results for commit: 2baafc5

Minimum allowed coverage is 80%

♻️ This comment has been updated with latest results

@github-actions
Copy link
Contributor

github-actions bot commented Feb 24, 2023

Unit Test Performance Difference

Test Suite $Status$ Time on main $±Time$ $±Tests$ $±Skipped$ $±Failures$ $±Errors$
MAETealDataset 💚 $7.57$ $-1.02$ $0$ $0$ $0$ $0$

Results for commit b560ada

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 27, 2023

Unit Tests Summary

       1 files       27 suites   37s ⏱️
   366 tests    366 ✔️ 0 💤 0
1 019 runs  1 019 ✔️ 0 💤 0

Results for commit f7798a5.

♻️ This comment has been updated with latest results.

#' get_metadata(data, "iris")
#'
#' @export
new_tdata <- function(data, code = "", join_keys = NULL, metadata = NULL, check = FALSE, label) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the check argument here is not in its perfect place and is only passed downstream inside FilteredData to be called in teal::get_datasets_code to evaluate if an warning message should be returned in the SRC or not.
An alternative to this would be not to pass check at all but rather to evaluate if check is TRUE or FALSE upstream and throw a warning message in the console. This will not return a warnings message in the SRC as it is done now but will avoid us the passing an unnecessary argument all through the teal app setup.
@nikolas-burkoff @donyunardi @lcd2yyz

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#97

@asbates
Copy link
Contributor

asbates commented Feb 28, 2023

NEWS update? But I guess not if this is internal only.

#'
#' @return (`character`) code used in the `tdata` object.
#' @export
get_code_tdata <- function(data) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why there are 2 functions here. I could maybe see it if get_code.tdata was not exported, but it is. As a user I would be confused as to which one I should use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So (at least when tdata lived in teal) there was a name clash between get_code() in teal.data and get_code in teal.code and this was a way to force the modules to pull the get_code from teal.data -> if there's a nicer way to do this (without making teal.code and teal.data have a dependency) then happy to change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about axing get_code.tdata and just use get_code_tdata then? It feels like an unnecessary complexity to have both.

If we keep get_code.tdata though, the generic documentation should be updated. Right now the generic reads @param x ([TealDatasetConnector] or [TealDataset]). If of class character will be treated as file to read. It should also say tdata in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about axing get_code.tdata and just use get_code_tdata then? It feels like an unnecessary complexity to have both.

Fair enough

#' @param x a `tdata` object
#' @param ... additional arguments for the generic
#' @export
get_code.tdata <- function(x, ...) { # nolint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me the home for this is R/get_code.R

#' @param data `tdata` - object to extract the join keys
#' @return Either `JoinKeys` object or `NULL` if no join keys
#' @export
get_join_keys <- function(data) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using S3 here and for metadata feels like overkill.

R/TealData.R Show resolved Hide resolved
@donyunardi
Copy link
Contributor

donyunardi commented Mar 1, 2023

I want to confirm the reason we have get_tdata and new_tdata.

  1. get_tdata is being used in teal::srv_teal function to pass tdata create FilteredData object.

    datasets <- teal.slice::init_filtered_data(raw_data()$get_tdata())
  2. In init_filtered_data.tdata generic class, we unpack the tdata so we can feed it in to init_filtered_data.default. At this point, the datasets in step 1 will return a FilteredData object.

  3. then new_tdata is being used in teal::srv_nested_tabs when.datasets_to_data function is run to create tdata from FilteredData object.

    data <- .datasets_to_data(modules, datasets, trigger_data)
    
    .datasets_to_data <- function(module, datasets, trigger_data = reactiveVal(1L)) {
      ...
      teal.data::new_tdata(
          data,
          eventReactive(
            trigger_data(),
            c(
              get_rcode_str_install(),
              get_rcode_libraries(),
              get_datasets_code(datanames, datasets, hashes),
              teal.slice::get_filter_expr(datasets, datanames)
            )
          ),
          datasets$get_join_keys(),
          metadata,
          check = datasets$get_check(),
          label = labels
        )
      ...
    }

If this is true, why do we need to feed tdata during init_filtered_data during step 1?
It felt like we're not doing anything special to the object during step 2.

@nikolas-burkoff
Copy link
Contributor

nikolas-burkoff commented Mar 1, 2023

@asbates - yup we should have NEWS

@donyunardi

If this is true, why do we need to feed tdata during init_filtered_data during step 1? It felt like we're not doing anything special to the object during step 2.

So the idea here is eventually we will not have a TealData class at all but only tdata (and a lightweight class to generate tdata in a delayed way) - it's way too big a change to do that in one go - and will involve big changes to teal::init's API. This PR is a first step along that path - being able to create a tdata from TealData directly

Eventually the data flow in teal::init will go something like:

  1. User specify tdata object (or instructions as to how to do it in a delayed way)
  2. Filter panel is created from this tdata object
  3. Filter panel has a datasets$get_tdata() method (so we don't need .datasets_to_data method) which returns a tdata object with the filter applied
  4. Modules take tdata object

Part 4) is already on main. Part 2) is in this collection of PRs, part 3) is waiting for teal.slice changes (and tbh .datasets_to_data is effectively doing this). And part 1) is very much at the early stages here

After all these changes, in order to use/test/maintain teal::init, individual modules or the filter panel you only need to create one tdata object which is used to create all these classes.

This should simplify things a lot - a year ago in order to test a module you needed to create a FilteredData object from which you needed to have created a [CDISC]TealData object which meant there was strong coupling between all the components and it was harder to understand and maintain.


Another benefit we can introduce here is that at the moment pulling the data can only occur once per shiny session - we can change this as some apps want to be able to pull the data multiple times - and also this can then remove the splash screen for DDL and the additional reactivity cycle coming from insertUI - see this old branch for inspiration


There are some complexities at this point which need to be handled - hence trying to do just this baby step first:

  1. the check argument to cdisc_data is currently passed all the way through so that the modules "show r code" can give a message saying "the app developer did not check the data" reproducing this behaviour is painful as it complicates our tdata object needing an extra check argument for imho very little benefit

Do we actually want to keep this behaviour (esp given the open "look at check again issue") - or can we avoid doing this - this is a question for @lcd2yyz

  1. The show r code for each module includes the code needed to get the data but not all of the data, only the data used in that module. This is good if we are treating each module's SRC as independent and we want to run the r code outside of the app (if there's 20 datasets but this module only uses 2 then no need to get hold of the other 18...) but adds a lot of complexity (as we need to know which lines of preprocessing code are used for which datasets) - and again would make tdata much more complex. If the SRC is mainly meant as a record of what could be run and we can live with all dataset preprocessing included in the SRC then the implementation becomes massively simpler.

Happy to talk about the above points in a meeting to explain in more detail. The correct solution to these really is teal 1.0:

Instead of everything (check, preprocessing code, filtering code, filter_panel_api for the formatted filter states) being passed into the modules to be included in the report/show r code you have the modules passing their code snippet into teal::init to handle the SRC/report cards. In practice this means SRC becomes teal getting information from teal.data, the filter panel and the modules and combining them together.

Teal is the right place for these things to live: "why does a module need to know the code used to filter the data that is being passed into it" and "why does every module need filter_panel_api passed into it purely to get the filter information in the report card - when the module itself has no concept of filtered data - it just has "data"?"

This then would allow much better reports (with say a "Data processing" section containing the getting the data and then "Outputs" section containing the module outputs - and the R code for each output would not need another copy of the "getting data" section as it currently does.

but I think this is quite a long way away - a decisions could be made to pause this work and attempt work on teal 1.0 first, or to abandon completely etc.

@lcd2yyz
Copy link

lcd2yyz commented Mar 2, 2023

There are some complexities at this point which need to be handled - hence trying to do just this baby step first:

  1. the check argument to cdisc_data is currently passed all the way through so that the modules "show r code" can give a message saying "the app developer did not check the data" reproducing this behaviour is painful as it complicates our tdata object needing an extra check argument for imho very little benefit

Do we actually want to keep this behaviour (esp given the open "look at check again issue") - or can we avoid doing this - this is a question for @lcd2yyz

I don't know the history of this message, I can only guess that it was added to limit potential liability on app developer.
I believe once we implement insightsengineering/teal#778, we should no longer need to worry about passing this message. So I'm fine with removing this check, consistent with the decision to remove this insightsengineering/teal.widgets#48

Teal is the right place for these things to live: "why does a module need to know the code used to filter the data that is being passed into it" and "why does every module need filter_panel_api passed into it purely to get the filter information in the report card - when the module itself has no concept of filtered data - it just has "data"?"

This then would allow much better reports (with say a "Data processing" section containing the getting the data and then "Outputs" section containing the module outputs - and the R code for each output would not need another copy of the "getting data" section as it currently does.

Agree with this, but sounds like major refactor again, something we need to come back to in the future. Can you help open a new issue to log down all your thoughts about this?

@nikolas-burkoff
Copy link
Contributor

Agree with this, but sounds like major refactor again, something we need to come back to in the future. Can you help open a new issue to log down all your thoughts about this?

So in some sense all the refactor work we've been doing for teal has been leading towards this - my thoughts are in agreement with @gogonzo and he has our thoughts together with a diagram here (the list of issues in the milestone is not complete)

@gogonzo
Copy link
Contributor

gogonzo commented Jul 13, 2023

Closing, we are choosing different solution

@gogonzo gogonzo closed this Jul 13, 2023
@gogonzo gogonzo deleted the tdata_in_tealdata@main branch July 13, 2023 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants