Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce the new DDL (ddl class) #926

Closed
wants to merge 55 commits into from
Closed

Introduce the new DDL (ddl class) #926

wants to merge 55 commits into from

Conversation

gogonzo
Copy link
Contributor

@gogonzo gogonzo commented Oct 4, 2023

Part of #937
Closes #895

install

Installation code
staged.dependencies::dependency_table(
  project = "insightsengineering/teal@https://github.com",
  project_type = "repo@host",
  ref = "ddl@teal_data@main",
  verbose = 1
) |> staged.dependencies::install_deps()

ddl class

unfold

Small class which contains ui and server. server calls run_ddl which evaluates a code using input and create teal_data object. Function ddl accepts several arguments which are used to generate proper ui and server.

tdata-DDL process (3)

Above diagram depict how class works. Please follow steps below and look at the diagram.

  1. One needs to specify ddl arguments to fit someone's needs. Take for example custom_connector. Let's assume that we need to call scda::synthetic_cdisc_data('latest') for two datasets.
data_code = paste(
    "adsl <- scda::synthetic_cdisc_data('latest')$adsl",
    "adtte<- scda::synthetic_cdisc_data('version')$adtte",
    collapse = "\n"
  )
)

Imagine that above datasets are on the remote server so we need to call open_conn function and close after pull. Imagine that open_conn and close_conn are in some external package.

code = paste(c(
  "open_conn(username = input$username, password = input$password)",
  "adsl <- scda::synthetic_cdisc_data('latest')$adsl",
  "adtte<- scda::synthetic_cdisc_data('version')$adtte",
  "close_conn()"
  ),
  collapse = "\n"
)

username and password should be specified by the app user. To do so we need to create ui which will create username and password inputs which are used during evaluation of above code.

ui = function(id) {
  ns <- NS(id)
  tagList(
    textInput(ns("username"), label = "Username"),
      passwordInput(ns("password"), label = "Password"),
      actionButton(ns("submit"), label = "Submit")
  )
}

If we have ui we need also server to utilize input. When submit button is clicked all input values are passed to ddl_run where all magic happens. ddl_run executes the code and substitutes all input$<name> with equivalent input_list element. server function needs to have ... in the formals where all necessary objects are passed through.

submit_button_server <- function(id, ...) {
  moduleServer(id, function(input, output, session) {
    tdata <- eventReactive(input$submit, {
      req(input$pass)
      ddl_run(input_list = reactiveValuesToList(input), ...)
    })

    # would need to make sure we handle reactivity correctly here as teal::init expects not reactive tdata...
    return(tdata)
  })
}

Because user username and password is not something we would like to include in the show R code. We need to replace them with some other code which can be run by anyone who receives the code. To replace args with some desired value or call we use offline_args to substitute code with these.

offline_args = list(
  username = quote(askpass::askpass("Please enter username")),
  password = quote(askpass::askpass("Please enter password"))
)  

At this point ddl_run evaluated "online" code, and has environment with the results of the evaluation. Also "offline" code is available after substitution. At the end teal_data is constructed from data and offline code

teal_data(data = env_list, code = masked_code, join_keys)

⚠️ As one can see ddl_run require only input_list. It is a simplification for potential app developer who doesn't need to handle additional arguments (code, input_mask) in the server. Consequence of this simplification is that server specified in ddl() have an enclosing environment changed to the environment where ddl_run, code, input_mask is available. Another consequence is that ddl_run theoretically doesn't exist anywhere - it is created on the fly when ddl object is created. It means that developing own ddl connector in external "pkg" can't use private functions from that "pkg".

Example teal app with ddl

teal app code

use teal.data, teal and teal.slice from refactor branch

library(scda)
library(teal)

data <- ddl(
  code = '
    open_conn(username = input$username, password = input$password)
    ADSL <- scda::synthetic_cdisc_data("latest")$adsl
    ADTTE <- scda::synthetic_cdisc_data("latest")$adtte
  ',
  input_mask = list(
    username = quote(askpass::askpass()),
    password = quote(askpass::askpass())
  ),
  ui = function(id) {
    ns <- NS(id)
    tagList(
      textInput(ns("username"), label = "Username"),
      passwordInput(ns("password"), label = "Password"),
      actionButton(ns("submit"), label = "Submit")
    )
  },
  server = function(id, ...) {
    moduleServer(id, function(input, output, session) {
      eventReactive(input$submit, {
        ddl_run(input = input, ...)
      })
    })
  },
  datanames = c("ADSL", "ADTTE"),
  join_keys = cdisc_join_keys("ADSL", "ADTTE")
)

app <- teal::init(
  data = data,
  modules = list(
    teal.modules.general::tm_data_table("Data Table")
  )
)

runApp(app)

Internal Connectors

insightsengineering/nestdevs-tasks#38

@gogonzo gogonzo changed the base branch from main to teal_data@main October 4, 2023 12:06
@github-actions
Copy link
Contributor

github-actions bot commented Oct 4, 2023

Unit Tests Summary

    1 files    17 suites   20s ⏱️
179 tests 179 ✔️ 0 💤 0
349 runs  349 ✔️ 0 💤 0

Results for commit 90c6f1f.

♻️ This comment has been updated with latest results.

@gogonzo gogonzo mentioned this pull request Oct 4, 2023
@gogonzo gogonzo linked an issue Oct 5, 2023 that may be closed by this pull request
@gogonzo gogonzo added the core label Oct 5, 2023
@gogonzo gogonzo changed the base branch from teal_data@main to main October 6, 2023 13:53
@gogonzo gogonzo changed the base branch from main to teal_data@main October 6, 2023 13:53
R/utils.R Outdated Show resolved Hide resolved
R/utils.R Outdated Show resolved Hide resolved
R/ddl.R Outdated Show resolved Hide resolved
R/ddl.R Outdated Show resolved Hide resolved
R/ddl.R Outdated Show resolved Hide resolved
R/ddl.R Outdated Show resolved Hide resolved
R/ddl.R Outdated Show resolved Hide resolved
R/ddl.R Outdated Show resolved Hide resolved
Copy link
Contributor

@chlebowa chlebowa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs and comments require some rephrasing.

R/ddl.R Outdated
#' evaluation and code masking process it is recommended to use `ddl_run`.
#' Package provides universal `username_password_server` which
#' runs [ddl_run] function, which returns `teal_data` object.
#' Details in the the example
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' Details in the the example
#' Details in the the example.

R/ddl.R Outdated
Comment on lines 51 to 53
if (!missing(expr) && !missing(code)) {
stop("Only one of `expr` or `code` should be specified")
}
Copy link
Contributor

@pawelru pawelru Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rlang::check_exclusive

The advantage of this is that we will be having consistent error message that end-users might be already familiar with.

(please implement in other places if necessary)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not particularly familiar with rlang error messages myself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rlang error system ftw!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the main features of rlang@1.0.0 is better display of the error messages - link. Since the release of this version, most (if not all?) of tidyverse packages followed this. That's why I do believe that this might be more coherent across all the packages one might use hence: easier to understand and take appropriate action.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had issues in the past year with rlang handling errors. Needed to use rlang::cnd_message_lines(error) on Windows on every error message in tryCatch to get an accurate error message. Otherwise errors were always empty. If people have Windows and old rlang versions they may not see an error at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also vote up for striving away from tidyverse as they have a great history of breaking changes and tend to have multiple dependencies in it. I don't think general error messages in rlang are better than customized, personalized human-crafted ones that we thought through to inform the user in the most useful manner as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do echo similar mixed feeling about tidyverse but this is actually about r-lib which I consider much more solid.

I don't think general error messages in rlang are better than customized, personalized human-crafted ones that we thought through to inform the user in the most useful manner as possible.

Yes and no. Yes because the role of the error message is to clearly indicate what's wrong and why to allow a quick fix by the end users. No, because if we will be too verbose then it won't be clear anymore. The other thing that I am actually trying to address here is consistency. Just a few examples of our codebase that I managed to find within few minutes:
https://github.com/insightsengineering/teal.slice/blob/686f1f76ac6130325df9314fea419800c139f35a/R/teal_slice.R#L134-L135
https://github.com/insightsengineering/teal.modules.general/blob/eef36c1dda918fcbebe948d8bef189bd7e545974/R/tm_g_bivariate.R#L762-L764
https://github.com/insightsengineering/teal.data/blob/c04b7edef44d80fd4cdf0c6196d5c9c3de7c84b1/R/utils.R#L21-L24
https://github.com/insightsengineering/teal.slice/blob/686f1f76ac6130325df9314fea419800c139f35a/R/teal_slice.R#L134-L135
If we can delegate the check as well as error message for a separate functionality called from all of those places then we would achieve some consistency. It's mostly done via checkmate for individual objects checks. I hope that rlang is a good candidate to fill in the gaps of what cannot be done via checkmate.
One might raise that we don't want to introduce a new dependency but rlang is very likely to be already there (directly or indirectly). What's great about it is that this is a zero-dependency package. That changed my view on this package quite significantly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those 4 examples are actually fine in my opinion. Very clear :)
If rlang has zero dependencies than I'm less against, but still it's better to have a package that has 8 dependencies, than a packages with 15 dependencies where 7 of them has zero dependencies. The overview of such package become repellent. Each package extends the time of the installation. And each package require us and some more advanced users who would like to read the code to understand another workflow. How many workflows can we use in a teal designed to be simple and seamless for R users?

R/ddl.R Outdated Show resolved Hide resolved
@vedhav
Copy link
Contributor

vedhav commented Oct 31, 2023

What about alternative solution of having special teal_module_data which can create ddl module?

I would be okay with such an implementation as long as it's clear that the ddl module is a "special" module. Ideally, I would like fewer changes to create a ddl class compared to the typical teal_data class that the users create. Right now (at least my thought is) all the teal modules are independent from the "code" and it's taken from the teal_data but if we implement a custom ddl module that violates these fundamental rules I would often question myself if there are exceptions to such fundamental rules about teal that I already learned. So, I would advocate for an additional arguments in the teal::init() for ddl.

@chlebowa
Copy link
Contributor

What about alternative solution of having special teal_module_data which can create ddl module?

Ideally, I would like fewer changes to create a ddl class compared to the typical teal_data class that the users create.

I agree.

So, I would advocate for an additional arguments in the teal::init() for ddl.

I disagree. init is the topmost user facing function in the entire framework and should not be modified lightly.

@chlebowa
Copy link
Contributor

chlebowa commented Oct 31, 2023

What about alternative solution of having special teal_module_data which can create ddl module?

Some more details for clarity's sake:
This would be a third special class for teal_module objects, like teal_module_previewer and teal_module_landing.
It would be created by its own module template function (with arguments: ui, server, code, datanames, join_keys, and optional input_mask).
The user may provide UI and server logic but with some constrains: 1) the server must return a reactive expression that returns a teal_data object; 2) if the user wishes to mask code, they must create the teal_data by calling ddl_run(*) and pass code, input_mask, datanames, and join_keys.
The module would create data that is supplied to the filter panel and on to analytical modules.

An important change would be that the data could be supplied to the application either by passing teal_data or a teal_module_data module. Whether we put the latter in the data argument or in modules, is on the table.

(*) the ddl_run function will be renamed

@m7pr
Copy link
Contributor

m7pr commented Oct 31, 2023

What about alternative solution of having special teal_module_data which can create ddl module?

Thanks @chlebowa for the explanation in your last comment!
I like the idea of the new module, which would be a new class. Because we would already have a template, this would have a separate documentation, this could get a separate tab as well and also this gets it's own arguments and assertions.

@m7pr
Copy link
Contributor

m7pr commented Oct 31, 2023

And I guess ddl_run is only used on Show R Code action only for this new module?

@chlebowa
Copy link
Contributor

And I guess ddl_run is only used on Show R Code action only for this new module?

No. ddl_run creates teal_data and masks code. What ends up in Show R Code is the masked code from teal_data (modified by the filter panel and analytical modules).

@m7pr
Copy link
Contributor

m7pr commented Oct 31, 2023

If Show R Code is the only place where we can expose the code to the user using the app, isn't that a good idea to work on the code only there and just work on avaluated that that is now just a character teal_data()/qenv@code? What's point of pre-masking if user might never hit Show R Code?

@chlebowa
Copy link
Contributor

Because it's difficult to arrange masking at the very end of the analysis as opposed to upon data creation.

* create ---> |oo| mask ---> + filter ---> + analysis ---> <o> view
* create ---> + filter ---> + analysis ---> |oo| mask ---> <o> view

@m7pr
Copy link
Contributor

m7pr commented Oct 31, 2023

I think it depends on where/when you pass masking parameters, instead passing to to ddl you could pass them to the module definition, that can pass that to Show R Code module.

@chlebowa
Copy link
Contributor

Oh, even better. Which module? You never want to show unmasked code so would you rather mask it once, upon creation, or multiple times, in analytical modules?

@m7pr
Copy link
Contributor

m7pr commented Oct 31, 2023

No, it's gonna be masked only in Show R Code Modal (show_rcode_modal()) that takes rcode which I presume is a character string (taken from teal_data/qenv@code) and you only substitute there. The masking could be a parameter of show_rcode_modal

@chlebowa
Copy link
Contributor

The code can still be added to the report, would it not be masked then?

@m7pr
Copy link
Contributor

m7pr commented Oct 31, 2023

Alrighty @chlebowa so on Show R Code and Add Report Card actions. Two actions would trigger the substitution, and you'd need to work on a character string (probably the same way as we are currently working on regexes in this PR).

@chlebowa
Copy link
Contributor

I don't see how this is in any wat better than masking in the teal_data upon creation.

@m7pr
Copy link
Contributor

m7pr commented Oct 31, 2023

You postpone the masking part until it's really necessary. If a person does not create a report or does not check the Shown R Code, then the substitution action is not triggered at all.

@chlebowa
Copy link
Contributor

chlebowa commented Oct 31, 2023

Conversely, if you do want it, you have to do it twice.
And the action probably takes microseconds.

@m7pr
Copy link
Contributor

m7pr commented Oct 31, 2023

Ok forget it. You can just not include it in DDL lol.

Comment on lines +68 to +74
# Replace input$ with .()
code_strings <- gsub("input\\$(\\w+\\.?\\w*)", "\\.(\\1)", code_strings)
code_strings <- gsub("(input\\$)(`[^`]+`)", "\\.(\\2)", code_strings)

# Replace input[[ with .()
code_strings <- gsub("(input\\[\\[\")(\\w+\\.?\\w*)(\"\\]\\])", "\\.(\\2\\)", code_strings)
code_strings <- gsub("(input\\[\\[\")(\\w+\\-\\w+)\"\\]\\]", ".(`\\2`)", code_strings)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Replace input$ with .()
code_strings <- gsub("input\\$(\\w+\\.?\\w*)", "\\.(\\1)", code_strings)
code_strings <- gsub("(input\\$)(`[^`]+`)", "\\.(\\2)", code_strings)
# Replace input[[ with .()
code_strings <- gsub("(input\\[\\[\")(\\w+\\.?\\w*)(\"\\]\\])", "\\.(\\2\\)", code_strings)
code_strings <- gsub("(input\\[\\[\")(\\w+\\-\\w+)\"\\]\\]", ".(`\\2`)", code_strings)
code_strings <- gsub("(input\\$)([a-z0-9_.`-]+)", "\\.(\\2\\)", code_strings)
code_strings <- gsub("(input\\[\\[\")([a-z0-9_.`-]+)(\"\\]\\])", "\\.(\\2\\)", code_strings)
code_strings <- gsub("\\.\\(([^)`]*?)-([^)`]*?)\\)", ".(`\\1-\\2`)", code_strings)

input[["var-name"]] are converted .(var_name), addtional regaular expression add backticks around variable names to to ensure they are valid for substitution in R.

@@ -114,11 +114,10 @@ init <- function(data,
footer = tags$p(),
id = character(0)) {
logger::log_trace("init initializing teal app with: data ({ class(data)[1] }).")

if (!inherits(data, c("TealData", "teal_data"))) {
if (!inherits(data, c("TealData", "teal_data", "ddl"))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to roxygen argument description should follow this change

@donyunardi
Copy link
Contributor

I like the masking option, and the combination of "ddl/ddl_run" makes sense to me, so I prefer 1.1.

Even though 1.2 and 2.2 are simpler, I think we should encourage the use of all DDL arguments correctly to effectively conceal the pre-processing aspect of DDL.

I am okay to create vignette of the simpler way to cater our more advance users.

What about alternative solution of having special teal_module_data which can create ddl module? tm_teal_data() function will have the same arguments as current ddl and in the future it could #860

This is interesting. Instead of using a splash screen, we will present the DDL as a module as a new tab. How does this affect other modules though? Since ddl() creates teal_data which feeds to modules, if we serve ddl module as a new tab together with the rest of modules, how do we ensure that this module will be process first?

I suppose we could implement a similar approach with the landing popup. We could assess the class of the module, and if it's a DDL module, we can temporarily block or disable the other modules. After the submit button is clicked and authentication is successful, we can then re-enable the other modules.

@gogonzo gogonzo changed the title Introduce the new DDL Introduce the new DDL (ddl class) Nov 1, 2023
@m7pr
Copy link
Contributor

m7pr commented Nov 2, 2023

@donyunardi

This is interesting. Instead of using a splash screen, we will present the DDL as a module as a new tab. How does this affect other modules though? Since ddl() creates teal_data which feeds to modules, if we serve ddl module as a new tab together with the rest of modules, how do we ensure that this module will be process first?

I suppose we could implement a similar approach with the landing popup. We could assess the class of the module, and if it's a DDL module, we can temporarily block or disable the other modules. After the submit button is clicked and authentication is successful, we can then re-enable the other modules.

I had the very same intuition and I follow your idea as well. Could be handled very similarly to the way we handle landing_popup now.

@gogonzo
Copy link
Contributor Author

gogonzo commented Nov 3, 2023

I think we have all agree that this class is less favourable and ddl_run does too much (creates teal_data, eval code, mask code, assign datanames and join_keys). This is not possible to extend and leaves us developers in the situation where potential feature request will need a refactor.

@m7pr
Copy link
Contributor

m7pr commented Nov 3, 2023

less favourable or more favourable ;)?

@gogonzo
Copy link
Contributor Author

gogonzo commented Nov 3, 2023

Closing this to focus on https://github.com/insightsengineering/teal/pull/957/files

@gogonzo gogonzo closed this Nov 3, 2023
@gogonzo gogonzo deleted the ddl@teal_data@main branch November 3, 2023 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[summary] data refactor
8 participants