Consider migration to `dm` #8

gogonzo · 2022-02-14T10:23:38Z

Blocked by insightsengineering/teal#628 - we need more insights about how chevron and teal fits together.

Research CDISCData, CDISCFilteredData to store dm object to manage relationships between data.frames object.
Research how MAE can be stored
Can dm be primary data object passed to the teal_module?

Some initial research

Easy initialization of the relational data

library(dm)

ADSL <- synthetic_cdisc_data("latest")$adsl
ADTTE <- synthetic_cdisc_data("latest")$adtte
ADRS <- synthetic_cdisc_data("latest")$adrs
IRIS <- iris
IRIS$id <- seq_len(nrow(IRIS))

dm <- dm(ADSL, ADTTE, ADRS, IRIS) |>
  dm_add_pk(ADSL, teal::get_cdisc_keys("ADSL")) |>
  dm_add_pk(ADTTE, teal::get_cdisc_keys("ADTTE")) |>
  dm_add_pk(ADRS, teal::get_cdisc_keys("ADRS")) |>
  dm_add_pk(IRIS, "id") |>
  dm_add_fk(table = ADTTE, columns = teal::get_cdisc_keys("ADSL"), ref_table = ADSL) |>
  dm_add_fk(table = ADRS, columns = teal::get_cdisc_keys("ADSL"), ref_table = ADSL)

Get keys

# get primary and foreign keys
dm |> dm_get_all_pks()
dm |> dm_get_all_fks()

Check metadata
dm throws an error if the primary keys are duplicated

# check if keys are unique
dm |> dm_examine_constraints()
dm(ADSL, ADTTE, ADRS, IRIS) |>
  dm_add_pk(ADSL, teal::get_cdisc_keys("ADSL")) |>
  dm_add_pk(ADTTE, teal::get_cdisc_keys("ADSL")) |>
  dm_examine_constraints()

dm holds data in list, so the data is copied by reference

# same objects
identical(dm[["ADSL"]], ADSL)

Filters can be added to metadata with the dm_filter

# apply filter
dm |>
  dm_filter(ADSL, SEX == "M") |>
  dm_filter(ADRS, PARAMCD %in% c("BESRSPI", "INVET")) |>
  dm_filter(ADRS, AVISIT %in% c("BASELINE", "SCREENING", "CYCLE2", "DAY1"))

# filter and select
dm |>
  dm_filter(ADSL, SEX == "M") |>
  dm_filter(ADRS, PARAMCD %in% c("BESRSPI", "INVET")) |>
  dm_apply_filters() |>
  dm_select(ADSL, USUBJID, STUDYID, AGE) |>
  dm_flatten_to_tbl(ADRS) |>
  nrow()

Joining is performed automatically by dm

# join tables
joined <- dm |>
  dm_filter(ADSL, SEX == "M") |>
    dm_filter(ADRS, PARAMCD %in% c("BESRSPI", "INVET")) |>
    dm_apply_filters() |>
    dm_select(ADSL, USUBJID, STUDYID, AGE) |>
    dm_select(ADRS, USUBJID, STUDYID, PARAMCD, AVISIT, AGE, SEX, AVAL) |>
    dm_join_to_tbl(ADSL, ADRS, join = left_join)

dm object is mutable

library(dm)
library(scda)


ADSL <- synthetic_cdisc_data("latest")$adsl
ADTTE <- synthetic_cdisc_data("latest")$adtte
ADRS <- synthetic_cdisc_data("latest")$adrs
IRIS <- iris
IRIS$id <- seq_len(nrow(IRIS))



dm <- dm(ADSL, ADTTE, ADRS, IRIS) 
dm2 <- dm |>
  dm_add_pk(ADSL, teal.data::get_cdisc_keys("ADSL")) |>
  dm_add_pk(ADTTE, teal.data::get_cdisc_keys("ADTTE")) |>
  dm_add_pk(ADRS, teal.data::get_cdisc_keys("ADRS")) |>
  dm_add_pk(IRIS, "id") |>
  dm_add_fk(table = ADTTE, columns = teal.data::get_cdisc_keys("ADSL"), ref_table = ADSL) |>
  dm_add_fk(table = ADRS, columns = teal.data::get_cdisc_keys("ADSL"), ref_table = ADSL)

lobstr::obj_addr(dm)
lobstr::obj_addr(dm2)

dm operations preserve attributes

dm_filtered <- dm |>
  dm_filter(ADSL, SEX == "M") |>
  dm_filter(ADRS, PARAMCD %in% c("BESRSPI", "INVET")) |>
  dm_apply_filters() |>
  dm_select(ADSL, USUBJID, STUDYID, AGE) |>
  dm_select(ADRS, USUBJID, STUDYID, PARAMCD, AVISIT, AGE, SEX, AVAL) |>
  dm_select_tbl(ADSL, ADRS) |>
  dm_flatten_to_tbl(ADRS, ADSL)


attr(joined, "label")
formatters::var_labels(joined)

Issues:

dm renames duplicated the columns automatically so we might have a problem with difference between input column names and output column names. input$select_from_adsl and input$select_from_adrs won't be found in colnames(data)

~~won't support same dataset selectors by default. Following will not possible~~
Not needed since we are going to fix data_merge Design data extract and data merge NEST-roadmap#36

ANL1 <- ADRS %>% filter(PARAMCD == "PARAM1") %>% select(<keys without PARAMCD>, PARAM1 = AVAL) 
ANL2 <- ADRS %>% filter(PARAMCD == "PARAM2") %>% select(<keys without, PARAMCD>, PARAM2 = AVAL)
ANL <- merge(ANL1, ANL2, by = c(<keys without PARAMCD>))

Maybe we should reconsider this problematic option and just to do this:

ANL <- ADRS %>% filter(PARAMCD %in% c("PARAM1", "PARAM2")_) %>% select(<keys>, AVAL)

The text was updated successfully, but these errors were encountered:

kpagacz · 2022-05-19T12:43:40Z

Instead of replacing what we currently have with dm let's think about how we can support dm in the filter panel. I recommend first looking into how to make filter panel work with dm and then go to the data models. It goes to what I think should be the API of a teal module (so raw data passed independently of the filter state with a possibility of the filtering state being applied to the data inside the module).

gogonzo · 2022-05-19T13:00:17Z

@kpagacz Do you think about having FilteredData.dm? Could you elaborate more how would you expect this to look like?

gogonzo · 2022-05-25T03:27:41Z

Having dm in the filter panel could be a obstacle to have independent datasets reactivity. dm is one object so every change in the object (filter in any dataset) will trigger reactivity of the whole object, as we can't separate each dataset from the reactivity map.
See the example - change the filter in IRIS and ADSL-related output will wait for computation. Currently in the filter panel only child/parents datasets reactively-depended, other are unconnected.

library(shiny)
library(dm)
library(scda)


ADSL <- synthetic_cdisc_data("latest")$adsl
ADTTE <- synthetic_cdisc_data("latest")$adtte
ADRS <- synthetic_cdisc_data("latest")$adrs
IRIS <- iris
IRIS$id <- seq_len(nrow(IRIS))

dm <- dm(ADSL, ADTTE, ADRS, IRIS) |>
  dm_add_pk(ADSL, teal.data::get_cdisc_keys("ADSL")) |>
  dm_add_pk(ADTTE, teal.data::get_cdisc_keys("ADTTE")) |>
  dm_add_pk(ADRS, teal.data::get_cdisc_keys("ADRS")) |>
  dm_add_pk(IRIS, "id") |>
  dm_add_fk(table = ADTTE, columns = teal.data::get_cdisc_keys("ADSL"), ref_table = ADSL) |>
  dm_add_fk(table = ADRS, columns = teal.data::get_cdisc_keys("ADSL"), ref_table = ADSL)


shinyApp(
  ui = fluidPage(
    sidebarPanel(
      selectInput("Species", "Select species", choices = unique(iris$Species), selected = "Setosa"),
      verbatimTextOutput("ADSL_rows"),
      verbatimTextOutput("iris_rows")
    )
  ),
  server = function(input, output, session) {
    filtered_data <- eventReactive(input$Species, {
      dm_filter(dm, IRIS, Species %in% input$Species)
    })

    filtered_adsl <- reactive(filtered_data()[["ADSL"]])
    filtered_iris <- reactive(filtered_data()[["IRIS"]])

    output$ADSL_rows <- renderText({
      print("ADSL triggered")
      Sys.sleep(2)
      nrow(filtered_adsl())
    })

    output$iris_rows <- renderText({
      print("IRIS triggered")
      Sys.sleep(2)
      nrow(filtered_iris())
    })
  }
)

gogonzo · 2022-05-25T03:36:12Z

I have an opinion that dm can fix some of our problems but will introduce other:

FilterPanel - We can't build any reactivity between datasets which will trigger re-computation of all reactives regardless of datasets they are using. Specifically, in our case it means that each module within app will be affected by the change in any dataset. We might consider though usage of the same datasets in all modules, which is not a good idea as users requested filter-panel display being module specific.
UPDATE: Above is not necessary true - bindCache can actually handle if particular datasets changed.
If we allow "choices_selected" from multiple datasets, then dm wouldn't help us so much with the merge - when one selects duplicated variable names from different datasets then dm renames them automatically, making them hard to track in the merged dataset (unless our merge_datasets module will provide this information about renamed variables)

gogonzo added the enhancement label Feb 14, 2022

pawelru added refactor and removed enhancement labels Feb 24, 2022

gogonzo transferred this issue from insightsengineering/teal Mar 30, 2022

gogonzo mentioned this issue Apr 14, 2022

Research interaction between data, filter panel and modules insightsengineering/NEST-roadmap#32

Closed

8 tasks

Polkas added the research label Apr 19, 2022

insightsengineering deleted a comment from Polkas May 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider migration to `dm` #8

Consider migration to `dm` #8

gogonzo commented Feb 14, 2022 •

edited

Loading

kpagacz commented May 19, 2022

gogonzo commented May 19, 2022

gogonzo commented May 25, 2022

gogonzo commented May 25, 2022 •

edited

Loading

Consider migration to dm #8

Consider migration to dm #8

Comments

gogonzo commented Feb 14, 2022 • edited Loading

Some initial research

kpagacz commented May 19, 2022

gogonzo commented May 19, 2022

gogonzo commented May 25, 2022

gogonzo commented May 25, 2022 • edited Loading

Consider migration to `dm` #8

Consider migration to `dm` #8

gogonzo commented Feb 14, 2022 •

edited

Loading

gogonzo commented May 25, 2022 •

edited

Loading