From d9156149ba744f640bc656a399f8ef8c2771bafa Mon Sep 17 00:00:00 2001 From: Melkiades Date: Fri, 21 Apr 2023 15:00:49 +0200 Subject: [PATCH 01/40] init --- _pkgdown.yml | 6 +++- vignettes/dg_split_machinery.Rmd | 49 ++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+), 1 deletion(-) create mode 100644 vignettes/dg_split_machinery.Rmd diff --git a/_pkgdown.yml b/_pkgdown.yml index d911a6599..be51bd350 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -28,7 +28,11 @@ articles: - custom_appearance - advanced_usage - format_precedence - + - title: For Developers + desc: Vignettes aimed at package developers + contents: + - dg_split_machinery + reference: - title: Argument Conventions diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd new file mode 100644 index 000000000..7a0e5daf2 --- /dev/null +++ b/vignettes/dg_split_machinery.Rmd @@ -0,0 +1,49 @@ +--- +title: "The Split Machinery" +author: "Davide Garolini" +date: '`r Sys.Date()`' +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{rtables Advanced Usage} + %\VignetteEncoding{UTF-8} + %\VignetteEngine{knitr::rmarkdown} +editor_options: + chunk_output_type: console +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE) +``` +## Disclaimer + +This vignette is currently under development. Any code or prose which +appears in a version of this vignette on the `main` branch of the +repository will work/be correct, but they likely are not in their +final form. + + +## The Split Machinery + +The scope of this vignette is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. The latter level, also called leaf-level, contains the final partition that is subjected to the analysis functions. + +```{r, message=FALSE} +library(rtables) +``` + +This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. 
For more details on using R Markdown see . + +When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this: + +```{r cars} +summary(cars) +``` + +## Including Plots + +You can also embed plots, for example: + +```{r pressure, echo=FALSE} +plot(pressure) +``` + +Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot. From fd1df8182e8428c2668b9b4d769bb9e7eff7eaf3 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Fri, 21 Apr 2023 15:01:08 +0200 Subject: [PATCH 02/40] init 2 --- vignettes/dg_split_machinery.Rmd | 23 +++++------------------ 1 file changed, 5 insertions(+), 18 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index 7a0e5daf2..3649341b7 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -24,26 +24,13 @@ final form. ## The Split Machinery -The scope of this vignette is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. The latter level, also called leaf-level, contains the final partition that is subjected to the analysis functions. +The scope of this vignette is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. The latter level, also called leaf-level, contains the final partition that is subjected to the analysis functions. More details from the user perspective can be found in relevant vignette `split_functions` and function documentation like `?split_rows_by` and `?split_funcs`. -```{r, message=FALSE} -library(rtables) -``` - -This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. 
For more details on using R Markdown see . - -When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this: +The following will describe how the split machinery mainly for the row domain. Further information on how columns are defined will follow. -```{r cars} -summary(cars) -``` - -## Including Plots -You can also embed plots, for example: +## `do_split` -```{r pressure, echo=FALSE} -plot(pressure) +```{r, message=FALSE} +library(rtables) ``` - -Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot. From ce4272decccb2fe1b2cd8bc5fbfd62a8c05502d5 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Fri, 21 Apr 2023 15:54:34 +0200 Subject: [PATCH 03/40] first example --- vignettes/dg_split_machinery.Rmd | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index 3649341b7..6d5d415b8 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -24,13 +24,35 @@ final form. ## The Split Machinery -The scope of this vignette is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. The latter level, also called leaf-level, contains the final partition that is subjected to the analysis functions. More details from the user perspective can be found in relevant vignette `split_functions` and function documentation like `?split_rows_by` and `?split_funcs`. +The scope of this vignette is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. The latter level, also called leaf-level, contains the final partition that is subjected to the analysis functions. 
More details from the user perspective can be found in relevant vignette `split_functions`(link?) and function documentation like `?split_rows_by` and `?split_funcs`. -The following will describe how the split machinery mainly for the row domain. Further information on how columns are defined will follow. +The following vignette will describe how the split machinery works for the row domain. Further information on how columns are defined will follow soon. + +NB: we must remind the reader that `rtables` is still under active development, and it saw the efforts of multiple contributors across different years. Therefore, we could stumble upon legacy mechanisms and a couple of on-going transformations that could look different in the future. + + +## Process and Methods + +We invite the smart developer to use the provided examples as a way to get an "interactive" and dynamic vision of the internal algorithms, as they are routinely executed when constructing tables with `rtables`. This is achieved by heavy use of `browser()` and `debug()` on internal functions between the others (`rtables:::`), as we will see in a moment. We invite also the continuous and autonomous exploration of the multiple `S3` and `S4` objects that constitute the complexity and power of `rtables`. To do so we will use functions like `methods()` (?). + +We explore and analyze the split machinery now with a growing amount of complexity, always following relevant functions and methods throughout their execution. By incrementally discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. + +In practice, the majority of the split engine resides in the source file `split_funs.R` with occasional incursion into `make_split_fun.R` for custom split function creation and rarer references to other more general tabulation files. 
## `do_split` +The split machinery is so fundamental to `rtables` that relevant functions like `do_split` are executed, even when no split is requested. The following example shows how this is done. + ```{r, message=FALSE} library(rtables) +debugonce(rtables:::do_split) +basic_table() %>% + build_table(DM) ``` + +Now, looking at the first function called from `do_split` may give us a good overview of how the split itself is defined. This is, of course. a check-function (`check_validsplit`) that is used to verify if the split is valid for the data. If you search the package for `check_validsplit`, you will find that it is defined as a generic in `split_funs`, where it is applied to the following "split" classes: `VarLevelSplit`, `MultiVarSplit`, `VAnalyzeSplit`, `CompoundSplit`, and `Split`. The virtual class `VAnalyzeSplit` (convention: it starts with "V") defines the main parent of analysis split which we discuss in detail in related vignette (here ?). Already, we can intuit that the `analyze()` calls actually mimic split objects as they create different results under a specific final split (or node). Now, notice that `check_validsplit` is also called in another location, i.e. in the main `tt_dotabulation.R` source file. This is again something related to making the analyze rows as it mainly checks for `VAnalyzeSplit`. We will discuss the other classes when they will appear in our examples. + +For the moment, we see with `class(spl)` (from the main `do_split` browsing option) that we are dealing with an `AllSplit` object. 
+ + From c6f2b005768d71deeb8b0e05aa9b315f811b91c4 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Fri, 21 Apr 2023 17:43:08 +0200 Subject: [PATCH 04/40] more code --- vignettes/dg_split_machinery.Rmd | 171 +++++++++++++++++++++++++++++-- 1 file changed, 165 insertions(+), 6 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index 6d5d415b8..9657d7062 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -22,7 +22,7 @@ repository will work/be correct, but they likely are not in their final form. -## The Split Machinery +# The Split Machinery The scope of this vignette is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. The latter level, also called leaf-level, contains the final partition that is subjected to the analysis functions. More details from the user perspective can be found in relevant vignette `split_functions`(link?) and function documentation like `?split_rows_by` and `?split_funcs`. @@ -33,7 +33,12 @@ NB: we must remind the reader that `rtables` is still under active development, ## Process and Methods -We invite the smart developer to use the provided examples as a way to get an "interactive" and dynamic vision of the internal algorithms, as they are routinely executed when constructing tables with `rtables`. This is achieved by heavy use of `browser()` and `debug()` on internal functions between the others (`rtables:::`), as we will see in a moment. We invite also the continuous and autonomous exploration of the multiple `S3` and `S4` objects that constitute the complexity and power of `rtables`. To do so we will use functions like `methods()` (?). +We invite the smart developer to use the provided examples as a way to get an "interactive" and dynamic vision of the internal algorithms, as they are routinely executed when constructing tables with `rtables`. 
This is achieved by heavy use of `browser()` and `debugonce()` on internal functions between the others (`rtables:::`), as we will see in a moment. We invite also the continuous and autonomous exploration of the multiple `S3` and `S4` objects that constitute the complexity and power of `rtables`. To do so we will use the following useful functions: + +* `methods()`: This function lists the methods that are available for a generic function. `showMethods()` is better for having more detailed information about each method (e.g. inheritance). +* `class()`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for documentation or examples of how to work with the class (?). Also `help()` calls may be informative here, as it will call the documentation of the specific class. +* `str()`: This function provides a detailed summary of the structure of an object, including its class and the names and classes of its components. This can be problematic with some objects in `rtables` as they may depend on a cascade of other complex objects. Similarly, `attributes()` can be used to retrieve useful information, even if storing important variables in this way is currently discouraged and deprecated. +* `summary()`: This can be useful at times if used on objects as it reveals if they are `S4` or `S3`. This can be retrieved also with `mode()`. We explore and analyze the split machinery now with a growing amount of complexity, always following relevant functions and methods throughout their execution. By incrementally discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. @@ -42,17 +47,171 @@ In practice, the majority of the split engine resides in the source file `split_ ## `do_split` -The split machinery is so fundamental to `rtables` that relevant functions like `do_split` are executed, even when no split is requested. 
The following example shows how this is done. +The split machinery is so fundamental to `rtables` that relevant functions like `do_split` are executed, even when no split is requested. The following example shows how we can enter `do_split` and start understanding the class hierarchy and the main split engine. ```{r, message=FALSE} library(rtables) -debugonce(rtables:::do_split) +# debugonce(rtables:::do_split) # Uncomment me to enter the function!!! basic_table() %>% build_table(DM) ``` +We will enter `do_split`. Here, we copied it so to allow the reader to go through the general structure with its enhanced comments and sections. Each section in the code reflects roughly a section of this vignette. + +```{r, eval=FALSE} +### NB This is called at EACH level of recursive splitting +do_split <- function(spl, + df, + vals = NULL, + labels = NULL, + trim = FALSE, + spl_context) { +# CHECKS # + ## this will error if, e.g., df doesn't have columns + ## required by spl, or generally any time the spl + ## can't be applied to df + check_validsplit(spl, df) + +# SPLIT FUNCTION # + ## note the <- here!!! + if(!is.null(splfun <- split_fun(spl))) { + ## Currently the contract is that split_functions take df, vals, labels and + ## return list(values=., datasplit=., labels = .), optionally with + ## an additional extras element + if(func_takes(splfun, ".spl_context")) { + ret <- tryCatch(splfun(df, spl, vals, labels, trim = trim, + .spl_context = spl_context), + error = function(e) e) ## rawvalues(spl_context )) + } else { + ret <- tryCatch(splfun(df, spl, vals, labels, trim = trim), + error = function(e) e) + } + if(is(ret, "error")) { + stop("Error applying custom split function: ", ret$message, "\n\tsplit: ", + class(spl), " (", payloadmsg(spl), ")\n", + "\toccured at path: ", + spl_context_to_disp_path(spl_context), "\n") + } + } else { +# .apply_split_inner # + # This is called when no split function is provided. 
Please note that also when provided, + # this function will be probably called, as far as the main splitting method is not willingly + # modified by the split function. + ret <- .apply_split_inner(df = df, spl = spl, vals = vals, labels = labels, trim = trim) + } + +# EXTRA # + ## this adds .ref_full and .in_ref_col + if(is(spl, "VarLevWBaselineSplit")) + ret <- .add_ref_extras(spl, df, ret) + +# FIXUPVALS # + ## this: + ## - guarantees that ret$values contains SplitValue objects + ## - removes the extras element since its redundant after the above + ## - Ensures datasplit and values lists are named according to labels + ## - ensures labels are character not factor + ret <- .fixupvals(ret) + +# RETURN # + ret +} +``` + + +### Checks and classes + +Now, looking at the first function called from `do_split` may give us a good overview of how the split itself is defined. This is, of course, the check-function (`check_validsplit`) that is used to verify if the split is valid for the data. In the following we will describe step-by-step the split-class hierarchy, but we invite the reader to explore this autonomously in future occasions. + +Lets then search the package for `check_validsplit`, you will find that it is defined as a generic in `split_funs`, where it is applied to the following "split" classes: `VarLevelSplit`, `MultiVarSplit`, `VAnalyzeSplit`, `CompoundSplit`, and `Split`. Another way to find this information, which is more useful for more spread out and complicated objects, is using `showMethods(check_validsplit)`. The virtual class `VAnalyzeSplit` (convention: it starts with "V") defines the main parent of analysis split which we discuss in detail in related vignette (here ?). Already, we can intuit that the `analyze()` calls actually mimic split objects as they create different results under a specific final split (or node). Now, notice that `check_validsplit` is also called in another location, i.e. in the main `tt_dotabulation.R` source file. 
This is again something related to making the analyze rows as it mainly checks for `VAnalyzeSplit`. We will discuss the other classes when they will appear in our examples. + +For the moment, we see with `class(spl)` (from the main `do_split` browsing option) that we are dealing with an `AllSplit` object. By calling `showMethods(check_validsplit)` we will produce the following: + +``` +Function: check_validsplit (package rtables) +spl="AllSplit" + (inherited from: spl="Split") +spl="CompoundSplit" +spl="MultiVarSplit" +spl="Split" +spl="VAnalyzeSplit" +spl="VarLevelSplit" +``` -Now, looking at the first function called from `do_split` may give us a good overview of how the split itself is defined. This is, of course. a check-function (`check_validsplit`) that is used to verify if the split is valid for the data. If you search the package for `check_validsplit`, you will find that it is defined as a generic in `split_funs`, where it is applied to the following "split" classes: `VarLevelSplit`, `MultiVarSplit`, `VAnalyzeSplit`, `CompoundSplit`, and `Split`. The virtual class `VAnalyzeSplit` (convention: it starts with "V") defines the main parent of analysis split which we discuss in detail in related vignette (here ?). Already, we can intuit that the `analyze()` calls actually mimic split objects as they create different results under a specific final split (or node). Now, notice that `check_validsplit` is also called in another location, i.e. in the main `tt_dotabulation.R` source file. This is again something related to making the analyze rows as it mainly checks for `VAnalyzeSplit`. We will discuss the other classes when they will appear in our examples. +We understand that `AllSplit` is a class parent of `Split`. Its definition and constructor resides in `00tabletrees.R`. Reading it structure can be useful to understand how the split object is constructed and handled. 
Please see the comments in the following: + +```{r, eval=FALSE} +setClass("AllSplit", contains = "Split") + +AllSplit <- function(split_label = "", + cfun = NULL, + cformat = NULL, + cna_str = NA_character_, + split_format = NULL, + split_na_str = NA_character_, + split_name = NULL, + extra_args = list(), + indent_mod = 0L, + cindent_mod = 0L, + cvar = "", + cextra_args = list(), + ...) { + if(is.null(split_name)) { # If the split has no name + if(nzchar(split_label)) # (std is "") + split_name <- split_label + else + split_name <- "all obs" # Nor label, a standard split with all + # observations is assigned. + } + new("AllSplit", split_label = split_label, + content_fun = cfun, + content_format = cformat, + content_na_str = cna_str, + split_format = split_format, + split_na_str = split_na_str, + name = split_name, + label_children = FALSE, + extra_args = extra_args, + indent_modifier = as.integer(indent_mod), + content_indent_modifier = as.integer(cindent_mod), + content_var = cvar, + split_label_position = "hidden", + content_extra_args = cextra_args, + page_title_prefix = NA_character_, + child_section_div = NA_character_) +} +``` -For the moment, we see with `class(spl)` (from the main `do_split` browsing option) that we are dealing with an `AllSplit` object. 
+Now lets see if we can find some of these values in our object: +```{r, eval=FALSE} +Browse[2]> str(spl) +Formal class 'AllSplit' [package "rtables"] with 17 slots + ..@ payload : NULL + ..@ name : chr "all obs" + ..@ split_label : chr "" + ..@ split_format : NULL + ..@ split_na_str : chr NA + ..@ split_label_position : chr "hidden" + ..@ content_fun : NULL + ..@ content_format : NULL + ..@ content_na_str : chr NA + ..@ content_var : chr "" + ..@ label_children : logi FALSE + ..@ extra_args : list() + ..@ indent_modifier : int 0 + ..@ content_indent_modifier: int 0 + ..@ content_extra_args : list() + ..@ page_title_prefix : chr NA + ..@ child_section_div : chr NA +``` + +We will describe some of these more in detail when they will be necessary in future examples. Now, we gave you a hint of the complex class hierarchy that makes up `rtables`, but we need to go forward in `do_split`. In our case, being `AllSplit` inherited from `Split`, we are sure that the called function will be the following (note the comment!): +```{r, eval=FALSE} +## default does nothing, add methods as they become +## required +setMethod("check_validsplit", "Split", + function(spl, df) + invisible(NULL)) +``` +### Split function and `.apply_split_inner` \ No newline at end of file From 7d2246aab8d5151d22223fb0d0d894bddc692f17 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Fri, 21 Apr 2023 18:00:43 +0200 Subject: [PATCH 05/40] more --- vignettes/dg_split_machinery.Rmd | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index 9657d7062..e51bbb029 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -116,6 +116,7 @@ do_split <- function(spl, ret } ``` +We will see how input parameters are used and where. The most important ones are `spl` and `df`; the split objects and the input `data.frame`. We invite the reader to try exploring a bit `spl` before continuing. 
### Checks and classes @@ -214,4 +215,8 @@ setMethod("check_validsplit", "Split", invisible(NULL)) ``` -### Split function and `.apply_split_inner` \ No newline at end of file +### Split function and `.apply_split_inner` + +Before diving into custom split functions we need to take a moment to analyze how `.apply_split_inner` works. This function is routinely called, also in the case we do have a split function. Lets see why by entering it with `debugonce(.apply_split_inner)`. Of course, we are still in the `do_split` call coming from the first example. + + From 14f9cb68c35cc43d9c191fb1f1bc279e64755427 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Fri, 21 Apr 2023 19:17:42 +0200 Subject: [PATCH 06/40] modifications after discussions --- vignettes/dg_split_machinery.Rmd | 53 ++++++++++++++++++++++++-------- 1 file changed, 40 insertions(+), 13 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index e51bbb029..37c30883a 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -24,7 +24,7 @@ final form. # The Split Machinery -The scope of this vignette is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. The latter level, also called leaf-level, contains the final partition that is subjected to the analysis functions. More details from the user perspective can be found in relevant vignette `split_functions`(link?) and function documentation like `?split_rows_by` and `?split_funcs`. +The scope of this vignette is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. The latter level, also called leaf-level, contains the final partition that is subjected to the analysis functions. 
More details from the user perspective can be found in relevant vignette `split_functions`(link xxx) and function documentation like `?split_rows_by` and `?split_funcs`. The following vignette will describe how the split machinery works for the row domain. Further information on how columns are defined will follow soon. @@ -35,14 +35,14 @@ NB: we must remind the reader that `rtables` is still under active development, We invite the smart developer to use the provided examples as a way to get an "interactive" and dynamic vision of the internal algorithms, as they are routinely executed when constructing tables with `rtables`. This is achieved by heavy use of `browser()` and `debugonce()` on internal functions between the others (`rtables:::`), as we will see in a moment. We invite also the continuous and autonomous exploration of the multiple `S3` and `S4` objects that constitute the complexity and power of `rtables`. To do so we will use the following useful functions: -* `methods()`: This function lists the methods that are available for a generic function. `showMethods()` is better for having more detailed information about each method (e.g. inheritance). -* `class()`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for documentation or examples of how to work with the class (?). Also `help()` calls may be informative here, as it will call the documentation of the specific class. -* `str()`: This function provides a detailed summary of the structure of an object, including its class and the names and classes of its components. This can be problematic with some objects in `rtables` as they may depend on a cascade of other complex objects. Similarly, `attributes()` can be used to retrieve useful information, even if storing important variables in this way is currently discouraged and deprecated. 
-* `summary()`: This can be useful at times if used on objects as it reveals if they are `S4` or `S3`. This can be retrieved also with `mode()`. +* `methods(generic.function)`: This function lists the methods that are available for a generic function. `showMethods(generic.function)` is better for having more detailed information about each method (e.g. inheritance). +* `class(object)`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for documentation or examples of how to work with the class (xxx). Also `help()` calls may be informative here, as it will call the documentation of the specific class. +* `getClass` (xxx)`str(object, max.level = 2)`: This function provides a detailed summary of the structure of an object, including its class and the names and classes of its components. This can be problematic with some objects in `rtables` as they may depend on a cascade of other complex objects. Similarly, `attributes()` can be used to retrieve useful information, even if storing important variables in this way is currently discouraged and deprecated. +* `summary(object)`: This can be useful at times if used on objects as it reveals if they are `S4` or `S3`. This can be retrieved also with `mode()`. (`is.S4`xxx) We explore and analyze the split machinery now with a growing amount of complexity, always following relevant functions and methods throughout their execution. By incrementally discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. -In practice, the majority of the split engine resides in the source file `split_funs.R` with occasional incursion into `make_split_fun.R` for custom split function creation and rarer references to other more general tabulation files. 
+In practice, the majority of the split engine resides in the source file `R/split_funs.R`, with occasional incursions into `R/make_split_fun.R` for custom split function creation and rarer references to other, more general tabulation files. ## `do_split` @@ -50,12 +50,13 @@ In practice, the majority of the split engine resides in the source file `split_ The split machinery is so fundamental to `rtables` that relevant functions like `do_split` are executed, even when no split is requested. The following example shows how we can enter `do_split` and start understanding the class hierarchy and the main split engine. ```{r, message=FALSE} +## utility fnc xxx library(rtables) # debugonce(rtables:::do_split) # Uncomment me to enter the function!!! basic_table() %>% build_table(DM) ``` -We will enter `do_split`. Here, we copied it so to allow the reader to go through the general structure with its enhanced comments and sections. Each section in the code reflects roughly a section of this vignette. +In the following, we copied `do_split` to allow the reader to go through its general structure, with enhanced comments and sections. Each section in the code roughly reflects a section of this vignette. ```{r, eval=FALSE} ### NB This is called at EACH level of recursive splitting @@ -119,13 +120,13 @@ do_split <- function(spl, We will see how input parameters are used and where. The most important ones are `spl` and `df`; the split objects and the input `data.frame`. We invite the reader to try exploring a bit `spl` before continuing. -### Checks and classes +### Checks and Classes Now, looking at the first function called from `do_split` may give us a good overview of how the split itself is defined. This is, of course, the check-function (`check_validsplit`) that is used to verify if the split is valid for the data. In the following we will describe step-by-step the split-class hierarchy, but we invite the reader to explore this autonomously in future occasions. 
-Lets then search the package for `check_validsplit`, you will find that it is defined as a generic in `split_funs`, where it is applied to the following "split" classes: `VarLevelSplit`, `MultiVarSplit`, `VAnalyzeSplit`, `CompoundSplit`, and `Split`. Another way to find this information, which is more useful for more spread out and complicated objects, is using `showMethods(check_validsplit)`. The virtual class `VAnalyzeSplit` (convention: it starts with "V") defines the main parent of analysis split which we discuss in detail in related vignette (here ?). Already, we can intuit that the `analyze()` calls actually mimic split objects as they create different results under a specific final split (or node). Now, notice that `check_validsplit` is also called in another location, i.e. in the main `tt_dotabulation.R` source file. This is again something related to making the analyze rows as it mainly checks for `VAnalyzeSplit`. We will discuss the other classes when they will appear in our examples. +Let's then search the package for `check_validsplit`: you will find that it is defined as a generic in `split_funs`, where methods are defined for the following "split" classes: `VarLevelSplit`, `MultiVarSplit`, `VAnalyzeSplit`, `CompoundSplit`, and `Split`. Another way to find this information, which is more useful for more spread-out and complicated objects, is using `showMethods(check_validsplit)`. The virtual class `VAnalyzeSplit` (convention: it starts with "V") is the main parent of the analysis splits, which we discuss in detail in the related vignette (here xxx). Already, we can intuit that the `analyze()` calls actually mimic split objects, as they create different results under a specific final split (or node). Now, notice that `check_validsplit` is also called in another location, i.e. in the main `R/tt_dotabulation.R` source file. This is again related to making the "analyze" rows, as it mainly checks for `VAnalyzeSplit` (link to tabulation dev guide xxx). 
We will discuss the other classes when they will appear in our examples (link to class hierarchy xxx). -For the moment, we see with `class(spl)` (from the main `do_split` browsing option) that we are dealing with an `AllSplit` object. By calling `showMethods(check_validsplit)` we will produce the following: +For the moment, we see with `class(spl)` (from the main `do_split` function) that we are dealing with an `AllSplit` object. By calling `showMethods(check_validsplit)` we will produce the following: (xxx add getClass inheritance call, describe that we have specific method for multivarsplit but not allsplit -> only inherited from split) ``` Function: check_validsplit (package rtables) @@ -138,7 +139,7 @@ spl="VAnalyzeSplit" spl="VarLevelSplit" ``` -We understand that `AllSplit` is a class parent of `Split`. Its definition and constructor resides in `00tabletrees.R`. Reading it structure can be useful to understand how the split object is constructed and handled. Please see the comments in the following: +We understand that `AllSplit` is a class parent of `Split`. This is a virtual class (xxx). All class definitions and constructors reside in `R/00tabletrees.R`, and also `AllSplit` one (xxx rephrase). Reading it structure can be useful to understand how the split object is constructed and handled. Please see the comments in the following: ```{r, eval=FALSE} setClass("AllSplit", contains = "Split") @@ -182,7 +183,7 @@ AllSplit <- function(split_label = "", } ``` -Now lets see if we can find some of these values in our object: +Now lets see if we can find some of these values in our object: (getClass xxx) ```{r, eval=FALSE} Browse[2]> str(spl) Formal class 'AllSplit' [package "rtables"] with 17 slots @@ -205,7 +206,7 @@ Formal class 'AllSplit' [package "rtables"] with 17 slots ..@ child_section_div : chr NA ``` -We will describe some of these more in detail when they will be necessary in future examples. 
Now, we gave you a hint of the complex class hierarchy that makes up `rtables`, but we need to go forward in `do_split`. In our case, being `AllSplit` inherited from `Split`, we are sure that the called function will be the following (note the comment!): +We will describe some of these more in detail when they will be necessary in future examples. Now, we gave you a hint of the complex class hierarchy that makes up `rtables`, but we need to go forward in `do_split`. In our case, being `AllSplit` inherited from `Split`, we are sure that the called function will be the following (read the comment!): ```{r, eval=FALSE} ## default does nothing, add methods as they become @@ -219,4 +220,30 @@ setMethod("check_validsplit", "Split", Before diving into custom split functions we need to take a moment to analyze how `.apply_split_inner` works. This function is routinely called, also in the case we do have a split function. Lets see why by entering it with `debugonce(.apply_split_inner)`. Of course, we are still in the `do_split` call coming from the first example. +example with simple split +example with simple split function + + + +Final examples with `MultiVarSplit` & `CompoundSplit` +These generics do different things for different classes. 
the power of S4 +```{r, eval=FALSE} +setGeneric(".applysplit_rawvals", # these with dots are ONLY internals or for devs + function(spl, df) standardGeneric(".applysplit_rawvals")) + +setGeneric(".applysplit_datapart", + function(spl, df, vals) standardGeneric(".applysplit_datapart")) + +setGeneric(".applysplit_extras", + function(spl, df, vals) standardGeneric(".applysplit_extras")) + +setGeneric(".applysplit_partlabels", + function(spl, df, vals, labels) standardGeneric(".applysplit_partlabels")) + +setGeneric("check_validsplit", # may be useful to be called one day (not exported bc it may change) + function(spl, df) standardGeneric("check_validsplit")) + +setGeneric(".applysplit_ref_vals", + function(spl, df, vals) standardGeneric(".applysplit_ref_vals")) +``` From 7377e2e977ac8fc7f8f15a870243b3312a2a158e Mon Sep 17 00:00:00 2001 From: Melkiades Date: Thu, 27 Apr 2023 15:32:34 +0200 Subject: [PATCH 07/40] corrections from Gabe and rephrasing --- vignettes/dg_split_machinery.Rmd | 54 ++++++++++++++++++-------------- 1 file changed, 30 insertions(+), 24 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index 37c30883a..6f24595a7 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -24,23 +24,22 @@ final form. # The Split Machinery -The scope of this vignette is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. The latter level, also called leaf-level, contains the final partition that is subjected to the analysis functions. More details from the user perspective can be found in relevant vignette `split_functions`(link xxx) and function documentation like `?split_rows_by` and `?split_funcs`. +The scope of this vignette is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. 
The latter level, also called leaf-level, contains the final partition that is subjected to the analysis functions. More details from the user perspective can be found in relevant vignette `vignette("split_functions")` and function documentation like `?split_rows_by` and `?split_funcs`. The following vignette will describe how the split machinery works for the row domain. Further information on how columns are defined will follow soon. -NB: we must remind the reader that `rtables` is still under active development, and it saw the efforts of multiple contributors across different years. Therefore, we could stumble upon legacy mechanisms and a couple of on-going transformations that could look different in the future. +NB: we must remind the reader that `rtables` is still under active development, and it has seen the efforts of multiple contributors across different years. Therefore, there may be legacy mechanisms and a couple of on-going transformations that could look different in the future. ## Process and Methods -We invite the smart developer to use the provided examples as a way to get an "interactive" and dynamic vision of the internal algorithms, as they are routinely executed when constructing tables with `rtables`. This is achieved by heavy use of `browser()` and `debugonce()` on internal functions between the others (`rtables:::`), as we will see in a moment. We invite also the continuous and autonomous exploration of the multiple `S3` and `S4` objects that constitute the complexity and power of `rtables`. To do so we will use the following useful functions: +We invite the smart developer to use the provided examples as a way to get an "interactive" and dynamic vision of the internal algorithms, as they are routinely executed when constructing tables with `rtables`. This is achieved by using `browser()` and `debugonce()` on internal and exported functions (`rtables:::` or `rtables::`), as we will see in a moment. 
We invite also the continuous and autonomous exploration of the multiple `S3` and `S4` objects that constitute the complexity and power of `rtables`. To do so we will use the following useful functions: -* `methods(generic.function)`: This function lists the methods that are available for a generic function. `showMethods(generic.function)` is better for having more detailed information about each method (e.g. inheritance). -* `class(object)`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for documentation or examples of how to work with the class (xxx). Also `help()` calls may be informative here, as it will call the documentation of the specific class. -* `getClass` (xxx)`str(object, max.level = 2)`: This function provides a detailed summary of the structure of an object, including its class and the names and classes of its components. This can be problematic with some objects in `rtables` as they may depend on a cascade of other complex objects. Similarly, `attributes()` can be used to retrieve useful information, even if storing important variables in this way is currently discouraged and deprecated. -* `summary(object)`: This can be useful at times if used on objects as it reveals if they are `S4` or `S3`. This can be retrieved also with `mode()`. (`is.S4`xxx) +* `methods(generic.function)`: This function lists the methods that are available for a generic function. For `S4` generic functions, `showMethods(generic.function)` is giving a more detailed information about each method (e.g. inheritance). +* `class(object)`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for its documentation and examples. Also `help(class)` calls may be informative here, as it will call the documentation of the specific class. 
+* `getClass(class)`: This describes in a compact way the type of class, the slots that it has, and the relationships that it may have with the other classes that may inherit or be inherited by it. With `getClass(object)`, instead, we can see to which values the slots of the object are assigned. It is possible to use `str(object, max.level = 2)` to see a less formal and more compact descriptions of the slots, but it may be problematic when there are one or more objects in the class slots. Hence, the maximum number of level should always be limited to 2 or 3 (`max.level = 2`). Similarly, `attributes()` can be used to retrieve some information, but we need to remember that storing important variables in this way is not encouraged. Information regarding the type of class can be retrieved with `mode()` and indirectly by `summary()` and `is.S4()`. -We explore and analyze the split machinery now with a growing amount of complexity, always following relevant functions and methods throughout their execution. By incrementally discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. +We explore and analyze the split machinery now with a growing amount of complexity, always following relevant functions and methods throughout their execution. By going from basic to complex and by discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. In practice, the majority of the split engine resides in the source file `R/split_funs.R` with occasional incursion into `R/make_split_fun.R` for custom split function creation and rarer references to other more general tabulation files. @@ -56,6 +55,7 @@ library(rtables) basic_table() %>% build_table(DM) ``` + In the following, we copied it so to allow the reader to go through the general structure with its enhanced comments and sections. Each section in the code reflects roughly a section of this vignette. 
```{r, eval=FALSE} @@ -67,12 +67,17 @@ do_split <- function(spl, trim = FALSE, spl_context) { # CHECKS # - ## this will error if, e.g., df doesn't have columns - ## required by spl, or generally any time the spl - ## can't be applied to df + ## This will error if, e.g., df does not have columns + ## required by spl, or generally any time the split (spl) + ## can not be applied to df check_validsplit(spl, df) # SPLIT FUNCTION # + ## In special cases, we need to partition data (split) + ## in a very specific way, e.g. depending on the data or + ## external values. These can be achieved by using a custom + ## split function. + ## note the <- here!!! if(!is.null(splfun <- split_fun(spl))) { ## Currently the contract is that split_functions take df, vals, labels and @@ -94,9 +99,9 @@ do_split <- function(spl, } } else { # .apply_split_inner # - # This is called when no split function is provided. Please note that also when provided, - # this function will be probably called, as far as the main splitting method is not willingly - # modified by the split function. + ## This is called when no split function is provided. Please note that also when provided, + ## this function will be probably called, as far as the main splitting method is not willingly + ## modified by the split function. ret <- .apply_split_inner(df = df, spl = spl, vals = vals, labels = labels, trim = trim) } @@ -117,16 +122,16 @@ do_split <- function(spl, ret } ``` -We will see how input parameters are used and where. The most important ones are `spl` and `df`; the split objects and the input `data.frame`. We invite the reader to try exploring a bit `spl` before continuing. +We will see how input parameters are used and where. The most important ones are `spl` and `df`; the split objects and the input `data.frame`. ### Checks and Classes -Now, looking at the first function called from `do_split` may give us a good overview of how the split itself is defined. 
This is, of course, the check-function (`check_validsplit`) that is used to verify if the split is valid for the data. In the following we will describe step-by-step the split-class hierarchy, but we invite the reader to explore this autonomously in future occasions. +We will start by looking at the first function called from `do_split`. This may give us a good overview of how the split itself is defined. This is, of course, the check-function (`check_validsplit`) that is used to verify if the split is valid for the data. In the following we will describe step-by-step the split-class hierarchy, but we invite the reader to explore this autonomously in future occasions. -Lets then search the package for `check_validsplit`, you will find that it is defined as a generic in `split_funs`, where it is applied to the following "split" classes: `VarLevelSplit`, `MultiVarSplit`, `VAnalyzeSplit`, `CompoundSplit`, and `Split`. Another way to find this information, which is more useful for more spread out and complicated objects, is using `showMethods(check_validsplit)`. The virtual class `VAnalyzeSplit` (convention: it starts with "V") defines the main parent of analysis split which we discuss in detail in related vignette (here xxx). Already, we can intuit that the `analyze()` calls actually mimic split objects as they create different results under a specific final split (or node). Now, notice that `check_validsplit` is also called in another location, i.e. in the main `R/tt_dotabulation.R` source file. This is again something related to making the "analyze" rows as it mainly checks for `VAnalyzeSplit` (link to tabulation dev guide xxx). We will discuss the other classes when they will appear in our examples (link to class hierarchy xxx). 
+Lets then search the package for `check_validsplit`, you will find that it is defined as a generic in `R/split_funs.R`, where it is applied to the following "split" classes: `VarLevelSplit`, `MultiVarSplit`, `VAnalyzeSplit`, `CompoundSplit`, and `Split`. Another way to find this information, which is more useful for more spread out and complicated objects, is using `showMethods(check_validsplit)`. The virtual class `VAnalyzeSplit` (convention: it starts with "V") defines the main parent of analysis split which we discuss in detail in related vignette `vignette()` (xxx). From this, we can intuit that the `analyze()` calls actually mimic split objects as they create different results under a specific final split (or node). Now, notice that `check_validsplit` is also called in another location, i.e. in the main `R/tt_dotabulation.R` source file. This is again something related to making the "analyze" rows as it mainly checks for `VAnalyzeSplit` (link to tabulation dev guide xxx). We will discuss the other classes when they will appear in our examples (link to class hierarchy xxx). -For the moment, we see with `class(spl)` (from the main `do_split` function) that we are dealing with an `AllSplit` object. By calling `showMethods(check_validsplit)` we will produce the following: (xxx add getClass inheritance call, describe that we have specific method for multivarsplit but not allsplit -> only inherited from split) +For the moment, we see with `class(spl)` (from the main `do_split` function) that we are dealing with an `AllSplit` object. By calling `showMethods(check_validsplit)` we will produce the following: ``` Function: check_validsplit (package rtables) @@ -138,8 +143,8 @@ spl="Split" spl="VAnalyzeSplit" spl="VarLevelSplit" ``` - -We understand that `AllSplit` is a class parent of `Split`. This is a virtual class (xxx). All class definitions and constructors reside in `R/00tabletrees.R`, and also `AllSplit` one (xxx rephrase). 
Reading it structure can be useful to understand how the split object is constructed and handled. Please see the comments in the following: +(xxx add getClass inheritance call, describe that we have specific method for multivarsplit but not allsplit -> only inherited from split) +It means that each of the listed classes has a dedicated definition of `check_validsplit` that may largely differ from the others. Only the class `AllSplit` has no method of its own: it inherits the one defined for the `Split` class. Therefore, we understand that `AllSplit` is a child class of `Split`. `Split` is one of the first virtual classes defined in the package, and it is the only one that does not carry the "V" prefix. All of these classes are defined along with their constructors in `R/00tabletrees.R`. Reading how `AllSplit` is structured can be a useful example to understand how split objects are expected to work. Please see the comments in the following: ```{r, eval=FALSE} setClass("AllSplit", contains = "Split") @@ -183,9 +188,10 @@ AllSplit <- function(split_label = "", } ``` -Now lets see if we can find some of these values in our object: (getClass xxx) +We can also print this information by calling `getClass("AllSplit")` for the general slot definition, or by calling `getClass(spl)` to also see all the values. Note that the first call will also give a lot of information about the class hierarchy. We will discuss the majority of these by the end of this document. Now let's see if we can find some of the values described in the constructor in our object. To do so we will show here the more compact representation given by `str`. When there are multiple and hierarchical slots that contain objects themselves, calling `str` will be much less informative. 
+ ```{r, eval=FALSE} -Browse[2]> str(spl) +Browse[2]> str(spl, max.level = 2) Formal class 'AllSplit' [package "rtables"] with 17 slots ..@ payload : NULL ..@ name : chr "all obs" @@ -206,7 +212,7 @@ Formal class 'AllSplit' [package "rtables"] with 17 slots ..@ child_section_div : chr NA ``` -We will describe some of these more in detail when they will be necessary in future examples. Now, we gave you a hint of the complex class hierarchy that makes up `rtables`, but we need to go forward in `do_split`. In our case, being `AllSplit` inherited from `Split`, we are sure that the called function will be the following (read the comment!): +Detail about these slots will be necessary in future examples, and we will deal with them at that time. Now, we gave you a hint of the complex class hierarchy that makes up `rtables`, and how to explore it autonomously. Lets go forward in `do_split`. In our case, being `AllSplit` inherited from `Split`, we are sure that the called function will be the following (read the comment!): ```{r, eval=FALSE} ## default does nothing, add methods as they become @@ -218,7 +224,7 @@ setMethod("check_validsplit", "Split", ### Split function and `.apply_split_inner` -Before diving into custom split functions we need to take a moment to analyze how `.apply_split_inner` works. This function is routinely called, also in the case we do have a split function. Lets see why by entering it with `debugonce(.apply_split_inner)`. Of course, we are still in the `do_split` call coming from the first example. +Before diving into custom split functions we need to take a moment to analyze how `.apply_split_inner` works. This function is routinely called, also in the case we do have a split function. Lets see why this can be the case by entering it with `debugonce(.apply_split_inner)`. Of course, we are still browsing `do_split` in debug mode from the first example. 
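As a rough mental model of what an inner split step produces, the following base R sketch partitions a `data.frame` by the values of one variable and returns a plain list of values, data subsets, and labels, optionally trimming empty partitions. The function `toy_split_inner` is a hypothetical helper written for this illustration only. It is a simplification under stated assumptions: the real `.apply_split_inner` dispatches on `S4` generics, also returns `extras`, and applies extra conditions before trimming.

```r
# Hypothetical helper, NOT part of rtables: splits df by the values of `var`
toy_split_inner <- function(df, var, trim = FALSE) {
  x <- df[[var]]

  # Candidate split values: all factor levels (even unobserved ones),
  # or the sorted unique values for non-factor columns
  vals <- if (is.factor(x)) levels(x) else sort(unique(as.character(x)))

  # One data subset per split value
  dpart <- lapply(vals, function(v) df[!is.na(x) & x == v, , drop = FALSE])
  labels <- vals

  # Optionally drop the partitions that contain no observations
  if (trim) {
    hasdata <- vapply(dpart, function(d) nrow(d) > 0, logical(1))
    dpart <- dpart[hasdata]
    vals <- vals[hasdata]
    labels <- labels[hasdata]
  }

  list(values = vals, datasplit = dpart, labels = labels)
}

df <- data.frame(
  ARM = factor(c("A", "A", "B"), levels = c("A", "B", "C")),
  AGE = c(30, 41, 55)
)

full <- toy_split_inner(df, "ARM")
trimmed <- toy_split_inner(df, "ARM", trim = TRUE)

full$values                                # all levels, including the empty "C"
vapply(full$datasplit, nrow, integer(1))   # rows per partition
trimmed$values                             # empty partition removed
```

Keeping values, data subsets, and labels as parallel vectors of the same length is what makes it possible to subset or reorder all of them with a single index.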
example with simple split From c504274decb1c0b26e00a138f73fafde9c38d0d0 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Thu, 27 Apr 2023 16:04:05 +0200 Subject: [PATCH 08/40] into inner --- vignettes/dg_split_machinery.Rmd | 110 ++++++++++++++++++++++++++++++- 1 file changed, 108 insertions(+), 2 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index 6f24595a7..7cd12c31a 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -143,7 +143,6 @@ spl="Split" spl="VAnalyzeSplit" spl="VarLevelSplit" ``` -(xxx add getClass inheritance call, describe that we have specific method for multivarsplit but not allsplit -> only inherited from split) It means that each of the listed classes has a dedicated definition of `check_validsplit` that may largely differ from the others. Only the class `AllSplit` has no method of its own: it inherits the one defined for the `Split` class. Therefore, we understand that `AllSplit` is a child class of `Split`. `Split` is one of the first virtual classes defined in the package, and it is the only one that does not carry the "V" prefix. All of these classes are defined along with their constructors in `R/00tabletrees.R`. Reading how `AllSplit` is structured can be a useful example to understand how split objects are expected to work. Please see the comments in the following: ```{r, eval=FALSE} @@ -224,7 +223,114 @@ setMethod("check_validsplit", "Split", ### Split function and `.apply_split_inner` -Before diving into custom split functions we need to take a moment to analyze how `.apply_split_inner` works. This function is routinely called, also in the case we do have a split function. Lets see why this can be the case by entering it with `debugonce(.apply_split_inner)`. Of course, we are still browsing `do_split` in debug mode from the first example. 
+Before diving into custom split functions we need to take a moment to analyze how `.apply_split_inner` works. This function is routinely called, also in the case we do have a split function. Lets see why this can be the case by entering it with `debugonce(.apply_split_inner)`. Of course, we are still browsing `do_split` in debug mode from the first example. We printed and commented it in the following: + +```{r, eval=FALSE} +.apply_split_inner <- function(spl, df, vals = NULL, labels = NULL, trim = FALSE) { + + ## try to calculate values first. Most of the time we can + if(is.null(vals)) + vals <- .applysplit_rawvals(spl, df) + extr <- .applysplit_extras(spl, df, vals) + + if(is.null(vals)) { + return(list(values = list(), + datasplit = list(), + labels = list(), + extras = list())) + } + + dpart <- .applysplit_datapart(spl, df, vals) + + if(is.null(labels)) + labels <- .applysplit_partlabels(spl, df, vals, labels) + else + stopifnot(names(labels) == names(vals)) + ## get rid of columns that would not have any + ## observations. + ## + ## But only if there were any rows to start with + ## if not we're in a manually constructed table + ## column tree + if(trim) { + hasdata <- sapply(dpart, function(x) nrow(x) > 0) + if(nrow(df) > 0 && length(dpart) > sum(hasdata)) { #some empties + dpart <- dpart[hasdata] + vals <- vals[hasdata] + extr <- extr[hasdata] + labels <- labels[hasdata] + } + } + + if(is.null(spl_child_order(spl)) || is(spl, "AllSplit")) { + vord <- seq_along(vals) + } else { + vord <- match(spl_child_order(spl), + vals) + vord <- vord[!is.na(vord)] + } + + + ## FIXME: should be an S4 object, not a list + ret <- list(values = vals[vord], + datasplit = dpart[vord], + labels = labels[vord], + extras = extr[vord]) + ret +} +``` + +After reading `.apply_split_inner`, we see that there are some fundamental functions, defined strictly for internal use (convention: they start with ".") that are generics and depend on the kind of split in input. 
`R/split_funs.R` helpfully groups their generic definitions at the beginning of the file. These functions are the main dispatchers for the majority of the split machinery. This is a clear example that shows how using `S4` logic helps clarity and flexibility in programming, allowing for easy extension of the program. For compactness we also show the `showMethods` result for each generic. + +```{r, eval=FALSE} +setGeneric(".applysplit_rawvals", + function(spl, df) standardGeneric(".applysplit_rawvals")) +# Browse[2]> showMethods(.applysplit_rawvals) +# Function: .applysplit_rawvals (package rtables) +# spl="AllSplit" +# spl="ManualSplit" +# spl="MultiVarSplit" +# spl="VAnalyzeSplit" +# spl="VarLevelSplit" +# spl="VarStaticCutSplit" +# Nothing here is inherited from the virtual class Split!!! + +setGeneric(".applysplit_datapart", + function(spl, df, vals) standardGeneric(".applysplit_datapart")) +# Same as .applysplit_rawvals + +setGeneric(".applysplit_extras", + function(spl, df, vals) standardGeneric(".applysplit_extras")) +# Browse[2]> showMethods(.applysplit_extras) +# Function: .applysplit_extras (package rtables) +# spl="AllSplit" +# (inherited from: spl="Split") +# spl="Split" +# This means there is only a function for the virtual class Split!!! + +setGeneric(".applysplit_partlabels", + function(spl, df, vals, labels) standardGeneric(".applysplit_partlabels")) +# Browse[2]> showMethods(.applysplit_partlabels) +# Function: .applysplit_partlabels (package rtables) +# spl="AllSplit" +# (inherited from: spl="Split") +# spl="MultiVarSplit" +# spl="Split" +# spl="VarLevelSplit" + +setGeneric("check_validsplit", # our friend + function(spl, df) standardGeneric("check_validsplit")) +# Note: check_validsplit is an internal function, but it may be exported one +# day. That is why it does not have the "." prefix. 
+ +setGeneric(".applysplit_ref_vals", + function(spl, df, vals) standardGeneric(".applysplit_ref_vals")) +# Browse[2]> showMethods(.applysplit_ref_vals) +# Function: .applysplit_ref_vals (package rtables) +# spl="Split" +# spl="VarLevWBaselineSplit" + +``` example with simple split From 6b41e3c2bd7977964a05d4e5679e42d90d9f05c4 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 3 May 2023 09:26:56 +0200 Subject: [PATCH 09/40] temporary update --- vignettes/dg_split_machinery.Rmd | 72 ++++++++++++++++++++++++++++---- 1 file changed, 63 insertions(+), 9 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index 7cd12c31a..25e46c082 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -32,10 +32,11 @@ NB: we must remind the reader that `rtables` is still under active development, ## Process and Methods +(xxx reference to class vignette with this) We invite the smart developer to use the provided examples as a way to get an "interactive" and dynamic vision of the internal algorithms, as they are routinely executed when constructing tables with `rtables`. This is achieved by using `browser()` and `debugonce()` on internal and exported functions (`rtables:::` or `rtables::`), as we will see in a moment. We invite also the continuous and autonomous exploration of the multiple `S3` and `S4` objects that constitute the complexity and power of `rtables`. To do so we will use the following useful functions: -* `methods(generic.function)`: This function lists the methods that are available for a generic function. For `S4` generic functions, `showMethods(generic.function)` is giving a more detailed information about each method (e.g. inheritance). +* `methods(generic.function)`: This function lists the methods that are available for a generic function. For `S4` generic functions, `showMethods(generic.function)` is giving a more detailed information about each method (e.g. inheritance). 
(xxx methods and showMethods are not showing the same. Second is showing only S4) * `class(object)`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for its documentation and examples. Also `help(class)` calls may be informative here, as it will call the documentation of the specific class. * `getClass(class)`: This describes in a compact way the type of class, the slots that it has, and the relationships that it may have with the other classes that may inherit or be inherited by it. With `getClass(object)`, instead, we can see to which values the slots of the object are assigned. It is possible to use `str(object, max.level = 2)` to see a less formal and more compact descriptions of the slots, but it may be problematic when there are one or more objects in the class slots. Hence, the maximum number of level should always be limited to 2 or 3 (`max.level = 2`). Similarly, `attributes()` can be used to retrieve some information, but we need to remember that storing important variables in this way is not encouraged. Information regarding the type of class can be retrieved with `mode()` and indirectly by `summary()` and `is.S4()`. @@ -49,10 +50,9 @@ In practice, the majority of the split engine resides in the source file `R/spli The split machinery is so fundamental to `rtables` that relevant functions like `do_split` are executed, even when no split is requested. The following example shows how we can enter `do_split` and start understanding the class hierarchy and the main split engine. ```{r, message=FALSE} -## utility fnc xxx library(rtables) # debugonce(rtables:::do_split) # Uncomment me to enter the function!!! -basic_table() %>% +basic_table() %>% build_table(DM) ``` @@ -122,7 +122,7 @@ do_split <- function(spl, ret } ``` -We will see how input parameters are used and where. The most important ones are `spl` and `df`; the split objects and the input `data.frame`. 
+We will see how input parameters are used and where. The most important ones are `spl` and `df`: the split object and the input `data.frame`. ### Checks and Classes @@ -187,7 +187,7 @@ AllSplit <- function(split_label = "", } ``` -We can also print this information by calling `getClass("AllSplit")` for the general slot definition, or by calling `getClass(spl)` to also see all the values. Note that the first call will also give a lot of information about the class hierarchy. We will discuss the majority of these by the end of this document. Now let's see if we can find some of the values described in the constructor in our object. To do so we will show here the more compact representation given by `str`. When there are multiple and hierarchical slots that contain objects themselves, calling `str` will be much less informative. +We can also print this information by calling `getClass("AllSplit")` for the general slot definition, or by calling `getClass(spl)` to also see all the values. Note that the first call will also give a lot of information about the class hierarchy. For more information regarding class hierarchy, please refer to the relevant vignette (xxx). We will discuss the majority of the slots by the end of this document. Now let's see if we can find some of the values described in the constructor in our object. To do so we will show here the more compact representation given by `str`. When there are multiple and hierarchical slots that contain objects themselves, calling `str` will be much less informative, or not informative at all, if the maximum level of nesting is not set (e.g. `max.level = 2`). ```{r, eval=FALSE} Browse[2]> str(spl, max.level = 2) @@ -211,7 +211,7 @@ Formal class 'AllSplit' [package "rtables"] with 17 slots ..@ child_section_div : chr NA ``` -Detail about these slots will be necessary in future examples, and we will deal with them at that time. Now, we gave you a hint of the complex class hierarchy that makes up `rtables`, and how to explore it autonomously. Lets go forward in `do_split`. In our case, being `AllSplit` inherited from `Split`, we are sure that the called function will be the following (read the comment!): +Details about these slots will be necessary in future examples, and we will deal with them at that time. 
Now, we gave you a hint of the complex class hierarchy that makes up `rtables`, and how to explore it autonomously. Lets go forward in `do_split`. In our case, being `AllSplit` inherited from `Split`, we are sure that the called function will be the following (read the comment!): +Details about these slots will be necessary in future examples, and we will deal with them at that time. Now, we gave you a hint of the complex class hierarchy that makes up `rtables`, and how to explore it autonomously. Lets go forward in `do_split`. In our case, being `AllSplit` inherited from `Split`, we are sure that the called function will be the following (read the comment!): ```{r, eval=FALSE} ## default does nothing, add methods as they become @@ -227,12 +227,19 @@ Before diving into custom split functions we need to take a moment to analyze ho ```{r, eval=FALSE} .apply_split_inner <- function(spl, df, vals = NULL, labels = NULL, trim = FALSE) { - + # INPUTS # + # In this case .applysplit_rawvals will attempt finding the split values if vals is NULL. + # Please notice that they might be a non-mutually exclusive set or subset of elements that + # will constitute the split. + ## try to calculate values first. Most of the time we can if(is.null(vals)) vals <- .applysplit_rawvals(spl, df) + + # This call extracts extra parameters from the split, according to the split values extr <- .applysplit_extras(spl, df, vals) + # If there are no values to do the split upon, we return an empty final split if(is.null(vals)) { return(list(values = list(), datasplit = list(), @@ -283,7 +290,9 @@ Before diving into custom split functions we need to take a moment to analyze ho After reading `.apply_split_inner`, we see that there are some fundamental functions, defined strictly for internal use (convention: they start with ".") that are generics and depend on the kind of split in input. `R/split_funs.R` is very kind and group their generic definition at the beginning of the file. 
These functions are the main dispatcher for the majority of the split machinery. This is a clear example that shows how using `S4` logic helps clarity and flexibility in programming, allowing for easy extension of the program. For compactness we show also the `showMethods` result for each generic. ```{r, eval=FALSE} -setGeneric(".applysplit_rawvals", +# Retrieves the values that will constitute the splits (facets), not necessarily a unique list. +# They could come from the data cuts for example -> it can be anything if it produces a set of strings. +setGeneric(".applysplit_rawvals", function(spl, df) standardGeneric(".applysplit_rawvals")) # Browse[2]> showMethods(.applysplit_rawvals) # Function: .applysplit_rawvals (package rtables) @@ -295,10 +304,12 @@ setGeneric(".applysplit_rawvals", # spl="VarStaticCutSplit" # Nothing here is inherited from the virtual class Split!!! +# Contains the subset of the data (default, but these can overlap, can also NOT be mutually exclusive). setGeneric(".applysplit_datapart", function(spl, df, vals) standardGeneric(".applysplit_datapart")) # Same as .applysplit_rawvals +# Extract the extra parameter for the split setGeneric(".applysplit_extras", function(spl, df, vals) standardGeneric(".applysplit_extras")) # Browse[2]> showMethods(.applysplit_extras) @@ -306,8 +317,10 @@ setGeneric(".applysplit_extras", # spl="AllSplit" # (inherited from: spl="Split") # spl="Split" -# This means there is only a function for the virtual class Split!!! +# This means there is only a function for the virtual class Split. +# So all splits behaves the same!!! +# Split label retrieval and assignment if visible. 
setGeneric(".applysplit_partlabels", function(spl, df, vals, labels) standardGeneric(".applysplit_partlabels")) # Browse[2]> showMethods(.applysplit_partlabels) @@ -332,10 +345,51 @@ setGeneric(".applysplit_ref_vals", ``` +Now, we know that `.applysplit_extras` is the function that will be called first because we did not specify any `vals` and it is therefore `NULL`. This is a generic function as it can be seen by `showMethod(.applysplit_extras)`. It is indeed an `S4` generics and its source code can be determined by the following: + +```{r, eval=FALSE} +Browse[3]> getMethod(".applysplit_rawvals", "AllSplit") +Method Definition: + +function (spl, df) +obj_name(spl) + + + +Signatures: + spl +target "AllSplit" +defined "AllSplit" + +# What is obj_name -> slot in spl +Browse[3]> obj_name(spl) +[1] "all obs" + +# coming from +Browse[3]> getMethod("obj_name", "Split") +Method Definition: + +function (obj) +obj@name ##### Slot that we could see from str(spl, max.level = 2) + + + +Signatures: + obj +target "Split" +defined "Split" +``` + +Then we have `.applysplit_extras` that will be covered in later sections and simply extracts the extra arguments from the split objects and assign them to their relative split values. If no split values are still available, the function will exit here with an empty split. Otherwise the data will be divided in different splits or data subsets (facets) with `.applysplit_datapart`. In our current example the resulting list comprises the whole input data set (i.e. do `getMethod(".applysplit_datapart", "AllSplit")` and a list will be evident: `function (spl, df, vals) list(df)`). + +Next, split labels are checked. If they are not present split values (`vals`) will be used with `.applysplit_partlabels` that, in the case of it being applied to a `Split` object, it translates into `as.character(vals)`. Otherwise, the inserted labels are checked against the name of split values. 
+ example with simple split example with simple split function +example with simple split function + extra args (exploration too) + Final examples with `MultiVarSplit` & `CompoundSplit` From a1c1d07a06223f0623017797555a7162cabc6f43 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 3 May 2023 15:01:43 +0200 Subject: [PATCH 10/40] going further --- vignettes/dg_split_machinery.Rmd | 173 +++++++++++++++++++++++++++---- 1 file changed, 152 insertions(+), 21 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index 25e46c082..b3c27c589 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -36,9 +36,10 @@ NB: we must remind the reader that `rtables` is still under active development, We invite the smart developer to use the provided examples as a way to get an "interactive" and dynamic vision of the internal algorithms, as they are routinely executed when constructing tables with `rtables`. This is achieved by using `browser()` and `debugonce()` on internal and exported functions (`rtables:::` or `rtables::`), as we will see in a moment. We invite also the continuous and autonomous exploration of the multiple `S3` and `S4` objects that constitute the complexity and power of `rtables`. To do so we will use the following useful functions: -* `methods(generic.function)`: This function lists the methods that are available for a generic function. For `S4` generic functions, `showMethods(generic.function)` is giving a more detailed information about each method (e.g. inheritance). (xxx methods and showMethods are not showing the same. Second is showing only S4) -* `class(object)`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for its documentation and examples. Also `help(class)` calls may be informative here, as it will call the documentation of the specific class. 
+* `methods(generic_function)`: This function lists the methods that are available for a generic function. Specifically for `S4` generic functions, `showMethods(generic_function)` is giving a more detailed information about each method (e.g. inheritance). +* `class(object)`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for its documentation and examples. Indeed, `help(class)` may be informative, as it will call the documentation of the specific class. Similarly, the `?` operator will call the disambiguation page that delivers you to different `S4` methods. For `S3` methods it is necessary to postfix the class name with a dot (e.g. `?summary.lm`). * `getClass(class)`: This describes in a compact way the type of class, the slots that it has, and the relationships that it may have with the other classes that may inherit or be inherited by it. With `getClass(object)`, instead, we can see to which values the slots of the object are assigned. It is possible to use `str(object, max.level = 2)` to see a less formal and more compact descriptions of the slots, but it may be problematic when there are one or more objects in the class slots. Hence, the maximum number of level should always be limited to 2 or 3 (`max.level = 2`). Similarly, `attributes()` can be used to retrieve some information, but we need to remember that storing important variables in this way is not encouraged. Information regarding the type of class can be retrieved with `mode()` and indirectly by `summary()` and `is.S4()`. +*`getAnywhere(function)` is very useful to get the source code of internal functions and specific generics. It works very well with `S3` methods, and will in any case display for each of the found methods, its relevant namespace. Similarly, `getMethod(S4_generic, S4_class)` can retrieve the source code of `S4` methods that are specific to a class. 
We explore and analyze the split machinery now with a growing amount of complexity, always following relevant functions and methods throughout their execution. By going from basic to complex and by discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. @@ -227,15 +228,17 @@ Before diving into custom split functions we need to take a moment to analyze ho ```{r, eval=FALSE} .apply_split_inner <- function(spl, df, vals = NULL, labels = NULL, trim = FALSE) { - # INPUTS # + # - INPUTS - # # In this case .applysplit_rawvals will attempt finding the split values if vals is NULL. # Please notice that they might be a non-mutually exclusive set or subset of elements that # will constitute the split. + # - SPLIT VALS - # ## try to calculate values first. Most of the time we can if(is.null(vals)) vals <- .applysplit_rawvals(spl, df) + # - EXTRA PARAMETERS - # # This call extracts extra parameters from the split, according to the split values extr <- .applysplit_extras(spl, df, vals) @@ -247,12 +250,16 @@ Before diving into custom split functions we need to take a moment to analyze ho extras = list())) } + # - DATA SUBSETTING - # dpart <- .applysplit_datapart(spl, df, vals) + # - LABEL RETRIEVAL - # if(is.null(labels)) labels <- .applysplit_partlabels(spl, df, vals, labels) else stopifnot(names(labels) == names(vals)) + + # - TRIM - # ## get rid of columns that would not have any ## observations. ## @@ -269,6 +276,8 @@ Before diving into custom split functions we need to take a moment to analyze ho } } + # - ORDER RESULTS - # + # Finds relevant order depending on spl_child_order() if(is.null(spl_child_order(spl)) || is(spl, "AllSplit")) { vord <- seq_along(vals) } else { @@ -382,34 +391,156 @@ defined "Split" Then we have `.applysplit_extras` that will be covered in later sections and simply extracts the extra arguments from the split objects and assign them to their relative split values. 
If no split values are still available, the function will exit here with an empty split. Otherwise the data will be divided in different splits or data subsets (facets) with `.applysplit_datapart`. In our current example the resulting list comprises the whole input data set (i.e. do `getMethod(".applysplit_datapart", "AllSplit")` and a list will be evident: `function (spl, df, vals) list(df)`). -Next, split labels are checked. If they are not present split values (`vals`) will be used with `.applysplit_partlabels` that, in the case of it being applied to a `Split` object, it translates into `as.character(vals)`. Otherwise, the inserted labels are checked against the name of split values. +Next, split labels are checked. If they are not present split values (`vals`) will be used with `.applysplit_partlabels` that, in the case of it being applied to a `Split` object, it translates into `as.character(vals)`. Otherwise, the inserted labels are checked against the name of split values. -example with simple split +Lastly, the split values are ordered on the basis of `spl_child_order`. In our case, which concerns the general `AllSplit`, the sorting will not happen, i.e. it will be simply dependent on the number of split values `seq_along(vals)`. -example with simple split function +#### A simple split -example with simple split function + extra args (exploration too) +In the following, we demonstrate how row splits work according to the features that we have already described. We add two splits and see how `do_split` behavior changes. Note that if we do not add an `analyze` call, the split will behave as before, giving an empty table with all observations. As default, calling `analyze` on a variable will produce a mean for each data subset that has been generated by the splits. We want to go beyond the first call (xxx, why??) of `do_split` that is by design on all observation with the purpose of generating the root split that contains all data. 
To achieve this goal we need to use `debug(fnc)` instead of `debugonce(fnc)` as we will need to step in each of the splits. +```{r, message=FALSE} +library(rtables) +# debug(rtables:::do_split) # Uncomment me to enter the function!!! +basic_table() %>% + split_rows_by("SEX") %>% + split_rows_by("ARM") %>% + analyze("BMRKR1") %>% # analysis is needed to produce the splits !! + build_table(DM) %>% + prune_table() # only to take out NAs in print +# undebug(rtables:::do_split) # reset the debug mode +``` +Now, we might want to check the formal class of `spl` before anything else. -Final examples with `MultiVarSplit` & `CompoundSplit` -These generics do different things for different classes. the power of S4 ```{r, eval=FALSE} -setGeneric(".applysplit_rawvals", # these with dots are ONLY internals or for devs - function(spl, df) standardGeneric(".applysplit_rawvals")) +Browse[2]> str(spl, max.level = 2) +Formal class 'VarLevelSplit' [package "rtables"] with 20 slots + ..@ value_label_var : chr "SEX" + ..@ value_order : chr [1:4] "F" "M" "U" "UNDIFFERENTIATED" + ..@ split_fun : NULL + ..@ payload : chr "SEX" + ..@ name : chr "SEX" + ..@ split_label : chr "SEX" + ..@ split_format : NULL + ..@ split_na_str : chr NA + ..@ split_label_position : chr "hidden" + ..@ content_fun : NULL + ..@ content_format : NULL + ..@ content_na_str : chr NA + ..@ content_var : chr "" + ..@ label_children : logi NA + ..@ extra_args : list() + ..@ indent_modifier : int 0 + ..@ content_indent_modifier: int 0 + ..@ content_extra_args : list() + ..@ page_title_prefix : chr NA + ..@ child_section_div : chr NA +``` -setGeneric(".applysplit_datapart", - function(spl, df, vals) standardGeneric(".applysplit_datapart")) +From this, we can directly infer that the class is different now (`VarLevelSplit`) and understand that the split label will be hidden (`split_label_position` slot). Moreover, we see a specific value order with specific split values. 
Also, `VarLevelSplit` seems to have three more slots than `AllSplit`. What are they precisely? +```{r, eval=FALSE} +slots_as <- getSlots("AllSplit") # inherits virtual class Split and is general class for all splits +# getClass("CustomizableSplit") # -> Extends: "Split", Known Subclasses: Class "VarLevelSplit", directly +slots_cs <- getSlots("CustomizableSplit") +slots_vls <- getSlots("VarLevelSplit") + +slots_cs[!(names(slots_cs) %in% names(slots_as))] +# split_fun +# "functionOrNULL" +slots_vls[!(names(slots_vls) %in% names(slots_cs))] +# value_label_var value_order +# "character" "ANY" +``` -setGeneric(".applysplit_extras", - function(spl, df, vals) standardGeneric(".applysplit_extras")) +Remember always to check the constructor and class definition inside `R/00tabletrees.R` if exploratory tools do not suffice. Now, `check_validsplit(spl, df)` will dispatch to a different method than before (`getMethod("check_validsplit", "VarLevelSplit")`). Indeed, it uses the internal utility function `.checkvarsok` to check if the `vars`, i.e. the `payload` is actually present in `names(df)`. -setGeneric(".applysplit_partlabels", - function(spl, df, vals, labels) standardGeneric(".applysplit_partlabels")) +Now, the next relevant function will be `.apply_split_inner` where we want to see exactly what changes (`debugonce(.apply_split_inner)`). Of course, this function is directly called as no custom split function is provided. Being parameter `vals` not specified (`NULL`), the split values are retrieved from `df` by using the split payload to select specific columns (`varvec <- df[[spl_payload(spl)]]`). Every time no split values are specified, they will be retrieved from the selected column as unique values, if character, or levels, if factor. -setGeneric("check_validsplit", # may be useful to be called one day (not exported bc it may change) - function(spl, df) standardGeneric("check_validsplit")) +Next, `.applysplit_datapart` creates a named list of facets or data subsets. 
In this case, the result is actually a mutually exclusive partition of the data. `.applysplit_partlabels` is a bit less linear as it has to take into account the possibility of having specified labels in the payload (xxx). Beside looking at the function source code with `getMethod(".applysplit_partlabels", "VarLevelSplit")`, we can enter in debugging mode the `S4` generic function as follows: -setGeneric(".applysplit_ref_vals", - function(spl, df, vals) standardGeneric(".applysplit_ref_vals")) +```{r, eval=FALSE} +eval(debugcall(.applysplit_partlabels(spl, df, vals, labels))) +# We leave to the smart developer to see how the labels are assigned +``` +In our case, the final labels are `vals` as nothing different was specified. Their order is retrieved from the split object (`spl_child_order(spl)`) and matched with current split values. The returned list is then processed as it was before. + +If we continue with the next call of `do_split`, the same procedure is accomplished for the second `ARM` split. Before concluding this iteration, we take a moment to talk a bit more in detail about how `.fixupvals(partinfo)` works. It is not a generic function and the source code can be easily accessed as follows. We will not describe this into details, but we suggest to run through it with `debugonce(.fixupvals)` to understand what it does in practice. 
+ +```{r, eval=FALSE} +# Note: comments are added to enhance readability +Browse[2]> .fixupvals +function(partinfo) { + # Guarantee of character labels + if(is.factor(partinfo$labels)) + partinfo$labels <- as.character(partinfo$labels) + + # General import of values (in theory this may be an S4 obj in the future) + vals <- partinfo$values + if(is.factor(vals)) + vals <- levels(vals)[vals] + extr <- partinfo$extras + dpart <- partinfo$datasplit + labels <- partinfo$labels + + # Assigning labels at any cost + if(is.null(labels)) { + if(!is.null(names(vals))) + labels <- names(vals) + else if(!is.null(names(dpart))) + labels <- names(dpart) + else if (!is.null(names(extr))) + labels <- names(extr) + } + + if(is.null(vals) && !is.null(extr)) + vals <- seq_along(extr) + + # No split will happen as there are no split values + if(length(vals) == 0) { + stopifnot(length(extr) == 0) + return(partinfo) + } + ## length(vals) > 0 from here down + + if(are(vals, "SplitValue") && !are(vals, "LevelComboSplitValue")) { + if(!is.null(extr)) { + warning("Got a partinfo list with values that are ", + "already SplitValue objects and non-null extras ", + "element. 
This shouldn't happen") + } + } else { + # reformatting of extra args + if(is.null(extr)) + extr <- rep(list(list()), length(vals)) +# ---> # Main constructor of SplitValue objects + vals <- make_splvalue_vec(vals, extr, labels = labels) + } + ## we're done with this so take it off + partinfo$extras <- NULL + + vnames <- value_names(vals) + names(vals) <- vnames + partinfo$values <- vals + + if(!identical(names(dpart), vnames)) { + names(dpart) <- vnames + partinfo$datasplit <- dpart + } + + + partinfo$labels <- labels + + # Check to have only single elements in partinfo (xxx) + stopifnot(length(unique(sapply(partinfo, NROW))) == 1) + partinfo ``` + +example with simple split function + +example with simple split function + extra args (exploration too) + + + +Final examples with `MultiVarSplit` & `CompoundSplit` + From d3ae1bdc1a27e74a7c3b8e1abcfd7349cd53710c Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 3 May 2023 17:54:49 +0200 Subject: [PATCH 11/40] some fixes --- vignettes/dg_split_machinery.Rmd | 91 +++++++++++++++++++++++++++++--- 1 file changed, 83 insertions(+), 8 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index b3c27c589..46e51856e 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -39,7 +39,8 @@ We invite the smart developer to use the provided examples as a way to get an "i * `methods(generic_function)`: This function lists the methods that are available for a generic function. Specifically for `S4` generic functions, `showMethods(generic_function)` is giving a more detailed information about each method (e.g. inheritance). * `class(object)`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for its documentation and examples. Indeed, `help(class)` may be informative, as it will call the documentation of the specific class. 
Similarly, the `?` operator will call the disambiguation page that delivers you to different `S4` methods. For `S3` methods it is necessary to postfix the class name with a dot (e.g. `?summary.lm`). * `getClass(class)`: This describes in a compact way the type of class, the slots that it has, and the relationships that it may have with the other classes that may inherit or be inherited by it. With `getClass(object)`, instead, we can see to which values the slots of the object are assigned. It is possible to use `str(object, max.level = 2)` to see a less formal and more compact description of the slots, but it may be problematic when there are one or more objects in the class slots. Hence, the maximum number of levels should always be limited to 2 or 3 (`max.level = 2`). Similarly, `attributes()` can be used to retrieve some information, but we need to remember that storing important variables in this way is not encouraged. Information regarding the type of class can be retrieved with `mode()` and indirectly by `summary()` and `isS4()`. -*`getAnywhere(function)` is very useful to get the source code of internal functions and specific generics. It works very well with `S3` methods, and will in any case display for each of the found methods, its relevant namespace. Similarly, `getMethod(S4_generic, S4_class)` can retrieve the source code of `S4` methods that are specific to a class. +* `getAnywhere(function)` is very useful to get the source code of internal functions and specific generics. It works very well with `S3` methods, and will in any case display, for each of the found methods, its relevant namespace. Similarly, `getMethod(S4_generic, S4_class)` can retrieve the source code of `S4` methods that are specific to a class. +* `eval(debugcall(generic_function(obj)))`: this is a very useful way to browse an `S4` method defined for the class of a specific object without having to manually insert `browser()` into the code.
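The exploration tools listed above can be tried on a small, self-contained example before pointing them at `rtables`. The following is only a sketch: the classes `ToySplit` and `ToyAllSplit` and the generic `toy_name` are hypothetical names invented for illustration and are not part of `rtables`; they merely mimic the idea of a virtual parent class with a concrete child.

```r
library(methods)

# Hypothetical toy classes (NOT from rtables): a virtual parent and one child
setClass("ToySplit", contains = "VIRTUAL", representation(name = "character"))
setClass("ToyAllSplit", contains = "ToySplit")

# A generic with a single method, defined only for the virtual parent
setGeneric("toy_name", function(obj) standardGeneric("toy_name"))
setMethod("toy_name", "ToySplit", function(obj) obj@name)

spl <- new("ToyAllSplit", name = "all obs")

class(spl)                          # class of the object
isS4(spl)                           # is it an S4 object?
getSlots("ToyAllSplit")             # slot names and their declared types
getClass("ToyAllSplit")             # slots plus inheritance relationships
showMethods("toy_name")             # methods registered for the generic
getMethod("toy_name", "ToySplit")   # source of the method defined for ToySplit
str(spl, max.level = 2)             # compact view of the slot values

# Interactive only: step into the method that would be dispatched for `spl`
# eval(debugcall(toy_name(spl)))
# undebugcall(toy_name(spl))
```

The same calls apply to real `rtables` objects once you are inside a `browser()` session, with `spl` being the split object under inspection.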
We explore and analyze the split machinery now with a growing amount of complexity, always following relevant functions and methods throughout their execution. By going from basic to complex and by discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. @@ -401,13 +402,18 @@ In the following, we demonstrate how row splits work according to the features t ```{r, message=FALSE} library(rtables) +library(dplyr) +# This filter is added to avoid having too many calls to do_split +DM_tmp <- DM %>% + filter(SEX %in% c("M", "F")) %>% + mutate(SEX = factor(SEX)) # to drop unattended levels + # debug(rtables:::do_split) # Uncomment me to enter the function!!! basic_table() %>% split_rows_by("SEX") %>% split_rows_by("ARM") %>% analyze("BMRKR1") %>% # analysis is needed to produce the splits !! - build_table(DM) %>% - prune_table() # only to take out NAs in print + build_table(DM_tmp) # undebug(rtables:::do_split) # reset the debug mode ``` @@ -439,6 +445,7 @@ Formal class 'VarLevelSplit' [package "rtables"] with 20 slots ``` From this, we can directly infer that the class is different now (`VarLevelSplit`) and understand that the split label will be hidden (`split_label_position` slot). Moreover, we see a specific value order with specific split values. Also, `VarLevelSplit` seems to have three more slots than `AllSplit`. What are they precisely? + ```{r, eval=FALSE} slots_as <- getSlots("AllSplit") # inherits virtual class Split and is general class for all splits # getClass("CustomizableSplit") # -> Extends: "Split", Known Subclasses: Class "VarLevelSplit", directly @@ -462,10 +469,12 @@ Next, `.applysplit_datapart` creates a named list of facets or data subsets. 
In ```{r, eval=FALSE} eval(debugcall(.applysplit_partlabels(spl, df, vals, labels))) # We leave to the smart developer to see how the labels are assigned + +# PS: remember to undebugcall() similarly ``` In our case, the final labels are `vals` as nothing different was specified. Their order is retrieved from the split object (`spl_child_order(spl)`) and matched with current split values. The returned list is then processed as it was before. -If we continue with the next call of `do_split`, the same procedure is accomplished for the second `ARM` split. Before concluding this iteration, we take a moment to talk a bit more in detail about how `.fixupvals(partinfo)` works. It is not a generic function and the source code can be easily accessed as follows. We will not describe this into details, but we suggest to run through it with `debugonce(.fixupvals)` to understand what it does in practice. +If we continue with the next call of `do_split`, the same procedure is carried out for the second `ARM` split. This split operates on the partition produced by the first split. The only giveaway is that the incoming `df` is now a subset (facet) of the total data, as determined by the first split. The same happens iteratively for as many data splits as requested. Before concluding this iteration, we take a moment to look in more detail at how `.fixupvals(partinfo)` works. It is not a generic function and the source code can be easily accessed as follows. We will not describe it in detail, but we suggest stepping through it with `debugonce(.fixupvals)` to understand what it does in practice.
```{r, eval=FALSE} # Note: comments are added to enhance readability @@ -513,10 +522,10 @@ function(partinfo) { # reformatting of extra args if(is.null(extr)) extr <- rep(list(list()), length(vals)) -# ---> # Main constructor of SplitValue objects +# ---> # Main list of SplitValue objects: iterative call of new("SplitValue", value = val, extra = extr, label = label) vals <- make_splvalue_vec(vals, extr, labels = labels) } - ## we're done with this so take it off + ## we're done with this so take it off (it is already in SplitValue) partinfo$extras <- NULL vnames <- value_names(vals) @@ -536,11 +545,77 @@ function(partinfo) { partinfo ``` -example with simple split function +#### Alreadymade split functions + +We start with a custom split function that is already defined in `rtables`. Its scope is filtering out specific values as follows: +```{r, message=FALSE} +library(rtables) +# debug(rtables:::do_split) # uncomment to see into the main split function +basic_table() %>% + split_rows_by("SEX", split_fun = drop_split_levels) %>% + split_rows_by("ARM") %>% + analyze("BMRKR1") %>% + build_table(DM) +# undebug(rtables:::do_split) + +# PS: this produces the same output as before with the filters +``` +After skipping the root split, we enter the split based on column `SEX`. As we specified a split function, we entered the following code block: +```{r, eval=FALSE} +# Note: the spl object still belongs to VarLevelSplit class + ## note the <- here!!! # Extracts the split function from split object + if(!is.null(splfun <- split_fun(spl))) { + ## Currently the contract is that split_functions take df, vals, labels and + ## return list(values=., datasplit=., labels = .), optionally with + ## an additional extras element # passed through by means of spl object + if(func_takes(splfun, ".spl_context")) { # Does splfun have .spl_context?? 
+ # Does splfun use the .spl_context (which is always there, now as root) + ret <- tryCatch(splfun(df, spl, vals, labels, trim = trim, + .spl_context = spl_context), + error = function(e) e) ## rawvalues(spl_context )) + } else { + ret <- tryCatch(splfun(df, spl, vals, labels, trim = trim), + error = function(e) e) + } + if(is(ret, "error")) { + stop("Error applying custom split function: ", ret$message, "\n\tsplit: ", + class(spl), " (", payloadmsg(spl), ")\n", + "\toccured at path: ", + spl_context_to_disp_path(spl_context), "\n") + } + } # etc... +``` + +Here, we invite the reader to always keep a keen eye on `spl_context`, as it is fundamental for more sophisticated splits, e.g. when the split itself depends on preceding splits or values. When the split function is called, take a moment to look at how `drop_split_levels` is built. You will see that it is essentially a wrapper around `.apply_split_inner` that drops empty factor levels, thereby avoiding empty splits. + +```{r, eval=FALSE} +> drop_split_levels +function(df, + spl, + vals = NULL, + labels = NULL, + trim = FALSE) { + # Retrieve split column + var <- spl_payload(spl) + df2 <- df + + ## This call is exactly the one we did in the filtering to get rid of empty levels + df2[[var]] <- factor(df[[var]]) + + ## Our main function! + .apply_split_inner(spl, df2, vals = vals, + labels = labels, + trim = trim) +} +``` +#### Custom split functions + -example with simple split function + extra args (exploration too) +### Extra arguments - `extra_args` +(xxx - `analyze_colvars(my_afun, extra_args = list(ref_rowgroup = "V1"))`) +### (xxx - the other parameters... trim??)
Final examples with `MultiVarSplit` & `CompoundSplit` From 32f32b5e84219cbc7e888cd03ded880f147779d5 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Thu, 11 May 2023 12:11:58 +0200 Subject: [PATCH 12/40] update --- vignettes/dg_split_machinery.Rmd | 277 +++++++++++++++++-------------- 1 file changed, 151 insertions(+), 126 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index 46e51856e..b25c82cc3 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -14,12 +14,10 @@ editor_options: ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` + ## Disclaimer -This vignette is currently under development. Any code or prose which -appears in a version of this vignette on the `main` branch of the -repository will work/be correct, but they likely are not in their -final form. +Any code or prose which appears in a version of this vignette on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide reflects very important pieces of the split machinery that are unlikely to change. We anyway invite the reader to think always that the current code may have drifted from the following document, and it is always the best practice to read directly the code on `main`. # The Split Machinery @@ -40,7 +38,7 @@ We invite the smart developer to use the provided examples as a way to get an "i * `class(object)`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for its documentation and examples. Indeed, `help(class)` may be informative, as it will call the documentation of the specific class. Similarly, the `?` operator will call the disambiguation page that delivers you to different `S4` methods. For `S3` methods it is necessary to postfix the class name with a dot (e.g. `?summary.lm`). 
* `getClass(class)`: This describes in a compact way the type of class, the slots that it has, and the relationships that it may have with the other classes that may inherit or be inherited by it. With `getClass(object)`, instead, we can see to which values the slots of the object are assigned. It is possible to use `str(object, max.level = 2)` to see a less formal and more compact descriptions of the slots, but it may be problematic when there are one or more objects in the class slots. Hence, the maximum number of level should always be limited to 2 or 3 (`max.level = 2`). Similarly, `attributes()` can be used to retrieve some information, but we need to remember that storing important variables in this way is not encouraged. Information regarding the type of class can be retrieved with `mode()` and indirectly by `summary()` and `is.S4()`. *`getAnywhere(function)` is very useful to get the source code of internal functions and specific generics. It works very well with `S3` methods, and will in any case display for each of the found methods, its relevant namespace. Similarly, `getMethod(S4_generic, S4_class)` can retrieve the source code of `S4` methods that are specific to a class. -* `eval(debugcall(generic_function(obj)))`: this is a very useful way to browser a `S4` method defined specifically for a defined object without having to manually insert `browser()` into the code. +* `eval(debugcall(generic_function(obj)))`: this is a very useful way to browser a `S4` method defined specifically for a defined object without having to manually insert `browser()` into the code. It is also possible to do similarly with R > 3.4.0 where `debug*()` calls can have the triggering signature (class) specified. Both of these are modern and simplified wrappers of tracing function `trace()`. We explore and analyze the split machinery now with a growing amount of complexity, always following relevant functions and methods throughout their execution. 
By going from basic to complex and by discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. @@ -61,6 +59,7 @@ basic_table() %>% In the following, we copied it so to allow the reader to go through the general structure with its enhanced comments and sections. Each section in the code reflects roughly a section of this vignette. ```{r, eval=FALSE} +# rtables 6.0.2 ### NB This is called at EACH level of recursive splitting do_split <- function(spl, df, @@ -136,6 +135,7 @@ Lets then search the package for `check_validsplit`, you will find that it is de For the moment, we see with `class(spl)` (from the main `do_split` function) that we are dealing with an `AllSplit` object. By calling `showMethods(check_validsplit)` we will produce the following: ``` +# rtables 6.0.2 Function: check_validsplit (package rtables) spl="AllSplit" (inherited from: spl="Split") @@ -148,6 +148,7 @@ spl="VarLevelSplit" It means that each of the listed classes has a dedicated definition of `check_validsplit` that may largely differ from the others. Only the class `AllSplit` does not have its own function definition as it is inherited from the `Split` class. Therefore, we understand that `AllSplit` is a class parent of `Split`. This is one of the first definition of a virtual class in the package and it is the only one that does not present the "V" prefix. Any of these classes are defined along with their constructor in `R/00tabletrees.R`. Reading how `AllSplit` is structured can be an useful example to understand how split objects are expected to work. 
Please see the comments in the following: ```{r, eval=FALSE} +# rtables 6.0.2 setClass("AllSplit", contains = "Split") AllSplit <- function(split_label = "", @@ -192,6 +193,7 @@ AllSplit <- function(split_label = "", We can see also print this information by calling `getClass("AllSplit")` for the general slot definition, or by calling `getClass(spl)` for having also all the values. Note that the first call will give also a lot of information about the class hierarchy. For more information regarding class hierarchy, please refer to the relevant vignette (xxx). We will discuss the majority of the slots by the end of this document. Now lets see if we can find some of the values described in the constructor in our object. To do so we will show here the more compact representation given by `str`. When there are multiple and hierarchical slots that contain objects themselves, calling `str` will be much less informative or not at all informative if the maximum level of nesting is not set (e.g. `max.level = 2`). ```{r, eval=FALSE} +# rtables 6.0.2 Browse[2]> str(spl, max.level = 2) Formal class 'AllSplit' [package "rtables"] with 17 slots ..@ payload : NULL @@ -216,6 +218,7 @@ Formal class 'AllSplit' [package "rtables"] with 17 slots Details about these slots will be necessary in future examples, and we will deal with them at that time. Now, we gave you a hint of the complex class hierarchy that makes up `rtables`, and how to explore it autonomously. Lets go forward in `do_split`. In our case, being `AllSplit` inherited from `Split`, we are sure that the called function will be the following (read the comment!): ```{r, eval=FALSE} +# rtables 6.0.2 ## default does nothing, add methods as they become ## required setMethod("check_validsplit", "Split", @@ -228,6 +231,7 @@ setMethod("check_validsplit", "Split", Before diving into custom split functions we need to take a moment to analyze how `.apply_split_inner` works. 
This function is routinely called, also in the case we do have a split function. Lets see why this can be the case by entering it with `debugonce(.apply_split_inner)`. Of course, we are still browsing `do_split` in debug mode from the first example. We printed and commented it in the following: ```{r, eval=FALSE} +# rtables 6.0.2 .apply_split_inner <- function(spl, df, vals = NULL, labels = NULL, trim = FALSE) { # - INPUTS - # # In this case .applysplit_rawvals will attempt finding the split values if vals is NULL. @@ -300,6 +304,7 @@ Before diving into custom split functions we need to take a moment to analyze ho After reading `.apply_split_inner`, we see that there are some fundamental functions, defined strictly for internal use (convention: they start with ".") that are generics and depend on the kind of split in input. `R/split_funs.R` is very kind and group their generic definition at the beginning of the file. These functions are the main dispatcher for the majority of the split machinery. This is a clear example that shows how using `S4` logic helps clarity and flexibility in programming, allowing for easy extension of the program. For compactness we show also the `showMethods` result for each generic. ```{r, eval=FALSE} +# rtables 6.0.2 # Retrieves the values that will constitute the splits (facets), not necessarily a unique list. # They could come from the data cuts for example -> it can be anything if it produces a set of strings. setGeneric(".applysplit_rawvals", @@ -358,13 +363,12 @@ setGeneric(".applysplit_ref_vals", Now, we know that `.applysplit_extras` is the function that will be called first because we did not specify any `vals` and it is therefore `NULL`. This is a generic function as it can be seen by `showMethod(.applysplit_extras)`. 
It is indeed an `S4` generics and its source code can be determined by the following: ```{r, eval=FALSE} +# rtables 6.0.2 Browse[3]> getMethod(".applysplit_rawvals", "AllSplit") Method Definition: function (spl, df) obj_name(spl) - - Signatures: spl @@ -381,8 +385,6 @@ Method Definition: function (obj) obj@name ##### Slot that we could see from str(spl, max.level = 2) - - Signatures: obj @@ -398,36 +400,42 @@ Lastly, the split values are ordered on the basis of `spl_child_order`. In our c #### A simple split -In the following, we demonstrate how row splits work according to the features that we have already described. We add two splits and see how `do_split` behavior changes. Note that if we do not add an `analyze` call, the split will behave as before, giving an empty table with all observations. As default, calling `analyze` on a variable will produce a mean for each data subset that has been generated by the splits. We want to go beyond the first call (xxx, why??) of `do_split` that is by design on all observation with the purpose of generating the root split that contains all data. To achieve this goal we need to use `debug(fnc)` instead of `debugonce(fnc)` as we will need to step in each of the splits. +In the following, we demonstrate how row splits work according to the features that we have already described. We add two splits and see how `do_split` behavior changes. Note that if we do not add an `analyze` call, the split will behave as before, giving an empty table with all observations. As default, calling `analyze` on a variable will produce a mean for each data subset that has been generated by the splits. We want to go beyond the first call of `do_split` that is by design on all observation with the purpose of generating the root split that contains all data and all the splits (indeed `AllSplit`). To achieve this goal we can use `debug(rtables:::do_split)` instead of `debugonce(rtables:::do_split)` as we will need to step in each of the splits. 
Alternatively, it is possible to use the more powerful `trace` function to enter specifically in the case the input is from a specific class. To do so the following can be used: `trace("do_split", quote(if(!is(spl, "AllSplit")) browser()), where = asNamespace("rtables"))`. Note that we had to specify the namespace with where. Multiple tracer elements can be added with `expression(E1, E2)` which is the same as `c(quote(E1), quote(E2))`. Specific steps can be specified with the `at` parameter. Remember to do `untrace("do_split", quote(if(!is(spl, "AllSplit")) browser()), where = asNamespace("rtables"))` to remove it. ```{r, message=FALSE} +# rtables 6.0.2 library(rtables) library(dplyr) + # This filter is added to avoid having too many calls to do_split DM_tmp <- DM %>% - filter(SEX %in% c("M", "F")) %>% - mutate(SEX = factor(SEX)) # to drop unattended levels + filter(ARM %in% names(table(DM$ARM)[1:2])) %>% # limit to two + filter(SEX %in% c("M", "F")) %>% # limit to two + mutate(SEX = factor(SEX), ARM = factor(ARM)) # to drop unattended levels -# debug(rtables:::do_split) # Uncomment me to enter the function!!! -basic_table() %>% - split_rows_by("SEX") %>% +# debug(rtables:::do_split) +lyt <- basic_table() %>% split_rows_by("ARM") %>% - analyze("BMRKR1") %>% # analysis is needed to produce the splits !! - build_table(DM_tmp) -# undebug(rtables:::do_split) # reset the debug mode + split_rows_by("SEX") %>% + analyze("BMRKR1") # analyze() is needed for the table to have non-label rows + +lyt %>% + build_table(DM_tmp) +# undebug(rtables:::do_split) ``` Now, we might want to check the formal class of `spl` before anything else. 
```{r, eval=FALSE} +# rtables 6.0.2 Browse[2]> str(spl, max.level = 2) Formal class 'VarLevelSplit' [package "rtables"] with 20 slots - ..@ value_label_var : chr "SEX" - ..@ value_order : chr [1:4] "F" "M" "U" "UNDIFFERENTIATED" + ..@ value_label_var : chr "ARM" + ..@ value_order : chr [1:2] "A: Drug X" "B: Placebo" ..@ split_fun : NULL - ..@ payload : chr "SEX" - ..@ name : chr "SEX" - ..@ split_label : chr "SEX" + ..@ payload : chr "ARM" + ..@ name : chr "ARM" + ..@ split_label : chr "ARM" ..@ split_format : NULL ..@ split_na_str : chr NA ..@ split_label_position : chr "hidden" @@ -447,9 +455,10 @@ Formal class 'VarLevelSplit' [package "rtables"] with 20 slots From this, we can directly infer that the class is different now (`VarLevelSplit`) and understand that the split label will be hidden (`split_label_position` slot). Moreover, we see a specific value order with specific split values. Also, `VarLevelSplit` seems to have three more slots than `AllSplit`. What are they precisely? ```{r, eval=FALSE} +# rtables 6.0.2 slots_as <- getSlots("AllSplit") # inherits virtual class Split and is general class for all splits # getClass("CustomizableSplit") # -> Extends: "Split", Known Subclasses: Class "VarLevelSplit", directly -slots_cs <- getSlots("CustomizableSplit") +slots_cs <- getSlots("CustomizableSplit") # Adds split function slots_vls <- getSlots("VarLevelSplit") slots_cs[!(names(slots_cs) %in% names(slots_as))] @@ -464,137 +473,102 @@ Remember always to check the constructor and class definition inside `R/00tablet Now, the next relevant function will be `.apply_split_inner` where we want to see exactly what changes (`debugonce(.apply_split_inner)`). Of course, this function is directly called as no custom split function is provided. Being parameter `vals` not specified (`NULL`), the split values are retrieved from `df` by using the split payload to select specific columns (`varvec <- df[[spl_payload(spl)]]`). 
Every time no split values are specified, they will be retrieved from the selected column as unique values, if character, or levels, if factor. -Next, `.applysplit_datapart` creates a named list of facets or data subsets. In this case, the result is actually a mutually exclusive partition of the data. `.applysplit_partlabels` is a bit less linear as it has to take into account the possibility of having specified labels in the payload (xxx). Beside looking at the function source code with `getMethod(".applysplit_partlabels", "VarLevelSplit")`, we can enter in debugging mode the `S4` generic function as follows: +Next, `.applysplit_datapart` creates a named list of facets or data subsets. In this case, the result is actually a mutually exclusive partition of the data. This is because we did not specify any split values and the column content was used as such with unique call in case of a character vector or levels in case of factors. `.applysplit_partlabels` is a bit less linear as it has to take into account the possibility of having specified labels in the payload. Beside looking at the function source code with `getMethod(".applysplit_partlabels", "VarLevelSplit")`, we can enter in debugging mode the `S4` generic function as follows: ```{r, eval=FALSE} +# rtables 6.0.2 eval(debugcall(.applysplit_partlabels(spl, df, vals, labels))) # We leave to the smart developer to see how the labels are assigned # PS: remember to undebugcall() similarly ``` -In our case, the final labels are `vals` as nothing different was specified. Their order is retrieved from the split object (`spl_child_order(spl)`) and matched with current split values. The returned list is then processed as it was before. - -If we continue with the next call of `do_split`, the same procedure is accomplished for the second `ARM` split. This is done on the partition that was already done in the first split. 
The only give out of this is the fact that the main `df` is constituted by a subset (facet) of the total data, according to the first split. This will be done iteratively for as many data split as requested. Before concluding this iteration, we take a moment to talk a bit more in detail about how `.fixupvals(partinfo)` works. It is not a generic function and the source code can be easily accessed as follows. We will not describe this into details, but we suggest to run through it with `debugonce(.fixupvals)` to understand what it does in practice. -```{r, eval=FALSE} -# Note: comments are added to enhance readability -Browse[2]> .fixupvals -function(partinfo) { - # Guarantee of character labels - if(is.factor(partinfo$labels)) - partinfo$labels <- as.character(partinfo$labels) - - # General import of values (in theory this may be an S4 obj in the future) - vals <- partinfo$values - if(is.factor(vals)) - vals <- levels(vals)[vals] - extr <- partinfo$extras - dpart <- partinfo$datasplit - labels <- partinfo$labels - - # Assigning labels at any cost - if(is.null(labels)) { - if(!is.null(names(vals))) - labels <- names(vals) - else if(!is.null(names(dpart))) - labels <- names(dpart) - else if (!is.null(names(extr))) - labels <- names(extr) - } +In our case, the final labels are `vals` because they were not assigned. Their order is retrieved from the split object (`spl_child_order(spl)`) and matched with current split values. The returned list is then processed as it was before. - if(is.null(vals) && !is.null(extr)) - vals <- seq_along(extr) - - # No split will happen as there are no split values - if(length(vals) == 0) { - stopifnot(length(extr) == 0) - return(partinfo) - } - ## length(vals) > 0 from here down - - if(are(vals, "SplitValue") && !are(vals, "LevelComboSplitValue")) { - if(!is.null(extr)) { - warning("Got a partinfo list with values that are ", - "already SplitValue objects and non-null extras ", - "element. 
This shouldn't happen") - } - } else { - # reformatting of extra args - if(is.null(extr)) - extr <- rep(list(list()), length(vals)) -# ---> # Main list of SplitValue objects: iterative call of new("SplitValue", value = val, extra = extr, label = label) - vals <- make_splvalue_vec(vals, extr, labels = labels) - } - ## we're done with this so take it off (it is already in SplitValue) - partinfo$extras <- NULL - - vnames <- value_names(vals) - names(vals) <- vnames - partinfo$values <- vals - - if(!identical(names(dpart), vnames)) { - names(dpart) <- vnames - partinfo$datasplit <- dpart - } +If we continue with the next call of `do_split`, the same procedure is followed for the second split, on `SEX`. This is done on the partition that was already created by the first split. The only difference is that the main `df` is now a subset (facet) of the total data, according to the first split. This will be done iteratively for as many data splits as requested. Before concluding this iteration, we take a moment to discuss in more detail how `.fixupvals(partinfo)` works. It is not a generic function and its source code can be easily accessed. We suggest running through it with `debugonce(.fixupvals)` to understand what it does in practice.
The fundamental aspects are listed in the following: +(xxx todo) + ## .fixupvals: + ## - ensures labels are character not factor + ## - Ensures datasplit and values lists are named according to labels + ## - guarantees that ret$values contains SplitValue objects + ## - removes the extras element since its redundant after the above (included in the SplitValue object) - partinfo$labels <- labels - - # Check to have only single elements in partinfo (xxx) - stopifnot(length(unique(sapply(partinfo, NROW))) == 1) - partinfo +```{r, eval=FALSE} +# rtables 6.0.2 + +# Can find the following core function: +# vals <- make_splvalue_vec(vals, extr, labels = labels) +# ---> Main list of SplitValue objects: iterative call of +# new("SplitValue", value = val, extra = extr, label = label) + +# Structure of ret before the function call +Browse[2]> str(ret, max.level = 2) +List of 4 + $ values : chr [1:2] "A: Drug X" "B: Placebo" + $ datasplit:List of 2 + ..$ A: Drug X : tibble [121 × 8] (S3: tbl_df/tbl/data.frame) + ..$ B: Placebo: tibble [106 × 8] (S3: tbl_df/tbl/data.frame) + $ labels : Named chr [1:2] "A: Drug X" "B: Placebo" + ..- attr(*, "names")= chr [1:2] "A: Drug X" "B: Placebo" + $ extras :List of 2 + ..$ : list() + ..$ : list() + +# Structure of ret after the function call +Browse[2]> str(.fixupvals(ret), max.level = 2) +List of 3 + $ values :List of 2 + ..$ A: Drug X :Formal class 'SplitValue' [package "rtables"] with 3 slots + ..$ B: Placebo:Formal class 'SplitValue' [package "rtables"] with 3 slots + $ datasplit:List of 2 + ..$ A: Drug X : tibble [121 × 8] (S3: tbl_df/tbl/data.frame) + ..$ B: Placebo: tibble [106 × 8] (S3: tbl_df/tbl/data.frame) + $ labels : Named chr [1:2] "A: Drug X" "B: Placebo" + ..- attr(*, "names")= chr [1:2] "A: Drug X" "B: Placebo" + +# The SplitValue object is fundamental +Browse[2]> str(ret$values) +List of 2 + $ A: Drug X :Formal class 'SplitValue' [package "rtables"] with 3 slots + .. ..@ extra: list() + .. ..@ value: chr "A: Drug X" + .. 
..@ label: chr "A: Drug X" + $ B: Placebo:Formal class 'SplitValue' [package "rtables"] with 3 slots + .. ..@ extra: list() + .. ..@ value: chr "B: Placebo" + .. ..@ label: chr "B: Placebo" ``` -#### Alreadymade split functions + +#### Included split functions We start with a custom split function that is already defined in `rtables`. Its scope is filtering out specific values as follows: + ```{r, message=FALSE} library(rtables) # debug(rtables:::do_split) # uncomment to see into the main split function basic_table() %>% split_rows_by("SEX", split_fun = drop_split_levels) %>% - split_rows_by("ARM") %>% analyze("BMRKR1") %>% build_table(DM) # undebug(rtables:::do_split) # PS: this produces the same output as before with the filters ``` -After skipping the root split, we enter the split based on column `SEX`. As we specified a split function, we entered the following code block: -```{r, eval=FALSE} -# Note: the spl object still belongs to VarLevelSplit class - ## note the <- here!!! # Extracts the split function from split object - if(!is.null(splfun <- split_fun(spl))) { - ## Currently the contract is that split_functions take df, vals, labels and - ## return list(values=., datasplit=., labels = .), optionally with - ## an additional extras element # passed through by means of spl object - if(func_takes(splfun, ".spl_context")) { # Does splfun have .spl_context?? - # Does splfun use the .spl_context (which is always there, now as root) - ret <- tryCatch(splfun(df, spl, vals, labels, trim = trim, - .spl_context = spl_context), - error = function(e) e) ## rawvalues(spl_context )) - } else { - ret <- tryCatch(splfun(df, spl, vals, labels, trim = trim), - error = function(e) e) - } - if(is(ret, "error")) { - stop("Error applying custom split function: ", ret$message, "\n\tsplit: ", - class(spl), " (", payloadmsg(spl), ")\n", - "\toccured at path: ", - spl_context_to_disp_path(spl_context), "\n") - } - } # etc... 
-``` -Here, we invite to always keep a keen eye on `spl_context`, as it is fundamental for more sophisticate splits, e.g. in the cases where the split itself depends mainly on preceding splits or values. Please, when the split function is called, take a moment to look at how `drop_split_levels` is built. You will see that it is fundamentally a wrapper of `.apply_split_inner` that drops empty factor levels, therefore avoiding empty split. +After skipping the root split, we enter the split based on column `SEX`. As we specified a split function, we retrieve the split function by using `splfun <- split_fun(spl)` and use this to enter an error catching framework (xxx rephrase) that is designed to give informative errors. Later we will see exactly how it works. + +Here, we invite to always keep a keen eye on `spl_context`, as it is fundamental for more sophisticate splits, e.g. in the cases where the split itself depends mainly on preceding splits or values. Please, when the split function is called, take a moment to look at how `drop_split_levels` is defined. You will see that it is fundamentally a wrapper of `.apply_split_inner` that drops empty factor levels, therefore avoiding empty split. ```{r, eval=FALSE} +# rtables 6.0.2 > drop_split_levels function(df, - spl, - vals = NULL, - labels = NULL, - trim = FALSE) { + spl, + vals = NULL, + labels = NULL, + trim = FALSE) { # Retrieve split column var <- spl_payload(spl) df2 <- df @@ -608,14 +582,65 @@ function(df, trim = trim) } ``` + +There are many split functions already included in `rtables`. Lists of them can be found in `vignette("split_functions")`, `?split_funcs`, and `vignette("advanced_usage")`. We leave to the smart developer finding in detail how some of these work, in particular `trim_levels_to_map`. + #### Custom split functions +Now we try to create our custom split function. Firstly, we will see how the system manages error messages. 
For a general understanding of how we can provide custom split functions, please read `?custom_split_funs` in detail. In the following we use browser() to enter our custom split function. + +```{r, eval=FALSE} +# Table call with only function changing +simple_table <- function(DM, f){ + lyt <- basic_table() %>% + split_rows_by("ARM", split_fun = f) %>% + analyze("BMRKR1") + + lyt %>% + build_table(DM) +} +# First round will fail because there are unused arguments +exploratory_split_fun <- function(df, spl) NULL +# debug(rtables:::do_split) +simple_table(DM, exploratory_split_fun) +# (xxx) +# undebug(rtables:::do_split) + +exploratory_split <- function(df, spl, ...){ + # browser() + + my_payload <- "SEX" + vals <- levels(df[[my_payload]]) + datasplit <- lapply(seq_along(vals), function(i) { + df[df[[my_payload]] == vals[[i]], ] + }) + names(datasplit) <- as.character(vals) + + # Return a split result!! + make_split_result(vals, datasplits, vals) +# function(values, datasplit, labels, extras = NULL) { +# if(length(values) == 1 && is(datasplit, "data.frame")) +# datasplit <- list(datasplit) + + # the core is a list with same number of elements (xxx) + # (xxx it may go in more generally) + +# ret <- list(values = values, datasplit = datasplit, labels = labels) +# if(!is.null(extras)) +# ret$extras <- extras +# .fixupvals(ret) +# } +} + +simple_table(DM, exploratory_split) +``` -### Extra arguments - `extra_args` +### Extra arguments - `extra_args` in detail - lets see what is found (xxx - `analyze_colvars(my_afun, extra_args = list(ref_rowgroup = "V1"))`) -### (xxx - the other parameters... trim??) +### (xxx - the other parameters... trim?? 
Maybe it does not work, vestigial) +(xxx use trace to find if vals and labels are used) Final examples with `MultiVarSplit` & `CompoundSplit` From bcf161501316f59e663ff300d40d7cf0b110f9c2 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 17 May 2023 17:51:00 +0200 Subject: [PATCH 13/40] small fix --- vignettes/dg_split_machinery.Rmd | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index b25c82cc3..7e695e75e 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -487,12 +487,12 @@ In our case, the final labels are `vals` because they were not assigned. Their o If we continue with the next call of `do_split`, the same procedure is followed for the second split, on `SEX`. This is done on the partition that was already created by the first split. The only difference is that the main `df` is now a subset (facet) of the total data, according to the first split. This will be done iteratively for as many data splits as requested. Before concluding this iteration, we take a moment to discuss in more detail how `.fixupvals(partinfo)` works. It is not a generic function and its source code can be easily accessed. We suggest running through it with `debugonce(.fixupvals)` to understand what it does in practice. The fundamental aspects are listed in the following: -(xxx todo) - ## .fixupvals: - ## - ensures labels are character not factor - ## - Ensures datasplit and values lists are named according to labels - ## - guarantees that ret$values contains SplitValue objects - ## - removes the extras element since its redundant after the above (included in the SplitValue object) +* Ensures that labels are character and not factor. +* Ensures that the splits of data and list of values are named according to labels. +* Guarantees that `ret$values` contains `SplitValue` objects.
+* Removes the list element `extras`, since it is now included in the `SplitValue` objects. + +Note that this function can occasionally be called more than once on the same return object (a named list for now). Of course, after the first call only checks are applied. ```{r, eval=FALSE} # rtables 6.0.2 From 8db24182afdb616163c11540cde6024a4737c384 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 24 May 2023 19:02:37 +0200 Subject: [PATCH 14/40] recent update --- vignettes/dg_split_machinery.Rmd | 167 +++++++++++++++++++++++++------ 1 file changed, 136 insertions(+), 31 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index 7e695e75e..3cd00072c 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -557,7 +557,7 @@ basic_table() %>% # PS: this produces the same output as before with the filters ``` -After skipping the root split, we enter the split based on column `SEX`. As we specified a split function, we retrieve the split function by using `splfun <- split_fun(spl)` and use this to enter an error catching framework (xxx rephrase) that is designed to give informative errors. Later we will see exactly how it works. +After skipping the root split, we enter the split based on column `SEX`. As we specified a split function, we retrieve the split function by using `splfun <- split_fun(spl)` and enter an `if-else` statement that covers the two possible cases: the split function either accepts the split context (`.spl_context`) or it does not. In both cases, an error-catching framework is used to give informative errors in case of failure. Later we will see in more detail how it works. Here, we invite to always keep a keen eye on `spl_context`, as it is fundamental for more sophisticate splits, e.g. in the cases where the split itself depends mainly on preceding splits or values. Please, when the split function is called, take a moment to look at how `drop_split_levels` is defined.
You will see that it is fundamentally a wrapper of `.apply_split_inner` that drops empty factor levels, therefore avoiding empty split. @@ -587,9 +587,10 @@ There are many split functions already included in `rtables`. Lists of them can #### Custom split functions -Now we try to create our custom split function. Firstly, we will see how the system manages error messages. For a general understanding of how we can provide custom split functions, please read `?custom_split_funs` in detail. In the following we use browser() to enter our custom split function. +Now we try to create our own custom split function. First, we will see how the system manages error messages. For a general understanding of how to provide custom split functions, please read `?custom_split_funs` in detail. In the following, we use `browser()` to enter our custom split functions. For the error cases, we invite the reader to activate `options(error = recover)` to investigate the state at the point of failure. Note that you can restore the original behavior by restarting the `R` session or by caching the default option value beforehand. Another smart possibility is to use `callr` to retrieve the default as follows: `default_opts <- callr::r(function(){options()}); options(error = default_opts$error)`. -```{r, eval=FALSE} +```{r, error=TRUE} +# rtables 6.0.2 # Table call with only function changing simple_table <- function(DM, f){ lyt <- basic_table() %>% @@ -603,44 +604,148 @@ simple_table <- function(DM, f){ exploratory_split_fun <- function(df, spl) NULL # debug(rtables:::do_split) simple_table(DM, exploratory_split_fun) -# (xxx) # undebug(rtables:::do_split) -exploratory_split <- function(df, spl, ...){ - # browser() - - my_payload <- "SEX" - vals <- levels(df[[my_payload]]) - datasplit <- lapply(seq_along(vals), function(i) { - df[df[[my_payload]] == vals[[i]], ] - }) - names(datasplit) <- as.character(vals) - - # Return a split result!!
- make_split_result(vals, datasplits, vals) -# function(values, datasplit, labels, extras = NULL) { -# if(length(values) == 1 && is(datasplit, "data.frame")) -# datasplit <- list(datasplit) - - # the core is a list with same number of elements (xxx) - # (xxx it may go in more generally) - -# ret <- list(values = values, datasplit = datasplit, labels = labels) -# if(!is.null(extras)) -# ret$extras <- extras -# .fixupvals(ret) -# } +``` + +The commented debugging calls let you step through the code leading up to the error. Alternatively, the `recover` option lets you select the frame number, i.e. the level of the call stack in which to start debugging. Selecting the last frame (10 in my case) allows you to see the value of `ret` in `rtables:::do_split`, which is the plain caught error, and how the informative error message that follows is created. + +```{r, eval=FALSE} +# rtables 6.0.2 +# Debugging level +10: tt_dotabulation.R#627: do_split(spl, df, spl_context = spl_context) + +# Original call and final error +> simple_table(DM, exploratory_split_fun) +Error in do_split(spl, df, spl_context = spl_context) : + Error applying custom split function: unused arguments (vals, labels, trim = trim) # This is main error + split: VarLevelSplit (ARM) # Split reference + occured at path: root # Path level (where it happened) +``` + +The previous split function fails because it does not accept all the arguments that `do_split` passes to it. A simple way to avoid this is to add `...` to the function definition. Now let's construct a more interesting split function (and error): + +```{r, error=TRUE} +# rtables 6.0.2 +f_brakes_if <- function(split_col = NULL, error = FALSE){ + function(df, spl, ...){ # order matters!
more than naming + # browser() # To check how it works + if (is.null(split_col)) { # Retrieves the default + split_col <- spl_variable(spl) # Internal accessor to split obj + } + my_payload <- split_col # Changing split column value + + vals <- levels(df[[my_payload]]) # Extracting values to split + datasplit <- lapply(seq_along(vals), function(i) { + df[df[[my_payload]] == vals[[i]], ] + }) + names(datasplit) <- as.character(vals) + + # Fantasy error + if (isTRUE(error)) { + # browser() # If you need to check how it works + mystery_error_values <- sapply(datasplit, function(x) mean(x$BMRKR1)) + if (any(mystery_error_values > 6)) { + stop("It should not be more than 6! Should it be? Found in split values: ", + names(datasplit)[which(mystery_error_values > 6)]) + } + } + + # Handy function to return a split result!! + make_split_result(vals, datasplit, vals) + } +} +simple_table(DM, f_brakes_if()) # works! +simple_table(DM, f_brakes_if(split_col = "STRATA1")) # works! +# Does not work, but in an informative way +simple_table(DM, f_brakes_if(error = TRUE)) +``` + +Now we will dwell a moment to the relatively new machinery to create custom split functions. Before doing so, please read the relevant documentation `?make_split_fun`. The majority of functions already included in `rtables` can be or will be written with `make_split_fun` as it is a more stable constructor for such functions. We invite the reader to take a look at `make_split_fun.R`. The majority of functions should be very understandable as fare as you got into this guide. We want to highlight that if no core split function is specified, which is commonly the case, `make_split_fun` calls directly `do_base_split` which is a minimal wrapper of our well known `do_split`. `drop_facet_levels` for example is a pre-processing function that at the core simply removes empty factor levels from the split "column", thus avoiding empty lines to be shown. 
It is possible, also to add a list of functions, as it can be seen in the examples of `?make_split_fun`. Note that all the inputs must be listed if there is a list of functions, otherwise the loop will not work (xxx, list only for pre and post, not core because you can have multiple sequential functions, write an issue informative error for not list xxx). Included post-processing functions are more interesting as they interact with the split object, e.g. by reordering the facets or by adding an overall facet (`add_overall_facet`). The smart reader will have noticed as the core function and many of the post processing functions rely on `make_split_result` which is a way to get the correct split return structure. Lastly it is possible to change the core split and the post according to the new custom split if needed. Note that this works only in the row space at the moment. + +#### `.spl_context` - a bit of context to our splits +The best way to understand what split context does and how to use it is to read relevant vignette (xxx advanced usage), and to use `browser()` to see how it is structured. + +```{r, eval=FALSE} +# rtables 6.0.2 +browsing_f <- function(df, spl, .spl_context, ...) { # (xxx needed for core) + browser() + # do_base_split(spl, df, ...) # order matters!! + do_base_split(spl = spl, df = df, vals = NULL, labels = NULL, trim = TRUE) + # trim works ONLY when there is a mixed table (some values are 0s and some have content -> trims the 0s) trim_levels_to_group and trim_levels_to_map are the replacement } -simple_table(DM, exploratory_split) +basic_table() %>% + split_rows_by("ARM") %>% + split_rows_by("STRATA1") %>% + split_rows_by_cuts("AGE", cuts = c(0, 50, 100), + cutlabels = c("young", "old")) %>% + split_rows_by("SEX", split_fun = make_split_fun( + # pre = list(drop_facet_levels), # This is dropping the sex levels (age is upper level) add issue for this -> split_rows_by() wrapper with spl_fun + core_split = browsing_f#, #(xxx why does this fail?) 
+ # post = list(trim_levels_in_facets("AGE")) #(xxx why too fails? how to trim the empty levels?) + )) %>% + summarize_row_groups() %>% + build_table(DM) + +# The following is the .spl_contest printout: +Browse[1]> .spl_context + split value full_parent_df all_cols_n all obs +1 root root c("S1", .... 356 TRUE, TR.... +2 ARM A: Drug X c("S6", .... 121 TRUE, TR.... +3 STRATA1 A c("S14",.... 36 TRUE, TR.... +4 AGE young c("S14",.... 36 TRUE, TR.... + +# NOTE: make_split_fun(pre = list(drop_facet_levels)) and drop_split_levels +# do the same in this case ``` +Here we can see what is the split column variable (`split`, first column) at this level of the splitting procedure. `value` is the current split value that is being dealt with. Now, for the next column, lets see the number of rows of these dataframes: `sapply(.spl_context$full_parent_df, nrow) # [1] 356 121 36 36`. Indeed, the `root` level contains the full input dataframe, while the other levels are subgroups of the full data according to the split value. `all_cols_n` shows exactly the numbers just described. `all obs` (xxx is the column "filter" name). It is possible to use the same information to make complex splits also on the column space by using the full dataframe and the value splits to select the interested values. This is something we will fix when it will be a more apparent need. + +### Extra arguments `extra_args` +This functionality is well known and used in the setting of analysis functions (xxx vignette), but we show here how this can also apply to splits. +(xxx intent is to set them on the parent and then used in the analyze call - possible this does not work. Issue to investigate this and if we need it xxx) -### Extra arguments - `extra_args` in detail - lets see what is found -(xxx - `analyze_colvars(my_afun, extra_args = list(ref_rowgroup = "V1"))`) +```{r, eval=FALSE} +# rtables 6.0.2 + +# Lets use the tracer!! 
+my_tracer <- quote(if (!is(spl, "AllSplit") && + spl_variable(spl) == "SEX") browser()) +trace(what = "do_split", + tracer = my_tracer, + where = asNamespace("rtables")) + +basic_table() %>% + split_rows_by("ARM") %>% + split_rows_by("SEX", split_fun = drop_split_levels) %>% + analyze("BMRKR1", extra_args = list("a" = 3)) %>% + build_table(DM) + +untrace(what = "do_split", + where = asNamespace("rtables")) +``` ### (xxx - the other parameters... trim?? Maybe it does not work, vestigial) (xxx use trace to find if vals and labels are used) +```{r, eval=FALSE} +my_tracer <- quote(if (!isTRUE(trim)) browser()) +trace(what = "do_split", + tracer = my_tracer, + where = asNamespace("rtables")) + +wrapper_do_split <- function(){ + do_base_split(spl, df, vals = NULL, labels = NULL, trim = TRUE) +} +basic_table() %>% + split_rows_by("ARM") %>% + split_rows_by("SEX") %>% + analyze("BMRKR1") %>% + build_table(DM) +untrace(what = "do_split", + where = asNamespace("rtables")) +``` + Final examples with `MultiVarSplit` & `CompoundSplit` From c9d4a62f93d8118889c67a29213e5fa11dca4ab7 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Fri, 9 Jun 2023 14:19:14 +0200 Subject: [PATCH 15/40] few mods --- vignettes/dg_split_machinery.Rmd | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index 3cd00072c..debba8d86 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -603,9 +603,10 @@ simple_table <- function(DM, f){ # First round will fail because there are unused arguments exploratory_split_fun <- function(df, spl) NULL # debug(rtables:::do_split) -simple_table(DM, exploratory_split_fun) +err_msg <- tryCatch(simple_table(DM, exploratory_split_fun), error = function(e) e) # undebug(rtables:::do_split) +message(err_msg$message) ``` Commented debugging options can get you above and before the error. 
Nonetheless using the recover option will get you the possibility to select the frame number, i.e. the trace level to enter as debugging selecting the last one (10 in my case), will allow you to see the value of `ret` from `rtables:::do_split` that is the simple error and how the informative error message that follows is created. @@ -657,22 +658,31 @@ f_brakes_if <- function(split_col = NULL, error = FALSE){ } simple_table(DM, f_brakes_if()) # works! simple_table(DM, f_brakes_if(split_col = "STRATA1")) # works! + # Does not work, but in an informative way -simple_table(DM, f_brakes_if(error = TRUE)) +# simple_table(DM, f_brakes_if(error = TRUE)) + +# Error in do_split(spl, df, spl_context = spl_context) : +# Error applying custom split function: It should not be more than 6! Should it be? Found in split values: B: Placebo +# split: VarLevelSplit (ARM) +# occured at path: root ``` -Now we will dwell a moment to the relatively new machinery to create custom split functions. Before doing so, please read the relevant documentation `?make_split_fun`. The majority of functions already included in `rtables` can be or will be written with `make_split_fun` as it is a more stable constructor for such functions. We invite the reader to take a look at `make_split_fun.R`. The majority of functions should be very understandable as fare as you got into this guide. We want to highlight that if no core split function is specified, which is commonly the case, `make_split_fun` calls directly `do_base_split` which is a minimal wrapper of our well known `do_split`. `drop_facet_levels` for example is a pre-processing function that at the core simply removes empty factor levels from the split "column", thus avoiding empty lines to be shown. It is possible, also to add a list of functions, as it can be seen in the examples of `?make_split_fun`. 
Note that all the inputs must be listed if there is a list of functions, otherwise the loop will not work (xxx, list only for pre and post, not core because you can have multiple sequential functions, write an issue informative error for not list xxx). Included post-processing functions are more interesting as they interact with the split object, e.g. by reordering the facets or by adding an overall facet (`add_overall_facet`). The smart reader will have noticed as the core function and many of the post processing functions rely on `make_split_result` which is a way to get the correct split return structure. Lastly it is possible to change the core split and the post according to the new custom split if needed. Note that this works only in the row space at the moment. +Now we will dwell a moment to the relatively new machinery to create custom split functions. Before doing so, please read the relevant documentation `?make_split_fun`. The majority of functions already included in `rtables` can be or will be written with `make_split_fun` as it is a more stable constructor for such functions. We invite the reader to take a look at `make_split_fun.R`. The majority of functions should be very understandable as far as you got into this guide. We want to highlight that if no core split function is specified, which is commonly the case, `make_split_fun` calls directly `do_base_split` which is a minimal wrapper of our well known `do_split`. `drop_facet_levels` for example is a pre-processing function that at the core simply removes empty factor levels from the split "column", thus avoiding empty lines to be shown. + +It is possible, also to add a list of functions, as it can be seen in the examples of `?make_split_fun`. Note that pre and post processing need a list in input to support the possibility to combine multiple functions. The core splitting function, instead, must be a single function call as it is not expected to have stacked features. 
This rarely needs to be modified and the majority of the included split functions work with pre or post processing. Included post-processing functions are interesting as they interact with the split object, e.g. by reordering the facets or by adding an overall facet (`add_overall_facet`). The smart reader will have noticed that the core function relies somehow on `do_split` and that many of the post processing functions rely on `make_split_result`, which is the best way to get the correct split return structure. Note that modifying the core split works only in the row space at the moment. #### `.spl_context` - a bit of context to our splits -The best way to understand what split context does and how to use it is to read relevant vignette (xxx advanced usage), and to use `browser()` to see how it is structured. +The best way to understand what split context does and how to use it is to read the relevant vignette (xxx advanced usage), and to use `browser()` in a split function to see how it is structured. As `.spl_context` is needed for rewriting core functions, we propose here a wrapper of `do_base_split`, which is a handy redirection to the standard `do_split` without +the split function part, i.e. it is a wrapper of `.apply_split_inner`, the real core splitting machinery. Out of curiosity we set here `trim = TRUE`. This trimming works only when there is a mixed table (some values are 0s and some have content; there it trims the 0s). This is rarely the case and we encourage using the replacement functions `trim_levels_to_group` and `trim_levels_to_map`. Nowadays, it should even be impossible to set it differently from `trim = FALSE`. +(write an issue informative error for not list xxx). ```{r, eval=FALSE} # rtables 6.0.2 -browsing_f <- function(df, spl, .spl_context, ...) { # (xxx needed for core) - browser() - # do_base_split(spl, df, ...) # order matters!! +browsing_f <- function(df, spl, .spl_context, ...) { + # browser() + # do_base_split(df, spl, ...) # order matters!! 
This would fail if done do_base_split(spl = spl, df = df, vals = NULL, labels = NULL, trim = TRUE) - # trim works ONLY when there is a mixed table (some values are 0s and some have content -> trims the 0s) trim_levels_to_group and trim_levels_to_map are the replacement } basic_table() %>% @@ -681,9 +691,9 @@ basic_table() %>% split_rows_by_cuts("AGE", cuts = c(0, 50, 100), cutlabels = c("young", "old")) %>% split_rows_by("SEX", split_fun = make_split_fun( - # pre = list(drop_facet_levels), # This is dropping the sex levels (age is upper level) add issue for this -> split_rows_by() wrapper with spl_fun + pre = list(drop_facet_levels), # This is dropping the sex levels (age is upper level) add issue for this -> split_rows_by() wrapper with spl_fun core_split = browsing_f#, #(xxx why does this fail?) - # post = list(trim_levels_in_facets("AGE")) #(xxx why too fails? how to trim the empty levels?) + post = list(trim_levels_in_facets("AGE")) #(xxx why too fails? how to trim the empty levels?) )) %>% summarize_row_groups() %>% build_table(DM) @@ -718,7 +728,7 @@ trace(what = "do_split", basic_table() %>% split_rows_by("ARM") %>% split_rows_by("SEX", split_fun = drop_split_levels) %>% - analyze("BMRKR1", extra_args = list("a" = 3)) %>% + summarize_row_groups(extra_args = list(a = 3)) %>% build_table(DM) untrace(what = "do_split", From 024fd30de47e54e9ad76bfdf8fcb65ebc09b9d32 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 14 Jun 2023 11:03:31 +0200 Subject: [PATCH 16/40] most recent update --- vignettes/dg_split_machinery.Rmd | 146 ++++++++++++++++++++++++++----- 1 file changed, 122 insertions(+), 24 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index debba8d86..9fc637113 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -684,16 +684,27 @@ browsing_f <- function(df, spl, .spl_context, ...) { # do_base_split(df, spl, ...) # order matters!! 
This would fail if done do_base_split(spl = spl, df = df, vals = NULL, labels = NULL, trim = TRUE) } - +fnc_tmp <- function(innervar) { # Exploring trim_levels_in_facets (check its form) + function(ret, ...) { + # browser() + for(var in innervar) { # of course AGE is not here, so nothing is dropped!! + ret$datasplit <- lapply(ret$datasplit, function(df) { + df[[var]] <- factor(df[[var]]) + df + }) + } + ret + } +} basic_table() %>% split_rows_by("ARM") %>% split_rows_by("STRATA1") %>% split_rows_by_cuts("AGE", cuts = c(0, 50, 100), cutlabels = c("young", "old")) %>% split_rows_by("SEX", split_fun = make_split_fun( - pre = list(drop_facet_levels), # This is dropping the sex levels (age is upper level) add issue for this -> split_rows_by() wrapper with spl_fun - core_split = browsing_f#, #(xxx why does this fail?) - post = list(trim_levels_in_facets("AGE")) #(xxx why too fails? how to trim the empty levels?) + pre = list(drop_facet_levels), # This is dropping the sex levels (age is upper level) + core_split = browsing_f, + post = list(fnc_tmp("AGE")) # To drop these we should use a split_fun in the above level for that )) %>% summarize_row_groups() %>% build_table(DM) @@ -707,9 +718,9 @@ Browse[1]> .spl_context 4 AGE young c("S14",.... 36 TRUE, TR.... # NOTE: make_split_fun(pre = list(drop_facet_levels)) and drop_split_levels -# do the same in this case +# do the same thing in this case ``` -Here we can see what is the split column variable (`split`, first column) at this level of the splitting procedure. `value` is the current split value that is being dealt with. Now, for the next column, lets see the number of rows of these dataframes: `sapply(.spl_context$full_parent_df, nrow) # [1] 356 121 36 36`. Indeed, the `root` level contains the full input dataframe, while the other levels are subgroups of the full data according to the split value. `all_cols_n` shows exactly the numbers just described. `all obs` (xxx is the column "filter" name). 
It is possible to use the same information to make complex splits also on the column space by using the full dataframe and the value splits to select the interested values. This is something we will fix when it will be a more apparent need. +Here we can see what is the split column variable (`split`, first column) at this level of the splitting procedure. `value` is the current split value that is being dealt with. Now, for the next column, let's see the number of rows of these dataframes: `sapply(.spl_context$full_parent_df, nrow) # [1] 356 121 36 36`. Indeed, the `root` level contains the full input dataframe, while the other levels are subgroups of the full data according to the split value. `all_cols_n` shows exactly the numbers just described. `all obs` is the current filter applied to the columns. Applying this to the root data (or the row subgroup data) reveals the current facet column-wise (and row-wise if in row split). It is possible to use the same information to make complex splits also on the column space by using the full dataframe and the value splits to select the values of interest. This is something we will change and simplify when it is a more apparent need. ### Extra arguments `extra_args` This functionality is well known and used in the setting of analysis functions (xxx vignette), but we show here how this can also apply to splits. @@ -719,43 +730,130 @@ This functionality is well known and used in the setting of analysis functions ( # rtables 6.0.2 # Lets use the tracer!! -my_tracer <- quote(if (!is(spl, "AllSplit") && - spl_variable(spl) == "SEX") browser()) +my_tracer <- quote(if (length(spl@extra_args) > 0) browser()) trace(what = "do_split", tracer = my_tracer, where = asNamespace("rtables")) - +custom_mean_var <- function(var) { + function(df, labelstr, na.rm = FALSE, ...) 
{ + # browser() + mean(df[[var]], na.rm = na.rm) + } +} +DM_ageNA <- DM +DM_ageNA$AGE[1] <- NA basic_table() %>% split_rows_by("ARM") %>% split_rows_by("SEX", split_fun = drop_split_levels) %>% - summarize_row_groups(extra_args = list(a = 3)) %>% - build_table(DM) - + summarize_row_groups(cfun = custom_mean_var("AGE"), + extra_args = list(na.rm = TRUE), format = "xx.x", + label_fstr = "label %s") %>% + split_rows_by("STRATA1") %>% + analyze("AGE") %>% + build_table(DM_ageNA) + +# As we can see that was not possible. What if we now force it a bit? +my_split_fun <- function(df, spl, .spl_context, ...) { + spl@extra_args <- list(na.rm = TRUE) + drop_split_levels(df, spl) +} # does not work +basic_table() %>% + split_rows_by("ARM") %>% + split_rows_by("SEX", split_fun = my_split_fun) %>% + analyze("AGE") %>% + build_table(DM_ageNA) +# We invite the developer now to test all the tests file of this package with the tracer on +# therefore -> extra_args is not currently used in splits +untrace(what = "do_split", + where = asNamespace("rtables")) +# Let's try with the other variables identically +my_tracer <- quote(if (!is.null(vals) || !is.null(labels) || isTRUE(trim)) { + print("A LOT TO SAY") + message("CANT BLOCK US ALL") + stop("NOW FOR SURE") + browser() + }) +trace(what = "do_split", + tracer = my_tracer, + where = asNamespace("rtables")) +# Run tests by copying the above in setup-fakedata.R (then devtools::test()) untrace(what = "do_split", where = asNamespace("rtables")) ``` +As we have demonstrated, all the above seems like impossible cases, and are to be +considered as vestigial and deprecated heritage. + +Final examples with `MultiVarSplit` & `CompoundSplit` +# (xxx) can I swap split_cols_by_multivar -> probably you cannot atm, yes can do +custom split function that does the "fake" split -### (xxx - the other parameters... trim?? 
Maybe it does not work, vestigial) -(xxx use trace to find if vals and labels are used) +Firstly, we want to see how `MultiVarSplit` class behaves for an example case taken from +`?split_rows_by_multivar`. ```{r, eval=FALSE} -my_tracer <- quote(if (!isTRUE(trim)) browser()) +# rtables 6.0.2 + +my_tracer <- quote(if (is(spl, "MultiVarSplit")) browser()) trace(what = "do_split", tracer = my_tracer, where = asNamespace("rtables")) +# We want also to take a look at the following: +debugonce(rtables:::.apply_split_inner) +lyt <- basic_table() %>% + split_cols_by("ARM") %>% + split_rows_by_multivar(c("SEX", "STRATA1")) %>% + summarize_row_groups() %>% + analyze(c("AGE", "SEX")) -wrapper_do_split <- function(){ - do_base_split(spl, df, vals = NULL, labels = NULL, trim = TRUE) -} -basic_table() %>% - split_rows_by("ARM") %>% - split_rows_by("SEX") %>% - analyze("BMRKR1") %>% - build_table(DM) +build_table(lyt, DM) untrace(what = "do_split", where = asNamespace("rtables")) ``` -Final examples with `MultiVarSplit` & `CompoundSplit` +If we print them out, we will notice that the two groups (one called "SEX" and the other "STRATA1") are identical along the columns. This is because no subgroup was actually created. This is an interesting way to personalize splits and with the help of custom split functions and their split context, also to have widely different subgroups in the table. + +Lastly, we will briefly show an example of a split with cumulative function and how to replace it and solve the problem with empty age groups we had before. 
In the following, we propose the same simplified situation: + +```{r} +# rtables 6.0.2 + +cutfun <- function(x) { + # browser() + cutpoints <- c(0, mean(x), max(x)) + + names(cutpoints) <- c("", "Younger", "Older") + cutpoints +} +tbl <- basic_table() %>% + split_rows_by("ARM", + split_fun = drop_and_remove_levels(c("B: Placebo", "C: Combination"))) %>% + split_rows_by("STRATA1") %>% + split_rows_by_cutfun("AGE", cutfun = cutfun) %>% + # split_rows_by_cuts("AGE", cuts = c(0, 50, 100), + # cutlabels = c("young", "old")) %>% # our objective is to take out empty levels from this! + split_rows_by("SEX", split_fun = drop_split_levels) %>% + summarize_row_groups() %>% + build_table(DM) + +# (xxx) The real new thing + +# +# lyt5 <- basic_table() %>% +# split_cols_by_cutfun("AGE", cutfun = cutfun) %>% +# analyze("SEX") +# +# tbl5 <- build_table(lyt5, ex_adsl) +# tbl5 + +tbl +``` +The reader will notice here that there is `old` printed out without any data in it. We want to trim it! +(xxx) to drop values you need to override the core with make_split_fun +(xxx) add the pre-proc with z-scoring +```{r} +# try with pruning +prune_table(tbl) #(xxx) what is going on here? + +``` From 51de5589efb1c038bf1d2c11d163b20dd221fea1 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 19 Jul 2023 17:49:04 +0200 Subject: [PATCH 17/40] finishing up --- vignettes/dg_split_machinery.Rmd | 94 ++++++++++++++++++++++++-------- 1 file changed, 71 insertions(+), 23 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index 9fc637113..c9ddc596c 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -724,7 +724,6 @@ Here we can see what is the split column variable (`split`, first column) at thi ### Extra arguments `extra_args` This functionality is well known and used in the setting of analysis functions (xxx vignette), but we show here how this can also apply to splits. 
-(xxx intent is to set them on the parent and then used in the analyze call - possible this does not work. Issue to investigate this and if we need it xxx) ```{r, eval=FALSE} # rtables 6.0.2 @@ -785,9 +784,6 @@ considered as vestigial and deprecated heritage. Final examples with `MultiVarSplit` & `CompoundSplit` -# (xxx) can I swap split_cols_by_multivar -> probably you cannot atm, yes can do -custom split function that does the "fake" split - Firstly, we want to see how `MultiVarSplit` class behaves for an example case taken from `?split_rows_by_multivar`. @@ -803,6 +799,7 @@ debugonce(rtables:::.apply_split_inner) lyt <- basic_table() %>% split_cols_by("ARM") %>% split_rows_by_multivar(c("SEX", "STRATA1")) %>% + # split_rows_by("COUNTRY", split_fun = keep_split_levels("PAK")) %>% # xxx for #690 #691 summarize_row_groups() %>% analyze(c("AGE", "SEX")) @@ -813,47 +810,98 @@ untrace(what = "do_split", If we print them out, we will notice that the two groups (one called "SEX" and the other "STRATA1") are identical along the columns. This is because no subgroup was actually created. This is an interesting way to personalize splits and with the help of custom split functions and their split context, also to have widely different subgroups in the table. -Lastly, we will briefly show an example of a split with cumulative function and how to replace it and solve the problem with empty age groups we had before. In the following, we propose the same simplified situation: +We take this opportunity to explain that `xxx` comments mark placeholders for TODOs and warnings that need further work. We invite the reader to try to understand why `split_rows_by_multivar` can have other row splits under it (see `xxx` comment in the previous code), while `split_cols_by_multivar` cannot. It is a known bug at the moment, and we would be pleased to have a fix. The issues are often linked in the code by their issue number (e.g. `#690`). 
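Returning to the `split_rows_by_multivar` example, the statement that no subgroup is actually created can be pictured with plain base R. The sketch below illustrates the idea only and is not the `rtables` implementation; the data frame `toy_df` and the helper `fake_multivar_split()` are hypothetical names introduced here.

```{r}
# Toy data, made up for illustration
toy_df <- data.frame(
  SEX = c("F", "M", "F", "M"),
  STRATA1 = c("A", "A", "B", "C"),
  AGE = c(34, 41, 29, 52)
)

# Hypothetical helper: one facet per variable name,
# each facet holding the full, unsplit data frame
fake_multivar_split <- function(df, vars) {
  facets <- rep(list(df), length(vars))
  names(facets) <- vars
  facets
}

facets <- fake_multivar_split(toy_df, c("SEX", "STRATA1"))

# Both facets contain exactly the same rows
print(identical(facets[["SEX"]], facets[["STRATA1"]]))
print(vapply(facets, nrow, integer(1)))
```

In this picture every facet holds the same rows, so the groups can only differ through what is computed on them.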
+ +Lastly, we will briefly show an example of a split with a cut function and how to replace it and solve the problem with empty age groups we had before. In the following, we propose the same simplified situation: ```{r} # rtables 6.0.2 cutfun <- function(x) { - # browser() - cutpoints <- c(0, mean(x), max(x)) + # browser() + cutpoints <- c(0, 50, 100) names(cutpoints) <- c("", "Younger", "Older") cutpoints } -tbl <- basic_table() %>% +tbl <- basic_table(show_colcounts = TRUE) %>% split_rows_by("ARM", split_fun = drop_and_remove_levels(c("B: Placebo", "C: Combination"))) %>% split_rows_by("STRATA1") %>% split_rows_by_cutfun("AGE", cutfun = cutfun) %>% # split_rows_by_cuts("AGE", cuts = c(0, 50, 100), - # cutlabels = c("young", "old")) %>% # our objective is to take out empty levels from this! + # cutlabels = c("young", "old")) %>% # Works the same split_rows_by("SEX", split_fun = drop_split_levels) %>% - summarize_row_groups() %>% + summarize_row_groups() %>% # This is degenerate!!! build_table(DM) -# (xxx) The real new thing +tbl +``` +For both cases (`*_cuts` and `*_cutfun`), we have empty levels that are not dropped. This is to be expected and can be avoided by using a dedicated split function. Only by intentionally looking ahead at the future split is it possible to know whether there is any element in it. At the moment, though, it is not possible to add `spl_fun` to dedicated split functions like `split_rows_by_cuts`. + +Note too that in the previous table we used only `summarize_row_groups` but no `analyze`. This rendered nicely but it is not the standard way to go as `summarize_row_groups` was intended ONLY to decorate row groups, i.e. rows with labels. Internally, these rows are called content rows and that is why the analysis functions here are called `cfun` instead of `afun`. The tabulation machinery also reflects this distinction, as described here (xxx link to tabulation vignette). 
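The empty levels mentioned above can be reproduced outside `rtables`. The following base-R sketch is an illustration only: it calls no `rtables` function, and the vector `age` is made-up data. It shows that fixed cut points create a level even when no observation falls into it, and that the level disappears only when the resulting groups are inspected and dropped explicitly.

```{r}
# Hypothetical ages, all below the cut point 50 (illustrative data only)
age <- c(23, 31, 38, 44, 49)

# Fixed cut points, mirroring `cuts = c(0, 50, 100)` from the example
age_grp <- cut(age, breaks = c(0, 50, 100), labels = c("Younger", "Older"))

# The "Older" level exists although no observation falls into it
counts <- table(age_grp)

# Splitting keeps the empty group by default...
facets_all <- split(age, age_grp)
# ...and drops it only when asked to do so explicitly
facets_nonempty <- split(age, age_grp, drop = TRUE)

print(counts)
print(names(facets_all))
print(names(facets_nonempty))
```

This is the same decision that a post-processing step has to make in a custom split function: the facets must first exist before the empty ones can be identified and removed.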
+ +We can try anyway to construct the split function for cuts manually with `make_split_fun`: +```{r, eval=FALSE} +my_count_afun <- function(x, .N_col, .spl_context, ...) { + # browser() + out <- list(c(length(x), length(x)/.N_col)) + names(out) <- .spl_context$value[nrow(.spl_context)] # workaround (xxx #689) + in_rows(.list = out, + .formats = c("xx (xx.x%)")) +} +# ?make_split_fun # To check for docs/examples + +# Core split +cuts_core <- function(spl, df, vals, labels, .spl_context) { + # browser() + young_v <- as.numeric(df[["AGE"]]) < 50 + make_split_result(c("young", "old"), + datasplit = list(df[young_v,], df[!young_v,]), + labels = c("Younger", "Older")) +} +drop_empties <- function(splret, spl, fulldf, ...){ + # browser() + nrows_data_split <- vapply(splret$datasplit, nrow, numeric(1)) + to_keep <- nrows_data_split > 0 + make_split_result(splret$values[to_keep], + splret$datasplit[to_keep], + splret$labels[to_keep]) +} +gen_split <- make_split_fun(core_split = cuts_core, + post = list(drop_empties)) -# -# lyt5 <- basic_table() %>% -# split_cols_by_cutfun("AGE", cutfun = cutfun) %>% -# analyze("SEX") -# -# tbl5 <- build_table(lyt5, ex_adsl) -# tbl5 +tbl <- basic_table(show_colcounts = TRUE) %>% + split_rows_by("ARM", split_fun = keep_split_levels(c("A: Drug X"))) %>% + split_rows_by("STRATA1") %>% + split_rows_by("AGE", split_fun = gen_split) %>% + split_rows_by("SEX", split_fun = drop_split_levels, + child_labels = "hidden") %>% + analyze("BMRKR1", afun = my_count_afun) %>% # This is NOT degenerate!!! BMRKR1 is only placeholder + build_table(DM) tbl +# why is it missing Older? (xxx) +# why the core split seems different from inside??? (xxx) ``` -The reader will notice here that there is `old` printed out without any data in it. We want to trim it! -(xxx) to drop values you need to override the core with make_split_fun -(xxx) add the pre-proc with z-scoring +There is another way to go. We could prune them out! 
```{r} -# try with pruning -prune_table(tbl) #(xxx) what is going on here? +# rtables 6.0.2 +tbl <- basic_table(show_colcounts = TRUE) %>% + split_rows_by("ARM", split_fun = keep_split_levels(c("A: Drug X"))) %>% + split_rows_by("STRATA1") %>% + split_rows_by_cuts("AGE", cuts = c(0, 50, 100), + cutlabels = c("young", "old")) %>% + split_rows_by("SEX", split_fun = drop_split_levels) %>% + summarize_row_groups() %>% # This is degenerate!!! # we keep it until #689 + build_table(DM) + +tbl + +# Trying with pruning +prune_table(tbl) #(xxx) what is going on here? +# It is degenerate -> what to do? ``` +(xxx) add the pre-proc with z-scoring From 9d8890ad9a84a3c3f313613320e71972d20d9afe Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 19 Jul 2023 18:50:57 +0200 Subject: [PATCH 18/40] comment updates --- vignettes/dg_split_machinery.Rmd | 27 +++++++++++++++++++++------ 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index c9ddc596c..7e0a45b6f 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -747,24 +747,39 @@ basic_table() %>% summarize_row_groups(cfun = custom_mean_var("AGE"), extra_args = list(na.rm = TRUE), format = "xx.x", label_fstr = "label %s") %>% - split_rows_by("STRATA1") %>% - analyze("AGE") %>% + # content_extra_args, c_extra_args are different slots!! (xxx) + split_rows_by("STRATA1", split_fun = keep_split_levels("A")) %>% + analyze("AGE") %>% # check with the extra_args (xxx) build_table(DM_ageNA) +# You can accumulate extra_args down to other splits. It is possible this does not +# work. Should it? That is why extra_args lives only in splits (xxx) check if it works +# as is. Difficult to find an use case for this. Maybe it could work for the ref_group +# info. That does not work with nesting already (fairly sure that it will break stuff). +# Does it make sense to have more than one ref_group at any point of the analysis? 
No docs, +# send a warning if users try to nest things with ref_group (that is passed around via +# extra_args) # As we can see that was not possible. What if we now force it a bit? my_split_fun <- function(df, spl, .spl_context, ...) { spl@extra_args <- list(na.rm = TRUE) + # does not work because do_split is not changing the object + # the split does not do anything with it drop_split_levels(df, spl) } # does not work basic_table() %>% split_rows_by("ARM") %>% split_rows_by("SEX", split_fun = my_split_fun) %>% - analyze("AGE") %>% + analyze("AGE", inclNAs = TRUE, afun = mean) %>% # include_NAs is set FALSE build_table(DM_ageNA) +# extra_args is in available in cols but not in rows, because different columns +# may need it for different col space. Row-wise it seems not necessary. +# The only thing that works is adding it to analyze (xxx) check if it is worth adding + # We invite the developer now to test all the tests file of this package with the tracer on -# therefore -> extra_args is not currently used in splits -untrace(what = "do_split", - where = asNamespace("rtables")) +# therefore -> extra_args is not currently used in splits (xxx could be wrong) +# could be not being hooked up +untrace(what = "do_split", where = asNamespace("rtables")) + # Let's try with the other variables identically my_tracer <- quote(if (!is.null(vals) || !is.null(labels) || isTRUE(trim)) { print("A LOT TO SAY") From f356077a2f9143d9a2a48895467241715b306e6d Mon Sep 17 00:00:00 2001 From: Melkiades Date: Thu, 20 Jul 2023 19:26:24 +0200 Subject: [PATCH 19/40] more details --- vignettes/dg_split_machinery.Rmd | 37 ++++++++++++++++++++++---------- 1 file changed, 26 insertions(+), 11 deletions(-) diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index 7e0a45b6f..c3fb3babf 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -798,6 +798,11 @@ As we have demonstrated, all the above seems like impossible cases, and are to b 
considered as vestigial and deprecated heritage. Final examples with `MultiVarSplit` & `CompoundSplit` +xxx CompoundSplit is used when you have an analyze with multiple variables or MultiVarSplit (nope it is a different thing, it is a single split that generates facets with different variables) makes a special AnalyzeMultiVars (inherits from CompoundSplit). AnalyzeMultiVars is for analyzing the same facets multiple times. MultiVarColSplit works with analyze_colvars, it is a different object. +.set_kids_sect_sep adds things between children (can be set from split) +xxx + +xxx file issue for multiple analyze_colvars Firstly, we want to see how `MultiVarSplit` class behaves for an example case taken from `?split_rows_by_multivar`. @@ -813,12 +818,16 @@ trace(what = "do_split", debugonce(rtables:::.apply_split_inner) lyt <- basic_table() %>% split_cols_by("ARM") %>% - split_rows_by_multivar(c("SEX", "STRATA1")) %>% - # split_rows_by("COUNTRY", split_fun = keep_split_levels("PAK")) %>% # xxx for #690 #691 + split_rows_by_multivar(c("BMRKR1", "BMRKR1"), + varlabels = c("SD", "MEAN")) %>% + split_rows_by("COUNTRY", + split_fun = keep_split_levels("PAK")) %>% # xxx for #690 #691 summarize_row_groups() %>% - analyze(c("AGE", "SEX")) + analyze(c("AGE", "SEX")) build_table(lyt, DM) + +# xxx check empty space on top -> check if it is a bug, file it untrace(what = "do_split", where = asNamespace("rtables")) ``` @@ -870,8 +879,12 @@ my_count_afun <- function(x, .N_col, .spl_context, ...) { # Core split cuts_core <- function(spl, df, vals, labels, .spl_context) { - # browser() - young_v <- as.numeric(df[["AGE"]]) < 50 + # browser() # file an issue xxx + # variables that are split on are converted to factor during the original clean-up + # cut split are not doing it but it is an exception. 
xxx + # young_v <- as.numeric(df[["AGE"]]) < 50 + # current solution: + young_v <- as.numeric(as.character(df[["AGE"]])) < 50 make_split_result(c("young", "old"), datasplit = list(df[young_v,], df[!young_v,]), labels = c("Younger", "Older")) @@ -891,14 +904,14 @@ tbl <- basic_table(show_colcounts = TRUE) %>% split_rows_by("ARM", split_fun = keep_split_levels(c("A: Drug X"))) %>% split_rows_by("STRATA1") %>% split_rows_by("AGE", split_fun = gen_split) %>% - split_rows_by("SEX", split_fun = drop_split_levels, - child_labels = "hidden") %>% - analyze("BMRKR1", afun = my_count_afun) %>% # This is NOT degenerate!!! BMRKR1 is only placeholder + analyze("SEX") %>% # It is the last step!! No need of BMRKR1 right? + # split_rows_by("SEX", split_fun = drop_split_levels, + # child_labels = "hidden") %>% # close issue #689. would it work for + # analyze_colvars? probably (xxx) + # analyze("BMRKR1", afun = my_count_afun) %>% # This is NOT degenerate!!! BMRKR1 is only placeholder build_table(DM) tbl -# why is it missing Older? (xxx) -# why the core split seems different from inside??? (xxx) ``` There is another way to go. We could prune them out! ```{r} @@ -916,7 +929,9 @@ tbl <- basic_table(show_colcounts = TRUE) %>% tbl # Trying with pruning -prune_table(tbl) #(xxx) what is going on here? +prune_table(tbl) #(xxx) what is going on here? it is degenerate so it has no real leaves # It is degenerate -> what to do? 
+# The same mechanism is applied in the case of NULL leaves, they are rolled up in the +# table tree ``` (xxx) add the pre-proc with z-scoring From 98d2985aff753f231689f6bd24f559c35cf2b1e2 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 26 Jul 2023 17:26:15 +0200 Subject: [PATCH 20/40] constructing separate vignette for debugging --- vignettes/dg_debugging_rtables.Rmd | 96 ++++++++++++++++++++++++++++++ vignettes/dg_split_machinery.Rmd | 11 +--- 2 files changed, 98 insertions(+), 9 deletions(-) create mode 100644 vignettes/dg_debugging_rtables.Rmd diff --git a/vignettes/dg_debugging_rtables.Rmd b/vignettes/dg_debugging_rtables.Rmd new file mode 100644 index 000000000..0875faac2 --- /dev/null +++ b/vignettes/dg_debugging_rtables.Rmd @@ -0,0 +1,96 @@ +--- +title: "Debugging in `rtables` and beyond" +author: "Davide Garolini" +date: '`r Sys.Date()`' +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{rtables Advanced Usage} + %\VignetteEncoding{UTF-8} + %\VignetteEngine{knitr::rmarkdown} +editor_options: + chunk_output_type: console +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE) +``` + +# Debugging in `rtables` and beyond + +This is a not-so-short not-comprehensive guide to debugging `rtables`. It is to be considered, though, of general validity and of personal use. + +-=-=-=- Notes from meeting +Debugging: +-> easy to find the problem +-> never write clever code, impossible to debug + +* Coding Error, code does not do what you intended -> Bug in the punch card +* Unexpected Input. Defensive programming FAIL FAST FAIL LOUD -> useful and not too consuming +* Bug in Dependency -> never use + +e.g. 
of FFFL +as close as possible to the source +* bad inputs should be found very early + + +Worst: silently giving incorrect results + +common things that we can catch early +missing values/column length == 0 or length > 1 + +ROBUSTNESS: refuse to do stuff that can be very problematic + +WHAT TO DO +* Read Error Messages +debugcall you can add the signature (formals) +trace is powerful because you can add the reaction) +tracer is very good, at where it happens + +options(error=recover) is the best way to debug +core tool when developing/debugging + +dump.frames and debugger +it saves it to a file or an object and then you call debugger to step in it +as you did recover. You save the global form + +warn global option +<0 ignored +0 top level function call +1 immediately as they occur +>=2 throws errors + +<<- for recover or debugger gives it to the global environment + +lo-fi debugging +PRINT / CAT +print position of a function you get to then you get to a point where it breaks + +comment blocks -> does not work with pipes (use identity() it is a step +that does nothing but does not break the pipes) + +browser bombing + +regression tests +almost every bug should become a regression test + +debugging with pipes +-> pipes are better to write code but horrible to debug + +T pipe %T>% does print it midway + +debug_pipe -> it is like the T pipe going into browser() + +Shiny debugging is more difficult due to reactivity + +DO NOT BE CLEVER WITH CODE - ONLY IF YOU HAVE TO, CLEVER IS ALSO SUBJECTIVE +AND IT WILL CHANGE WITH TIME +-=-=-=- + + +We invite the smart developer to use the provided examples as a way to get an "interactive" and dynamic vision of the internal algorithms, as they are routinely executed when constructing tables with `rtables`. This is achieved by using `browser()` and `debugonce()` on internal and exported functions (`rtables:::` or `rtables::`), as we will see in a moment. 
We invite also the continuous and autonomous exploration of the multiple `S3` and `S4` objects that constitute the complexity and power of `rtables`. To do so we will use the following useful functions: + +* `methods(generic_function)`: This function lists the methods that are available for a generic function. Specifically for `S4` generic functions, `showMethods(generic_function)` is giving a more detailed information about each method (e.g. inheritance). +* `class(object)`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for its documentation and examples. Indeed, `help(class)` may be informative, as it will call the documentation of the specific class. Similarly, the `?` operator will call the disambiguation page that delivers you to different `S4` methods. For `S3` methods it is necessary to postfix the class name with a dot (e.g. `?summary.lm`). +* `getClass(class)`: This describes in a compact way the type of class, the slots that it has, and the relationships that it may have with the other classes that may inherit or be inherited by it. With `getClass(object)`, instead, we can see to which values the slots of the object are assigned. It is possible to use `str(object, max.level = 2)` to see a less formal and more compact descriptions of the slots, but it may be problematic when there are one or more objects in the class slots. Hence, the maximum number of level should always be limited to 2 or 3 (`max.level = 2`). Similarly, `attributes()` can be used to retrieve some information, but we need to remember that storing important variables in this way is not encouraged. Information regarding the type of class can be retrieved with `mode()` and indirectly by `summary()` and `is.S4()`. +*`getAnywhere(function)` is very useful to get the source code of internal functions and specific generics. 
It works very well with `S3` methods, and will in any case display for each of the found methods, its relevant namespace. Similarly, `getMethod(S4_generic, S4_class)` can retrieve the source code of `S4` methods that are specific to a class. +* `eval(debugcall(generic_function(obj)))`: this is a very useful way to browser a `S4` method defined specifically for a defined object without having to manually insert `browser()` into the code. It is also possible to do similarly with R > 3.4.0 where `debug*()` calls can have the triggering signature (class) specified. Both of these are modern and simplified wrappers of tracing function `trace()`. diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index c3fb3babf..c797ff331 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -30,17 +30,10 @@ NB: we must remind the reader that `rtables` is still under active development, ## Process and Methods -(xxx reference to class vignette with this) -We invite the smart developer to use the provided examples as a way to get an "interactive" and dynamic vision of the internal algorithms, as they are routinely executed when constructing tables with `rtables`. This is achieved by using `browser()` and `debugonce()` on internal and exported functions (`rtables:::` or `rtables::`), as we will see in a moment. We invite also the continuous and autonomous exploration of the multiple `S3` and `S4` objects that constitute the complexity and power of `rtables`. To do so we will use the following useful functions: +As always, we encourage the reader to familiarize with `vignette("dg_debugging_rtables")` before going further. This document has a general validity, even if it has been tailored to study and understand complex packages like `rtables`. -* `methods(generic_function)`: This function lists the methods that are available for a generic function. 
Specifically for `S4` generic functions, `showMethods(generic_function)` is giving a more detailed information about each method (e.g. inheritance). -* `class(object)`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for its documentation and examples. Indeed, `help(class)` may be informative, as it will call the documentation of the specific class. Similarly, the `?` operator will call the disambiguation page that delivers you to different `S4` methods. For `S3` methods it is necessary to postfix the class name with a dot (e.g. `?summary.lm`). -* `getClass(class)`: This describes in a compact way the type of class, the slots that it has, and the relationships that it may have with the other classes that may inherit or be inherited by it. With `getClass(object)`, instead, we can see to which values the slots of the object are assigned. It is possible to use `str(object, max.level = 2)` to see a less formal and more compact descriptions of the slots, but it may be problematic when there are one or more objects in the class slots. Hence, the maximum number of level should always be limited to 2 or 3 (`max.level = 2`). Similarly, `attributes()` can be used to retrieve some information, but we need to remember that storing important variables in this way is not encouraged. Information regarding the type of class can be retrieved with `mode()` and indirectly by `summary()` and `is.S4()`. -*`getAnywhere(function)` is very useful to get the source code of internal functions and specific generics. It works very well with `S3` methods, and will in any case display for each of the found methods, its relevant namespace. Similarly, `getMethod(S4_generic, S4_class)` can retrieve the source code of `S4` methods that are specific to a class. 
-* `eval(debugcall(generic_function(obj)))`: this is a very useful way to browser a `S4` method defined specifically for a defined object without having to manually insert `browser()` into the code. It is also possible to do similarly with R > 3.4.0 where `debug*()` calls can have the triggering signature (class) specified. Both of these are modern and simplified wrappers of tracing function `trace()`. - -We explore and analyze the split machinery now with a growing amount of complexity, always following relevant functions and methods throughout their execution. By going from basic to complex and by discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. +Here, we explore and study the split machinery with a growing amount of complexity, always following relevant functions and methods throughout their execution. By going from basic to complex and by discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. In practice, the majority of the split engine resides in the source file `R/split_funs.R` with occasional incursion into `R/make_split_fun.R` for custom split function creation and rarer references to other more general tabulation files. 
From 570c86d63f552ccffa5e9088fde8af9ae7f8bb15 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 26 Jul 2023 17:27:21 +0200 Subject: [PATCH 21/40] adding to docs --- _pkgdown.yml | 1 + vignettes/{dg_debugging_rtables.Rmd => dg_debug_rtables.Rmd} | 0 vignettes/dg_split_machinery.Rmd | 2 +- 3 files changed, 2 insertions(+), 1 deletion(-) rename vignettes/{dg_debugging_rtables.Rmd => dg_debug_rtables.Rmd} (100%) diff --git a/_pkgdown.yml b/_pkgdown.yml index 3411f185d..4ab83ab1f 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -34,6 +34,7 @@ articles: desc: Vignettes aimed at package developers contents: - dg_split_machinery + - dg_debug_rtables reference: diff --git a/vignettes/dg_debugging_rtables.Rmd b/vignettes/dg_debug_rtables.Rmd similarity index 100% rename from vignettes/dg_debugging_rtables.Rmd rename to vignettes/dg_debug_rtables.Rmd diff --git a/vignettes/dg_split_machinery.Rmd b/vignettes/dg_split_machinery.Rmd index c797ff331..46f687df2 100644 --- a/vignettes/dg_split_machinery.Rmd +++ b/vignettes/dg_split_machinery.Rmd @@ -31,7 +31,7 @@ NB: we must remind the reader that `rtables` is still under active development, ## Process and Methods -As always, we encourage the reader to familiarize with `vignette("dg_debugging_rtables")` before going further. This document has a general validity, even if it has been tailored to study and understand complex packages like `rtables`. +As always, we encourage the reader to familiarize with `vignette("dg_debug_rtables")` before going further. This document has a general validity, even if it has been tailored to study and understand complex packages like `rtables`. Here, we explore and study the split machinery with a growing amount of complexity, always following relevant functions and methods throughout their execution. By going from basic to complex and by discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. 
From 2c75dc8a2fa2959f5561a221782f8856246c10d1 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 26 Jul 2023 17:32:58 +0200 Subject: [PATCH 22/40] init --- _pkgdown.yml | 3 ++- vignettes/dg_debug_rtables.Rmd | 6 +++--- vignettes/dg_tabulation.Rmd | 19 +++++++++++++++++++ 3 files changed, 24 insertions(+), 4 deletions(-) create mode 100644 vignettes/dg_tabulation.Rmd diff --git a/_pkgdown.yml b/_pkgdown.yml index 4ab83ab1f..a5d712e7b 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -33,8 +33,9 @@ articles: - title: For Developers desc: Vignettes aimed at package developers contents: - - dg_split_machinery - dg_debug_rtables + - dg_split_machinery + - dg_tabulation reference: diff --git a/vignettes/dg_debug_rtables.Rmd b/vignettes/dg_debug_rtables.Rmd index 0875faac2..b6bd20c04 100644 --- a/vignettes/dg_debug_rtables.Rmd +++ b/vignettes/dg_debug_rtables.Rmd @@ -20,9 +20,9 @@ knitr::opts_chunk$set(echo = TRUE) This is a not-so-short not-comprehensive guide to debugging `rtables`. It is to be considered, though, of general validity and of personal use. -=-=-=- Notes from meeting -Debugging: --> easy to find the problem --> never write clever code, impossible to debug +In general code must be in a way that: +-> it is easy to read and find problems +-> it is not clever, because it is impossible to debug * Coding Error, code does not do what you intended -> Bug in the punch card * Unexpected Input. 
Defensive programming FAIL FAST FAIL LOUD -> useful and not too consuming diff --git a/vignettes/dg_tabulation.Rmd b/vignettes/dg_tabulation.Rmd new file mode 100644 index 000000000..84cedbb95 --- /dev/null +++ b/vignettes/dg_tabulation.Rmd @@ -0,0 +1,19 @@ +--- +title: "Tabulation" +author: "Davide Garolini" +date: '`r Sys.Date()`' +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{rtables Advanced Usage} + %\VignetteEncoding{UTF-8} + %\VignetteEngine{knitr::rmarkdown} +editor_options: + chunk_output_type: console +--- + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = TRUE) +``` + +# Tabulation + From fa0d41e9a417617136092deb5ef1fff9b11024d2 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 26 Jul 2023 17:41:37 +0200 Subject: [PATCH 23/40] init 2 --- vignettes/dg_debug_rtables.Rmd | 6 +++++- vignettes/dg_tabulation.Rmd | 5 +++++ 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/vignettes/dg_debug_rtables.Rmd b/vignettes/dg_debug_rtables.Rmd index b6bd20c04..36c5b97eb 100644 --- a/vignettes/dg_debug_rtables.Rmd +++ b/vignettes/dg_debug_rtables.Rmd @@ -15,9 +15,13 @@ editor_options: knitr::opts_chunk$set(echo = TRUE) ``` +## Disclaimer + +Any code or prose which appears in a version of this vignette on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide reflects very important pieces of the split machinery that are unlikely to change. We anyway invite the reader to think always that the current code may have drifted from the following document, and it is always the best practice to read directly the code on `main`. + # Debugging in `rtables` and beyond -This is a not-so-short not-comprehensive guide to debugging `rtables`. It is to be considered, though, of general validity and of personal use. +This is a not-so-short and not-comprehensive guide to debugging `rtables`. It is to be considered, though, of general validity and of personal use. 
-=-=-=- Notes from meeting In general code must be in a way that: diff --git a/vignettes/dg_tabulation.Rmd b/vignettes/dg_tabulation.Rmd index 84cedbb95..77f032a87 100644 --- a/vignettes/dg_tabulation.Rmd +++ b/vignettes/dg_tabulation.Rmd @@ -14,6 +14,11 @@ editor_options: ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` +## Disclaimer + +Any code or prose which appears in a version of this vignette on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide reflects very important pieces of the split machinery that are unlikely to change. We anyway invite the reader to think always that the current code may have drifted from the following document, and it is always the best practice to read directly the code on `main`. (xxx we should insert it automatically) # Tabulation +Tabulation in `rtables` is a process that takes a pre-defined layout and applies it to the data. The layout object, with all its splits (see `vignette("dg_split_machinery")`) and `analyze`s, can be applied to different data to produce valid tables. This process is happening principally inside the file `tt_dotabulation.R` and its principal user-facing function `build_table` that resides in it. We assume any reader is already familiar with the documentation related to `build_table`. + From aac6f615239bd7ee1973fb711e4a33db43449530 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 26 Jul 2023 17:56:54 +0200 Subject: [PATCH 24/40] up --- vignettes/dg_tabulation.Rmd | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/vignettes/dg_tabulation.Rmd b/vignettes/dg_tabulation.Rmd index 77f032a87..26ce6a47b 100644 --- a/vignettes/dg_tabulation.Rmd +++ b/vignettes/dg_tabulation.Rmd @@ -20,5 +20,23 @@ Any code or prose which appears in a version of this vignette on the `main` bran # Tabulation -Tabulation in `rtables` is a process that takes a pre-defined layout and applies it to the data. 
The layout object, with all its splits (see `vignette("dg_split_machinery")`) and `analyze`s, can be applied to different data to produce valid tables. This process is happening principally inside the file `tt_dotabulation.R` and its principal user-facing function `build_table` that resides in it. We assume any reader is already familiar with the documentation related to `build_table`. +Tabulation in `rtables` is a process that takes a pre-defined layout and applies it to the data. The layout object, with all its splits (see `vignette("dg_split_machinery")`) and `analyze`s, can be applied to different data to produce valid tables. This process is happening principally inside the file `tt_dotabulation.R` and its principal user-facing function `build_table` that resides in it. We will sometimes see functions and methods that are present in other files like `colby_construction.R`. We assume any reader is already familiar with the documentation related to `build_table`. Also, we suggest reading first the vignette regarding the split machinery (`vignette("dg_split_machinery")`), as it is instrumental in understanding how the layout object, which is built principally of splits, is tabulated when data is applied. +This time, we enter in _medias res_ into `build_table` to see how it is meant to work. 
+ +```{r, eval=FALSE} +# rtables 6.2.0 +debugonce(build_table) + +# A very simple layout +lyt <- basic_table() %>% + split_rows_by("STRATA1") %>% + split_rows_by("SEX", split_fun = drop_split_levels) %>% + split_cols_by("ARM") %>% + analyze("BMRKR1") +# lyt must be a PreDataTableLayouts object +is(lyt, "PreDataTableLayouts") + +lyt %>% build_table(DM) + +``` \ No newline at end of file From c9cba0ea02a846bd2340660bb0b0c3695193ed9c Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 2 Aug 2023 18:16:48 +0200 Subject: [PATCH 25/40] update --- vignettes/dg_tabulation.Rmd | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/vignettes/dg_tabulation.Rmd b/vignettes/dg_tabulation.Rmd index 26ce6a47b..81c97044a 100644 --- a/vignettes/dg_tabulation.Rmd +++ b/vignettes/dg_tabulation.Rmd @@ -39,4 +39,18 @@ is(lyt, "PreDataTableLayouts") lyt %>% build_table(DM) +``` + +Now let's see the interior of `build_table`. After an initial check that the layout is a pre-data table layout, it checks whether the column layout (`clayout` accessor) is undefined, i.e. it does not have any column split. If that is the case, an `All obs` column is added automatically with all observations. After this, there are a couple of defensive programming calls that do checks and fixes now that we finally have the data. These are of two kinds: the ones that mainly concern the layout, which are defined as generics, and the one concerning the data, which is instead a plain function as it does not depend on the layout class. Indeed, the layout is structured and can be divided into `clayout` and `rlayout` (column and row layout). The first one is used to create `cinfo`, which is the general object and container of the column splits and information. The second one contains the obligatory all-data split, i.e. the root split (accessible with `root_spl`), and the row splits' vectors, which are iterative splits in the row space. In the following we consider first the checks and defensive programming.
+```{r, eval=FALSE} + ## do checks and defensive programming now that we have the data + lyt <- fix_dyncuts(lyt, df) # now that I have the data, I create the splits + lyt <- set_def_child_ord(lyt, df) # with the data I set the same order for all splits + lyt <- fix_analyze_vis(lyt) # checks if the analyze last split should be visible + df <- fix_split_vars(lyt, df, char_ok = is.null(col_counts)) + # checks if split vars are present + + lyt[] # preserve names and it is just warning if longer + lyt@.Data # might not preserve the names -> xxx fixme comment there + # Do extensive testing about these behaviors ``` \ No newline at end of file From b4e72ced609945fd71a18b5b3c4ff116f18862e4 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Wed, 2 Aug 2023 19:04:52 +0200 Subject: [PATCH 26/40] few things --- vignettes/dg_tabulation.Rmd | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/vignettes/dg_tabulation.Rmd b/vignettes/dg_tabulation.Rmd index 81c97044a..0bc7f6db4 100644 --- a/vignettes/dg_tabulation.Rmd +++ b/vignettes/dg_tabulation.Rmd @@ -47,10 +47,17 @@ Now lets see the interior of our `build_table`, After initial check that the lay lyt <- fix_dyncuts(lyt, df) # now that I have the data, I create the splits lyt <- set_def_child_ord(lyt, df) # with the data I set the same order for all splits lyt <- fix_analyze_vis(lyt) # checks if the analyze last split should be visible + # If there is only one you will not get the variable name otherwise you get it if you + # have multivar and default is na. You can do it now only because you are sure to + # have the whole layout. 
df <- fix_split_vars(lyt, df, char_ok = is.null(col_counts)) # checks if split vars are present lyt[] # preserve names and it is just warning if longer lyt@.Data # might not preserve the names -> xxx fixme comment there # Do extensive testing about these behaviors + + # xxx PreDataAxisLayout is virtual class that both row and cols layouts inherit from + # Virtual classes are handy for group classes that need to share common things like + # labels or functions that need to be applicable to their subclasses ``` \ No newline at end of file From d6d09fb96935eda758ad13de7afb321aafe6e84d Mon Sep 17 00:00:00 2001 From: Melkiades Date: Thu, 10 Aug 2023 12:14:30 +0200 Subject: [PATCH 27/40] update --- vignettes/dg_tabulation.Rmd | 42 +++++++++++++++++++++++++------------ 1 file changed, 29 insertions(+), 13 deletions(-) diff --git a/vignettes/dg_tabulation.Rmd b/vignettes/dg_tabulation.Rmd index 0bc7f6db4..e8bb5cdd4 100644 --- a/vignettes/dg_tabulation.Rmd +++ b/vignettes/dg_tabulation.Rmd @@ -20,7 +20,7 @@ Any code or prose which appears in a version of this vignette on the `main` bran # Tabulation -Tabulation in `rtables` is a process that takes a pre-defined layout and applies it to the data. The layout object, with all its splits (see `vignette("dg_split_machinery")`) and `analyze`s, can be applied to different data to produce valid tables. This process is happening principally inside the file `tt_dotabulation.R` and its principal user-facing function `build_table` that resides in it. We will sometimes see functions and methods that are present in other files like `colby_construction.R`. We assume any reader is already familiar with the documentation related to `build_table`. Also, we suggest reading first the vignette regarding the split machinery (`vignette("dg_split_machinery")`), as it is instrumental in understanding how the layout object, which is built principally of splits, is tabulated when data is applied. 
+Tabulation in `rtables` is a process that takes a pre-defined layout and applies it to the data. The layout object, with all its splits (see `vignette("dg_split_machinery")`) and `analyze`s, can be applied to different data to produce valid tables. This process is happening principally inside the file `tt_dotabulation.R` and its principal user-facing function `build_table` that resides in it. We will sometimes see functions and methods that are present in other files like `colby_construction.R` or `make_subset_expr.R`. We assume any reader is already familiar with the documentation related to `build_table`. Also, we suggest reading first the vignette regarding the split machinery (`vignette("dg_split_machinery")`), as it is instrumental in understanding how the layout object, which is built principally of splits, is tabulated when data is applied. This time, we enter in _medias res_ into `build_table` to see how it is meant to work. @@ -44,20 +44,36 @@ lyt %>% build_table(DM) Now let's see the interior of `build_table`. After an initial check that the layout is a pre-data table layout, it checks whether the column layout (`clayout` accessor) is undefined, i.e. it does not have any column split. If that is the case, an `All obs` column is added automatically with all observations. After this, there are a couple of defensive programming calls that do checks and fixes now that we finally have the data. These are of two kinds: the ones that mainly concern the layout, which are defined as generics, and the one concerning the data, which is instead a plain function as it does not depend on the layout class. Indeed, the layout is structured and can be divided into `clayout` and `rlayout` (column and row layout). The first one is used to create `cinfo`, which is the general object and container of the column splits and information. The second one contains the obligatory all-data split, i.e.
the root split (accessible with `root_spl`), and the row splits' vectors which are iterative splits in the row space. In the following we consider first the checks and defensive programming. ```{r, eval=FALSE} ## do checks and defensive programming now that we have the data - lyt <- fix_dyncuts(lyt, df) # now that I have the data, I create the splits - lyt <- set_def_child_ord(lyt, df) # with the data I set the same order for all splits - lyt <- fix_analyze_vis(lyt) # checks if the analyze last split should be visible + lyt <- fix_dyncuts(lyt, df) # Now that I have the data, I create the splits that depends on data + lyt <- set_def_child_ord(lyt, df) # With the data I set the same order for all splits + lyt <- fix_analyze_vis(lyt) # Checks if the analyze last split should be visible # If there is only one you will not get the variable name otherwise you get it if you - # have multivar and default is na. You can do it now only because you are sure to + # have multivar. Default is NA. You can do it now only because you are sure to # have the whole layout. 
df <- fix_split_vars(lyt, df, char_ok = is.null(col_counts)) # checks if split vars are present - lyt[] # preserve names and it is just warning if longer - lyt@.Data # might not preserve the names -> xxx fixme comment there - # Do extensive testing about these behaviors - - # xxx PreDataAxisLayout is virtual class that both row and cols layouts inherit from - # Virtual classes are handy for group classes that need to share common things like - # labels or functions that need to be applicable to their subclasses -``` \ No newline at end of file + lyt[] # preserves names, only warns if longer, and repeats the value if only one + lyt@.Data # might not preserve the names # it works only when it is another class that inherits from lists + # We suggest extensive testing of these behaviors in order to choose + # the appropriate one +``` +Among the various checks and defensive programming steps, we find `PreDataAxisLayout`, which is a virtual class that both row and column layouts inherit from. Virtual classes are handy for grouping classes that need to share common things, like labels or functions that need to be applicable to their related classes. More information about the `rtables` class hierarchy can be found in the dedicated dev vignette (xxx add). + +Now, we continue with `build_table`. After the checks, we notice `TreePos()`, which is a constructor for an object that retains a representation of the tree position along with split values and labels. This is mainly used by `create_colinfo`, which we enter now with `debugonce(create_colinfo)`. This function creates the object that represents the column splits and everything else that may be related to the columns. In particular, the column counts are calculated in this function. 
The parameter inputs are as follows: + +```{r, eval=FALSE} +cinfo <- create_colinfo(lyt, # Main layout with col splits info + df, # df used for splits and col counts if no alt_counts_df is present + rtpos, # TreePos (does not change in out of this function) + counts = col_counts, # If we want to overwrite the calculations with df/alt_counts_df + alt_counts_df = alt_counts_df, # alternative data for col counts + total = col_total, # calculated from build_table inputs (nrow of df or alt_counts_df) + topleft) # topleft information added into build_table +``` + +`create_colinfo` is in `make_subset_expr.R`. Here, we see that if `topleft` is present in `build_table`, it will override the one in `lyt`. + +xxx -> add a warning for NAs in the splits + +xxx -> precedence for col counts: document it if not present. Original use case is that you split events while the right colcounts is patients. First only counts as vector was added, but it is often the case that you have the possibility to add alt_counts_df From 698f928a0d05830405440701b5b0d0a431bc4e15 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Thu, 17 Aug 2023 18:57:59 +0200 Subject: [PATCH 28/40] up --- vignettes/dg_tabulation.Rmd | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/vignettes/dg_tabulation.Rmd b/vignettes/dg_tabulation.Rmd index e8bb5cdd4..a6c983b95 100644 --- a/vignettes/dg_tabulation.Rmd +++ b/vignettes/dg_tabulation.Rmd @@ -26,6 +26,7 @@ This time, we enter in _medias res_ into `build_table` to see how it is meant to ```{r, eval=FALSE} # rtables 6.2.0 +library(rtables) debugonce(build_table) # A very simple layout @@ -72,8 +73,24 @@ cinfo <- create_colinfo(lyt, # Main layout with col splits info topleft) # topleft information added into build_table ``` -`create_colinfo` is in `make_subset_expr.R`. Here, we see that if `topleft` is present in `build_table`, it will override the one in `lyt`. +`create_colinfo` is in `make_subset_expr.R`. 
Here, we see that if `topleft` is present in `build_table`, it will override the one in `lyt`. Entering `create_colinfo`, we will see the following calls: +```{r, eval=FALSE} + + clayout <- clayout(lyt) # Extracts column split and info + if(is.null(topleft)) + topleft <- top_left(lyt) # If top_left is not present in build_table, it is taken from lyt + ctree <- coltree(clayout, df = df, rtpos = rtpos) # Main constructor of LayoutColTree + # The above is referenced as generic and principally represented as + # setMethod("coltree", "PreDataColLayout", (located in `tree_accessor.R`). + # This is a call that restructure information from clayout and df and rtpos to get a more compact column tree + # layout. Part of this design is related to past implementations. + + cexprs <- make_col_subsets(ctree, df) # extracts expressions in a compact fashion. WARNING, removing NAs at this step + # is automatic. This should be coupled with a warning for NAs in the split (xxx) + colextras <- col_extra_args(ctree) # retrieves extra_args from the tree. It may be not used + +``` -xxx -> add a warning for NAs in the splits +Next in the function there is the creation of the column counts. For now this happens only at the leaf level but it can be certainly calculated for all levels independently (this is current issue in `rtables`, i.e. how to print other levels' totals). Precedence for col counts may be not documented (xxx todo). Original use case is that you split events while the column counts is the number of patients and not events. First only counts as vector was added, but it is often the case that you have the possibility to add `alt_counts_df`. Finally the `cinfo` object is created (`InstantiatedColumnInfo`) with all the above information. -xxx -> precedence for col counts: document it if not present. Original use case is that you split events while the right colcounts is patients. 
First only counts as vector was added, but it is often the case that you have the possibility to add alt_counts_df +Now, if we continue in `build_table` we hit `.make_ctab` for a root split. This is a general initial value that produces the root split as a content row. Indeed `ctab` stays for content row which is a row that has only a label in it. From the documentation regarding `summarize_row_groups`, you know that this is the way `rtables` defines label rows, as content rows. `.make_ctab` is very close to the actual creation of the table row which is done with `.make_tablerows`. This function also uses `parent_cfun` and `.make_caller` to retrieve the content function inserted in above levels \ No newline at end of file From 7d7b26cdad5eac28d2463a81f38acb5a321f7822 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Fri, 18 Aug 2023 16:54:49 +0200 Subject: [PATCH 29/40] completed dev guide for tabulation --- vignettes/dg_tabulation.Rmd | 75 ++++++++++++++++++++++++++++++++++--- 1 file changed, 70 insertions(+), 5 deletions(-) diff --git a/vignettes/dg_tabulation.Rmd b/vignettes/dg_tabulation.Rmd index a6c983b95..10bba3a40 100644 --- a/vignettes/dg_tabulation.Rmd +++ b/vignettes/dg_tabulation.Rmd @@ -82,15 +82,80 @@ cinfo <- create_colinfo(lyt, # Main layout with col splits info ctree <- coltree(clayout, df = df, rtpos = rtpos) # Main constructor of LayoutColTree # The above is referenced as generic and principally represented as # setMethod("coltree", "PreDataColLayout", (located in `tree_accessor.R`). - # This is a call that restructure information from clayout and df and rtpos to get a more compact column tree - # layout. Part of this design is related to past implementations. + # This is a call that restructure information from clayout and df and rtpos + # to get a more compact column tree layout. Part of this design is related + # to past implementations. - cexprs <- make_col_subsets(ctree, df) # extracts expressions in a compact fashion. 
WARNING, removing NAs at this step - # is automatic. This should be coupled with a warning for NAs in the split (xxx) + cexprs <- make_col_subsets(ctree, df) # extracts expressions in a compact fashion. WARNING, + # removing NAs at this step is automatic. This should + # be coupled with a warning for NAs in the split (xxx) colextras <- col_extra_args(ctree) # retrieves extra_args from the tree. It may be not used ``` Next in the function there is the creation of the column counts. For now this happens only at the leaf level but it can be certainly calculated for all levels independently (this is current issue in `rtables`, i.e. how to print other levels' totals). Precedence for col counts may be not documented (xxx todo). Original use case is that you split events while the column counts is the number of patients and not events. First only counts as vector was added, but it is often the case that you have the possibility to add `alt_counts_df`. Finally the `cinfo` object is created (`InstantiatedColumnInfo`) with all the above information. -Now, if we continue in `build_table` we hit `.make_ctab` for a root split. This is a general initial value that produces the root split as a content row. Indeed `ctab` stays for content row which is a row that has only a label in it. From the documentation regarding `summarize_row_groups`, you know that this is the way `rtables` defines label rows, as content rows. `.make_ctab` is very close to the actual creation of the table row which is done with `.make_tablerows`. This function also uses `parent_cfun` and `.make_caller` to retrieve the content function inserted in above levels \ No newline at end of file +Now, if we continue in `build_table` we hit `.make_ctab` for a root split. This is a general initial procedure that generates the needed root split as a content row. Indeed `ctab` stays for content row which is a row that has only a label in it. 
From the documentation regarding `summarize_row_groups`, you know that this is the way `rtables` defines label rows, i.e. as content rows. `.make_ctab` is very close to the actual creation of the table row which is done with `.make_tablerows`. Note that this function also uses `parent_cfun` and `.make_caller` to retrieve the content function inserted in above levels. We split here what is the structural handling of the table object and the rows creation engine that are divided by `.make_tablerows` call. If you search the whole package, you will find that this function is called only twice, once in `.make_ctab` and once in `.make_analyzed_tab`. These two are the final elements of the table construction: the creation of rows. + +Going back to `build_table`, you will see that the row layout is actually a list of split vectors. The fundamental line ` kids <- lapply(seq_along(rlyt), function(i) {` allows us to appreciate this. Going forward we see how `recursive_applysplit` is applied to each split vector. It may be worth it to check how, in our test case, this vector looks like. + +```{r, eval=FALSE} +# rtables 6.2.0 + +# A very simple layout +lyt <- basic_table() %>% + split_rows_by("STRATA1") %>% + split_rows_by("SEX", split_fun = drop_split_levels) %>% + split_cols_by("ARM") %>% + analyze("BMRKR1") +rlyt <- rtables:::rlayout(lyt) +str(rlyt, max.level = 2) +Formal class 'PreDataRowLayout' [package "rtables"] with 2 slots + ..@ .Data :List of 2 # rlyt is a rtables object (PreDataRowLayout) that is also a list! + ..@ root_split:Formal class 'RootSplit' [package "rtables"] with 17 slots # another object! + +str(rtables:::root_spl(rlyt), max.level = 2) # it is still a split + +str(rlyt[[1]], max.level = 3) # still a rtables object (SplitVector) that is a list +Formal class 'SplitVector' [package "rtables"] with 1 slot + ..@ .Data:List of 3 + .. ..$ :Formal class 'VarLevelSplit' [package "rtables"] with 20 slots + .. 
..$ :Formal class 'VarLevelSplit' [package "rtables"] with 20 slots + .. ..$ :Formal class 'AnalyzeMultiVars' [package "rtables"] with 17 slots +``` + +The last print is very informative. We can see from the layout construction that this object is built with 2 `VarLevelSplit` on the rows and one final `AnalyzeMultiVars` which is the leaf analysis split that has the final level rows. The second split vector is the following `AnalyzeVarSplit`. + +```{r, eval=FALSE} +> str(rlyt[[2]], max.level = 5) +Formal class 'SplitVector' [package "rtables"] with 1 slot + ..@ .Data:List of 1 + .. ..$ :Formal class 'AnalyzeVarSplit' [package "rtables"] with 21 slots + .. .. .. ..@ analysis_fun :function (x, ...) + .. .. .. .. ..- attr(*, "srcref")= 'srcref' int [1:8] 1723 5 1732 5 5 5 4198 4207 + .. .. .. .. .. ..- attr(*, "srcfile")=Classes 'srcfilealias', 'srcfile' + .. .. .. ..@ default_rowlabel : chr "Var3 Counts" + .. .. .. ..@ include_NAs : logi FALSE + .. .. .. ..@ var_label_position : chr "default" + .. .. .. ..@ payload : chr "VAR3" + .. .. .. ..@ name : chr "VAR3" + .. .. .. ..@ split_label : chr "Var3 Counts" + .. .. .. ..@ split_format : NULL + .. .. .. ..@ split_na_str : chr NA + .. .. .. ..@ split_label_position : chr(0) + .. .. .. ..@ content_fun : NULL + .. .. .. ..@ content_format : NULL + .. .. .. ..@ content_na_str : chr(0) + .. .. .. ..@ content_var : chr "" + .. .. .. ..@ label_children : logi FALSE + .. .. .. ..@ extra_args : list() + .. .. .. ..@ indent_modifier : int 0 + .. .. .. ..@ content_indent_modifier: int 0 + .. .. .. ..@ content_extra_args : list() + .. .. .. ..@ page_title_prefix : chr NA + .. .. .. ..@ child_section_div : chr NA +``` + +Continuing in `recursive_applysplit`, this is made up of two main calls: one to `.make_ctab` which makes the content row and calculates the counts if specified so, and `.make_split_kids`. This eventually contains `recursive_applysplit` if the split vector is built of `VarLevelSplit`. 
Ineed, here, it being a generic is very handy to switch between different downstream processes. In our case (`rlyt[[1]]`), we will call `setMethod(".make_split_kids", "Split",` twice before getting to the analysis split. There, we can have a multi variable split which would apply `.make_split_kids` to each of its elements, in turns calling the main `setMethod(".make_split_kids", "VAnalyzeSplit",` which would in turn go to `.make_analyzed_tab`. There are interesting edge cases here for different split cases like `split_by_multivars` and when one of the splits has a reference group. In the code here, it is called `baseline` internally. If we follow this variable across the function layers we will see that where the split (`do_split`) happens (in `setMethod(".make_split_kids", "Split",`), we have a second split for the reference group. This is done so to have available this in each row, to calculate, for example, differences with reference group. + +Now we move towards `.make_tablerows`, and here anlysis functions become key, i.e. is the place where these are applied and analyzed. First of all, the external `tryCatch` is used to cache errors at a higher level, so to differentiate the two major blocks. The function parameters are quite intuitive, with the exception of `spl_context`. This is a fundamental parameter, that helps keeping information about the splits that can be visible from analysis functions. Follow up and down this value and you will see that is brought and updated everywhere a split happens, except for columns. Column-related information is added last, when in `gen_onerv` which is the lowest level, where one result value is produced. From `.make_tablerows` we go to `gen_rowvalues`, beside some row and referential footers handling. `gen_rowvalues` unpacks the `cinfo` object and crosses it with the arriving row splitted information to generate rows. 
In particular `rawvals <- mapply(gen_onerv,` maps the columns to generate a list of values corresponding to a table row. Looking at the final if in `gen_onerv` we see that `if(!is(val, "RowsVerticalSection")) {` the function `in_rows` is called. We invite to read what are the building blocks of that and why `.make_tablerows` has the following function `rowconstr` that other is not if the constructor of a data row `DataRow` or a `ContentRow` depending if it is called from `.make_ctab` or `.make_analyzed_tab`. From 201d90213007abe87822a3040f655332e36f3066 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Mon, 21 Aug 2023 10:24:09 +0200 Subject: [PATCH 30/40] update --- vignettes/dg_tabulation.Rmd | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/vignettes/dg_tabulation.Rmd b/vignettes/dg_tabulation.Rmd index 10bba3a40..b72a5202a 100644 --- a/vignettes/dg_tabulation.Rmd +++ b/vignettes/dg_tabulation.Rmd @@ -97,7 +97,7 @@ Next in the function there is the creation of the column counts. For now this ha Now, if we continue in `build_table` we hit `.make_ctab` for a root split. This is a general initial procedure that generates the needed root split as a content row. Indeed `ctab` stays for content row which is a row that has only a label in it. From the documentation regarding `summarize_row_groups`, you know that this is the way `rtables` defines label rows, i.e. as content rows. `.make_ctab` is very close to the actual creation of the table row which is done with `.make_tablerows`. Note that this function also uses `parent_cfun` and `.make_caller` to retrieve the content function inserted in above levels. We split here what is the structural handling of the table object and the rows creation engine that are divided by `.make_tablerows` call. If you search the whole package, you will find that this function is called only twice, once in `.make_ctab` and once in `.make_analyzed_tab`. 
These two are the final elements of the table construction: the creation of rows. -Going back to `build_table`, you will see that the row layout is actually a list of split vectors. The fundamental line ` kids <- lapply(seq_along(rlyt), function(i) {` allows us to appreciate this. Going forward we see how `recursive_applysplit` is applied to each split vector. It may be worth it to check how, in our test case, this vector looks like. +Going back to `build_table`, you will see that the row layout is actually a list of split vectors. The fundamental line `kids <- lapply(seq_along(rlyt), function(i) {` allows us to appreciate this. Going forward we see how `recursive_applysplit` is applied to each split vector. It may be worth it to check how, in our test case, this vector looks like. ```{r, eval=FALSE} # rtables 6.2.0 @@ -113,6 +113,8 @@ str(rlyt, max.level = 2) Formal class 'PreDataRowLayout' [package "rtables"] with 2 slots ..@ .Data :List of 2 # rlyt is a rtables object (PreDataRowLayout) that is also a list! ..@ root_split:Formal class 'RootSplit' [package "rtables"] with 17 slots # another object! + # If you do summarize_row_groups before anything you act on the root split. We need this to + # have a place for the content that is valid for the whole table. str(rtables:::root_spl(rlyt), max.level = 2) # it is still a split @@ -126,8 +128,11 @@ Formal class 'SplitVector' [package "rtables"] with 1 slot The last print is very informative. We can see from the layout construction that this object is built with 2 `VarLevelSplit` on the rows and one final `AnalyzeMultiVars` which is the leaf analysis split that has the final level rows. The second split vector is the following `AnalyzeVarSplit`. +xxx to get multiple split vectors you need to break the nesting with `nest = FALSE` or by adding a `split_rows_by` after an `analyze` call. 
+ ```{r, eval=FALSE} -> str(rlyt[[2]], max.level = 5) +# rtables 6.2.0 +str(rlyt[[2]], max.level = 5) Formal class 'SplitVector' [package "rtables"] with 1 slot ..@ .Data:List of 1 .. ..$ :Formal class 'AnalyzeVarSplit' [package "rtables"] with 21 slots @@ -156,6 +161,12 @@ Formal class 'SplitVector' [package "rtables"] with 1 slot .. .. .. ..@ child_section_div : chr NA ``` -Continuing in `recursive_applysplit`, this is made up of two main calls: one to `.make_ctab` which makes the content row and calculates the counts if specified so, and `.make_split_kids`. This eventually contains `recursive_applysplit` if the split vector is built of `VarLevelSplit`. Ineed, here, it being a generic is very handy to switch between different downstream processes. In our case (`rlyt[[1]]`), we will call `setMethod(".make_split_kids", "Split",` twice before getting to the analysis split. There, we can have a multi variable split which would apply `.make_split_kids` to each of its elements, in turns calling the main `setMethod(".make_split_kids", "VAnalyzeSplit",` which would in turn go to `.make_analyzed_tab`. There are interesting edge cases here for different split cases like `split_by_multivars` and when one of the splits has a reference group. In the code here, it is called `baseline` internally. If we follow this variable across the function layers we will see that where the split (`do_split`) happens (in `setMethod(".make_split_kids", "Split",`), we have a second split for the reference group. This is done so to have available this in each row, to calculate, for example, differences with reference group. +Continuing in `recursive_applysplit`, this is made up of two main calls: one to `.make_ctab` which makes the content row and calculates the counts if specified so, and `.make_split_kids`. This eventually contains `recursive_applysplit` if the split vector is built of `Split` that are not `analyze` splits. 
Indeed, the fact that `.make_split_kids` is a generic makes it very handy to switch between different downstream processes. In our case (`rlyt[[1]]`), we will call the method `getMethod(".make_split_kids", "Split")` twice before getting to the analysis split. There, we can have a (xxx) multi-variable split, which would apply `.make_split_kids` to each of its elements, in turn calling the main `getMethod(".make_split_kids", "VAnalyzeSplit")`, which would in turn go to `.make_analyzed_tab`. + +There are interesting edge cases here for different split cases, like `split_by_multivars`, and when one of the splits has a reference group. In the code, the reference group is called `baseline` internally. If we follow this variable across the function layers, we will see that where the split (`do_split`) happens (in `getMethod(".make_split_kids", "Split")`), we have a second split for the reference group. This is done to make the reference group available in each row, so that we can calculate, for example, differences from the reference group. + +Now we move towards `.make_tablerows`, where analysis functions become key, i.e. this is the place where they are applied. First of all, the external `tryCatch` is used to catch errors at a higher level, so as to differentiate the two major blocks. The function parameters are quite intuitive, with the exception of `spl_context`. This is a fundamental parameter that keeps track of the split information that is visible from analysis functions. If you follow this value up and down the call stack, you will see that it is carried along and updated everywhere a split happens, except for columns. Column-related information is added last, in `gen_onerv`, which is the lowest level, where one result value is produced. From `.make_tablerows` we go to `gen_rowvalues`, aside from some handling of row and referential footnotes. `gen_rowvalues` unpacks the `cinfo` object and crosses it with the incoming row split information to generate rows. 
In particular, `rawvals <- mapply(gen_onerv,` maps over the columns to generate a list of values corresponding to a table row. Looking at the final `if` in `gen_onerv`, we see that when `if(!is(val, "RowsVerticalSection"))` holds, the function `in_rows` is called. We invite the reader to look at the building blocks of `in_rows`, and at why `.make_tablerows` then uses the function `rowconstr`, which is nothing other than the constructor of a data row (`DataRow`) or of a content row (`ContentRow`), depending on whether it is called from `.make_ctab` or `.make_analyzed_tab`. + +`.make_tablerows` either makes a content table or an "analysis table". `gen_rowvalues` generates a list of stacks (`RowsVerticalSection`, potentially more than one row!) for each column -Now we move towards `.make_tablerows`, and here anlysis functions become key, i.e. is the place where these are applied and analyzed. First of all, the external `tryCatch` is used to cache errors at a higher level, so to differentiate the two major blocks. The function parameters are quite intuitive, with the exception of `spl_context`. This is a fundamental parameter, that helps keeping information about the splits that can be visible from analysis functions. Follow up and down this value and you will see that is brought and updated everywhere a split happens, except for columns. Column-related information is added last, when in `gen_onerv` which is the lowest level, where one result value is produced. From `.make_tablerows` we go to `gen_rowvalues`, beside some row and referential footers handling. `gen_rowvalues` unpacks the `cinfo` object and crosses it with the arriving row splitted information to generate rows. In particular `rawvals <- mapply(gen_onerv,` maps the columns to generate a list of values corresponding to a table row. Looking at the final if in `gen_onerv` we see that `if(!is(val, "RowsVerticalSection")) {` the function `in_rows` is called. 
We invite to read what are the building blocks of that and why `.make_tablerows` has the following function `rowconstr` that other is not if the constructor of a data row `DataRow` or a `ContentRow` depending if it is called from `.make_ctab` or `.make_analyzed_tab`. +to add: conceptual part -> calculating things by column and putting them side by side and slicing them by rows and putting it together -> rtables is row dominant From a7f0cfe6f784a7eb35037624a7542fc57431122f Mon Sep 17 00:00:00 2001 From: Emily de la Rua Date: Fri, 13 Oct 2023 16:42:31 -0400 Subject: [PATCH 31/40] Move files to inst/dev-guide --- {vignettes => inst/dev-guide}/dg_debug_rtables.Rmd | 0 {vignettes => inst/dev-guide}/dg_split_machinery.Rmd | 0 {vignettes => inst/dev-guide}/dg_tabulation.Rmd | 0 3 files changed, 0 insertions(+), 0 deletions(-) rename {vignettes => inst/dev-guide}/dg_debug_rtables.Rmd (100%) rename {vignettes => inst/dev-guide}/dg_split_machinery.Rmd (100%) rename {vignettes => inst/dev-guide}/dg_tabulation.Rmd (100%) diff --git a/vignettes/dg_debug_rtables.Rmd b/inst/dev-guide/dg_debug_rtables.Rmd similarity index 100% rename from vignettes/dg_debug_rtables.Rmd rename to inst/dev-guide/dg_debug_rtables.Rmd diff --git a/vignettes/dg_split_machinery.Rmd b/inst/dev-guide/dg_split_machinery.Rmd similarity index 100% rename from vignettes/dg_split_machinery.Rmd rename to inst/dev-guide/dg_split_machinery.Rmd diff --git a/vignettes/dg_tabulation.Rmd b/inst/dev-guide/dg_tabulation.Rmd similarity index 100% rename from vignettes/dg_tabulation.Rmd rename to inst/dev-guide/dg_tabulation.Rmd From 3bf0329cad8cff6b12092efdc2830e57b4e35d9e Mon Sep 17 00:00:00 2001 From: Emily de la Rua Date: Fri, 13 Oct 2023 16:45:55 -0400 Subject: [PATCH 32/40] Update output format, render location --- inst/dev-guide/dg_debug_rtables.Rmd | 14 +++++++------- inst/dev-guide/dg_split_machinery.Rmd | 17 ++++++++++------- inst/dev-guide/dg_tabulation.Rmd | 16 ++++++++++------ 3 files changed, 27 
insertions(+), 20 deletions(-) diff --git a/inst/dev-guide/dg_debug_rtables.Rmd b/inst/dev-guide/dg_debug_rtables.Rmd index 36c5b97eb..59cdccda3 100644 --- a/inst/dev-guide/dg_debug_rtables.Rmd +++ b/inst/dev-guide/dg_debug_rtables.Rmd @@ -1,14 +1,14 @@ --- -title: "Debugging in `rtables` and beyond" +title: "Debugging in `rtables` and Beyond" author: "Davide Garolini" date: '`r Sys.Date()`' -output: rmarkdown::html_vignette -vignette: > - %\VignetteIndexEntry{rtables Advanced Usage} - %\VignetteEncoding{UTF-8} - %\VignetteEngine{knitr::rmarkdown} +output: + html_document: + theme: spacelab editor_options: chunk_output_type: console +knit: (function(inputFile, encoding) { + rmarkdown::render(inputFile, encoding = encoding, output_dir = ".")}) --- ```{r setup, include=FALSE} @@ -19,7 +19,7 @@ knitr::opts_chunk$set(echo = TRUE) Any code or prose which appears in a version of this vignette on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide reflects very important pieces of the split machinery that are unlikely to change. We anyway invite the reader to think always that the current code may have drifted from the following document, and it is always the best practice to read directly the code on `main`. -# Debugging in `rtables` and beyond +--- This is a not-so-short and not-comprehensive guide to debugging `rtables`. It is to be considered, though, of general validity and of personal use. 
diff --git a/inst/dev-guide/dg_split_machinery.Rmd b/inst/dev-guide/dg_split_machinery.Rmd index 46f687df2..c208b94ea 100644 --- a/inst/dev-guide/dg_split_machinery.Rmd +++ b/inst/dev-guide/dg_split_machinery.Rmd @@ -1,14 +1,17 @@ --- -title: "The Split Machinery" +title: "Split Machinery" author: "Davide Garolini" date: '`r Sys.Date()`' -output: rmarkdown::html_vignette -vignette: > - %\VignetteIndexEntry{rtables Advanced Usage} - %\VignetteEncoding{UTF-8} - %\VignetteEngine{knitr::rmarkdown} +output: + html_document: + theme: spacelab + toc: true + toc_float: + collapsed: false editor_options: chunk_output_type: console +knit: (function(inputFile, encoding) { + rmarkdown::render(inputFile, encoding = encoding, output_dir = ".")}) --- ```{r setup, include=FALSE} @@ -20,7 +23,7 @@ knitr::opts_chunk$set(echo = TRUE) Any code or prose which appears in a version of this vignette on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide reflects very important pieces of the split machinery that are unlikely to change. We anyway invite the reader to think always that the current code may have drifted from the following document, and it is always the best practice to read directly the code on `main`. -# The Split Machinery +## Introduction The scope of this vignette is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. The latter level, also called leaf-level, contains the final partition that is subjected to the analysis functions. More details from the user perspective can be found in relevant vignette `vignette("split_functions")` and function documentation like `?split_rows_by` and `?split_funcs`. 
diff --git a/inst/dev-guide/dg_tabulation.Rmd b/inst/dev-guide/dg_tabulation.Rmd index b72a5202a..cf18ef3b0 100644 --- a/inst/dev-guide/dg_tabulation.Rmd +++ b/inst/dev-guide/dg_tabulation.Rmd @@ -2,23 +2,27 @@ title: "Tabulation" author: "Davide Garolini" date: '`r Sys.Date()`' -output: rmarkdown::html_vignette -vignette: > - %\VignetteIndexEntry{rtables Advanced Usage} - %\VignetteEncoding{UTF-8} - %\VignetteEngine{knitr::rmarkdown} +output: + html_document: + theme: spacelab + toc: true + toc_float: + collapsed: false editor_options: chunk_output_type: console +knit: (function(inputFile, encoding) { + rmarkdown::render(inputFile, encoding = encoding, output_dir = ".")}) --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` + ## Disclaimer Any code or prose which appears in a version of this vignette on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide reflects very important pieces of the split machinery that are unlikely to change. We anyway invite the reader to think always that the current code may have drifted from the following document, and it is always the best practice to read directly the code on `main`. (xxx we should insert it automatically) -# Tabulation +## Introduction Tabulation in `rtables` is a process that takes a pre-defined layout and applies it to the data. The layout object, with all its splits (see `vignette("dg_split_machinery")`) and `analyze`s, can be applied to different data to produce valid tables. This process is happening principally inside the file `tt_dotabulation.R` and its principal user-facing function `build_table` that resides in it. We will sometimes see functions and methods that are present in other files like `colby_construction.R` or `make_subset_expr.R`. We assume any reader is already familiar with the documentation related to `build_table`. 
Also, we suggest reading first the vignette regarding the split machinery (`vignette("dg_split_machinery")`), as it is instrumental in understanding how the layout object, which is built principally of splits, is tabulated when data is applied. From 5031687e129f8efb460201b79fbda7b82ccd8615 Mon Sep 17 00:00:00 2001 From: Melkiades Date: Thu, 26 Oct 2023 15:32:39 +0200 Subject: [PATCH 33/40] kind of completing splits --- inst/dev-guide/dg_split_machinery.Rmd | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/inst/dev-guide/dg_split_machinery.Rmd b/inst/dev-guide/dg_split_machinery.Rmd index c208b94ea..cd42c7b3a 100644 --- a/inst/dev-guide/dg_split_machinery.Rmd +++ b/inst/dev-guide/dg_split_machinery.Rmd @@ -22,19 +22,19 @@ knitr::opts_chunk$set(echo = TRUE) Any code or prose which appears in a version of this vignette on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide reflects very important pieces of the split machinery that are unlikely to change. We anyway invite the reader to think always that the current code may have drifted from the following document, and it is always the best practice to read directly the code on `main`. +Please keep in mind that `rtables` is still under active development, and it has seen the efforts of multiple contributors across different years. Therefore, there may be legacy mechanisms and a couple of on-going transformations that could look different in the future. + +Being this a working document that may be subjected to both deprecation and updates, we keep `xxx` comments to indicate placeholders for TODOs and warnings that needs further work. ## Introduction The scope of this vignette is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. 
The latter level, also called leaf-level, contains the final partition that is subjected to the analysis functions. More details from the user perspective can be found in relevant vignette `vignette("split_functions")` and function documentation like `?split_rows_by` and `?split_funcs`. -The following vignette will describe how the split machinery works for the row domain. Further information on how columns are defined will follow soon. - -NB: we must remind the reader that `rtables` is still under active development, and it has seen the efforts of multiple contributors across different years. Therefore, there may be legacy mechanisms and a couple of on-going transformations that could look different in the future. - +The following vignette will describe how the split machinery works for the row domain. Further information on how columns will have a dedicated vignette. ## Process and Methods -As always, we encourage the reader to familiarize with `vignette("dg_debug_rtables")` before going further. This document has a general validity, even if it has been tailored to study and understand complex packages like `rtables`. +Beforehand, we encourage the reader to familiarize with `vignette("dg_debug_rtables")`. This document is generally valid for R programming, but has been tailored to study and understand complex packages that rely heavily on S3 and S4 object programming like `rtables`. Here, we explore and study the split machinery with a growing amount of complexity, always following relevant functions and methods throughout their execution. By going from basic to complex and by discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. @@ -790,15 +790,11 @@ trace(what = "do_split", untrace(what = "do_split", where = asNamespace("rtables")) ``` -As we have demonstrated, all the above seems like impossible cases, and are to be -considered as vestigial and deprecated heritage. 
- -Final examples with `MultiVarSplit` & `CompoundSplit` -xxx CompoundSplit is used when you have an analyze with multiple variables or MultiVarSplit (nope it is a different thing, it is a single split that generates facets with different variables) makes a special AnalyzeMultiVars (inherits from CompoundSplit). AnalyzeMultiVars is for analyzing the same facets multiple times. MultiVarColSplit works with analyze_colvars, it is a different object. -.set_kids_sect_sep adds things between children (can be set from split) -xxx +As we have demonstrated, all the above seems like impossible cases, and are to be considered as vestigial and deprecated heritage. -xxx file issue for multiple analyze_colvars +### Final examples with `MultiVarSplit` & `CompoundSplit` +This final part of this chapter is still under construction, hence, the unspecific mentions and the to do list. +xxx `CompoundSplit` generates facets from one variable (e.g. cumulative distributions) while `MultiVarSplit` uses different variables for the split. See `AnalyzeMultiVars`, which inherits from `CompoundSplit` for more details on how it analyzes the same facets multiple times. `MultiVarColSplit` works with `analyze_colvars`, which is a different discussion. `.set_kids_sect_sep` adds things between children (can be set from split). Firstly, we want to see how `MultiVarSplit` class behaves for an example case taken from `?split_rows_by_multivar`. @@ -830,7 +826,7 @@ untrace(what = "do_split", If we print them out, we will notice that the two groups (one called "SEX" and the other "STRATA1") are identical along the columns. This is because no subgroup was actually created. This is an interesting way to personalize splits and with the help of custom split functions and their split context, also to have widely different subgroups in the table. -We grasp the occasion to explain that with `xxx` comments we indicate placeholders for TODOs and warnings that needs further work. 
We invite the reader to try to understand why `split_rows_by_multivar` can have other row splits under it (see `xxx` comment in the previous code), while `split_cols_by_multivar` does not. It is a known bug at the moment, and we would be pleased to have a fix. The issues are often linked in the code by their code number (e.g. `#690`). +We invite the reader to try to understand why `split_rows_by_multivar` can have other row splits under it (see `xxx` comment in the previous code), while `split_cols_by_multivar` does not. It is a known bug at the moment, and we would be pleased to have a fix. The issues are often linked in the code by their code number (e.g. `#690`). Lastly, we will briefly show an example of a split with cut function and how to replace it and solve the problem with empty age groups we had before. In the following, we propose the same simplified situation: From 32a73e277064af838f10edcb9ce21f460ec02aad Mon Sep 17 00:00:00 2001 From: Melkiades Date: Thu, 26 Oct 2023 15:53:59 +0200 Subject: [PATCH 34/40] fix debugging --- inst/dev-guide/dg_debug_rtables.Rmd | 92 ++++++++++++----------------- 1 file changed, 38 insertions(+), 54 deletions(-) diff --git a/inst/dev-guide/dg_debug_rtables.Rmd b/inst/dev-guide/dg_debug_rtables.Rmd index 59cdccda3..1cdd05030 100644 --- a/inst/dev-guide/dg_debug_rtables.Rmd +++ b/inst/dev-guide/dg_debug_rtables.Rmd @@ -15,82 +15,66 @@ knit: (function(inputFile, encoding) { knitr::opts_chunk$set(echo = TRUE) ``` -## Disclaimer +## Debugging -Any code or prose which appears in a version of this vignette on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide reflects very important pieces of the split machinery that are unlikely to change. We anyway invite the reader to think always that the current code may have drifted from the following document, and it is always the best practice to read directly the code on `main`. 
+This is a short and not-comprehensive guide to debugging `rtables`. It is to be considered, though, of general validity and of personal use. ---- - -This is a not-so-short and not-comprehensive guide to debugging `rtables`. It is to be considered, though, of general validity and of personal use. - --=-=-=- Notes from meeting -In general code must be in a way that: +#### Coding in Practice -> it is easy to read and find problems -> it is not clever, because it is impossible to debug -* Coding Error, code does not do what you intended -> Bug in the punch card -* Unexpected Input. Defensive programming FAIL FAST FAIL LOUD -> useful and not too consuming -* Bug in Dependency -> never use - -e.g. of FFFL -as close as possible to the source -* bad inputs should be found very early - - -Worst: silently giving incorrect results +#### Some Definitions +* __Coding Error__ - Code does not do what you intended -> Bug in the punch card +* __Unexpected Input__ - Defensive programming FAIL FAST FAIL LOUD (FFFL) -> useful and not too time consuming +* __Bug in Dependency__ -> never use dependencies if you can! -common things that we can catch early -missing values/column length == 0 or length > 1 +#### Considerations About FFFL +Errors should be as close as possible to the source. For example, bad inputs should be found very early. The worst possible example is a software that is silently giving incorrect results. Common things that we can catch early are missing values, column `length == 0`, or `length > 1`. -ROBUSTNESS: refuse to do stuff that can be very problematic - -WHAT TO DO +#### General Suggestions +* Robust code base does not attempt doing possibly problematic operations. 
* Read Error Messages -debugcall you can add the signature (formals) -trace is powerful because you can add the reaction) -tracer is very good, at where it happens - -options(error=recover) is the best way to debug -core tool when developing/debugging - -dump.frames and debugger -it saves it to a file or an object and then you call debugger to step in it -as you did recover. You save the global form +* `debugcall` you can add the signature (formals) +* `trace` is powerful because you can add the reaction +* `tracer` is very good and precise to find where it happens -warn global option -<0 ignored -0 top level function call -1 immediately as they occur ->=2 throws errors +`options(error = recover)` is one of the best tools to debug, as it is a core tool when developing that allows you to step into any point of the function call sequence. -<<- for recover or debugger gives it to the global environment +`dump.frames` and `debugger`: `dump.frames` saves the call stack to a file or an object, and then you call `debugger` to step into it +as you would with `recover`. -lo-fi debugging -PRINT / CAT -print position of a function you get to then you get to a point where it breaks +#### `warn` Global Option +- `<0` warnings are ignored +- `0` warnings are stored until the top level function call returns +- `1` warnings are printed immediately as they occur +- `>=2` warnings are turned into errors -comment blocks -> does not work with pipes (use identity() it is a step -that does nothing but does not break the pipes) +`<<-` for `recover` or `debugger` gives it to the global environment -browser bombing +#### lo-fi debugging +* PRINT / CAT is always a low level debugging that can be used. It is helpful for server jobs where maybe only terminal or console output is available and no `browser()` can be used. For example, you can print the position or state of a function at a certain point until you find the break point.
+* comment blocks -> does not work with pipes (you can use `identity()` it is a step that does nothing but does not break the pipes) +* `browser()` bombing -regression tests -almost every bug should become a regression test +#### Regression Tests +Almost every bug should become a regression test. -debugging with pipes --> pipes are better to write code but horrible to debug +#### Debugging with Pipes +* Pipes are better to write code but horrible to debug +* T in pipe `%T>%` does print it midway +* `debug_pipe()` -> it is like the T pipe going into browser() -T pipe %T>% does print it midway +#### Shiny Debugging +More difficult due to reactivity. -debug_pipe -> it is like the T pipe going into browser() - -Shiny debugging is more difficult due to reactivity +#### General Suggestion DO NOT BE CLEVER WITH CODE - ONLY IF YOU HAVE TO, CLEVER IS ALSO SUBJECTIVE AND IT WILL CHANGE WITH TIME --=-=-=- +## Debugging in `rtables` + We invite the smart developer to use the provided examples as a way to get an "interactive" and dynamic vision of the internal algorithms, as they are routinely executed when constructing tables with `rtables`. This is achieved by using `browser()` and `debugonce()` on internal and exported functions (`rtables:::` or `rtables::`), as we will see in a moment. We invite also the continuous and autonomous exploration of the multiple `S3` and `S4` objects that constitute the complexity and power of `rtables`. To do so we will use the following useful functions: * `methods(generic_function)`: This function lists the methods that are available for a generic function. Specifically for `S4` generic functions, `showMethods(generic_function)` is giving a more detailed information about each method (e.g. inheritance). 
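The exploration of `S4` methods described in the last bullet can be tried on a small, self-contained case. The following is only a minimal sketch: the classes `ToySplit` and `ToyAllSplit` and the generic `toy_check` are hypothetical names invented for illustration and are not part of `rtables`. They merely mimic a virtual parent class with a child class that inherits its method.

```r
library(methods)

# Hypothetical toy classes (not part of rtables) that mimic a
# virtual parent class with one concrete child class.
setClass("ToySplit", representation("VIRTUAL", name = "character"))
setClass("ToyAllSplit", contains = "ToySplit")

# A toy generic with a single method, defined only for the parent class.
setGeneric("toy_check", function(spl) standardGeneric("toy_check"))
setMethod("toy_check", "ToySplit", function(spl) invisible(NULL))

spl <- new("ToyAllSplit", name = "all obs")

# Check whether a method is defined directly for the child class.
direct <- existsMethod("toy_check", "ToyAllSplit")

# Ask which method dispatch would select, allowing for inheritance.
inherited <- selectMethod("toy_check", "ToyAllSplit")

# Print the methods known for the generic.
showMethods("toy_check")
```

Here `direct` records whether `ToyAllSplit` has its own method, while the `defined` slot of `inherited` names the class in which the selected method was actually written. Comparing the two is one way to see which behaviour a child class borrows from its parent.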
From 3ce712346f197948e5542d7aee79d4fc0cff926a Mon Sep 17 00:00:00 2001 From: Emily de la Rua Date: Thu, 26 Oct 2023 11:17:54 -0400 Subject: [PATCH 35/40] Remove from pkgdown config - fix checks --- _pkgdown.yml | 7 ------- 1 file changed, 7 deletions(-) diff --git a/_pkgdown.yml b/_pkgdown.yml index 18934e0cc..2c5f1eee7 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -46,13 +46,6 @@ articles: - manual_table_construction - tabulation_dplyr - - title: For Developers - desc: Vignettes aimed at package developers - contents: - - dg_debug_rtables - - dg_split_machinery - - dg_tabulation - reference: - title: Argument Conventions desc: The following dummy functions are unexported and used to document argument conventions in the framework. From c66501e40d9e249200d6abeb698bbff17e73cc11 Mon Sep 17 00:00:00 2001 From: Emily de la Rua Date: Tue, 31 Oct 2023 17:47:33 -0400 Subject: [PATCH 36/40] Split machinery - grammar fixes, adding links, applying styler --- inst/dev-guide/dg_split_machinery.Rmd | 1020 +++++++++++++------------ 1 file changed, 544 insertions(+), 476 deletions(-) diff --git a/inst/dev-guide/dg_split_machinery.Rmd b/inst/dev-guide/dg_split_machinery.Rmd index cd42c7b3a..f3c1a0ce2 100644 --- a/inst/dev-guide/dg_split_machinery.Rmd +++ b/inst/dev-guide/dg_split_machinery.Rmd @@ -20,42 +20,43 @@ knitr::opts_chunk$set(echo = TRUE) ## Disclaimer -Any code or prose which appears in a version of this vignette on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide reflects very important pieces of the split machinery that are unlikely to change. We anyway invite the reader to think always that the current code may have drifted from the following document, and it is always the best practice to read directly the code on `main`. +This article is intended for use by developers only and will contain low-level explanations of the topics covered. 
For user-friendly vignettes, please see the [Articles](https://insightsengineering.github.io/rtables/main/articles/index.html) section on the `rtables` website. -Please keep in mind that `rtables` is still under active development, and it has seen the efforts of multiple contributors across different years. Therefore, there may be legacy mechanisms and a couple of on-going transformations that could look different in the future. +Any code or prose which appears in the version of this article on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide describes very important pieces of the split machinery that are unlikely to change. Regardless, we invite the reader to keep in mind that the current repository code may have drifted from the following material in this document, and it is always the best practice to read the code directly on `main`. -Being this a working document that may be subjected to both deprecation and updates, we keep `xxx` comments to indicate placeholders for TODOs and warnings that needs further work. +Please keep in mind that `rtables` is still under active development, and it has seen the efforts of multiple contributors across different years. Therefore, there may be legacy mechanisms and ongoing transformations that could look different in the future. + +Since this is a working document that may be subject to both deprecation and updates, we keep `xxx` comments to indicate placeholders for warnings and to-do's that need further work. ## Introduction -The scope of this vignette is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. The latter level, also called leaf-level, contains the final partition that is subjected to the analysis functions.
More details from the user perspective can be found in relevant vignette `vignette("split_functions")` and function documentation like `?split_rows_by` and `?split_funcs`. +The scope of this article is understanding how `rtables` creates facets by splitting the incoming data into hierarchical groups that go from the root node to singular `rcell`s. The latter level, also called the leaf-level, contains the final partition that is subjected to analysis functions. More details from the user perspective can be found in the [Split Functions vignette](https://insightsengineering.github.io/rtables/main/articles/split_functions.html) and in function documentation like `?split_rows_by` and `?split_funcs`. -The following vignette will describe how the split machinery works for the row domain. Further information on how columns will have a dedicated vignette. +The following article will describe how the split machinery works in the row domain. Further information on how the split machinery works in the column domain will be covered in a separate article. ## Process and Methods -Beforehand, we encourage the reader to familiarize with `vignette("dg_debug_rtables")`. This document is generally valid for R programming, but has been tailored to study and understand complex packages that rely heavily on S3 and S4 object programming like `rtables`. - -Here, we explore and study the split machinery with a growing amount of complexity, always following relevant functions and methods throughout their execution. By going from basic to complex and by discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. +Beforehand, we encourage the reader to familiarize themselves with the Debugging in `rtables`(xxx link here) article from the `rtables` Developers Guide. 
This document is generally valid for R programming, but has been tailored to study and understand complex packages that rely heavily on S3 and S4 object programming like `rtables`. -In practice, the majority of the split engine resides in the source file `R/split_funs.R` with occasional incursion into `R/make_split_fun.R` for custom split function creation and rarer references to other more general tabulation files. +Here, we explore and study the split machinery with a growing amount of complexity, following relevant functions and methods throughout their execution. By going from basic to complex and by discussing important and special cases, we hope to be able to give you a good understanding of how the split machinery works. +In practice, the majority of the split engine resides in the source file `R/split_funs.R`, with occasional incursion into `R/make_split_fun.R` for custom split function creation and rarer references to other more general tabulation files. ## `do_split` -The split machinery is so fundamental to `rtables` that relevant functions like `do_split` are executed, even when no split is requested. The following example shows how we can enter `do_split` and start understanding the class hierarchy and the main split engine. +The split machinery is so fundamental to `rtables` that relevant functions like `do_split` are executed even when no split is requested. The following example shows how we can enter `do_split` and start understanding the class hierarchy and the main split engine. ```{r, message=FALSE} library(rtables) # debugonce(rtables:::do_split) # Uncomment me to enter the function!!! basic_table() %>% - build_table(DM) + build_table(DM) ``` -In the following, we copied it so to allow the reader to go through the general structure with its enhanced comments and sections. Each section in the code reflects roughly a section of this vignette. 
+In the following code, we copied the `do_split` function code to allow the reader to go through the general structure with enhanced comments and sections. Each section in the code reflects roughly one section of this article. ```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 ### NB This is called at EACH level of recursive splitting do_split <- function(spl, df, @@ -63,75 +64,83 @@ do_split <- function(spl, labels = NULL, trim = FALSE, spl_context) { -# CHECKS # - ## This will error if, e.g., df does not have columns - ## required by spl, or generally any time the split (spl) - ## can not be applied to df - check_validsplit(spl, df) - -# SPLIT FUNCTION # - ## In special cases, we need to partition data (split) - ## in a very specific way, e.g. depending on the data or - ## external values. These can be achieved by using a custom - ## split function. - - ## note the <- here!!! - if(!is.null(splfun <- split_fun(spl))) { - ## Currently the contract is that split_functions take df, vals, labels and - ## return list(values=., datasplit=., labels = .), optionally with - ## an additional extras element - if(func_takes(splfun, ".spl_context")) { - ret <- tryCatch(splfun(df, spl, vals, labels, trim = trim, - .spl_context = spl_context), - error = function(e) e) ## rawvalues(spl_context )) - } else { - ret <- tryCatch(splfun(df, spl, vals, labels, trim = trim), - error = function(e) e) - } - if(is(ret, "error")) { - stop("Error applying custom split function: ", ret$message, "\n\tsplit: ", - class(spl), " (", payloadmsg(spl), ")\n", - "\toccured at path: ", - spl_context_to_disp_path(spl_context), "\n") - } +# - CHECKS - # + ## This will error if, e.g., df does not have columns + ## required by spl, or generally any time the split (spl) + ## can not be applied to df + check_validsplit(spl, df) + +# - SPLIT FUNCTION - # + ## In special cases, we need to partition data (split) + ## in a very specific way, e.g. depending on the data or + ## external values. 
These can be achieved by using a custom + ## split function. + + ## note the <- here!!! + if (!is.null(splfun <- split_fun(spl))) { + ## Currently split functions take df, vals, labels and + ## return list(values = ..., datasplit = ..., labels = ...), + ## with an optional additional 'extras' element + if (func_takes(splfun, ".spl_context")) { + ret <- tryCatch( + splfun(df, spl, vals, labels, + trim = trim, + .spl_context = spl_context + ), + error = function(e) e + ) ## rawvalues(spl_context)) } else { -# .apply_split_inner # - ## This is called when no split function is provided. Please note that also when provided, - ## this function will be probably called, as far as the main splitting method is not willingly - ## modified by the split function. - ret <- .apply_split_inner(df = df, spl = spl, vals = vals, labels = labels, trim = trim) + ret <- tryCatch(splfun(df, spl, vals, labels, trim = trim), + error = function(e) e + ) } - -# EXTRA # - ## this adds .ref_full and .in_ref_col - if(is(spl, "VarLevWBaselineSplit")) - ret <- .add_ref_extras(spl, df, ret) - -# FIXUPVALS # - ## this: - ## - guarantees that ret$values contains SplitValue objects - ## - removes the extras element since its redundant after the above - ## - Ensures datasplit and values lists are named according to labels - ## - ensures labels are character not factor - ret <- .fixupvals(ret) - -# RETURN # - ret + if (is(ret, "error")) { + stop( + "Error applying custom split function: ", ret$message, "\n\tsplit: ", + class(spl), " (", payloadmsg(spl), ")\n", + "\toccured at path: ", + spl_context_to_disp_path(spl_context), "\n" + ) + } + } else { +# - .apply_split_inner - # + ## This is called when no split function is provided. Please note that this function + ## will also probably be called when the split function is provided, as long as the + ## main splitting method is not willingly modified by the split function. 
+ ret <- .apply_split_inner(df = df, spl = spl, vals = vals, labels = labels, trim = trim) + } + +# - EXTRA - # + ## this adds .ref_full and .in_ref_col + if (is(spl, "VarLevWBaselineSplit")) { + ret <- .add_ref_extras(spl, df, ret) + } + +# - FIXUPVALS - # + ## This: + ## - guarantees that ret$values contains SplitValue objects + ## - removes the extras element since its redundant after the above + ## - ensures datasplit and values lists are named according to labels + ## - ensures labels are character not factor + ret <- .fixupvals(ret) + +# - RETURN - # + ret } ``` -We will see how input parameters are used and where. The most important ones are `spl` and `df`: the split objects and the input `data.frame`. +We will see where and how input parameters are used. The most important parameters are `spl` and `df` - the split objects and the input `data.frame`, respectively. ### Checks and Classes -We will start by looking at the first function called from `do_split`. This may give us a good overview of how the split itself is defined. This is, of course, the check-function (`check_validsplit`) that is used to verify if the split is valid for the data. In the following we will describe step-by-step the split-class hierarchy, but we invite the reader to explore this autonomously in future occasions. +We will start by looking at the first function called from `do_split`. This will give us a good overview of how the split itself is defined. This function is, of course, the check function (`check_validsplit`) that is used to verify if the split is valid for the data. In the following we will describe the split-class hierarchy step-by-step, but we invite the reader to explore this further on their own as well. -Lets then search the package for `check_validsplit`, you will find that it is defined as a generic in `R/split_funs.R`, where it is applied to the following "split" classes: `VarLevelSplit`, `MultiVarSplit`, `VAnalyzeSplit`, `CompoundSplit`, and `Split`. 
Another way to find this information, which is more useful for more spread out and complicated objects, is using `showMethods(check_validsplit)`. The virtual class `VAnalyzeSplit` (convention: it starts with "V") defines the main parent of analysis split which we discuss in detail in related vignette `vignette()` (xxx). From this, we can intuit that the `analyze()` calls actually mimic split objects as they create different results under a specific final split (or node). Now, notice that `check_validsplit` is also called in another location, i.e. in the main `R/tt_dotabulation.R` source file. This is again something related to making the "analyze" rows as it mainly checks for `VAnalyzeSplit` (link to tabulation dev guide xxx). We will discuss the other classes when they will appear in our examples (link to class hierarchy xxx). +Let's first search the package for `check_validsplit`. You will find that it is defined as a generic in `R/split_funs.R`, where it is applied to the following "split" classes: `VarLevelSplit`, `MultiVarSplit`, `VAnalyzeSplit`, `CompoundSplit`, and `Split`. Another way to find this information, which is more useful for more spread out and complicated objects, is by using `showMethods(check_validsplit)`. The virtual class `VAnalyzeSplit` (by convention virtual classes start with "V") defines the main parent of the analysis split which we discuss in detail in the related vignette `vignette()` (xxx). From this, we can see that the `analyze()` calls actually mimic split objects as they create different results under a specific final split (or node). Now, notice that `check_validsplit` is also called in another location, the main `R/tt_dotabulation.R` source file. This is again something related to making "analyze" rows as it mainly checks for `VAnalyzeSplit` (link to tabulation dev guide xxx). We will discuss the other classes as they appear in our examples (link to class hierarchy xxx). 
-For the moment, we see with `class(spl)` (from the main `do_split` function) that we are dealing with an `AllSplit` object. By calling `showMethods(check_validsplit)` we will produce the following: +For the moment, we see with `class(spl)` (from the main `do_split` function) that we are dealing with an `AllSplit` object. By calling `showMethods(check_validsplit)` we produce the following: ``` -# rtables 6.0.2 +# rtables 0.6.2 Function: check_validsplit (package rtables) spl="AllSplit" (inherited from: spl="Split") @@ -141,55 +150,59 @@ spl="Split" spl="VAnalyzeSplit" spl="VarLevelSplit" ``` -It means that each of the listed classes has a dedicated definition of `check_validsplit` that may largely differ from the others. Only the class `AllSplit` does not have its own function definition as it is inherited from the `Split` class. Therefore, we understand that `AllSplit` is a class parent of `Split`. This is one of the first definition of a virtual class in the package and it is the only one that does not present the "V" prefix. Any of these classes are defined along with their constructor in `R/00tabletrees.R`. Reading how `AllSplit` is structured can be an useful example to understand how split objects are expected to work. Please see the comments in the following: + +This means that each of the listed classes has a dedicated definition of `check_validsplit` that may largely differ from the others. Only the class `AllSplit` does not have its own function definition, as it inherits the method from the `Split` class. Therefore, we understand that `AllSplit` is a child class of `Split`. `Split` is one of the first virtual classes defined in the package and it is the only one that does not include the "V" prefix. These classes are defined along with their constructors in `R/00tabletrees.R`. Reading about how `AllSplit` is structured can be useful in understanding how split objects are expected to work.
Please see the comments in the following: ```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 setClass("AllSplit", contains = "Split") AllSplit <- function(split_label = "", - cfun = NULL, - cformat = NULL, - cna_str = NA_character_, - split_format = NULL, - split_na_str = NA_character_, - split_name = NULL, - extra_args = list(), - indent_mod = 0L, - cindent_mod = 0L, - cvar = "", - cextra_args = list(), - ...) { - if(is.null(split_name)) { # If the split has no name - if(nzchar(split_label)) # (std is "") - split_name <- split_label - else - split_name <- "all obs" # Nor label, a standard split with all - # observations is assigned. + cfun = NULL, + cformat = NULL, + cna_str = NA_character_, + split_format = NULL, + split_na_str = NA_character_, + split_name = NULL, + extra_args = list(), + indent_mod = 0L, + cindent_mod = 0L, + cvar = "", + cextra_args = list(), + ...) { + if (is.null(split_name)) { # If the split has no name + if (nzchar(split_label)) { # (std is "") + split_name <- split_label + } else { + split_name <- "all obs" # No label, a standard split with all + # observations is assigned. 
} - new("AllSplit", split_label = split_label, - content_fun = cfun, - content_format = cformat, - content_na_str = cna_str, - split_format = split_format, - split_na_str = split_na_str, - name = split_name, - label_children = FALSE, - extra_args = extra_args, - indent_modifier = as.integer(indent_mod), - content_indent_modifier = as.integer(cindent_mod), - content_var = cvar, - split_label_position = "hidden", - content_extra_args = cextra_args, - page_title_prefix = NA_character_, - child_section_div = NA_character_) + } + new("AllSplit", + split_label = split_label, + content_fun = cfun, + content_format = cformat, + content_na_str = cna_str, + split_format = split_format, + split_na_str = split_na_str, + name = split_name, + label_children = FALSE, + extra_args = extra_args, + indent_modifier = as.integer(indent_mod), + content_indent_modifier = as.integer(cindent_mod), + content_var = cvar, + split_label_position = "hidden", + content_extra_args = cextra_args, + page_title_prefix = NA_character_, + child_section_div = NA_character_ + ) } ``` -We can see also print this information by calling `getClass("AllSplit")` for the general slot definition, or by calling `getClass(spl)` for having also all the values. Note that the first call will give also a lot of information about the class hierarchy. For more information regarding class hierarchy, please refer to the relevant vignette (xxx). We will discuss the majority of the slots by the end of this document. Now lets see if we can find some of the values described in the constructor in our object. To do so we will show here the more compact representation given by `str`. When there are multiple and hierarchical slots that contain objects themselves, calling `str` will be much less informative or not at all informative if the maximum level of nesting is not set (e.g. `max.level = 2`). 
+We can also print this information by calling `getClass("AllSplit")` for the general slot definition, or by calling `getClass(spl)`. Note that the first call will also give a lot of information about the class hierarchy. For more information regarding class hierarchy, please refer to the relevant article (xxx). We will discuss the majority of the slots by the end of this document. Now, let's see if we can find some of the values described in the constructor within our object. To do so, we will show the more compact representation given by `str`. When there are multiple and hierarchical slots that contain objects themselves, calling `str` will be much less informative, or not informative at all, if the maximum level of nesting is not set (e.g. `max.level = 2`). ```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 Browse[2]> str(spl, max.level = 2) Formal class 'AllSplit' [package "rtables"] with 17 slots ..@ payload : NULL @@ -211,100 +224,106 @@ Formal class 'AllSplit' [package "rtables"] with 17 slots ..@ child_section_div : chr NA ``` -Details about these slots will be necessary in future examples, and we will deal with them at that time. Now, we gave you a hint of the complex class hierarchy that makes up `rtables`, and how to explore it autonomously. Lets go forward in `do_split`. In our case, being `AllSplit` inherited from `Split`, we are sure that the called function will be the following (read the comment!): +Details about these slots will become necessary in future examples, and we will deal with them at that time. We have now given you a hint of the complex class hierarchy that makes up `rtables`, and how to explore it autonomously. Let's go forward in `do_split`. 
In our case, with `AllSplit` inherited from `Split`, we are sure that the called function will be the following (read the comment!): ```{r, eval=FALSE} -# rtables 6.0.2 -## default does nothing, add methods as they become -## required -setMethod("check_validsplit", "Split", - function(spl, df) - invisible(NULL)) +# rtables 0.6.2 +## Default does nothing, add methods as they become required +setMethod( + "check_validsplit", "Split", + function(spl, df) invisible(NULL) +) ``` -### Split function and `.apply_split_inner` +### Split Functions and `.apply_split_inner` -Before diving into custom split functions we need to take a moment to analyze how `.apply_split_inner` works. This function is routinely called, also in the case we do have a split function. Lets see why this can be the case by entering it with `debugonce(.apply_split_inner)`. Of course, we are still browsing `do_split` in debug mode from the first example. We printed and commented it in the following: +Before diving into custom split functions, we need to take a moment to analyze how `.apply_split_inner` works. This function is routinely called whether or not we have a split function. Let's see why this is the case by entering it with `debugonce(.apply_split_inner)`. Of course, we are still currently browsing within `do_split` in debug mode from the first example. We print and comment on the function in the following: ```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 .apply_split_inner <- function(spl, df, vals = NULL, labels = NULL, trim = FALSE) { - # - INPUTS - # - # In this case .applysplit_rawvals will attempt finding the split values if vals is NULL. - # Please notice that they might be a non-mutually exclusive set or subset of elements that - # will constitute the split. - - # - SPLIT VALS - # - ## try to calculate values first. 
Most of the time we can - if(is.null(vals)) - vals <- .applysplit_rawvals(spl, df) - - # - EXTRA PARAMETERS - # - # This call extracts extra parameters from the split, according to the split values - extr <- .applysplit_extras(spl, df, vals) - - # If there are no values to do the split upon, we return an empty final split - if(is.null(vals)) { - return(list(values = list(), - datasplit = list(), - labels = list(), - extras = list())) + # - INPUTS - # + # In this case .applysplit_rawvals will attempt to find the split values if vals is NULL. + # Please notice that there may be a non-mutually exclusive set or subset of elements that + # will constitute the split. + + # - SPLIT VALS - # + ## Try to calculate values first - most of the time we can + if (is.null(vals)) { + vals <- .applysplit_rawvals(spl, df) + } + + # - EXTRA PARAMETERS - # + # This call extracts extra parameters from the split, according to the split values + extr <- .applysplit_extras(spl, df, vals) + + # If there are no values to do the split upon, we return an empty final split + if (is.null(vals)) { + return(list( + values = list(), + datasplit = list(), + labels = list(), + extras = list() + )) + } + + # - DATA SUBSETTING - # + dpart <- .applysplit_datapart(spl, df, vals) + + # - LABEL RETRIEVAL - # + if (is.null(labels)) { + labels <- .applysplit_partlabels(spl, df, vals, labels) + } else { + stopifnot(names(labels) == names(vals)) + } + + # - TRIM - # + ## Get rid of columns that would not have any observations, + ## but only if there were any rows to start with - if not + ## we're in a manually constructed table column tree + if (trim) { + hasdata <- sapply(dpart, function(x) nrow(x) > 0) + if (nrow(df) > 0 && length(dpart) > sum(hasdata)) { # some empties + dpart <- dpart[hasdata] + vals <- vals[hasdata] + extr <- extr[hasdata] + labels <- labels[hasdata] } - - # - DATA SUBSETTING - # - dpart <- .applysplit_datapart(spl, df, vals) - - # - LABEL RETRIEVAL - # - if(is.null(labels)) - labels <- 
.applysplit_partlabels(spl, df, vals, labels) - else - stopifnot(names(labels) == names(vals)) - - # - TRIM - # - ## get rid of columns that would not have any - ## observations. - ## - ## But only if there were any rows to start with - ## if not we're in a manually constructed table - ## column tree - if(trim) { - hasdata <- sapply(dpart, function(x) nrow(x) > 0) - if(nrow(df) > 0 && length(dpart) > sum(hasdata)) { #some empties - dpart <- dpart[hasdata] - vals <- vals[hasdata] - extr <- extr[hasdata] - labels <- labels[hasdata] - } - } - - # - ORDER RESULTS - # - # Finds relevant order depending on spl_child_order() - if(is.null(spl_child_order(spl)) || is(spl, "AllSplit")) { - vord <- seq_along(vals) - } else { - vord <- match(spl_child_order(spl), - vals) - vord <- vord[!is.na(vord)] - } - - - ## FIXME: should be an S4 object, not a list - ret <- list(values = vals[vord], - datasplit = dpart[vord], - labels = labels[vord], - extras = extr[vord]) - ret + } + + # - ORDER RESULTS - # + # Finds relevant order depending on spl_child_order() + if (is.null(spl_child_order(spl)) || is(spl, "AllSplit")) { + vord <- seq_along(vals) + } else { + vord <- match( + spl_child_order(spl), + vals + ) + vord <- vord[!is.na(vord)] + } + + ## FIXME: should be an S4 object, not a list + ret <- list( + values = vals[vord], + datasplit = dpart[vord], + labels = labels[vord], + extras = extr[vord] + ) + ret } ``` -After reading `.apply_split_inner`, we see that there are some fundamental functions, defined strictly for internal use (convention: they start with ".") that are generics and depend on the kind of split in input. `R/split_funs.R` is very kind and group their generic definition at the beginning of the file. These functions are the main dispatcher for the majority of the split machinery. This is a clear example that shows how using `S4` logic helps clarity and flexibility in programming, allowing for easy extension of the program. 
For compactness we show also the `showMethods` result for each generic. +After reading through `.apply_split_inner`, we see that there are some fundamental functions - defined strictly for internal use (by convention they start with ".") - that are generics and depend on the kind of split in input. `R/split_funs.R` is very kind and groups generic definitions at the beginning of the file. These functions are the main dispatchers for the majority of the split machinery. This is a clear example that shows how using `S4` logic enables better clarity and flexibility in programming, allowing for easy extension of the program. For compactness we also show the `showMethods` result for each generic. ```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 # Retrieves the values that will constitute the splits (facets), not necessarily a unique list. -# They could come from the data cuts for example -> it can be anything if it produces a set of strings. -setGeneric(".applysplit_rawvals", - function(spl, df) standardGeneric(".applysplit_rawvals")) +# They could come from the data cuts for example -> it can be anything that produces a set of strings. +setGeneric( + ".applysplit_rawvals", + function(spl, df) standardGeneric(".applysplit_rawvals") +) # Browse[2]> showMethods(.applysplit_rawvals) # Function: .applysplit_rawvals (package rtables) # spl="AllSplit" @@ -315,25 +334,31 @@ setGeneric(".applysplit_rawvals", # spl="VarStaticCutSplit" # Nothing here is inherited from the virtual class Split!!! -# Contains the subset of the data (default, but these can overlap, can also NOT be mutually exclusive). -setGeneric(".applysplit_datapart", - function(spl, df, vals) standardGeneric(".applysplit_datapart")) +# Contains the subset of the data (default, but these can overlap and can also NOT be mutually exclusive). 
+setGeneric( + ".applysplit_datapart", + function(spl, df, vals) standardGeneric(".applysplit_datapart") +) # Same as .applysplit_rawvals # Extract the extra parameter for the split -setGeneric(".applysplit_extras", - function(spl, df, vals) standardGeneric(".applysplit_extras")) +setGeneric( + ".applysplit_extras", + function(spl, df, vals) standardGeneric(".applysplit_extras") +) # Browse[2]> showMethods(.applysplit_extras) # Function: .applysplit_extras (package rtables) # spl="AllSplit" # (inherited from: spl="Split") # spl="Split" -# This means there is only a function for the virtual class Split. -# So all splits behaves the same!!! +# This means there is only a function for the virtual class Split. +# So all splits behave the same!!! # Split label retrieval and assignment if visible. -setGeneric(".applysplit_partlabels", - function(spl, df, vals, labels) standardGeneric(".applysplit_partlabels")) +setGeneric( + ".applysplit_partlabels", + function(spl, df, vals, labels) standardGeneric(".applysplit_partlabels") +) # Browse[2]> showMethods(.applysplit_partlabels) # Function: .applysplit_partlabels (package rtables) # spl="AllSplit" @@ -342,32 +367,35 @@ setGeneric(".applysplit_partlabels", # spl="Split" # spl="VarLevelSplit" -setGeneric("check_validsplit", # our friend - function(spl, df) standardGeneric("check_validsplit")) -# Note: check_validsplit is an internal function but it is not excluded that one -# day it will be exported. That is way it does not have the "." prefix. - -setGeneric(".applysplit_ref_vals", - function(spl, df, vals) standardGeneric(".applysplit_ref_vals")) +setGeneric( + "check_validsplit", # our friend + function(spl, df) standardGeneric("check_validsplit") +) +# Note: check_validsplit is an internal function but may one day be exported. +# This is why it does not have the "." prefix. 
+ +setGeneric( + ".applysplit_ref_vals", + function(spl, df, vals) standardGeneric(".applysplit_ref_vals") +) # Browse[2]> showMethods(.applysplit_ref_vals) # Function: .applysplit_ref_vals (package rtables) # spl="Split" # spl="VarLevWBaselineSplit" - ``` -Now, we know that `.applysplit_extras` is the function that will be called first because we did not specify any `vals` and it is therefore `NULL`. This is a generic function as it can be seen by `showMethod(.applysplit_extras)`. It is indeed an `S4` generics and its source code can be determined by the following: +Now, we know that `.applysplit_rawvals` is the function that will be called first. This is because we did not specify any `vals` and it is therefore `NULL`. This is an `S4` generic function as can be seen by `showMethods(.applysplit_rawvals)`, and its definition can be seen in the following: ```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 Browse[3]> getMethod(".applysplit_rawvals", "AllSplit") Method Definition: -function (spl, df) +function (spl, df) obj_name(spl) Signatures: - spl + spl target "AllSplit" defined "AllSplit" @@ -379,51 +407,51 @@ Browse[3]> obj_name(spl) Browse[3]> getMethod("obj_name", "Split") Method Definition: -function (obj) +function (obj) obj@name ##### Slot that we could see from str(spl, max.level = 2) Signatures: - obj + obj target "Split" defined "Split" ``` -Then we have `.applysplit_extras` that will be covered in later sections and simply extracts the extra arguments from the split objects and assign them to their relative split values. If no split values are still available, the function will exit here with an empty split. Otherwise the data will be divided in different splits or data subsets (facets) with `.applysplit_datapart`. In our current example the resulting list comprises the whole input data set (i.e. do `getMethod(".applysplit_datapart", "AllSplit")` and a list will be evident: `function (spl, df, vals) list(df)`). 
+Then we have `.applysplit_extras`, which simply extracts the extra arguments from the split objects and assigns them to their relative split values. This function will be covered in more detail in a later section. If still no split values are available, the function will exit here with an empty split. Otherwise, the data will be divided into different splits or data subsets (facets) with `.applysplit_datapart`. In our current example, the resulting list comprises the whole input dataset (do `getMethod(".applysplit_datapart", "AllSplit")` and the list will be evident: `function (spl, df, vals) list(df)`). -Next, split labels are checked. If they are not present split values (`vals`) will be used with `.applysplit_partlabels` that, in the case of it being applied to a `Split` object, it translates into `as.character(vals)`. Otherwise, the inserted labels are checked against the name of split values. +Next, split labels are checked. If they are not present, split values (`vals`) will be used with `.applysplit_partlabels`, transformed into `as.character(vals)` if applied to a `Split` object. Otherwise, the inserted labels are checked against the names of split values. -Lastly, the split values are ordered on the basis of `spl_child_order`. In our case, which concerns the general `AllSplit`, the sorting will not happen, i.e. it will be simply dependent on the number of split values `seq_along(vals)`. +Lastly, the split values are ordered according to `spl_child_order`. In our case, which concerns the general `AllSplit`, the sorting will not happen, i.e. it will be dependent simply on the number of split values (`seq_along(vals)`). -#### A simple split +## A Simple Split -In the following, we demonstrate how row splits work according to the features that we have already described. We add two splits and see how `do_split` behavior changes. Note that if we do not add an `analyze` call, the split will behave as before, giving an empty table with all observations. 
As default, calling `analyze` on a variable will produce a mean for each data subset that has been generated by the splits. We want to go beyond the first call of `do_split` that is by design on all observation with the purpose of generating the root split that contains all data and all the splits (indeed `AllSplit`). To achieve this goal we can use `debug(rtables:::do_split)` instead of `debugonce(rtables:::do_split)` as we will need to step in each of the splits. Alternatively, it is possible to use the more powerful `trace` function to enter specifically in the case the input is from a specific class. To do so the following can be used: `trace("do_split", quote(if(!is(spl, "AllSplit")) browser()), where = asNamespace("rtables"))`. Note that we had to specify the namespace with where. Multiple tracer elements can be added with `expression(E1, E2)` which is the same as `c(quote(E1), quote(E2))`. Specific steps can be specified with the `at` parameter. Remember to do `untrace("do_split", quote(if(!is(spl, "AllSplit")) browser()), where = asNamespace("rtables"))` to remove it. +In the following, we demonstrate how row splits work using the features that we have already described. We will add two splits and see how the behavior of `do_split` changes. Note that if we do not add an `analyze` call the split will behave as before, giving an empty table with all observations. By default, calling `analyze` on a variable will calculate the mean for each data subset that has been generated by the splits. We want to go beyond the first call of `do_split` that is by design applied on all observations, with the purpose of generating the root split that contains all data and all splits (indeed `AllSplit`). To achieve this we use `debug(rtables:::do_split)` instead of `debugonce(rtables:::do_split)` as we will need to step into each of the splits. Alternatively, it is possible to use the more powerful `trace` function to enter in cases where input is from a specific class. 
To do so, the following can be used: `trace("do_split", quote(if(!is(spl, "AllSplit")) browser()), where = asNamespace("rtables"))`. Note that we specify the namespace with `where`. Multiple tracer elements can be added with `expression(E1, E2)`, which is the same as `c(quote(E1), quote(E2))`. Specific _steps_ can be specified with the `at` parameter. Remember to call `untrace("do_split", quote(if(!is(spl, "AllSplit")) browser()), where = asNamespace("rtables"))` once finished to remove the trace. ```{r, message=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 library(rtables) library(dplyr) # This filter is added to avoid having too many calls to do_split -DM_tmp <- DM %>% - filter(ARM %in% names(table(DM$ARM)[1:2])) %>% # limit to two - filter(SEX %in% c("M", "F")) %>% # limit to two - mutate(SEX = factor(SEX), ARM = factor(ARM)) # to drop unattended levels +DM_tmp <- DM %>% + filter(ARM %in% names(table(DM$ARM)[1:2])) %>% # limit to two + filter(SEX %in% c("M", "F")) %>% # limit to two + mutate(SEX = factor(SEX), ARM = factor(ARM)) # to drop unused levels # debug(rtables:::do_split) lyt <- basic_table() %>% - split_rows_by("ARM") %>% - split_rows_by("SEX") %>% - analyze("BMRKR1") # analyze() is needed for the table to have non-label rows + split_rows_by("ARM") %>% + split_rows_by("SEX") %>% + analyze("BMRKR1") # analyze() is needed for the table to have non-label rows -lyt %>% - build_table(DM_tmp) +lyt %>% + build_table(DM_tmp) # undebug(rtables:::do_split) ``` -Now, we might want to check the formal class of `spl` before anything else. +Before continuing, we want to check the formal class of `spl`. 
```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 Browse[2]> str(spl, max.level = 2) Formal class 'VarLevelSplit' [package "rtables"] with 20 slots ..@ value_label_var : chr "ARM" @@ -448,57 +476,57 @@ Formal class 'VarLevelSplit' [package "rtables"] with 20 slots ..@ child_section_div : chr NA ``` -From this, we can directly infer that the class is different now (`VarLevelSplit`) and understand that the split label will be hidden (`split_label_position` slot). Moreover, we see a specific value order with specific split values. Also, `VarLevelSplit` seems to have three more slots than `AllSplit`. What are they precisely? +From this, we can directly infer that the class is different now (`VarLevelSplit`) and understand that the split label will be hidden (`split_label_position` slot). Moreover, we see a specific value order with specific split values. `VarLevelSplit` also seems to have three more slots than `AllSplit`. What are they precisely? ```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 slots_as <- getSlots("AllSplit") # inherits virtual class Split and is general class for all splits # getClass("CustomizableSplit") # -> Extends: "Split", Known Subclasses: Class "VarLevelSplit", directly slots_cs <- getSlots("CustomizableSplit") # Adds split function slots_vls <- getSlots("VarLevelSplit") slots_cs[!(names(slots_cs) %in% names(slots_as))] -# split_fun -# "functionOrNULL" +# split_fun +# "functionOrNULL" slots_vls[!(names(slots_vls) %in% names(slots_cs))] -# value_label_var value_order +# value_label_var value_order # "character" "ANY" ``` -Remember always to check the constructor and class definition inside `R/00tabletrees.R` if exploratory tools do not suffice. Now, `check_validsplit(spl, df)` will dispatch to a different method than before (`getMethod("check_validsplit", "VarLevelSplit")`). Indeed, it uses the internal utility function `.checkvarsok` to check if the `vars`, i.e. the `payload` is actually present in `names(df)`. 
+Remember to always check the constructor and class definition in `R/00tabletrees.R` if exploratory tools do not suffice. Now, `check_validsplit(spl, df)` will use a different method than before (`getMethod("check_validsplit", "VarLevelSplit")`). It uses the internal utility function `.checkvarsok` to check if `vars`, i.e. the `payload`, is actually present in `names(df)`. -Now, the next relevant function will be `.apply_split_inner` where we want to see exactly what changes (`debugonce(.apply_split_inner)`). Of course, this function is directly called as no custom split function is provided. Being parameter `vals` not specified (`NULL`), the split values are retrieved from `df` by using the split payload to select specific columns (`varvec <- df[[spl_payload(spl)]]`). Every time no split values are specified, they will be retrieved from the selected column as unique values, if character, or levels, if factor. +The next relevant function will be `.apply_split_inner`, and we will see exactly what changes using `debugonce(.apply_split_inner)`. Of course, this function is called directly as no custom split function is provided. Since parameter `vals` is not specified (`NULL`), the split values are retrieved from `df` by using the split payload to select specific columns (`varvec <- df[[spl_payload(spl)]]`). Whenever no split values are specified, they are retrieved from the selected column as unique values (`character`) or levels (`factor`). -Next, `.applysplit_datapart` creates a named list of facets or data subsets. In this case, the result is actually a mutually exclusive partition of the data. This is because we did not specify any split values and the column content was used as such with unique call in case of a character vector or levels in case of factors. `.applysplit_partlabels` is a bit less linear as it has to take into account the possibility of having specified labels in the payload. 
Beside looking at the function source code with `getMethod(".applysplit_partlabels", "VarLevelSplit")`, we can enter in debugging mode the `S4` generic function as follows: +Next, `.applysplit_datapart` creates a named list of facets or data subsets. In this case, the result is actually a mutually exclusive partition of the data. This is because we did not specify any split values and as such the column content was retrieved via `unique` (in case of a character vector) or `levels` (in case of factors). `.applysplit_partlabels` is a bit less linear as it has to take into account the possibility of having specified labels in the payload. Instead of looking at the function source code with `getMethod(".applysplit_partlabels", "VarLevelSplit")`, we can enter the `S4` generic function in debugging mode as follows: ```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 eval(debugcall(.applysplit_partlabels(spl, df, vals, labels))) # We leave to the smart developer to see how the labels are assigned -# PS: remember to undebugcall() similarly +# Remember to undebugcall() similarly! ``` -In our case, the final labels are `vals` because they were not assigned. Their order is retrieved from the split object (`spl_child_order(spl)`) and matched with current split values. The returned list is then processed as it was before. +In our case, the final labels are `vals` because they were not explicitly assigned. Their order is retrieved from the split object (`spl_child_order(spl)`) and matched with current split values. The returned list is then processed as it was before. -If we continue with the next call of `do_split`, the same procedure is accomplished for the second `ARM` split. This is done on the partition that was already done in the first split. The only give out of this is the fact that the main `df` is constituted by a subset (facet) of the total data, according to the first split. This will be done iteratively for as many data split as requested. 
Before concluding this iteration, we take a moment to talk a bit more in detail about how `.fixupvals(partinfo)` works. It is not a generic function and the source code can be easily accessed as follows. We suggest to run through it with `debugonce(.fixupvals)` to understand what it does in practice. The fundamental aspects are listed in the following: +If we continue with the next call of `do_split`, the same procedure is followed for the second split (`SEX`). This is applied to the partition that was created in the first split. The main `df` is now constituted by a subset (facet) of the total data, determined by the first split. This will be repeated iteratively for as many data splits as requested. Before concluding this iteration, we take a moment to discuss in detail how `.fixupvals(partinfo)` works. This is not a generic function and the source code can be easily accessed. We suggest running through it with `debugonce(.fixupvals)` to understand what it does in practice. The fundamental aspects of `.fixupvals(partinfo)` are as follows: * Ensures that labels are character and not factor. * Ensures that the splits of data and list of values are named according to labels. * Guarantees that `ret$values` contains `SplitValue` objects. -* Removes the list element `extra` since its now included in the `SplitValue`. +* Removes the list element `extras` since it is now included in the `SplitValue`. Note that this function can occasionally be called more than once on the same return object (a named list for now). Of course, after the first call only checks are applied. 
```{r, eval=FALSE} -# rtables 6.0.2 - +# rtables 0.6.2 + # Can find the following core function: # vals <- make_splvalue_vec(vals, extr, labels = labels) -# ---> Main list of SplitValue objects: iterative call of +# ---> Main list of SplitValue objects: iterative call of # new("SplitValue", value = val, extra = extr, label = label) -# Structure of ret before the function call +# Structure of ret before calling .fixupvals Browse[2]> str(ret, max.level = 2) List of 4 $ values : chr [1:2] "A: Drug X" "B: Placebo" @@ -510,7 +538,7 @@ List of 4 $ extras :List of 2 ..$ : list() ..$ : list() - + # Structure of ret after the function call Browse[2]> str(.fixupvals(ret), max.level = 2) List of 3 @@ -522,7 +550,7 @@ List of 3 ..$ B: Placebo: tibble [106 × 8] (S3: tbl_df/tbl/data.frame) $ labels : Named chr [1:2] "A: Drug X" "B: Placebo" ..- attr(*, "names")= chr [1:2] "A: Drug X" "B: Placebo" - + # The SplitValue object is fundamental Browse[2]> str(ret$values) List of 2 @@ -536,65 +564,66 @@ List of 2 .. ..@ label: chr "B: Placebo" ``` +### Pre-Made Split Functions -#### Included split functions - -We start with a custom split function that is already defined in `rtables`. Its scope is filtering out specific values as follows: +We start by examining a split function that is already defined in `rtables`. Its scope is filtering out specific values as follows: ```{r, message=FALSE} library(rtables) # debug(rtables:::do_split) # uncomment to see into the main split function basic_table() %>% - split_rows_by("SEX", split_fun = drop_split_levels) %>% - analyze("BMRKR1") %>% - build_table(DM) + split_rows_by("SEX", split_fun = drop_split_levels) %>% + analyze("BMRKR1") %>% + build_table(DM) # undebug(rtables:::do_split) -# PS: this produces the same output as before with the filters +# This produces the same output as before (when filters were used) ``` -After skipping the root split, we enter the split based on column `SEX`. 
As we specified a split function, we retrieve the split function by using `splfun <- split_fun(spl)` and enter an `if-else` statement for the two possible cases where there is split contenxt or not. In both cases, an error catching framework is used so to give informative errors in case of failure. Later we will see better how it works. +After the root split, we enter the split based on `SEX`. As we have specified a split function, we can retrieve the split function by using `splfun <- split_fun(spl)` and enter an if-else statement for the two possible cases: whether there is split context or not. In both cases, an error catching framework is used to give informative errors in case of failure. Later we will see more in depth how this works. -Here, we invite to always keep a keen eye on `spl_context`, as it is fundamental for more sophisticate splits, e.g. in the cases where the split itself depends mainly on preceding splits or values. Please, when the split function is called, take a moment to look at how `drop_split_levels` is defined. You will see that it is fundamentally a wrapper of `.apply_split_inner` that drops empty factor levels, therefore avoiding empty split. +We invite the reader to always keep an eye on `spl_context`, as it is fundamental to more sophisticated splits, e.g. in the cases where the split itself depends mainly on preceding splits or values. When the split function is called, please take a moment to look at how `drop_split_levels` is defined. You will see that the function is fundamentally a wrapper of `.apply_split_inner` that drops empty factor levels, therefore avoiding empty splits. ```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 > drop_split_levels function(df, spl, vals = NULL, labels = NULL, trim = FALSE) { - # Retrieve split column - var <- spl_payload(spl) - df2 <- df - - ## This call is exactly the one we did in the filtering to get rid of empty levels - df2[[var]] <- factor(df[[var]]) - - ## Our main function! 
- .apply_split_inner(spl, df2, vals = vals, - labels = labels, - trim = trim) + # Retrieve split column + var <- spl_payload(spl) + df2 <- df + + ## This call is exactly the one we used when filtering to get rid of empty levels + df2[[var]] <- factor(df[[var]]) + + ## Our main function! + .apply_split_inner(spl, df2, + vals = vals, + labels = labels, + trim = trim + ) } ``` -There are many split functions already included in `rtables`. Lists of them can be found in `vignette("split_functions")`, `?split_funcs`, and `vignette("advanced_usage")`. We leave to the smart developer finding in detail how some of these work, in particular `trim_levels_to_map`. +There are many pre-made split functions included in `rtables`. A list of these functions can be found in the [Split Functions vignette](https://insightsengineering.github.io/rtables/main/articles/split_functions.html), or via `?split_funcs`. We leave it to the developer to look into how some of these split functions work; `trim_levels_to_map` in particular may be of interest. -#### Custom split functions +### Creating Custom Split Functions -Now we try to create our custom split function. Firstly, we will see how the system manages error messages. For a general understanding of how we can provide custom split functions, please read `?custom_split_funs` in detail. In the following we use browser() to enter our custom split functions. For the error cases, we invite the reader to activate `options(error = recover)` so to investigate the cases where we have an error. Note that you can retrieve original behavior by restarting `R` session or by caching the default option value. Another smart possibility is to use `callr` to retrieve the default as follows: `default_opts <- callr::r(function(){options()}); options(error = default_opts$error)`. +Now we will create a custom split function. Firstly, we will see how the system manages error messages. 
For a general understanding of how custom split functions are created, please read the [Custom Split Functions section](https://insightsengineering.github.io/rtables/main/articles/advanced_usage.html#custom-split-functions) of the Advanced Usage vignette or see `?custom_split_funs`. In the following code we use `browser()` to enter our custom split functions. We invite the reader to activate `options(error = recover)` to investigate cases where we encounter an error. Note that you can revert to default behavior by restarting your `R` session, by caching the default option value, or by using `callr` to retrieve the default as follows: `default_opts <- callr::r(function(){options()}); options(error = default_opts$error)`. ```{r} -# rtables 6.0.2 -# Table call with only function changing -simple_table <- function(DM, f){ - lyt <- basic_table() %>% - split_rows_by("ARM", split_fun = f) %>% - analyze("BMRKR1") - - lyt %>% - build_table(DM) +# rtables 0.6.2 +# Table call with only the function changing +simple_table <- function(DM, f) { + lyt <- basic_table() %>% + split_rows_by("ARM", split_fun = f) %>% + analyze("BMRKR1") + + lyt %>% + build_table(DM) } # First round will fail because there are unused arguments exploratory_split_fun <- function(df, spl) NULL @@ -605,105 +634,110 @@ err_msg <- tryCatch(simple_table(DM, exploratory_split_fun), error = function(e) message(err_msg$message) ``` -Commented debugging options can get you above and before the error. Nonetheless using the recover option will get you the possibility to select the frame number, i.e. the trace level to enter as debugging selecting the last one (10 in my case), will allow you to see the value of `ret` from `rtables:::do_split` that is the simple error and how the informative error message that follows is created. +The commented debugging lines above will allow you to inspect the error. Alternatively, using the recover option will allow you the possibility to select the frame number, i.e. 
the trace level, to enter. Selecting the last frame number (10 in this case) will allow you to see the value of `ret` from `rtables:::do_split` that causes the error and how the informative error message that follows is created. ```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 # Debugging level 10: tt_dotabulation.R#627: do_split(spl, df, spl_context = spl_context) # Original call and final error > simple_table(DM, exploratory_split_fun) -Error in do_split(spl, df, spl_context = spl_context) : +Error in do_split(spl, df, spl_context = spl_context) : Error applying custom split function: unused arguments (vals, labels, trim = trim) # This is main error split: VarLevelSplit (ARM) # Split reference - occured at path: root # Path level (where it happened) + occured at path: root # Path level (where it occurred) ``` -The previous split function fails because not all arguments are present. A simple way to avoid this is to add `...` to the function call. Now lets construct an interesting split function (and error): +The previous split function fails because `exploratory_split_fun` is given more arguments than it accepts. A simple way to avoid this is to add `...` to the function call. Now let's construct an interesting split function (and error): ```{r} -# rtables 6.0.2 -f_brakes_if <- function(split_col = NULL, error = FALSE){ - function(df, spl, ...){ # order matters! 
more than naming - # browser() # To check how it works - if (is.null(split_col)) { # Retrieves the default - split_col <- spl_variable(spl) # Internal accessor to split obj - } - my_payload <- split_col # Changing split column value - - vals <- levels(df[[my_payload]]) # Extracting values to split - datasplit <- lapply(seq_along(vals), function(i) { - df[df[[my_payload]] == vals[[i]], ] - }) - names(datasplit) <- as.character(vals) - - # Fantasy error - if (isTRUE(error)) { - # browser() # If you need to check how it works - mystery_error_values <- sapply(datasplit, function(x) mean(x$BMRKR1)) - if (any(mystery_error_values > 6)) { - stop("It should not be more than 6! Should it be? Found in split values: ", - names(datasplit)[which(mystery_error_values > 6)]) - } - } - - # Handy function to return a split result!! - make_split_result(vals, datasplit, vals) +# rtables 0.6.2 +f_brakes_if <- function(split_col = NULL, error = FALSE) { + function(df, spl, ...) { # order matters! more than naming + # browser() # To check how it works + if (is.null(split_col)) { # Retrieves the default + split_col <- spl_variable(spl) # Internal accessor to split obj + } + my_payload <- split_col # Changing split column value + + vals <- levels(df[[my_payload]]) # Extracting values to split + datasplit <- lapply(seq_along(vals), function(i) { + df[df[[my_payload]] == vals[[i]], ] + }) + names(datasplit) <- as.character(vals) + + # Error + if (isTRUE(error)) { + # browser() # If you need to check how it works + mystery_error_values <- sapply(datasplit, function(x) mean(x$BMRKR1)) + if (any(mystery_error_values > 6)) { + stop( + "It should not be more than 6! Should it be? Found in split values: ", + names(datasplit)[which(mystery_error_values > 6)] + ) + } } + + # Handy function to return a split result!! + make_split_result(vals, datasplit, vals) + } } simple_table(DM, f_brakes_if()) # works! simple_table(DM, f_brakes_if(split_col = "STRATA1")) # works! 
-# Does not work, but in an informative way -# simple_table(DM, f_brakes_if(error = TRUE)) +# simple_table(DM, f_brakes_if(error = TRUE)) # does not work, but returns an informative message # Error in do_split(spl, df, spl_context = spl_context) : # Error applying custom split function: It should not be more than 6! Should it be? Found in split values: B: Placebo # split: VarLevelSplit (ARM) -# occured at path: root +# occurred at path: root ``` -Now we will dwell a moment to the relatively new machinery to create custom split functions. Before doing so, please read the relevant documentation `?make_split_fun`. The majority of functions already included in `rtables` can be or will be written with `make_split_fun` as it is a more stable constructor for such functions. We invite the reader to take a look at `make_split_fun.R`. The majority of functions should be very understandable as far as you got into this guide. We want to highlight that if no core split function is specified, which is commonly the case, `make_split_fun` calls directly `do_base_split` which is a minimal wrapper of our well known `do_split`. `drop_facet_levels` for example is a pre-processing function that at the core simply removes empty factor levels from the split "column", thus avoiding empty lines to be shown. +Now we will take a moment to dwell on the machinery included in `rtables` to create custom split functions. Before doing so, please read the relevant documentation at `?make_split_fun`. Most of the pre-made split functions included in `rtables` are or will be written with `make_split_fun` as it is a more stable constructor for such functions than was previously used. We invite the reader to take a look at `make_split_fun.R`. The majority of the functions here should be understandable with the knowledge you have gained from this guide so far. 
It is important to note that if no core split function is specified, which is commonly the case, `make_split_fun` calls `do_base_split` directly, which is a minimal wrapper of the well-known `do_split`. `drop_facet_levels`, for example, is a pre-processing function that at its core simply removes empty factor levels from the split "column", thus avoiding showing empty lines. + +It is also possible to provide a list of functions, as it can be seen in the examples of `?make_split_fun`. Note that pre- and post-processing requires a list as input to support the possibility of combining multiple functions. In contrast, the core splitting function must be a single function call as it is not expected to have stacked features. This rarely needs to be modified and the majority of the included split functions work with pre- or post-processing. Included post-processing functions are interesting as they interact with the split object, e.g. by reordering the facets or by adding an overall facet (`add_overall_facet`). The attentive reader will have noticed that the core function relies on `do_split` and many of the post-processing functions rely on `make_split_result`, which is the best way to get the correct split return structure. Note that modifying the core split only works in the row space at the moment. -It is possible, also to add a list of functions, as it can be seen in the examples of `?make_split_fun`. Note that pre and post processing need a list in input to support the possibility to combine multiple functions. The core splitting function, instead, must be a single function call as it is not expected to have stacked features. This needs rarely to be modified and the majority of the included split functions work with pre or post processing. Included post-processing functions are interesting as they interact with the split object, e.g. by reordering the facets or by adding an overall facet (`add_overall_facet`). 
The smart reader will have noticed as the core function rely somehow on `do_split` and many of the post processing functions rely on `make_split_result` which is the best way to get the correct split return structure. Note that modifying the core split works only in the row space at the moment. +#### `.spl_context` - Adding Context to Our Splits -#### `.spl_context` - a bit of context to our splits -The best way to understand what split context does and how to use it is to read relevant vignette (xxx advanced usage), and to use `browser()` a split function to see how it is structured. As `.spl_context` is needed for rewriting core functions, we propose here a wrapper of `do_base_split`, which is a handy redirection to the standard `do_split` without -the split function part, i.e. it is a wrapper of `.apply_split_inner`, the real core splitting machinery. For curiosity we set here `trim = TRUE`. This trimming works only when there is a mixed table (some values are 0s and some have content, there it trims the 0s). This is rarely the case and we encourage using the replacement functions `trim_levels_to_group` and `trim_levels_to_map`. Nowadays, it should even be impossible to set it differently from `trim = FALSE`. +The best way to understand what split context does, and how to use it, is to read the [Leveraging `.spl_context` section](https://insightsengineering.github.io/rtables/main/articles/advanced_usage.html#leveraging--spl_context) of the Advanced Usage vignette, and to use `browser()` within a split function to see how it is structured. As `.spl_context` is needed for rewriting core functions, we propose a wrapper of `do_base_split` here, which is a handy redirection to the standard `do_split` without the split function part (i.e. it is a wrapper of `.apply_split_inner`, the real core splitting machinery). Out of curiosity, we set `trim = TRUE` here. 
This trimming only works when there is a mixed table (some values are 0s and some have content), for which it will trim 0s. This is rarely the case, and we encourage using the replacement functions `trim_levels_to_group` and `trim_levels_to_map` for trimming. Nowadays, it should even be impossible to set it differently from `trim = FALSE`. (write an issue informative error for not list xxx). ```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 browsing_f <- function(df, spl, .spl_context, ...) { - # browser() - # do_base_split(df, spl, ...) # order matters!! This would fail if done - do_base_split(spl = spl, df = df, vals = NULL, labels = NULL, trim = TRUE) + # browser() + # do_base_split(df, spl, ...) # order matters!! This would fail if done + do_base_split(spl = spl, df = df, vals = NULL, labels = NULL, trim = TRUE) } + fnc_tmp <- function(innervar) { # Exploring trim_levels_in_facets (check its form) - function(ret, ...) { - # browser() - for(var in innervar) { # of course AGE is not here, so nothing is dropped!! - ret$datasplit <- lapply(ret$datasplit, function(df) { - df[[var]] <- factor(df[[var]]) - df - }) - } - ret - } + function(ret, ...) { + # browser() + for (var in innervar) { # of course AGE is not here, so nothing is dropped!! 
+ ret$datasplit <- lapply(ret$datasplit, function(df) { + df[[var]] <- factor(df[[var]]) + df + }) + } + ret + } } -basic_table() %>% - split_rows_by("ARM") %>% - split_rows_by("STRATA1") %>% - split_rows_by_cuts("AGE", cuts = c(0, 50, 100), - cutlabels = c("young", "old")) %>% - split_rows_by("SEX", split_fun = make_split_fun( - pre = list(drop_facet_levels), # This is dropping the sex levels (age is upper level) - core_split = browsing_f, - post = list(fnc_tmp("AGE")) # To drop these we should use a split_fun in the above level for that - )) %>% - summarize_row_groups() %>% - build_table(DM) + +basic_table() %>% + split_rows_by("ARM") %>% + split_rows_by("STRATA1") %>% + split_rows_by_cuts("AGE", + cuts = c(0, 50, 100), + cutlabels = c("young", "old") + ) %>% + split_rows_by("SEX", split_fun = make_split_fun( + pre = list(drop_facet_levels), # This is dropping the SEX levels (AGE is upper level) + core_split = browsing_f, + post = list(fnc_tmp("AGE")) # To drop these we should use a split_fun in the above level + )) %>% + summarize_row_groups() %>% + build_table(DM) # The following is the .spl_contest printout: Browse[1]> .spl_context @@ -712,42 +746,51 @@ Browse[1]> .spl_context 2 ARM A: Drug X c("S6", .... 121 TRUE, TR.... 3 STRATA1 A c("S14",.... 36 TRUE, TR.... 4 AGE young c("S14",.... 36 TRUE, TR.... - -# NOTE: make_split_fun(pre = list(drop_facet_levels)) and drop_split_levels + +# NOTE: make_split_fun(pre = list(drop_facet_levels)) and drop_split_levels # do the same thing in this case ``` -Here we can see what is the split column variable (`split`, first column) at this level of the splitting procedure. `value` is the current split value that is being dealt with. Now, for the next column, lets see the number of rows of these dataframes: `sapply(.spl_context$full_parent_df, nrow) # [1] 356 121 36 36`. Indeed, the `root` level contains the full input dataframe, while the other levels are subgroups of the full data according to the split value. 
`all_cols_n` shows exactly the numbers just described. `all obs` is the current filter applied to the columns. Appling this to the root data (or the row subgroup data) reveals the current facet column-wise (and row-wise if in row split). It is possible to use the same information to make complex splits also on the column space by using the full dataframe and the value splits to select the interested values. This is something we will change and simplify when it is a more apparent need. -### Extra arguments `extra_args` -This functionality is well known and used in the setting of analysis functions (xxx vignette), but we show here how this can also apply to splits. +Here we can see what the split column variable is (`split`, first column) at this level of the splitting procedure. `value` is the current split value that is being dealt with. For the next column, let's see the number of rows of these data frames: `sapply(.spl_context$full_parent_df, nrow) # [1] 356 121 36 36`. Indeed, the `root` level contains the full input data frame, while the other levels are subgroups of the full data according to the split value. `all_cols_n` shows exactly the numbers just described. `all obs` is the current filter applied to the columns. Applying this to the root data (or the row subgroup data) reveals the current column-wise facet (or row-wise for a row split). It is also possible to use the same information to make complex splits in the column space by using the full data frame and the split values to select the values of interest. This is something we will change and simplify within `rtables` as the need becomes apparent. 
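The row counts quoted above can also be collected without `browser()`. The following is a minimal sketch, assuming (as described in `?custom_split_funs`) that a custom split function whose formals include `.spl_context` receives it during tabulation. The names `peek_context` and `ctx_log` are hypothetical and exist only for this illustration.

```r
library(rtables)

# Hypothetical log, filled each time the split function is called
ctx_log <- list()

# Hypothetical split function: records the split context, then defers to
# drop_split_levels() for the actual splitting
peek_context <- function(df, spl, vals = NULL, labels = NULL,
                         trim = FALSE, .spl_context) {
  ctx_log[[length(ctx_log) + 1]] <<- data.frame(
    split = as.character(.spl_context$split),
    value = as.character(.spl_context$value),
    # number of rows of the parent data frame at each level
    parent_nrow = sapply(.spl_context$full_parent_df, nrow)
  )
  drop_split_levels(df, spl, vals = vals, labels = labels, trim = trim)
}

tbl <- basic_table() %>%
  split_rows_by("ARM") %>%
  split_rows_by("SEX", split_fun = peek_context) %>%
  analyze("AGE") %>%
  build_table(DM)

# One entry per call of the split function; the first row of each entry
# should describe the root level, which holds the full input data
ctx_log[[1]]
```

Each entry of `ctx_log` should contain one row per preceding split level, so comparing `parent_nrow` across entries shows how the parent data shrinks as the splits nest.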
+ +### Extra Arguments: `extra_args` + +This functionality is well-known and used in the setting of analysis functions (a somewhat complicated example of this can be found in the [Example Complex Analysis Function vignette](https://insightsengineering.github.io/rtables/main/articles/example_analysis_coxreg.html#constructing-the-table)), but we will show here how this can also apply to splits. ```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 -# Lets use the tracer!! +# Let's use the tracer!! my_tracer <- quote(if (length(spl@extra_args) > 0) browser()) -trace(what = "do_split", - tracer = my_tracer, - where = asNamespace("rtables")) +trace( + what = "do_split", + tracer = my_tracer, + where = asNamespace("rtables") +) + custom_mean_var <- function(var) { - function(df, labelstr, na.rm = FALSE, ...) { - # browser() - mean(df[[var]], na.rm = na.rm) - } + function(df, labelstr, na.rm = FALSE, ...) { + # browser() + mean(df[[var]], na.rm = na.rm) + } } + DM_ageNA <- DM DM_ageNA$AGE[1] <- NA -basic_table() %>% - split_rows_by("ARM") %>% - split_rows_by("SEX", split_fun = drop_split_levels) %>% - summarize_row_groups(cfun = custom_mean_var("AGE"), - extra_args = list(na.rm = TRUE), format = "xx.x", - label_fstr = "label %s") %>% - # content_extra_args, c_extra_args are different slots!! (xxx) - split_rows_by("STRATA1", split_fun = keep_split_levels("A")) %>% - analyze("AGE") %>% # check with the extra_args (xxx) - build_table(DM_ageNA) -# You can accumulate extra_args down to other splits. It is possible this does not + +basic_table() %>% + split_rows_by("ARM") %>% + split_rows_by("SEX", split_fun = drop_split_levels) %>% + summarize_row_groups( + cfun = custom_mean_var("AGE"), + extra_args = list(na.rm = TRUE), format = "xx.x", + label_fstr = "label %s" + ) %>% + # content_extra_args, c_extra_args are different slots!! 
(xxx) + split_rows_by("STRATA1", split_fun = keep_split_levels("A")) %>% + analyze("AGE") %>% # check with the extra_args (xxx) + build_table(DM_ageNA) +# You can pass extra_args down to other splits. It is possible this will not # work. Should it? That is why extra_args lives only in splits (xxx) check if it works # as is. Difficult to find an use case for this. Maybe it could work for the ref_group # info. That does not work with nesting already (fairly sure that it will break stuff). @@ -757,173 +800,198 @@ basic_table() %>% # As we can see that was not possible. What if we now force it a bit? my_split_fun <- function(df, spl, .spl_context, ...) { - spl@extra_args <- list(na.rm = TRUE) - # does not work because do_split is not changing the object - # the split does not do anything with it - drop_split_levels(df, spl) + spl@extra_args <- list(na.rm = TRUE) + # does not work because do_split is not changing the object + # the split does not do anything with it + drop_split_levels(df, spl) } # does not work -basic_table() %>% - split_rows_by("ARM") %>% - split_rows_by("SEX", split_fun = my_split_fun) %>% - analyze("AGE", inclNAs = TRUE, afun = mean) %>% # include_NAs is set FALSE - build_table(DM_ageNA) -# extra_args is in available in cols but not in rows, because different columns + +basic_table() %>% + split_rows_by("ARM") %>% + split_rows_by("SEX", split_fun = my_split_fun) %>% + analyze("AGE", inclNAs = TRUE, afun = mean) %>% # include_NAs is set FALSE + build_table(DM_ageNA) +# extra_args is available in cols but not in rows, because different columns # may need it for different col space. Row-wise it seems not necessary. 
# The only thing that works is adding it to analyze (xxx) check if it is worth adding -# We invite the developer now to test all the tests file of this package with the tracer on -# therefore -> extra_args is not currently used in splits (xxx could be wrong) +# We invite the developer now to test all the test files of this package with the tracer on +# therefore -> extra_args is not currently used in splits (xxx could be wrong) # could be not being hooked up untrace(what = "do_split", where = asNamespace("rtables")) # Let's try with the other variables identically my_tracer <- quote(if (!is.null(vals) || !is.null(labels) || isTRUE(trim)) { - print("A LOT TO SAY") - message("CANT BLOCK US ALL") - stop("NOW FOR SURE") - browser() - }) -trace(what = "do_split", - tracer = my_tracer, - where = asNamespace("rtables")) + print("A LOT TO SAY") + message("CANT BLOCK US ALL") + stop("NOW FOR SURE") + browser() +}) +trace( + what = "do_split", + tracer = my_tracer, + where = asNamespace("rtables") +) # Run tests by copying the above in setup-fakedata.R (then devtools::test()) -untrace(what = "do_split", - where = asNamespace("rtables")) +untrace( + what = "do_split", + where = asNamespace("rtables") +) ``` -As we have demonstrated, all the above seems like impossible cases, and are to be considered as vestigial and deprecated heritage. -### Final examples with `MultiVarSplit` & `CompoundSplit` -This final part of this chapter is still under construction, hence, the unspecific mentions and the to do list. -xxx `CompoundSplit` generates facets from one variable (e.g. cumulative distributions) while `MultiVarSplit` uses different variables for the split. See `AnalyzeMultiVars`, which inherits from `CompoundSplit` for more details on how it analyzes the same facets multiple times. `MultiVarColSplit` works with `analyze_colvars`, which is a different discussion. `.set_kids_sect_sep` adds things between children (can be set from split). 
+As we have demonstrated, all of the above seem like impossible cases and are to be considered as vestigial and to be deprecated. + +## `MultiVarSplit` & `CompoundSplit` Examples -Firstly, we want to see how `MultiVarSplit` class behaves for an example case taken from -`?split_rows_by_multivar`. +The final part of this article is still under construction, hence the non-specific mentions and the to do list. +xxx `CompoundSplit` generates facets from one variable (e.g. cumulative distributions) while `MultiVarSplit` uses different variables for the split. See `AnalyzeMultiVars`, which inherits from `CompoundSplit` for more details on how it analyzes the same facets multiple times. `MultiVarColSplit` works with `analyze_colvars`, which is out of the scope of this article. `.set_kids_sect_sep` adds things between children (can be set from split). + +First, we want to see how the `MultiVarSplit` class behaves for an example case taken from `?split_rows_by_multivar`. ```{r, eval=FALSE} -# rtables 6.0.2 +# rtables 0.6.2 my_tracer <- quote(if (is(spl, "MultiVarSplit")) browser()) -trace(what = "do_split", - tracer = my_tracer, - where = asNamespace("rtables")) +trace( + what = "do_split", + tracer = my_tracer, + where = asNamespace("rtables") +) # We want also to take a look at the following: debugonce(rtables:::.apply_split_inner) lyt <- basic_table() %>% - split_cols_by("ARM") %>% - split_rows_by_multivar(c("BMRKR1", "BMRKR1"), - varlabels = c("SD", "MEAN")) %>% - split_rows_by("COUNTRY", - split_fun = keep_split_levels("PAK")) %>% # xxx for #690 #691 - summarize_row_groups() %>% - analyze(c("AGE", "SEX")) + split_cols_by("ARM") %>% + split_rows_by_multivar(c("BMRKR1", "BMRKR1"), + varlabels = c("SD", "MEAN") + ) %>% + split_rows_by("COUNTRY", + split_fun = keep_split_levels("PAK") + ) %>% # xxx for #690 #691 + summarize_row_groups() %>% + analyze(c("AGE", "SEX")) build_table(lyt, DM) # xxx check empty space on top -> check if it is a bug, file it -untrace(what = 
"do_split", - where = asNamespace("rtables")) +untrace( + what = "do_split", + where = asNamespace("rtables") +) ``` -If we print them out, we will notice that the two groups (one called "SEX" and the other "STRATA1") are identical along the columns. This is because no subgroup was actually created. This is an interesting way to personalize splits and with the help of custom split functions and their split context, also to have widely different subgroups in the table. +If we print the output, we will notice that the two groups (one called "SEX" and the other "STRATA1") are identical along the columns. This is because no subgroup was actually created. This is an interesting way to personalize splits with the help of custom split functions and their split context, and to have widely different subgroups in the table. -We invite the reader to try to understand why `split_rows_by_multivar` can have other row splits under it (see `xxx` comment in the previous code), while `split_cols_by_multivar` does not. It is a known bug at the moment, and we would be pleased to have a fix. The issues are often linked in the code by their code number (e.g. `#690`). +We invite the reader to try to understand why `split_rows_by_multivar` can have other row splits under it (see `xxx` comment in the previous code), while `split_cols_by_multivar` does not. This is a known bug at the moment, and we will work towards a fix for this. Known issues are often linked in the source code by their GitHub issue number (e.g. `#690`). -Lastly, we will briefly show an example of a split with cut function and how to replace it and solve the problem with empty age groups we had before. In the following, we propose the same simplified situation: +Lastly, we will briefly show an example of a split by cut function and how to replace it to solve the empty age groups problem as we did before. 
We propose the same simplified situation: ```{r} -# rtables 6.0.2 +# rtables 0.6.2 cutfun <- function(x) { # browser() cutpoints <- c(0, 50, 100) - names(cutpoints) <- c("", "Younger", "Older") cutpoints } -tbl <- basic_table(show_colcounts = TRUE) %>% - split_rows_by("ARM", - split_fun = drop_and_remove_levels(c("B: Placebo", "C: Combination"))) %>% - split_rows_by("STRATA1") %>% - split_rows_by_cutfun("AGE", cutfun = cutfun) %>% - # split_rows_by_cuts("AGE", cuts = c(0, 50, 100), - # cutlabels = c("young", "old")) %>% # Works the same - split_rows_by("SEX", split_fun = drop_split_levels) %>% - summarize_row_groups() %>% # This is degenerate!!! - build_table(DM) +tbl <- basic_table(show_colcounts = TRUE) %>% + split_rows_by("ARM", split_fun = drop_and_remove_levels(c("B: Placebo", "C: Combination"))) %>% + split_rows_by("STRATA1") %>% + split_rows_by_cutfun("AGE", cutfun = cutfun) %>% + # split_rows_by_cuts("AGE", cuts = c(0, 50, 100), + # cutlabels = c("young", "old")) %>% # Works the same + split_rows_by("SEX", split_fun = drop_split_levels) %>% + summarize_row_groups() %>% # This is degenerate!!! + build_table(DM) tbl ``` -For both cases (`*_cuts` and `*_cutfun`), we have empty levels that are not dropped. This is to be expected and can be avoided by using a dedicated split function. Only intentionally looking at the future split is possible to know if there is any element in it. At the moment, though, it is not possible to add `spl_fun` to dedicated split function like `split_rows_by_cuts`. -Note too that in the previous table we used only `summarize_row_groups` but no `analyze`. This rendered nicely but it is not the standard way to go as `summarize_row_groups` was intended ONLY to decorate row groups, i.e. row with labels. Internally, these rows are called content rows and that is why the analysis functions here are called `cfun` instead of `afun`. 
Indeed, also the tabulation machinery presents these two differences as it is described here (xxx link to tabulation vignette). +For both row split cases (`*_cuts` and `*_cutfun`), we have empty levels that are not dropped. This is to be expected and can be avoided by using a dedicated split function. Intentionally looking at the future split is possible in order to determine if an element is present in it. At the moment it is not possible to add `spl_fun` to dedicated split functions like `split_rows_by_cuts`. + +Note that in the previous table we only used `summarize_row_groups`, with no `analyze` calls. This rendered the table nicely, but it is not the standard method to use as `summarize_row_groups` is intended *only* to decorate row groups, i.e. rows with labels. Internally, these rows are called content rows and that is why analysis functions in `summarize_row_groups` are called `cfun` instead of `afun`. Indeed, the tabulation machinery also presents these two differently as is described in the [Tabulation with Row Structure section](https://insightsengineering.github.io/rtables/main/articles/tabulation_concepts.html#tabulation-with-row-structure) of the Tabulation vignette. + +We can try to construct the split function for cuts manually with `make_split_fun`: -We can try anyway to construct the split function for cuts manually with `make_split_fun`: ```{r, eval=FALSE} my_count_afun <- function(x, .N_col, .spl_context, ...) 
{ - # browser() - out <- list(c(length(x), length(x)/.N_col)) - names(out) <- .spl_context$value[nrow(.spl_context)] # workaround (xxx #689) - in_rows(.list = out, - .formats = c("xx (xx.x%)")) + # browser() + out <- list(c(length(x), length(x) / .N_col)) + names(out) <- .spl_context$value[nrow(.spl_context)] # workaround (xxx #689) + in_rows( + .list = out, + .formats = c("xx (xx.x%)") + ) } # ?make_split_fun # To check for docs/examples # Core split cuts_core <- function(spl, df, vals, labels, .spl_context) { - # browser() # file an issue xxx - # variables that are split on are converted to factor during the original clean-up - # cut split are not doing it but it is an exception. xxx - # young_v <- as.numeric(df[["AGE"]]) < 50 - # current solution: - young_v <- as.numeric(as.character(df[["AGE"]])) < 50 - make_split_result(c("young", "old"), - datasplit = list(df[young_v,], df[!young_v,]), - labels = c("Younger", "Older")) + # browser() # file an issue xxx + # variables that are split on are converted to factor during the original clean-up + # cut split are not doing it but it is an exception. xxx + # young_v <- as.numeric(df[["AGE"]]) < 50 + # current solution: + young_v <- as.numeric(as.character(df[["AGE"]])) < 50 + make_split_result(c("young", "old"), + datasplit = list(df[young_v, ], df[!young_v, ]), + labels = c("Younger", "Older") + ) } -drop_empties <- function(splret, spl, fulldf, ...){ - # browser() - nrows_data_split <- vapply(splret$datasplit, nrow, numeric(1)) - to_keep <- nrows_data_split > 0 - make_split_result(splret$values[to_keep], - splret$datasplit[to_keep], - splret$labels[to_keep]) +drop_empties <- function(splret, spl, fulldf, ...) 
{ + # browser() + nrows_data_split <- vapply(splret$datasplit, nrow, numeric(1)) + to_keep <- nrows_data_split > 0 + make_split_result( + splret$values[to_keep], + splret$datasplit[to_keep], + splret$labels[to_keep] + ) } -gen_split <- make_split_fun(core_split = cuts_core, - post = list(drop_empties)) - -tbl <- basic_table(show_colcounts = TRUE) %>% - split_rows_by("ARM", split_fun = keep_split_levels(c("A: Drug X"))) %>% - split_rows_by("STRATA1") %>% - split_rows_by("AGE", split_fun = gen_split) %>% - analyze("SEX") %>% # It is the last step!! No need of BMRKR1 right? - # split_rows_by("SEX", split_fun = drop_split_levels, - # child_labels = "hidden") %>% # close issue #689. would it work for - # analyze_colvars? probably (xxx) - # analyze("BMRKR1", afun = my_count_afun) %>% # This is NOT degenerate!!! BMRKR1 is only placeholder - build_table(DM) +gen_split <- make_split_fun( + core_split = cuts_core, + post = list(drop_empties) +) + +tbl <- basic_table(show_colcounts = TRUE) %>% + split_rows_by("ARM", split_fun = keep_split_levels(c("A: Drug X"))) %>% + split_rows_by("STRATA1") %>% + split_rows_by("AGE", split_fun = gen_split) %>% + analyze("SEX") %>% # It is the last step!! No need of BMRKR1 right? + # split_rows_by("SEX", split_fun = drop_split_levels, + # child_labels = "hidden") %>% # close issue #689. would it work for + # analyze_colvars? probably (xxx) + # analyze("BMRKR1", afun = my_count_afun) %>% # This is NOT degenerate!!! BMRKR1 is only placeholder + build_table(DM) tbl ``` -There is another way to go. We could prune them out! + +Alternatively, we could choose to prune these rows out with `prune_table`! 
+ ```{r} -# rtables 6.0.2 - -tbl <- basic_table(show_colcounts = TRUE) %>% - split_rows_by("ARM", split_fun = keep_split_levels(c("A: Drug X"))) %>% - split_rows_by("STRATA1") %>% - split_rows_by_cuts("AGE", cuts = c(0, 50, 100), - cutlabels = c("young", "old")) %>% - split_rows_by("SEX", split_fun = drop_split_levels) %>% - summarize_row_groups() %>% # This is degenerate!!! # we keep it until #689 - build_table(DM) +# rtables 0.6.2 + +tbl <- basic_table(show_colcounts = TRUE) %>% + split_rows_by("ARM", split_fun = keep_split_levels(c("A: Drug X"))) %>% + split_rows_by("STRATA1") %>% + split_rows_by_cuts( + "AGE", + cuts = c(0, 50, 100), + cutlabels = c("young", "old") + ) %>% + split_rows_by("SEX", split_fun = drop_split_levels) %>% + summarize_row_groups() %>% # This is degenerate!!! # we keep it until #689 + build_table(DM) tbl # Trying with pruning -prune_table(tbl) #(xxx) what is going on here? it is degenerate so it has no real leaves +prune_table(tbl) # (xxx) what is going on here? it is degenerate so it has no real leaves # It is degenerate -> what to do? 
-# The same mechanism is applied in the case of NULL leaves, they are rolled up in the +# The same mechanism is applied in the case of NULL leaves, they are rolled up in the # table tree ``` + (xxx) add the pre-proc with z-scoring From 9148d21087c2c129bff9bbe0da3e809f83ccab69 Mon Sep 17 00:00:00 2001 From: Emily de la Rua Date: Tue, 31 Oct 2023 17:47:55 -0400 Subject: [PATCH 37/40] Change default spaces in rtables project --- rtables.Rproj | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rtables.Rproj b/rtables.Rproj index 828602dd8..eaa6b8186 100644 --- a/rtables.Rproj +++ b/rtables.Rproj @@ -6,7 +6,7 @@ AlwaysSaveHistory: Default EnableCodeIndexing: Yes UseSpacesForTab: Yes -NumSpacesForTab: 4 +NumSpacesForTab: 2 Encoding: UTF-8 RnwWeave: Sweave From d3801287c4bcd69472a914247ac9907e2cc996fb Mon Sep 17 00:00:00 2001 From: Emily de la Rua Date: Tue, 31 Oct 2023 18:58:40 -0400 Subject: [PATCH 38/40] Tabulation - grammar fixes, rewording, adding links, applying styler --- inst/dev-guide/dg_tabulation.Rmd | 158 +++++++++++++++++-------------- 1 file changed, 87 insertions(+), 71 deletions(-) diff --git a/inst/dev-guide/dg_tabulation.Rmd b/inst/dev-guide/dg_tabulation.Rmd index cf18ef3b0..2fbd2059b 100644 --- a/inst/dev-guide/dg_tabulation.Rmd +++ b/inst/dev-guide/dg_tabulation.Rmd @@ -20,104 +20,119 @@ knitr::opts_chunk$set(echo = TRUE) ## Disclaimer -Any code or prose which appears in a version of this vignette on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide reflects very important pieces of the split machinery that are unlikely to change. We anyway invite the reader to think always that the current code may have drifted from the following document, and it is always the best practice to read directly the code on `main`. (xxx we should insert it automatically) +This article is intended for use by developers only and will contain low-level explanations of the topics covered. 
For user-friendly vignettes, please see the [Articles](https://insightsengineering.github.io/rtables/main/articles/index.html) page on the `rtables` website. + +Any code or prose which appears in the version of this article on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide describes very important aspects of the tabulation process that are unlikely to change. Regardless, we invite the reader to keep in mind that the current repository code may have drifted from the following material in this document, and it is always the best practice to read the code directly on `main`. + +Please keep in mind that `rtables` is still under active development, and it has seen the efforts of multiple contributors across different years. Therefore, there may be legacy mechanisms and ongoing transformations that could look different in the future. + +Being that this is a working document that may be subject to both deprecation and updates, we keep `xxx` comments to indicate placeholders for warnings and to-do's that need further work. ## Introduction -Tabulation in `rtables` is a process that takes a pre-defined layout and applies it to the data. The layout object, with all its splits (see `vignette("dg_split_machinery")`) and `analyze`s, can be applied to different data to produce valid tables. This process is happening principally inside the file `tt_dotabulation.R` and its principal user-facing function `build_table` that resides in it. We will sometimes see functions and methods that are present in other files like `colby_construction.R` or `make_subset_expr.R`. We assume any reader is already familiar with the documentation related to `build_table`. Also, we suggest reading first the vignette regarding the split machinery (`vignette("dg_split_machinery")`), as it is instrumental in understanding how the layout object, which is built principally of splits, is tabulated when data is applied.
+Tabulation in `rtables` is a process that takes a pre-defined layout and applies it to data. The layout object, with all of its splits (see xxx link split machinery article) and `analyze`s, can be applied to different data to produce valid tables. This process happens principally within the `tt_dotabulation.R` file and the user-facing function `build_table` that resides in it. We will occasionally use functions and methods that are present in other files, like `colby_construction.R` or `make_subset_expr.R`. We assume the reader is already familiar with the documentation for `build_table`. We suggest reading the split machinery vignette (xxx link) prior to this one, as it is instrumental in understanding how the layout object, which is essentially built out of splits, is tabulated when data is supplied. + +## Tabulation -This time, we enter in _medias res_ into `build_table` to see how it is meant to work. +We enter into `build_table` using `debugonce` to see how it works. ```{r, eval=FALSE} -# rtables 6.2.0 +# rtables 0.6.2 library(rtables) debugonce(build_table) # A very simple layout -lyt <- basic_table() %>% - split_rows_by("STRATA1") %>% - split_rows_by("SEX", split_fun = drop_split_levels) %>% - split_cols_by("ARM") %>% - analyze("BMRKR1") +lyt <- basic_table() %>% + split_rows_by("STRATA1") %>% + split_rows_by("SEX", split_fun = drop_split_levels) %>% + split_cols_by("ARM") %>% + analyze("BMRKR1") + # lyt must be a PreDataTableLayouts object is(lyt, "PreDataTableLayouts") lyt %>% build_table(DM) - ``` -Now lets see the interior of our `build_table`, After initial check that the layout is a pre-data table layout, it is checked if the coulmn layout is defined (`clayout` accessor), i.e. it does not have any column split. If that is the case, a `All obs` column is added automatically with all observations. After this, there are a couple of defensive programming calls that do checks and fixtures as we finally have the data. 
These divide in two kinds: the one that are mainly concerning the layout, which are defined as generics and the one concerning the data that is instead a function as it is not dependent on the layout class. Indeed, the layout is structured and can be divided in `clayout` and `rlayout` (column and row layout). The first one is used to create `cinfo` which is the general object and container of the column splits and information. The second one contains the obligatory all data split, i.e. the root split (accessible with `root_spl`), and the row splits' vectors which are iterative splits in the row space. In the following we consider first the checks and defensive programming. +Now let's look within our `build_table` call. After the initial check that the layout is a pre-data table layout, it checks whether the column layout (`clayout` accessor) is undefined, i.e. it does not have any column split. If that is the case, an `All obs` column is added automatically with all observations. After this, there are a couple of defensive programming calls that do checks and transformations now that we finally have the data. These can be divided into two categories: those that mainly concern the layout, which are defined as generics, and those that concern the data, which are instead a function as they are not dependent on the layout class. Indeed, the layout is structured and can be divided into `clayout` and `rlayout` (column and row layout). The first one is used to create `cinfo`, which is the general object and container of the column splits and information. The second one contains the obligatory all data split, i.e. the root split (accessible with `root_spl`), and the row splits' vectors which are iterative splits in the row space. In the following, we consider the initial checks and defensive programming.
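Before turning to those checks, the automatically added column can be observed directly. The following is a minimal sketch, assuming the `DM` example dataset that is available once `rtables` is loaded; the variable `AGE`, the use of `mean` as analysis function, and the object names `lyt_nocol` and `tbl_nocol` are chosen purely for illustration. Because the layout has no column split, the built table is expected to have a single column containing all observations.

```r
library(rtables)

# Layout without any column split (no split_cols_by call)
lyt_nocol <- basic_table() %>%
  analyze("AGE", afun = mean)

# build_table adds the single "all observations" column on its own
tbl_nocol <- build_table(lyt_nocol, DM)

ncol(tbl_nocol) # number of columns in the built table
tbl_nocol
```

Adding a `split_cols_by("ARM")` call to the same layout and rebuilding should replace this single column with one column per arm.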
+ ```{r, eval=FALSE} - ## do checks and defensive programming now that we have the data - lyt <- fix_dyncuts(lyt, df) # Now that I have the data, I create the splits that depends on data - lyt <- set_def_child_ord(lyt, df) # With the data I set the same order for all splits - lyt <- fix_analyze_vis(lyt) # Checks if the analyze last split should be visible - # If there is only one you will not get the variable name otherwise you get it if you - # have multivar. Default is NA. You can do it now only because you are sure to - # have the whole layout. - df <- fix_split_vars(lyt, df, char_ok = is.null(col_counts)) - # checks if split vars are present - - lyt[] # preserve names and it is just warning if longer, and repeats the value if only one - lyt@.Data # might not preserve the names # it works only when it is another class that inherits from lists - # We suggest to do extensive testing about these behaviors in order to do choose - # the appropriate one +## do checks and defensive programming now that we have the data +lyt <- fix_dyncuts(lyt, df) # Create the splits that depend on data +lyt <- set_def_child_ord(lyt, df) # With the data I set the same order for all splits +lyt <- fix_analyze_vis(lyt) # Checks if the analyze last split should be visible +# If there is only one you will not get the variable name, otherwise you get it if you +# have multivar. Default is NA. You can do it now only because you are sure to +# have the whole layout.
+df <- fix_split_vars(lyt, df, char_ok = is.null(col_counts)) +# checks if split vars are present + +lyt[] # preserve names - warning if names longer, repeats the name value if only one +lyt@.Data # might not preserve the names # it works only when it is another class that inherits from lists +# We suggest doing extensive testing about these behaviors in order to choose the appropriate one ``` -Along the various checks and defensive programming, we found `PreDataAxisLayout` which is a virtual class that both row and cols layouts inherit from. Virtual classes are handy for group classes that need to share common things like labels or functions that need to be applicable to their relative classes. Check more information about `rtables` class hierarchy in the dedicated dev vignette (xxx add). -Now, we continue with `build_table`. We notice after the checks `TreePos()` which is a constructor for an oject that retains a representation of the tree position along with split values and labels. This is mainly used by `create_colinfo` that we decide to enter now with `debugonce(create_colinfo)`. This function creates the object that represent the column splits and everything else that may be related with the columns. In particular, in this function the column counts are calculated. The parameter inputs are as follows: +Along with the various checks and defensive programming, we find `PreDataAxisLayout` which is a virtual class that both row and column layouts inherit from. Virtual classes are handy for group classes that need to share things like labels or functions that need to be applicable to their relative classes. See more information about the `rtables` class hierarchy in the dedicated article here (xxx add). + +Now, we continue with `build_table`. After the checks, we notice `TreePos()` which is a constructor for an object that retains a representation of the tree position along with split values and labels.
This is mainly used by `create_colinfo`, which we enter now with `debugonce(create_colinfo)`. This function creates the object that represents the column splits and everything else that may be related to the columns. In particular, the column counts are calculated in this function. The parameter inputs are as follows: ```{r, eval=FALSE} -cinfo <- create_colinfo(lyt, # Main layout with col splits info - df, # df used for splits and col counts if no alt_counts_df is present - rtpos, # TreePos (does not change in out of this function) - counts = col_counts, # If we want to overwrite the calculations with df/alt_counts_df - alt_counts_df = alt_counts_df, # alternative data for col counts - total = col_total, # calculated from build_table inputs (nrow of df or alt_counts_df) - topleft) # topleft information added into build_table +cinfo <- create_colinfo( + lyt, # Main layout with col split info + df, # df used for splits and col counts if no alt_counts_df is present + rtpos, # TreePos (does not change out of this function) + counts = col_counts, # If we want to overwrite the calculations with df/alt_counts_df + alt_counts_df = alt_counts_df, # alternative data for col counts + total = col_total, # calculated from build_table inputs (nrow of df or alt_counts_df) + topleft # topleft information added into build_table +) ``` `create_colinfo` is in `make_subset_expr.R`. Here, we see that if `topleft` is present in `build_table`, it will override the one in `lyt`. Entering `create_colinfo`, we will see the following calls: + ```{r, eval=FALSE} +clayout <- clayout(lyt) # Extracts column split and info + +if (is.null(topleft)) { + topleft <- top_left(lyt) # If top_left is not present in build_table, it is taken from lyt +} + +ctree <- coltree(clayout, df = df, rtpos = rtpos) # Main constructor of LayoutColTree +# The above is referenced as generic and principally represented as +# setMethod("coltree", "PreDataColLayout", (located in `tree_accessor.R`). 
+# This is a call that restructures information from clayout, df, and rtpos +# to get a more compact column tree layout. Part of this design is related +# to past implementations. - clayout <- clayout(lyt) # Extracts column split and info - if(is.null(topleft)) - topleft <- top_left(lyt) # If top_left is not present in build_table, it is took from lyt - ctree <- coltree(clayout, df = df, rtpos = rtpos) # Main constructor of LayoutColTree - # The above is referenced as generic and principally represented as - # setMethod("coltree", "PreDataColLayout", (located in `tree_accessor.R`). - # This is a call that restructure information from clayout and df and rtpos - # to get a more compact column tree layout. Part of this design is related - # to past implementations. - - cexprs <- make_col_subsets(ctree, df) # extracts expressions in a compact fashion. WARNING, - # removing NAs at this step is automatic. This should - # be coupled with a warning for NAs in the split (xxx) - colextras <- col_extra_args(ctree) # retrieves extra_args from the tree. It may be not used +cexprs <- make_col_subsets(ctree, df) # extracts expressions in a compact fashion. +# WARNING: removing NAs at this step is automatic. This should +# be coupled with a warning for NAs in the split (xxx) +colextras <- col_extra_args(ctree) # retrieves extra_args from the tree. It may not be used ``` -Next in the function there is the creation of the column counts. For now this happens only at the leaf level but it can be certainly calculated for all levels independently (this is current issue in `rtables`, i.e. how to print other levels' totals). Precedence for col counts may be not documented (xxx todo). Original use case is that you split events while the column counts is the number of patients and not events. First only counts as vector was added, but it is often the case that you have the possibility to add `alt_counts_df`. 
Finally the `cinfo` object is created (`InstantiatedColumnInfo`) with all the above information. +Next in the function is the determination of the column counts. Currently, this happens only at the leaf level, but it can certainly be calculated independently for all levels (this is an open issue in `rtables`, i.e. how to print other levels' totals). Precedence for column counts may not be documented (xxx todo). The main use case is when you are analyzing a participation-level dataset, with multiple records per subject, and you would like to retain the total numbers of subjects per column, often taken from a subject-level dataset, to use as column counts. Originally, counts were only able to be added as a vector, but it is often the case that users would like the possibility to use `alt_counts_df`. Finally, the `cinfo` object (`InstantiatedColumnInfo`) is created with all the above information. -Now, if we continue in `build_table` we hit `.make_ctab` for a root split. This is a general initial procedure that generates the needed root split as a content row. Indeed `ctab` stays for content row which is a row that has only a label in it. From the documentation regarding `summarize_row_groups`, you know that this is the way `rtables` defines label rows, i.e. as content rows. `.make_ctab` is very close to the actual creation of the table row which is done with `.make_tablerows`. Note that this function also uses `parent_cfun` and `.make_caller` to retrieve the content function inserted in above levels. We split here what is the structural handling of the table object and the rows creation engine that are divided by `.make_tablerows` call. If you search the whole package, you will find that this function is called only twice, once in `.make_ctab` and once in `.make_analyzed_tab`. These two are the final elements of the table construction: the creation of rows. +If we continue inside `build_table`, we see `.make_ctab` used to make a root split.
This is a general procedure that generates the initial root split as a content row. Here `ctab` stands for content row, which is a row that contains only a label. From `?summarize_row_groups`, you know that this is how `rtables` defines label rows, i.e. as content rows. `.make_ctab` is very similar to the function that actually creates the table rows, `.make_tablerows`. Note that this function uses `parent_cfun` and `.make_caller` to retrieve the content function inserted at the levels above. Here we split the structural handling of the table object and the row-creation engine, which are divided by a `.make_tablerows` call. If you search the package, you will find that this function is only called twice, once in `.make_ctab` and once in `.make_analyzed_tab`. These two are the final elements of the table construction: the creation of rows. -Going back to `build_table`, you will see that the row layout is actually a list of split vectors. The fundamental line `kids <- lapply(seq_along(rlyt), function(i) {` allows us to appreciate this. Going forward we see how `recursive_applysplit` is applied to each split vector. It may be worth it to check how, in our test case, this vector looks like. +Going back to `build_table`, you will see that the row layout is actually a list of split vectors. The fundamental line, `kids <- lapply(seq_along(rlyt), function(i) {`, allows us to appreciate this. Going forward we see how `recursive_applysplit` is applied to each split vector. It may be worthwhile to check what this vector looks like in our test case.
```{r, eval=FALSE} -# rtables 6.2.0 - +# rtables 0.6.2 # A very simple layout -lyt <- basic_table() %>% - split_rows_by("STRATA1") %>% - split_rows_by("SEX", split_fun = drop_split_levels) %>% - split_cols_by("ARM") %>% - analyze("BMRKR1") +lyt <- basic_table() %>% + split_rows_by("STRATA1") %>% + split_rows_by("SEX", split_fun = drop_split_levels) %>% + split_cols_by("ARM") %>% + analyze("BMRKR1") + rlyt <- rtables:::rlayout(lyt) str(rlyt, max.level = 2) + Formal class 'PreDataRowLayout' [package "rtables"] with 2 slots ..@ .Data :List of 2 # rlyt is a rtables object (PreDataRowLayout) that is also a list! ..@ root_split:Formal class 'RootSplit' [package "rtables"] with 17 slots # another object! - # If you do summarize_row_groups before anything you act on the root split. We need this to + # If you do summarize_row_groups before anything you act on the root split. We need this to # have a place for the content that is valid for the whole table. str(rtables:::root_spl(rlyt), max.level = 2) # it is still a split @@ -130,19 +145,19 @@ Formal class 'SplitVector' [package "rtables"] with 1 slot .. ..$ :Formal class 'AnalyzeMultiVars' [package "rtables"] with 17 slots ``` -The last print is very informative. We can see from the layout construction that this object is built with 2 `VarLevelSplit` on the rows and one final `AnalyzeMultiVars` which is the leaf analysis split that has the final level rows. The second split vector is the following `AnalyzeVarSplit`. +The last print is very informative. We can see from the layout construction that this object is built with 2 `VarLevelSplit`s on the rows and one final `AnalyzeMultiVars`, which is the leaf analysis split that has the final level rows. The second split vector is the following `AnalyzeVarSplit`. -xxx to get multiple split vectors you need to break the nesting with `nest = FALSE` or by adding a `split_rows_by` after an `analyze` call. 
+xxx To get multiple split vectors, you need to escape the nesting with `nest = FALSE` or by adding a `split_rows_by` call after an `analyze` call. ```{r, eval=FALSE} -# rtables 6.2.0 +# rtables 0.6.2 str(rlyt[[2]], max.level = 5) Formal class 'SplitVector' [package "rtables"] with 1 slot ..@ .Data:List of 1 .. ..$ :Formal class 'AnalyzeVarSplit' [package "rtables"] with 21 slots - .. .. .. ..@ analysis_fun :function (x, ...) + .. .. .. ..@ analysis_fun :function (x, ...) .. .. .. .. ..- attr(*, "srcref")= 'srcref' int [1:8] 1723 5 1732 5 5 5 4198 4207 - .. .. .. .. .. ..- attr(*, "srcfile")=Classes 'srcfilealias', 'srcfile' + .. .. .. .. .. ..- attr(*, "srcfile")=Classes 'srcfilealias', 'srcfile' .. .. .. ..@ default_rowlabel : chr "Var3 Counts" .. .. .. ..@ include_NAs : logi FALSE .. .. .. ..@ var_label_position : chr "default" @@ -151,10 +166,10 @@ Formal class 'SplitVector' [package "rtables"] with 1 slot .. .. .. ..@ split_label : chr "Var3 Counts" .. .. .. ..@ split_format : NULL .. .. .. ..@ split_na_str : chr NA - .. .. .. ..@ split_label_position : chr(0) + .. .. .. ..@ split_label_position : chr(0) .. .. .. ..@ content_fun : NULL .. .. .. ..@ content_format : NULL - .. .. .. ..@ content_na_str : chr(0) + .. .. .. ..@ content_na_str : chr(0) .. .. .. ..@ content_var : chr "" .. .. .. ..@ label_children : logi FALSE .. .. .. ..@ extra_args : list() @@ -165,12 +180,13 @@ Formal class 'SplitVector' [package "rtables"] with 1 slot .. .. .. ..@ child_section_div : chr NA ``` -Continuing in `recursive_applysplit`, this is made up of two main calls: one to `.make_ctab` which makes the content row and calculates the counts if specified so, and `.make_split_kids`. This eventually contains `recursive_applysplit` if the split vector is built of `Split` that are not `analyze` splits. Indeed, here, it being a generic is very handy to switch between different downstream processes. 
In our case (`rlyt[[1]]`), we will call the method `getMethod(".make_split_kids", "Split")` twice before getting to the analysis split. There, we can have a (xxx) multi variable split which would apply `.make_split_kids` to each of its elements, in turns calling the main `getMethod(".make_split_kids", "VAnalyzeSplit")` which would in turn go to `.make_analyzed_tab`. +Continuing in `recursive_applysplit`, this is made up of two main calls: one to `.make_ctab` which makes the content row and calculates the counts if specified, and `.make_split_kids`. This eventually contains `recursive_applysplit` which is applied if the split vector is built of `Split`s that are not `analyze` splits. It being a generic is very handy here to switch between different downstream processes. In our case (`rlyt[[1]]`) we will call the method `getMethod(".make_split_kids", "Split")` twice before getting to the analysis split. There, we have a (xxx) multi-variable split which applies `.make_split_kids` to each of its elements, in turn calling the main `getMethod(".make_split_kids", "VAnalyzeSplit")` which would in turn go to `.make_analyzed_tab`. -There are interesting edge cases here for different split cases like `split_by_multivars` and when one of the splits has a reference group. In the code here, it is called `baseline` internally. If we follow this variable across the function layers we will see that where the split (`do_split`) happens (in `getMethod(".make_split_kids", "Split")`), we have a second split for the reference group. This is done so to have available this in each row, to calculate, for example, differences with reference group. +There are interesting edge cases here for different split cases, like `split_by_multivars` or when one of the splits has a reference group. In the internal code here, it is called `baseline`. 
If we follow this variable across the function layers, we will see that where the split (`do_split`) happens (in `getMethod(".make_split_kids", "Split")`) we have a second split for the reference group. This is done to make this available in each row to calculate, for example, differences from the reference group. -Now we move towards `.make_tablerows`, and here analysis functions become key, i.e. is the place where these are applied and analyzed. First of all, the external `tryCatch` is used to cache errors at a higher level, so to differentiate the two major blocks. The function parameters are quite intuitive, with the exception of `spl_context`. This is a fundamental parameter, that helps keeping information about the splits that can be visible from analysis functions. Follow up and down this value and you will see that is brought and updated everywhere a split happens, except for columns. Column-related information is added last, when in `gen_onerv` which is the lowest level, where one result value is produced. From `.make_tablerows` we go to `gen_rowvalues`, beside some row and referential footers handling. `gen_rowvalues` unpacks the `cinfo` object and crosses it with the arriving row split information to generate rows. In particular `rawvals <- mapply(gen_onerv,` maps the columns to generate a list of values corresponding to a table row. Looking at the final if in `gen_onerv` we see that `if(!is(val, "RowsVerticalSection"))` the function `in_rows` is called. We invite to read what are the building blocks of that and why `.make_tablerows` has the following function `rowconstr` that other is not if the constructor of a data row `DataRow` or a `ContentRow` depending if it is called from `.make_ctab` or `.make_analyzed_tab`. +Now we move towards `.make_tablerows`, and here analysis functions become key as this is the place where these are applied and analyzed. 
First, the external `tryCatch` is used to catch errors at a higher level, so as to differentiate the two major blocks. The function parameters here are quite intuitive, with the exception of `spl_context`. This is a fundamental parameter that keeps information about splits so that it can be visible from analysis functions. If you look into this value, you will see that it is carried and updated everywhere a split happens, except for columns. Column-related information is added last, when in `gen_onerv`, which is the lowest level where one result value is produced. From `.make_tablerows` we go to `gen_rowvalues`, aside from some row and referential footnote handling. `gen_rowvalues` unpacks the `cinfo` object and crosses it with the arriving row split information to generate rows. In particular, `rawvals <- mapply(gen_onerv,` maps the columns to generate a list of values corresponding to a table row. Looking at the final `if` in `gen_onerv`, we see that when `!is(val, "RowsVerticalSection")` holds, the function `in_rows` is called. We invite the reader to explore what the building blocks of `in_rows` are, and how `.make_tablerows` constructs a data row (`DataRow`) or a content row (`ContentRow`) depending on whether it is called from `.make_ctab` or `.make_analyzed_tab`. -`.make_tablerows` either makes a content table or an "analysis table" `gen_rowvalues` generates a list of stacks (`RowsVerticalSection`, more than one rows potentially!) for each column +`.make_tablerows` either makes a content table or an "analysis table". +`gen_rowvalues` generates a list of stacks (`RowsVerticalSection`, potentially more than one row!) for each column.
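The role of the split context can be illustrated from the user side. The following is a minimal sketch, assuming that an analysis function may request the context through an argument named `.spl_context` and that this context arrives as a data frame with one row per split level reached so far; the function name `ctx_afun`, the row labels, and the choice of `STRATA1` and `AGE` are hypothetical and only serve the illustration. The exact columns of the context may differ between `rtables` versions, so only its number of rows is used here.

```r
library(rtables)

# Hypothetical analysis function: reports the number of observations in the
# current cell and how many split levels the context holds at that point
ctx_afun <- function(x, .spl_context) {
  in_rows(
    "n" = rcell(length(x), format = "xx"),
    "context depth" = rcell(nrow(.spl_context), format = "xx")
  )
}

tbl_ctx <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("STRATA1") %>%
  analyze("AGE", afun = ctx_afun) %>%
  build_table(DM)

tbl_ctx
```

Running `debugonce(ctx_afun)` before building the table is one way to inspect the full context object at the point where a single cell value is produced.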
-to add: conceptual part -> calculating things by column and putting them side by side and slicing them by rows and putting it together -> rtables is row dominant +To add: conceptual part -> calculating things by column and putting them side by side and slicing them by rows and putting it together -> rtables is row dominant From e601ce8f60736094ff28d3248c7a0753d946c8b5 Mon Sep 17 00:00:00 2001 From: Emily de la Rua Date: Tue, 31 Oct 2023 18:58:52 -0400 Subject: [PATCH 39/40] Change word --- inst/dev-guide/dg_split_machinery.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/inst/dev-guide/dg_split_machinery.Rmd b/inst/dev-guide/dg_split_machinery.Rmd index f3c1a0ce2..0f8632a22 100644 --- a/inst/dev-guide/dg_split_machinery.Rmd +++ b/inst/dev-guide/dg_split_machinery.Rmd @@ -20,7 +20,7 @@ knitr::opts_chunk$set(echo = TRUE) ## Disclaimer -This article is intended for use by developers only and will contain low-level explanations of the topics covered. For user-friendly vignettes, please see the [Articles](https://insightsengineering.github.io/rtables/main/articles/index.html) section on the `rtables` website. +This article is intended for use by developers only and will contain low-level explanations of the topics covered. For user-friendly vignettes, please see the [Articles](https://insightsengineering.github.io/rtables/main/articles/index.html) page on the `rtables` website. Any code or prose which appears in the version of this article on the `main` branch of the repository may reflect a specific state of things that can be more or less recent. This guide describes very important pieces of the split machinery that are unlikely to change. Regardless, we invite the reader to keep in mind that the current repository code may have drifted from the following material in this document, and it is always the best practice to read the code directly on `main`. 
From b364d8735f0916b0f554ee87dc5b3245b14d8476 Mon Sep 17 00:00:00 2001 From: Emily de la Rua Date: Tue, 31 Oct 2023 19:10:28 -0400 Subject: [PATCH 40/40] Debugging - grammar fixes --- inst/dev-guide/dg_debug_rtables.Rmd | 31 ++++++++++++++++++----------- 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/inst/dev-guide/dg_debug_rtables.Rmd b/inst/dev-guide/dg_debug_rtables.Rmd index 1cdd05030..a89cc0d71 100644 --- a/inst/dev-guide/dg_debug_rtables.Rmd +++ b/inst/dev-guide/dg_debug_rtables.Rmd @@ -17,21 +17,25 @@ knitr::opts_chunk$set(echo = TRUE) ## Debugging -This is a short and not-comprehensive guide to debugging `rtables`. It is to be considered, though, of general validity and of personal use. +This is a short and non-comprehensive guide to debugging `rtables`. Regardless, it is to be considered valid for personal use at your discretion. #### Coding in Practice --> it is easy to read and find problems --> it is not clever, because it is impossible to debug + +* It is easy to read and find problems +* It is not clever if it is impossible to debug #### Some Definitions + * __Coding Error__ - Code does not do what you intended -> Bug in the punch card * __Unexpected Input__ - Defensive programming FAIL FAST FAIL LOUD (FFFL) -> useful and not too time consuming * __Bug in Dependency__ -> never use dependencies if you can! #### Considerations About FFFL + Errors should be as close as possible to the source. For example, bad inputs should be found very early. The worst possible example is a software that is silently giving incorrect results. Common things that we can catch early are missing values, column `length == 0`, or `length > 1`. #### General Suggestions + * Robust code base does not attempt doing possibly problematic operations. * Read Error Messages * `debugcall` you can add the signature (formals) @@ -44,6 +48,7 @@ Errors should be as close as possible to the source. For example, bad inputs sho as you did recover. 
#### `warn` Global Option + - `<0` ignored - `0` top level function call - `1` immediately as they occur @@ -52,33 +57,35 @@ as you did recover. `<<-` for `recover` or `debugger` gives it to the global environment #### lo-fi debugging + * PRINT / CAT is always a low level debugging that can be used. It is helpful for server jobs where maybe only terminal or console output is available and no `browser()` can be used. For example, you can print the position or state of a function at a certain point untill you find the break point. * comment blocks -> does not work with pipes (you can use `identity()` it is a step that does nothing but does not break the pipes) * `browser()` bombing #### Regression Tests + Almost every bug should become a regression test. #### Debugging with Pipes + * Pipes are better to write code but horrible to debug * T in pipe `%T>%` does print it midway * `debug_pipe()` -> it is like the T pipe going into browser() #### Shiny Debugging -More difficult due to reactivity. +More difficult due to reactivity. #### General Suggestion -DO NOT BE CLEVER WITH CODE - ONLY IF YOU HAVE TO, CLEVER IS ALSO SUBJECTIVE -AND IT WILL CHANGE WITH TIME +DO NOT BE CLEVER WITH CODE - ONLY IF YOU HAVE TO, CLEVER IS ALSO SUBJECTIVE AND IT WILL CHANGE WITH TIME. ## Debugging in `rtables` -We invite the smart developer to use the provided examples as a way to get an "interactive" and dynamic vision of the internal algorithms, as they are routinely executed when constructing tables with `rtables`. This is achieved by using `browser()` and `debugonce()` on internal and exported functions (`rtables:::` or `rtables::`), as we will see in a moment. We invite also the continuous and autonomous exploration of the multiple `S3` and `S4` objects that constitute the complexity and power of `rtables`. 
To do so we will use the following useful functions: +We invite the smart developer to use the provided examples as a way to get an "interactive" and dynamic view of the internal algorithms as they are routinely executed when constructing tables with `rtables`. This is achieved by using `browser()` and `debugonce()` on internal and exported functions (`rtables:::` or `rtables::`), as we will see in a moment. We invite you to continuously and autonomously explore the multiple `S3` and `S4` objects that constitute the complexity and power of `rtables`. To do so, we will use the following functions: -* `methods(generic_function)`: This function lists the methods that are available for a generic function. Specifically for `S4` generic functions, `showMethods(generic_function)` is giving a more detailed information about each method (e.g. inheritance). -* `class(object)`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for its documentation and examples. Indeed, `help(class)` may be informative, as it will call the documentation of the specific class. Similarly, the `?` operator will call the disambiguation page that delivers you to different `S4` methods. For `S3` methods it is necessary to postfix the class name with a dot (e.g. `?summary.lm`). -* `getClass(class)`: This describes in a compact way the type of class, the slots that it has, and the relationships that it may have with the other classes that may inherit or be inherited by it. With `getClass(object)`, instead, we can see to which values the slots of the object are assigned. It is possible to use `str(object, max.level = 2)` to see a less formal and more compact descriptions of the slots, but it may be problematic when there are one or more objects in the class slots. Hence, the maximum number of level should always be limited to 2 or 3 (`max.level = 2`). 
Similarly, `attributes()` can be used to retrieve some information, but we need to remember that storing important variables in this way is not encouraged. Information regarding the type of class can be retrieved with `mode()` and indirectly by `summary()` and `is.S4()`. -*`getAnywhere(function)` is very useful to get the source code of internal functions and specific generics. It works very well with `S3` methods, and will in any case display for each of the found methods, its relevant namespace. Similarly, `getMethod(S4_generic, S4_class)` can retrieve the source code of `S4` methods that are specific to a class. -* `eval(debugcall(generic_function(obj)))`: this is a very useful way to browser a `S4` method defined specifically for a defined object without having to manually insert `browser()` into the code. It is also possible to do similarly with R > 3.4.0 where `debug*()` calls can have the triggering signature (class) specified. Both of these are modern and simplified wrappers of tracing function `trace()`. +* `methods(generic_function)`: This function lists the methods that are available for a generic function. Specifically for `S4` generic functions, `showMethods(generic_function)` gives more detailed information about each method (e.g. inheritance). +* `class(object)`: This function returns the class of an object. If the class is not one of the built-in classes in R, you can use this information to search for its documentation and examples. `help(class)` may be informative as it will bring up the documentation of the specific class. Similarly, the `?` operator will bring up the documentation page for different `S4` methods. For `S3` methods it is necessary to append the class name to the generic name after a dot (e.g. `?summary.lm`). +* `getClass(class)`: This describes, in a compact way, the type of class, the slots that it has, and the relationships that it may have with the other classes that may inherit from or be inherited by it.
With `getClass(object)` we can see to which values the slots of the object are assigned. It is possible to use `str(object, max.level = 2)` to see less formal and more compact descriptions of the slots, but it may be problematic when there are one or more objects in the class slots. Hence, the maximum number of levels should always be limited to 2 or 3 (`max.level = 2`). Similarly, `attributes()` can be used to retrieve some information, but we need to remember that storing important variables in this way is not encouraged. Information regarding the type of class can be retrieved with `mode()` and indirectly by `summary()` and `isS4()`. +* `getAnywhere(function)` is very useful to get the source code of internal functions and specific generics. It works very well with `S3` methods, and will display the relevant namespace for each of the methods found. Similarly, `getMethod(S4_generic, S4_class)` can retrieve the source code of class-specific `S4` methods. +* `eval(debugcall(generic_function(obj)))`: this is a very useful way to browse an `S4` method, specifically for a defined object, without having to manually insert `browser()` into the code. It is also possible to do similarly with R > 3.4.0 where `debug*()` calls can have the triggering signature (class) specified. Both of these are modern and simplified wrappers of the tracing function `trace()`.
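
The exploration functions listed in the guide can be tried on a toy object before pointing them at `rtables` internals. The following is a minimal sketch that assumes nothing about `rtables` itself: the class `DemoNode` and the generic `node_label()` are hypothetical names invented for this illustration, and only functions from base R, `methods`, and `utils` are used.

```r
library(methods)

# Hypothetical S4 class and generic, defined only for this illustration
setClass("DemoNode", slots = c(label = "character", children = "list"))
setGeneric("node_label", function(obj) standardGeneric("node_label"))
setMethod("node_label", "DemoNode", function(obj) obj@label)

node <- new("DemoNode", label = "root", children = list())

class(node)                          # class of the object
isS4(node)                           # is this an S4 object?
getClass("DemoNode")                 # slots and inheritance of the class
str(node, max.level = 2)             # compact view of the slot contents

showMethods("node_label")            # S4 methods defined for the generic
getMethod("node_label", "DemoNode")  # source of the class-specific method

methods("summary")                   # S3 methods available for a generic
getAnywhere("summary.lm")            # source and namespace of an S3 method

# Stepping into the S4 method only makes sense in an interactive session
if (interactive()) {
  eval(debugcall(node_label(node), once = TRUE))
}
```

The same calls should carry over to `rtables` objects by swapping in a real object and generic, although the output will be considerably longer for deeply nested classes.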