diff --git a/NEWS.md b/NEWS.md index 4b7faf982..9256d3c90 100644 --- a/NEWS.md +++ b/NEWS.md @@ -30,6 +30,7 @@ * When tables are exported as `txt`, they preserve the horizontal separator of the table. * Added imports on `stringi` and `checkmate` as they are fundamental packages for string handling and argument checking. + * Updated introduction vignette and split it into two. Section on introspecting tables is now located in a separate vignette. ## rtables 0.6.5 ### New Features diff --git a/_pkgdown.yml b/_pkgdown.yml index 41db5436e..5ed84d73c 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -76,6 +76,7 @@ articles: - split_functions - format_precedence - tabulation_concepts + - introspecting_tables - title: Advanced Usage contents: diff --git a/man/figures/rtables-basics.png b/man/figures/rtables-basics.png new file mode 100644 index 000000000..4ef55dd4c Binary files /dev/null and b/man/figures/rtables-basics.png differ diff --git a/vignettes/introduction.Rmd b/vignettes/introduction.Rmd index 452b8f829..05edf959f 100644 --- a/vignettes/introduction.Rmd +++ b/vignettes/introduction.Rmd @@ -1,66 +1,60 @@ --- -title: "Introduction to rtables" +title: "Introduction to {rtables}" author: "Gabriel Becker and Adrian Waddell" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > - %\VignetteIndexEntry{Introduction to rtables} + %\VignetteIndexEntry{Introduction to {rtables}} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} -editor_options: +editor_options: chunk_output_type: console --- - ```{r, echo=FALSE} knitr::opts_chunk$set(comment = "#") ``` -```{css, echo=FALSE} -.reveal .r code { - white-space: pre; -} -``` - ## Introduction -The `rtables` R package provides a framework to create, tabulate and -output tables in `R`. Most of the design requirements for `rtables` +The `rtables` package provides a framework to create, tabulate, and +output tables in R. 
Most of the design requirements for `rtables` have their origin in studying tables that are commonly used to report analyses from clinical trials; however, we were careful to keep `rtables` a general purpose toolkit. -There are a number of other table frameworks available in `R` such as -[gt](https://gt.rstudio.com/) from `RStudio`, -[xtable](https://CRAN.R-project.org/package=xtable), -[tableone](https://CRAN.R-project.org/package=tableone), and -[tables](https://CRAN.R-project.org/package=tables) to name a -few. There is a number of reasons to implement `rtables` (yet another -tables R package): - -* output tables in ASCII to text files -* table rendering (ASCII, HTML, etc.) is separate from the data - model. Hence, one always has access to the non-rounded/non-formatted - numbers. -* pagination in both horizontal and vertical directions to meet the - health authority submission requirements -* cell, row, column, table reference system -* titles, footers, and referential footnotes -* path based access to cell content which will be useful for automated - content generation -In the remainder of this vignette, we give a short introduction into -`rtables` and tabulating a table. The content is based on the [useR -2020 presentation from Gabriel -Becker](https://www.youtube.com/watch?v=CBQzZ8ZhXLA). +In this vignette, we give a short introduction to `rtables` and +tabulating a table. + +The content in this vignette is based on the following two resources: -The packages used for this vignette are `rtables` and `dplyr`: +* The [`rtables` useR 2020 presentation](https://www.youtube.com/watch?v=CBQzZ8ZhXLA) +by Gabriel Becker +* [`rtables` - A Framework For Creating Complex Structured Reporting Tables Via +Multi-Level Faceted Computations](https://arxiv.org/pdf/2306.16610.pdf).
+ +The packages used in this vignette are `rtables` and `dplyr`: ```{r, message=FALSE} library(rtables) library(dplyr) ``` -## Data +## Overview + +To build a table using `rtables` two components are required: A layout constructed +using `rtables` functions, and a `data.frame` of unaggregated data. These two +elements are combined to build a table object. Table objects contain information +about both the content and the structure of the table, as well as instructions on +how this information should be processed to construct the table. After obtaining the +table object, a formatted table can be printed in ASCII format, or exported to a +variety of other formats (.txt, .pdf, .docx, etc.). + +```{r echo=FALSE, fig.align='center'} +knitr::include_graphics("../man/figures/rtables-basics.png") +``` + +## Data The data used in this vignette is a made up using random number generators. The data content is relatively simple: one row per @@ -89,7 +83,6 @@ Note that we use factor variables so that the level order is represented in the row or column order when we tabulate the information of `df` below. - ## Building a Table The aim of this vignette is to build the following table step by step: @@ -102,13 +95,48 @@ lyt <- basic_table(show_colcounts = TRUE) %>% summarize_row_groups() %>% split_rows_by("handed") %>% summarize_row_groups() %>% - analyze("age", afun = mean, format = "xx.x") + analyze("age", afun = mean, format = "xx.xx") tbl <- build_table(lyt, df) tbl ``` -## Starting Simple +## Quick Start + +The table above can be achieved via the `qtable()` function. If you are new +to tabulation with the `rtables` layout framework, you can use this +convenience wrapper to create many types of two-way frequency tables. + +The purpose of `qtable` is to enable quick exploratory data analysis. See the +[`exploratory_analysis`](https://insightsengineering.github.io/rtables/main/articles/exploratory_analysis.html) vignette for more details. 
+ +Here is the code to recreate the table above: +```{r} +qtable(df, + row_vars = c("country", "handed"), + col_vars = c("arm", "gender"), + avar = "age", + afun = mean, + summarize_groups = TRUE, + row_labels = "mean" +) +``` + +From the `qtable` function arguments above we can see many of the +key concepts of the underlying `rtables` layout framework. +The user needs to define: + + - Which variables should be used as facets in the row and/or column space? + - Which variable should be used in the summary analysis? + - Which function should be used as a summary? + - Should the table include any marginal summaries? + - Are any labels needed to clarify the table content? + +In the sections below we will look at translating each of these questions +to a set of features that are part of the `rtables` layout framework. Now let's take a +look at building the example table with a layout. + +## Layout Instructions In `rtables` a basic table is defined to have 0 rows and one column representing all data. Analyzing a variable is one way of adding a @@ -122,9 +150,6 @@ tbl <- build_table(lyt, df) tbl ``` - -### Layout Instructions - In the code above we first described the table and assigned that description to a variable `lyt`. We then built the table using the actual data with `build_table()`. The description of a table is called @@ -158,13 +183,11 @@ The general layouting instructions are summarized below: Using those functions, it is possible to create a wide variety of tables as we will show in this document. - -### Adding Column Structure +## Adding Column Structure We will now add more structure to the columns by adding a column split based on the factor variable `arm`: - ```{r} lyt <- basic_table() %>% split_cols_by("arm") %>% @@ -198,7 +221,7 @@ The first column represents the data in `df` where `df$arm == "A" & df$gender == "Female"` and the second column the data in `df` where `df$arm == "A" & df$gender == "Male"`, and so on.
-### Adding Row Structure +## Adding Row Structure So far, we have created layouts with analysis and column splitting instructions, i.e. `analyze()` and `split_cols_by()`, @@ -249,7 +272,7 @@ Note that if you print or render a table without pagination, the page_by splits are currently rendered as normal row splits. This may change in future releases. -### Adding Group Information +## Adding Group Information When adding row splits, we get by default label rows for each split level, for example `CAN` and `USA` in the table above. Besides the @@ -321,91 +344,41 @@ tbl <- build_table(lyt, df) tbl ``` +## Comparing with Other Tabulation Frameworks -## Introspecting `rtables` Table Objects - -Once we have created a table, we can inspect its structure using a -number of functions. - -The `table_structure()` function prints a summary of a table's row -structure at one of two levels of detail. By default, it summarizes -the structure at the subtable level. - -```{r} -table_structure(tbl) -``` - -When the `detail` argument is set to `"row"`, however, it provides a -more detailed row-level summary, which acts as a useful alternative to -how we might normally use the `str()` function to interrogate compound -nested lists. - -```{r} -table_structure(tbl, detail = "row") -``` - -The `make_row_df()` and `make_col_df()` functions create a data.frame -which has a variety of information about the table's structure. 
Most -useful for introspection purposes are the `label`, `name`, -`abs_rownumber`, `path` and `node_class` columns (the remainder of -information in the returned data.frame is used for pagination) - -```{r} -make_row_df(tbl)[, c("label", "name", "abs_rownumber", "path", "node_class")] -``` - -By default `make_row_df()` summarizes only visible rows, but setting -`visible_only` to `FALSE` gives us a structural summary of the table, -including the full hierarchy of subtables, including those that aren't -represented directly by any visible rows: - -```{r} -make_row_df(tbl, visible_only = FALSE)[, c("label", "name", "abs_rownumber", "path", "node_class")] -``` - -`make_col_df()` similarly accepts `visible_only`, though here the -meaning is slightly different, indicating whether only *leaf* columns -should be summarized (`TRUE`, the default) or whether higher level -groups of columns, analogous to subtables in row space, should be -summarized as well. - -```{r} -make_col_df(tbl) -``` - -```{r} -make_col_df(tbl, visible_only = FALSE) -``` - -The `row_paths_summary()` and `col_paths_summary()` functions wrap the -respective `make_*_df` functions, printing the `name`, `node_class` -and `path` information (in the row case), or the `label` and `path` -information (in the column case), indented to illustrate table -structure: - -```{r} -row_paths_summary(tbl) -``` - - -```{r} -col_paths_summary(tbl) -``` - - +There are a number of other table frameworks available in `R`, including: +* [gt](https://gt.rstudio.com/) +* [xtable](https://CRAN.R-project.org/package=xtable) +* [tableone](https://CRAN.R-project.org/package=tableone) +* [tables](https://CRAN.R-project.org/package=tables) +There are a number of reasons to choose `rtables` (yet another tables R package): +* Output tables in ASCII to text files. +* Table rendering (ASCII, HTML, etc.) is separate from the data + model. Hence, one always has access to the non-rounded/non-formatted + numbers. 
+* Pagination in both horizontal and vertical directions to meet the + health authority submission requirements. +* Cell, row, column, and table reference system. +* Titles, footers, and referential footnotes. +* Path-based access to cell content which is useful for automated + content generation. + +More in-depth comparisons of the various tabulation frameworks can be found in the +[Overview of table R packages](https://rconsortium.github.io/rtrs-wg/tablepkgs.html#tablepkgs) +chapter of the Tables in Clinical Trials with R book compiled by the R Consortium +Tables Working Group. ## Summary In this vignette you have learned: -* every cell has an associated subset of data - * this means that much of tabulation has to do with - splitting/subsetting data -* tables can be described pre-data using layouts -* tables are a form of visualization of data +* Every cell has an associated subset of data - this means that much of tabulation + has to do with splitting/subsetting data. +* Tables can be described pre-data using layouts. +* Tables are a form of visualization of data. The other vignettes in the `rtables` package will provide more detailed information about the `rtables` package. We recommend that diff --git a/vignettes/introspecting_tables.Rmd b/vignettes/introspecting_tables.Rmd new file mode 100644 index 000000000..567ffc091 --- /dev/null +++ b/vignettes/introspecting_tables.Rmd @@ -0,0 +1,136 @@ +--- +title: "Introspecting Tables" +author: "Gabriel Becker and Adrian Waddell" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Introspecting Tables} + %\VignetteEncoding{UTF-8} + %\VignetteEngine{knitr::rmarkdown} +editor_options: + chunk_output_type: console +--- + +```{r, echo=FALSE} +knitr::opts_chunk$set(comment = "#") +``` + +The packages used in this vignette are `rtables` and `dplyr`: + +```{r, message=FALSE} +library(rtables) +library(dplyr) +``` + +## Introduction + +First, let's set up a simple table.
+ +```{r} +lyt <- basic_table(show_colcounts = TRUE) %>% + split_cols_by("ARMCD") %>% + split_cols_by("STRATA2") %>% + split_rows_by("STRATA1") %>% + add_overall_col("All") %>% + summarize_row_groups() %>% + analyze("AGE", afun = max, format = "xx.x") + +tbl <- build_table(lyt, ex_adsl) +tbl +``` + +## Getting Started + +We can get basic table dimensions, the number of rows, and the number of columns with the following code: + +```{r} +dim(tbl) +nrow(tbl) +ncol(tbl) +``` + +## Detailed Table Structure + +The `table_structure()` function prints a summary of a table's row +structure at one of two levels of detail. By default, it summarizes +the structure at the subtable level. + +```{r} +table_structure(tbl) +``` + +When the `detail` argument is set to `"row"`, however, it provides a +more detailed row-level summary which acts as a useful alternative to +how we might normally use the `str()` function to interrogate compound +nested lists. + +```{r} +table_structure(tbl, detail = "row") +``` + +The `make_row_df()` and `make_col_df()` functions each create a `data.frame` with a variety of information about +the table's structure. 
Most useful for introspection purposes are the `label`, `name`, `abs_rownumber`, `path` and +`node_class` columns (the remainder of the information in the returned `data.frame` is used for pagination). + +```{r} +make_row_df(tbl)[, c("label", "name", "abs_rownumber", "path", "node_class")] +``` + +There is also a wrapper function, `row_paths()`, available for `make_row_df()` to display only the row path structure: + +```{r} +row_paths(tbl) +``` + +By default `make_row_df()` summarizes only visible rows, but setting `visible_only` to `FALSE` gives us a structural +summary of the table with the full hierarchy of subtables, including those that are not represented directly by any +visible rows: + +```{r} +make_row_df(tbl, visible_only = FALSE)[, c("label", "name", "abs_rownumber", "path", "node_class")] +``` + +`make_col_df()` similarly accepts `visible_only`, though here the meaning is slightly different, indicating whether +only *leaf* columns should be summarized (defaults to `TRUE`) or whether higher level groups of columns - analogous to +subtables in row space - should be summarized as well. + +```{r} +make_col_df(tbl)[, c("label", "name", "abs_pos", "path", "leaf_indices")] +``` + +```{r} +make_col_df(tbl, visible_only = FALSE)[, c("label", "name", "abs_pos", "path", "leaf_indices")] +``` + +Similarly, there is a wrapper function `col_paths()` available, which displays only the column structure: + +```{r} +col_paths(tbl) +``` + +The `row_paths_summary()` and `col_paths_summary()` functions wrap the respective `make_*_df` functions, printing the +`name`, `node_class`, and `path` information (in the row case), or the `label` and `path` information (in the column +case), indented to illustrate table structure: + +```{r} +row_paths_summary(tbl) +``` + +```{r} +col_paths_summary(tbl) +``` + +## Applications + +Knowing the structure of an `rtable` object is helpful for retrieving specific values from the table.
+For examples, see the [Path Based Cell Value Accessing](https://insightsengineering.github.io/rtables/latest-tag/articles/subsetting_tables.html#path-based-cell-value-accessing) +section of the Subsetting and Manipulating Table Contents vignette. + +Understanding table structure is also important for post-processing operations such as sorting and pruning. More details +on this are covered in the [Pruning and Sorting Tables](https://insightsengineering.github.io/rtables/latest-tag/articles/sorting_pruning.html) +vignette. + +## Summary + +In this vignette you have learned about a number of utility functions that are available for examining the underlying +structure of `rtable` objects.