.github/CODE_OF_CONDUCT.md
+ In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to make participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
+Examples of behavior that contributes to creating a positive environment include:
+Examples of unacceptable behavior by participants include:
+Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
+Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
+This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
+Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at support@github.com. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
+Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project’s leadership.
+This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
+For answers to common questions about this code of conduct, see https://www.contributor-covenant.org/faq
+.github/CONTRIBUTING.md
+ We welcome contributions big and small to the ongoing development of the {rtables} package. For most, the best way to contribute to the package is by filing issues for feature requests or bugs that you have encountered. For those who are interested in contributing code to the package, contributions can be made by working on current issues and opening pull requests with code changes. Any help that you are able to provide is greatly appreciated!
+Contributions to this project are released to the public under the project’s open source license.
+Issues are used to establish a prioritized timeline and track development progress within the package. If there is a new feature that you feel would enhance the experience of package users, please open a Feature Request issue. If you notice a bug in the existing code, please file a Bug Fix issue with a description of the bug and a reprex (reproducible example). Other types of issues (questions, typos you’ve noticed, improvements to documentation, etc.) can be filed as well. Click here to file a new issue, and here to see the list of current issues. Please utilize labels wherever possible when creating issues for organization purposes and to narrow down the scope of the work required.
+Development of the {rtables} package relies on an Issue → Branch → PR → Code Review → Merge pipeline facilitated through GitHub. If you are a more experienced programmer interested in contributing to the package code, please begin by filing an issue describing the changes you would like to make. It may be the case that your idea has already been implemented in some way, and the package maintainers can help to determine whether the feature is necessary before you begin development. Whether you are opening an issue or a pull request, the more detailed your description, the easier it will be for package maintainers to help you! To make code changes in the package, please follow the process below.
+The {rtables} package is part of the NEST project and utilizes staged.dependencies to simplify the development process and track upstream and downstream package dependencies. We highly recommend installing and using this package when developing within {rtables}.
+In order to work on a new pull request, please first create a branch off of main upon which you can work and commit changes. To comply with staged.dependencies standards, {rtables} uses the following branch naming convention: issue#_description_of_issue@target_merge_branch. For example, 443_refactor_splits@main. In most cases, the target merge branch is the base (main) branch.
In some cases, a change in {rtables} may first require upstream changes in the {formatters} package. Suppose we have branch 100_update_fmts@main in {formatters} containing the required upstream changes. Then the branch created in {rtables} would be named 443_refactor_splits@100_update_fmts@main for this example. This ensures that the correct branches are checked out when running tests, etc. For more details on staged.dependencies branch naming conventions, click here.
Work within the {rtables} package to apply your code changes. Avoid combining issues on a single branch - ideally, each branch should be associated with a single issue and be prefixed by the issue number.
+For information on the basics of the {rtables} package, please read the package vignettes, which are available here.
+For advanced development work within {rtables}, consider reading through the {rtables} Developer Guide. The Developer Guide can be accessed from the {rtables} site navigation bar, and is listed here for your convenience:
+The {rtables} package follows the tidyverse style guide, so please adhere to these guidelines in your submitted code. After making changes to a file within the package, you can automatically apply the package styler and check for lint by running the following two lines of code while within the file:
+styler:::style_active_file()
+lintr:::addin_lint()
+Package documentation uses roxygen2. If your contribution requires updates to documentation, ensure that the roxygen comments are updated within the source code file. After updating roxygen documentation, run devtools::document() to update the accompanying .Rd files (do not update these files by hand!).
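As an illustration (the function name and contents here are purely hypothetical), a roxygen2 block sits directly above the function it documents:

```r
#' Count non-missing values
#'
#' @param x A vector.
#'
#' @return The number of non-missing elements of `x`.
#'
#' @examples
#' n_nonmissing(c(1, NA, 3))
#'
#' @export
n_nonmissing <- function(x) {
  sum(!is.na(x))
}
```

After editing a block like this, running devtools::document() regenerates the corresponding man/*.Rd file.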
To ensure high code coverage, we create tests using the testthat package. In most cases, changes to package code necessitate the addition of one or more tests to ensure that any added features are working as expected and no existing features were broken.
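A minimal testthat test file (file name and test contents are illustrative only) could look like:

```r
# tests/testthat/test-example.R
test_that("mean ignores NAs when na.rm = TRUE", {
  # na.rm = TRUE drops the NA before averaging
  expect_equal(mean(c(1, NA, 3), na.rm = TRUE), 2)
  # without na.rm, the result propagates the NA
  expect_true(is.na(mean(c(1, NA, 3))))
})
```

Tests like these run automatically via devtools::test() and during R CMD check.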
After making updates to the package, please add a descriptive entry to the NEWS file that reflects your changes. See the tidyverse style guide for guidelines on creating a NEWS entry.
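For example, a NEWS entry in the tidyverse style might look like the following (version number, bug description, and issue number are purely illustrative):

```
# rtables 0.0.0.9000

* Fixed a bug where custom split labels were dropped from column headers (#999).
```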
+Once the previous two steps are complete, you can create a pull request. Indicate in the description which issue is addressed in the pull request, and again utilize labels to help reviewers identify the category of the changes contained within the pull request.
+Once your pull request has been created, a series of checks will be automatically triggered, including R CMD check, tests/code coverage, auto-documentation, and more. All checks must pass before your pull request can be merged, and further changes may be required to resolve the status of these checks. All pull requests must also be reviewed and approved by at least one of the package maintainers before they can be merged. A review will be automatically requested from several {rtables} maintainers upon creating your pull request. When a maintainer reviews your pull request, please try to address the comments promptly - the {rtables} package is updated on a regular basis, and leaving a pull request open too long is likely to result in merge conflicts, which create more work for the developer.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
+.github/ISSUE_TEMPLATE.md
+ Please briefly describe your problem and, when relevant, the output you expect. Please also provide the output of utils::sessionInfo()
or devtools::session_info()
at the end of your post.
If at all possible, please include a minimal, reproducible example. The rtables
team will be much more likely to resolve your issue if they are able to reproduce it themselves locally.
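As a sketch, a minimal reproducible example for an rtables issue might look like this (using the DM dataset that ships with the package):

```r
library(rtables)

lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AGE")

build_table(lyt, DM) # show the output you find surprising here

utils::sessionInfo()
```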
Please delete this preamble after you have read it.
+your brief description of the problem
The rtables package as a whole is distributed under the Apache License,
Version 2.0 (see below).

The rtables package also includes the following open source software
components:

- Bootstrap, https://github.com/twbs/bootstrap


Bootstrap license:
---------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2011-2017 Twitter, Inc.
Copyright (c) 2011-2017 The Bootstrap Authors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.


rtables license:
---------------------------------------------------------------------------
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.
+ + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. 
+ + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of 
the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [2022] [F. Hoffman-La Roche Ltd] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. ++ +
vignettes/advanced_usage.Rmd
This vignette is currently under development. Any code or prose which appears in a version of this vignette on the main branch of the repository will work and be correct, but it is likely not in its final form.
Initialization
+rtables provides an array of functions to control the splitting logic without creating entirely new split functions. By default, split_*_by facets data based on a categorical variable.
+d1 <- subset(ex_adsl, AGE < 25)
+d1$AGE <- as.factor(d1$AGE)
+lyt1 <- basic_table() %>%
+ split_cols_by("AGE") %>%
+ analyze("SEX")
+
+build_table(lyt1, d1)
## 20 21 23 24
+## ————————————————————————————————————
+## F 0 2 4 5
+## M 1 1 2 3
+## U 0 0 0 0
+## UNDIFFERENTIATED 0 0 0 0
+For continuous variables, split_*_by_cutfun can be leveraged to create categories and the corresponding faceting when the break points depend on the data.
+sd_cutfun <- function(x) {
+ cutpoints <- c(
+ min(x),
+ mean(x) - sd(x),
+ mean(x) + sd(x),
+ max(x)
+ )
+
+ names(cutpoints) <- c("", "Low", "Medium", "High")
+ cutpoints
+}
+
+lyt1 <- basic_table() %>%
+ split_cols_by_cutfun("AGE", cutfun = sd_cutfun) %>%
+ analyze("SEX")
+
+build_table(lyt1, ex_adsl)
## Low Medium High
+## ——————————————————————————————————————
+## F 36 165 21
+## M 21 115 30
+## U 1 8 0
+## UNDIFFERENTIATED 0 1 2
+Alternatively, split_*_by_cuts can be used when breakpoints are predefined, and split_*_by_quartiles when the data should be faceted by quartile.
+lyt1 <- basic_table() %>%
+ split_cols_by_cuts(
+ "AGE",
+ cuts = c(0, 30, 60, 100),
+ cutlabels = c("0-30 y.o.", "30-60 y.o.", "60-100 y.o.")
+ ) %>%
+ analyze("SEX")
+
+build_table(lyt1, ex_adsl)
## 0-30 y.o. 30-60 y.o. 60-100 y.o.
+## ———————————————————————————————————————————————————————
+## F 71 150 1
+## M 48 116 2
+## U 2 7 0
+## UNDIFFERENTIATED 1 2 0
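Similarly, a sketch of quartile-based faceting with split_cols_by_quartiles, assuming the same ex_adsl example data used above:

```r
library(rtables)

# facet columns by the quartiles of AGE, computed from the data
lyt1 <- basic_table() %>%
  split_cols_by_quartiles("AGE") %>%
  analyze("SEX")

build_table(lyt1, ex_adsl)
```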
+Our custom split functions can do anything, including conditionally applying one or more other existing custom split functions.
+Here we define a function constructor which accepts the name of the variable we want to check, and returns a custom split function with the desired behavior, using functions provided by rtables for both cases:
+
+picky_splitter <- function(var) {
+ function(df, spl, vals, labels, trim) {
+ orig_vals <- vals
+ if (is.null(vals)) {
+ vec <- df[[var]]
+ vals <- if (is.factor(vec)) levels(vec) else unique(vec)
+ }
+ if (length(vals) == 1) {
+ do_base_split(spl = spl, df = df, vals = vals, labels = labels, trim = trim)
+ } else {
+ add_overall_level(
+ "Overall",
+ label = "All Obs", first = FALSE
+ )(df = df, spl = spl, vals = orig_vals, trim = trim)
+ }
+ }
+}
+
+
+d1 <- subset(ex_adsl, ARM == "A: Drug X")
+d1$ARM <- factor(d1$ARM)
+
+lyt1 <- basic_table() %>%
+ split_cols_by("ARM", split_fun = picky_splitter("ARM")) %>%
+ analyze("AGE")
This gives us the desired behavior in both the one column corner +case:
+
+build_table(lyt1, d1)
## A: Drug X
+## ————————————————
+## Mean 33.77
+and the standard multi-column case:
+
+build_table(lyt1, ex_adsl)
## A: Drug X B: Placebo C: Combination All Obs
+## ————————————————————————————————————————————————————————
+## Mean 33.77 35.43 35.43 34.88
+Notice that add_overall_level is itself a function constructor; we immediately call the constructed function in the multi-column case.
+.spl_context
+What is .spl_context?
+.spl_context (see ?spl_context) is a mechanism by which the rtables tabulation machinery gives custom split, analysis, or content (row-group summary) functions information about the overarching facet structure in which the splits or cells they generate will reside.
In particular, .spl_context ensures that your functions know about, and thus can base computations on, the structure of the splits occurring above them.
+dta_test <- data.frame(
+ USUBJID = rep(1:6, each = 3),
+ PARAMCD = rep("lab", 6 * 3),
+ AVISIT = rep(paste0("V", 1:3), 6),
+ ARM = rep(LETTERS[1:3], rep(6, 3)),
+ AVAL = c(9:1, rep(NA, 9)),
+ CHG = c(1:9, rep(NA, 9))
+)
+
+my_afun <- function(x, .spl_context) {
+ n <- sum(!is.na(x))
+ meanval <- mean(x, na.rm = TRUE)
+ sdval <- sd(x, na.rm = TRUE)
+
+ ## get the split value of the most recent parent
+ ## (row) split above this analyze
+ val <- .spl_context[nrow(.spl_context), "value"]
+ ## do a silly thing to decide the different format precision
+ ## your real logic would go here
+ valnum <- min(2L, as.integer(gsub("[^[:digit:]]*", "", val)))
+ fstringpt <- paste0("xx.", strrep("x", valnum))
+ fmt_mnsd <- sprintf("%s (%s)", fstringpt, fstringpt)
+ in_rows(
+ n = n,
+ "Mean, SD" = c(meanval, sdval),
+ .formats = c(n = "xx", "Mean, SD" = fmt_mnsd)
+ )
+}
+
+lyt <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("AVISIT") %>%
+ split_cols_by_multivar(vars = c("AVAL", "CHG")) %>%
+ analyze_colvars(my_afun)
+
+build_table(lyt, dta_test)
## A B C
+## AVAL CHG AVAL CHG AVAL CHG
+## ———————————————————————————————————————————————————————————————————————————
+## V1
+## n 2 2 1 1 0 0
+## Mean, SD 7.5 (2.1) 2.5 (2.1) 3.0 (NA) 7.0 (NA) NA NA
+## V2
+## n 2 2 1 1 0 0
+## Mean, SD 6.50 (2.12) 3.50 (2.12) 2.00 (NA) 8.00 (NA) NA NA
+## V3
+## n 2 2 1 1 0 0
+## Mean, SD 5.50 (2.12) 4.50 (2.12) 1.00 (NA) 9.00 (NA) NA NA
+
+my_afun <- function(x, .var, .spl_context) {
+ n <- sum(!is.na(x))
+ meanval <- mean(x, na.rm = TRUE)
+ sdval <- sd(x, na.rm = TRUE)
+
+ ## get the split value of the most recent parent
+ ## (row) split above this analyze
+ val <- .spl_context[nrow(.spl_context), "value"]
+ ## we show it if it's not a CHG within V1
+ show_it <- val != "V1" || .var != "CHG"
+ ## do a silly thing to decide the different format precision
+ ## your real logic would go here
+ valnum <- min(2L, as.integer(gsub("[^[:digit:]]*", "", val)))
+ fstringpt <- paste0("xx.", strrep("x", valnum))
+ fmt_mnsd <- if (show_it) sprintf("%s (%s)", fstringpt, fstringpt) else "xx"
+ in_rows(
+ n = if (show_it) n, ## NULL otherwise
+ "Mean, SD" = if (show_it) c(meanval, sdval), ## NULL otherwise
+ .formats = c(n = "xx", "Mean, SD" = fmt_mnsd)
+ )
+}
+
+lyt <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("AVISIT") %>%
+ split_cols_by_multivar(vars = c("AVAL", "CHG")) %>%
+ analyze_colvars(my_afun)
+
+build_table(lyt, dta_test)
## A B C
+## AVAL CHG AVAL CHG AVAL CHG
+## ———————————————————————————————————————————————————————————————————————————
+## V1
+## n 2 1 0
+## Mean, SD 7.5 (2.1) 3.0 (NA) NA
+## V2
+## n 2 2 1 1 0 0
+## Mean, SD 6.50 (2.12) 3.50 (2.12) 2.00 (NA) 8.00 (NA) NA NA
+## V3
+## n 2 2 1 1 0 0
+## Mean, SD 5.50 (2.12) 4.50 (2.12) 1.00 (NA) 9.00 (NA) NA NA
+We can further simulate the formal modeling of reference row(s) using the extra_args machinery:
+my_afun <- function(x, .var, ref_rowgroup, .spl_context) {
+ n <- sum(!is.na(x))
+ meanval <- mean(x, na.rm = TRUE)
+ sdval <- sd(x, na.rm = TRUE)
+
+ ## get the split value of the most recent parent
+ ## (row) split above this analyze
+ val <- .spl_context[nrow(.spl_context), "value"]
+ ## we show it if it's not a CHG within V1
+ show_it <- val != ref_rowgroup || .var != "CHG"
+ fmt_mnsd <- if (show_it) "xx.x (xx.x)" else "xx"
+ in_rows(
+ n = if (show_it) n, ## NULL otherwise
+ "Mean, SD" = if (show_it) c(meanval, sdval), ## NULL otherwise
+ .formats = c(n = "xx", "Mean, SD" = fmt_mnsd)
+ )
+}
+
+lyt2 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("AVISIT") %>%
+ split_cols_by_multivar(vars = c("AVAL", "CHG")) %>%
+ analyze_colvars(my_afun, extra_args = list(ref_rowgroup = "V1"))
+
+build_table(lyt2, dta_test)
## A B C
+## AVAL CHG AVAL CHG AVAL CHG
+## —————————————————————————————————————————————————————————————————————
+## V1
+## n 2 1 0
+## Mean, SD 7.5 (2.1) 3.0 (NA) NA
+## V2
+## n 2 2 1 1 0 0
+## Mean, SD 6.5 (2.1) 3.5 (2.1) 2.0 (NA) 8.0 (NA) NA NA
+## V3
+## n 2 2 1 1 0 0
+## Mean, SD 5.5 (2.1) 4.5 (2.1) 1.0 (NA) 9.0 (NA) NA NA
+vignettes/baseline.Rmd
Often the data from one column is considered the +reference/baseline/comparison group and is compared to the data from the +other columns.
+For example, let's calculate the average age:
+
+library(rtables)
+
+lyt <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ analyze("AGE")
+
+tbl <- build_table(lyt, DM)
+tbl
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————
+# Mean 34.91 33.02 34.57
+and then the difference of the average AGE
between the
+placebo arm and the other arms:
+lyt2 <- basic_table() %>%
+ split_cols_by("ARM", ref_group = "B: Placebo") %>%
+ analyze("AGE", afun = function(x, .ref_group) {
+ in_rows(
+ "Difference of Averages" = rcell(mean(x) - mean(.ref_group), format = "xx.xx")
+ )
+ })
+
+tbl2 <- build_table(lyt2, DM)
+tbl2
# A: Drug X B: Placebo C: Combination
+# ————————————————————————————————————————————————————————————————
+# Difference of Averages 1.89 0.00 1.55
+Note that the column order has changed and the reference group is +displayed in the first column.
+In cases where we want cells to be blank in the reference column,
+(e.g., “B: Placebo”) we use non_ref_rcell()
instead of
+rcell()
, and pass .in_ref_col
as the second
+argument:
+lyt3 <- basic_table() %>%
+ split_cols_by("ARM", ref_group = "B: Placebo") %>%
+ analyze(
+ "AGE",
+ afun = function(x, .ref_group, .in_ref_col) {
+ in_rows(
+ "Difference of Averages" = non_ref_rcell(mean(x) - mean(.ref_group), is_ref = .in_ref_col, format = "xx.xx")
+ )
+ }
+ )
+
+tbl3 <- build_table(lyt3, DM)
+tbl3
# A: Drug X B: Placebo C: Combination
+# ————————————————————————————————————————————————————————————————
+# Difference of Averages 1.89 1.55
+
+lyt4 <- basic_table() %>%
+ split_cols_by("ARM", ref_group = "B: Placebo") %>%
+ analyze(
+ "AGE",
+ afun = function(x, .ref_group, .in_ref_col) {
+ in_rows(
+ "Difference of Averages" = non_ref_rcell(mean(x) - mean(.ref_group), is_ref = .in_ref_col, format = "xx.xx"),
+ "another row" = non_ref_rcell("aaa", .in_ref_col)
+ )
+ }
+ )
+
+tbl4 <- build_table(lyt4, DM)
+tbl4
# A: Drug X B: Placebo C: Combination
+# ————————————————————————————————————————————————————————————————
+# Difference of Averages 1.89 1.55
+# another row aaa aaa
+You can see which arguments are available for afun
in
+the manual for analyze()
.
When adding row-splitting the reference data may be represented by +the column with or without row splitting. For example:
+
+lyt5 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("ARM", ref_group = "B: Placebo") %>%
+ split_rows_by("SEX", split_fun = drop_split_levels) %>%
+ analyze("AGE", afun = function(x, .ref_group, .ref_full, .in_ref_col) {
+ in_rows(
+ "is reference (.in_ref_col)" = rcell(.in_ref_col),
+ "ref cell N (.ref_group)" = rcell(length(.ref_group)),
+ "ref column N (.ref_full)" = rcell(length(.ref_full))
+ )
+ })
+
+tbl5 <- build_table(lyt5, subset(DM, SEX %in% c("M", "F")))
+tbl5
# A: Drug X B: Placebo C: Combination
+# (N=121) (N=106) (N=129)
+# ——————————————————————————————————————————————————————————————————————
+# F
+# is reference (.in_ref_col) FALSE TRUE FALSE
+# ref cell N (.ref_group) 56 56 56
+# ref column N (.ref_full) 106 106 106
+# M
+# is reference (.in_ref_col) FALSE TRUE FALSE
+# ref cell N (.ref_group) 50 50 50
+# ref column N (.ref_full) 106 106 106
+The data assigned to .ref_full
is the full data of the
+reference column whereas the data assigned to .ref_group
+respects the subsetting defined by row-splitting and hence is from the
+same subset as the argument x
or df
to
+afun
.
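The relationship between the two arguments can be written out in base R. The following is a conceptual sketch only (rtables performs this subsetting internally); it uses the same DM subset as the table above:

```r
# Conceptual base-R sketch of the two reference subsets for the "F" row
# facet with reference column "B: Placebo":
DM_MF <- subset(DM, SEX %in% c("M", "F"))
ref_full  <- DM_MF$AGE[DM_MF$ARM == "B: Placebo"]                    # what .ref_full receives
ref_group <- DM_MF$AGE[DM_MF$ARM == "B: Placebo" & DM_MF$SEX == "F"] # what .ref_group receives
c(length(ref_full), length(ref_group))  # column N and row-facet n, as shown above
```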
vignettes/clinical_trials.Rmd
+ clinical_trials.Rmd
In this vignette we create a number of tables commonly used when reporting clinical trials, using the rtables layout facility. That is, we demonstrate how the layout-based tabulation framework can specify the structure and relations that are commonly found when analyzing clinical trials data.
Note that all the data is created using random number generators. All
+ex_*
data which is currently attached to the
+rtables
package is provided by the formatters
+package and was created using the publicly available random.cdisc.data
+R package.
The packages used in this vignette are:
+
+Demographic tables summarize the content of variables for different population subsets (encoded in the columns).
+One feature of analyze()
that we have not introduced in
+the previous vignette is that the analysis function afun
+can specify multiple rows with the in_rows()
function:
+ADSL <- ex_adsl # Example ADSL dataset
+
+lyt <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ analyze(vars = "AGE", afun = function(x) {
+ in_rows(
+ "Mean (sd)" = rcell(c(mean(x), sd(x)), format = "xx.xx (xx.xx)"),
+ "Range" = rcell(range(x), format = "xx.xx - xx.xx")
+ )
+ })
+
+tbl <- build_table(lyt, ADSL)
+tbl
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————————————
+# Mean (sd) 33.77 (6.55) 35.43 (7.90) 35.43 (7.72)
+# Range 21.00 - 50.00 21.00 - 62.00 20.00 - 69.00
+Multiple variables can be analyzed in one analyze()
+call:
+lyt2 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ analyze(vars = c("AGE", "BMRKR1"), afun = function(x) {
+ in_rows(
+ "Mean (sd)" = rcell(c(mean(x), sd(x)), format = "xx.xx (xx.xx)"),
+ "Range" = rcell(range(x), format = "xx.xx - xx.xx")
+ )
+ })
+
+tbl2 <- build_table(lyt2, ADSL)
+tbl2
# A: Drug X B: Placebo C: Combination
+# ————————————————————————————————————————————————————————————
+# AGE
+# Mean (sd) 33.77 (6.55) 35.43 (7.90) 35.43 (7.72)
+# Range 21.00 - 50.00 21.00 - 62.00 20.00 - 69.00
+# BMRKR1
+# Mean (sd) 5.97 (3.55) 5.70 (3.31) 5.62 (3.49)
+# Range 0.41 - 17.67 0.65 - 14.24 0.17 - 21.39
+Hence, if afun can process different data vector types (i.e., variables selected from the data), then we are fairly close to a standard demographic table. Here is a function that creates a count table if the argument x is a factor, and a numeric summary if it is numeric:
+s_summary <- function(x) {
+ if (is.numeric(x)) {
+ in_rows(
+ "n" = rcell(sum(!is.na(x)), format = "xx"),
+ "Mean (sd)" = rcell(c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE)), format = "xx.xx (xx.xx)"),
+ "IQR" = rcell(IQR(x, na.rm = TRUE), format = "xx.xx"),
+ "min - max" = rcell(range(x, na.rm = TRUE), format = "xx.xx - xx.xx")
+ )
+ } else if (is.factor(x)) {
+ vs <- as.list(table(x))
+ do.call(in_rows, lapply(vs, rcell, format = "xx"))
+ } else {
+ stop("type not supported")
+ }
+}
Note that we use rcell to wrap the results in order to attach formatting instructions for rtables. We can use s_summary outside the context of tabulation:
+s_summary(ADSL$AGE)
# RowsVerticalSection (in_rows) object print method:
+# ----------------------------
+# row_name formatted_cell indent_mod row_label
+# 1 n 400 0 n
+# 2 Mean (sd) 34.88 (7.44) 0 Mean (sd)
+# 3 IQR 10.00 0 IQR
+# 4 min - max 20.00 - 69.00 0 min - max
+and
+
+s_summary(ADSL$SEX)
# RowsVerticalSection (in_rows) object print method:
+# ----------------------------
+# row_name formatted_cell indent_mod row_label
+# 1 F 222 0 F
+# 2 M 166 0 M
+# 3 U 9 0 U
+# 4 UNDIFFERENTIATED 3 0 UNDIFFERENTIATED
+We can now create a commonly used variant of the demographic +table:
+
+summary_lyt <- basic_table() %>%
+ split_cols_by(var = "ARM") %>%
+ analyze(c("AGE", "SEX"), afun = s_summary)
+
+summary_tbl <- build_table(summary_lyt, ADSL)
+summary_tbl
# A: Drug X B: Placebo C: Combination
+# ———————————————————————————————————————————————————————————————————
+# AGE
+# n 134 134 132
+# Mean (sd) 33.77 (6.55) 35.43 (7.90) 35.43 (7.72)
+# IQR 11.00 10.00 10.00
+# min - max 21.00 - 50.00 21.00 - 62.00 20.00 - 69.00
+# SEX
+# F 79 77 66
+# M 51 55 60
+# U 3 2 4
+# UNDIFFERENTIATED 1 0 2
+Note that analyze()
can also be called multiple times in
+sequence:
+summary_lyt2 <- basic_table() %>%
+ split_cols_by(var = "ARM") %>%
+ analyze("AGE", s_summary) %>%
+ analyze("SEX", s_summary)
+
+summary_tbl2 <- build_table(summary_lyt2, ADSL)
+summary_tbl2
# A: Drug X B: Placebo C: Combination
+# ———————————————————————————————————————————————————————————————————
+# AGE
+# n 134 134 132
+# Mean (sd) 33.77 (6.55) 35.43 (7.90) 35.43 (7.72)
+# IQR 11.00 10.00 10.00
+# min - max 21.00 - 50.00 21.00 - 62.00 20.00 - 69.00
+# SEX
+# F 79 77 66
+# M 51 55 60
+# U 3 2 4
+# UNDIFFERENTIATED 1 0 2
+which leads to the table identical to summary_tbl
:
+identical(summary_tbl, summary_tbl2)
# [1] TRUE
+In clinical trials analyses, the number of patients per column is often referred to as N (whereas outside of clinical trials N commonly denotes the overall population size). Column Ns are added by setting the show_colcounts argument in basic_table() to TRUE:
+summary_lyt3 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by(var = "ARMCD") %>%
+ analyze(c("AGE", "SEX"), s_summary)
+
+summary_tbl3 <- build_table(summary_lyt3, ADSL)
+summary_tbl3
# ARM A ARM B ARM C
+# (N=134) (N=134) (N=132)
+# ——————————————————————————————————————————————————————————————————
+# AGE
+# n 134 134 132
+# Mean (sd) 33.77 (6.55) 35.43 (7.90) 35.43 (7.72)
+# IQR 11.00 10.00 10.00
+# min - max 21.00 - 50.00 21.00 - 62.00 20.00 - 69.00
+# SEX
+# F 79 77 66
+# M 51 55 60
+# U 3 2 4
+# UNDIFFERENTIATED 1 0 2
+We will now show a couple of variations of the demographic table that we developed above. These variations are in structure and not in analysis, hence they don't require any modification of the s_summary function.
+We will start with a standard table analyzing the AGE and BMRKR2 variables:
+lyt <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by(var = "ARM") %>%
+ analyze(c("AGE", "BMRKR2"), s_summary)
+
+tbl <- build_table(lyt, ADSL)
+tbl
# A: Drug X B: Placebo C: Combination
+# (N=134) (N=134) (N=132)
+# ————————————————————————————————————————————————————————————
+# AGE
+# n 134 134 132
+# Mean (sd) 33.77 (6.55) 35.43 (7.90) 35.43 (7.72)
+# IQR 11.00 10.00 10.00
+# min - max 21.00 - 50.00 21.00 - 62.00 20.00 - 69.00
+# BMRKR2
+# LOW 50 45 40
+# MEDIUM 37 56 42
+# HIGH 47 33 50
+Assume we would like to have this analysis carried out per gender +encoded in the row space:
+
+lyt <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by(var = "ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze(c("AGE", "BMRKR2"), s_summary)
+
+tbl <- build_table(lyt, ADSL)
+tbl
# A: Drug X B: Placebo C: Combination
+# (N=134) (N=134) (N=132)
+# —————————————————————————————————————————————————————————————————
+# F
+# AGE
+# n 79 77 66
+# Mean (sd) 32.76 (6.09) 34.12 (7.06) 35.20 (7.43)
+# IQR 9.00 8.00 6.75
+# min - max 21.00 - 47.00 23.00 - 58.00 21.00 - 64.00
+# BMRKR2
+# LOW 26 21 26
+# MEDIUM 21 38 17
+# HIGH 32 18 23
+# M
+# AGE
+# n 51 55 60
+# Mean (sd) 35.57 (7.08) 37.44 (8.69) 35.38 (8.24)
+# IQR 11.00 9.00 11.00
+# min - max 23.00 - 50.00 21.00 - 62.00 20.00 - 69.00
+# BMRKR2
+# LOW 21 23 11
+# MEDIUM 15 18 23
+# HIGH 15 14 26
+# U
+# AGE
+# n 3 2 4
+# Mean (sd) 31.67 (3.21) 31.00 (5.66) 35.25 (3.10)
+# IQR 3.00 4.00 3.25
+# min - max 28.00 - 34.00 27.00 - 35.00 31.00 - 38.00
+# BMRKR2
+# LOW 2 1 1
+# MEDIUM 1 0 2
+# HIGH 0 1 1
+# UNDIFFERENTIATED
+# AGE
+# n 1 0 2
+# Mean (sd) 28.00 (NA) NA 45.00 (1.41)
+# IQR 0.00 NA 1.00
+# min - max 28.00 - 28.00 Inf - -Inf 44.00 - 46.00
+# BMRKR2
+# LOW 1 0 2
+# MEDIUM 0 0 0
+# HIGH 0 0 0
+We will now subset ADSL
to include only males and
+females in the analysis in order to reduce the number of rows in the
+table:
+ADSL_M_F <- filter(ADSL, SEX %in% c("M", "F"))
+
+lyt2 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by(var = "ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze(c("AGE", "BMRKR2"), s_summary)
+
+tbl2 <- build_table(lyt2, ADSL_M_F)
+tbl2
# A: Drug X B: Placebo C: Combination
+# (N=130) (N=132) (N=126)
+# —————————————————————————————————————————————————————————————————
+# F
+# AGE
+# n 79 77 66
+# Mean (sd) 32.76 (6.09) 34.12 (7.06) 35.20 (7.43)
+# IQR 9.00 8.00 6.75
+# min - max 21.00 - 47.00 23.00 - 58.00 21.00 - 64.00
+# BMRKR2
+# LOW 26 21 26
+# MEDIUM 21 38 17
+# HIGH 32 18 23
+# M
+# AGE
+# n 51 55 60
+# Mean (sd) 35.57 (7.08) 37.44 (8.69) 35.38 (8.24)
+# IQR 11.00 9.00 11.00
+# min - max 23.00 - 50.00 21.00 - 62.00 20.00 - 69.00
+# BMRKR2
+# LOW 21 23 11
+# MEDIUM 15 18 23
+# HIGH 15 14 26
+# U
+# AGE
+# n 0 0 0
+# Mean (sd) NA NA NA
+# IQR NA NA NA
+# min - max Inf - -Inf Inf - -Inf Inf - -Inf
+# BMRKR2
+# LOW 0 0 0
+# MEDIUM 0 0 0
+# HIGH 0 0 0
+# UNDIFFERENTIATED
+# AGE
+# n 0 0 0
+# Mean (sd) NA NA NA
+# IQR NA NA NA
+# min - max Inf - -Inf Inf - -Inf Inf - -Inf
+# BMRKR2
+# LOW 0 0 0
+# MEDIUM 0 0 0
+# HIGH 0 0 0
+Note that the UNDIFFERENTIATED and U levels still show up in the table. This is because tabulation respects the factor levels and level order, exactly as the split() and table() functions do. If empty levels should be dropped then rtables needs to know that at splitting time, via the split_fun argument of split_rows_by(). There are a number of predefined split functions; for this example drop_split_levels is required to drop the empty levels at splitting time. Splitting is a big topic and will eventually be addressed in a dedicated package vignette.
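The same behavior can be seen with base R's table(), which also keeps empty factor levels unless they are dropped first — a quick base-R illustration:

```r
sex <- factor(c("M", "F", "F"), levels = c("F", "M", "U", "UNDIFFERENTIATED"))
table(sex)              # U and UNDIFFERENTIATED still appear, with count 0
table(droplevels(sex))  # empty levels removed first, analogous to drop_split_levels
```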
+lyt3 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by(var = "ARM") %>%
+ split_rows_by("SEX", split_fun = drop_split_levels, child_labels = "visible") %>%
+ analyze(c("AGE", "BMRKR2"), s_summary)
+
+tbl3 <- build_table(lyt3, ADSL_M_F)
+tbl3
# A: Drug X B: Placebo C: Combination
+# (N=130) (N=132) (N=126)
+# ——————————————————————————————————————————————————————————————
+# F
+# AGE
+# n 79 77 66
+# Mean (sd) 32.76 (6.09) 34.12 (7.06) 35.20 (7.43)
+# IQR 9.00 8.00 6.75
+# min - max 21.00 - 47.00 23.00 - 58.00 21.00 - 64.00
+# BMRKR2
+# LOW 26 21 26
+# MEDIUM 21 38 17
+# HIGH 32 18 23
+# M
+# AGE
+# n 51 55 60
+# Mean (sd) 35.57 (7.08) 37.44 (8.69) 35.38 (8.24)
+# IQR 11.00 9.00 11.00
+# min - max 23.00 - 50.00 21.00 - 62.00 20.00 - 69.00
+# BMRKR2
+# LOW 21 23 11
+# MEDIUM 15 18 23
+# HIGH 15 14 26
+In the table above the labels M
and F
are
+not very descriptive. You can add the full labels as follows:
+ADSL_M_F_l <- ADSL_M_F %>%
+ mutate(lbl_sex = case_when(
+ SEX == "M" ~ "Male",
+ SEX == "F" ~ "Female",
+ SEX == "U" ~ "Unknown",
+ SEX == "UNDIFFERENTIATED" ~ "Undifferentiated"
+ ))
+
+lyt4 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by(var = "ARM") %>%
+ split_rows_by("SEX", labels_var = "lbl_sex", split_fun = drop_split_levels, child_labels = "visible") %>%
+ analyze(c("AGE", "BMRKR2"), s_summary)
+
+tbl4 <- build_table(lyt4, ADSL_M_F_l)
+tbl4
# A: Drug X B: Placebo C: Combination
+# (N=130) (N=132) (N=126)
+# ——————————————————————————————————————————————————————————————
+# Female
+# AGE
+# n 79 77 66
+# Mean (sd) 32.76 (6.09) 34.12 (7.06) 35.20 (7.43)
+# IQR 9.00 8.00 6.75
+# min - max 21.00 - 47.00 23.00 - 58.00 21.00 - 64.00
+# BMRKR2
+# LOW 26 21 26
+# MEDIUM 21 38 17
+# HIGH 32 18 23
+# Male
+# AGE
+# n 51 55 60
+# Mean (sd) 35.57 (7.08) 37.44 (8.69) 35.38 (8.24)
+# IQR 11.00 9.00 11.00
+# min - max 23.00 - 50.00 21.00 - 62.00 20.00 - 69.00
+# BMRKR2
+# LOW 21 23 11
+# MEDIUM 15 18 23
+# HIGH 15 14 26
+For the next table variation we stratify by gender only for the AGE analysis. To do this, the nested argument has to be set to FALSE in the analyze() call:
+lyt5 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by(var = "ARM") %>%
+ split_rows_by("SEX", labels_var = "lbl_sex", split_fun = drop_split_levels, child_labels = "visible") %>%
+ analyze("AGE", s_summary, show_labels = "visible") %>%
+ analyze("BMRKR2", s_summary, nested = FALSE, show_labels = "visible")
+
+tbl5 <- build_table(lyt5, ADSL_M_F_l)
+tbl5
# A: Drug X B: Placebo C: Combination
+# (N=130) (N=132) (N=126)
+# ——————————————————————————————————————————————————————————————
+# Female
+# AGE
+# n 79 77 66
+# Mean (sd) 32.76 (6.09) 34.12 (7.06) 35.20 (7.43)
+# IQR 9.00 8.00 6.75
+# min - max 21.00 - 47.00 23.00 - 58.00 21.00 - 64.00
+# Male
+# AGE
+# n 51 55 60
+# Mean (sd) 35.57 (7.08) 37.44 (8.69) 35.38 (8.24)
+# IQR 11.00 9.00 11.00
+# min - max 23.00 - 50.00 21.00 - 62.00 20.00 - 69.00
+# BMRKR2
+# LOW 47 44 37
+# MEDIUM 36 56 40
+# HIGH 47 32 49
+Once we split the rows into groups (Male and Female here), we might want to summarize the groups, usually by showing counts and column percentages. This is especially important if we have missing data. For example, let's create the above table after adding missing data to the AGE variable:
+insert_NAs <- function(x) {
+ x[sample(c(TRUE, FALSE), length(x), TRUE, prob = c(0.2, 0.8))] <- NA
+ x
+}
+
+set.seed(1)
+ADSL_NA <- ADSL_M_F_l %>%
+ mutate(AGE = insert_NAs(AGE))
+
+lyt6 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by(var = "ARM") %>%
+ split_rows_by(
+ "SEX",
+ labels_var = "lbl_sex",
+ split_fun = drop_split_levels,
+ child_labels = "visible"
+ ) %>%
+ analyze("AGE", s_summary) %>%
+ analyze("BMRKR2", s_summary, nested = FALSE, show_labels = "visible")
+
+tbl6 <- build_table(lyt6, filter(ADSL_NA, SEX %in% c("M", "F")))
+tbl6
# A: Drug X B: Placebo C: Combination
+# (N=130) (N=132) (N=126)
+# ————————————————————————————————————————————————————————————
+# Female
+# n 65 61 54
+# Mean (sd) 32.71 (6.07) 34.33 (7.31) 34.61 (6.78)
+# IQR 9.00 10.00 6.75
+# min - max 21.00 - 47.00 23.00 - 58.00 21.00 - 54.00
+# Male
+# n 44 44 50
+# Mean (sd) 35.66 (6.78) 36.93 (8.18) 35.64 (8.42)
+# IQR 10.50 8.25 10.75
+# min - max 24.00 - 48.00 21.00 - 58.00 20.00 - 69.00
+# BMRKR2
+# LOW 47 44 37
+# MEDIUM 36 56 40
+# HIGH 47 32 49
+Here it is not easy to see how many females and males there are in
+each arm as n
represents the number of non-missing data
+elements in the variables. Groups within rows that are defined by
+splitting can be summarized with summarize_row_groups()
,
+for example:
+lyt7 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by(var = "ARM") %>%
+ split_rows_by("SEX", labels_var = "lbl_sex", split_fun = drop_split_levels) %>%
+ summarize_row_groups() %>%
+ analyze("AGE", s_summary) %>%
+ analyze("BMRKR2", afun = s_summary, nested = FALSE, show_labels = "visible")
+
+tbl7 <- build_table(lyt7, filter(ADSL_NA, SEX %in% c("M", "F")))
+tbl7
# A: Drug X B: Placebo C: Combination
+# (N=130) (N=132) (N=126)
+# ————————————————————————————————————————————————————————————
+# Female 79 (60.8%) 77 (58.3%) 66 (52.4%)
+# n 65 61 54
+# Mean (sd) 32.71 (6.07) 34.33 (7.31) 34.61 (6.78)
+# IQR 9.00 10.00 6.75
+# min - max 21.00 - 47.00 23.00 - 58.00 21.00 - 54.00
+# Male 51 (39.2%) 55 (41.7%) 60 (47.6%)
+# n 44 44 50
+# Mean (sd) 35.66 (6.78) 36.93 (8.18) 35.64 (8.42)
+# IQR 10.50 8.25 10.75
+# min - max 24.00 - 48.00 21.00 - 58.00 20.00 - 69.00
+# BMRKR2
+# LOW 47 44 37
+# MEDIUM 36 56 40
+# HIGH 47 32 49
+There are a couple of things to note here: by default, summarize_row_groups() adds one content row per group, showing the group count and column percentage and using the group's factor level as the row label.
+We can recreate this default behavior (count and percentage) by defining a cfun; we do this here purely for illustration, as it produces the same group summary rows as above (up to the percent formatting):
+same table as above:
+lyt8 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by(var = "ARM") %>%
+ split_rows_by("SEX", labels_var = "lbl_sex", split_fun = drop_split_levels) %>%
+ summarize_row_groups(cfun = function(df, labelstr, .N_col, ...) {
+ in_rows(
+ rcell(nrow(df) * c(1, 1 / .N_col), format = "xx (xx.xx%)"),
+ .labels = labelstr
+ )
+ }) %>%
+ analyze("AGE", s_summary) %>%
+ analyze("BEP01FL", afun = s_summary, nested = FALSE, show_labels = "visible")
+
+tbl8 <- build_table(lyt8, filter(ADSL_NA, SEX %in% c("M", "F")))
+tbl8
# A: Drug X B: Placebo C: Combination
+# (N=130) (N=132) (N=126)
+# ————————————————————————————————————————————————————————————
+# Female 79 (60.77%) 77 (58.33%) 66 (52.38%)
+# n 65 61 54
+# Mean (sd) 32.71 (6.07) 34.33 (7.31) 34.61 (6.78)
+# IQR 9.00 10.00 6.75
+# min - max 21.00 - 47.00 23.00 - 58.00 21.00 - 54.00
+# Male 51 (39.23%) 55 (41.67%) 60 (47.62%)
+# n 44 44 50
+# Mean (sd) 35.66 (6.78) 36.93 (8.18) 35.64 (8.42)
+# IQR 10.50 8.25 10.75
+# min - max 24.00 - 48.00 21.00 - 58.00 20.00 - 69.00
+# BEP01FL
+# Y 67 63 65
+# N 63 69 61
+Note that cfun, like afun (which is used in analyze()), can operate either on variables, passed via the x argument, or on data.frames or tibbles, passed via the df argument (afun can optionally request df too). Unlike afun, cfun must accept labelstr as its second argument, which gives the default group label (the factor level from splitting); hence the label can be modified:
+lyt9 <- basic_table() %>%
+ split_cols_by(var = "ARM") %>%
+ split_rows_by("SEX", labels_var = "lbl_sex", split_fun = drop_split_levels, child_labels = "hidden") %>%
+ summarize_row_groups(cfun = function(df, labelstr, .N_col, ...) {
+ in_rows(
+ rcell(nrow(df) * c(1, 1 / .N_col), format = "xx (xx.xx%)"),
+ .labels = paste0(labelstr, ": count (perc.)")
+ )
+ }) %>%
+ analyze("AGE", s_summary) %>%
+ analyze("BEP01FL", s_summary, nested = FALSE, show_labels = "visible")
+
+tbl9 <- build_table(lyt9, filter(ADSL_NA, SEX %in% c("M", "F")))
+tbl9
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————————————————————————
+# Female: count (perc.) 79 (60.77%) 77 (58.33%) 66 (52.38%)
+# n 65 61 54
+# Mean (sd) 32.71 (6.07) 34.33 (7.31) 34.61 (6.78)
+# IQR 9.00 10.00 6.75
+# min - max 21.00 - 47.00 23.00 - 58.00 21.00 - 54.00
+# Male: count (perc.) 51 (39.23%) 55 (41.67%) 60 (47.62%)
+# n 44 44 50
+# Mean (sd) 35.66 (6.78) 36.93 (8.18) 35.64 (8.42)
+# IQR 10.50 8.25 10.75
+# min - max 24.00 - 48.00 21.00 - 58.00 20.00 - 69.00
+# BEP01FL
+# Y 67 63 65
+# N 63 69 61
+Layouts have a couple of advantages over tabulating tables directly: a layout can be defined before any data is available, and the same layout can be reused to build tables from different datasets.
+Here is an example that demonstrates the reusability of layouts:
+
+adsl_lyt <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("ARM") %>%
+ analyze(c("AGE", "SEX"), afun = s_summary)
+
+adsl_lyt
# A Pre-data Table Layout
+#
+# Column-Split Structure:
+# ARM (lvls)
+#
+# Row-Split Structure:
+# AGE:SEX (** multivar analysis **)
+We can now build a table for ADSL:
+adsl_tbl <- build_table(adsl_lyt, ADSL)
+adsl_tbl
# A: Drug X B: Placebo C: Combination
+# (N=134) (N=134) (N=132)
+# ———————————————————————————————————————————————————————————————————
+# AGE
+# n 134 134 132
+# Mean (sd) 33.77 (6.55) 35.43 (7.90) 35.43 (7.72)
+# IQR 11.00 10.00 10.00
+# min - max 21.00 - 50.00 21.00 - 62.00 20.00 - 69.00
+# SEX
+# F 79 77 66
+# M 51 55 60
+# U 3 2 4
+# UNDIFFERENTIATED 1 0 2
+or for all patients that are older than 18:
+
+adsl_f_tbl <- build_table(lyt, ADSL %>% filter(AGE > 18))
# Warning in min(x): no non-missing arguments to min; returning Inf
+# Warning in max(x): no non-missing arguments to max; returning -Inf
+
+adsl_f_tbl
# A: Drug X B: Placebo C: Combination
+# (N=134) (N=134) (N=132)
+# —————————————————————————————————————————————————————————————————
+# F
+# AGE
+# n 79 77 66
+# Mean (sd) 32.76 (6.09) 34.12 (7.06) 35.20 (7.43)
+# IQR 9.00 8.00 6.75
+# min - max 21.00 - 47.00 23.00 - 58.00 21.00 - 64.00
+# BMRKR2
+# LOW 26 21 26
+# MEDIUM 21 38 17
+# HIGH 32 18 23
+# M
+# AGE
+# n 51 55 60
+# Mean (sd) 35.57 (7.08) 37.44 (8.69) 35.38 (8.24)
+# IQR 11.00 9.00 11.00
+# min - max 23.00 - 50.00 21.00 - 62.00 20.00 - 69.00
+# BMRKR2
+# LOW 21 23 11
+# MEDIUM 15 18 23
+# HIGH 15 14 26
+# U
+# AGE
+# n 3 2 4
+# Mean (sd) 31.67 (3.21) 31.00 (5.66) 35.25 (3.10)
+# IQR 3.00 4.00 3.25
+# min - max 28.00 - 34.00 27.00 - 35.00 31.00 - 38.00
+# BMRKR2
+# LOW 2 1 1
+# MEDIUM 1 0 2
+# HIGH 0 1 1
+# UNDIFFERENTIATED
+# AGE
+# n 1 0 2
+# Mean (sd) 28.00 (NA) NA 45.00 (1.41)
+# IQR 0.00 NA 1.00
+# min - max 28.00 - 28.00 Inf - -Inf 44.00 - 46.00
+# BMRKR2
+# LOW 1 0 2
+# MEDIUM 0 0 0
+# HIGH 0 0 0
+There are a number of different adverse event tables. We will now +present two tables that show adverse events by ID and then by grade and +by ID.
+This time we won’t use the ADAE
dataset from random.cdisc.data
+but rather generate a dataset on the fly (see Adrian’s
+2016 Phuse paper):
+set.seed(1)
+
+lookup <- tribble(
+ ~AEDECOD, ~AEBODSYS, ~AETOXGR,
+ "HEADACHE", "NERVOUS SYSTEM DISORDERS", "5",
+ "BACK PAIN", "MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS", "2",
+ "GINGIVAL BLEEDING", "GASTROINTESTINAL DISORDERS", "1",
+ "HYPOTENSION", "VASCULAR DISORDERS", "3",
+ "FAECES SOFT", "GASTROINTESTINAL DISORDERS", "2",
+ "ABDOMINAL DISCOMFORT", "GASTROINTESTINAL DISORDERS", "1",
+ "DIARRHEA", "GASTROINTESTINAL DISORDERS", "1",
+ "ABDOMINAL FULLNESS DUE TO GAS", "GASTROINTESTINAL DISORDERS", "1",
+ "NAUSEA (INTERMITTENT)", "GASTROINTESTINAL DISORDERS", "2",
+ "WEAKNESS", "MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS", "3",
+ "ORTHOSTATIC HYPOTENSION", "VASCULAR DISORDERS", "4"
+)
+
+normalize <- function(x) x / sum(x)
+weightsA <- normalize(c(0.1, dlnorm(seq(0, 5, length.out = 25), meanlog = 3)))
+weightsB <- normalize(c(0.2, dlnorm(seq(0, 5, length.out = 25))))
+
+N_pop <- 300
+ADSL2 <- data.frame(
+ USUBJID = seq(1, N_pop, by = 1),
+ ARM = sample(c("ARM A", "ARM B"), N_pop, TRUE),
+ SEX = sample(c("F", "M"), N_pop, TRUE),
+ AGE = 20 + rbinom(N_pop, size = 40, prob = 0.7)
+)
+
+l.adae <- mapply(
+ ADSL2$USUBJID,
+ ADSL2$ARM,
+ ADSL2$SEX,
+ ADSL2$AGE,
+ FUN = function(id, arm, sex, age) {
+ n_ae <- sample(0:25, 1, prob = if (arm == "ARM A") weightsA else weightsB)
+ i <- sample(seq_len(nrow(lookup)), size = n_ae, replace = TRUE, prob = c(6, rep(1, 10)) / 16)
+ lookup[i, ] %>%
+ mutate(
+ AESEQ = seq_len(n()),
+ USUBJID = id, ARM = arm, SEX = sex, AGE = age
+ )
+ },
+ SIMPLIFY = FALSE
+)
+
+ADAE2 <- do.call(rbind, l.adae)
+ADAE2 <- ADAE2 %>%
+ mutate(
+ ARM = factor(ARM, levels = c("ARM A", "ARM B")),
+ AEDECOD = as.factor(AEDECOD),
+ AEBODSYS = as.factor(AEBODSYS),
+ AETOXGR = factor(AETOXGR, levels = as.character(1:5))
+ ) %>%
+ select(USUBJID, ARM, AGE, SEX, AESEQ, AEDECOD, AEBODSYS, AETOXGR)
+
+ADAE2
# # A tibble: 3,118 × 8
+# USUBJID ARM AGE SEX AESEQ AEDECOD AEBODSYS AETOXGR
+# <dbl> <fct> <dbl> <chr> <int> <fct> <fct> <fct>
+# 1 1 ARM A 45 F 1 NAUSEA (INTERMITTENT) GASTROINTESTIN… 2
+# 2 1 ARM A 45 F 2 HEADACHE NERVOUS SYSTEM… 5
+# 3 1 ARM A 45 F 3 HEADACHE NERVOUS SYSTEM… 5
+# 4 1 ARM A 45 F 4 HEADACHE NERVOUS SYSTEM… 5
+# 5 1 ARM A 45 F 5 HEADACHE NERVOUS SYSTEM… 5
+# 6 1 ARM A 45 F 6 HEADACHE NERVOUS SYSTEM… 5
+# 7 1 ARM A 45 F 7 HEADACHE NERVOUS SYSTEM… 5
+# 8 1 ARM A 45 F 8 HEADACHE NERVOUS SYSTEM… 5
+# 9 1 ARM A 45 F 9 HEADACHE NERVOUS SYSTEM… 5
+# 10 1 ARM A 45 F 10 FAECES SOFT GASTROINTESTIN… 2
+# # ℹ 3,108 more rows
+We start by defining an events summary function:
+
+s_events_patients <- function(x, labelstr, .N_col) {
+ in_rows(
+ "Total number of patients with at least one event" =
+ rcell(length(unique(x)) * c(1, 1 / .N_col), format = "xx (xx.xx%)"),
+ "Total number of events" = rcell(length(x), format = "xx")
+ )
+}
+So, for a population of 5 patients in which patient "id 1" has two AEs and patient "id 2" has one AE, we would get the following summary:
+s_events_patients(x = c("id 1", "id 1", "id 2"), .N_col = 5)
# RowsVerticalSection (in_rows) object print method:
+# ----------------------------
+# row_name formatted_cell indent_mod
+# 1 Total number of patients with at least one event 2 (40.00%) 0
+# 2 Total number of events 3 0
+# row_label
+# 1 Total number of patients with at least one event
+# 2 Total number of events
+The .N_col
argument is a special keyword argument by
+which build_table()
passes the population size for each
+respective column. For a list of keyword arguments for the functions
+passed to afun
in analyze()
, refer to the
+documentation with ?analyze
.
We now use the s_events_patients
summary function in a
+tabulation:
+adae_lyt <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("ARM") %>%
+ analyze("USUBJID", s_events_patients)
+
+adae_tbl <- build_table(adae_lyt, ADAE2)
+adae_tbl
# ARM A ARM B
+# (N=2060) (N=1058)
+# —————————————————————————————————————————————————————————————————————————————
+# Total number of patients with at least one event 114 (5.53%) 150 (14.18%)
+# Total number of events 2060 1058
+Note that the column Ns are wrong: by default they are set to the number of rows per group (i.e., the number of AEs per arm here). This also affects the percentages. For this table we are interested in the number of patients per column/arm, which is usually taken from ADSL (ADSL2 here).
rtables
handles this by allowing us to override how the
+column counts are computed. We can specify an alt_counts_df
+in build_table()
. When we do this, rtables
+calculates the column counts by applying the same column faceting to
+alt_counts_df
as it does to the primary data during
+tabulation:
+adae_adsl_tbl <- build_table(adae_lyt, ADAE2, alt_counts_df = ADSL2)
+adae_adsl_tbl
# ARM A ARM B
+# (N=146) (N=154)
+# ——————————————————————————————————————————————————————————————————————————————
+# Total number of patients with at least one event 114 (78.08%) 150 (97.40%)
+# Total number of events 2060 1058
+Alternatively, if the desired column counts are already calculated,
+they can be specified directly via the col_counts
argument
+to build_table()
, though specifying an
+alt_counts_df
is the preferred mechanism.
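As a hedged sketch of that alternative (argument name as given in the build_table() documentation; the counts used are the ADSL2-derived values shown above):

```r
# Supplying pre-computed column counts directly (sketch; prefer alt_counts_df):
adae_cc_tbl <- build_table(adae_lyt, ADAE2, col_counts = c(146, 154))
```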
We next calculate this information per system organ class:
+
+adae_soc_lyt <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("ARM") %>%
+ analyze("USUBJID", s_events_patients) %>%
+ split_rows_by("AEBODSYS", child_labels = "visible", nested = FALSE) %>%
+ summarize_row_groups("USUBJID", cfun = s_events_patients)
+
+adae_soc_tbl <- build_table(adae_soc_lyt, ADAE2, alt_counts_df = ADSL2)
+adae_soc_tbl
# ARM A ARM B
+# (N=146) (N=154)
+# ————————————————————————————————————————————————————————————————————————————————
+# Total number of patients with at least one event 114 (78.08%) 150 (97.40%)
+# Total number of events 2060 1058
+# GASTROINTESTINAL DISORDERS
+# Total number of patients with at least one event 114 (78.08%) 130 (84.42%)
+# Total number of events 760 374
+# MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS
+# Total number of patients with at least one event 98 (67.12%) 81 (52.60%)
+# Total number of events 273 142
+# NERVOUS SYSTEM DISORDERS
+# Total number of patients with at least one event 113 (77.40%) 133 (86.36%)
+# Total number of events 787 420
+# VASCULAR DISORDERS
+# Total number of patients with at least one event 93 (63.70%) 75 (48.70%)
+# Total number of events 240 122
+We now have to add a count table of AEDECOD
for each
+AEBODSYS
. The default analyze()
behavior for a
+factor is to create the count table per level (using
+rtab_inner
):
+adae_soc_lyt2 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("AEBODSYS", child_labels = "visible", indent_mod = 1) %>%
+ summarize_row_groups("USUBJID", cfun = s_events_patients) %>%
+ analyze("AEDECOD", indent_mod = -1)
+
+adae_soc_tbl2 <- build_table(adae_soc_lyt2, ADAE2, alt_counts_df = ADSL2)
+adae_soc_tbl2
# ARM A ARM B
+# (N=146) (N=154)
+# ——————————————————————————————————————————————————————————————————————————————————
+# GASTROINTESTINAL DISORDERS
+# Total number of patients with at least one event 114 (78.08%) 130 (84.42%)
+# Total number of events 760 374
+# ABDOMINAL DISCOMFORT 113 65
+# ABDOMINAL FULLNESS DUE TO GAS 119 65
+# BACK PAIN 0 0
+# DIARRHEA 107 53
+# FAECES SOFT 122 58
+# GINGIVAL BLEEDING 147 71
+# HEADACHE 0 0
+# HYPOTENSION 0 0
+# NAUSEA (INTERMITTENT) 152 62
+# ORTHOSTATIC HYPOTENSION 0 0
+# WEAKNESS 0 0
+# MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS
+# Total number of patients with at least one event 98 (67.12%) 81 (52.60%)
+# Total number of events 273 142
+# ABDOMINAL DISCOMFORT 0 0
+# ABDOMINAL FULLNESS DUE TO GAS 0 0
+# BACK PAIN 135 75
+# DIARRHEA 0 0
+# FAECES SOFT 0 0
+# GINGIVAL BLEEDING 0 0
+# HEADACHE 0 0
+# HYPOTENSION 0 0
+# NAUSEA (INTERMITTENT) 0 0
+# ORTHOSTATIC HYPOTENSION 0 0
+# WEAKNESS 138 67
+# NERVOUS SYSTEM DISORDERS
+# Total number of patients with at least one event 113 (77.40%) 133 (86.36%)
+# Total number of events 787 420
+# ABDOMINAL DISCOMFORT 0 0
+# ABDOMINAL FULLNESS DUE TO GAS 0 0
+# BACK PAIN 0 0
+# DIARRHEA 0 0
+# FAECES SOFT 0 0
+# GINGIVAL BLEEDING 0 0
+# HEADACHE 787 420
+# HYPOTENSION 0 0
+# NAUSEA (INTERMITTENT) 0 0
+# ORTHOSTATIC HYPOTENSION 0 0
+# WEAKNESS 0 0
+# VASCULAR DISORDERS
+# Total number of patients with at least one event 93 (63.70%) 75 (48.70%)
+# Total number of events 240 122
+# ABDOMINAL DISCOMFORT 0 0
+# ABDOMINAL FULLNESS DUE TO GAS 0 0
+# BACK PAIN 0 0
+# DIARRHEA 0 0
+# FAECES SOFT 0 0
+# GINGIVAL BLEEDING 0 0
+# HEADACHE 0 0
+# HYPOTENSION 104 58
+# NAUSEA (INTERMITTENT) 0 0
+# ORTHOSTATIC HYPOTENSION 136 64
+# WEAKNESS 0 0
+The indent_mod
argument enables relative indenting
+changes if the tree structure of the table does not result in the
+desired indentation by default.
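Indentation can also be adjusted per row from within an analysis function, via the .indent_mods argument of in_rows() — a small sketch based on the in_rows() documentation:

```r
# Sketch: indent the "Mean" row one extra level relative to "n"
afun_indented <- function(x) {
  in_rows(
    "n" = rcell(sum(!is.na(x)), format = "xx"),
    "Mean" = rcell(mean(x, na.rm = TRUE), format = "xx.xx"),
    .indent_mods = c("n" = 0L, "Mean" = 1L)
  )
}
```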
+This table, however, is not yet the usual adverse event table, as it counts the total number of events rather than the number of subjects with one or more events for a particular term. To get the correct table we need to write a custom analysis function:
+
+table_count_once_per_id <- function(df, termvar = "AEDECOD", idvar = "USUBJID") {
+ x <- df[[termvar]]
+ id <- df[[idvar]]
+
+ counts <- table(x[!duplicated(id)])
+
+ in_rows(
+ .list = as.vector(counts),
+ .labels = names(counts)
+ )
+}
+
+table_count_once_per_id(ADAE2)
# RowsVerticalSection (in_rows) object print method:
+# ----------------------------
+# row_name formatted_cell indent_mod
+# 1 ABDOMINAL DISCOMFORT 23 0
+# 2 ABDOMINAL FULLNESS DUE TO GAS 21 0
+# 3 BACK PAIN 20 0
+# 4 DIARRHEA 7 0
+# 5 FAECES SOFT 11 0
+# 6 GINGIVAL BLEEDING 15 0
+# 7 HEADACHE 100 0
+# 8 HYPOTENSION 16 0
+# 9 NAUSEA (INTERMITTENT) 21 0
+# 10 ORTHOSTATIC HYPOTENSION 14 0
+# 11 WEAKNESS 16 0
+# row_label
+# 1 ABDOMINAL DISCOMFORT
+# 2 ABDOMINAL FULLNESS DUE TO GAS
+# 3 BACK PAIN
+# 4 DIARRHEA
+# 5 FAECES SOFT
+# 6 GINGIVAL BLEEDING
+# 7 HEADACHE
+# 8 HYPOTENSION
+# 9 NAUSEA (INTERMITTENT)
+# 10 ORTHOSTATIC HYPOTENSION
+# 11 WEAKNESS
+So the desired AE
table is:
+adae_soc_lyt3 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("AEBODSYS", child_labels = "visible", indent_mod = 1) %>%
+ summarize_row_groups("USUBJID", cfun = s_events_patients) %>%
+ analyze("AEDECOD", afun = table_count_once_per_id, show_labels = "hidden", indent_mod = -1)
+
+adae_soc_tbl3 <- build_table(adae_soc_lyt3, ADAE2, alt_counts_df = ADSL2)
+adae_soc_tbl3
# ARM A ARM B
+# (N=146) (N=154)
+# ——————————————————————————————————————————————————————————————————————————————————
+# GASTROINTESTINAL DISORDERS
+# Total number of patients with at least one event 114 (78.08%) 130 (84.42%)
+# Total number of events 760 374
+# ABDOMINAL DISCOMFORT 24 28
+# ABDOMINAL FULLNESS DUE TO GAS 18 26
+# BACK PAIN 0 0
+# DIARRHEA 17 17
+# FAECES SOFT 17 14
+# GINGIVAL BLEEDING 18 25
+# HEADACHE 0 0
+# HYPOTENSION 0 0
+# NAUSEA (INTERMITTENT) 20 20
+# ORTHOSTATIC HYPOTENSION 0 0
+# WEAKNESS 0 0
+# MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS
+# Total number of patients with at least one event 98 (67.12%) 81 (52.60%)
+# Total number of events 273 142
+# ABDOMINAL DISCOMFORT 0 0
+# ABDOMINAL FULLNESS DUE TO GAS 0 0
+# BACK PAIN 58 45
+# DIARRHEA 0 0
+# FAECES SOFT 0 0
+# GINGIVAL BLEEDING 0 0
+# HEADACHE 0 0
+# HYPOTENSION 0 0
+# NAUSEA (INTERMITTENT) 0 0
+# ORTHOSTATIC HYPOTENSION 0 0
+# WEAKNESS 40 36
+# NERVOUS SYSTEM DISORDERS
+# Total number of patients with at least one event 113 (77.40%) 133 (86.36%)
+# Total number of events 787 420
+# ABDOMINAL DISCOMFORT 0 0
+# ABDOMINAL FULLNESS DUE TO GAS 0 0
+# BACK PAIN 0 0
+# DIARRHEA 0 0
+# FAECES SOFT 0 0
+# GINGIVAL BLEEDING 0 0
+# HEADACHE 113 133
+# HYPOTENSION 0 0
+# NAUSEA (INTERMITTENT) 0 0
+# ORTHOSTATIC HYPOTENSION 0 0
+# WEAKNESS 0 0
+# VASCULAR DISORDERS
+# Total number of patients with at least one event 93 (63.70%) 75 (48.70%)
+# Total number of events 240 122
+# ABDOMINAL DISCOMFORT 0 0
+# ABDOMINAL FULLNESS DUE TO GAS 0 0
+# BACK PAIN 0 0
+# DIARRHEA 0 0
+# FAECES SOFT 0 0
+# GINGIVAL BLEEDING 0 0
+# HEADACHE 0 0
+# HYPOTENSION 44 31
+# NAUSEA (INTERMITTENT) 0 0
+# ORTHOSTATIC HYPOTENSION 49 44
+# WEAKNESS 0 0
+Note that we are missing the overall summary in the first two rows. This can be added with an initial analyze() call.
+adae_soc_lyt4 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("ARM") %>%
+ analyze("USUBJID", afun = s_events_patients) %>%
+ split_rows_by("AEBODSYS", child_labels = "visible", indent_mod = 1, section_div = "") %>%
+ summarize_row_groups("USUBJID", cfun = s_events_patients) %>%
+ analyze("AEDECOD", table_count_once_per_id, show_labels = "hidden", indent_mod = -1)
+
+adae_soc_tbl4 <- build_table(adae_soc_lyt4, ADAE2, alt_counts_df = ADSL2)
+adae_soc_tbl4
# ARM A ARM B
+# (N=146) (N=154)
+# ——————————————————————————————————————————————————————————————————————————————————
+# Total number of patients with at least one event 114 (78.08%) 150 (97.40%)
+# Total number of events 2060 1058
+# GASTROINTESTINAL DISORDERS
+# Total number of patients with at least one event 114 (78.08%) 130 (84.42%)
+# Total number of events 760 374
+# ABDOMINAL DISCOMFORT 24 28
+# ABDOMINAL FULLNESS DUE TO GAS 18 26
+# BACK PAIN 0 0
+# DIARRHEA 17 17
+# FAECES SOFT 17 14
+# GINGIVAL BLEEDING 18 25
+# HEADACHE 0 0
+# HYPOTENSION 0 0
+# NAUSEA (INTERMITTENT) 20 20
+# ORTHOSTATIC HYPOTENSION 0 0
+# WEAKNESS 0 0
+#
+# MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS
+# Total number of patients with at least one event 98 (67.12%) 81 (52.60%)
+# Total number of events 273 142
+# ABDOMINAL DISCOMFORT 0 0
+# ABDOMINAL FULLNESS DUE TO GAS 0 0
+# BACK PAIN 58 45
+# DIARRHEA 0 0
+# FAECES SOFT 0 0
+# GINGIVAL BLEEDING 0 0
+# HEADACHE 0 0
+# HYPOTENSION 0 0
+# NAUSEA (INTERMITTENT) 0 0
+# ORTHOSTATIC HYPOTENSION 0 0
+# WEAKNESS 40 36
+#
+# NERVOUS SYSTEM DISORDERS
+# Total number of patients with at least one event 113 (77.40%) 133 (86.36%)
+# Total number of events 787 420
+# ABDOMINAL DISCOMFORT 0 0
+# ABDOMINAL FULLNESS DUE TO GAS 0 0
+# BACK PAIN 0 0
+# DIARRHEA 0 0
+# FAECES SOFT 0 0
+# GINGIVAL BLEEDING 0 0
+# HEADACHE 113 133
+# HYPOTENSION 0 0
+# NAUSEA (INTERMITTENT) 0 0
+# ORTHOSTATIC HYPOTENSION 0 0
+# WEAKNESS 0 0
+#
+# VASCULAR DISORDERS
+# Total number of patients with at least one event 93 (63.70%) 75 (48.70%)
+# Total number of events 240 122
+# ABDOMINAL DISCOMFORT 0 0
+# ABDOMINAL FULLNESS DUE TO GAS 0 0
+# BACK PAIN 0 0
+# DIARRHEA 0 0
+# FAECES SOFT 0 0
+# GINGIVAL BLEEDING 0 0
+# HEADACHE 0 0
+# HYPOTENSION 44 31
+# NAUSEA (INTERMITTENT) 0 0
+# ORTHOSTATIC HYPOTENSION 49 44
+# WEAKNESS 0 0
+Finally, if we want to prune the 0-count rows, we can do that with the trim_rows() function:
+trim_rows(adae_soc_tbl4)
# ARM A ARM B
+# (N=146) (N=154)
+# ——————————————————————————————————————————————————————————————————————————————————
+# Total number of patients with at least one event 114 (78.08%) 150 (97.40%)
+# Total number of events 2060 1058
+# GASTROINTESTINAL DISORDERS
+# Total number of patients with at least one event 114 (78.08%) 130 (84.42%)
+# Total number of events 760 374
+# ABDOMINAL DISCOMFORT 24 28
+# ABDOMINAL FULLNESS DUE TO GAS 18 26
+# DIARRHEA 17 17
+# FAECES SOFT 17 14
+# GINGIVAL BLEEDING 18 25
+# NAUSEA (INTERMITTENT) 20 20
+#
+# MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS
+# Total number of patients with at least one event 98 (67.12%) 81 (52.60%)
+# Total number of events 273 142
+# BACK PAIN 58 45
+# WEAKNESS 40 36
+#
+# NERVOUS SYSTEM DISORDERS
+# Total number of patients with at least one event 113 (77.40%) 133 (86.36%)
+# Total number of events 787 420
+# HEADACHE 113 133
+#
+# VASCULAR DISORDERS
+# Total number of patients with at least one event 93 (63.70%) 75 (48.70%)
+# Total number of events 240 122
+# HYPOTENSION 44 31
+# ORTHOSTATIC HYPOTENSION 49 44
Pruning is a larger topic covered in a separate rtables package vignette.
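As a brief, hedged pointer in that direction: besides trim_rows(), rtables also provides prune_table(), which walks the table tree and drops rows or subtables for which a supplied predicate returns TRUE. A minimal sketch, assuming adae_soc_tbl4 from above (argument names per our reading of the rtables documentation; consult ?prune_table):

```r
# Sketch only: prune with the all_zero_or_na() predicate shipped with rtables,
# which here should give a result similar to trim_rows(adae_soc_tbl4).
prune_table(adae_soc_tbl4, prune_func = all_zero_or_na)
```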
The adverse events table by ID and by grade shows how many patients had at least one adverse event per grade for different subsets of the data (e.g., defined by system organ class).
+For this table we do not show the zero-count grades. Note that the “--Any Grade--” overall rows are added within the custom analysis function.
+
+table_count_grade_once_per_id <- function(df,
+ labelstr = "",
+ gradevar = "AETOXGR",
+ idvar = "USUBJID",
+ grade_levels = NULL) {
+ id <- df[[idvar]]
+ grade <- df[[gradevar]]
+
+ if (!is.null(grade_levels)) {
+ stopifnot(all(grade %in% grade_levels))
+ grade <- factor(grade, levels = grade_levels)
+ }
+
+ id_sel <- !duplicated(id)
+
+ in_rows(
+ "--Any Grade--" = sum(id_sel),
+ .list = as.list(table(grade[id_sel]))
+ )
+}
+
+table_count_grade_once_per_id(ex_adae, grade_levels = 1:5)
# RowsVerticalSection (in_rows) object print method:
+# ----------------------------
+# row_name formatted_cell indent_mod row_label
+# 1 --Any Grade-- 365 0 --Any Grade--
+# 2 1 131 0 1
+# 3 2 70 0 2
+# 4 3 74 0 3
+# 5 4 25 0 4
+# 6 5 65 0 5
+All of the layouting concepts needed to create this table have already been introduced:
+
+adae_grade_lyt <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("ARM") %>%
+ analyze(
+ "AETOXGR",
+ afun = table_count_grade_once_per_id,
+ extra_args = list(grade_levels = 1:5),
+ var_labels = "- Any adverse events -",
+ show_labels = "visible"
+ ) %>%
+ split_rows_by("AEBODSYS", child_labels = "visible", indent_mod = 1) %>%
+ summarize_row_groups(cfun = table_count_grade_once_per_id, format = "xx", indent_mod = 1) %>%
+ split_rows_by("AEDECOD", child_labels = "visible", indent_mod = -2) %>%
+ analyze(
+ "AETOXGR",
+ afun = table_count_grade_once_per_id,
+ extra_args = list(grade_levels = 1:5),
+ show_labels = "hidden"
+ )
+
+adae_grade_tbl <- build_table(adae_grade_lyt, ADAE2, alt_counts_df = ADSL2)
+adae_grade_tbl
# ARM A ARM B
+# (N=146) (N=154)
+# —————————————————————————————————————————————————————————————————————
+# - Any adverse events -
+# --Any Grade-- 114 150
+# 1 32 34
+# 2 22 30
+# 3 11 21
+# 4 8 6
+# 5 41 59
+# GASTROINTESTINAL DISORDERS
+# --Any Grade-- 114 130
+# 1 77 96
+# 2 37 34
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# ABDOMINAL DISCOMFORT
+# --Any Grade-- 68 49
+# 1 68 49
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# ABDOMINAL FULLNESS DUE TO GAS
+# --Any Grade-- 73 51
+# 1 73 51
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# BACK PAIN
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# DIARRHEA
+# --Any Grade-- 68 40
+# 1 68 40
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# FAECES SOFT
+# --Any Grade-- 76 44
+# 1 0 0
+# 2 76 44
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# GINGIVAL BLEEDING
+# --Any Grade-- 80 52
+# 1 80 52
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# HEADACHE
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# HYPOTENSION
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# NAUSEA (INTERMITTENT)
+# --Any Grade-- 83 50
+# 1 0 0
+# 2 83 50
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# ORTHOSTATIC HYPOTENSION
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# WEAKNESS
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS
+# --Any Grade-- 98 81
+# 1 0 0
+# 2 58 45
+# 3 40 36
+# 4 0 0
+# 5 0 0
+# ABDOMINAL DISCOMFORT
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# ABDOMINAL FULLNESS DUE TO GAS
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# BACK PAIN
+# --Any Grade-- 79 62
+# 1 0 0
+# 2 79 62
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# DIARRHEA
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# FAECES SOFT
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# GINGIVAL BLEEDING
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# HEADACHE
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# HYPOTENSION
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# NAUSEA (INTERMITTENT)
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# ORTHOSTATIC HYPOTENSION
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# WEAKNESS
+# --Any Grade-- 73 43
+# 1 0 0
+# 2 0 0
+# 3 73 43
+# 4 0 0
+# 5 0 0
+# NERVOUS SYSTEM DISORDERS
+# --Any Grade-- 113 133
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 113 133
+# ABDOMINAL DISCOMFORT
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# ABDOMINAL FULLNESS DUE TO GAS
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# BACK PAIN
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# DIARRHEA
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# FAECES SOFT
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# GINGIVAL BLEEDING
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# HEADACHE
+# --Any Grade-- 113 133
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 113 133
+# HYPOTENSION
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# NAUSEA (INTERMITTENT)
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# ORTHOSTATIC HYPOTENSION
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# WEAKNESS
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# VASCULAR DISORDERS
+# --Any Grade-- 93 75
+# 1 0 0
+# 2 0 0
+# 3 44 31
+# 4 49 44
+# 5 0 0
+# ABDOMINAL DISCOMFORT
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# ABDOMINAL FULLNESS DUE TO GAS
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# BACK PAIN
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# DIARRHEA
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# FAECES SOFT
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# GINGIVAL BLEEDING
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# HEADACHE
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# HYPOTENSION
+# --Any Grade-- 66 43
+# 1 0 0
+# 2 0 0
+# 3 66 43
+# 4 0 0
+# 5 0 0
+# NAUSEA (INTERMITTENT)
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+# ORTHOSTATIC HYPOTENSION
+# --Any Grade-- 70 54
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 70 54
+# 5 0 0
+# WEAKNESS
+# --Any Grade-- 0 0
+# 1 0 0
+# 2 0 0
+# 3 0 0
+# 4 0 0
+# 5 0 0
+The response table that we will create here is composed of 3 parts:
+Let’s start with the first part, which is fairly simple to derive:
+
+ADRS_BESRSPI <- ex_adrs %>%
+ filter(PARAMCD == "BESRSPI") %>%
+ mutate(
+ rsp = factor(AVALC %in% c("CR", "PR"), levels = c(TRUE, FALSE), labels = c("Responders", "Non-Responders")),
+ is_rsp = (rsp == "Responders")
+ )
+
+s_proportion <- function(x, .N_col) {
+ in_rows(
+ .list = lapply(
+ as.list(table(x)),
+ function(xi) rcell(xi * c(1, 1 / .N_col), format = "xx.xx (xx.xx%)")
+ )
+ )
+}
+
+rsp_lyt <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("ARMCD", ref_group = "ARM A") %>%
+ analyze("rsp", s_proportion, show_labels = "hidden")
+
+rsp_tbl <- build_table(rsp_lyt, ADRS_BESRSPI)
+rsp_tbl
# ARM A ARM B ARM C
+# (N=134) (N=134) (N=132)
+# ———————————————————————————————————————————————————————————————————
+# Responders 114.00 (85.07%) 90.00 (67.16%) 120.00 (90.91%)
+# Non-Responders 20.00 (14.93%) 44.00 (32.84%) 12.00 (9.09%)
+Note that we set the ref_group argument in split_cols_by(), which has no effect on the current table since we only use the cell data for the responder and non-responder counts. The ref_group argument is needed for parts 2 and 3 of the table.
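To make the mechanics concrete, here is a minimal sketch (not part of the table being built) of how ref_group feeds an analysis function: once ref_group is set, rtables passes .ref_group and .in_ref_col to any afun that declares them, as the vignette's own s_unstrat_resp does below.

```r
# Minimal illustration: an afun comparing each column against the
# reference column declared via ref_group in split_cols_by().
lyt_ref <- basic_table() %>%
  split_cols_by("ARMCD", ref_group = "ARM A") %>%
  analyze("is_rsp", function(x, .ref_group, .in_ref_col) {
    in_rows(
      "Response rate diff vs ARM A (%)" = rcell(
        if (.in_ref_col) numeric(0) else (mean(x) - mean(.ref_group)) * 100,
        format = "xx.xx"
      )
    )
  })
build_table(lyt_ref, ADRS_BESRSPI)
```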
We will now look at the implementation of part 2: the unstratified analysis comparison vs. the control group. Let’s start with the analysis function:
+
+s_unstrat_resp <- function(x, .ref_group, .in_ref_col) {
+ if (.in_ref_col) {
+ return(in_rows(
+ "Difference in Response Rates (%)" = rcell(numeric(0)),
+ "95% CI (Wald, with correction)" = rcell(numeric(0)),
+ "p-value (Chi-Squared Test)" = rcell(numeric(0)),
+ "Odds Ratio (95% CI)" = rcell(numeric(0))
+ ))
+ }
+
+ fit <- stats::prop.test(
+ x = c(sum(x), sum(.ref_group)),
+ n = c(length(x), length(.ref_group)),
+ correct = FALSE
+ )
+
+ fit_glm <- stats::glm(
+ formula = rsp ~ group,
+ data = data.frame(
+ rsp = c(.ref_group, x),
+ group = factor(rep(c("ref", "x"), times = c(length(.ref_group), length(x))), levels = c("ref", "x"))
+ ),
+ family = binomial(link = "logit")
+ )
+
+ in_rows(
+ "Difference in Response Rates (%)" = non_ref_rcell(
+ (mean(x) - mean(.ref_group)) * 100,
+ .in_ref_col,
+ format = "xx.xx"
+ ),
+ "95% CI (Wald, with correction)" = non_ref_rcell(
+ fit$conf.int * 100,
+ .in_ref_col,
+ format = "(xx.xx, xx.xx)"
+ ),
+ "p-value (Chi-Squared Test)" = non_ref_rcell(
+ fit$p.value,
+ .in_ref_col,
+ format = "x.xxxx | (<0.0001)"
+ ),
+ "Odds Ratio (95% CI)" = non_ref_rcell(
+ c(
+ exp(stats::coef(fit_glm)[-1]),
+ exp(stats::confint.default(fit_glm, level = .95)[-1, , drop = FALSE])
+ ),
+ .in_ref_col,
+ format = "xx.xx (xx.xx - xx.xx)"
+ )
+ )
+}
+
+s_unstrat_resp(
+ x = ADRS_BESRSPI %>% filter(ARM == "A: Drug X") %>% pull(is_rsp),
+ .ref_group = ADRS_BESRSPI %>% filter(ARM == "B: Placebo") %>% pull(is_rsp),
+ .in_ref_col = FALSE
+)
# RowsVerticalSection (in_rows) object print method:
+# ----------------------------
+# row_name formatted_cell indent_mod
+# 1 Difference in Response Rates (%) 17.91 0
+# 2 95% CI (Wald, with correction) (7.93, 27.89) 0
+# 3 p-value (Chi-Squared Test) 0.0006 0
+# 4 Odds Ratio (95% CI) 2.79 (1.53 - 5.06) 0
+# row_label
+# 1 Difference in Response Rates (%)
+# 2 95% CI (Wald, with correction)
+# 3 p-value (Chi-Squared Test)
+# 4 Odds Ratio (95% CI)
+Hence we can now add the next part to the table:
+
+rsp_lyt2 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("ARMCD", ref_group = "ARM A") %>%
+ analyze("rsp", s_proportion, show_labels = "hidden") %>%
+ analyze(
+ "is_rsp", s_unstrat_resp,
+ show_labels = "visible",
+ var_labels = "Unstratified Response Analysis"
+ )
+
+rsp_tbl2 <- build_table(rsp_lyt2, ADRS_BESRSPI)
+rsp_tbl2
# ARM A ARM B ARM C
+# (N=134) (N=134) (N=132)
+# ——————————————————————————————————————————————————————————————————————————————————————————————
+# Responders 114.00 (85.07%) 90.00 (67.16%) 120.00 (90.91%)
+# Non-Responders 20.00 (14.93%) 44.00 (32.84%) 12.00 (9.09%)
+# Unstratified Response Analysis
+# Difference in Response Rates (%) -17.91 5.83
+# 95% CI (Wald, with correction) (-27.89, -7.93) (-1.94, 13.61)
+# p-value (Chi-Squared Test) 0.0006 0.1436
+# Odds Ratio (95% CI) 0.36 (0.20 - 0.65) 1.75 (0.82 - 3.75)
+Next we will add part 3: the multinomial response table. To do so, we add a row split by response level and then do the same thing as we did for the binary response table above.
+
+s_prop <- function(df, .N_col) {
+ in_rows(
+ "95% CI (Wald, with correction)" = rcell(binom.test(nrow(df), .N_col)$conf.int * 100, format = "(xx.xx, xx.xx)")
+ )
+}
+
+s_prop(
+ df = ADRS_BESRSPI %>% filter(ARM == "A: Drug X", AVALC == "CR"),
+ .N_col = sum(ADRS_BESRSPI$ARM == "A: Drug X")
+)
# RowsVerticalSection (in_rows) object print method:
+# ----------------------------
+# row_name formatted_cell indent_mod
+# 1 95% CI (Wald, with correction) (49.38, 66.67) 0
+# row_label
+# 1 95% CI (Wald, with correction)
+We can now create the final response table with all three parts:
+
+rsp_lyt3 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("ARMCD", ref_group = "ARM A") %>%
+ analyze("rsp", s_proportion, show_labels = "hidden") %>%
+ analyze(
+ "is_rsp", s_unstrat_resp,
+ show_labels = "visible", var_labels = "Unstratified Response Analysis"
+ ) %>%
+ split_rows_by(
+ var = "AVALC",
+ split_fun = reorder_split_levels(neworder = c("CR", "PR", "SD", "NON CR/PD", "PD", "NE"), drlevels = TRUE),
+ nested = FALSE
+ ) %>%
+ summarize_row_groups() %>%
+ analyze("AVALC", afun = s_prop)
+
+rsp_tbl3 <- build_table(rsp_lyt3, ADRS_BESRSPI)
+rsp_tbl3
# ARM A ARM B ARM C
+# (N=134) (N=134) (N=132)
+# ——————————————————————————————————————————————————————————————————————————————————————————————
+# Responders 114.00 (85.07%) 90.00 (67.16%) 120.00 (90.91%)
+# Non-Responders 20.00 (14.93%) 44.00 (32.84%) 12.00 (9.09%)
+# Unstratified Response Analysis
+# Difference in Response Rates (%) -17.91 5.83
+# 95% CI (Wald, with correction) (-27.89, -7.93) (-1.94, 13.61)
+# p-value (Chi-Squared Test) 0.0006 0.1436
+# Odds Ratio (95% CI) 0.36 (0.20 - 0.65) 1.75 (0.82 - 3.75)
+# CR 78 (58.2%) 55 (41.0%) 97 (73.5%)
+# 95% CI (Wald, with correction) (49.38, 66.67) (32.63, 49.87) (65.10, 80.79)
+# PR 36 (26.9%) 35 (26.1%) 23 (17.4%)
+# 95% CI (Wald, with correction) (19.58, 35.20) (18.92, 34.41) (11.38, 24.99)
+# SD 20 (14.9%) 44 (32.8%) 12 (9.1%)
+# 95% CI (Wald, with correction) (9.36, 22.11) (24.97, 41.47) (4.79, 15.34)
+If we wanted to rename the levels of AVALC and remove the CI for NE, we could do that as follows:
+rsp_label <- function(x) {
+ rsp_full_label <- c(
+ CR = "Complete Response (CR)",
+ PR = "Partial Response (PR)",
+ SD = "Stable Disease (SD)",
+ `NON CR/PD` = "Non-CR or Non-PD (NON CR/PD)",
+ PD = "Progressive Disease (PD)",
+ NE = "Not Evaluable (NE)",
+ Missing = "Missing",
+ `NE/Missing` = "Missing or unevaluable"
+ )
+ stopifnot(all(x %in% names(rsp_full_label)))
+ rsp_full_label[x]
+}
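As a quick usage check of the helper defined above, the short codes are mapped to their full display labels:

```r
# Look up full labels for a couple of short response codes;
# returns a named character vector:
# c(CR = "Complete Response (CR)", NE = "Not Evaluable (NE)")
rsp_label(c("CR", "NE"))
```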
+
+
+rsp_lyt4 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("ARMCD", ref_group = "ARM A") %>%
+ analyze("rsp", s_proportion, show_labels = "hidden") %>%
+ analyze(
+ "is_rsp", s_unstrat_resp,
+ show_labels = "visible", var_labels = "Unstratified Response Analysis"
+ ) %>%
+ split_rows_by(
+ var = "AVALC",
+ split_fun = keep_split_levels(c("CR", "PR", "SD", "PD"), reorder = TRUE),
+ nested = FALSE
+ ) %>%
+ summarize_row_groups(cfun = function(df, labelstr, .N_col) {
+ in_rows(nrow(df) * c(1, 1 / .N_col), .formats = "xx (xx.xx%)", .labels = rsp_label(labelstr))
+ }) %>%
+ analyze("AVALC", afun = s_prop) %>%
+ analyze("AVALC", afun = function(x, .N_col) {
+ in_rows(rcell(sum(x == "NE") * c(1, 1 / .N_col), format = "xx.xx (xx.xx%)"), .labels = rsp_label("NE"))
+ }, nested = FALSE)
+
+rsp_tbl4 <- build_table(rsp_lyt4, ADRS_BESRSPI)
+rsp_tbl4
# ARM A ARM B ARM C
+# (N=134) (N=134) (N=132)
+# ——————————————————————————————————————————————————————————————————————————————————————————————
+# Responders 114.00 (85.07%) 90.00 (67.16%) 120.00 (90.91%)
+# Non-Responders 20.00 (14.93%) 44.00 (32.84%) 12.00 (9.09%)
+# Unstratified Response Analysis
+# Difference in Response Rates (%) -17.91 5.83
+# 95% CI (Wald, with correction) (-27.89, -7.93) (-1.94, 13.61)
+# p-value (Chi-Squared Test) 0.0006 0.1436
+# Odds Ratio (95% CI) 0.36 (0.20 - 0.65) 1.75 (0.82 - 3.75)
+# Complete Response (CR) 78 (58.21%) 55 (41.04%) 97 (73.48%)
+# 95% CI (Wald, with correction) (49.38, 66.67) (32.63, 49.87) (65.10, 80.79)
+# Partial Response (PR) 36 (26.87%) 35 (26.12%) 23 (17.42%)
+# 95% CI (Wald, with correction) (19.58, 35.20) (18.92, 34.41) (11.38, 24.99)
+# Stable Disease (SD) 20 (14.93%) 44 (32.84%) 12 (9.09%)
+# 95% CI (Wald, with correction) (9.36, 22.11) (24.97, 41.47) (4.79, 15.34)
+# Progressive Disease (PD) 0 (0.00%) 0 (0.00%) 0 (0.00%)
+# 95% CI (Wald, with correction) (0.00, 2.72) (0.00, 2.72) (0.00, 2.76)
+# Not Evaluable (NE) 0.00 (0.00%) 0.00 (0.00%) 0.00 (0.00%)
+Note that the table is missing the row gaps that would make it more readable. The row spacing feature is on the rtables roadmap and will be implemented in the future.
The time to event analysis table that will be constructed consists of four parts:
+The table is constructed by sequential use of the analyze() function, with four custom analysis functions corresponding to the four parts listed above. In addition, the table includes referential footnotes relevant to the table contents. The table will be faceted column-wise by arm.
First we will start by loading the necessary packages and preparing the data to be used in the construction of this table.
+
+library(survival)
+
+adtte <- ex_adaette %>%
+ dplyr::filter(PARAMCD == "AETTE2", SAFFL == "Y")
+
+# Add censoring to data for example
+adtte[adtte$AVAL > 1.0, ] <- adtte[adtte$AVAL > 1.0, ] %>% mutate(AVAL = 1.0, CNSR = 1)
+
+adtte2 <- adtte %>%
+ mutate(CNSDTDSC = ifelse(CNSDTDSC == "", "__none__", CNSDTDSC))
The adtte dataset will be used in preparing the models, while the adtte2 dataset handles missing values in the “Censor Date Description” column and will be used to produce the final table. We add censoring to the data for example purposes.
Next we create a basic analysis function, a_count_subjs, which prints the overall unique subject counts and percentages within the data.
+a_count_subjs <- function(x, .N_col) {
+ in_rows(
+ "Subjects with Adverse Events n (%)" = rcell(length(unique(x)) * c(1, 1 / .N_col), format = "xx (xx.xx%)")
+ )
+}
Then an analysis function is created to generate the counts of censored subjects for each level of a factor variable in the dataset. In this case the cnsr_counter function will be applied to the CNSDTDSC variable, which contains a censor date description for each censored subject.
+cnsr_counter <- function(df, .var, .N_col) {
+ x <- df[!duplicated(df$USUBJID), .var]
+ x <- x[x != "__none__"]
+ lapply(table(x), function(xi) rcell(xi * c(1, 1 / .N_col), format = "xx (xx.xx%)"))
+}
This function generates counts and fractions of unique subjects corresponding to each factor level, excluding missing values (uncensored patients).
+A Cox proportional-hazards (Cox P-H) analysis is generated next with a third custom analysis function, a_cph. Prior to creating the analysis function, the Cox P-H model is fit to our data using the coxph() and Surv() functions from the survival package. This model is then used as input to the a_cph analysis function, which returns hazard ratios, 95% confidence intervals, and p-values comparing each arm against the reference group, in this case the leftmost column.
+cph <- coxph(Surv(AVAL, CNSR == 0) ~ ACTARM + STRATA1, ties = "exact", data = adtte)
+
+a_cph <- function(df, .var, .in_ref_col, .ref_full, full_cox_fit) {
+ if (.in_ref_col) {
+ ret <- replicate(3, list(rcell(NULL)))
+ } else {
+ curtrt <- df[[.var]][1]
+ coefs <- coef(full_cox_fit)
+ sel_pos <- grep(curtrt, names(coefs), fixed = TRUE)
+ hrval <- exp(coefs[sel_pos])
+ sdf <- survdiff(Surv(AVAL, CNSR == 0) ~ ACTARM + STRATA1, data = rbind(df, .ref_full))
+ pval <- (1 - pchisq(sdf$chisq, length(sdf$n) - 1)) / 2
+ ci_val <- exp(unlist(confint(full_cox_fit)[sel_pos, ]))
+ ret <- list(
+ rcell(hrval, format = "xx.x"),
+ rcell(ci_val, format = "(xx.x, xx.x)"),
+ rcell(pval, format = "x.xxxx | (<0.0001)")
+ )
+ }
+ in_rows(
+ .list = ret,
+ .names = c("Hazard ratio", "95% confidence interval", "p-value (one-sided stratified log rank)")
+ )
+}
The fourth and final analysis function, a_tte, generates a time-to-first-adverse-event table with three rows corresponding to Median, 95% Confidence Interval, and Min Max, respectively. First a survival table is constructed from the summary table of a survival model using the survfit() and Surv() functions from the survival package. This table is then given as input to a_tte, which produces the table of time to first adverse event consisting of the previously mentioned summary statistics.
+surv_tbl <- as.data.frame(
+ summary(survfit(Surv(AVAL, CNSR == 0) ~ ACTARM, data = adtte, conf.type = "log-log"))$table
+) %>%
+ dplyr::mutate(
+ ACTARM = factor(gsub("ACTARM=", "", row.names(.)), levels = levels(adtte$ACTARM)),
+ ind = FALSE
+ )
+
+a_tte <- function(df, .var, kp_table) {
+ ind <- grep(df[[.var]][1], row.names(kp_table), fixed = TRUE)
+ minmax <- range(df[["AVAL"]])
+ mm_val_str <- format_value(minmax, format = "xx.x, xx.x")
+ rowfn <- list()
+ if (all(df$CNSR[df$AVAL == minmax[2]])) {
+ mm_val_str <- paste0(mm_val_str, "*")
+ rowfn <- "* indicates censoring"
+ }
+ in_rows(
+ Median = kp_table[ind, "median", drop = TRUE],
+ "95% confidence interval" = unlist(kp_table[ind, c("0.95LCL", "0.95UCL")]),
+ "Min Max" = mm_val_str,
+ .formats = c("xx.xx", "xx.xx - xx.xx", "xx"),
+ .row_footnotes = list(NULL, NULL, rowfn)
+ )
+}
Additionally, the a_tte function creates a referential footnote within the table to indicate where censoring occurred in the data.
Now we are able to use these four analysis functions to build our time to event analysis table.
+
+lyt <- basic_table(show_colcounts = TRUE) %>%
+ ## Column faceting
+ split_cols_by("ARM", ref_group = "A: Drug X") %>%
+ ## Overall count
+ analyze("USUBJID", a_count_subjs, show_labels = "hidden") %>%
+ ## Censored subjects summary
+ analyze("CNSDTDSC", cnsr_counter, var_labels = "Censored Subjects", show_labels = "visible") %>%
+ ## Cox P-H analysis
+ analyze("ARM", a_cph, extra_args = list(full_cox_fit = cph), show_labels = "hidden") %>%
+ ## Time-to-event analysis
+ analyze(
+ "ARM", a_tte,
+ var_labels = "Time to first adverse event", show_labels = "visible",
+ extra_args = list(kp_table = surv_tbl),
+ table_names = "kapmeier"
+ )
+
+tbl_tte <- build_table(lyt, adtte2)
We set the show_colcounts argument of basic_table() to TRUE to first print the total subject counts for each column. Next we use split_cols_by() to split the table into three columns corresponding to the three levels of ARM, specifying that the first arm, "A: Drug X", should act as the reference group to be compared against; this reference group is used for the Cox P-H analysis. Then we call analyze() sequentially, using each of the four custom analysis functions as argument afun and specifying additional arguments where necessary. Finally we use build_table() to construct our rtable using the adtte2 dataset.
Finally, we annotate the table using the fnotes_at_path() function to specify that product-limit estimates are used to calculate the statistics listed under the “Time to first adverse event” heading within the table. The referential footnote created earlier in the time-to-event analysis function (a_tte) is also displayed.
+fnotes_at_path(
+ tbl_tte,
+ c("ma_USUBJID_CNSDTDSC_ARM_kapmeier", "kapmeier")
+) <- "Product-limit (Kaplan-Meier) estimates."
+
+tbl_tte
# A: Drug X B: Placebo C: Combination
+# (N=134) (N=134) (N=132)
+# ————————————————————————————————————————————————————————————————————————————————————————
+# Subjects with Adverse Events n (%) 134 (100.00%) 134 (100.00%) 132 (100.00%)
+# Censored Subjects
+# Clinical Cut Off 6 (4.48%) 3 (2.24%) 14 (10.61%)
+# Completion or Discontinuation 9 (6.72%) 5 (3.73%) 9 (6.82%)
+# End of AE Reporting Period 14 (10.45%) 7 (5.22%) 14 (10.61%)
+# Preferred Term 11 (8.21%) 5 (3.73%) 13 (9.85%)
+# Hazard ratio 0.7 1.0
+# 95% confidence interval (0.5, 0.9) (0.8, 1.4)
+# p-value (one-sided stratified log rank) 0.1070 0.4880
+# Time to first adverse event {1}
+# Median 0.23 0.39 0.29
+# 95% confidence interval 0.18 - 0.33 0.29 - 0.49 0.22 - 0.35
+# Min Max {2} 0.0, 1.0* 0.0, 1.0* 0.0, 1.0*
+# ————————————————————————————————————————————————————————————————————————————————————————
+#
+# {1} - Product-limit (Kaplan-Meier) estimates.
+# {2} - * indicates censoring
+# ————————————————————————————————————————————————————————————————————————————————————————
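Note that fnotes_at_path() requires the path of the target element within the table tree. When the path is not obvious, the structure can be inspected first with the rtables introspection helpers; a short, hedged aside (see ?table_structure and ?row_paths_summary for details):

```r
# Display the tree structure, and hence the valid paths, of the built table:
table_structure(tbl_tte)

# Summaries of the row and column paths:
row_paths_summary(tbl_tte)
col_paths_summary(tbl_tte)
```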
+vignettes/custom_appearance.Rmd
In this vignette, we describe the various ways we can modify and customize the appearance of rtables.
Loading the package:
+It is possible to align cell content by assigning "left", "center" (the default), or "right" to the .aligns and align arguments of in_rows() and rcell(), respectively. It is also possible to use decimal, dec_right, and dec_left for decimal alignment. With align = "decimal", the decimal character . is aligned across all numeric values in the column; numbers without a decimal part are aligned according to an imaginary trailing .. dec_left and dec_right behave similarly, with the difference that when the column has empty space at the left or right, values are pushed towards that side, anchored on the value with the most decimal digits (for dec_right) or the most non-decimal digits (for dec_left). For more details, please read the related documentation page, help("decimal_align").
Please consider using ?in_rows and ?rcell for further clarification on the two arguments, and use formatters::list_valid_aligns() to see all available alignment options.
In the following we show two simplified examples that use align and .aligns, respectively.
+# In rcell we use align.
+lyt <- basic_table() %>%
+ analyze("AGE", function(x) {
+ in_rows(
+ left = rcell("l", align = "left"),
+ right = rcell("r", align = "right"),
+ center = rcell("c", align = "center")
+ )
+ })
+
+tbl <- build_table(lyt, DM)
+tbl
# all obs
+# ————————————————
+# left l
+# right r
+# center c
+
+# In in_rows, we use .aligns. This can either set the general value or the
+# single values (see NB).
+lyt2 <- basic_table() %>%
+ analyze("AGE", function(x) {
+ in_rows(
+ left = rcell("l"),
+ right = rcell("r"),
+ center = rcell("c"),
+ .aligns = c("right")
+ ) # NB: .aligns = c("right", "left", "center")
+ })
+
+tbl2 <- build_table(lyt2, DM)
+tbl2
# all obs
+# ————————————————
+# left l
+# right r
+# center c
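Neither example above exercises the decimal alignments. A minimal, hedged sketch (values and formats chosen purely for illustration) of aligning a column on the decimal point:

```r
# Decimal alignment: the decimal points of 1.2, 11.25, and 111 line up
# in the rendered column.
lyt_dec <- basic_table() %>%
  analyze("AGE", function(x) {
    in_rows(
      a = rcell(1.2, format = "xx.x"),
      b = rcell(11.25, format = "xx.xx"),
      c = rcell(111, format = "xx"),
      .aligns = "decimal"
    )
  })
build_table(lyt_dec, DM)
```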
+These concepts can be applied to any clinical table, as shown in the following, more complex, example.
+
+lyt3 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze(c("AGE", "STRATA1"), function(x) {
+ if (is.numeric(x)) {
+ in_rows(
+ "mean" = rcell(mean(x)),
+ "sd" = rcell(sd(x)),
+ .formats = c("xx.x"), .aligns = "left"
+ )
+ } else if (is.factor(x)) {
+ rcell(length(unique(x)), align = "right")
+ } else {
+ stop("Unsupported type")
+ }
+ }, show_labels = "visible", na_str = "NE")
+
+tbl3 <- build_table(lyt3, ex_adsl)
+tbl3
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————————————
+# F
+# AGE
+# mean 32.8 34.1 35.2
+# sd 6.1 7.1 7.4
+# STRATA1
+# STRATA1 3 3 3
+# M
+# AGE
+# mean 35.6 37.4 35.4
+# sd 7.1 8.7 8.2
+# STRATA1
+# STRATA1 3 3 3
+# U
+# AGE
+# mean 31.7 31.0 35.2
+# sd 3.2 5.7 3.1
+# STRATA1
+# STRATA1 3 2 3
+# UNDIFFERENTIATED
+# AGE
+# mean 28.0 NE 45.0
+# sd NE NE 1.4
+# STRATA1
+# STRATA1 1 0 2
+The sequence of strings printed in the area between the column header display and the first row label can be modified during pre-processing using the label_pos argument of the row splits (split_rows_by()) or with the append_topleft() function, and during post-processing using the top_left() function. Note: indenting is added automatically when label_pos = "topleft".
Within the layout initializer:
+
+lyt <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("STRATA1") %>%
+ analyze("AGE") %>%
+ append_topleft("New top_left material here")
+
+build_table(lyt, DM)
# New top_left material here A: Drug X B: Placebo C: Combination
+# ————————————————————————————————————————————————————————————————————
+# A
+# Mean 32.53 32.30 35.76
+# B
+# Mean 35.46 32.42 34.39
+# C
+# Mean 36.34 34.45 33.54
+Specify the label position using the split_rows_by() function. Notice the position of STRATA1 and SEX.
+lyt <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("STRATA1", label_pos = "topleft") %>%
+ split_rows_by("SEX", label_pos = "topleft") %>%
+ analyze("AGE")
+
+build_table(lyt, DM)
# STRATA1
+# SEX A: Drug X B: Placebo C: Combination
+# ————————————————————————————————————————————————————————————
+# A
+# F
+# Mean 30.91 32.91 35.95
+# M
+# Mean 35.07 31.09 35.60
+# U
+# Mean NA NA NA
+# UNDIFFERENTIATED
+# Mean NA NA NA
+# B
+# F
+# Mean 34.85 32.88 34.42
+# M
+# Mean 36.64 32.09 34.37
+# U
+# Mean NA NA NA
+# UNDIFFERENTIATED
+# Mean NA NA NA
+# C
+# F
+# Mean 35.19 36.00 34.32
+# M
+# Mean 37.39 32.81 32.83
+# U
+# Mean NA NA NA
+# UNDIFFERENTIATED
+# Mean NA NA NA
+Post-processing using the top_left()
function:
+lyt <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze(c("AGE", "STRATA1"), function(x) {
+ if (is.numeric(x)) {
+ in_rows(
+ "mean" = rcell(mean(x)),
+ "sd" = rcell(sd(x)),
+ .formats = c("xx.x"), .aligns = "left"
+ )
+ } else if (is.factor(x)) {
+ rcell(length(unique(x)), align = "right")
+ } else {
+ stop("Unsupported type")
+ }
+ }, show_labels = "visible", na_str = "NE") %>%
+ build_table(ex_adsl)
+
+# Adding top-left material
+top_left(lyt) <- "New top-left material here"
+
+lyt
# New top-left material here A: Drug X B: Placebo C: Combination
+# ————————————————————————————————————————————————————————————————————
+# F
+# AGE
+# mean 32.8 34.1 35.2
+# sd 6.1 7.1 7.4
+# STRATA1
+# STRATA1 3 3 3
+# M
+# AGE
+# mean 35.6 37.4 35.4
+# sd 7.1 8.7 8.2
+# STRATA1
+# STRATA1 3 3 3
+# U
+# AGE
+# mean 31.7 31.0 35.2
+# sd 3.2 5.7 3.1
+# STRATA1
+# STRATA1 3 2 3
+# UNDIFFERENTIATED
+# AGE
+# mean 28.0 NE 45.0
+# sd NE NE 1.4
+# STRATA1
+# STRATA1 1 0 2
+Table title, table body, referential footnotes, and main footers can
+be inset from the left alignment of the titles and provenance footer
+materials. This can be modified within the layout initializer
+basic_table() using the inset argument, or during post-processing with
+table_inset().
Using the layout initializer:
+
+lyt <- basic_table(inset = 5) %>%
+ analyze("AGE")
+
+build_table(lyt, DM)
# all obs
+# ——————————————
+# Mean 34.22
+Using the post-processing function:
+Without inset -
+
+lyt <- basic_table() %>%
+ analyze("AGE")
+
+tbl <- build_table(lyt, DM)
+tbl
# all obs
+# ——————————————
+# Mean 34.22
+With an inset of 5 characters -
+
+table_inset(tbl) <- 5
+tbl
# all obs
+# ——————————————
+# Mean 34.22
+Below is an example with a table produced for clinical data. Compare +the inset of the table and main footer between the two tables.
+Without inset -
+
+analysisfun <- function(x, ...) {
+ in_rows(
+ row1 = 5,
+ row2 = c(1, 2),
+ .row_footnotes = list(row1 = "row 1 rfn"),
+ .cell_footnotes = list(row2 = "row 2 cfn")
+ )
+}
+
+lyt <- basic_table(
+ title = "Title says Whaaaat", subtitles = "Oh, ok.",
+ main_footer = "ha HA! Footer!", prov_footer = "provenaaaaance"
+) %>%
+ split_cols_by("ARM") %>%
+ analyze("AGE", afun = analysisfun)
+
+result <- build_table(lyt, ex_adsl)
+result
# Title says Whaaaat
+# Oh, ok.
+#
+# ——————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————
+# row1 {1} 5 5 5
+# row2 1, 2 {2} 1, 2 {2} 1, 2 {2}
+# ——————————————————————————————————————————————————
+#
+# {1} - row 1 rfn
+# {2} - row 2 cfn
+# ——————————————————————————————————————————————————
+#
+# ha HA! Footer!
+#
+# provenaaaaance
+With inset -
+Notice that the inset does not apply to any title materials (main title, subtitles, page titles) or provenance footer materials. The inset setting is applied to top-left materials, referential footnotes, main footer materials, and any horizontal dividers.
+
+table_inset(result) <- 5
+result
# Title says Whaaaat
+# Oh, ok.
+#
+# ——————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————
+# row1 {1} 5 5 5
+# row2 1, 2 {2} 1, 2 {2} 1, 2 {2}
+# ——————————————————————————————————————————————————
+#
+# {1} - row 1 rfn
+# {2} - row 2 cfn
+# ——————————————————————————————————————————————————
+#
+# ha HA! Footer!
+#
+# provenaaaaance
+A character value can be specified to modify the horizontal separator placed between the column headers and the table body.
+Below, we replace the default line with “=”.
+
+tbl <- basic_table() %>%
+ split_cols_by("Species") %>%
+ add_colcounts() %>%
+ analyze(c("Sepal.Length", "Petal.Width"), function(x) {
+ in_rows(
+ mean_sd = c(mean(x), sd(x)),
+ var = var(x),
+ min_max = range(x),
+ .formats = c("xx.xx (xx.xx)", "xx.xxx", "xx.x - xx.x"),
+ .labels = c("Mean (sd)", "Variance", "Min - Max")
+ )
+ }) %>%
+ build_table(iris, hsep = "=")
+tbl
# setosa versicolor virginica
+# (N=50) (N=50) (N=50)
+# ======================================================
+# Sepal.Length
+# Mean (sd) 5.01 (0.35) 5.94 (0.52) 6.59 (0.64)
+# Variance 0.124 0.266 0.404
+# Min - Max 4.3 - 5.8 4.9 - 7.0 4.9 - 7.9
+# Petal.Width
+# Mean (sd) 0.25 (0.11) 1.33 (0.20) 2.03 (0.27)
+# Variance 0.011 0.039 0.075
+# Min - Max 0.1 - 0.6 1.0 - 1.8 1.4 - 2.5
+A character value can be specified as a section divider, which succeeds every group defined by a split instruction. Note that a trailing divider at the end of the table is never printed.
+Below, a “+” is repeated and used as a section divider.
+
+lyt <- basic_table() %>%
+ split_cols_by("Species") %>%
+ analyze(head(names(iris), -1), afun = function(x) {
+ list(
+ "mean / sd" = rcell(c(mean(x), sd(x)), format = "xx.xx (xx.xx)"),
+ "range" = rcell(diff(range(x)), format = "xx.xx")
+ )
+ }, section_div = "+")
+
+build_table(lyt, iris)
# setosa versicolor virginica
+# ——————————————————————————————————————————————————————
+# Sepal.Length
+# mean / sd 5.01 (0.35) 5.94 (0.52) 6.59 (0.64)
+# range 1.50 2.10 3.00
+# ++++++++++++++++++++++++++++++++++++++++++++++++++++++
+# Sepal.Width
+# mean / sd 3.43 (0.38) 2.77 (0.31) 2.97 (0.32)
+# range 2.10 1.40 1.60
+# ++++++++++++++++++++++++++++++++++++++++++++++++++++++
+# Petal.Length
+# mean / sd 1.46 (0.17) 4.26 (0.47) 5.55 (0.55)
+# range 0.90 2.10 2.40
+# ++++++++++++++++++++++++++++++++++++++++++++++++++++++
+# Petal.Width
+# mean / sd 0.25 (0.11) 1.33 (0.20) 2.03 (0.27)
+# range 0.50 0.80 1.10
+Section dividers can be set to " " to create a blank line.
+
+lyt <- basic_table() %>%
+ split_cols_by("Species") %>%
+ analyze(head(names(iris), -1), afun = function(x) {
+ list(
+ "mean / sd" = rcell(c(mean(x), sd(x)), format = "xx.xx (xx.xx)"),
+ "range" = rcell(diff(range(x)), format = "xx.xx")
+ )
+ }, section_div = " ")
+
+build_table(lyt, iris)
# setosa versicolor virginica
+# ——————————————————————————————————————————————————————
+# Sepal.Length
+# mean / sd 5.01 (0.35) 5.94 (0.52) 6.59 (0.64)
+# range 1.50 2.10 3.00
+#
+# Sepal.Width
+# mean / sd 3.43 (0.38) 2.77 (0.31) 2.97 (0.32)
+# range 2.10 1.40 1.60
+#
+# Petal.Length
+# mean / sd 1.46 (0.17) 4.26 (0.47) 5.55 (0.55)
+# range 0.90 2.10 2.40
+#
+# Petal.Width
+# mean / sd 0.25 (0.11) 1.33 (0.20) 2.03 (0.27)
+# range 0.50 0.80 1.10
+Separation characters can be specified for different row splits. However, only one is printed if they "pile up" next to each other.
+
+lyt <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("RACE", section_div = "=") %>%
+ split_rows_by("STRATA1", section_div = "~") %>%
+ analyze("AGE", mean, var_labels = "Age", format = "xx.xx")
+
+build_table(lyt, DM)
# A: Drug X B: Placebo C: Combination
+# ———————————————————————————————————————————————————————————————————————————————————
+# ASIAN
+# A
+# mean 32.19 33.90 36.81
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# B
+# mean 34.12 31.62 34.73
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# C
+# mean 36.21 33.00 32.39
+# ===================================================================================
+# BLACK OR AFRICAN AMERICAN
+# A
+# mean 31.50 28.57 33.62
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# B
+# mean 35.60 30.83 33.67
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# C
+# mean 35.50 34.18 35.00
+# ===================================================================================
+# WHITE
+# A
+# mean 37.67 31.33 33.17
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# B
+# mean 39.86 39.00 34.75
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# C
+# mean 39.75 44.67 36.75
+# ===================================================================================
+# AMERICAN INDIAN OR ALASKA NATIVE
+# A
+# mean NA NA NA
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# B
+# mean NA NA NA
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# C
+# mean NA NA NA
+# ===================================================================================
+# MULTIPLE
+# A
+# mean NA NA NA
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# B
+# mean NA NA NA
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# C
+# mean NA NA NA
+# ===================================================================================
+# NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER
+# A
+# mean NA NA NA
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# B
+# mean NA NA NA
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# C
+# mean NA NA NA
+# ===================================================================================
+# OTHER
+# A
+# mean NA NA NA
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# B
+# mean NA NA NA
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# C
+# mean NA NA NA
+# ===================================================================================
+# UNKNOWN
+# A
+# mean NA NA NA
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# B
+# mean NA NA NA
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# C
+# mean NA NA NA
+Tables by default have indenting at each level of splitting. A custom
+indent value can be supplied with the indent_mod
argument
+within a split function to modify this default. Compare the indenting of
+the tables below:
Default Indent -
+
+basic_table(
+ title = "Study XXXXXXXX",
+ subtitles = c("subtitle YYYYYYYYYY", "subtitle2 ZZZZZZZZZ"),
+ main_footer = "Analysis was done using cool methods that are correct",
+ prov_footer = "file: /path/to/stuff/that/lives/there HASH:1ac41b242a"
+) %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX") %>%
+ split_rows_by("STRATA1") %>%
+ analyze("AGE", mean, format = "xx.x") %>%
+ build_table(DM)
# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+#
+# ——————————————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————————————
+# F
+# A
+# mean 30.9 32.9 36.0
+# B
+# mean 34.9 32.9 34.4
+# C
+# mean 35.2 36.0 34.3
+# M
+# A
+# mean 35.1 31.1 35.6
+# B
+# mean 36.6 32.1 34.4
+# C
+# mean 37.4 32.8 32.8
+# U
+# A
+# mean NA NA NA
+# B
+# mean NA NA NA
+# C
+# mean NA NA NA
+# UNDIFFERENTIATED
+# A
+# mean NA NA NA
+# B
+# mean NA NA NA
+# C
+# mean NA NA NA
+# ——————————————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
+Modified indent -
+
+basic_table(
+ title = "Study XXXXXXXX",
+ subtitles = c("subtitle YYYYYYYYYY", "subtitle2 ZZZZZZZZZ"),
+ main_footer = "Analysis was done using cool methods that are correct",
+ prov_footer = "file: /path/to/stuff/that/lives/there HASH:1ac41b242a"
+) %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX", indent_mod = 3) %>%
+ split_rows_by("STRATA1", indent_mod = 5) %>%
+ analyze("AGE", mean, format = "xx.x") %>%
+ build_table(DM)
# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+#
+# ——————————————————————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————————————————————
+# F
+# A
+# mean 30.9 32.9 36.0
+# B
+# mean 34.9 32.9 34.4
+# C
+# mean 35.2 36.0 34.3
+# M
+# A
+# mean 35.1 31.1 35.6
+# B
+# mean 36.6 32.1 34.4
+# C
+# mean 37.4 32.8 32.8
+# U
+# A
+# mean NA NA NA
+# B
+# mean NA NA NA
+# C
+# mean NA NA NA
+# UNDIFFERENTIATED
+# A
+# mean NA NA NA
+# B
+# mean NA NA NA
+# C
+# mean NA NA NA
+# ——————————————————————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
+With split instructions, the visibility of the label for the variable
+being split can be set to visible, hidden, or topleft with the
+show_labels argument, label_pos argument, and child_labels argument,
+where applicable. Note: this refers to the variable label, NOT the
+names of the levels contained in the variable. For analyze calls, the
+default shows the label only if multiple variables are analyzed at the
+same level of nesting.
Visibility of the labels for the groups generated by a split can also
+be modified using the child_labels argument of a split call. The
+child_labels argument can force labels to be visible in addition to
+content rows, but it cannot hide or move the content rows.
Notice the placement of the “AGE” label in this example:
+
+lyt <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by(var = "ARM") %>%
+ split_rows_by("SEX", split_fun = drop_split_levels, child_labels = "visible") %>%
+ split_rows_by("STRATA1") %>%
+ analyze("AGE", mean, show_labels = "default")
+
+build_table(lyt, DM)
# A: Drug X B: Placebo C: Combination
+# (N=121) (N=106) (N=129)
+# —————————————————————————————————————————————————————————————————
+# F
+# A
+# mean 30.9090909090909 32.9090909090909 35.95
+# B
+# mean 34.8518518518519 32.8823529411765 34.4210526315789
+# C
+# mean 35.1904761904762 36 34.3181818181818
+# M
+# A
+# mean 35.0714285714286 31.0909090909091 35.6
+# B
+# mean 36.6428571428571 32.0869565217391 34.3684210526316
+# C
+# mean 37.3913043478261 32.8125 32.8333333333333
+When set to "default", the AGE label is not repeated since only one
+variable is analyzed at this level of nesting. Override this by
+setting the show_labels argument to "visible".
+lyt2 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by(var = "ARM") %>%
+ split_rows_by("SEX", split_fun = drop_split_levels, child_labels = "hidden") %>%
+ split_rows_by("STRATA1") %>%
+ analyze("AGE", mean, show_labels = "visible")
+
+build_table(lyt2, DM)
# A: Drug X B: Placebo C: Combination
+# (N=121) (N=106) (N=129)
+# —————————————————————————————————————————————————————————————————
+# A
+# AGE
+# mean 30.9090909090909 32.9090909090909 35.95
+# B
+# AGE
+# mean 34.8518518518519 32.8823529411765 34.4210526315789
+# C
+# AGE
+# mean 35.1904761904762 36 34.3181818181818
+# A
+# AGE
+# mean 35.0714285714286 31.0909090909091 35.6
+# B
+# AGE
+# mean 36.6428571428571 32.0869565217391 34.3684210526316
+# C
+# AGE
+# mean 37.3913043478261 32.8125 32.8333333333333
+Below is an example using the label_pos argument to modify label
+visibility. Label order mirrors the order of the split_rows_by calls.
+If the labels of any subgroups should be hidden, set the label_pos
+argument to "hidden".
“SEX” label position is hidden -
+
+basic_table(
+ title = "Study XXXXXXXX",
+ subtitles = c("subtitle YYYYYYYYYY", "subtitle2 ZZZZZZZZZ"),
+ main_footer = "Analysis was done using cool methods that are correct",
+ prov_footer = "file: /path/to/stuff/that/lives/there HASH:1ac41b242a"
+) %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX", split_fun = drop_split_levels, label_pos = "visible") %>%
+ split_rows_by("STRATA1", label_pos = "hidden") %>%
+ analyze("AGE", mean, format = "xx.x") %>%
+ build_table(DM)
# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+#
+# ————————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ————————————————————————————————————————————————————
+# SEX
+# F
+# A
+# mean 30.9 32.9 36.0
+# B
+# mean 34.9 32.9 34.4
+# C
+# mean 35.2 36.0 34.3
+# M
+# A
+# mean 35.1 31.1 35.6
+# B
+# mean 36.6 32.1 34.4
+# C
+# mean 37.4 32.8 32.8
+# ————————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
+“SEX” label position is with the top-left materials -
+
+basic_table(
+ title = "Study XXXXXXXX",
+ subtitles = c("subtitle YYYYYYYYYY", "subtitle2 ZZZZZZZZZ"),
+ main_footer = "Analysis was done using cool methods that are correct",
+ prov_footer = "file: /path/to/stuff/that/lives/there HASH:1ac41b242a"
+) %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX", split_fun = drop_split_levels, label_pos = "topleft") %>%
+ split_rows_by("STRATA1", label_pos = "hidden") %>%
+ analyze("AGE", mean, format = "xx.x") %>%
+ build_table(DM)
# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+#
+# ——————————————————————————————————————————————————
+# SEX A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————
+# F
+# A
+# mean 30.9 32.9 36.0
+# B
+# mean 34.9 32.9 34.4
+# C
+# mean 35.2 36.0 34.3
+# M
+# A
+# mean 35.1 31.1 35.6
+# B
+# mean 36.6 32.1 34.4
+# C
+# mean 37.4 32.8 32.8
+# ——————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
+An rtable
can be rendered with a customized width by
+setting custom rendering widths for cell contents, row labels, and
+titles/footers.
This is demonstrated using the sample data and table below. In this +section we aim to render this table with a reduced width since the table +has very wide contents in several cells, labels, and titles/footers.
+
+trimmed_data <- ex_adsl %>%
+ filter(SEX %in% c("M", "F")) %>%
+ filter(RACE %in% levels(RACE)[1:2])
+
+levels(trimmed_data$ARM)[1] <- "Incredibly long column name to be wrapped"
+levels(trimmed_data$ARM)[2] <- "This_column_name_should_be_split_somewhere"
+
+wide_tbl <- basic_table(
+ title = "Title that is too long and also needs to be wrapped to a smaller width",
+ subtitles = "Subtitle that is also long and also needs to be wrapped to a smaller width",
+ main_footer = "Footnote that is wider than expected for this table.",
+ prov_footer = "Provenance footer material that is also wider than expected for this table."
+) %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("RACE", split_fun = drop_split_levels) %>%
+ analyze(
+ c("AGE", "EOSDY"),
+ na_str = "Very long cell contents to_be_wrapped_and_splitted",
+ inclNAs = TRUE
+ ) %>%
+ build_table(trimmed_data)
+
+wide_tbl
# Title that is too long and also needs to be wrapped to a smaller width
+# Subtitle that is also long and also needs to be wrapped to a smaller width
+#
+# ————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
+# Incredibly long column name to be wrapped This_column_name_should_be_split_somewhere C: Combination
+# ————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN
+# AGE
+# Mean 32.50 36.68 36.99
+# EOSDY
+# Mean Very long cell contents to_be_wrapped_and_splitted Very long cell contents to_be_wrapped_and_splitted Very long cell contents to_be_wrapped_and_splitted
+# BLACK OR AFRICAN AMERICAN
+# AGE
+# Mean 34.27 34.93 33.71
+# EOSDY
+# Mean Very long cell contents to_be_wrapped_and_splitted Very long cell contents to_be_wrapped_and_splitted Very long cell contents to_be_wrapped_and_splitted
+# ————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
+#
+# Footnote that is wider than expected for this table.
+#
+# Provenance footer material that is also wider than expected for this table.
+In the following sections we will use the toString() function to
+render the table in string form. The resulting string representation
+is ready to be printed or written to a plain text file, but we will
+use the strsplit() function in combination with the matrix() function
+to preview the rendered wrapped table in matrix form within this
+vignette.
The width of a rendered table can be customized by wrapping column
+widths. This is done by setting custom width values via the widths
+argument of the toString() function. The length of the vector passed
+to the widths argument must equal the total number of columns in the
+table, including the row labels column, with each value of the vector
+corresponding to the maximum width (in characters) allowed in each
+column, from left to right.
Similarly, wrapping can be applied when exporting a table via one of
+the four export_as_* functions and when implementing pagination via
+the paginate_table() function from the rtables package. In these
+cases, the rendered column widths are set using the colwidths
+argument, which takes input in the same format as the widths argument
+of toString().
For example, wide_tbl has four columns (1 row label column and 3
+content columns), whose widths we set below for the rendered table. We
+set the width of the row label column to 10 characters and the width
+of each of the 3 content columns to 8 characters. Any words longer
+than the specified width are broken and continued on the following
+line. By default there are 3 spaces separating each of the columns in
+the rendered table, but this can be customized via the col_gap
+argument to toString() if further width customization is desired.
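For instance, the inter-column gap can be narrowed further via col_gap (a sketch reusing wide_tbl and the widths from this section; col_gap = 1 is an illustrative choice, not a recommendation):

```r
# shrink the default 3-space column gap to a single space
result_tight <- toString(wide_tbl, widths = c(10, 8, 8, 8), col_gap = 1)
cat(result_tight)
```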
+result_wrap_cells <- toString(wide_tbl, widths = c(10, 8, 8, 8))
+matrix_wrap_cells <- matrix(strsplit(result_wrap_cells, "\n")[[1]], ncol = 1)
+matrix_wrap_cells
# [,1]
+# [1,] "Title that is too long and also needs to be wrapped to a smaller width"
+# [2,] "Subtitle that is also long and also needs to be wrapped to a smaller width"
+# [3,] ""
+# [4,] "———————————————————————————————————————————"
+# [5,] " Incredib This_col "
+# [6,] " ly long umn_name "
+# [7,] " column _should_ "
+# [8,] " name be_split "
+# [9,] " to be _somewhe C: Combi"
+# [10,] " wrapped re nation "
+# [11,] "———————————————————————————————————————————"
+# [12,] "ASIAN "
+# [13,] " AGE "
+# [14,] " Mean 32.50 36.68 36.99 "
+# [15,] " EOSDY "
+# [16,] " Mean Very Very Very "
+# [17,] " long long long "
+# [18,] " cell cell cell "
+# [19,] " contents contents contents"
+# [20,] " to_be_wr to_be_wr to_be_wr"
+# [21,] " apped_an apped_an apped_an"
+# [22,] " d_splitt d_splitt d_splitt"
+# [23,] " ed ed ed "
+# [24,] "BLACK OR "
+# [25,] "AFRICAN "
+# [26,] "AMERICAN "
+# [27,] " AGE "
+# [28,] " Mean 34.27 34.93 33.71 "
+# [29,] " EOSDY "
+# [30,] " Mean Very Very Very "
+# [31,] " long long long "
+# [32,] " cell cell cell "
+# [33,] " contents contents contents"
+# [34,] " to_be_wr to_be_wr to_be_wr"
+# [35,] " apped_an apped_an apped_an"
+# [36,] " d_splitt d_splitt d_splitt"
+# [37,] " ed ed ed "
+# [38,] "———————————————————————————————————————————"
+# [39,] ""
+# [40,] "Footnote that is wider than expected for this table."
+# [41,] ""
+# [42,] "Provenance footer material that is also wider than expected for this table."
+In the resulting output we can see that the table has been correctly +rendered using wrapping with a total width of 43 characters, but that +the titles and footers remain wider than the rendered table.
+In addition to wrapping column widths, titles and footers can be
+wrapped by setting tf_wrap = TRUE in toString() and setting the
+max_width argument of toString() to the maximum width (in characters)
+allowed for titles/footers. The four export_as_* functions and
+paginate_table() can also wrap titles/footers by setting the same two
+arguments. In the following code, we set max_width = 43 so that the
+rendered table and all of its annotations have a maximum width of 43
+characters.
+result_wrap_cells_tf <- toString(
+ wide_tbl,
+ widths = c(10, 8, 8, 8),
+ tf_wrap = TRUE,
+ max_width = 43
+)
+matrix_wrap_cells_tf <- matrix(strsplit(result_wrap_cells_tf, "\n")[[1]], ncol = 1)
+matrix_wrap_cells_tf
# [,1]
+# [1,] "Title that is too long and also needs to be"
+# [2,] "wrapped to a smaller width"
+# [3,] "Subtitle that is also long and also needs"
+# [4,] "to be wrapped to a smaller width"
+# [5,] ""
+# [6,] "———————————————————————————————————————————"
+# [7,] " Incredib This_col "
+# [8,] " ly long umn_name "
+# [9,] " column _should_ "
+# [10,] " name be_split "
+# [11,] " to be _somewhe C: Combi"
+# [12,] " wrapped re nation "
+# [13,] "———————————————————————————————————————————"
+# [14,] "ASIAN "
+# [15,] " AGE "
+# [16,] " Mean 32.50 36.68 36.99 "
+# [17,] " EOSDY "
+# [18,] " Mean Very Very Very "
+# [19,] " long long long "
+# [20,] " cell cell cell "
+# [21,] " contents contents contents"
+# [22,] " to_be_wr to_be_wr to_be_wr"
+# [23,] " apped_an apped_an apped_an"
+# [24,] " d_splitt d_splitt d_splitt"
+# [25,] " ed ed ed "
+# [26,] "BLACK OR "
+# [27,] "AFRICAN "
+# [28,] "AMERICAN "
+# [29,] " AGE "
+# [30,] " Mean 34.27 34.93 33.71 "
+# [31,] " EOSDY "
+# [32,] " Mean Very Very Very "
+# [33,] " long long long "
+# [34,] " cell cell cell "
+# [35,] " contents contents contents"
+# [36,] " to_be_wr to_be_wr to_be_wr"
+# [37,] " apped_an apped_an apped_an"
+# [38,] " d_splitt d_splitt d_splitt"
+# [39,] " ed ed ed "
+# [40,] "———————————————————————————————————————————"
+# [41,] ""
+# [42,] "Footnote that is wider than expected for"
+# [43,] "this table."
+# [44,] ""
+# [45,] "Provenance footer material that is also"
+# [46,] "wider than expected for this table."
+vignettes/dev-guide/dg_debug_rtables.Rmd
+ dg_debug_rtables.Rmd
This is a short, non-comprehensive guide to debugging rtables;
+consider it valid for personal use at your discretion.
Errors should be caught as close as possible to their source. For
+example, bad inputs should be found very early; the worst possible
+case is software that silently gives incorrect results. Common issues
+that we can catch early are missing values, columns of length == 0, or
+values of length > 1.
+- debugcall: you can add the signature (formals).
+- trace is powerful because you can add the reaction.
+- tracer is very good and precise at finding where the problem happens.
+- options(error = recover) is one of the best debugging tools; it is a
+core tool when developing that allows you to step into any point of
+the function call sequence.
+- dump.frames and debugger: save the state to a file or an object,
+then call debugger to step into it as you would with recover.
+- The warn global option: < 0 warnings are ignored; 0 they are held
+until the top-level function call returns; 1 they are printed
+immediately as they occur; >= 2 they throw errors.
+- <<- from within recover or debugger assigns the value to the global
+environment.
+- browser() can be used; for example, you can print the position or
+state of a function at a certain point until you find the break point.
+- identity() is a step that does nothing but does not break the pipes.
+- browser() bombing.
+- %T>% does print the value midway through the pipe.
+- debug_pipe() is like the T pipe going into browser().
rtables
+We invite the developer to use the provided examples as a way to get
+an interactive, dynamic view of the internal algorithms as they are
+routinely executed when constructing tables with rtables. This is
+achieved by using browser() and debugonce() on internal and exported
+functions (rtables::: or rtables::), as we will see in a moment. We
+invite you to explore, continuously and autonomously, the multiple S3
+and S4 objects that constitute the complexity and power of rtables. To
+do so, we will use the following functions:
methods(generic_function): lists the methods that are available for a
+generic function. Specifically for S4 generic functions,
+showMethods(generic_function) gives more detailed information about
+each method (e.g. inheritance).
+class(object): returns the class of an object. If the class is not one
+of the built-in classes in R, you can use this information to search
+for its documentation and examples. help(class) may be informative as
+it will call the documentation of the specific class. Similarly, the ?
+operator will bring up the documentation page for different S4
+methods. For S3 methods it is necessary to postfix the class name with
+a dot (e.g. ?summary.lm).
+getClass(class): describes the type of class in a compact way, the
+slots that it has, and the relationships it may have with other
+classes that inherit from it or are inherited by it. With
+getClass(object) we can see which values are assigned to the slots of
+the object. It is possible to use str(object, max.level = 2) to see a
+less formal and more compact description of the slots, but it may be
+problematic when there are one or more objects in the class slots;
+hence, the maximum number of levels should always be limited to 2 or 3
+(max.level = 2). Similarly, attributes() can be used to retrieve some
+information, but we need to remember that storing important variables
+in this way is not encouraged. Information regarding the type of class
+can be retrieved with mode() and, indirectly, with summary() and
+is.S4().
+getAnywhere(function): very useful to get the source code of internal
+functions and specific generics. It works very well with S3 methods,
+and will display the relevant namespace for each of the methods found.
+Similarly, getMethod(S4_generic, S4_class) can retrieve the source
+code of class-specific S4 methods.
+eval(debugcall(generic_function(obj))): a very useful way to browse an
+S4 method for a defined object, without having to manually insert
+browser() into the code. It is also possible to do similarly with
+R > 3.4.0, where debug*() calls can have the triggering signature
+(class) specified. Both of these are modern and simplified wrappers of
+the tracing function trace().
.vignettes/dev-guide/dg_notes.Rmd
+ dg_notes.Rmd
This is a collection of notes divided by issues and it is a working +document that will end up being a developer vignette one day.
+section_div
notes
+Everything in the layout is built over split objects, that reside in
+00_tabletrees.R
. There section_div
is defined
+internally in each split object as child_section_div
and
+assigned to NA_character
as default. This needs to be in
+all split objects that need to have a separator divisor. Object-wise,
+the virtual class Split
contains section_div
+and it has the following sub-classes. I tagged with “X” constructor that
+allows for section_div
to be assigned to a value different
+than NA_character
, and "NX"
otherwise.
## Loading required package: formatters
+## Loading required package: magrittr
+##
+## Attaching package: 'rtables'
+## The following object is masked from 'package:utils':
+##
+## str
+
+getClass("Split")
## Virtual Class "Split" [package "rtables"]
+##
+## Slots:
+##
+## Name: payload name split_label
+## Class: ANY character character
+##
+## Name: split_format split_na_str split_label_position
+## Class: FormatSpec character character
+##
+## Name: content_fun content_format content_na_str
+## Class: listOrNULL FormatSpec character
+##
+## Name: content_var label_children extra_args
+## Class: character logical list
+##
+## Name: indent_modifier content_indent_modifier content_extra_args
+## Class: integer integer list
+##
+## Name: page_title_prefix child_section_div
+## Class: character character
+##
+## Known Subclasses:
+## Class "CustomizableSplit", directly
+## Class "AllSplit", directly
+## Class "VarStaticCutSplit", directly
+## Class "VarDynCutSplit", directly
+## Class "VAnalyzeSplit", directly
+## Class "CompoundSplit", directly
+## Class "VarLevelSplit", by class "CustomizableSplit", distance 2
+## Class "MultiVarSplit", by class "CustomizableSplit", distance 2
+## Class "RootSplit", by class "AllSplit", distance 2
+## Class "ManualSplit", by class "AllSplit", distance 2
+## Class "CumulativeCutSplit", by class "VarStaticCutSplit", distance 2
+## Class "AnalyzeVarSplit", by class "VAnalyzeSplit", distance 2
+## Class "AnalyzeColVarSplit", by class "VAnalyzeSplit", distance 2
+## Class "AnalyzeMultiVars", by class "CompoundSplit", distance 2
+## Class "VarLevWBaselineSplit", by class "VarLevelSplit", distance 3
+
+# Known Subclasses:
+# ? Class "CustomizableSplit", directly # vclass used for grouping different split types (I guess)
+# Class "AllSplit", directly # NX
+# Class "VarStaticCutSplit", directly # X via make_static_cut_split
+# Class "VarDynCutSplit", directly # X
+# Class "VAnalyzeSplit", directly # X
+# ? Class "CompoundSplit", directly # Used only for AnalyzeMultiVars (maybe not needed?)
+# Class "VarLevelSplit", by class "CustomizableSplit", distance 2 # X
+# Class "MultiVarSplit", by class "CustomizableSplit", distance 2 # X
+# Class "RootSplit", by class "AllSplit", distance 2 # NX
+# Class "ManualSplit", by class "AllSplit", distance 2 # X
+# Class "CumulativeCutSplit", by class "VarStaticCutSplit", distance 2 # X via make_static_cut_split
+# Class "AnalyzeVarSplit", by class "VAnalyzeSplit", distance 2 # Virtual
+# Class "AnalyzeColVarSplit", by class "VAnalyzeSplit", distance 2 # X
+# Class "AnalyzeMultiVars", by class "CompoundSplit", distance 2 # X
+# Class "VarLevWBaselineSplit", by class "VarLevelSplit", distance 3 # NX
This can be updated only by the related layout functions. The most important ones, which are covered by tests, are analyze and split_rows_by.
Now it is relevant to understand where this information is saved in the table object built by build_table. To do that, we need to see where it is present and how it is assigned. Let's go back to 00tabletrees.R and look for trailing_section_div. As far as class definitions go, you will notice from the search that trailing_section_div is present in the virtual classes TableRow and VTableTree. The following is the class hierarchy behind trailing_section_div:
+getClass("TableRow")
## Virtual Class "TableRow" [package "rtables"]
+##
+## Slots:
+##
+## Name: leaf_value var_analyzed label
+## Class: ANY character character
+##
+## Name: row_footnotes trailing_section_div level
+## Class: list character integer
+##
+## Name: name col_info format
+## Class: character InstantiatedColumnInfo FormatSpec
+##
+## Name: na_str indent_modifier table_inset
+## Class: character integer integer
+##
+## Extends:
+## Class "VLeaf", directly
+## Class "VTableNodeInfo", directly
+## Class "VNodeInfo", by class "VLeaf", distance 2
+##
+## Known Subclasses: "DataRow", "ContentRow", "LabelRow"
+
+
+getClass("VTableTree")
## Virtual Class "VTableTree" [package "rtables"]
+##
+## Slots:
+##
+## Name: children rowspans labelrow
+## Class: list data.frame LabelRow
+##
+## Name: page_titles horizontal_sep header_section_div
+## Class: character character character
+##
+## Name: trailing_section_div col_info format
+## Class: character InstantiatedColumnInfo FormatSpec
+##
+## Name: na_str indent_modifier table_inset
+## Class: character integer integer
+##
+## Name: level name main_title
+## Class: integer character character
+##
+## Name: subtitles main_footer provenance_footer
+## Class: character character character
+##
+## Extends:
+## Class "VTableNodeInfo", directly
+## Class "VTree", directly
+## Class "VTitleFooter", directly
+## Class "VNodeInfo", by class "VTableNodeInfo", distance 2
+##
+## Known Subclasses: "ElementaryTable", "TableTree"
+
Always check the constructors after finding the classes. In the above case, for example, DataRow and ContentRow share a constructor, so we do not need to add identical getters and setters for those two classes, only for the virtual class TableRow. The story is different for LabelRow, which needs to be handled separately. Now, to understand why only these two have this feature, let's look at the structure of a table built with section dividers:
+lyt <- basic_table() %>%
+ split_rows_by("ARM", section_div = "+") %>%
+ split_rows_by("STRATA1", section_div = "") %>%
+ analyze("AGE",
+ afun = function(x) list("Mean" = mean(x), "Standard deviation" = sd(x)),
+ format = list("Mean" = "xx.", "Standard deviation" = "xx."),
+ section_div = "~"
+ )
+
+tbl <- build_table(lyt, DM)
+
+print(tbl)
## all obs
+## ————————————————————————————————
+## A: Drug X
+## A
+## Mean 33
+## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+## Standard deviation 7
+##
+## B
+## Mean 35
+## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+## Standard deviation 7
+##
+## C
+## Mean 36
+## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+## Standard deviation 9
+## ++++++++++++++++++++++++++++++++
+## B: Placebo
+## A
+## Mean 32
+## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+## Standard deviation 6
+##
+## B
+## Mean 32
+## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+## Standard deviation 6
+##
+## C
+## Mean 34
+## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+## Standard deviation 7
+## ++++++++++++++++++++++++++++++++
+## C: Combination
+## A
+## Mean 36
+## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+## Standard deviation 7
+##
+## B
+## Mean 34
+## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+## Standard deviation 6
+##
+## C
+## Mean 34
+## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+## Standard deviation 6
+
+## [1] "TableTree"
+## attr(,"package")
+## [1] "rtables"
+
+# methods("trailing_section_div") # to see this please do devtools::load_all()
+# [1] trailing_section_div,LabelRow-method
+# trailing_section_div,TableRow-method
+# trailing_section_div,VTableTree-method
In the above, we show that trailing_section_div has methods for the virtual class TableRow, for LabelRow, and for VTableTree. Together these three make up the whole section_div structure, as VTableTree is present in TableTree and ElementaryTable, the two main table objects. If these slots are not NA_character_, then the section_div is printed at the split divisions. LabelRow and TableRow are different in that their assignment allows row-wise modification of the separators. ContentRow is a special case: it is represented as content_table(obj), a one-line ElementaryTable whose label row is turned off. Please take a moment to check the following setter:
+setMethod("section_div<-", "VTableTree", function(obj, value, only_sep_sections = FALSE) {
+ char_v <- as.character(value)
+ tree_depths <- unname(vapply(collect_leaves(obj), tt_level, numeric(1)))
+ max_tree_depth <- max(tree_depths)
+ stopifnot(is.logical(only_sep_sections))
+ .check_char_vector_for_section_div(char_v, max_tree_depth, nrow(obj))
+
+ # Automatic establishment of intent
+ if (length(char_v) < nrow(obj)) {
+ only_sep_sections <- TRUE
+ }
+
+ # Case where only separators or splits need to change externally
+ if (only_sep_sections && length(char_v) < nrow(obj)) {
+ if (length(char_v) == 1) {
+ char_v <- rep(char_v, max_tree_depth - 1) # -1 is the data row
+ }
+ # Case where char_v is longer than the max depth
+ char_v <- char_v[seq_len(min(max_tree_depth, length(char_v)))]
+ # Filling up with NAs the rest of the tree depth section div chr vector
+ missing_char_v_len <- max_tree_depth - length(char_v)
+ char_v <- c(char_v, rep(NA_character_, missing_char_v_len))
+ # char_v <- unlist(
+ # lapply(tree_depths, function(tree_depth_i) char_v[seq_len(tree_depth_i)]),
+ # use.names = FALSE
+ # )
+ }
+
+ # Retrieving if it is a contentRow (no need for labelrow to be visible in this case)
+ content_row_tbl <- content_table(obj)
+ is_content_table <- isS4(content_row_tbl) && nrow(content_row_tbl) > 0
+
+ # Main table structure change
+ if (labelrow_visible(obj) || is_content_table) {
+ if (only_sep_sections) {
+ # Only tables are modified
+ trailing_section_div(tt_labelrow(obj)) <- NA_character_
+ trailing_section_div(obj) <- char_v[1]
+ section_div(tree_children(obj), only_sep_sections = only_sep_sections) <- char_v[-1]
+ } else {
+ # All leaves are modified
+ trailing_section_div(tt_labelrow(obj)) <- char_v[1]
+ trailing_section_div(obj) <- NA_character_
+ section_div(tree_children(obj), only_sep_sections = only_sep_sections) <- char_v[-1]
+ }
+ } else {
+ section_div(tree_children(obj), only_sep_sections = only_sep_sections) <- char_v
+ }
+ obj
+})
only_sep_sections is a parameter used to change only the separators (between splits) and not the data rows. It is applied forcefully when set to TRUE, and it is activated automatically when section_div(tbl) <- char_v is given a character vector of length < nrow(tbl). Notice that the exception for ContentRow is activated by the is_content_table flag; this is because content rows do not have a visible label row. You can see that the main table structure change has two blocks depending on only_sep_sections: if TRUE, only the VTableTree objects are modified, so that only the split section separators change. Also consider looking at the section_div getter and its tests in test-accessors.R to gain more insight into the structure. To understand exactly how this is bound to the output, please check the result of make_row_df() for the column trailing_sep. Indeed, an alternative, iterative method is used by make_row_df to retrieve the separator information for each table row. Since section separators are trailing separators by definition, we added header_section_div as a function and as a parameter of basic_table, making it possible to add e.g. an empty line after the header (header_section_div(tbl) <- " "). This is not a trailing separator, but a separator added after the header. To close the circle, please check how trailing_sep and header_section_div are propagated and printed/used in formatters::toString.
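To illustrate the automatic behavior described above, here is a small sketch using the exported section_div setter (the exact printed separators depend on the rtables version, so take this as indicative rather than definitive):

```r
library(rtables)

lyt <- basic_table() %>%
  split_rows_by("ARM") %>%
  analyze("AGE")

tbl <- build_table(lyt, DM)

# A vector shorter than nrow(tbl) triggers only_sep_sections = TRUE
# automatically: only the split-level separators are modified.
section_div(tbl) <- c("=", "-")

# A full-length character vector sets each row's trailing separator,
# row by row (NA_character_ means "no separator after this row").
section_div(tbl) <- rep_len(c("-", NA_character_), nrow(tbl))

print(tbl)
```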
vignettes/dev-guide/dg_split_machinery.Rmd
+ dg_split_machinery.Rmd
This article is intended for use by developers only and will contain
+low-level explanations of the topics covered. For user-friendly
+vignettes, please see the Articles
+page on the rtables
website.
Any code or prose appearing in the version of this article on the main branch of the repository reflects a specific state of the package that may be more or less recent. This guide describes very important pieces of the split machinery that are unlikely to change. Regardless, we invite the reader to keep in mind that the current repository code may have drifted from the material in this document, and that it is always best practice to read the code directly on main.
Please keep in mind that rtables
is still under active
+development, and it has seen the efforts of multiple contributors across
+different years. Therefore, there may be legacy mechanisms and ongoing
+transformations that could look different in the future.
As this is a working document that may be subject to both deprecation and updates, we keep xxx comments to indicate placeholders for warnings and to-dos that need further work.
The scope of this article is to understand how rtables creates facets by splitting the incoming data into hierarchical groups that go from the root node down to individual rcells. This last level, also called the leaf level, contains the final partition that is subjected to the analysis functions. More details from the user perspective can be found in the Split Functions vignette and in function documentation like ?split_rows_by and ?split_funcs.
The following article will describe how the split machinery works in +the row domain. Further information on how the split machinery works in +the column domain will be covered in a separate article.
+Beforehand, we encourage the reader to familiarize themselves with
+the Debugging
+in {rtables} article from the rtables
Developers Guide.
+This document is generally valid for R programming, but has been
+tailored to study and understand complex packages that rely heavily on
+S3 and S4 object programming like rtables
.
Here, we explore and study the split machinery with a growing amount +of complexity, following relevant functions and methods throughout their +execution. By going from basic to complex and by discussing important +and special cases, we hope to be able to give you a good understanding +of how the split machinery works.
+In practice, the majority of the split engine resides in the source
+file R/split_funs.R
, with occasional incursion into
+R/make_split_fun.R
for custom split function creation and
+rarer references to other more general tabulation files.
do_split
+The split machinery is so fundamental to rtables
that
+relevant functions like do_split
are executed even when no
+split is requested. The following example shows how we can enter
+do_split
and start understanding the class hierarchy and
+the main split engine.
+library(rtables)
+# debugonce(rtables:::do_split) # Uncomment me to enter the function!!!
+basic_table() %>%
+ build_table(DM)
## all obs
+## ——————————
+In the following code, we copied the do_split
function
+code to allow the reader to go through the general structure with
+enhanced comments and sections. Each section in the code reflects
+roughly one section of this article.
+# rtables 0.6.2
+### NB This is called at EACH level of recursive splitting
+do_split <- function(spl,
+ df,
+ vals = NULL,
+ labels = NULL,
+ trim = FALSE,
+ spl_context) {
+ # - CHECKS - #
+ ## This will error if, e.g., df does not have columns
+ ## required by spl, or generally any time the split (spl)
+ ## can not be applied to df
+ check_validsplit(spl, df)
+
+ # - SPLIT FUNCTION - #
+ ## In special cases, we need to partition data (split)
+ ## in a very specific way, e.g. depending on the data or
+ ## external values. These can be achieved by using a custom
+ ## split function.
+
+ ## note the <- here!!!
+ if (!is.null(splfun <- split_fun(spl))) {
+ ## Currently split functions take df, vals, labels and
+ ## return list(values = ..., datasplit = ..., labels = ...),
+ ## with an optional additional 'extras' element
+ if (func_takes(splfun, ".spl_context")) {
+ ret <- tryCatch(
+ splfun(df, spl, vals, labels,
+ trim = trim,
+ .spl_context = spl_context
+ ),
+ error = function(e) e
+ ) ## rawvalues(spl_context))
+ } else {
+ ret <- tryCatch(splfun(df, spl, vals, labels, trim = trim),
+ error = function(e) e
+ )
+ }
+ if (is(ret, "error")) {
+ stop(
+ "Error applying custom split function: ", ret$message, "\n\tsplit: ",
+ class(spl), " (", payloadmsg(spl), ")\n",
+ "\toccured at path: ",
+ spl_context_to_disp_path(spl_context), "\n"
+ )
+ }
+ } else {
+ # - .apply_split_inner - #
+ ## This is called when no split function is provided. Please note that this function
+ ## will also probably be called when the split function is provided, as long as the
+ ## main splitting method is not willingly modified by the split function.
+ ret <- .apply_split_inner(df = df, spl = spl, vals = vals, labels = labels, trim = trim)
+ }
+
+ # - EXTRA - #
+ ## this adds .ref_full and .in_ref_col
+ if (is(spl, "VarLevWBaselineSplit")) {
+ ret <- .add_ref_extras(spl, df, ret)
+ }
+
+ # - FIXUPVALS - #
+ ## This:
+ ## - guarantees that ret$values contains SplitValue objects
+ ## - removes the extras element since its redundant after the above
+ ## - ensures datasplit and values lists are named according to labels
+ ## - ensures labels are character not factor
+ ret <- .fixupvals(ret)
+
+ # - RETURN - #
+ ret
+}
We will see where and how the input parameters are used. The most important parameters are spl and df - the split object and the input data.frame, respectively.
We will start by looking at the first function called from
+do_split
. This will give us a good overview of how the
+split itself is defined. This function is, of course, the check function
+(check_validsplit
) that is used to verify if the split is
+valid for the data. In the following we will describe the split-class
+hierarchy step-by-step, but we invite the reader to explore this further
+on their own as well.
Let’s first search the package for check_validsplit
. You
+will find that it is defined as a generic in
+R/split_funs.R
, where it is applied to the following
+“split” classes: VarLevelSplit
, MultiVarSplit
,
+VAnalyzeSplit
, CompoundSplit
, and
+Split
. Another way to find this information, which is more
+useful for more spread out and complicated objects, is by using
+showMethods(check_validsplit)
. The virtual class
+VAnalyzeSplit
(by convention virtual classes start with
+“V”) defines the main parent of the analysis split which we discuss in
+detail in the related vignette vignette()
(xxx). From this,
+we can see that the analyze()
calls actually mimic split
+objects as they create different results under a specific final split
+(or node). Now, notice that check_validsplit
is also called
+in another location, the main R/tt_dotabulation.R
source
+file. This is again something related to making “analyze” rows as it
+mainly checks for VAnalyzeSplit
. See the Tabulation
+article for more details. We will discuss the other classes as they
+appear in our examples. See more about class hierarchy in the Table
+Hierarchy article.
For the moment, we see with class(spl)
(from the main
+do_split
function) that we are dealing with an
+AllSplit
object. By calling
+showMethods(check_validsplit)
we produce the following:
# rtables 0.6.2
+Function: check_validsplit (package rtables)
+spl="AllSplit"
+ (inherited from: spl="Split")
+spl="CompoundSplit"
+spl="MultiVarSplit"
+spl="Split"
+spl="VAnalyzeSplit"
+spl="VarLevelSplit"
This means that each of the listed classes has a dedicated definition of check_validsplit that may differ significantly from the others. Only the class AllSplit does not have its own definition, as it inherits the one from the Split class. Therefore, we understand that AllSplit is a subclass of the virtual class Split. Split is one of the first virtual class definitions in the package, and the only one whose name does not include the "V" prefix. These classes are defined along with their constructors in R/00tabletrees.R. Reading how AllSplit is structured can be useful in understanding how split objects are expected to work. Please see the comments in the following:
+# rtables 0.6.2
+setClass("AllSplit", contains = "Split")
+
+AllSplit <- function(split_label = "",
+ cfun = NULL,
+ cformat = NULL,
+ cna_str = NA_character_,
+ split_format = NULL,
+ split_na_str = NA_character_,
+ split_name = NULL,
+ extra_args = list(),
+ indent_mod = 0L,
+ cindent_mod = 0L,
+ cvar = "",
+ cextra_args = list(),
+ ...) {
+ if (is.null(split_name)) { # If the split has no name
+ if (nzchar(split_label)) { # (std is "")
+ split_name <- split_label
+ } else {
+ split_name <- "all obs" # No label, a standard split with all
+ # observations is assigned.
+ }
+ }
+ new("AllSplit",
+ split_label = split_label,
+ content_fun = cfun,
+ content_format = cformat,
+ content_na_str = cna_str,
+ split_format = split_format,
+ split_na_str = split_na_str,
+ name = split_name,
+ label_children = FALSE,
+ extra_args = extra_args,
+ indent_modifier = as.integer(indent_mod),
+ content_indent_modifier = as.integer(cindent_mod),
+ content_var = cvar,
+ split_label_position = "hidden",
+ content_extra_args = cextra_args,
+ page_title_prefix = NA_character_,
+ child_section_div = NA_character_
+ )
+}
We can also print this information by calling getClass("AllSplit") for the general slot definition, or by calling getClass(spl). Note that the first call will also give a lot of information about the class hierarchy. For more information regarding the class hierarchy, please refer to the relevant article here. We will discuss the majority of the slots by the end of this document. Now, let's see if we can find some of the values described in the constructor within our object. To do so, we will show the more compact representation given by str. When there are multiple, hierarchical slots that themselves contain objects, calling str will be much less informative, or not informative at all, unless the maximum level of nesting is set (e.g. max.level = 2).
# rtables 0.6.2
+Browse[2]> str(spl, max.level = 2)
+Formal class 'AllSplit' [package "rtables"] with 17 slots
+ ..@ payload : NULL
+ ..@ name : chr "all obs"
+ ..@ split_label : chr ""
+ ..@ split_format : NULL
+ ..@ split_na_str : chr NA
+ ..@ split_label_position : chr "hidden"
+ ..@ content_fun : NULL
+ ..@ content_format : NULL
+ ..@ content_na_str : chr NA
+ ..@ content_var : chr ""
+ ..@ label_children : logi FALSE
+ ..@ extra_args : list()
+ ..@ indent_modifier : int 0
+ ..@ content_indent_modifier: int 0
+ ..@ content_extra_args : list()
+ ..@ page_title_prefix : chr NA
+ ..@ child_section_div : chr NA
Details about these slots will become necessary in future examples, and we will deal with them at that time. For now, we have given you a hint of the complex class hierarchy that makes up rtables, and of how to explore it autonomously. Let's move forward in do_split. In our case, with AllSplit inheriting from Split, we are sure that the called function will be the following (read the comment!):
.apply_split_inner
+Before diving into custom split functions, we need to take a moment
+to analyze how .apply_split_inner
works. This function is
+routinely called whether or not we have a split function. Let’s see why
+this is the case by entering it with
+debugonce(.apply_split_inner)
. Of course, we are still
+currently browsing within do_split
in debug mode from the
+first example. We print and comment on the function in the
+following:
+# rtables 0.6.2
+.apply_split_inner <- function(spl, df, vals = NULL, labels = NULL, trim = FALSE) {
+ # - INPUTS - #
+ # In this case .applysplit_rawvals will attempt to find the split values if vals is NULL.
+ # Please notice that there may be a non-mutually exclusive set or subset of elements that
+ # will constitute the split.
+
+ # - SPLIT VALS - #
+ ## Try to calculate values first - most of the time we can
+ if (is.null(vals)) {
+ vals <- .applysplit_rawvals(spl, df)
+ }
+
+ # - EXTRA PARAMETERS - #
+ # This call extracts extra parameters from the split, according to the split values
+ extr <- .applysplit_extras(spl, df, vals)
+
+ # If there are no values to do the split upon, we return an empty final split
+ if (is.null(vals)) {
+ return(list(
+ values = list(),
+ datasplit = list(),
+ labels = list(),
+ extras = list()
+ ))
+ }
+
+ # - DATA SUBSETTING - #
+ dpart <- .applysplit_datapart(spl, df, vals)
+
+ # - LABEL RETRIEVAL - #
+ if (is.null(labels)) {
+ labels <- .applysplit_partlabels(spl, df, vals, labels)
+ } else {
+ stopifnot(names(labels) == names(vals))
+ }
+
+ # - TRIM - #
+ ## Get rid of columns that would not have any observations,
+ ## but only if there were any rows to start with - if not
+ ## we're in a manually constructed table column tree
+ if (trim) {
+ hasdata <- sapply(dpart, function(x) nrow(x) > 0)
+ if (nrow(df) > 0 && length(dpart) > sum(hasdata)) { # some empties
+ dpart <- dpart[hasdata]
+ vals <- vals[hasdata]
+ extr <- extr[hasdata]
+ labels <- labels[hasdata]
+ }
+ }
+
+ # - ORDER RESULTS - #
+ # Finds relevant order depending on spl_child_order()
+ if (is.null(spl_child_order(spl)) || is(spl, "AllSplit")) {
+ vord <- seq_along(vals)
+ } else {
+ vord <- match(
+ spl_child_order(spl),
+ vals
+ )
+ vord <- vord[!is.na(vord)]
+ }
+
+ ## FIXME: should be an S4 object, not a list
+ ret <- list(
+ values = vals[vord],
+ datasplit = dpart[vord],
+ labels = labels[vord],
+ extras = extr[vord]
+ )
+ ret
+}
After reading through .apply_split_inner, we see that there are some fundamental functions - defined strictly for internal use (by convention, their names start with ".") - that are generics and dispatch on the kind of split in input. Helpfully, R/split_funs.R groups the generic definitions at the beginning of the file. These functions are the main dispatchers for the majority of the split machinery. This is a clear example of how using S4 logic enables better clarity and flexibility in programming, allowing the program to be extended easily. For compactness, we also show the showMethods result for each generic.
+# rtables 0.6.2
+# Retrieves the values that will constitute the splits (facets), not necessarily a unique list.
+# They could come from the data cuts for example -> it can be anything that produces a set of strings.
+setGeneric(
+ ".applysplit_rawvals",
+ function(spl, df) standardGeneric(".applysplit_rawvals")
+)
+# Browse[2]> showMethods(.applysplit_rawvals)
+# Function: .applysplit_rawvals (package rtables)
+# spl="AllSplit"
+# spl="ManualSplit"
+# spl="MultiVarSplit"
+# spl="VAnalyzeSplit"
+# spl="VarLevelSplit"
+# spl="VarStaticCutSplit"
+# Nothing here is inherited from the virtual class Split!!!
+
+# Contains the subset of the data (default, but these can overlap and can also NOT be mutually exclusive).
+setGeneric(
+ ".applysplit_datapart",
+ function(spl, df, vals) standardGeneric(".applysplit_datapart")
+)
+# Same as .applysplit_rawvals
+
+# Extract the extra parameter for the split
+setGeneric(
+ ".applysplit_extras",
+ function(spl, df, vals) standardGeneric(".applysplit_extras")
+)
+# Browse[2]> showMethods(.applysplit_extras)
+# Function: .applysplit_extras (package rtables)
+# spl="AllSplit"
+# (inherited from: spl="Split")
+# spl="Split"
+# This means there is only a function for the virtual class Split.
+# So all splits behave the same!!!
+
+# Split label retrieval and assignment if visible.
+setGeneric(
+ ".applysplit_partlabels",
+ function(spl, df, vals, labels) standardGeneric(".applysplit_partlabels")
+)
+# Browse[2]> showMethods(.applysplit_partlabels)
+# Function: .applysplit_partlabels (package rtables)
+# spl="AllSplit"
+# (inherited from: spl="Split")
+# spl="MultiVarSplit"
+# spl="Split"
+# spl="VarLevelSplit"
+
+setGeneric(
+ "check_validsplit", # our friend
+ function(spl, df) standardGeneric("check_validsplit")
+)
+# Note: check_validsplit is an internal function but may one day be exported.
+# This is why it does not have the "." prefix.
+
+setGeneric(
+ ".applysplit_ref_vals",
+ function(spl, df, vals) standardGeneric(".applysplit_ref_vals")
+)
+# Browse[2]> showMethods(.applysplit_ref_vals)
+# Function: .applysplit_ref_vals (package rtables)
+# spl="Split"
+# spl="VarLevWBaselineSplit"
Now, we know that .applysplit_rawvals is the function that will be called first. This is because we did not specify any vals, so it is NULL. This is an S4 generic function, as can be seen with showMethods(.applysplit_rawvals), and its method for AllSplit can be seen in the following:
# rtables 0.6.2
+Browse[3]> getMethod(".applysplit_rawvals", "AllSplit")
+Method Definition:
+
+function (spl, df)
+obj_name(spl)
+
+Signatures:
+ spl
+target "AllSplit"
+defined "AllSplit"
+
+# What is obj_name -> slot in spl
+Browse[3]> obj_name(spl)
+[1] "all obs"
+
+# coming from
+Browse[3]> getMethod("obj_name", "Split")
+Method Definition:
+
+function (obj)
+obj@name ##### Slot that we could see from str(spl, max.level = 2)
+
+Signatures:
+ obj
+target "Split"
+defined "Split"
Then we have .applysplit_extras, which simply extracts the extra arguments from the split object and assigns them to their respective split values. This function will be covered in more detail in a later section. If still no split values are available, the function exits here with an empty split. Otherwise, the data is divided into the different splits, or data subsets (facets), with .applysplit_datapart. In our current example, the resulting list comprises the whole input dataset (run getMethod(".applysplit_datapart", "AllSplit") and this becomes evident: function (spl, df, vals) list(df)).
Next, the split labels are checked. If they are not present, the split values (vals) will be used via .applysplit_partlabels, transformed with as.character(vals) when applied to a Split object. Otherwise, the provided labels are checked against the names of the split values.
Lastly, the split values are ordered according to spl_child_order. In our case, which concerns the general AllSplit, no sorting happens, i.e. the order simply follows the sequence of split values (seq_along(vals)).
In the following, we demonstrate how row splits work using the features that we have already described. We will add two splits and see how the behavior of do_split changes. Note that if we do not add an analyze call, the split will behave as before, giving an empty table with all observations. By default, calling analyze on a variable will calculate the mean of each data subset generated by the splits. We want to go beyond the first call of do_split, which is by design applied to all observations, with the purpose of generating the root split that contains all data and all splits (indeed, AllSplit). To achieve this we use debug(rtables:::do_split) instead of debugonce(rtables:::do_split), as we will need to step into each of the splits. Alternatively, it is possible to use the more powerful trace function to enter the function only when the input is of a specific class. To do so, the following can be used: trace("do_split", quote(if (!is(spl, "AllSplit")) browser()), where = asNamespace("rtables")). Note that we specify the namespace with where. Multiple tracer expressions can be added with expression(E1, E2), which is the same as c(quote(E1), quote(E2)). Specific steps can be targeted with the at parameter. Remember to call untrace("do_split", where = asNamespace("rtables")) once finished, to remove the trace.
+# rtables 0.6.2
+library(rtables)
+library(dplyr)
+
+# This filter is added to avoid having too many calls to do_split
+DM_tmp <- DM %>%
+ filter(ARM %in% names(table(DM$ARM)[1:2])) %>% # limit to two
+ filter(SEX %in% c("M", "F")) %>% # limit to two
+ mutate(SEX = factor(SEX), ARM = factor(ARM)) # to drop unused levels
+
+# debug(rtables:::do_split)
+lyt <- basic_table() %>%
+ split_rows_by("ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze("BMRKR1") # analyze() is needed for the table to have non-label rows
+
+lyt %>%
+ build_table(DM_tmp)
## all obs
+## ————————————————————
+## A: Drug X
+## F
+## Mean 6.06
+## M
+## Mean 5.42
+## B: Placebo
+## F
+## Mean 6.24
+## M
+## Mean 5.97
+
+# undebug(rtables:::do_split)
Before continuing, we want to check the formal class of
+spl
.
# rtables 0.6.2
+Browse[2]> str(spl, max.level = 2)
+Formal class 'VarLevelSplit' [package "rtables"] with 20 slots
+ ..@ value_label_var : chr "ARM"
+ ..@ value_order : chr [1:2] "A: Drug X" "B: Placebo"
+ ..@ split_fun : NULL
+ ..@ payload : chr "ARM"
+ ..@ name : chr "ARM"
+ ..@ split_label : chr "ARM"
+ ..@ split_format : NULL
+ ..@ split_na_str : chr NA
+ ..@ split_label_position : chr "hidden"
+ ..@ content_fun : NULL
+ ..@ content_format : NULL
+ ..@ content_na_str : chr NA
+ ..@ content_var : chr ""
+ ..@ label_children : logi NA
+ ..@ extra_args : list()
+ ..@ indent_modifier : int 0
+ ..@ content_indent_modifier: int 0
+ ..@ content_extra_args : list()
+ ..@ page_title_prefix : chr NA
+ ..@ child_section_div : chr NA
From this, we can directly infer that the class is different now
+(VarLevelSplit
) and understand that the split label will be
+hidden (split_label_position
slot). Moreover, we see a
+specific value order with specific split values.
+VarLevelSplit
also seems to have three more slots than
+AllSplit
. What are they precisely?
+# rtables 0.6.2
+slots_as <- getSlots("AllSplit") # inherits virtual class Split and is general class for all splits
+# getClass("CustomizableSplit") # -> Extends: "Split", Known Subclasses: Class "VarLevelSplit", directly
+slots_cs <- getSlots("CustomizableSplit") # Adds split function
+slots_vls <- getSlots("VarLevelSplit")
+
+slots_cs[!(names(slots_cs) %in% names(slots_as))]
+# split_fun
+# "functionOrNULL"
+slots_vls[!(names(slots_vls) %in% names(slots_cs))]
+# value_label_var value_order
+# "character" "ANY"
Remember to always check the constructor and class definition in
+R/00tabletrees.R
if exploratory tools do not suffice. Now,
+check_validsplit(spl, df)
will use a different method than
+before (getMethod("check_validsplit", "VarLevelSplit")
). It
+uses the internal utility function .checkvarsok
to check if
+vars
, i.e. the payload
, is actually present in
+names(df)
.
The next relevant function will be .apply_split_inner
,
+and we will see exactly what changes using
+debugonce(.apply_split_inner)
. Of course, this function is
+called directly as no custom split function is provided. Since parameter
+vals
is not specified (NULL
), the split values
+are retrieved from df
by using the split payload to select
+specific columns (varvec <- df[[spl_payload(spl)]]
).
+Whenever no split values are specified they are retrieved from the
+selected column as unique values (character
) or levels
+(factor
).
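A minimal base-R illustration of this behavior (standalone, not the actual internal code):

```r
# Character payload: split values are the unique observed values.
varvec_chr <- c("A", "B", "A")
unique(varvec_chr)   # "A" "B"

# Factor payload: split values are the levels, including empty ones --
# which is exactly why split functions like drop_split_levels exist.
varvec_fct <- factor(c("A", "B"), levels = c("A", "B", "C"))
levels(varvec_fct)   # "A" "B" "C"
```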
Next, .applysplit_datapart
creates a named list of
+facets or data subsets. In this case, the result is actually a mutually
+exclusive partition of the data. This is because we did not specify any
+split values and as such the column content was retrieved via
+unique
(in case of a character vector) or
+levels
(in case of factors).
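Conceptually, the partition resembles what base R's `split()` produces on a factor column (a sketch, not the internal implementation):

```r
# A named list of mutually exclusive data subsets, one per factor level.
df <- data.frame(
  SEX = factor(c("F", "M", "F")),
  AGE = c(34, 40, 28)
)
split(df, df$SEX)   # list with elements "F" (2 rows) and "M" (1 row)
```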
+.applysplit_partlabels
is a bit less linear as it has to
+take into account the possibility of having specified labels in the
+payload. Instead of looking at the function source code with
+getMethod(".applysplit_partlabels", "VarLevelSplit")
, we
+can enter the S4
generic function in debugging mode as
+follows:
+# rtables 0.6.2
+eval(debugcall(.applysplit_partlabels(spl, df, vals, labels)))
+# We leave to the smart developer to see how the labels are assigned
+
+# Remember to undebugcall() similarly!
In our case, the final labels are vals
because they were
+not explicitly assigned. Their order is retrieved from the split object
+(spl_child_order(spl)
) and matched with current split
+values. The returned list is then processed as it was before.
If we continue with the next call of do_split
, the same
+procedure is followed for the second split (SEX
). This is
+applied to the partition that was created in the first split. The main
+df
is now constituted by a subset (facet) of the total
+data, determined by the first split. This will be repeated iteratively
+for as many data splits as requested. Before concluding this iteration,
+we take a moment to discuss in detail how
+.fixupvals(partinfo)
works. This is not a generic function
+and the source code can be easily accessed. We suggest running through
+it with debugonce(.fixupvals)
to understand what it does in
+practice. The fundamental aspects of .fixupvals(partinfo)
+are as follows:
+After the call, ret$values contains SplitValue objects, and the
+separate extras element is dropped, since each extra is now carried
+inside its SplitValue. Note that this function can occasionally be called more than once on
+the same return object (a named list for now). Of course, after the
+first call only checks are applied.
+# rtables 0.6.2
+
+# Can find the following core function:
+# vals <- make_splvalue_vec(vals, extr, labels = labels)
+# ---> Main list of SplitValue objects: iterative call of
+# new("SplitValue", value = val, extra = extr, label = label)
+
+# Structure of ret before calling .fixupvals
+Browse[2]> str(ret, max.level = 2)
+List of 4
+ $ values : chr [1:2] "A: Drug X" "B: Placebo"
+ $ datasplit:List of 2
+ ..$ A: Drug X : tibble [121 × 8] (S3: tbl_df/tbl/data.frame)
+ ..$ B: Placebo: tibble [106 × 8] (S3: tbl_df/tbl/data.frame)
+ $ labels : Named chr [1:2] "A: Drug X" "B: Placebo"
+ ..- attr(*, "names")= chr [1:2] "A: Drug X" "B: Placebo"
+ $ extras :List of 2
+ ..$ : list()
+ ..$ : list()
+
+# Structure of ret after the function call
+Browse[2]> str(.fixupvals(ret), max.level = 2)
+List of 3
+ $ values :List of 2
+ ..$ A: Drug X :Formal class 'SplitValue' [package "rtables"] with 3 slots
+ ..$ B: Placebo:Formal class 'SplitValue' [package "rtables"] with 3 slots
+ $ datasplit:List of 2
+ ..$ A: Drug X : tibble [121 × 8] (S3: tbl_df/tbl/data.frame)
+ ..$ B: Placebo: tibble [106 × 8] (S3: tbl_df/tbl/data.frame)
+ $ labels : Named chr [1:2] "A: Drug X" "B: Placebo"
+ ..- attr(*, "names")= chr [1:2] "A: Drug X" "B: Placebo"
+
+# The SplitValue object is fundamental
+Browse[2]> str(ret$values)
+List of 2
+ $ A: Drug X :Formal class 'SplitValue' [package "rtables"] with 3 slots
+ .. ..@ extra: list()
+ .. ..@ value: chr "A: Drug X"
+ .. ..@ label: chr "A: Drug X"
+ $ B: Placebo:Formal class 'SplitValue' [package "rtables"] with 3 slots
+ .. ..@ extra: list()
+ .. ..@ value: chr "B: Placebo"
+ .. ..@ label: chr "B: Placebo"
We start by examining a split function that is already defined in
+rtables
. Its scope is filtering out specific values as
+follows:
+library(rtables)
+# debug(rtables:::do_split) # uncomment to see into the main split function
+basic_table() %>%
+ split_rows_by("SEX", split_fun = drop_split_levels) %>%
+ analyze("BMRKR1") %>%
+ build_table(DM)
## all obs
+## ————————————————
+## F
+## Mean 6.04
+## M
+## Mean 5.64
+
+# undebug(rtables:::do_split)
+
+# This produces the same output as before (when filters were used)
After the root split, we enter the split based on SEX
.
+As we have specified a split function, we can retrieve the split
+function by using splfun <- split_fun(spl)
and enter an
+if-else statement for the two possible cases: whether there is split
+context or not. In both cases, an error catching framework is used to
+give informative errors in case of failure. Later we will see in more
+depth how this works.
We invite the reader to always keep an eye on
+spl_context
, as it is fundamental to more sophisticated
+splits, e.g. in the cases where the split itself depends mainly on
+preceding splits or values. When the split function is called, please
+take a moment to look at how drop_split_levels
is defined.
+You will see that the function is fundamentally a wrapper of
+.apply_split_inner
that drops empty factor levels,
+therefore avoiding empty splits.
+# rtables 0.6.2
+# > drop_split_levels
+function(df,
+ spl,
+ vals = NULL,
+ labels = NULL,
+ trim = FALSE) {
+ # Retrieve split column
+ var <- spl_payload(spl)
+ df2 <- df
+
+ ## This call is exactly the one we used when filtering to get rid of empty levels
+ df2[[var]] <- factor(df[[var]])
+
+ ## Our main function!
+ .apply_split_inner(spl, df2,
+ vals = vals,
+ labels = labels,
+ trim = trim
+ )
+}
There are many pre-made split functions included in
+rtables
. A list of these functions can be found in the Split
+Functions vignette, or via ?split_funcs
. We leave it to
+the developer to look into how some of these split functions work; in
+particular trim_levels_to_map
may be of interest.
Now we will create a custom split function. Firstly, we will see how
+the system manages error messages. For a general understanding of how
+custom split functions are created, please read the Custom
+Split Functions section of the Advanced Usage vignette or see
+?custom_split_funs
. In the following code we use
+browser()
to enter our custom split functions. We invite
+the reader to activate options(error = recover)
to
+investigate cases where we encounter an error. Note that you can revert
+to default behavior by restarting your R
session, by
+caching the default option value, or by using callr
to
+retrieve the default as follows:
+default_opts <- callr::r(function(){options()}); options(error = default_opts$error)
.
+# rtables 0.6.2
+# Table call with only the function changing
+simple_table <- function(DM, f) {
+ lyt <- basic_table() %>%
+ split_rows_by("ARM", split_fun = f) %>%
+ analyze("BMRKR1")
+
+ lyt %>%
+ build_table(DM)
+}
+# First round will fail because there are unused arguments
+exploratory_split_fun <- function(df, spl) NULL
+# debug(rtables:::do_split)
+err_msg <- tryCatch(simple_table(DM, exploratory_split_fun), error = function(e) e)
+# undebug(rtables:::do_split)
+
+message(err_msg$message)
## Error applying custom split function: unused arguments (vals, labels, trim = trim)
+## split: VarLevelSplit (ARM)
+## occured at path: root
+The commented debugging lines above will allow you to inspect the
+error. Alternatively, the recover option lets you select the
+frame number, i.e. the trace level, to enter.
+Selecting the last frame number (10 in this case) will allow you to see
+the value of ret
from rtables:::do_split
that
+causes the error and how the informative error message that follows is
+created.
# rtables 0.6.2
+# Debugging level
+10: tt_dotabulation.R#627: do_split(spl, df, spl_context = spl_context)
+
+# Original call and final error
+> simple_table(DM, exploratory_split_fun)
+Error in do_split(spl, df, spl_context = spl_context) :
+ Error applying custom split function: unused arguments (vals, labels, trim = trim) # This is the main error
+ split: VarLevelSplit (ARM) # Split reference
+ occured at path: root # Path level (where it occurred)
The previous split function fails because
+exploratory_split_fun
is given more arguments than it
+accepts. A simple way to avoid this is to add ...
to the
+function call. Now let’s construct an interesting split function (and
+error):
+# rtables 0.6.2
+f_brakes_if <- function(split_col = NULL, error = FALSE) {
+ function(df, spl, ...) { # order matters! more than naming
+ # browser() # To check how it works
+ if (is.null(split_col)) { # Retrieves the default
+ split_col <- spl_variable(spl) # Internal accessor to split obj
+ }
+ my_payload <- split_col # Changing split column value
+
+ vals <- levels(df[[my_payload]]) # Extracting values to split
+ datasplit <- lapply(seq_along(vals), function(i) {
+ df[df[[my_payload]] == vals[[i]], ]
+ })
+ names(datasplit) <- as.character(vals)
+
+ # Error
+ if (isTRUE(error)) {
+ # browser() # If you need to check how it works
+ mystery_error_values <- sapply(datasplit, function(x) mean(x$BMRKR1))
+ if (any(mystery_error_values > 6)) {
+ stop(
+ "It should not be more than 6! Should it be? Found in split values: ",
+ names(datasplit)[which(mystery_error_values > 6)]
+ )
+ }
+ }
+
+ # Handy function to return a split result!!
+ make_split_result(vals, datasplit, vals)
+ }
+}
+simple_table(DM, f_brakes_if()) # works!
## all obs
+## ————————————————————————
+## A: Drug X
+## Mean 5.79
+## B: Placebo
+## Mean 6.11
+## C: Combination
+## Mean 5.69
+
+simple_table(DM, f_brakes_if(split_col = "STRATA1")) # works!
## all obs
+## ————————————————
+## A
+## Mean 5.95
+## B
+## Mean 5.90
+## C
+## Mean 5.71
+
+# simple_table(DM, f_brakes_if(error = TRUE)) # does not work, but returns an informative message
+
+# Error in do_split(spl, df, spl_context = spl_context) :
+# Error applying custom split function: It should not be more than 6! Should it be? Found in split values: B: Placebo
+# split: VarLevelSplit (ARM)
+# occurred at path: root
Now we will take a moment to dwell on the machinery included in
+rtables
to create custom split functions. Before doing so,
+please read the relevant documentation at ?make_split_fun
.
+Most of the pre-made split functions included in rtables
+are or will be written with make_split_fun
as it is a more
+stable constructor for such functions than the approach previously used. We
+invite the reader to take a look at make_split_fun.R
. The
+majority of the functions here should be understandable with the
+knowledge you have gained from this guide so far. It is important to
+note that if no core split function is specified, which is commonly the
+case, make_split_fun
calls do_base_split
+directly, which is a minimal wrapper of the well-known
+do_split
. drop_facet_levels
, for example, is a
+pre-processing function that at its core simply removes empty factor
+levels from the split “column”, thus avoiding showing empty lines.
It is also possible to provide a list of functions, as it can be seen
+in the examples of ?make_split_fun
. Note that pre- and
+post-processing requires a list as input to support the possibility of
+combining multiple functions. In contrast, the core splitting function
+must be a single function call as it is not expected to have stacked
+features. This rarely needs to be modified and the majority of the
+included split functions work with pre- or post-processing. Included
+post-processing functions are interesting as they interact with the
+split object, e.g. by reordering the facets or by adding an overall
+facet (add_overall_facet
). The attentive reader will have
+noticed that the core function relies on do_split
and many
+of the post-processing functions rely on make_split_result
,
+which is the best way to get the correct split return structure. Note
+that modifying the core split only works in the row space at the
+moment.
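As a sketch of combining these pieces (both drop_facet_levels and add_overall_facet ship with rtables, but treat the exact call below as illustrative):

```r
library(rtables)

# Pre-processing drops empty factor levels before the core split;
# post-processing appends an overall facet after it.
my_split <- make_split_fun(
  pre  = list(drop_facet_levels),
  post = list(add_overall_facet("Total", "All Patients"))
)

basic_table() %>%
  split_rows_by("SEX", split_fun = my_split) %>%
  analyze("BMRKR1") %>%
  build_table(DM)
```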
.spl_context
- Adding Context to Our Splits
+The best way to understand what split context does, and how to use
+it, is to read the Leveraging
+.spl_context
section of the Advanced Usage vignette,
+and to use browser()
within a split function to see how it
+is structured. As .spl_context
is needed for rewriting core
+functions, we propose a wrapper of do_base_split
here,
+which is a handy redirection to the standard do_split
+without the split function part (i.e. it is a wrapper of
+.apply_split_inner
, the real core splitting machinery). Out
+of curiosity, we set trim = TRUE
here. This trimming only
+works when there is a mixed table (some values are 0s and some have
+content), for which it will trim 0s. This is rarely the case, and we
+encourage using the replacement functions
+trim_levels_to_group
and trim_levels_to_map
+for trimming. Nowadays, it should even be impossible to set it
+differently from trim = FALSE
.
(xxx file an issue: an informative error is needed when the result is not a list).
+
+# rtables 0.6.2
+browsing_f <- function(df, spl, .spl_context, ...) {
+ # browser()
+ # do_base_split(df, spl, ...) # order matters!! This would fail if done
+ do_base_split(spl = spl, df = df, vals = NULL, labels = NULL, trim = TRUE)
+}
+
+fnc_tmp <- function(innervar) { # Exploring trim_levels_in_facets (check its form)
+ function(ret, ...) {
+ # browser()
+ for (var in innervar) { # of course AGE is not here, so nothing is dropped!!
+ ret$datasplit <- lapply(ret$datasplit, function(df) {
+ df[[var]] <- factor(df[[var]])
+ df
+ })
+ }
+ ret
+ }
+}
+
+basic_table() %>%
+ split_rows_by("ARM") %>%
+ split_rows_by("STRATA1") %>%
+ split_rows_by_cuts("AGE",
+ cuts = c(0, 50, 100),
+ cutlabels = c("young", "old")
+ ) %>%
+ split_rows_by("SEX", split_fun = make_split_fun(
+ pre = list(drop_facet_levels), # This is dropping the SEX levels (AGE is upper level)
+ core_split = browsing_f,
+ post = list(fnc_tmp("AGE")) # To drop these we should use a split_fun in the above level
+ )) %>%
+ summarize_row_groups() %>%
+ build_table(DM)
# The following is the .spl_context printout:
+Browse[1]> .spl_context
+ split value full_parent_df all_cols_n all obs
+1 root root c("S1", .... 356 TRUE, TR....
+2 ARM A: Drug X c("S6", .... 121 TRUE, TR....
+3 STRATA1 A c("S14",.... 36 TRUE, TR....
+4 AGE young c("S14",.... 36 TRUE, TR....
+
+# NOTE: make_split_fun(pre = list(drop_facet_levels)) and drop_split_levels
+# do the same thing in this case
Here we can see what the split column variable is
+(split
, first column) at this level of the splitting
+procedure. value
is the current split value that is being
+dealt with. For the next column, let’s see the number of rows of these
+data frames:
+sapply(.spl_context$full_parent_df, nrow) # [1] 356 121 36 36
.
+Indeed, the root
level contains the full input data frame,
+while the other levels are subgroups of the full data according to the
+split value. all_cols_n
shows exactly the numbers just
+described. all obs
is the current filter applied to the
+columns. Applying this to the root data (or the row subgroup data)
+reveals the current column-wise facet (or row-wise for a row split). It
+is also possible to use the same information to make complex splits in
+the column space by using the full data frame and the value splits to
+select the values of interest. This is something we will change and
+simplify within rtables
as the need becomes apparent.
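As a small illustration (a sketch; the message() call is only there to expose the context), a split function can consult .spl_context and then delegate to the standard behavior:

```r
library(rtables)

# Reports the chain of parent split values before performing the
# default level-dropping split.
context_aware_split <- function(df, spl, .spl_context, ...) {
  message("splitting within: ", paste(.spl_context$value, collapse = " -> "))
  drop_split_levels(df, spl)
}

basic_table() %>%
  split_rows_by("ARM") %>%
  split_rows_by("SEX", split_fun = context_aware_split) %>%
  analyze("BMRKR1") %>%
  build_table(DM)
```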
extra_args
+This functionality is well-known and used in the setting of analysis +functions (a somewhat complicated example of this can be found in the Example +Complex Analysis Function vignette), but we will show here how this +can also apply to splits.
+
+# rtables 0.6.2
+
+# Let's use the tracer!!
+my_tracer <- quote(if (length(spl@extra_args) > 0) browser())
+trace(
+ what = "do_split",
+ tracer = my_tracer,
+ where = asNamespace("rtables")
+)
+
+custom_mean_var <- function(var) {
+ function(df, labelstr, na.rm = FALSE, ...) {
+ # browser()
+ mean(df[[var]], na.rm = na.rm)
+ }
+}
+
+DM_ageNA <- DM
+DM_ageNA$AGE[1] <- NA
+
+basic_table() %>%
+ split_rows_by("ARM") %>%
+ split_rows_by("SEX", split_fun = drop_split_levels) %>%
+ summarize_row_groups(
+ cfun = custom_mean_var("AGE"),
+ extra_args = list(na.rm = TRUE), format = "xx.x",
+ label_fstr = "label %s"
+ ) %>%
+ # content_extra_args, c_extra_args are different slots!! (xxx)
+ split_rows_by("STRATA1", split_fun = keep_split_levels("A")) %>%
+ analyze("AGE") %>% # check with the extra_args (xxx)
+ build_table(DM_ageNA)
+# You can pass extra_args down to other splits. It is possible this will not
+# work. Should it? That is why extra_args lives only in splits (xxx) check if it works
+# as is. Difficult to find a use case for this. Maybe it could work for the ref_group
+# info. That does not work with nesting already (fairly sure that it will break stuff).
+# Does it make sense to have more than one ref_group at any point of the analysis? No docs,
+# send a warning if users try to nest things with ref_group (that is passed around via
+# extra_args)
+
+# As we can see that was not possible. What if we now force it a bit?
+my_split_fun <- function(df, spl, .spl_context, ...) {
+ spl@extra_args <- list(na.rm = TRUE)
+ # does not work because do_split is not changing the object
+ # the split does not do anything with it
+ drop_split_levels(df, spl)
+} # does not work
+
+basic_table() %>%
+ split_rows_by("ARM") %>%
+ split_rows_by("SEX", split_fun = my_split_fun) %>%
+  analyze("AGE", inclNAs = TRUE, afun = mean) %>% # inclNAs defaults to FALSE
+ build_table(DM_ageNA)
+# extra_args is available in cols but not in rows, because different columns
+# may need it for different col space. Row-wise it seems not necessary.
+# The only thing that works is adding it to analyze (xxx) check if it is worth adding
+
+# We invite the developer now to test all the test files of this package with the tracer on
+# therefore -> extra_args is not currently used in splits (xxx could be wrong)
+# could be not being hooked up
+untrace(what = "do_split", where = asNamespace("rtables"))
+
+# Let's try with the other variables identically
+my_tracer <- quote(if (!is.null(vals) || !is.null(labels) || isTRUE(trim)) {
+ print("A LOT TO SAY")
+ message("CANT BLOCK US ALL")
+ stop("NOW FOR SURE")
+ browser()
+})
+trace(
+ what = "do_split",
+ tracer = my_tracer,
+ where = asNamespace("rtables")
+)
+# Run tests by copying the above in setup-fakedata.R (then devtools::test())
+untrace(
+ what = "do_split",
+ where = asNamespace("rtables")
+)
As we have demonstrated, all of the above are effectively impossible cases
+and should be considered vestigial, slated for deprecation.
+MultiVarSplit
& CompoundSplit
+Examples
+The final part of this article is still under construction, hence the
+non-specific mentions and the to-do list. xxx CompoundSplit
+generates facets from one variable (e.g. cumulative distributions) while
+MultiVarSplit
uses different variables for the split. See
+AnalyzeMultiVars
, which inherits from
+CompoundSplit
for more details on how it analyzes the same
+facets multiple times. MultiVarColSplit
works with
+analyze_colvars
, which is out of the scope of this article.
+.set_kids_sect_sep
adds separators between children (this can be set
+from the split).
First, we want to see how the MultiVarSplit
class
+behaves for an example case taken from
+?split_rows_by_multivar
.
+# rtables 0.6.2
+
+my_tracer <- quote(if (is(spl, "MultiVarSplit")) browser())
+trace(
+ what = "do_split",
+ tracer = my_tracer,
+ where = asNamespace("rtables")
+)
+# We also want to take a look at the following:
+debugonce(rtables:::.apply_split_inner)
+lyt <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by_multivar(c("BMRKR1", "BMRKR1"),
+ varlabels = c("SD", "MEAN")
+ ) %>%
+ split_rows_by("COUNTRY",
+ split_fun = keep_split_levels("PAK")
+ ) %>% # xxx for #690 #691
+ summarize_row_groups() %>%
+ analyze(c("AGE", "SEX"))
+
+build_table(lyt, DM)
+
+# xxx check empty space on top -> check if it is a bug, file it
+untrace(
+ what = "do_split",
+ where = asNamespace("rtables")
+)
If we print the output, we will notice that the two groups (one
labeled "SD" and the other "MEAN", per the varlabels above) are identical
along the columns. This is because no actual subgroup was created: both
facets use the same BMRKR1 column. This is an interesting
way to personalize splits with the help of custom split functions and
their split context, and to have widely different subgroups in the
table.
+We invite the reader to try to understand why
+split_rows_by_multivar
can have other row splits under it
+(see xxx
comment in the previous code), while
+split_cols_by_multivar
does not. This is a known bug at the
+moment, and we will work towards a fix for this. Known issues are often
+linked in the source code by their GitHub issue number
+(e.g. #690
).
Lastly, we will briefly show an example of a split by cut function +and how to replace it to solve the empty age groups problem as we did +before. We propose the same simplified situation:
+
+# rtables 0.6.2
+
+cutfun <- function(x) {
+ # browser()
+ cutpoints <- c(0, 50, 100)
+ names(cutpoints) <- c("", "Younger", "Older")
+ cutpoints
+}
+
+tbl <- basic_table(show_colcounts = TRUE) %>%
+ split_rows_by("ARM", split_fun = drop_and_remove_levels(c("B: Placebo", "C: Combination"))) %>%
+ split_rows_by("STRATA1") %>%
+ split_rows_by_cutfun("AGE", cutfun = cutfun) %>%
+ # split_rows_by_cuts("AGE", cuts = c(0, 50, 100),
+ # cutlabels = c("young", "old")) %>% # Works the same
+ split_rows_by("SEX", split_fun = drop_split_levels) %>%
+ summarize_row_groups() %>% # This is degenerate!!!
+ build_table(DM)
+
+tbl
## all obs
+## (N=356)
+## —————————————————————————
+## A: Drug X
+## A
+## AGE
+## Younger
+## F 22 (6.2%)
+## M 14 (3.9%)
+## Older
+## B
+## AGE
+## Younger
+## F 26 (7.3%)
+## M 14 (3.9%)
+## Older
+## F 1 (0.3%)
+## C
+## AGE
+## Younger
+## F 19 (5.3%)
+## M 21 (5.9%)
+## Older
+## F 2 (0.6%)
+## M 2 (0.6%)
+For both row split cases (*_cuts
and
+*_cutfun
), we have empty levels that are not dropped. This
+is to be expected and can be avoided by using a dedicated split
+function. It is possible to intentionally look ahead at the upcoming split
+to determine whether an element is present in it. At the moment it is not
+possible to add a split_fun to dedicated split functions like
to dedicated split functions like
+split_rows_by_cuts
.
Note that in the previous table we only used
+summarize_row_groups
, with no analyze
calls.
+This rendered the table nicely, but it is not the standard method to use
+as summarize_row_groups
is intended only to
+decorate row groups, i.e. rows with labels. Internally, these rows are
+called content rows and that is why analysis functions in
+summarize_row_groups
are called cfun
instead
+of afun
. Indeed, the tabulation machinery also presents
+these two differently as is described in the Tabulation
+with Row Structure section of the Tabulation vignette.
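A compact sketch of the difference: a cfun receives the current group label as labelstr, while an afun does not. The rcell() helper is used here for formatting; treat the details as illustrative:

```r
library(rtables)

# Content (label) rows: cfun gets the current group label as `labelstr`.
my_cfun <- function(df, labelstr, ...) {
  rcell(nrow(df), format = "xx", label = sprintf("%s (n)", labelstr))
}

basic_table() %>%
  split_rows_by("ARM") %>%
  summarize_row_groups(cfun = my_cfun) %>%
  analyze("AGE", afun = function(x, ...) rcell(mean(x), format = "xx.x")) %>%
  build_table(DM)
```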
We can try to construct the split function for cuts manually with
+make_split_fun
:
+my_count_afun <- function(x, .N_col, .spl_context, ...) {
+ # browser()
+ out <- list(c(length(x), length(x) / .N_col))
+ names(out) <- .spl_context$value[nrow(.spl_context)] # workaround (xxx #689)
+ in_rows(
+ .list = out,
+ .formats = c("xx (xx.x%)")
+ )
+}
+# ?make_split_fun # To check for docs/examples
+
+# Core split
+cuts_core <- function(spl, df, vals, labels, .spl_context) {
+ # browser() # file an issue xxx
+ # variables that are split on are converted to factor during the original clean-up
+  # cut splits are not doing it, but that is an exception. xxx
+ # young_v <- as.numeric(df[["AGE"]]) < 50
+ # current solution:
+ young_v <- as.numeric(as.character(df[["AGE"]])) < 50
+ make_split_result(c("young", "old"),
+ datasplit = list(df[young_v, ], df[!young_v, ]),
+ labels = c("Younger", "Older")
+ )
+}
+drop_empties <- function(splret, spl, fulldf, ...) {
+ # browser()
+ nrows_data_split <- vapply(splret$datasplit, nrow, numeric(1))
+ to_keep <- nrows_data_split > 0
+ make_split_result(
+ splret$values[to_keep],
+ splret$datasplit[to_keep],
+ splret$labels[to_keep]
+ )
+}
+gen_split <- make_split_fun(
+ core_split = cuts_core,
+ post = list(drop_empties)
+)
+
+tbl <- basic_table(show_colcounts = TRUE) %>%
+ split_rows_by("ARM", split_fun = keep_split_levels(c("A: Drug X"))) %>%
+ split_rows_by("STRATA1") %>%
+ split_rows_by("AGE", split_fun = gen_split) %>%
+ analyze("SEX") %>% # It is the last step!! No need of BMRKR1 right?
+ # split_rows_by("SEX", split_fun = drop_split_levels,
+ # child_labels = "hidden") %>% # close issue #689. would it work for
+ # analyze_colvars? probably (xxx)
+ # analyze("BMRKR1", afun = my_count_afun) %>% # This is NOT degenerate!!! BMRKR1 is only placeholder
+ build_table(DM)
+
+tbl
Alternatively, we could choose to prune these rows out with
+prune_table
!
+# rtables 0.6.2
+
+tbl <- basic_table(show_colcounts = TRUE) %>%
+ split_rows_by("ARM", split_fun = keep_split_levels(c("A: Drug X"))) %>%
+ split_rows_by("STRATA1") %>%
+ split_rows_by_cuts(
+ "AGE",
+ cuts = c(0, 50, 100),
+ cutlabels = c("young", "old")
+ ) %>%
+ split_rows_by("SEX", split_fun = drop_split_levels) %>%
+ summarize_row_groups() %>% # This is degenerate!!! # we keep it until #689
+ build_table(DM)
+
+tbl
## all obs
+## (N=356)
+## —————————————————————
+## A: Drug X
+## A
+## young
+## F 22 (6.2%)
+## M 14 (3.9%)
+## old
+## B
+## young
+## F 26 (7.3%)
+## M 14 (3.9%)
+## old
+## F 1 (0.3%)
+## C
+## young
+## F 19 (5.3%)
+## M 21 (5.9%)
+## old
+## F 2 (0.6%)
+## M 2 (0.6%)
+
+# Trying with pruning
+prune_table(tbl) # (xxx) what is going on here? it is degenerate so it has no real leaves
## NULL
+
+# It is degenerate -> what to do?
+# The same mechanism is applied in the case of NULL leaves, they are rolled up in the
+# table tree
vignettes/dev-guide/dg_table_hierarchy.Rmd
+ dg_table_hierarchy.Rmd
This article is intended for use by developers only and will contain
+low-level explanations of the topics covered. For user-friendly
+vignettes, please see the Articles
+page on the rtables
website.
Any code or prose which appears in the version of this article on the
+main
branch of the repository may reflect a specific state
+of things that can be more or less recent. This guide describes very
+important aspects of table hierarchy that are unlikely to change.
+Regardless, we invite the reader to keep in mind that the current
+repository code may have drifted from the following material in this
+document, and it is always the best practice to read the code directly
+on main
.
Please keep in mind that rtables
is still under active
+development, and it has seen the efforts of multiple contributors across
+different years. Therefore, there may be legacy mechanisms and ongoing
+transformations that could look different in the future.
The scope of this vignette is to understand the structure of
+rtable
objects and their class hierarchy, with an exploration of tree
+structures as S4 objects. Exploring table structure enables a better
+understanding of rtables
concepts such as split machinery,
+tabulation, pagination and export. More details from the user’s
+perspective of table structure can be found in the relevant
+vignettes.
isS4
getClass
- for class structure
We invite developers to use the provided examples to interactively
+explore the rtables
hierarchy. The most helpful command is
+getClass
for a list of the slots associated with a class,
+in addition to related classes and their relative distances.
PredataAxisLayout
class is used to define the data
+subset instructions for tabulation. It has two subclasses (one for each axis):
+PredataColLayout
, PredataRowLayout
Splits are core functionality for rtables
as tabulation
+and calculations are often required on subsets of the data.
## Class "TreePos" [package "rtables"]
+##
+## Slots:
+##
+## Name: splits s_values sval_labels subset
+## Class: list list character SubsetDef
+TreePos
class contains split information as a list of
+the splits, split label values, and the subsets of the data that are
+generated by the split.
AllSplit
RootSplit
+MultiVarSplit
VarStaticCutSplit
+CumulativeCutSplit
VarDynCutSplit
+CompoundSplit
VarLevWBaselineSplit
The highest level of the table hierarchy belongs to
+TableTree
. The code below identifies the slots associated
+with this class.
+getClass("TableTree")
## Class "TableTree" [package "rtables"]
+##
+## Slots:
+##
+## Name: content page_title_prefix children
+## Class: ElementaryTable character list
+##
+## Name: rowspans labelrow page_titles
+## Class: data.frame LabelRow character
+##
+## Name: horizontal_sep header_section_div trailing_section_div
+## Class: character character character
+##
+## Name: col_info format na_str
+## Class: InstantiatedColumnInfo FormatSpec character
+##
+## Name: indent_modifier table_inset level
+## Class: integer integer integer
+##
+## Name: name main_title subtitles
+## Class: character character character
+##
+## Name: main_footer provenance_footer
+## Class: character character
+##
+## Extends:
+## Class "VTableTree", directly
+## Class "VTableNodeInfo", by class "VTableTree", distance 2
+## Class "VTree", by class "VTableTree", distance 2
+## Class "VTitleFooter", by class "VTableTree", distance 2
+## Class "VNodeInfo", by class "VTableTree", distance 3
+As an S4 object, the slots can be accessed using @
+(similar to the use of $
for list objects). You’ll notice
+there are classes that fall under “Extends”. The classes contained here
+have a relationship to the TableTree
object and are
+“virtual” classes. To avoid the repetition of slots and carrying the
+same data (set of slots for example) that multiple classes may need,
+rtables
extensively uses virtual classes. A virtual class
+cannot be instantiated; its purpose is for other classes to inherit
+information from it.
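A quick interactive check of these points (assuming the DM example data shipped with rtables):

```r
library(rtables)

tt <- build_table(
  basic_table() %>% split_rows_by("SEX") %>% analyze("AGE"),
  DM
)

isS4(tt)                      # TRUE: rtables tables are S4 objects
class(tt)                     # "TableTree"
slotNames(tt)                 # the slots listed by getClass("TableTree")
tt@level                      # direct slot access with @
isVirtualClass("VTableTree")  # TRUE: it exists only to be inherited from
```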
+lyt <- basic_table(title = "big title") %>%
+ split_rows_by("SEX", page_by = TRUE) %>%
+ analyze("AGE")
+
+tt <- build_table(lyt, DM)
+
+# Though we don't recommend using str for studying rtable objects,
+# we do find it useful in this instance to visualize the parent/child relationships.
+str(tt, max.level = 2)
## Formal class 'TableTree' [package "rtables"] with 20 slots
+## ..@ content :Formal class 'ElementaryTable' [package "rtables"] with 19 slots
+## ..@ page_title_prefix : chr "SEX"
+## ..@ children :List of 4
+## ..@ rowspans :'data.frame': 0 obs. of 0 variables
+## ..@ labelrow :Formal class 'LabelRow' [package "rtables"] with 13 slots
+## ..@ page_titles : chr(0)
+## ..@ horizontal_sep : chr "—"
+## ..@ header_section_div : chr NA
+## ..@ trailing_section_div: chr NA
+## ..@ col_info :Formal class 'InstantiatedColumnInfo' [package "rtables"] with 9 slots
+## ..@ format : NULL
+## ..@ na_str : chr NA
+## ..@ indent_modifier : int 0
+## ..@ table_inset : int 0
+## ..@ level : int 1
+## ..@ name : chr "SEX"
+## ..@ main_title : chr "big title"
+## ..@ subtitles : chr(0)
+## ..@ main_footer : chr(0)
+## ..@ provenance_footer : chr(0)
+## Warning: str provides a low level, implementation-detail-specific description
+## of the TableTree object structure. See table_structure(.) for a summary of
+## table structure intended for end users.
+Root to leaves: the row layout is built of split vectors, and tables are
+trees; nodes in the tree can have summaries associated with them. Tables
+are trees because of their nested structure, which also has the benefit of
+keeping and repeating the necessary information when paginating a table.
+Children of ElementaryTables
are row objects.
+TableTree
can have children that are either row objects or
+other table objects.
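The parent/child relationships can be sketched with plain nested lists (a toy model, not rtables classes): leaves are rows, and internal nodes are tables whose children are rows or further tables:

```r
## Hypothetical constructors: a "row" is a leaf, a "table" holds children
row_node   <- function(label) list(kind = "row", label = label)
table_node <- function(label, ...) {
  list(kind = "table", label = label, children = list(...))
}

## A SEX split with two subtables, each analyzed into a single mean row
tt <- table_node(
  "root",
  table_node("F", row_node("Mean")),
  table_node("M", row_node("Mean"))
)

## Pagination must walk this nesting; e.g., counting leaf rows recursively
count_rows <- function(node) {
  if (node$kind == "row") return(1L)
  sum(vapply(node$children, count_rows, integer(1)))
}
count_rows(tt) # 2
```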
vignettes/dev-guide/dg_tabulation.Rmd
+ dg_tabulation.Rmd
This article is intended for use by developers only and will contain
+low-level explanations of the topics covered. For user-friendly
+vignettes, please see the Articles
+page on the rtables
website.
Any code or prose which appears in the version of this article on the
+main
 branch of the repository reflects a particular state
+of the codebase, which may be more or less recent. This guide describes very
+important aspects of the tabulation process that are unlikely to change.
+Regardless, we invite the reader to keep in mind that the current
+repository code may have drifted from the material in this
+document, and it is always best practice to read the code directly
+on main
 .
Please keep in mind that rtables
is still under active
+development, and it has seen the efforts of multiple contributors across
+different years. Therefore, there may be legacy mechanisms and ongoing
+transformations that could look different in the future.
Being that this is a working document that may be subject to both
+deprecation and updates, we keep xxx
comments to indicate
+placeholders for warnings and to-do’s that need further work.
Tabulation in rtables
is a process that takes a
+pre-defined layout and applies it to data. The layout object, with all
+of its splits and analyze
s, can be applied to different
+data to produce valid tables. This process happens principally within
+the tt_dotabulation.R
file and the user-facing function
+build_table
that resides in it. We will occasionally use
+functions and methods that are present in other files, like
+colby_construction.R
or make_subset_expr.R
. We
+assume the reader is already familiar with the documentation for
+build_table
. We suggest reading the Split
+Machinery article prior to this one, as it is instrumental in
+understanding how the layout object, which is essentially built out of
+splits, is tabulated when data is supplied.
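For example, the same layout can be rebuilt against different data (a sketch assuming rtables is loaded; DM ships with rtables and ex_adsl with formatters):

```r
library(rtables)

lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AGE")

## One pre-data layout, two datasets, two valid tables
tbl_dm   <- build_table(lyt, DM)
tbl_adsl <- build_table(lyt, ex_adsl)
```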
We enter into build_table
using debugonce
+to see how it works.
+# rtables 0.6.2
+library(rtables)
+debugonce(build_table)
+
+# A very simple layout
+lyt <- basic_table() %>%
+ split_rows_by("STRATA1") %>%
+ split_rows_by("SEX", split_fun = drop_split_levels) %>%
+ split_cols_by("ARM") %>%
+ analyze("BMRKR1")
+
+# lyt must be a PreDataTableLayouts object
+is(lyt, "PreDataTableLayouts")
+
+lyt %>% build_table(DM)
Now let’s look within our build_table
call. After the
+initial check that the layout is a pre-data table layout, it checks if
+the column layout is defined (clayout
accessor), i.e. it
+does not have any column split. If that is the case, an
+All obs
column is added automatically with all
+observations. After this, there are a couple of defensive programming
+calls that perform checks and transformations now that we finally have the data.
+These can be divided into two categories: those that mainly concern the
+layout, which are defined as generics, and those that concern the data,
+which are plain functions, as they do not depend on the layout
+class. Indeed, the layout is structured and can be divided into
+clayout
and rlayout
(column and row layout).
+The first one is used to create cinfo
, which is the general
+object and container of the column splits and information. The second
+one contains the obligatory all data split, i.e. the root split
+(accessible with root_spl
), and the row splits’ vectors
+which are iterative splits in the row space. In the following, we
+consider the initial checks and defensive programming.
+## do checks and defensive programming now that we have the data
+lyt <- fix_dyncuts(lyt, df) # Create the splits that depend on the data
+lyt <- set_def_child_ord(lyt, df) # Now that we have the data, set the same level order for all splits
+lyt <- fix_analyze_vis(lyt) # Check whether the last analyze split's label should be visible
+# If there is only one variable you will not get the variable name; with
+# multiple variables (multivar) you do. The default is NA. This can only be
+# done now, when the whole layout is available.
+df <- fix_split_vars(lyt, df, char_ok = is.null(col_counts))
+# checks that the split variables are present in the data
+
+lyt[] # preserves names - warns if the names are longer, recycles the name value if there is only one
+lyt@.Data # might not preserve the names; it works only because the class inherits from list
+# We suggest extensive testing of these behaviors in order to choose the appropriate one
Along with the various checks and defensive programming, we find
+PreDataAxisLayout
which is a virtual class that both row
+and column layouts inherit from. Virtual classes are handy for grouping
+classes that need to share slots (such as labels) or methods that should
+apply to all of their subclasses. See more information about the
+rtables
class hierarchy in the dedicated article here.
Now, we continue with build_table
. After the checks, we
+notice TreePos()
which is a constructor for an object that
+retains a representation of the tree position along with split values
+and labels. This is mainly used by create_colinfo
, which we
+enter now with debugonce(create_colinfo)
. This function
+creates the object that represents the column splits and everything else
+that may be related to the columns. In particular, the column counts are
+calculated in this function. The parameter inputs are as follows:
+cinfo <- create_colinfo(
+ lyt, # Main layout with col split info
+ df, # df used for splits and col counts if no alt_counts_df is present
+ rtpos, # TreePos (does not change out of this function)
+ counts = col_counts, # If we want to overwrite the calculations with df/alt_counts_df
+ alt_counts_df = alt_counts_df, # alternative data for col counts
+ total = col_total, # calculated from build_table inputs (nrow of df or alt_counts_df)
+ topleft # topleft information added into build_table
+)
create_colinfo
is in make_subset_expr.R
.
+Here, we see that if topleft
is present in
+build_table
, it will override the one in lyt
.
+Entering create_colinfo
, we will see the following
+calls:
+clayout <- clayout(lyt) # Extracts column split and info
+
+if (is.null(topleft)) {
+ topleft <- top_left(lyt) # If top_left is not present in build_table, it is taken from lyt
+}
+
+ctree <- coltree(clayout, df = df, rtpos = rtpos) # Main constructor of LayoutColTree
+# The above is referenced as generic and principally represented as
+# setMethod("coltree", "PreDataColLayout", (located in `tree_accessor.R`).
+# This is a call that restructures information from clayout, df, and rtpos
+# to get a more compact column tree layout. Part of this design is related
+# to past implementations.
+
+cexprs <- make_col_subsets(ctree, df) # extracts expressions in a compact fashion.
+# WARNING: removing NAs at this step is automatic. This should
+# be coupled with a warning for NAs in the split (xxx)
+
+colextras <- col_extra_args(ctree) # retrieves extra_args from the tree. It may not be used
Next in the function is the determination of the column counts.
+Currently, this happens only at the leaf level, but it can certainly be
+calculated independently for all levels (this is an open issue in
+rtables
, i.e. how to print other levels’ totals).
+The precedence rules for column counts may not be documented yet (“xxx todo”). The
+main use case is when you are analyzing a participation-level dataset,
+with multiple records per subject, and you would like to retain the
+total numbers of subjects per column, often taken from a subject-level
+dataset, to use as column counts. Originally, counts were only able to
+be added as a vector, but it is often the case that users would like the
+possibility to use alt_counts_df
. The cinfo
+object (InstantiatedColumnInfo
) is created with all the
+above information.
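The need for alt_counts_df can be sketched in base R with toy data (hypothetical subject-level adsl and participation-level adlb frames, not the real ADaM datasets):

```r
## One record per subject: the kind of data alt_counts_df would be
adsl <- data.frame(
  USUBJID = c("s1", "s2", "s3"),
  ARM     = c("A", "A", "B")
)

## Multiple records per subject: the kind of data being analyzed (df)
adlb <- adsl[rep(seq_len(nrow(adsl)), times = c(2, 3, 1)), ]

## Counting records per column overstates N; counting subjects gives
## the column counts users usually want
record_counts  <- table(adlb$ARM) # A = 5, B = 1
subject_counts <- table(adsl$ARM) # A = 2, B = 1
```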
If we continue inside build_table
, we see
+.make_ctab
used to make a root split. This is a general
+procedure that generates the initial root split as a content row.
+ctab
is applied to this content row, which is a row that
+contains only a label. From ?summarize_row_groups
, you know
+that this is how rtables
defines label rows, i.e. as
+content rows. .make_ctab
is very similar to the function
+that actually creates the table rows, .make_tablerows
. Note
+that this function uses parent_cfun
and
+.make_caller
to retrieve the content function inserted in
+above levels. Here we split the structural handling of the table object
+and the row-creation engine, which are divided by a
+.make_tablerows
call. If you search the package, you will
+find that this function is only called twice, once in
+.make_ctab
and once in .make_analyzed_tab
.
+These two are the final elements of the table construction: the creation
+of rows.
Going back to build_table
, you will see that the row
+layout is actually a list of split vectors. The fundamental line,
+kids <- lapply(seq_along(rlyt), function(i) {
, allows us
+to appreciate this. Going forward we see how
+recursive_applysplit
is applied to each split vector. It
+may be worthwhile to check what this vector looks like in our test
+case.
+# rtables 0.6.2
+# A very simple layout
+lyt <- basic_table() %>%
+ split_rows_by("STRATA1") %>%
+ split_rows_by("SEX", split_fun = drop_split_levels) %>%
+ split_cols_by("ARM") %>%
+ analyze("BMRKR1")
+
+rlyt <- rtables:::rlayout(lyt)
+str(rlyt, max.level = 2)
Formal class 'PreDataRowLayout' [package "rtables"] with 2 slots
+ ..@ .Data :List of 2 # rlyt is a rtables object (PreDataRowLayout) that is also a list!
+ ..@ root_split:Formal class 'RootSplit' [package "rtables"] with 17 slots # another object!
+ # If you do summarize_row_groups before anything you act on the root split. We need this to
+ # have a place for the content that is valid for the whole table.
+
+str(rtables:::root_spl(rlyt), max.level = 2) # it is still a split
+
+str(rlyt[[1]], max.level = 3) # still a rtables object (SplitVector) that is a list
+Formal class 'SplitVector' [package "rtables"] with 1 slot
+ ..@ .Data:List of 3
+ .. ..$ :Formal class 'VarLevelSplit' [package "rtables"] with 20 slots
+ .. ..$ :Formal class 'VarLevelSplit' [package "rtables"] with 20 slots
+ .. ..$ :Formal class 'AnalyzeMultiVars' [package "rtables"] with 17 slots
The last print is very informative. We can see from the layout
+construction that this object is built with 2
+VarLevelSplit
s on the rows and one final
+AnalyzeMultiVars
, which is the leaf analysis split that has
+the final level rows. The second split vector is the following
+AnalyzeVarSplit
.
xxx To get multiple split vectors, you need to escape the nesting
+with nest = FALSE
or by adding a split_rows_by
+call after an analyze
call.
# rtables 0.6.2
+str(rlyt[[2]], max.level = 5)
+Formal class 'SplitVector' [package "rtables"] with 1 slot
+ ..@ .Data:List of 1
+ .. ..$ :Formal class 'AnalyzeVarSplit' [package "rtables"] with 21 slots
+ .. .. .. ..@ analysis_fun :function (x, ...)
+ .. .. .. .. ..- attr(*, "srcref")= 'srcref' int [1:8] 1723 5 1732 5 5 5 4198 4207
+ .. .. .. .. .. ..- attr(*, "srcfile")=Classes 'srcfilealias', 'srcfile' <environment: 0x560d8e67b750>
+ .. .. .. ..@ default_rowlabel : chr "Var3 Counts"
+ .. .. .. ..@ include_NAs : logi FALSE
+ .. .. .. ..@ var_label_position : chr "default"
+ .. .. .. ..@ payload : chr "VAR3"
+ .. .. .. ..@ name : chr "VAR3"
+ .. .. .. ..@ split_label : chr "Var3 Counts"
+ .. .. .. ..@ split_format : NULL
+ .. .. .. ..@ split_na_str : chr NA
+ .. .. .. ..@ split_label_position : chr(0)
+ .. .. .. ..@ content_fun : NULL
+ .. .. .. ..@ content_format : NULL
+ .. .. .. ..@ content_na_str : chr(0)
+ .. .. .. ..@ content_var : chr ""
+ .. .. .. ..@ label_children : logi FALSE
+ .. .. .. ..@ extra_args : list()
+ .. .. .. ..@ indent_modifier : int 0
+ .. .. .. ..@ content_indent_modifier: int 0
+ .. .. .. ..@ content_extra_args : list()
+ .. .. .. ..@ page_title_prefix : chr NA
+ .. .. .. ..@ child_section_div : chr NA
Continuing in recursive_applysplit
, this is made up of
+two main calls: one to .make_ctab
which makes the content
+row and calculates the counts if specified, and
+.make_split_kids
 . The latter eventually calls
+recursive_applysplit
 again when the split vector
+is built of Split
 s that are not analyze
+splits. Because it is a generic, it is easy to switch between
+different downstream processes.
) we
+will call the method getMethod(".make_split_kids", "Split")
+twice before getting to the analysis split. There, we have a (xxx)
+multi-variable split which applies .make_split_kids
to each
+of its elements, in turn calling the main
+getMethod(".make_split_kids", "VAnalyzeSplit")
 , which then
+leads to .make_analyzed_tab
.
There are interesting edge cases here for different split cases, like
+split_by_multivars
or when one of the splits has a
+reference group. In the internal code here, it is called
+baseline
. If we follow this variable across the function
+layers, we will see that where the split (do_split
) happens
+(in getMethod(".make_split_kids", "Split")
) we have a
+second split for the reference group. This is done to make this
+available in each row to calculate, for example, differences from the
+reference group.
Now we move towards .make_tablerows
, and here analysis
+functions become key, as this is the place where they are applied.
+First, the external tryCatch
 is used to catch
+errors at a higher level, so as to differentiate the two major blocks.
+The function parameters here are quite intuitive, with the exception of
+spl_context
. This is a fundamental parameter that keeps
+information about splits so that it can be visible from analysis
+functions. If you look into this value, you will see that it is carried and
+updated everywhere a split happens, except for columns. Column-related
+information is added last, when in gen_onerv
, which is the
+lowest level where one result value is produced. From
+.make_tablerows
we go to gen_rowvalues
, aside
+from some row and referential footers handling.
+gen_rowvalues
unpacks the cinfo
object and
+crosses it with the arriving row split information to generate rows. In
+particular, rawvals <- mapply(gen_onerv,
maps the
+columns to generate a list of values corresponding to a table row.
+Looking at the final if
in gen_onerv
we see
+if (!is(val, "RowsVerticalSection"))
and the function
+in_rows
is called. We invite the reader to explore what the
+building blocks of in_rows
are, and how
+.make_tablerows
constructs a data row
+(DataRow
) or a content row (ContentRow
)
+depending on whether it is called from .make_ctab
or
+.make_analyzed_tab
.
.make_tablerows
either makes a content table or an
+“analysis table”. gen_rowvalues
generates a list of stacks
+(RowsVerticalSection
, more than one rows potentially!) for
+each column.
To add: conceptual part -> calculating things by column and +putting them side by side and slicing them by rows and putting it +together -> rtables is row dominant.
+vignettes/example_analysis_coxreg.Rmd
+ example_analysis_coxreg.Rmd
In this vignette we will demonstrate how a complex analysis function
+can be constructed in order to build highly-customized tables with
+rtables
. This example will detail the steps in creating an
+analysis function to calculate a basic univariable Cox regression
+summary table to analyze the treatment effect of the ARM
+variable and any covariate/interaction effects for a survival analysis.
+For a Cox regression analysis function with more customization options
+and the capability of fitting multivariable Cox regression models, see
+the summarize_coxreg()
+function from the tern
+package, which builds upon the concepts used in the construction of this
+example.
The packages used in this vignette are:
+First, we prepare the data that will be used to generate a table in
+this example. We will use the example ADTTE
(Time-To-Event
+Analysis) dataset ex_adtte
from the formatters
+package, which contains our treatment variable ARM
, several
+variables that can be chosen as covariates, and censor variable
+CNSR
from which we will derive the event variable
+EVENT
required for our model. For the purpose of this
+example, we will use age (AGE
) and race (RACE
)
+as our covariates.
We prepare the data as needed to observe the desired effects in our
+summary table. PARAMCD
is filtered so that only records of
+overall survival (OS) are included, and we filter and mutate to include
+only the levels of interest in our covariates. The ARM
+variable is mutated to indicate that "B: Placebo"
should be
+used as the reference level of our treatment variable, and the
+EVENT
variable is derived from CNSR
.
+adtte <- ex_adtte
+
+anl <- adtte %>%
+ dplyr::filter(PARAMCD == "OS") %>%
+ dplyr::filter(ARM %in% c("A: Drug X", "B: Placebo")) %>%
+ dplyr::filter(RACE %in% c("ASIAN", "BLACK OR AFRICAN AMERICAN", "WHITE")) %>%
+ dplyr::mutate(RACE = droplevels(RACE)) %>%
+ dplyr::mutate(ARM = droplevels(stats::relevel(ARM, "B: Placebo"))) %>%
+ dplyr::mutate(EVENT = 1 - CNSR)
tidy
Method for summary.coxph
Objects:
+tidy.summary.coxph
+This method allows the tidy
function from the
+broom
package to operate on summary.coxph
+output, extracting the values of interest to this analysis and returning
+a tidied tibble::tibble()
object.
+tidy.summary.coxph <- function(x, ...) {
+  stopifnot(is(x, "summary.coxph")) # fail early if input is not a summary.coxph
+ pval <- x$coefficients
+ confint <- x$conf.int
+ levels <- rownames(pval)
+ pval <- tibble::as_tibble(pval)
+ confint <- tibble::as_tibble(confint)
+
+ ret <- cbind(pval[, grepl("Pr", names(pval))], confint)
+ ret$level <- levels
+ ret$n <- x[["n"]]
+ ret
+}
h_coxreg_inter_effect
+The h_coxreg_inter_effect
helper function is used within
+the following helper function,
+h_coxreg_extract_interaction
, to estimate interaction
+effects from a given model for a given covariate. The function
+calculates the desired statistics from the given model and returns a
+data.frame
with label information for each row as well as
+the statistics n
, hr
(hazard ratio),
+lcl
(CI lower bound), ucl
(CI upper bound),
+pval
(effect p-value), and pval_inter
+(interaction p-value). If a numeric covariate is selected, the median
+value is used as the sole “level” for which an interaction effect is
+calculated. For non-numeric covariates, an interaction effect is
+calculated for each level of the covariate, with each result returned on
+a separate row.
+h_coxreg_inter_effect <- function(x,
+ effect,
+ covar,
+ mod,
+ label,
+ control,
+ data) {
+ if (is.numeric(x)) {
+ betas <- stats::coef(mod)
+ attrs <- attr(stats::terms(mod), "term.labels")
+ term_indices <- grep(pattern = effect, x = attrs[!grepl("strata\\(", attrs)])
+ betas <- betas[term_indices]
+ betas_var <- diag(stats::vcov(mod))[term_indices]
+ betas_cov <- stats::vcov(mod)[term_indices[1], term_indices[2]]
+ xval <- stats::median(x)
+ effect_index <- !grepl(covar, names(betas))
+ coef_hat <- betas[effect_index] + xval * betas[!effect_index]
+ coef_se <- sqrt(betas_var[effect_index] + xval^2 * betas_var[!effect_index] + 2 * xval * betas_cov)
+ q_norm <- stats::qnorm((1 + control$conf_level) / 2)
+ } else {
+ var_lvl <- paste0(effect, levels(data[[effect]])[-1]) # [-1]: reference level
+ giv_lvl <- paste0(covar, levels(data[[covar]]))
+ design_mat <- expand.grid(effect = var_lvl, covar = giv_lvl)
+ design_mat <- design_mat[order(design_mat$effect, design_mat$covar), ]
+ design_mat <- within(data = design_mat, expr = {
+ inter <- paste0(effect, ":", covar)
+ rev_inter <- paste0(covar, ":", effect)
+ })
+ split_by_variable <- design_mat$effect
+ interaction_names <- paste(design_mat$effect, design_mat$covar, sep = "/")
+ mmat <- stats::model.matrix(mod)[1, ]
+ mmat[!mmat == 0] <- 0
+ design_mat <- apply(X = design_mat, MARGIN = 1, FUN = function(x) {
+ mmat[names(mmat) %in% x[-which(names(x) == "covar")]] <- 1
+ mmat
+ })
+ colnames(design_mat) <- interaction_names
+ coef <- stats::coef(mod)
+ vcov <- stats::vcov(mod)
+ betas <- as.matrix(coef)
+ coef_hat <- t(design_mat) %*% betas
+ dimnames(coef_hat)[2] <- "coef"
+ coef_se <- apply(design_mat, 2, function(x) {
+ vcov_el <- as.logical(x)
+ y <- vcov[vcov_el, vcov_el]
+ y <- sum(y)
+ y <- sqrt(y)
+ y
+ })
+ q_norm <- stats::qnorm((1 + control$conf_level) / 2)
+ y <- cbind(coef_hat, `se(coef)` = coef_se)
+ y <- apply(y, 1, function(x) {
+ x["hr"] <- exp(x["coef"])
+ x["lcl"] <- exp(x["coef"] - q_norm * x["se(coef)"])
+ x["ucl"] <- exp(x["coef"] + q_norm * x["se(coef)"])
+ x
+ })
+ y <- t(y)
+ y <- by(y, split_by_variable, identity)
+ y <- lapply(y, as.matrix)
+ attr(y, "details") <- paste0(
+ "Estimations of ", effect, " hazard ratio given the level of ", covar, " compared to ",
+ effect, " level ", levels(data[[effect]])[1], "."
+ )
+ xval <- levels(data[[covar]])
+ }
+ data.frame(
+ effect = "Covariate:",
+ term = rep(covar, length(xval)),
+ term_label = as.character(paste0(" ", xval)),
+ level = as.character(xval),
+ n = NA,
+ hr = if (is.numeric(x)) exp(coef_hat) else y[[1]][, "hr"],
+ lcl = if (is.numeric(x)) exp(coef_hat - q_norm * coef_se) else y[[1]][, "lcl"],
+ ucl = if (is.numeric(x)) exp(coef_hat + q_norm * coef_se) else y[[1]][, "ucl"],
+ pval = NA,
+ pval_inter = NA,
+ stringsAsFactors = FALSE
+ )
+}
h_coxreg_extract_interaction
+Using the previous two helper functions,
+h_coxreg_extract_interaction
uses ANOVA to extract
+information from the given model about the given covariate. This
+function will extract different information depending on whether the
+effect of interest is a treatment/main effect or an interaction effect,
+and returns a data.frame
with label information for each
+row (corresponding to each effect) as well as the statistics
+n
, hr
, lcl
, ucl
,
+pval
, and pval_inter
(for interaction effects
+only). This helper function is used directly within our analysis
+function to analyze the Cox regression model and extract relevant
+information to be processed and displayed within our output table.
+h_coxreg_extract_interaction <- function(effect, covar, mod, data) {
+ control <- list(pval_method = "wald", ties = "exact", conf_level = 0.95, interaction = FALSE)
+ test_statistic <- c(wald = "Wald", likelihood = "LR")[control$pval_method]
+ mod_aov <- withCallingHandlers(
+ expr = car::Anova(mod, test.statistic = test_statistic, type = "III"),
+ message = function(m) invokeRestart("muffleMessage")
+ )
+ msum <- if (!any(attr(stats::terms(mod), "order") == 2)) summary(mod, conf.int = control$conf_level) else mod_aov
+ sum_anova <- broom::tidy(msum)
+ if (!any(attr(stats::terms(mod), "order") == 2)) {
+ effect_aov <- mod_aov[effect, , drop = TRUE]
+ pval <- effect_aov[[grep(pattern = "Pr", x = names(effect_aov)), drop = TRUE]]
+ sum_main <- sum_anova[grepl(effect, sum_anova$level), ]
+ term_label <- if (effect == covar) {
+ paste0(levels(data[[covar]])[2], " vs control (", levels(data[[covar]])[1], ")")
+ } else {
+ unname(formatters::var_labels(data, fill = TRUE)[[covar]])
+ }
+ y <- data.frame(
+ effect = ifelse(covar == effect, "Treatment:", "Covariate:"),
+ term = covar, term_label = term_label,
+ level = levels(data[[effect]])[2],
+ n = mod[["n"]], hr = unname(sum_main["exp(coef)"]), lcl = unname(sum_main[grep("lower", names(sum_main))]),
+ ucl = unname(sum_main[grep("upper", names(sum_main))]), pval = pval,
+ stringsAsFactors = FALSE
+ )
+ y$pval_inter <- NA
+ y
+ } else {
+ pval <- sum_anova[sum_anova$term == effect, ][["p.value"]]
+
+ ## Test the interaction effect
+ pval_inter <- sum_anova[grep(":", sum_anova$term), ][["p.value"]]
+ covar_test <- data.frame(
+ effect = "Covariate:",
+ term = covar, term_label = unname(formatters::var_labels(data, fill = TRUE)[[covar]]),
+ level = "",
+ n = mod$n, hr = NA, lcl = NA, ucl = NA, pval = pval,
+ pval_inter = pval_inter,
+ stringsAsFactors = FALSE
+ )
+ ## Estimate the interaction
+ y <- h_coxreg_inter_effect(
+ data[[covar]],
+ covar = covar,
+ effect = effect,
+ mod = mod,
+ label = unname(formatters::var_labels(data, fill = TRUE)[[covar]]),
+ control = control,
+ data = data
+ )
+ rbind(covar_test, y)
+ }
+}
cached_model
+Next, we will create a helper function, cached_model
,
+which will be used within our analysis function to cache and return the
+fitted Cox regression model for the current covariate. The
+df
argument will be directly inherited from the
+df
argument passed to the analysis function, which contains
+the full dataset being analyzed. The cov
argument will be
+the covariate that is being analyzed depending on the current row
+context. If the treatment effect is currently being analyzed, this value
+will be an empty string. The cache_env
parameter will be an
+environment object which is used to store the model for the current
+covariate, also passed down from the analysis function. Of course, this
+function can also be run outside of the analysis function and will still
+cache and return a Cox regression model.
Using these arguments, the cached_model
function first
+checks if a model for the given covariate cov
is already
+stored in the caching environment cache_env
. If so, then
+this model is retrieved and returned by cached_model
. If
+not, the model must be constructed. This is done by first constructing
+the model formula, model_form
, starting with only the
+treatment effect (ARM
) and adding a covariate effect if one
+is currently being analyzed. Then a Cox regression model is fit using
+df
and the model formula, and this model is both returned
+and stored in the caching environment object as
+cache_env[[cov]]
.
+cached_model <- function(df, cov, cache_env) {
+ ## Check if a model already exists for
+ ## `cov` in the caching environment
+ if (!is.null(cache_env[[cov]])) {
+ ## If model already exists, retrieve it from cache_env
+ model <- cache_env[[cov]]
+ } else {
+ ## Build model formula
+ model_form <- paste0("survival::Surv(AVAL, EVENT) ~ ARM")
+ if (length(cov) > 0) {
+ model_form <- paste(c(model_form, cov), collapse = " * ")
+ } else {
+ cov <- "ARM"
+ }
+ ## Calculate Cox regression model
+ model <- survival::coxph(
+ formula = stats::as.formula(model_form),
+ data = df,
+ ties = "exact"
+ )
+ ## Store model in the caching environment
+ cache_env[[cov]] <- model
+ }
+ model
+}
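The caching works because R environments have reference semantics: an assignment into cache_env inside the function mutates the same environment the caller holds. The pattern can be reduced to a base-R sketch (memo_get and fake_fit are hypothetical helpers, not vignette code):

```r
## Look up `key` in `cache_env`; compute and store it only on a cache miss
memo_get <- function(key, compute, cache_env) {
  if (is.null(cache_env[[key]])) {
    cache_env[[key]] <- compute()
  }
  cache_env[[key]]
}

cache <- new.env()
n_fits <- 0
fake_fit <- function() {
  n_fits <<- n_fits + 1 # count how often the expensive step actually runs
  list(model = "fit for AGE")
}

m1 <- memo_get("AGE", fake_fit, cache)
m2 <- memo_get("AGE", fake_fit, cache)
n_fits # 1: fitted once, the second call is served from the cache
```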
a_cox_summary
+With our data prepared and helper function created, we can proceed to
+construct our analysis function a_cox_summary
, which will
+be used to populate all of the rows in our table. In order to be used to
+generate both data rows (for interaction effects) and content rows (for
+main effects), we must create a function that can be used as both
+afun
in analyze
and cfun
in
+summarize_row_groups
. Therefore, our function must accept
+the labelstr
parameter.
The arguments of our analysis function will be as follows:
+df
- a data.frame
of the full dataset
+required to fit the Cox regression model.labelstr
- the string
label for the
+variable being analyzed in the current row/column split context..spl_context
- a data.frame
containing the
+value
column which is used by this analysis function to
+determine the name of the variable/covariate in the current split. For
+more details on the information stored by .spl_context
see
+?analyze
.stat
and format
- string
s
+that indicate which statistic column we are currently in and what format
+should be applied to print the statistic.cache_env
- an environment
object that can
+be used to store cached models so that we can prevent repeatedly fitting
+the same model. Instead, each model will be generated once per covariate
+and then reused. This argument will be passed directly to the
+cached_model
helper function we defined previously.cov_main
- a logical
value indicating
+whether or not the current row is summarizing covariate main
+effects.The analysis function works within a given row/column split context
+by using the current covariate (cov
) and the
+cached_model
function to obtain the desired Cox regression
+model. From this model, the h_coxreg_extract_interaction
+function is able to extract information/statistics relevant to the
+analysis and store it in a data.frame
. The rows in this
+data.frame
that are of interest in the current row/column
+split context are then extracted and the statistic to be printed in the
+current column is retrieved from these rows. Finally, the formatted
+cells with this statistic are returned as a
+VerticalRowsSection
object. For more detail see the
+commented function code below, where the purpose of each line within
+a_cox_summary
is described.
+a_cox_summary <- function(df,
+ labelstr = "",
+ .spl_context,
+ stat,
+ format,
+ cache_env,
+ cov_main = FALSE) {
+ ## Get current covariate (variable used in latest row split)
+ cov <- tail(.spl_context$value, 1)
+
+ ## If currently analyzing treatment effect (ARM) replace empty
+ ## value of cov with "ARM" so the correct model row is analyzed
+ if (length(cov) == 0) cov <- "ARM"
+
+ ## Use cached_model to get the fitted Cox regression
+ ## model for the current covariate
+ model <- cached_model(df = df, cov = cov, cache_env = cache_env)
+
+ ## Extract levels of cov to be used as row labels for interaction effects.
+ ## If cov is numeric, the median value of cov is used as a row label instead
+ cov_lvls <- if (is.factor(df[[cov]])) levels(df[[cov]]) else as.character(median(df[[cov]]))
+
+ ## Use function to calculate and extract information relevant to cov from the model
+ cov_rows <- h_coxreg_extract_interaction(effect = "ARM", covar = cov, mod = model, data = df)
+ ## Effect p-value is only printed for treatment effect row
+ if (!cov == "ARM") cov_rows[, "pval"] <- NA_real_
+ ## Extract rows containing statistics for cov from model information
+  if (!cov_main) {
+    ## Extract per-level rows (treatment or interaction effects)
+    cov_rows <- cov_rows[cov_rows$level %in% cov_lvls, ]
+  } else {
+    ## Extract covariate main effect rows (level is empty)
+    cov_rows <- cov_rows[nchar(cov_rows$level) == 0, ]
+  }
+ ## Extract value(s) of statistic for current column and variable/levels
+ stat_vals <- as.list(apply(cov_rows[stat], 1, function(x) x, simplify = FALSE))
+ ## Assign labels: covariate name for main effect (content) rows, ARM comparison description
+ ## for treatment effect (content) row, cov_lvls for interaction effect (data) rows
+ nms <- if (cov_main) labelstr else if (cov == "ARM") cov_rows$term_label else cov_lvls
+ ## Return formatted/labelled row
+ in_rows(
+ .list = stat_vals,
+ .names = nms,
+ .labels = nms,
+ .formats = setNames(rep(format, length(nms)), nms),
+ .format_na_strs = setNames(rep("", length(nms)), nms)
+ )
+}
We are able to customize our Cox regression summary using this +analysis function by selecting covariates (and their labels), statistics +(and their labels), and statistic formats to use when generating the +output table. We also initialize a new environment object to be used by +the analysis function as the caching environment to store our models in. +For the purpose of this example, we will choose all 5 of the possible +statistics to include in the table: n, hazard ratio, confidence +interval, effect p-value, and interaction p-value.
+
+my_covs <- c("AGE", "RACE") ## Covariates
+my_cov_labs <- c("Age", "Race") ## Covariate labels
+my_stats <- list("n", "hr", c("lcl", "ucl"), "pval", "pval_inter") ## Statistics
+my_stat_labs <- c("n", "Hazard Ratio", "95% CI", "p-value\n(effect)", "p-value\n(interaction)") ## Statistic labels
+my_formats <- c(
+ n = "xx", hr = "xx.xx", lcl = "(xx.xx, xx.xx)", pval = "xx.xxxx", pval_inter = "xx.xxxx" ## Statistic formats
+)
+my_env <- new.env()
+my_cache_env <- replicate(length(my_stats), list(my_env)) ## Caching environment
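A note on the caching environment: replicate() above wraps the same environment once per statistic, and because environments are reference objects in R, every column then shares one cache, so each model only needs to be fit once. A minimal base-R sketch of this behavior (independent of rtables):

```r
## replicate(n, list(env)) returns a list whose elements are all the
## *same* environment, so a value cached by one column is visible to all
shared_env <- new.env()
cache_list <- replicate(3, list(shared_env))

identical(cache_list[[1]], cache_list[[3]])
# [1] TRUE

assign("model", "fitted once", envir = cache_list[[1]])
get("model", envir = cache_list[[2]])
# [1] "fitted once"
```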
Finally, the table layout can be constructed and used to build the +desired table.
+We first split our basic_table
using
+split_cols_by_multivar
to ensure that each statistic exists
+in its own column. To do so, we choose a variable (in this case
+STUDYID
) which shares the same value in every row, and use
+it as the split variable for every column so that the full dataset is
+used to compute the model for every column. We use the
+extra_args
argument for which each list element’s element
+positions correspond to the children of (columns generated by) this
+split. These arguments are inherited by all following layout elements
+operating within this split, which use these elements as argument
+inputs. To elaborate on this, we have three elements in
+extra_args
: stat
, format
, and
+cache_env
- each of which is an argument of
+a_cox_summary
and has length equal to the number of
+columns (as defined above). For each use of our analysis function
+following this column split, depending on the current column context,
+the corresponding element of each of these three list elements will be
+inherited from extra_args
and used as input. For example,
+if analyze_colvars
is called with
+a_cox_summary
as afun
and is performing
+calculations for column 1, my_stats[1]
("n"
)
+will be given as argument stat
, my_formats[1]
+("xx"
) as argument format
, and
+my_cache_env[1]
(my_env
) as
+cache_env
. This is useful for our table since we want each
+column to print out values for a different statistic and apply its
+corresponding format.
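To make the per-column dispatch concrete, here is a small base-R sketch of the pairing described above; the selection by column index is performed internally by rtables, so this only simulates it by hand:

```r
## Mirror of the vignette's objects (shortened for illustration)
my_stats <- list("n", "hr", c("lcl", "ucl"))
my_formats <- c(n = "xx", hr = "xx.xx", lcl = "(xx.xx, xx.xx)")

## For column i, element i of each extra_args entry is passed to afun
col <- 1
my_stats[[col]]   # "n"  -> received by a_cox_summary as `stat`
my_formats[[col]] # "xx" -> received as `format`
```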
Next, we can use summarize_row_groups
to generate the
+content row for treatment effect. This is the first instance where
+extra_args
from the column split will be inherited and used
+as argument input in cfun
.
After generating the treatment effect row, we want to add rows for
+covariates. We use split_rows_by_multivar
to split rows by
+covariate and apply appropriate labels.
Following this row split, we use summarize_row_groups
+with a_cox_summary
as cfun
to generate one
+content row for each covariate main effect. Once again the contents of
+extra_args
from the column split are inherited as input.
+Here we specify cov_main = TRUE
in the
+extra_args
argument so that main effects rather than
+interactions are considered. Since this is not a split, this instance of
+extra_args
is not inherited by any following layout
+elements. As cov_main
is a single value,
+cov_main = TRUE
will be used within every column
+context.
The last part of our table is the covariate interaction effects. We
+use analyze_colvars
with a_cox_summary
as
+afun
, and again inherit extra_args
from the
+column split. Using an rtables
“analyze” function generates
+data rows, with one row corresponding to each covariate level (or median
+value, for numeric covariates), nested under the content row (main
+effect) for that same covariate.
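These row labels follow the cov_lvls rule from the analysis function above: a factor covariate contributes one row per level, while a numeric covariate contributes a single row labelled with its median. In base R:

```r
## Factor covariate: one interaction row per level
race <- factor(c("ASIAN", "WHITE", "ASIAN"))
levels(race)
# [1] "ASIAN" "WHITE"

## Numeric covariate: a single row labelled with the median value
age <- c(30, 34, 38)
as.character(median(age))
# [1] "34"
```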
+lyt <- basic_table() %>%
+ ## Column split: one column for each statistic
+ split_cols_by_multivar(
+ vars = rep("STUDYID", length(my_stats)),
+ varlabels = my_stat_labs,
+ extra_args = list(
+ stat = my_stats,
+ format = my_formats,
+ cache_env = my_cache_env
+ )
+ ) %>%
+ ## Create content row for treatment effect
+ summarize_row_groups(cfun = a_cox_summary) %>%
+ ## Row split: one content row for each covariate
+ split_rows_by_multivar(
+ vars = my_covs,
+ varlabels = my_cov_labs,
+ split_label = "Covariate:",
+ indent_mod = -1 ## Align split label left
+ ) %>%
+ ## Create content rows for covariate main effects
+ summarize_row_groups(
+ cfun = a_cox_summary,
+ extra_args = list(cov_main = TRUE)
+ ) %>%
+ ## Create data rows for covariate interaction effects
+ analyze_colvars(afun = a_cox_summary)
Using our pre-processed anl
dataset, we can now build
+and output our final Cox regression summary table.
+cox_tbl <- build_table(lyt, anl)
+cox_tbl
## p-value p-value
+## n Hazard Ratio 95% CI (effect) (interaction)
+## ————————————————————————————————————————————————————————————————————————————————————————————————
+## A: Drug X vs control (B: Placebo) 247 0.97 (0.71, 1.32) 0.8243
+## Covariate:
+## Age 247 0.7832
+## 34 0.92 (0.68, 1.26)
+## Race 247 0.7441
+## ASIAN 1.03 (0.68, 1.57)
+## BLACK OR AFRICAN AMERICAN 0.78 (0.41, 1.49)
+## WHITE 1.06 (0.55, 2.04)
+vignettes/exploratory_analysis.Rmd
+ exploratory_analysis.Rmd
In this vignette, we would like to introduce how
+qtable()
can be used to easily create cross tabulations for
+exploratory data analysis. qtable()
is an extension of
+table()
from base R and can do much beyond creating two-way
+contingency tables. The function has a simple-to-use interface while
+internally it builds layouts using the rtables
+framework.
Load packages used in this vignette:
+ +Let’s start by seeing what table()
can do:
+table(ex_adsl$ARM)
#
+# A: Drug X B: Placebo C: Combination
+# 134 134 132
+
+table(ex_adsl$SEX, ex_adsl$ARM)
#
+# A: Drug X B: Placebo C: Combination
+# F 79 77 66
+# M 51 55 60
+# U 3 2 4
+# UNDIFFERENTIATED 1 0 2
+We can easily recreate the cross-tables above with
+qtable()
by specifying a data.frame with variable(s) to
+tabulate. The col_vars
and row_vars
arguments
+control how to split the data across columns and rows respectively.
+qtable(ex_adsl, col_vars = "ARM")
# A: Drug X B: Placebo C: Combination
+# (N=134) (N=134) (N=132)
+# ———————————————————————————————————————————————
+# count 134 134 132
+
+qtable(ex_adsl, col_vars = "ARM", row_vars = "SEX")
# A: Drug X B: Placebo C: Combination
+# count (N=134) (N=134) (N=132)
+# ——————————————————————————————————————————————————————————
+# F 79 77 66
+# M 51 55 60
+# U 3 2 4
+# UNDIFFERENTIATED 1 0 2
+Aside from the display style, the main difference is that
+qtable()
will add (N=xx) in the table header by default.
+This can be removed with show_colcounts
.
+qtable(ex_adsl, "ARM", show_colcounts = FALSE)
# count all obs
+# ————————————————————————
+# A: Drug X 134
+# B: Placebo 134
+# C: Combination 132
Any variables used as row or column facets should not contain +empty strings (""). This is because non-empty values are required as +labels when generating the table. The code below will generate an +error.
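A hypothetical sketch of this failure mode using a toy data frame (the qtable() call is left commented out because it is expected to error):

```r
## A facet variable containing "" yields an empty-string factor level,
## which cannot serve as a row/column label
bad <- data.frame(ARM = factor(c("A: Drug X", "", "B: Placebo")))
any(levels(bad$ARM) == "")
# [1] TRUE
## qtable(bad, col_vars = "ARM")  # assumed to fail with a labelling error
```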
+ +Providing more than one variable name for the row or column structure
+in qtable()
will create a nested table. Arbitrary nesting
+is supported in each dimension.
# A: Drug X B: Placebo C: Combination
+# S1 S2 S1 S2 S1 S2
+# count (N=73) (N=61) (N=67) (N=67) (N=56) (N=76)
+# ————————————————————————————————————————————————————————————————————————
+# F
+# A 12 9 11 13 7 11
+# B 14 11 12 15 9 12
+# C 17 16 13 13 14 13
+# M
+# A 5 11 10 9 6 14
+# B 13 8 7 10 9 12
+# C 8 6 13 6 8 11
+# U
+# A 1 0 1 0 1 0
+# B 1 0 0 1 0 1
+# C 1 0 0 0 1 1
+# UNDIFFERENTIATED
+# A 0 0 0 0 0 1
+# C 1 0 0 0 1 0
+Note that by default, unobserved factor levels within a facet are not
+included in the table. This can be modified with
+drop_levels
. The code below adds a row of 0s for
+STRATA1
level “B” nested under the SEX
level
+“UNDIFFERENTIATED”.
+qtable(
+ ex_adsl,
+ row_vars = c("SEX", "STRATA1"),
+ col_vars = c("ARM", "STRATA2"),
+ drop_levels = FALSE
+)
# A: Drug X B: Placebo C: Combination
+# S1 S2 S1 S2 S1 S2
+# count (N=73) (N=61) (N=67) (N=67) (N=56) (N=76)
+# ————————————————————————————————————————————————————————————————————————
+# F
+# A 12 9 11 13 7 11
+# B 14 11 12 15 9 12
+# C 17 16 13 13 14 13
+# M
+# A 5 11 10 9 6 14
+# B 13 8 7 10 9 12
+# C 8 6 13 6 8 11
+# U
+# A 1 0 1 0 1 0
+# B 1 0 0 1 0 1
+# C 1 0 0 0 1 1
+# UNDIFFERENTIATED
+# A 0 0 0 0 0 1
+# B 0 0 0 0 0 0
+# C 1 0 0 0 1 0
+In contrast, table()
cannot return a nested table.
+Rather, it produces a list of contingency tables when more than two
+variables are used as inputs.
+table(ex_adsl$SEX, ex_adsl$STRATA1, ex_adsl$ARM, ex_adsl$STRATA2)
# , , = A: Drug X, = S1
+#
+#
+# A B C
+# F 12 14 17
+# M 5 13 8
+# U 1 1 1
+# UNDIFFERENTIATED 0 0 1
+#
+# , , = B: Placebo, = S1
+#
+#
+# A B C
+# F 11 12 13
+# M 10 7 13
+# U 1 0 0
+# UNDIFFERENTIATED 0 0 0
+#
+# , , = C: Combination, = S1
+#
+#
+# A B C
+# F 7 9 14
+# M 6 9 8
+# U 1 0 1
+# UNDIFFERENTIATED 0 0 1
+#
+# , , = A: Drug X, = S2
+#
+#
+# A B C
+# F 9 11 16
+# M 11 8 6
+# U 0 0 0
+# UNDIFFERENTIATED 0 0 0
+#
+# , , = B: Placebo, = S2
+#
+#
+# A B C
+# F 13 15 13
+# M 9 10 6
+# U 0 1 0
+# UNDIFFERENTIATED 0 0 0
+#
+# , , = C: Combination, = S2
+#
+#
+# A B C
+# F 11 12 13
+# M 14 12 11
+# U 0 1 1
+# UNDIFFERENTIATED 1 0 0
+With some help from stats::ftable()
the nested structure
+can be achieved in two steps.
+t1 <- ftable(ex_adsl[, c("SEX", "STRATA1", "ARM", "STRATA2")])
+ftable(t1, row.vars = c("SEX", "STRATA1"))
# ARM A: Drug X B: Placebo C: Combination
+# STRATA2 S1 S2 S1 S2 S1 S2
+# SEX STRATA1
+# F A 12 9 11 13 7 11
+# B 14 11 12 15 9 12
+# C 17 16 13 13 14 13
+# M A 5 11 10 9 6 14
+# B 13 8 7 10 9 12
+# C 8 6 13 6 8 11
+# U A 1 0 1 0 1 0
+# B 1 0 0 1 0 1
+# C 1 0 0 0 1 1
+# UNDIFFERENTIATED A 0 0 0 0 0 1
+# B 0 0 0 0 0 0
+# C 1 0 0 0 1 0
+So far in all the examples we have seen, we used counts to summarize
+the data in each table cell as this is the default analysis used by
+qtable()
. Internally, a single analysis variable specified
+by avar
is used to generate the counts in the table. The
+default analysis variable is the first variable in data
. In
+the case of ex_adsl
this is “STUDYID”.
Let’s see what happens when we introduce some NA
values
+into the analysis variable:
+tmp_adsl <- ex_adsl
+tmp_adsl[[1]] <- NA_character_
+
+qtable(tmp_adsl, row_vars = "ARM", col_vars = "SEX")
# F M U UNDIFFERENTIATED
+# count (N=222) (N=166) (N=9) (N=3)
+# —————————————————————————————————————————————————————————————
+# A: Drug X 0 0 0 0
+# B: Placebo 0 0 0 0
+# C: Combination 0 0 0 0
+The resulting table shows 0s across all cells because all the
+values of the analysis variable are NA
.
Keep this behavior in mind when doing quick exploratory analysis
+using the default counts aggregate function of qtable
.
If this does not suit your purpose, you can either pre-process your
+data to re-code the NA
values or use another analysis
+function. We will see how the latter is done in the Custom Aggregation section.
+# Recode NA values
+tmp_adsl[[1]] <- addNA(tmp_adsl[[1]])
+
+qtable(tmp_adsl, row_vars = "ARM", col_vars = "SEX")
# F M U UNDIFFERENTIATED
+# count (N=222) (N=166) (N=9) (N=3)
+# —————————————————————————————————————————————————————————————
+# A: Drug X 79 51 3 1
+# B: Placebo 77 55 2 0
+# C: Combination 66 60 4 2
+In addition, row and column variables should have NA
+levels explicitly labelled as above. If this is not done, the columns
+and/or rows will not reflect the full data.
+tmp_adsl$new1 <- factor(NA_character_, levels = c("X", "Y", "Z"))
+qtable(tmp_adsl, row_vars = "ARM", col_vars = "new1")
# X Y Z
+# count (N=0) (N=0) (N=0)
+# ——————————————————————————————————————
+# A: Drug X 0 0 0
+# B: Placebo 0 0 0
+# C: Combination 0 0 0
+Explicitly labeling the NA
levels in the column facet
+adds a column to the table:
+tmp_adsl$new2 <- addNA(tmp_adsl$new1)
+levels(tmp_adsl$new2)[4] <- "<NA>" # NA needs to be a recognizable string
+qtable(tmp_adsl, row_vars = "ARM", col_vars = "new2")
# X Y Z <NA>
+# count (N=0) (N=0) (N=0) (N=400)
+# ————————————————————————————————————————————————
+# A: Drug X 0 0 0 134
+# B: Placebo 0 0 0 134
+# C: Combination 0 0 0 132
+A powerful feature of qtable()
is that the user can
+define the type of function used to summarize the data in each facet. We
+can specify the type of analysis summary using the afun
+argument:
+qtable(ex_adsl, row_vars = "STRATA2", col_vars = "ARM", avar = "AGE", afun = mean)
# A: Drug X B: Placebo C: Combination
+# AGE - mean (N=134) (N=134) (N=132)
+# ————————————————————————————————————————————————————
+# S1 34.10 36.46 35.70
+# S2 33.38 34.40 35.24
+Note that the analysis variable AGE
and analysis
+function name are included in the top right header of the table.
If the analysis function returns a vector of 2 or 3 elements, the +result is displayed as a single multi-valued cell.
+
+qtable(ex_adsl, row_vars = "STRATA2", col_vars = "ARM", avar = "AGE", afun = range)
# A: Drug X B: Placebo C: Combination
+# AGE - range (N=134) (N=134) (N=132)
+# ————————————————————————————————————————————————————————
+# S1 23.0 / 48.0 24.0 / 62.0 20.0 / 69.0
+# S2 21.0 / 50.0 21.0 / 58.0 23.0 / 64.0
+If you want to use an analysis function with more than 3 summary +elements, you can use a list. In this case, the values are displayed in +the table as multiple stacked cells within each facet. If the list +elements are named, the names are used as row labels.
+
+fivenum2 <- function(x) {
+ setNames(as.list(fivenum(x)), c("min", "Q1", "MED", "Q3", "max"))
+}
+qtable(ex_adsl, row_vars = "STRATA2", col_vars = "ARM", avar = "AGE", afun = fivenum2)
# A: Drug X B: Placebo C: Combination
+# AGE - fivenum2 (N=134) (N=134) (N=132)
+# ————————————————————————————————————————————————————————
+# S1
+# min 23.00 24.00 20.00
+# Q1 28.00 30.00 30.50
+# MED 34.00 36.00 35.00
+# Q3 39.00 40.50 40.00
+# max 48.00 62.00 69.00
+# S2
+# min 21.00 21.00 23.00
+# Q1 29.00 29.50 30.00
+# MED 32.00 32.00 34.50
+# Q3 38.00 39.50 38.00
+# max 50.00 58.00 64.00
+More advanced formatting can be controlled with
+in_rows()
. See function documentation for more details.
+meansd_range <- function(x) {
+ in_rows(
+ "Mean (sd)" = rcell(c(mean(x), sd(x)), format = "xx.xx (xx.xx)"),
+ "Range" = rcell(range(x), format = "xx - xx")
+ )
+}
+
+qtable(ex_adsl, row_vars = "STRATA2", col_vars = "ARM", avar = "AGE", afun = meansd_range)
# A: Drug X B: Placebo C: Combination
+# AGE - meansd_range (N=134) (N=134) (N=132)
+# —————————————————————————————————————————————————————————————————
+# S1
+# Mean (sd) 34.10 (6.71) 36.46 (7.72) 35.70 (8.22)
+# Range 23 - 48 24 - 62 20 - 69
+# S2
+# Mean (sd) 33.38 (6.40) 34.40 (7.99) 35.24 (7.39)
+# Range 21 - 50 21 - 58 23 - 64
+Another feature of qtable()
is the ability to quickly
+add marginal summary rows with the summarize_groups
+argument. This summary will add to the table the count of non-NA records
+of the analysis variable at each level of nesting. For example, compare
+these two tables:
+qtable(
+ ex_adsl,
+ row_vars = c("STRATA1", "STRATA2"), col_vars = "ARM",
+ avar = "AGE", afun = mean
+)
# A: Drug X B: Placebo C: Combination
+# AGE - mean (N=134) (N=134) (N=132)
+# ————————————————————————————————————————————————————
+# A
+# S1 31.61 36.68 34.00
+# S2 34.40 33.55 34.35
+# B
+# S1 34.57 37.68 35.83
+# S2 32.79 34.77 36.68
+# C
+# S1 35.26 35.38 36.58
+# S2 32.95 34.89 34.72
+
+qtable(
+ ex_adsl,
+ row_vars = c("STRATA1", "STRATA2"), col_vars = "ARM",
+ summarize_groups = TRUE, avar = "AGE", afun = mean
+)
# A: Drug X B: Placebo C: Combination
+# AGE - mean (N=134) (N=134) (N=132)
+# —————————————————————————————————————————————————————————
+# A 38 (28.4%) 44 (32.8%) 40 (30.3%)
+# S1 18 (13.4%) 22 (16.4%) 14 (10.6%)
+# AGE - mean 31.61 36.68 34.00
+# S2 20 (14.9%) 22 (16.4%) 26 (19.7%)
+# AGE - mean 34.40 33.55 34.35
+# B 47 (35.1%) 45 (33.6%) 43 (32.6%)
+# S1 28 (20.9%) 19 (14.2%) 18 (13.6%)
+# AGE - mean 34.57 37.68 35.83
+# S2 19 (14.2%) 26 (19.4%) 25 (18.9%)
+# AGE - mean 32.79 34.77 36.68
+# C 49 (36.6%) 45 (33.6%) 49 (37.1%)
+# S1 27 (20.1%) 26 (19.4%) 24 (18.2%)
+# AGE - mean 35.26 35.38 36.58
+# S2 22 (16.4%) 19 (14.2%) 25 (18.9%)
+# AGE - mean 32.95 34.89 34.72
+In the second table, there are marginal summary rows for each level
+of the two row facet variables: STRATA1
and
+STRATA2
. The number 18 in the second row gives the count of
+observations that are part of ARM
level “A: Drug X”,
+STRATA1
level “A”, and STRATA2
level “S1”. The
+percent is calculated as the cell count divided by the column count
+given in the table header. So we can see that the mean AGE
+of 31.61 in that subgroup is based on 18 subjects which correspond to
+13.4% of the subjects in arm “A: Drug X”.
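The marginal percentage can be checked by hand directly from the counts shown:

```r
## cell count divided by the column count from the table header
round(18 / 134 * 100, 1)
# [1] 13.4
```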
See ?summarize_row_groups
for how to add marginal
+summary rows when using the core rtables
framework.
Tables generated with qtable()
can include annotations
+such as titles, subtitles and footnotes like so:
+qtable(
+ ex_adsl,
+ row_vars = "STRATA2", col_vars = "ARM",
+ title = "Strata 2 Summary",
+ subtitle = paste0("STUDY ", ex_adsl$STUDYID[1]),
+ main_footer = paste0("Date: ", as.character(Sys.Date()))
+)
# Strata 2 Summary
+# STUDY AB12345
+#
+# ———————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# count (N=134) (N=134) (N=132)
+# ———————————————————————————————————————————————
+# S1 73 67 56
+# S2 61 67 76
+# ———————————————————————————————————————————————
+#
+# Date: 2024-04-16
+Here is what we have learned in this vignette:
+qtable()
can replace and extend uses of
+table()
and stats::ftable()
qtable()
is useful for exploratory data
+analysis
As the intended use of qtable()
is for exploratory data
+analysis, there is limited functionality for building very complex
+tables. For details on how to get started with the core
+rtables
layout functionality see the introduction
+vignette.
vignettes/format_precedence.Rmd
+ format_precedence.Rmd
Users of the rtables
package can specify the format in
+which the numbers in the reporting tables are printed. Formatting
+functionality is provided by the formatters
+R package. See formatters::list_valid_format_labels()
for a
+list of all available formats. The format can be specified by the user
+in a few different places. It may happen that, for a single table
+layout, the format is specified in more than one place. In such a case,
+the final format that will be applied depends on format precedence rules
+defined by rtables
. In this vignette, we describe the basic
+rules of rtables
format precedence.
The examples shown in this vignette utilize the example
+ADSL
dataset to build a demographic table that summarizes
+variable content for different population subsets (encoded in the
+columns).
Note that all ex_*
data which is currently attached to
+the rtables
package is provided by the formatters
+package and was created using the publicly available random.cdisc.data
+R package.
The format in which numbers are printed can be specified by the user
+in a few different places. In the context of precedence, it is important
+which level of the split hierarchy formats are specified at. In general,
+there are two such levels: the cell level and the
+so-called parent table level. The concept of the cell
+and the parent table results from the way in which the
+rtables
package stores resulting tables. It models the
+resulting tables as hierarchical, tree-like objects with the cells (as
+leaves) containing multiple values. Particularly noteworthy in this
+context is the fact that the actual table splitting occurs in a
+row-dominant way (even if column splitting is present in the layout).
+rtables
provides the user-facing function
+table_structure()
that prints the structure of a given
+table object.
For a simple illustration, consider the following example:
+
+lyt <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze(vars = "AGE", afun = mean)
+
+adsl_analyzed <- build_table(lyt, ADSL)
+adsl_analyzed
# A: Drug X B: Placebo C: Combination
+# —————————————————————————————————————————————————————————————————————————
+# F
+# mean 32.7594936708861 34.1168831168831 35.1969696969697
+# M
+# mean 35.5686274509804 37.4363636363636 35.3833333333333
+# U
+# mean 31.6666666666667 31 35.25
+# UNDIFFERENTIATED
+# mean 28 NA 45
+
+table_structure(adsl_analyzed)
# [TableTree] SEX
+# [TableTree] F
+# [ElementaryTable] AGE (1 x 3)
+# [TableTree] M
+# [ElementaryTable] AGE (1 x 3)
+# [TableTree] U
+# [ElementaryTable] AGE (1 x 3)
+# [TableTree] UNDIFFERENTIATED
+# [ElementaryTable] AGE (1 x 3)
+In this table, there are 4 sub-tables under the SEX
+table. These are: F
, M
, U
, and
+UNDIFFERENTIATED
. Each of these sub-tables has one
+sub-table AGE
. For example, for the first AGE
+sub-table, its parent table is F
.
The concept of hierarchical, tree-like representations of resulting +tables translates directly to format precedence and inheritance rules. +As a general principle, the format being finally applied for the cell is +the one that is the most specific, that is, the one which is the closest +to the cell in a given path in the tree. Hence, the +precedence-inheritance chain looks like the following:
+parent_table -> parent_table -> ... -> parent_table -> cell
+In such a chain, the outermost parent_table
is the least
+specific place to specify the format, while the cell
is the
+most specific one. In cases where the format is specified by the user in
+more than one place, the one which is most specific will be applied in
+the cell. If no specific format has been selected by the user for the
+split, then the default format will be applied. The default format is
+"xx"
and it yields the same formatting as the
+as.character()
function. In the following sections of this
+vignette, we will illustrate the format precedence rules with a few
+examples.
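As a rough base-R illustration of these format labels (an approximation only; the exact behavior is defined by the formatters package):

```r
x <- 33.7686567164179

## "xx" leaves the value unformatted, like as.character()
as.character(x)

## "xx.xx" rounds to two decimal places
format(round(x, 2), nsmall = 2)
# [1] "33.77"
```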
Below is a simple layout that does not explicitly set a format for +the output of the analysis function. In such a case, the default format +is applied.
+
+lyt0 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ analyze(vars = "AGE", afun = mean)
+
+build_table(lyt0, ADSL)
# A: Drug X B: Placebo C: Combination
+# —————————————————————————————————————————————————————————————
+# mean 33.7686567164179 35.4328358208955 35.4318181818182
+The format of a cell can be explicitly specified via the
+rcell()
or in_rows()
functions. The former is
+essentially a collection of data objects while the latter is a
+collection of rcell()
objects. As previously mentioned,
+this is the most specific place where the format can be specified by the
+user.
+lyt1 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ analyze(vars = "AGE", afun = function(x) {
+ rcell(mean(x), format = "xx.xx", label = "Mean")
+ })
+
+build_table(lyt1, ADSL)
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————
+# Mean 33.77 35.43 35.43
+
+lyt1a <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ analyze(vars = "AGE", afun = function(x) {
+ in_rows(
+ "Mean" = rcell(mean(x)),
+ .formats = "xx.xx"
+ )
+ })
+
+build_table(lyt1a, ADSL)
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————
+# Mean 33.77 35.43 35.43
+If the format is specified in both of these places at the same time,
+the one specified via in_rows()
takes precedence.
+Technically, in this case, the format defined in rcell()
+will simply be overwritten by the one defined in in_rows()
.
+This is because the format specified in in_rows()
is
+applied to the cells, not the rows (overriding the previously specified
+cell-specific values), which indicates that the precedence rules
+described above are still in place.
+lyt2 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ analyze(vars = "AGE", afun = function(x) {
+ in_rows(
+ "Mean" = rcell(mean(x), format = "xx.xxx"),
+ .formats = "xx.xx"
+ )
+ })
+
+build_table(lyt2, ADSL)
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————
+# Mean 33.77 35.43 35.43
+In addition to the cell level, the format can be specified at the +parent table level. If no format has been set by the user for a cell, +the most specific format for that cell is the one defined at its +innermost parent table split (if any).
+
+lyt3 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ analyze(vars = "AGE", mean, format = "xx.x")
+
+build_table(lyt3, ADSL)
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————
+# mean 33.8 35.4 35.4
+If the cell format is also specified for a cell, then the parent +table format is ignored for this cell since the cell format is more +specific and therefore takes precedence.
+
+lyt4 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ analyze(
+ vars = "AGE", afun = function(x) {
+ rcell(mean(x), format = "xx.xx", label = "Mean")
+ },
+ format = "xx.x"
+ )
+
+build_table(lyt4, ADSL)
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————
+# Mean 33.77 35.43 35.43
+
+lyt4a <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ analyze(
+ vars = "AGE", afun = function(x) {
+ in_rows(
+ "Mean" = rcell(mean(x)),
+ "SD" = rcell(sd(x)),
+ .formats = "xx.xx"
+ )
+ },
+ format = "xx.x"
+ )
+
+build_table(lyt4a, ADSL)
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————
+# Mean 33.77 35.43 35.43
+# SD 6.55 7.90 7.72
+In the following, slightly more complicated, example, we can observe
+partial inheritance. That is, only SD
cells inherit the
+parent table’s format while the Mean
cells do not.
+lyt5 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ analyze(
+ vars = "AGE", afun = function(x) {
+ in_rows(
+ "Mean" = rcell(mean(x), format = "xx.xx"),
+ "SD" = rcell(sd(x))
+ )
+ },
+ format = "xx.x"
+ )
+
+build_table(lyt5, ADSL)
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————
+# Mean 33.77 35.43 35.43
+# SD 6.6 7.9 7.7
+NA
Handling
+Consider the following layout and the resulting table created:
+
+lyt6 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze(vars = "AGE", afun = mean, format = "xx.xx")
+
+build_table(lyt6, ADSL)
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————————————
+# F
+# mean 32.76 34.12 35.20
+# M
+# mean 35.57 37.44 35.38
+# U
+# mean 31.67 31.00 35.25
+# UNDIFFERENTIATED
+# mean 28.00 NA 45.00
+In the output the cell corresponding to the
+UNDIFFERENTIATED
level of SEX
and the
+B: Placebo
level of ARM
is displayed as
+NA
. This occurs because there were no non-NA
+values under this facet that could be used to compute the mean.
+rtables
allows the user to specify a string to display when
+cell values are NA
. Similar to formats for numbers, the
+user can specify a string to replace NA
with the parameter
+format_na_str
or .format_na_str
. This can be
+specified at the cell or parent table level. NA
string
+precedence and inheritance rules are the same as those for number format
+precedence, described in the previous section of this vignette. We will
+illustrate this with a few examples.
NA
Values at the Cell Level
+At the cell level, it is possible to replace NA
values
+with a custom string by means of the format_na_str
+parameter in rcell()
or .format_na_str
+parameter in in_rows()
.
+lyt7 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze(vars = "AGE", afun = function(x) {
+ rcell(mean(x), format = "xx.xx", label = "Mean", format_na_str = "<missing>")
+ })
+
+build_table(lyt7, ADSL)
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————————————
+# F
+# Mean 32.76 34.12 35.20
+# M
+# Mean 35.57 37.44 35.38
+# U
+# Mean 31.67 31.00 35.25
+# UNDIFFERENTIATED
+# Mean 28.00 <missing> 45.00
+
+lyt7a <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze(vars = "AGE", afun = function(x) {
+ in_rows(
+ "Mean" = rcell(mean(x), format = "xx.xx"),
+ .format_na_strs = "<MISSING>"
+ )
+ })
+
+build_table(lyt7a, ADSL)
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————————————
+# F
+# Mean 32.76 34.12 35.20
+# M
+# Mean 35.57 37.44 35.38
+# U
+# Mean 31.67 31.00 35.25
+# UNDIFFERENTIATED
+# Mean 28.00 <MISSING> 45.00
+If the NA
string is specified in both of these places at
+the same time, the one specified with in_rows()
takes
+precedence. Technically, in this case the NA
replacement
+string defined in rcell()
will simply be overwritten by the
+one defined in in_rows()
. This is because the
+NA
string specified in in_rows()
is applied to
+the cells, not the rows (overriding the previously specified
+cell-specific values), which means that the precedence rules described above
+are still in place.
+lyt8 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze(vars = "AGE", afun = function(x) {
+ in_rows(
+ "Mean" = rcell(mean(x), format = "xx.xx", format_na_str = "<missing>"),
+ .format_na_strs = "<MISSING>"
+ )
+ })
+
+build_table(lyt8, ADSL)
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————————————
+# F
+# Mean 32.76 34.12 35.20
+# M
+# Mean 35.57 37.44 35.38
+# U
+# Mean 31.67 31.00 35.25
+# UNDIFFERENTIATED
+# Mean 28.00 <MISSING> 45.00
+NA
Values and Inheritance
+Principles
+In addition to the cell level, the string replacement for
+NA
values can be specified at the parent table level. If no
+replacement string has been specified by the user for a cell, the most
+specific NA
string for that cell is the one defined at its
+innermost parent table split (if any).
+lyt9 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze(vars = "AGE", mean, format = "xx.xx", na_str = "not available")
+
+build_table(lyt9, ADSL)
# A: Drug X B: Placebo C: Combination
+# —————————————————————————————————————————————————————————————
+# F
+# mean 32.76 34.12 35.20
+# M
+# mean 35.57 37.44 35.38
+# U
+# mean 31.67 31.00 35.25
+# UNDIFFERENTIATED
+# mean 28.00 not available 45.00
+If an NA
value replacement string was also specified at
+the cell level, then the one set at the parent table level is ignored
+for this cell as the cell level format is more specific and therefore
+takes precedence.
+lyt10 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze(
+ vars = "AGE", afun = function(x) {
+ rcell(mean(x), format = "xx.xx", label = "Mean", format_na_str = "<missing>")
+ },
+ na_str = "not available"
+ )
+
+build_table(lyt10, ADSL)
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————————————
+# F
+# Mean 32.76 34.12 35.20
+# M
+# Mean 35.57 37.44 35.38
+# U
+# Mean 31.67 31.00 35.25
+# UNDIFFERENTIATED
+# Mean 28.00 <missing> 45.00
+
+lyt10a <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze(
+ vars = "AGE", afun = function(x) {
+ in_rows(
+ "Mean" = rcell(mean(x)),
+ "SD" = rcell(sd(x)),
+ .formats = "xx.xx",
+ .format_na_strs = "<missing>"
+ )
+ },
+ na_str = "not available"
+ )
+
+build_table(lyt10a, ADSL)
# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————————————
+# F
+# Mean 32.76 34.12 35.20
+# SD 6.09 7.06 7.43
+# M
+# Mean 35.57 37.44 35.38
+# SD 7.08 8.69 8.24
+# U
+# Mean 31.67 31.00 35.25
+# SD 3.21 5.66 3.10
+# UNDIFFERENTIATED
+# Mean 28.00 <missing> 45.00
+# SD <missing> <missing> 1.41
+In the following, slightly more complicated example, we can observe
+partial inheritance of NA strings. That is, only SD
cells
+inherit the parent table’s NA
string, while the
+Mean
cells do not.
+lyt11 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX") %>%
+ analyze(
+ vars = "AGE", afun = function(x) {
+ in_rows(
+ "Mean" = rcell(mean(x), format_na_str = "<missing>"),
+ "SD" = rcell(sd(x))
+ )
+ },
+ format = "xx.xx",
+ na_str = "not available"
+ )
+
+build_table(lyt11, ADSL)
# A: Drug X B: Placebo C: Combination
+# —————————————————————————————————————————————————————————————————
+# F
+# Mean 32.76 34.12 35.20
+# SD 6.09 7.06 7.43
+# M
+# Mean 35.57 37.44 35.38
+# SD 7.08 8.69 8.24
+# U
+# Mean 31.67 31.00 35.25
+# SD 3.21 5.66 3.10
+# UNDIFFERENTIATED
+# Mean 28.00 <missing> 45.00
+# SD not available not available 1.41
+Articles intended for developer use only.
+ +vignettes/introduction.Rmd
+ introduction.Rmd
+The rtables package provides a framework to create, tabulate, and output tables in R. Most of the design requirements for rtables have their origin in studying tables that are commonly used to report analyses from clinical trials; however, we were careful to keep rtables a general purpose toolkit.
+In this vignette, we give a short introduction to rtables and tabulating a table.
+The content in this vignette is based on the following two resources:
+- rtables useR 2020 presentation by Gabriel Becker
+- rtables - A Framework For Creating Complex Structured Reporting Tables Via Multi-Level Faceted Computations
+The packages used in this vignette are rtables and dplyr:
+To build a table using rtables, two components are required: a layout constructed using rtables functions, and a data.frame of unaggregated data. These two elements are combined to build a table object. Table objects contain information about both the content and the structure of the table, as well as instructions on how this information should be processed to construct the table. After obtaining the table object, a formatted table can be printed in ASCII format, or exported to a variety of other formats (.txt, .pdf, .docx, etc.).
+The data used in this vignette is made up using random number generators. The data content is relatively simple: one row per imaginary person and one column per measurement (study arm, country of origin, gender, handedness, age, and weight).
+
+n <- 400
+
+set.seed(1)
+
+df <- tibble(
+ arm = factor(sample(c("Arm A", "Arm B"), n, replace = TRUE), levels = c("Arm A", "Arm B")),
+ country = factor(sample(c("CAN", "USA"), n, replace = TRUE, prob = c(.55, .45)), levels = c("CAN", "USA")),
+ gender = factor(sample(c("Female", "Male"), n, replace = TRUE), levels = c("Female", "Male")),
+ handed = factor(sample(c("Left", "Right"), n, prob = c(.6, .4), replace = TRUE), levels = c("Left", "Right")),
+ age = rchisq(n, 30) + 10
+) %>% mutate(
+ weight = 35 * rnorm(n, sd = .5) + ifelse(gender == "Female", 140, 180)
+)
+
+head(df)
# # A tibble: 6 × 6
+# arm country gender handed age weight
+# <fct> <fct> <fct> <fct> <dbl> <dbl>
+# 1 Arm A USA Female Left 31.3 139.
+# 2 Arm B CAN Female Right 50.5 116.
+# 3 Arm A USA Male Right 32.4 186.
+# 4 Arm A USA Male Right 34.6 169.
+# 5 Arm B USA Female Right 43.0 160.
+# 6 Arm A USA Female Right 43.2 126.
+Note that we use factor variables so that the level order is represented in the row or column order when we tabulate the information of df below.
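To illustrate the role of factor level order with plain base R (a toy example added here, not part of the original vignette), note that tabulation follows the declared levels, not the order of appearance:

```r
# Level order, not appearance order, drives the order of tabulated output.
f <- factor(c("B", "A", "B"), levels = c("B", "A"))
tab <- table(f)
names(tab) # "B" "A" -- the declared level order
```

The same mechanism determines the row and column order of tables built from factor variables.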
+The aim of this vignette is to build the following table step by step:
+# Arm A Arm B
+# Female Male Female Male
+# (N=96) (N=105) (N=92) (N=107)
+# ————————————————————————————————————————————————————————————
+# CAN 45 (46.9%) 64 (61.0%) 46 (50.0%) 62 (57.9%)
+# Left 32 (33.3%) 42 (40.0%) 26 (28.3%) 37 (34.6%)
+# mean 38.87 40.43 40.33 37.68
+# Right 13 (13.5%) 22 (21.0%) 20 (21.7%) 25 (23.4%)
+# mean 36.64 40.19 40.16 40.65
+# USA 51 (53.1%) 41 (39.0%) 46 (50.0%) 45 (42.1%)
+# Left 34 (35.4%) 19 (18.1%) 25 (27.2%) 25 (23.4%)
+# mean 40.36 39.68 39.21 40.07
+# Right 17 (17.7%) 22 (21.0%) 21 (22.8%) 20 (18.7%)
+# mean 36.94 39.80 38.53 39.02
+The table above can be achieved via the qtable() function. If you are new to tabulation with the rtables layout framework, you can use this convenience wrapper to create many types of two-way frequency tables.
+The purpose of qtable is to enable quick exploratory data analysis. See the exploratory_analysis vignette for more details.
Here is the code to recreate the table above:
+
+qtable(df,
+ row_vars = c("country", "handed"),
+ col_vars = c("arm", "gender"),
+ avar = "age",
+ afun = mean,
+ summarize_groups = TRUE,
+ row_labels = "mean"
+)
# Arm A Arm B
+# Female Male Female Male
+# age - mean (N=96) (N=105) (N=92) (N=107)
+# ——————————————————————————————————————————————————————————————
+# CAN 45 (46.9%) 64 (61.0%) 46 (50.0%) 62 (57.9%)
+# Left 32 (33.3%) 42 (40.0%) 26 (28.3%) 37 (34.6%)
+# mean 38.87 40.43 40.33 37.68
+# Right 13 (13.5%) 22 (21.0%) 20 (21.7%) 25 (23.4%)
+# mean 36.64 40.19 40.16 40.65
+# USA 51 (53.1%) 41 (39.0%) 46 (50.0%) 45 (42.1%)
+# Left 34 (35.4%) 19 (18.1%) 25 (27.2%) 25 (23.4%)
+# mean 40.36 39.68 39.21 40.07
+# Right 17 (17.7%) 22 (21.0%) 21 (22.8%) 20 (18.7%)
+# mean 36.94 39.80 38.53 39.02
+From the qtable function arguments above we can see many of the key concepts of the underlying rtables layout framework. The user needs to define:
+- the variables faceting the rows and columns (row_vars, col_vars)
+- the variable to analyze (avar) and the analysis function to apply (afun)
+- whether to summarize row groups (summarize_groups) and how to label the analysis rows (row_labels)
+In the sections below we will look at translating each of these questions into features of the rtables layout framework. Now let's take a look at building the example table with a layout.
+In rtables a basic table is defined to have 0 rows and one column representing all data. Analyzing a variable is one way of adding a row:
+lyt <- basic_table() %>%
+ analyze("age", mean, format = "xx.x")
+
+tbl <- build_table(lyt, df)
+tbl
# all obs
+# ——————————————
+# mean 39.4
+In the code above we first described the table and assigned that description to a variable, lyt. We then built the table using the actual data with build_table(). The description of a table is called a table layout. basic_table() is the start of every table layout and represents a table with one column containing all of the data. The analyze() instruction adds to the layout that the age variable should be analyzed with the mean() analysis function and the result rounded to 1 decimal place.
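To make the rounding step concrete, here is a base R sketch (an analogy added for illustration; the package's own formatting machinery is more general than sprintf) of what the "xx.x" format does to the analysis result:

```r
# "xx.x" renders one decimal place, much like sprintf("%.1f", x).
age <- c(31.3, 50.5, 32.4, 34.6, 43.0, 43.2) # the ages shown in head(df)
sprintf("%.1f", mean(age)) # "39.2"
```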
+Hence, a layout is "pre-data"; that is, it's a description of how to build a table once we get data. We can look at the layout in isolation:
+
+lyt
# A Pre-data Table Layout
+#
+# Column-Split Structure:
+# ()
+#
+# Row-Split Structure:
+# age (** analysis **)
+The general layouting instructions are summarized below:
+- basic_table() is a layout representing a table with zero rows and one column
+- Row splitting: split_rows_by(), split_rows_by_multivar(), split_rows_by_cuts(), split_rows_by_cutfun(), split_rows_by_quartiles()
+- Column splitting: split_cols_by(), split_cols_by_multivar(), split_cols_by_cuts(), split_cols_by_cutfun(), split_cols_by_quartiles()
+- Row group summaries: summarize_row_groups()
+- Analyses: analyze(), analyze_colvars()
+Using those functions, it is possible to create a wide variety of tables, as we will show in this document.
+We will now add more structure to the columns by adding a column split based on the factor variable arm:
+lyt <- basic_table() %>%
+ split_cols_by("arm") %>%
+ analyze("age", afun = mean, format = "xx.x")
+
+tbl <- build_table(lyt, df)
+tbl
# Arm A Arm B
+# ————————————————————
+# mean 39.5 39.4
+The resulting table has one column per factor level of arm. So the data represented by the first column is df[df$arm == "Arm A", ]. Hence, split_cols_by() partitions the data among the columns by default.
+Column splitting can be done in a recursive/nested manner by adding sequential split_cols_by() layout instructions. It's also possible to add a non-nested split. Here we split each arm further by gender:
+lyt <- basic_table() %>%
+ split_cols_by("arm") %>%
+ split_cols_by("gender") %>%
+ analyze("age", afun = mean, format = "xx.x")
+
+tbl <- build_table(lyt, df)
+tbl
# Arm A Arm B
+# Female Male Female Male
+# ————————————————————————————————————
+# mean 38.8 40.1 39.6 39.2
+The first column represents the data in df where df$arm == "Arm A" & df$gender == "Female", and the second column the data in df where df$arm == "Arm A" & df$gender == "Male", and so on.
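The cell-to-subset correspondence can be mimicked with base R subsetting on a small toy data set (toy is a hypothetical stand-in added for illustration, not the vignette's df):

```r
# Each leaf column of a nested split sees only the rows matching both
# split values.
toy <- data.frame(
  arm    = c("Arm A", "Arm A", "Arm B", "Arm B"),
  gender = c("Female", "Male", "Female", "Male"),
  age    = c(30, 40, 50, 60)
)
# The "Arm A -> Female" column is computed on just the first row:
mean(toy$age[toy$arm == "Arm A" & toy$gender == "Female"]) # 30
```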
+So far, we have created layouts with analysis and column splitting instructions, i.e. analyze() and split_cols_by(), respectively. This resulted in a table with multiple columns and one data row. We will add more row structure by stratifying the mean analysis by country (i.e. adding a split in the row space):
+lyt <- basic_table() %>%
+ split_cols_by("arm") %>%
+ split_cols_by("gender") %>%
+ split_rows_by("country") %>%
+ analyze("age", afun = mean, format = "xx.x")
+
+tbl <- build_table(lyt, df)
+tbl
# Arm A Arm B
+# Female Male Female Male
+# ——————————————————————————————————————
+# CAN
+# mean 38.2 40.3 40.3 38.9
+# USA
+# mean 39.2 39.7 38.9 39.6
+In this table the data used to derive the first data cell (average age of female Canadians in Arm A) is where df$country == "CAN" & df$arm == "Arm A" & df$gender == "Female". This cell value can also be calculated manually:
+mean(df$age[df$country == "CAN" & df$arm == "Arm A" & df$gender == "Female"])
# [1] 38.22447
+Row structure can also be used to group the table into titled groups of pages during rendering. We do this via 'page by splits', which are declared via page_by = TRUE within a call to split_rows_by():
+lyt <- basic_table() %>%
+ split_cols_by("arm") %>%
+ split_cols_by("gender") %>%
+ split_rows_by("country", page_by = TRUE) %>%
+ split_rows_by("handed") %>%
+ analyze("age", afun = mean, format = "xx.x")
+
+tbl <- build_table(lyt, df)
+cat(export_as_txt(tbl, page_type = "letter", page_break = "\n\n~~~~~~ Page Break ~~~~~~\n\n"))
#
+# country: CAN
+#
+# ————————————————————————————————————————
+# Arm A Arm B
+# Female Male Female Male
+# ————————————————————————————————————————
+# Left
+# mean 38.9 40.4 40.3 37.7
+# Right
+# mean 36.6 40.2 40.2 40.6
+#
+#
+# ~~~~~~ Page Break ~~~~~~
+#
+#
+# country: USA
+#
+# ————————————————————————————————————————
+# Arm A Arm B
+# Female Male Female Male
+# ————————————————————————————————————————
+# Left
+# mean 40.4 39.7 39.2 40.1
+# Right
+# mean 36.9 39.8 38.5 39.0
+We go into more detail on page-by splits and how to control the page-group specific titles in the Title and footer vignette.
+Note that if you print or render a table without pagination, the page_by splits are currently rendered as normal row splits. This may change in future releases.
+When adding row splits, we get by default label rows for each split level, for example CAN and USA in the table above. Besides the column space subsetting, we have now further subsetted the data for each cell. It is often useful when defining a row splitting to display information about each row group. In rtables this is referred to as content information, i.e. mean() on row 2 is a descendant of CAN (visible via the indenting, though the table has an underlying tree structure that is not of importance for this vignette). In order to add content information and turn the CAN label row into a content row, the summarize_row_groups() function is required. By default, the count (via nrow()) and the percentage of data relative to the column-associated data are calculated:
+lyt <- basic_table() %>%
+ split_cols_by("arm") %>%
+ split_cols_by("gender") %>%
+ split_rows_by("country") %>%
+ summarize_row_groups() %>%
+ analyze("age", afun = mean, format = "xx.x")
+
+tbl <- build_table(lyt, df)
+tbl
# Arm A Arm B
+# Female Male Female Male
+# ——————————————————————————————————————————————————————————
+# CAN 45 (46.9%) 64 (61.0%) 46 (50.0%) 62 (57.9%)
+# mean 38.2 40.3 40.3 38.9
+# USA 51 (53.1%) 41 (39.0%) 46 (50.0%) 45 (42.1%)
+# mean 39.2 39.7 38.9 39.6
+The relative percentage for female Canadians is calculated as follows:
+
+df_cell <- subset(df, df$country == "CAN" & df$arm == "Arm A" & df$gender == "Female")
+df_col_1 <- subset(df, df$arm == "Arm A" & df$gender == "Female")
+
+c(count = nrow(df_cell), percentage = nrow(df_cell) / nrow(df_col_1))
# count percentage
+# 45.00000 0.46875
+So the group percentages per row split sum up to 1 for each column.
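A quick sanity check using the counts from the first column of the table above (45 CAN and 51 USA out of N = 96):

```r
# Group counts for the Arm A / Female column, read off the table above.
counts <- c(CAN = 45, USA = 51)
pct <- counts / sum(counts) # 0.46875 and 0.53125, i.e. 46.9% and 53.1%
sum(pct)                    # 1
```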
+We can further split the row space by dividing each country by handedness:
+
+lyt <- basic_table() %>%
+ split_cols_by("arm") %>%
+ split_cols_by("gender") %>%
+ split_rows_by("country") %>%
+ summarize_row_groups() %>%
+ split_rows_by("handed") %>%
+ analyze("age", afun = mean, format = "xx.x")
+
+tbl <- build_table(lyt, df)
+tbl
# Arm A Arm B
+# Female Male Female Male
+# ————————————————————————————————————————————————————————————
+# CAN 45 (46.9%) 64 (61.0%) 46 (50.0%) 62 (57.9%)
+# Left
+# mean 38.9 40.4 40.3 37.7
+# Right
+# mean 36.6 40.2 40.2 40.6
+# USA 51 (53.1%) 41 (39.0%) 46 (50.0%) 45 (42.1%)
+# Left
+# mean 40.4 39.7 39.2 40.1
+# Right
+# mean 36.9 39.8 38.5 39.0
+Next, we further add a count and percentage summary for handedness within each country:
+
+lyt <- basic_table() %>%
+ split_cols_by("arm") %>%
+ split_cols_by("gender") %>%
+ split_rows_by("country") %>%
+ summarize_row_groups() %>%
+ split_rows_by("handed") %>%
+ summarize_row_groups() %>%
+ analyze("age", afun = mean, format = "xx.x")
+
+tbl <- build_table(lyt, df)
+tbl
# Arm A Arm B
+# Female Male Female Male
+# ————————————————————————————————————————————————————————————
+# CAN 45 (46.9%) 64 (61.0%) 46 (50.0%) 62 (57.9%)
+# Left 32 (33.3%) 42 (40.0%) 26 (28.3%) 37 (34.6%)
+# mean 38.9 40.4 40.3 37.7
+# Right 13 (13.5%) 22 (21.0%) 20 (21.7%) 25 (23.4%)
+# mean 36.6 40.2 40.2 40.6
+# USA 51 (53.1%) 41 (39.0%) 46 (50.0%) 45 (42.1%)
+# Left 34 (35.4%) 19 (18.1%) 25 (27.2%) 25 (23.4%)
+# mean 40.4 39.7 39.2 40.1
+# Right 17 (17.7%) 22 (21.0%) 21 (22.8%) 20 (18.7%)
+# mean 36.9 39.8 38.5 39.0
+There are a number of other table frameworks available in R, including:
+There are a number of reasons to choose rtables (yet another tables R package):
+More in-depth comparisons of the various tabulation frameworks can be found in the Overview of table R packages chapter of the Tables in Clinical Trials with R book compiled by the R Consortium Tables Working Group.
+In this vignette you have learned:
+The other vignettes in the rtables package provide more detailed information about the package. We recommend that you continue with the tabulation_dplyr vignette, which shows how the information derived by the table in this vignette can be reproduced using dplyr.
+vignettes/introspecting_tables.Rmd
+The packages used in this vignette are rtables and dplyr:
First, let’s set up a simple table.
+
+lyt <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("ARMCD") %>%
+ split_cols_by("STRATA2") %>%
+ split_rows_by("STRATA1") %>%
+ add_overall_col("All") %>%
+ summarize_row_groups() %>%
+ analyze("AGE", afun = max, format = "xx.x")
+
+tbl <- build_table(lyt, ex_adsl)
+tbl
# ARM A ARM B ARM C
+# S1 S2 S1 S2 S1 S2 All
+# (N=73) (N=61) (N=67) (N=67) (N=56) (N=76) (N=400)
+# —————————————————————————————————————————————————————————————————————————————————————————————————
+# A 18 (24.7%) 20 (32.8%) 22 (32.8%) 22 (32.8%) 14 (25.0%) 26 (34.2%) 122 (30.5%)
+# max 40.0 46.0 62.0 50.0 47.0 45.0 62.0
+# B 28 (38.4%) 19 (31.1%) 19 (28.4%) 26 (38.8%) 18 (32.1%) 25 (32.9%) 135 (33.8%)
+# max 48.0 47.0 58.0 58.0 46.0 64.0 64.0
+# C 27 (37.0%) 22 (36.1%) 26 (38.8%) 19 (28.4%) 24 (42.9%) 25 (32.9%) 143 (35.8%)
+# max 48.0 50.0 48.0 51.0 69.0 50.0 69.0
+We can get basic table dimensions, the number of rows, and the number of columns with the following code:
+
+dim(tbl)
# [1] 6 7
+
+nrow(tbl)
# [1] 6
+
+ncol(tbl)
# [1] 7
+The table_structure() function prints a summary of a table's row structure at one of two levels of detail. By default, it summarizes the structure at the subtable level.
+table_structure(tbl)
# [TableTree] STRATA1
+# [TableTree] A [cont: 1 x 7]
+# [ElementaryTable] AGE (1 x 7)
+# [TableTree] B [cont: 1 x 7]
+# [ElementaryTable] AGE (1 x 7)
+# [TableTree] C [cont: 1 x 7]
+# [ElementaryTable] AGE (1 x 7)
+When the detail argument is set to "row", however, it provides a more detailed row-level summary, which acts as a useful alternative to how we might normally use the str() function to interrogate compound nested lists.
+table_structure(tbl, detail = "row")
# TableTree: [STRATA1] (STRATA1)
+# labelrow: [STRATA1] (STRATA1) - <not visible>
+# children:
+# TableTree: [A] (A)
+# labelrow: [A] (A) - <not visible>
+# content:
+# ElementaryTable: [A@content] ()
+# labelrow: [] () - <not visible>
+# children:
+# ContentRow: [A] (A)
+# children:
+# ElementaryTable: [AGE] (AGE)
+# labelrow: [AGE] (AGE) - <not visible>
+# children:
+# DataRow: [max] (max)
+# TableTree: [B] (B)
+# labelrow: [B] (B) - <not visible>
+# content:
+# ElementaryTable: [B@content] ()
+# labelrow: [] () - <not visible>
+# children:
+# ContentRow: [B] (B)
+# children:
+# ElementaryTable: [AGE] (AGE)
+# labelrow: [AGE] (AGE) - <not visible>
+# children:
+# DataRow: [max] (max)
+# TableTree: [C] (C)
+# labelrow: [C] (C) - <not visible>
+# content:
+# ElementaryTable: [C@content] ()
+# labelrow: [] () - <not visible>
+# children:
+# ContentRow: [C] (C)
+# children:
+# ElementaryTable: [AGE] (AGE)
+# labelrow: [AGE] (AGE) - <not visible>
+# children:
+# DataRow: [max] (max)
+The make_row_df() and make_col_df() functions each create a data.frame with a variety of information about the table's structure. Most useful for introspection purposes are the label, name, abs_rownumber, path, and node_class columns (the remainder of the information in the returned data.frame is used for pagination).
+make_row_df(tbl)[, c("label", "name", "abs_rownumber", "path", "node_class")]
# label name abs_rownumber path node_class
+# 1 A A 1 STRATA1,.... ContentRow
+# 2 max max 2 STRATA1,.... DataRow
+# 3 B B 3 STRATA1,.... ContentRow
+# 4 max max 4 STRATA1,.... DataRow
+# 5 C C 5 STRATA1,.... ContentRow
+# 6 max max 6 STRATA1,.... DataRow
+There is also a wrapper function, row_paths(), available for make_row_df() to display only the row path structure:
+row_paths(tbl)
# [[1]]
+# [1] "STRATA1" "A" "@content" "A"
+#
+# [[2]]
+# [1] "STRATA1" "A" "AGE" "max"
+#
+# [[3]]
+# [1] "STRATA1" "B" "@content" "B"
+#
+# [[4]]
+# [1] "STRATA1" "B" "AGE" "max"
+#
+# [[5]]
+# [1] "STRATA1" "C" "@content" "C"
+#
+# [[6]]
+# [1] "STRATA1" "C" "AGE" "max"
+By default make_row_df() summarizes only visible rows, but setting visible_only to FALSE gives us a structural summary of the table with the full hierarchy of subtables, including those that are not represented directly by any visible rows:
+make_row_df(tbl, visible_only = FALSE)[, c("label", "name", "abs_rownumber", "path", "node_class")]
# label name abs_rownumber path node_class
+# 1 STRATA1 NA STRATA1 TableTree
+# 2 A NA STRATA1, A TableTree
+# 3 A@content NA STRATA1,.... ElementaryTable
+# 4 A A 1 STRATA1,.... ContentRow
+# 5 AGE NA STRATA1,.... ElementaryTable
+# 6 max max 2 STRATA1,.... DataRow
+# 7 B NA STRATA1, B TableTree
+# 8 B@content NA STRATA1,.... ElementaryTable
+# 9 B B 3 STRATA1,.... ContentRow
+# 10 AGE NA STRATA1,.... ElementaryTable
+# 11 max max 4 STRATA1,.... DataRow
+# 12 C NA STRATA1, C TableTree
+# 13 C@content NA STRATA1,.... ElementaryTable
+# 14 C C 5 STRATA1,.... ContentRow
+# 15 AGE NA STRATA1,.... ElementaryTable
+# 16 max max 6 STRATA1,.... DataRow
+make_col_df() similarly accepts visible_only, though here the meaning is slightly different, indicating whether only leaf columns should be summarized (the default, TRUE) or whether higher-level groups of columns - analogous to subtables in row space - should be summarized as well.
+make_col_df(tbl)[, c("label", "name", "abs_pos", "path", "leaf_indices")]
# label name abs_pos path leaf_indices
+# 1 S1 S1 1 ARMCD, A.... 1
+# 2 S2 S2 2 ARMCD, A.... 2
+# 3 S1 S1 3 ARMCD, A.... 3
+# 4 S2 S2 4 ARMCD, A.... 4
+# 5 S1 S1 5 ARMCD, A.... 5
+# 6 S2 S2 6 ARMCD, A.... 6
+# 7 All All 7 All, All 7
+
+make_col_df(tbl, visible_only = FALSE)[, c("label", "name", "abs_pos", "path", "leaf_indices")]
# label name abs_pos path leaf_indices
+# 1 ARM A ARM A NA ARMCD, ARM A 1, 2
+# 2 S1 S1 1 ARMCD, A.... 1
+# 3 S2 S2 2 ARMCD, A.... 2
+# 4 ARM B ARM B NA ARMCD, ARM B 3, 4
+# 5 S1 S1 3 ARMCD, A.... 3
+# 6 S2 S2 4 ARMCD, A.... 4
+# 7 ARM C ARM C NA ARMCD, ARM C 5, 6
+# 8 S1 S1 5 ARMCD, A.... 5
+# 9 S2 S2 6 ARMCD, A.... 6
+# 10 All All 7 All, All 7
+Similarly, there is a wrapper function, col_paths(), available, which displays only the column structure:
+col_paths(tbl)
# [[1]]
+# [1] "ARMCD" "ARM A" "STRATA2" "S1"
+#
+# [[2]]
+# [1] "ARMCD" "ARM A" "STRATA2" "S2"
+#
+# [[3]]
+# [1] "ARMCD" "ARM B" "STRATA2" "S1"
+#
+# [[4]]
+# [1] "ARMCD" "ARM B" "STRATA2" "S2"
+#
+# [[5]]
+# [1] "ARMCD" "ARM C" "STRATA2" "S1"
+#
+# [[6]]
+# [1] "ARMCD" "ARM C" "STRATA2" "S2"
+#
+# [[7]]
+# [1] "All" "All"
+The row_paths_summary() and col_paths_summary() functions wrap the respective make_*_df functions, printing the name, node_class, and path information (in the row case), or the label and path information (in the column case), indented to illustrate table structure:
+row_paths_summary(tbl)
# rowname node_class path
+# ————————————————————————————————————————————————
+# A ContentRow STRATA1, A, @content, A
+# max DataRow STRATA1, A, AGE, max
+# B ContentRow STRATA1, B, @content, B
+# max DataRow STRATA1, B, AGE, max
+# C ContentRow STRATA1, C, @content, C
+# max DataRow STRATA1, C, AGE, max
+
+col_paths_summary(tbl)
# label path
+# ——————————————————————————————————
+# ARM A ARMCD, ARM A
+# S1 ARMCD, ARM A, STRATA2, S1
+# S2 ARMCD, ARM A, STRATA2, S2
+# ARM B ARMCD, ARM B
+# S1 ARMCD, ARM B, STRATA2, S1
+# S2 ARMCD, ARM B, STRATA2, S2
+# ARM C ARMCD, ARM C
+# S1 ARMCD, ARM C, STRATA2, S1
+# S2 ARMCD, ARM C, STRATA2, S2
+# All All, All
+We can gain insight into the value formatting structure of a table using table_shell(), which returns a table with the same output as print() but with the cell values replaced by their underlying format strings (e.g. instead of 40.0, xx.x is displayed, and so on). This is useful for understanding the structure of the table, and for debugging purposes. Another useful tool is the value_formats() function, which instead of a table returns a matrix of the format strings for each cell value in the table.
+See below the printouts for the table above:
+
+table_shell(tbl)
# ARM A ARM B ARM C
+# S1 S2 S1 S2 S1 S2 All
+# (N=73) (N=61) (N=67) (N=67) (N=56) (N=76) (N=400)
+# ————————————————————————————————————————————————————————————————————————————————————————————————
+# A xx (xx.x%) xx (xx.x%) xx (xx.x%) xx (xx.x%) xx (xx.x%) xx (xx.x%) xx (xx.x%)
+# max xx.x xx.x xx.x xx.x xx.x xx.x xx.x
+# B xx (xx.x%) xx (xx.x%) xx (xx.x%) xx (xx.x%) xx (xx.x%) xx (xx.x%) xx (xx.x%)
+# max xx.x xx.x xx.x xx.x xx.x xx.x xx.x
+# C xx (xx.x%) xx (xx.x%) xx (xx.x%) xx (xx.x%) xx (xx.x%) xx (xx.x%) xx (xx.x%)
+# max xx.x xx.x xx.x xx.x xx.x xx.x xx.x
+
+value_formats(tbl)
# ARM A.S1 ARM A.S2 ARM B.S1 ARM B.S2 ARM C.S1
+# A "xx (xx.x%)" "xx (xx.x%)" "xx (xx.x%)" "xx (xx.x%)" "xx (xx.x%)"
+# max "xx.x" "xx.x" "xx.x" "xx.x" "xx.x"
+# B "xx (xx.x%)" "xx (xx.x%)" "xx (xx.x%)" "xx (xx.x%)" "xx (xx.x%)"
+# max "xx.x" "xx.x" "xx.x" "xx.x" "xx.x"
+# C "xx (xx.x%)" "xx (xx.x%)" "xx (xx.x%)" "xx (xx.x%)" "xx (xx.x%)"
+# max "xx.x" "xx.x" "xx.x" "xx.x" "xx.x"
+# ARM C.S2 All
+# A "xx (xx.x%)" "xx (xx.x%)"
+# max "xx.x" "xx.x"
+# B "xx (xx.x%)" "xx (xx.x%)"
+# max "xx.x" "xx.x"
+# C "xx (xx.x%)" "xx (xx.x%)"
+# max "xx.x" "xx.x"
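As a rough analogy for what a format string such as "xx (xx.x%)" encodes (a base R illustration added here; rtables uses its own formatters, not sprintf):

```r
# "xx (xx.x%)" pairs a count with a one-decimal percentage.
count    <- 45L
fraction <- 0.46875 # 45 / 96, from the first content cell of the table
sprintf("%d (%.1f%%)", count, fraction * 100) # "45 (46.9%)"
```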
+Knowing the structure of an rtable object is helpful for retrieving specific values from the table. For examples, see the Path Based Cell Value Accessing section of the Subsetting and Manipulating Table Contents vignette.
+Understanding table structure is also important for post-processing operations such as sorting and pruning. More details on these are covered in the Pruning and Sorting Tables vignette.
+vignettes/manual_table_construction.Rmd
+Tables in rtables can be constructed via the layout or rtabulate tabulation frameworks, or manually. Currently, manual table construction is the only way to define column spans. The main functions for manual table construction are rtable(), rrow(), and rcell():
+tbl <- rtable(
+ header = c("Treatement\nN=100", "Comparison\nN=300"),
+ format = "xx (xx.xx%)",
+ rrow("A", c(104, .2), c(100, .4)),
+ rrow("B", c(23, .4), c(43, .5)),
+ rrow(),
+ rrow("this is a very long section header"),
+ rrow("estimate", rcell(55.23, "xx.xx", colspan = 2)),
+ rrow("95% CI", indent = 1, rcell(c(44.8, 67.4), format = "(xx.x, xx.x)", colspan = 2))
+)
+Before we go into explaining the individual components used to create this table, we continue with the HTML conversion of the rtable object:
+as_html(tbl, width = "80%")
+|                                    | Treatement N=100 | Comparison N=300 |
+|------------------------------------|------------------|------------------|
+| A                                  | 104 (20.00%)     | 100 (40.00%)     |
+| B                                  | 23 (40.00%)      | 43 (50.00%)      |
+|                                    |                  |                  |
+| this is a very long section header |                  |                  |
+| estimate                           | 55.23            |                  |
+| 95% CI                             | (44.8, 67.4)     |                  |
+Next, the [ operator lets you access the cell content.
+tbl[1, 1]
# Treatement
+# N=100
+# ————————————————
+# A 104 (20.00%)
+To format that cell, run format_rcell(tbl[1, 1]).
+Note that tbl[6, 1] and tbl[6, 2] both display the same rcell because of the colspan.
+vignettes/sorting_pruning.Rmd
+Often we want to filter or reorder elements of a table in ways that take into account the table structure. For example:
+
+library(rtables)
+library(dplyr)
+
+raw_lyt <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_cols_by("SEX") %>%
+ split_rows_by("RACE") %>%
+ summarize_row_groups() %>%
+ split_rows_by("STRATA1") %>%
+ summarize_row_groups() %>%
+ analyze("AGE")
+
+raw_tbl <- build_table(raw_lyt, DM)
+raw_tbl
# A: Drug X B: Placebo C: Combination
+# F M U UNDIFFERENTIATED F M U UNDIFFERENTIATED F M U UNDIFFERENTIATED
+# ——————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 44 (62.9%) 35 (68.6%) 0 (NA%) 0 (NA%) 37 (66.1%) 31 (62.0%) 0 (NA%) 0 (NA%) 40 (65.6%) 44 (64.7%) 0 (NA%) 0 (NA%)
+# A 15 (21.4%) 12 (23.5%) 0 (NA%) 0 (NA%) 14 (25.0%) 6 (12.0%) 0 (NA%) 0 (NA%) 15 (24.6%) 16 (23.5%) 0 (NA%) 0 (NA%)
+# Mean 30.40 34.42 NA NA 35.43 30.33 NA NA 37.40 36.25 NA NA
+# B 16 (22.9%) 8 (15.7%) 0 (NA%) 0 (NA%) 13 (23.2%) 16 (32.0%) 0 (NA%) 0 (NA%) 10 (16.4%) 12 (17.6%) 0 (NA%) 0 (NA%)
+# Mean 33.75 34.88 NA NA 32.46 30.94 NA NA 33.30 35.92 NA NA
+# C 13 (18.6%) 15 (29.4%) 0 (NA%) 0 (NA%) 10 (17.9%) 9 (18.0%) 0 (NA%) 0 (NA%) 15 (24.6%) 16 (23.5%) 0 (NA%) 0 (NA%)
+# Mean 36.92 35.60 NA NA 34.00 31.89 NA NA 33.47 31.38 NA NA
+# BLACK OR AFRICAN AMERICAN 18 (25.7%) 10 (19.6%) 0 (NA%) 0 (NA%) 12 (21.4%) 12 (24.0%) 0 (NA%) 0 (NA%) 13 (21.3%) 14 (20.6%) 0 (NA%) 0 (NA%)
+# A 5 (7.1%) 1 (2.0%) 0 (NA%) 0 (NA%) 5 (8.9%) 2 (4.0%) 0 (NA%) 0 (NA%) 4 (6.6%) 4 (5.9%) 0 (NA%) 0 (NA%)
+# Mean 31.20 33.00 NA NA 28.00 30.00 NA NA 30.75 36.50 NA NA
+# B 7 (10.0%) 3 (5.9%) 0 (NA%) 0 (NA%) 3 (5.4%) 3 (6.0%) 0 (NA%) 0 (NA%) 6 (9.8%) 6 (8.8%) 0 (NA%) 0 (NA%)
+# Mean 36.14 34.33 NA NA 29.67 32.00 NA NA 36.33 31.00 NA NA
+# C 6 (8.6%) 6 (11.8%) 0 (NA%) 0 (NA%) 4 (7.1%) 7 (14.0%) 0 (NA%) 0 (NA%) 3 (4.9%) 4 (5.9%) 0 (NA%) 0 (NA%)
+# Mean 31.33 39.67 NA NA 34.50 34.00 NA NA 33.00 36.50 NA NA
+# WHITE 8 (11.4%) 6 (11.8%) 0 (NA%) 0 (NA%) 7 (12.5%) 7 (14.0%) 0 (NA%) 0 (NA%) 8 (13.1%) 10 (14.7%) 0 (NA%) 0 (NA%)
+# A 2 (2.9%) 1 (2.0%) 0 (NA%) 0 (NA%) 3 (5.4%) 3 (6.0%) 0 (NA%) 0 (NA%) 1 (1.6%) 5 (7.4%) 0 (NA%) 0 (NA%)
+# Mean 34.00 45.00 NA NA 29.33 33.33 NA NA 35.00 32.80 NA NA
+# B 4 (5.7%) 3 (5.9%) 0 (NA%) 0 (NA%) 1 (1.8%) 4 (8.0%) 0 (NA%) 0 (NA%) 3 (4.9%) 1 (1.5%) 0 (NA%) 0 (NA%)
+# Mean 37.00 43.67 NA NA 48.00 36.75 NA NA 34.33 36.00 NA NA
+# C 2 (2.9%) 2 (3.9%) 0 (NA%) 0 (NA%) 3 (5.4%) 0 (0.0%) 0 (NA%) 0 (NA%) 4 (6.6%) 4 (5.9%) 0 (NA%) 0 (NA%)
+# Mean 35.50 44.00 NA NA 44.67 NA NA NA 38.50 35.00 NA NA
+# AMERICAN INDIAN OR ALASKA NATIVE 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# A 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# B 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# C 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# MULTIPLE 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# A 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# B 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# C 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# A 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# B 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# C 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# OTHER 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# A 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# B 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# C 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# UNKNOWN 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# A 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# B 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+# C 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%) 0 (0.0%) 0 (0.0%) 0 (NA%) 0 (NA%)
+# Mean NA NA NA NA NA NA NA NA NA NA NA NA
+Trimming represents a convenience wrapper around simple, direct subsetting of the rows of a TableTree.
+We use the trim_rows() function with our table and a criteria function. All rows where the criteria function returns TRUE will be removed, and all others will be retained.
+A trimming function accepts a TableRow
object
+and returns TRUE
if the row should be removed.
The default trimming function removes rows in which all columns have
+no values in them, i.e. that have all NA
values or all
+0
values:
+trim_rows(raw_tbl)
# A: Drug X B: Placebo C: Combination
+# F M U UNDIFFERENTIATED F M U UNDIFFERENTIATED F M U UNDIFFERENTIATED
+# ——————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 44 (62.9%) 35 (68.6%) 0 (NA%) 0 (NA%) 37 (66.1%) 31 (62.0%) 0 (NA%) 0 (NA%) 40 (65.6%) 44 (64.7%) 0 (NA%) 0 (NA%)
+# A 15 (21.4%) 12 (23.5%) 0 (NA%) 0 (NA%) 14 (25.0%) 6 (12.0%) 0 (NA%) 0 (NA%) 15 (24.6%) 16 (23.5%) 0 (NA%) 0 (NA%)
+# Mean 30.40 34.42 NA NA 35.43 30.33 NA NA 37.40 36.25 NA NA
+# B 16 (22.9%) 8 (15.7%) 0 (NA%) 0 (NA%) 13 (23.2%) 16 (32.0%) 0 (NA%) 0 (NA%) 10 (16.4%) 12 (17.6%) 0 (NA%) 0 (NA%)
+# Mean 33.75 34.88 NA NA 32.46 30.94 NA NA 33.30 35.92 NA NA
+# C 13 (18.6%) 15 (29.4%) 0 (NA%) 0 (NA%) 10 (17.9%) 9 (18.0%) 0 (NA%) 0 (NA%) 15 (24.6%) 16 (23.5%) 0 (NA%) 0 (NA%)
+# Mean 36.92 35.60 NA NA 34.00 31.89 NA NA 33.47 31.38 NA NA
+# BLACK OR AFRICAN AMERICAN 18 (25.7%) 10 (19.6%) 0 (NA%) 0 (NA%) 12 (21.4%) 12 (24.0%) 0 (NA%) 0 (NA%) 13 (21.3%) 14 (20.6%) 0 (NA%) 0 (NA%)
+# A 5 (7.1%) 1 (2.0%) 0 (NA%) 0 (NA%) 5 (8.9%) 2 (4.0%) 0 (NA%) 0 (NA%) 4 (6.6%) 4 (5.9%) 0 (NA%) 0 (NA%)
+# Mean 31.20 33.00 NA NA 28.00 30.00 NA NA 30.75 36.50 NA NA
+# B 7 (10.0%) 3 (5.9%) 0 (NA%) 0 (NA%) 3 (5.4%) 3 (6.0%) 0 (NA%) 0 (NA%) 6 (9.8%) 6 (8.8%) 0 (NA%) 0 (NA%)
+# Mean 36.14 34.33 NA NA 29.67 32.00 NA NA 36.33 31.00 NA NA
+# C 6 (8.6%) 6 (11.8%) 0 (NA%) 0 (NA%) 4 (7.1%) 7 (14.0%) 0 (NA%) 0 (NA%) 3 (4.9%) 4 (5.9%) 0 (NA%) 0 (NA%)
+# Mean 31.33 39.67 NA NA 34.50 34.00 NA NA 33.00 36.50 NA NA
+# WHITE 8 (11.4%) 6 (11.8%) 0 (NA%) 0 (NA%) 7 (12.5%) 7 (14.0%) 0 (NA%) 0 (NA%) 8 (13.1%) 10 (14.7%) 0 (NA%) 0 (NA%)
+# A 2 (2.9%) 1 (2.0%) 0 (NA%) 0 (NA%) 3 (5.4%) 3 (6.0%) 0 (NA%) 0 (NA%) 1 (1.6%) 5 (7.4%) 0 (NA%) 0 (NA%)
+# Mean 34.00 45.00 NA NA 29.33 33.33 NA NA 35.00 32.80 NA NA
+# B 4 (5.7%) 3 (5.9%) 0 (NA%) 0 (NA%) 1 (1.8%) 4 (8.0%) 0 (NA%) 0 (NA%) 3 (4.9%) 1 (1.5%) 0 (NA%) 0 (NA%)
+# Mean 37.00 43.67 NA NA 48.00 36.75 NA NA 34.33 36.00 NA NA
+# C 2 (2.9%) 2 (3.9%) 0 (NA%) 0 (NA%) 3 (5.4%) 0 (0.0%) 0 (NA%) 0 (NA%) 4 (6.6%) 4 (5.9%) 0 (NA%) 0 (NA%)
+# Mean 35.50 44.00 NA NA 44.67 NA NA NA 38.50 35.00 NA NA
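The criteria argument of trim_rows() also accepts custom functions. As a hedged sketch (the helper name is ours, not from rtables), we could remove only the all-missing analysis rows while keeping the zero-count rows:

```r
# Hypothetical custom criteria function for trim_rows(): remove only
# DataRows (the "Mean" rows here) whose cells are all NA.
# Assumes rtables is loaded and `raw_tbl` is the table built above.
all_na_datarow <- function(tr) {
  is(tr, "DataRow") && all(is.na(unlist(row_values(tr))))
}

trim_rows(raw_tbl, criteria = all_na_datarow)
```

The default criteria used above corresponds to removing rows whose cells are all 0 or NA; a custom criteria function simply replaces that row-level test.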
+There are currently no special utilities for trimming columns, but we can remove the empty columns with fairly straightforward column subsetting using the col_counts() function:
+coltrimmed <- raw_tbl[, col_counts(raw_tbl) > 0]
# Note: method with signature 'VTableTree#missing#ANY' chosen for function '[',
+# target signature 'TableTree#missing#logical'.
+# "VTableTree#ANY#logical" would also be valid
+
+h_coltrimmed <- head(coltrimmed, n = 14)
+h_coltrimmed
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 44 (62.9%) 35 (68.6%) 37 (66.1%) 31 (62.0%) 40 (65.6%) 44 (64.7%)
+# A 15 (21.4%) 12 (23.5%) 14 (25.0%) 6 (12.0%) 15 (24.6%) 16 (23.5%)
+# Mean 30.40 34.42 35.43 30.33 37.40 36.25
+# B 16 (22.9%) 8 (15.7%) 13 (23.2%) 16 (32.0%) 10 (16.4%) 12 (17.6%)
+# Mean 33.75 34.88 32.46 30.94 33.30 35.92
+# C 13 (18.6%) 15 (29.4%) 10 (17.9%) 9 (18.0%) 15 (24.6%) 16 (23.5%)
+# Mean 36.92 35.60 34.00 31.89 33.47 31.38
+# BLACK OR AFRICAN AMERICAN 18 (25.7%) 10 (19.6%) 12 (21.4%) 12 (24.0%) 13 (21.3%) 14 (20.6%)
+# A 5 (7.1%) 1 (2.0%) 5 (8.9%) 2 (4.0%) 4 (6.6%) 4 (5.9%)
+# Mean 31.20 33.00 28.00 30.00 30.75 36.50
+# B 7 (10.0%) 3 (5.9%) 3 (5.4%) 3 (6.0%) 6 (9.8%) 6 (8.8%)
+# Mean 36.14 34.33 29.67 32.00 36.33 31.00
+# C 6 (8.6%) 6 (11.8%) 4 (7.1%) 7 (14.0%) 3 (4.9%) 4 (5.9%)
+# Mean 31.33 39.67 34.50 34.00 33.00 36.50
+Now, it is interesting to see how this table is structured:
+
+table_structure(h_coltrimmed)
# [TableTree] RACE
+# [TableTree] ASIAN [cont: 1 x 6]
+# [TableTree] STRATA1
+# [TableTree] A [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+# [TableTree] B [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+# [TableTree] C [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+# [TableTree] BLACK OR AFRICAN AMERICAN [cont: 1 x 6]
+# [TableTree] STRATA1
+# [TableTree] A [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+# [TableTree] B [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+# [TableTree] C [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+For a deeper understanding of the fundamental structures in rtables, we suggest taking a look at slides 69-76 of this Slide deck.
+In brief, it is important to notice how [TableTree] RACE is the root of the table, which is split (with split_rows_by("RACE") %>%) into two subtables: [TableTree] ASIAN [cont: 1 x 6] and [TableTree] BLACK OR AFRICAN AMERICAN [cont: 1 x 6]. These are then “described” with summarize_row_groups() %>%, which creates for every split a “content” table containing 1 row (the 1 in cont: 1 x 6) that, when rendered, takes the place of the LabelRow.
+Each of these two subtables then contains a STRATA1 table, representing the further split_rows_by("STRATA1") in the layout. Similar to the RACE table, each STRATA1 table is split into subtables: one per stratum, each with its own content table. Each individual stratum subtable then contains an ElementaryTable (whose children are individual rows) generated by the analyze("AGE") layout directive, i.e. [ElementaryTable] AGE (1 x 6).
+This subtable and row structure is very important for both sorting and pruning: values in “content” (ContentRow) and “value” (DataRow) rows use different access functions and should be treated differently.
Another interesting function that can be used to understand the +connection between row names and their representational path is the +following:
+
+row_paths_summary(h_coltrimmed)
# rowname node_class path
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN ContentRow RACE, ASIAN, @content, ASIAN
+# A ContentRow RACE, ASIAN, STRATA1, A, @content, A
+# Mean DataRow RACE, ASIAN, STRATA1, A, AGE, Mean
+# B ContentRow RACE, ASIAN, STRATA1, B, @content, B
+# Mean DataRow RACE, ASIAN, STRATA1, B, AGE, Mean
+# C ContentRow RACE, ASIAN, STRATA1, C, @content, C
+# Mean DataRow RACE, ASIAN, STRATA1, C, AGE, Mean
+# BLACK OR AFRICAN AMERICAN ContentRow RACE, BLACK OR AFRICAN AMERICAN, @content, BLACK OR AFRICAN AMERICAN
+# A ContentRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, A, @content, A
+# Mean DataRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, A, AGE, Mean
+# B ContentRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, B, @content, B
+# Mean DataRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, B, AGE, Mean
+# C ContentRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, C, @content, C
+# Mean DataRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, C, AGE, Mean
+Pruning is similar in outcome to trimming, but more powerful and more +complex, as it takes structure into account.
+Pruning is applied recursively, in that at each structural unit (subtable, row) it applies the pruning function both at that level and to all its children (up to a user-specifiable maximum depth).
+The default pruning function, for example, determines if a subtree is empty by checking whether:
+all of its rows contain only NAs, or
+all of its rows contain only 0s
+pruned <- prune_table(coltrimmed)
+pruned
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 44 (62.9%) 35 (68.6%) 37 (66.1%) 31 (62.0%) 40 (65.6%) 44 (64.7%)
+# A 15 (21.4%) 12 (23.5%) 14 (25.0%) 6 (12.0%) 15 (24.6%) 16 (23.5%)
+# Mean 30.40 34.42 35.43 30.33 37.40 36.25
+# B 16 (22.9%) 8 (15.7%) 13 (23.2%) 16 (32.0%) 10 (16.4%) 12 (17.6%)
+# Mean 33.75 34.88 32.46 30.94 33.30 35.92
+# C 13 (18.6%) 15 (29.4%) 10 (17.9%) 9 (18.0%) 15 (24.6%) 16 (23.5%)
+# Mean 36.92 35.60 34.00 31.89 33.47 31.38
+# BLACK OR AFRICAN AMERICAN 18 (25.7%) 10 (19.6%) 12 (21.4%) 12 (24.0%) 13 (21.3%) 14 (20.6%)
+# A 5 (7.1%) 1 (2.0%) 5 (8.9%) 2 (4.0%) 4 (6.6%) 4 (5.9%)
+# Mean 31.20 33.00 28.00 30.00 30.75 36.50
+# B 7 (10.0%) 3 (5.9%) 3 (5.4%) 3 (6.0%) 6 (9.8%) 6 (8.8%)
+# Mean 36.14 34.33 29.67 32.00 36.33 31.00
+# C 6 (8.6%) 6 (11.8%) 4 (7.1%) 7 (14.0%) 3 (4.9%) 4 (5.9%)
+# Mean 31.33 39.67 34.50 34.00 33.00 36.50
+# WHITE 8 (11.4%) 6 (11.8%) 7 (12.5%) 7 (14.0%) 8 (13.1%) 10 (14.7%)
+# A 2 (2.9%) 1 (2.0%) 3 (5.4%) 3 (6.0%) 1 (1.6%) 5 (7.4%)
+# Mean 34.00 45.00 29.33 33.33 35.00 32.80
+# B 4 (5.7%) 3 (5.9%) 1 (1.8%) 4 (8.0%) 3 (4.9%) 1 (1.5%)
+# Mean 37.00 43.67 48.00 36.75 34.33 36.00
+# C 2 (2.9%) 2 (3.9%) 3 (5.4%) 0 (0.0%) 4 (6.6%) 4 (5.9%)
+# Mean 35.50 44.00 44.67 NA 38.50 35.00
+We can also use the low_obs_pruner() pruning function constructor to create a pruning function which removes subtrees whose content summaries’ first entry per column, summed or averaged across columns, falls below a specified number. (In the default summaries, the first entry per column is the count.)
+pruned2 <- prune_table(coltrimmed, low_obs_pruner(10, "mean"))
+pruned2
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ——————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 44 (62.9%) 35 (68.6%) 37 (66.1%) 31 (62.0%) 40 (65.6%) 44 (64.7%)
+# A 15 (21.4%) 12 (23.5%) 14 (25.0%) 6 (12.0%) 15 (24.6%) 16 (23.5%)
+# Mean 30.40 34.42 35.43 30.33 37.40 36.25
+# B 16 (22.9%) 8 (15.7%) 13 (23.2%) 16 (32.0%) 10 (16.4%) 12 (17.6%)
+# Mean 33.75 34.88 32.46 30.94 33.30 35.92
+# C 13 (18.6%) 15 (29.4%) 10 (17.9%) 9 (18.0%) 15 (24.6%) 16 (23.5%)
+# Mean 36.92 35.60 34.00 31.89 33.47 31.38
+Note that because the pruning is applied recursively, only the ASIAN subtree remains: even though the full BLACK OR AFRICAN AMERICAN subtree encompassed enough observations, the strata within it did not. We can take care of this by setting the stop_depth for pruning to 1.
+pruned3 <- prune_table(coltrimmed, low_obs_pruner(10, "sum"), stop_depth = 1)
+pruned3
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 44 (62.9%) 35 (68.6%) 37 (66.1%) 31 (62.0%) 40 (65.6%) 44 (64.7%)
+# A 15 (21.4%) 12 (23.5%) 14 (25.0%) 6 (12.0%) 15 (24.6%) 16 (23.5%)
+# Mean 30.40 34.42 35.43 30.33 37.40 36.25
+# B 16 (22.9%) 8 (15.7%) 13 (23.2%) 16 (32.0%) 10 (16.4%) 12 (17.6%)
+# Mean 33.75 34.88 32.46 30.94 33.30 35.92
+# C 13 (18.6%) 15 (29.4%) 10 (17.9%) 9 (18.0%) 15 (24.6%) 16 (23.5%)
+# Mean 36.92 35.60 34.00 31.89 33.47 31.38
+# BLACK OR AFRICAN AMERICAN 18 (25.7%) 10 (19.6%) 12 (21.4%) 12 (24.0%) 13 (21.3%) 14 (20.6%)
+# A 5 (7.1%) 1 (2.0%) 5 (8.9%) 2 (4.0%) 4 (6.6%) 4 (5.9%)
+# Mean 31.20 33.00 28.00 30.00 30.75 36.50
+# B 7 (10.0%) 3 (5.9%) 3 (5.4%) 3 (6.0%) 6 (9.8%) 6 (8.8%)
+# Mean 36.14 34.33 29.67 32.00 36.33 31.00
+# C 6 (8.6%) 6 (11.8%) 4 (7.1%) 7 (14.0%) 3 (4.9%) 4 (5.9%)
+# Mean 31.33 39.67 34.50 34.00 33.00 36.50
+# WHITE 8 (11.4%) 6 (11.8%) 7 (12.5%) 7 (14.0%) 8 (13.1%) 10 (14.7%)
+# A 2 (2.9%) 1 (2.0%) 3 (5.4%) 3 (6.0%) 1 (1.6%) 5 (7.4%)
+# Mean 34.00 45.00 29.33 33.33 35.00 32.80
+# B 4 (5.7%) 3 (5.9%) 1 (1.8%) 4 (8.0%) 3 (4.9%) 1 (1.5%)
+# Mean 37.00 43.67 48.00 36.75 34.33 36.00
+# C 2 (2.9%) 2 (3.9%) 3 (5.4%) 0 (0.0%) 4 (6.6%) 4 (5.9%)
+# Mean 35.50 44.00 44.67 NA 38.50 35.00
+We can also see that pruning with a threshold of 16 total observations and no stop_depth removes some but not all of the strata from our third race (WHITE).
+pruned4 <- prune_table(coltrimmed, low_obs_pruner(16, "sum"))
+pruned4
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 44 (62.9%) 35 (68.6%) 37 (66.1%) 31 (62.0%) 40 (65.6%) 44 (64.7%)
+# A 15 (21.4%) 12 (23.5%) 14 (25.0%) 6 (12.0%) 15 (24.6%) 16 (23.5%)
+# Mean 30.40 34.42 35.43 30.33 37.40 36.25
+# B 16 (22.9%) 8 (15.7%) 13 (23.2%) 16 (32.0%) 10 (16.4%) 12 (17.6%)
+# Mean 33.75 34.88 32.46 30.94 33.30 35.92
+# C 13 (18.6%) 15 (29.4%) 10 (17.9%) 9 (18.0%) 15 (24.6%) 16 (23.5%)
+# Mean 36.92 35.60 34.00 31.89 33.47 31.38
+# BLACK OR AFRICAN AMERICAN 18 (25.7%) 10 (19.6%) 12 (21.4%) 12 (24.0%) 13 (21.3%) 14 (20.6%)
+# A 5 (7.1%) 1 (2.0%) 5 (8.9%) 2 (4.0%) 4 (6.6%) 4 (5.9%)
+# Mean 31.20 33.00 28.00 30.00 30.75 36.50
+# B 7 (10.0%) 3 (5.9%) 3 (5.4%) 3 (6.0%) 6 (9.8%) 6 (8.8%)
+# Mean 36.14 34.33 29.67 32.00 36.33 31.00
+# C 6 (8.6%) 6 (11.8%) 4 (7.1%) 7 (14.0%) 3 (4.9%) 4 (5.9%)
+# Mean 31.33 39.67 34.50 34.00 33.00 36.50
+# WHITE 8 (11.4%) 6 (11.8%) 7 (12.5%) 7 (14.0%) 8 (13.1%) 10 (14.7%)
+# B 4 (5.7%) 3 (5.9%) 1 (1.8%) 4 (8.0%) 3 (4.9%) 1 (1.5%)
+# Mean 37.00 43.67 48.00 36.75 34.33 36.00
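Beyond the provided constructors, prune_table() accepts any pruning function we write ourselves. The following is a minimal sketch (the function name is ours, not from rtables): it prunes any subtable whose content-row counts are below 5 in every column.

```r
# Hypothetical custom pruning function: returns TRUE (prune) when every
# column count in the subtable's first content row is below 5.
# Assumes rtables is loaded and `coltrimmed` is the table from above.
small_everywhere <- function(tt) {
  if (!is(tt, "TableTree")) {
    return(FALSE)  # only consider subtables, keep individual rows
  }
  ctab <- content_table(tt)
  if (NROW(ctab) == 0) {
    return(FALSE)  # keep nodes that have no content table
  }
  # first entry per column of the first content row is the count
  counts <- sapply(row_values(tree_children(ctab)[[1]]), function(cv) cv[1])
  all(counts < 5)
}

prune_table(coltrimmed, prune_func = small_everywhere)
```

As with the built-in functions, this is applied recursively, so it removes both whole race subtables and individual strata that match the condition.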
+Sorting of an rtables table is done at a path, meaning a sort operation will occur at a particular location within the table, and the direct children of the element at that path will be reordered. This occurs whether those children are subtables themselves or individual rows. Sorting is done via the sort_at_path() function, which accepts both a (row) path and a scoring function.
+A score function accepts a subtree or TableRow and returns a single orderable (typically numeric) value. Within the subtable currently being sorted, the children are then reordered by the value of the score function. Importantly, “content” (ContentRow) and “value” (DataRow) rows must be treated differently in the scoring function, as they are retrieved differently: the content of a subtable is retrieved via the content_table() accessor.
+The cont_n_allcols() scoring function provided by rtables works by scoring subtables by the sum of the first elements in the first row of the subtable’s content table. Note that this function fails if the child being scored does not have a content table (i.e., if summarize_row_groups() was not used at the corresponding point in the layout). We can see this in its definition, below:
+cont_n_allcols
# function (tt)
+# {
+# ctab <- content_table(tt)
+# if (NROW(ctab) == 0) {
+# stop("cont_n_allcols score function used at subtable [",
+# obj_name(tt), "] that has no content table.")
+# }
+# sum(sapply(row_values(tree_children(ctab)[[1]]), function(cv) cv[1]))
+# }
+# <bytecode: 0x55b1d6d50778>
+# <environment: namespace:rtables>
+Therefore, a fundamental difference between pruning and sorting is +that sorting occurs at particular places in the table, as defined by a +path.
+For example, we can sort the strata values (ContentRow
)
+by observation counts within just the ASIAN
subtable:
+sort_at_path(pruned, path = c("RACE", "ASIAN", "STRATA1"), scorefun = cont_n_allcols)
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 44 (62.9%) 35 (68.6%) 37 (66.1%) 31 (62.0%) 40 (65.6%) 44 (64.7%)
+# A 15 (21.4%) 12 (23.5%) 14 (25.0%) 6 (12.0%) 15 (24.6%) 16 (23.5%)
+# Mean 30.40 34.42 35.43 30.33 37.40 36.25
+# C 13 (18.6%) 15 (29.4%) 10 (17.9%) 9 (18.0%) 15 (24.6%) 16 (23.5%)
+# Mean 36.92 35.60 34.00 31.89 33.47 31.38
+# B 16 (22.9%) 8 (15.7%) 13 (23.2%) 16 (32.0%) 10 (16.4%) 12 (17.6%)
+# Mean 33.75 34.88 32.46 30.94 33.30 35.92
+# BLACK OR AFRICAN AMERICAN 18 (25.7%) 10 (19.6%) 12 (21.4%) 12 (24.0%) 13 (21.3%) 14 (20.6%)
+# A 5 (7.1%) 1 (2.0%) 5 (8.9%) 2 (4.0%) 4 (6.6%) 4 (5.9%)
+# Mean 31.20 33.00 28.00 30.00 30.75 36.50
+# B 7 (10.0%) 3 (5.9%) 3 (5.4%) 3 (6.0%) 6 (9.8%) 6 (8.8%)
+# Mean 36.14 34.33 29.67 32.00 36.33 31.00
+# C 6 (8.6%) 6 (11.8%) 4 (7.1%) 7 (14.0%) 3 (4.9%) 4 (5.9%)
+# Mean 31.33 39.67 34.50 34.00 33.00 36.50
+# WHITE 8 (11.4%) 6 (11.8%) 7 (12.5%) 7 (14.0%) 8 (13.1%) 10 (14.7%)
+# A 2 (2.9%) 1 (2.0%) 3 (5.4%) 3 (6.0%) 1 (1.6%) 5 (7.4%)
+# Mean 34.00 45.00 29.33 33.33 35.00 32.80
+# B 4 (5.7%) 3 (5.9%) 1 (1.8%) 4 (8.0%) 3 (4.9%) 1 (1.5%)
+# Mean 37.00 43.67 48.00 36.75 34.33 36.00
+# C 2 (2.9%) 2 (3.9%) 3 (5.4%) 0 (0.0%) 4 (6.6%) 4 (5.9%)
+# Mean 35.50 44.00 44.67 NA 38.50 35.00
+
+# B and C are swapped, as the global count (summed across all columns) of stratum C is higher than that of stratum B
+Unlike other uses of pathing (currently), a sorting path can contain “*”. This indicates that the children of each subtable matching the * element of the path should be sorted separately, as indicated by the remainder of the path after the * and the score function.
Thus we can extend our sorting of strata within the
+ASIAN
subtable to all race-specific subtables by using the
+wildcard:
+sort_at_path(pruned, path = c("RACE", "*", "STRATA1"), scorefun = cont_n_allcols)
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 44 (62.9%) 35 (68.6%) 37 (66.1%) 31 (62.0%) 40 (65.6%) 44 (64.7%)
+# A 15 (21.4%) 12 (23.5%) 14 (25.0%) 6 (12.0%) 15 (24.6%) 16 (23.5%)
+# Mean 30.40 34.42 35.43 30.33 37.40 36.25
+# C 13 (18.6%) 15 (29.4%) 10 (17.9%) 9 (18.0%) 15 (24.6%) 16 (23.5%)
+# Mean 36.92 35.60 34.00 31.89 33.47 31.38
+# B 16 (22.9%) 8 (15.7%) 13 (23.2%) 16 (32.0%) 10 (16.4%) 12 (17.6%)
+# Mean 33.75 34.88 32.46 30.94 33.30 35.92
+# BLACK OR AFRICAN AMERICAN 18 (25.7%) 10 (19.6%) 12 (21.4%) 12 (24.0%) 13 (21.3%) 14 (20.6%)
+# C 6 (8.6%) 6 (11.8%) 4 (7.1%) 7 (14.0%) 3 (4.9%) 4 (5.9%)
+# Mean 31.33 39.67 34.50 34.00 33.00 36.50
+# B 7 (10.0%) 3 (5.9%) 3 (5.4%) 3 (6.0%) 6 (9.8%) 6 (8.8%)
+# Mean 36.14 34.33 29.67 32.00 36.33 31.00
+# A 5 (7.1%) 1 (2.0%) 5 (8.9%) 2 (4.0%) 4 (6.6%) 4 (5.9%)
+# Mean 31.20 33.00 28.00 30.00 30.75 36.50
+# WHITE 8 (11.4%) 6 (11.8%) 7 (12.5%) 7 (14.0%) 8 (13.1%) 10 (14.7%)
+# B 4 (5.7%) 3 (5.9%) 1 (1.8%) 4 (8.0%) 3 (4.9%) 1 (1.5%)
+# Mean 37.00 43.67 48.00 36.75 34.33 36.00
+# A 2 (2.9%) 1 (2.0%) 3 (5.4%) 3 (6.0%) 1 (1.6%) 5 (7.4%)
+# Mean 34.00 45.00 29.33 33.33 35.00 32.80
+# C 2 (2.9%) 2 (3.9%) 3 (5.4%) 0 (0.0%) 4 (6.6%) 4 (5.9%)
+# Mean 35.50 44.00 44.67 NA 38.50 35.00
+
+# All subtables, i.e. ASIAN, BLACK..., and WHITE, are reordered separately
The above is equivalent to separately calling the following:
+
+tmptbl <- sort_at_path(pruned, path = c("RACE", "ASIAN", "STRATA1"), scorefun = cont_n_allcols)
+tmptbl <- sort_at_path(tmptbl, path = c("RACE", "BLACK OR AFRICAN AMERICAN", "STRATA1"), scorefun = cont_n_allcols)
+tmptbl <- sort_at_path(tmptbl, path = c("RACE", "WHITE", "STRATA1"), scorefun = cont_n_allcols)
+tmptbl
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 44 (62.9%) 35 (68.6%) 37 (66.1%) 31 (62.0%) 40 (65.6%) 44 (64.7%)
+# A 15 (21.4%) 12 (23.5%) 14 (25.0%) 6 (12.0%) 15 (24.6%) 16 (23.5%)
+# Mean 30.40 34.42 35.43 30.33 37.40 36.25
+# C 13 (18.6%) 15 (29.4%) 10 (17.9%) 9 (18.0%) 15 (24.6%) 16 (23.5%)
+# Mean 36.92 35.60 34.00 31.89 33.47 31.38
+# B 16 (22.9%) 8 (15.7%) 13 (23.2%) 16 (32.0%) 10 (16.4%) 12 (17.6%)
+# Mean 33.75 34.88 32.46 30.94 33.30 35.92
+# BLACK OR AFRICAN AMERICAN 18 (25.7%) 10 (19.6%) 12 (21.4%) 12 (24.0%) 13 (21.3%) 14 (20.6%)
+# C 6 (8.6%) 6 (11.8%) 4 (7.1%) 7 (14.0%) 3 (4.9%) 4 (5.9%)
+# Mean 31.33 39.67 34.50 34.00 33.00 36.50
+# B 7 (10.0%) 3 (5.9%) 3 (5.4%) 3 (6.0%) 6 (9.8%) 6 (8.8%)
+# Mean 36.14 34.33 29.67 32.00 36.33 31.00
+# A 5 (7.1%) 1 (2.0%) 5 (8.9%) 2 (4.0%) 4 (6.6%) 4 (5.9%)
+# Mean 31.20 33.00 28.00 30.00 30.75 36.50
+# WHITE 8 (11.4%) 6 (11.8%) 7 (12.5%) 7 (14.0%) 8 (13.1%) 10 (14.7%)
+# B 4 (5.7%) 3 (5.9%) 1 (1.8%) 4 (8.0%) 3 (4.9%) 1 (1.5%)
+# Mean 37.00 43.67 48.00 36.75 34.33 36.00
+# A 2 (2.9%) 1 (2.0%) 3 (5.4%) 3 (6.0%) 1 (1.6%) 5 (7.4%)
+# Mean 34.00 45.00 29.33 33.33 35.00 32.80
+# C 2 (2.9%) 2 (3.9%) 3 (5.4%) 0 (0.0%) 4 (6.6%) 4 (5.9%)
+# Mean 35.50 44.00 44.67 NA 38.50 35.00
+Pathing can be better understood with table_structure(), which highlights the tree-like structure and the node names:
+table_structure(pruned)
# [TableTree] RACE
+# [TableTree] ASIAN [cont: 1 x 6]
+# [TableTree] STRATA1
+# [TableTree] A [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+# [TableTree] B [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+# [TableTree] C [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+# [TableTree] BLACK OR AFRICAN AMERICAN [cont: 1 x 6]
+# [TableTree] STRATA1
+# [TableTree] A [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+# [TableTree] B [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+# [TableTree] C [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+# [TableTree] WHITE [cont: 1 x 6]
+# [TableTree] STRATA1
+# [TableTree] A [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+# [TableTree] B [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+# [TableTree] C [cont: 1 x 6]
+# [ElementaryTable] AGE (1 x 6)
+or with row_paths_summary
:
+row_paths_summary(pruned)
# rowname node_class path
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN ContentRow RACE, ASIAN, @content, ASIAN
+# A ContentRow RACE, ASIAN, STRATA1, A, @content, A
+# Mean DataRow RACE, ASIAN, STRATA1, A, AGE, Mean
+# B ContentRow RACE, ASIAN, STRATA1, B, @content, B
+# Mean DataRow RACE, ASIAN, STRATA1, B, AGE, Mean
+# C ContentRow RACE, ASIAN, STRATA1, C, @content, C
+# Mean DataRow RACE, ASIAN, STRATA1, C, AGE, Mean
+# BLACK OR AFRICAN AMERICAN ContentRow RACE, BLACK OR AFRICAN AMERICAN, @content, BLACK OR AFRICAN AMERICAN
+# A ContentRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, A, @content, A
+# Mean DataRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, A, AGE, Mean
+# B ContentRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, B, @content, B
+# Mean DataRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, B, AGE, Mean
+# C ContentRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, C, @content, C
+# Mean DataRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, C, AGE, Mean
+# WHITE ContentRow RACE, WHITE, @content, WHITE
+# A ContentRow RACE, WHITE, STRATA1, A, @content, A
+# Mean DataRow RACE, WHITE, STRATA1, A, AGE, Mean
+# B ContentRow RACE, WHITE, STRATA1, B, @content, B
+# Mean DataRow RACE, WHITE, STRATA1, B, AGE, Mean
+# C ContentRow RACE, WHITE, STRATA1, C, @content, C
+# Mean DataRow RACE, WHITE, STRATA1, C, AGE, Mean
+Note in the latter that content rows are those whose paths contain @content, e.g., ASIAN, @content, ASIAN. The first of these at a given path (i.e., <path>, @content, <>) are the rows which will be used by the scoring functions whose names begin with cont_.
+We can directly sort the ethnicities by observation count in increasing order:
+
+ethsort <- sort_at_path(pruned, path = c("RACE"), scorefun = cont_n_allcols, decreasing = FALSE)
+ethsort
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————
+# WHITE 8 (11.4%) 6 (11.8%) 7 (12.5%) 7 (14.0%) 8 (13.1%) 10 (14.7%)
+# A 2 (2.9%) 1 (2.0%) 3 (5.4%) 3 (6.0%) 1 (1.6%) 5 (7.4%)
+# Mean 34.00 45.00 29.33 33.33 35.00 32.80
+# B 4 (5.7%) 3 (5.9%) 1 (1.8%) 4 (8.0%) 3 (4.9%) 1 (1.5%)
+# Mean 37.00 43.67 48.00 36.75 34.33 36.00
+# C 2 (2.9%) 2 (3.9%) 3 (5.4%) 0 (0.0%) 4 (6.6%) 4 (5.9%)
+# Mean 35.50 44.00 44.67 NA 38.50 35.00
+# BLACK OR AFRICAN AMERICAN 18 (25.7%) 10 (19.6%) 12 (21.4%) 12 (24.0%) 13 (21.3%) 14 (20.6%)
+# A 5 (7.1%) 1 (2.0%) 5 (8.9%) 2 (4.0%) 4 (6.6%) 4 (5.9%)
+# Mean 31.20 33.00 28.00 30.00 30.75 36.50
+# B 7 (10.0%) 3 (5.9%) 3 (5.4%) 3 (6.0%) 6 (9.8%) 6 (8.8%)
+# Mean 36.14 34.33 29.67 32.00 36.33 31.00
+# C 6 (8.6%) 6 (11.8%) 4 (7.1%) 7 (14.0%) 3 (4.9%) 4 (5.9%)
+# Mean 31.33 39.67 34.50 34.00 33.00 36.50
+# ASIAN 44 (62.9%) 35 (68.6%) 37 (66.1%) 31 (62.0%) 40 (65.6%) 44 (64.7%)
+# A 15 (21.4%) 12 (23.5%) 14 (25.0%) 6 (12.0%) 15 (24.6%) 16 (23.5%)
+# Mean 30.40 34.42 35.43 30.33 37.40 36.25
+# B 16 (22.9%) 8 (15.7%) 13 (23.2%) 16 (32.0%) 10 (16.4%) 12 (17.6%)
+# Mean 33.75 34.88 32.46 30.94 33.30 35.92
+# C 13 (18.6%) 15 (29.4%) 10 (17.9%) 9 (18.0%) 15 (24.6%) 16 (23.5%)
+# Mean 36.92 35.60 34.00 31.89 33.47 31.38
+Within each ethnicity separately, sort the strata by number of
+females in arm C (i.e. column position 5
):
+sort_at_path(pruned, path = c("RACE", "*", "STRATA1"), cont_n_onecol(5))
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 44 (62.9%) 35 (68.6%) 37 (66.1%) 31 (62.0%) 40 (65.6%) 44 (64.7%)
+# A 15 (21.4%) 12 (23.5%) 14 (25.0%) 6 (12.0%) 15 (24.6%) 16 (23.5%)
+# Mean 30.40 34.42 35.43 30.33 37.40 36.25
+# C 13 (18.6%) 15 (29.4%) 10 (17.9%) 9 (18.0%) 15 (24.6%) 16 (23.5%)
+# Mean 36.92 35.60 34.00 31.89 33.47 31.38
+# B 16 (22.9%) 8 (15.7%) 13 (23.2%) 16 (32.0%) 10 (16.4%) 12 (17.6%)
+# Mean 33.75 34.88 32.46 30.94 33.30 35.92
+# BLACK OR AFRICAN AMERICAN 18 (25.7%) 10 (19.6%) 12 (21.4%) 12 (24.0%) 13 (21.3%) 14 (20.6%)
+# B 7 (10.0%) 3 (5.9%) 3 (5.4%) 3 (6.0%) 6 (9.8%) 6 (8.8%)
+# Mean 36.14 34.33 29.67 32.00 36.33 31.00
+# A 5 (7.1%) 1 (2.0%) 5 (8.9%) 2 (4.0%) 4 (6.6%) 4 (5.9%)
+# Mean 31.20 33.00 28.00 30.00 30.75 36.50
+# C 6 (8.6%) 6 (11.8%) 4 (7.1%) 7 (14.0%) 3 (4.9%) 4 (5.9%)
+# Mean 31.33 39.67 34.50 34.00 33.00 36.50
+# WHITE 8 (11.4%) 6 (11.8%) 7 (12.5%) 7 (14.0%) 8 (13.1%) 10 (14.7%)
+# C 2 (2.9%) 2 (3.9%) 3 (5.4%) 0 (0.0%) 4 (6.6%) 4 (5.9%)
+# Mean 35.50 44.00 44.67 NA 38.50 35.00
+# B 4 (5.7%) 3 (5.9%) 1 (1.8%) 4 (8.0%) 3 (4.9%) 1 (1.5%)
+# Mean 37.00 43.67 48.00 36.75 34.33 36.00
+# A 2 (2.9%) 1 (2.0%) 3 (5.4%) 3 (6.0%) 1 (1.6%) 5 (7.4%)
+# Mean 34.00 45.00 29.33 33.33 35.00 32.80
+When sorting within an analysis subtable (e.g., the subtable +generated when your analysis function generates more than one row per +group of data), the name of that subtable (generally the name of the +variable being analyzed) must appear in the path, even if +the variable label is not displayed when the table is +printed.
+To show the differences between sorting an analysis subtable (DataRow) and a content subtable (ContentRow), we build and prune (as before) a similar raw table:
+more_analysis_fnc <- function(x) {
+ in_rows(
+ "median" = median(x),
+ "mean" = mean(x),
+ .formats = "xx.x"
+ )
+}
+
+raw_lyt <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by(
+ "RACE",
+ split_fun = drop_and_remove_levels("WHITE") # dropping WHITE levels
+ ) %>%
+ summarize_row_groups() %>%
+ split_rows_by("STRATA1") %>%
+ summarize_row_groups() %>%
+ analyze("AGE", afun = more_analysis_fnc)
+
+tbl <- build_table(raw_lyt, DM) %>%
+ prune_table() %>%
+ print()
# A: Drug X B: Placebo C: Combination
+# ————————————————————————————————————————————————————————————————————
+# ASIAN 79 (65.3%) 68 (64.2%) 84 (65.1%)
+# A 27 (22.3%) 20 (18.9%) 31 (24.0%)
+# median 30.0 33.0 36.0
+# mean 32.2 33.9 36.8
+# B 24 (19.8%) 29 (27.4%) 22 (17.1%)
+# median 32.5 32.0 34.0
+# mean 34.1 31.6 34.7
+# C 28 (23.1%) 19 (17.9%) 31 (24.0%)
+# median 36.5 34.0 33.0
+# mean 36.2 33.0 32.4
+# BLACK OR AFRICAN AMERICAN 28 (23.1%) 24 (22.6%) 27 (20.9%)
+# A 6 (5.0%) 7 (6.6%) 8 (6.2%)
+# median 32.0 29.0 32.5
+# mean 31.5 28.6 33.6
+# B 10 (8.3%) 6 (5.7%) 12 (9.3%)
+# median 33.0 30.0 33.5
+# mean 35.6 30.8 33.7
+# C 12 (9.9%) 11 (10.4%) 7 (5.4%)
+# median 33.0 36.0 32.0
+# mean 35.5 34.2 35.0
+What should we do now if we want to sort the median and mean rows within each of the strata? We need to write a custom score function, as the ready-made ones currently work only with content nodes (via the content_table() accessor used by cont_n_allcols() and cont_n_onecol(), of which we will talk in a moment). But before that, we need to think about what we are ordering, i.e. we need to specify the right path. We suggest looking at the structure first with table_structure() or row_paths_summary().
+table_structure(tbl) # Direct inspection into the tree-like structure of rtables
# [TableTree] RACE
+# [TableTree] ASIAN [cont: 1 x 3]
+# [TableTree] STRATA1
+# [TableTree] A [cont: 1 x 3]
+# [ElementaryTable] AGE (2 x 3)
+# [TableTree] B [cont: 1 x 3]
+# [ElementaryTable] AGE (2 x 3)
+# [TableTree] C [cont: 1 x 3]
+# [ElementaryTable] AGE (2 x 3)
+# [TableTree] BLACK OR AFRICAN AMERICAN [cont: 1 x 3]
+# [TableTree] STRATA1
+# [TableTree] A [cont: 1 x 3]
+# [ElementaryTable] AGE (2 x 3)
+# [TableTree] B [cont: 1 x 3]
+# [ElementaryTable] AGE (2 x 3)
+# [TableTree] C [cont: 1 x 3]
+# [ElementaryTable] AGE (2 x 3)
+We see that to order all of the AGE nodes we need a path like RACE, ASIAN, STRATA1, A, AGE and no deeper, as the next level down is exactly what we need to sort. But this path would sort only the first group. We need wildcards: RACE, *, STRATA1, *, AGE.
+Now that we have found a way to select the relevant paths, we want to construct a scoring function that works on the median and mean rows and sorts them. To do so, we may want to step into our scoring function with browser() to see what is fed to it, and work out how to retrieve the single value that must be returned for sorting. We leave that experimentation to the user; here we show a possible solution, which sums all the column values retrieved with row_values(tt) from the subtable fed to the function. Note that any score function should be defined as taking a single subtable tt as input and returning a single numeric value as output.
+scorefun <- function(tt) {
+ # Here we could use browser()
+ sum(unlist(row_values(tt)))
+}
+sort_at_path(tbl, c("RACE", "*", "STRATA1", "*", "AGE"), scorefun)
# A: Drug X B: Placebo C: Combination
+# ————————————————————————————————————————————————————————————————————
+# ASIAN 79 (65.3%) 68 (64.2%) 84 (65.1%)
+# A 27 (22.3%) 20 (18.9%) 31 (24.0%)
+# mean 32.2 33.9 36.8
+# median 30.0 33.0 36.0
+# B 24 (19.8%) 29 (27.4%) 22 (17.1%)
+# mean 34.1 31.6 34.7
+# median 32.5 32.0 34.0
+# C 28 (23.1%) 19 (17.9%) 31 (24.0%)
+# median 36.5 34.0 33.0
+# mean 36.2 33.0 32.4
+# BLACK OR AFRICAN AMERICAN 28 (23.1%) 24 (22.6%) 27 (20.9%)
+# A 6 (5.0%) 7 (6.6%) 8 (6.2%)
+# mean 31.5 28.6 33.6
+# median 32.0 29.0 32.5
+# B 10 (8.3%) 6 (5.7%) 12 (9.3%)
+# mean 35.6 30.8 33.7
+# median 33.0 30.0 33.5
+# C 12 (9.9%) 11 (10.4%) 7 (5.4%)
+# mean 35.5 34.2 35.0
+# median 33.0 36.0 32.0
+To help the user visualize what is happening inside the score function, we show an example of exploring it in the debugger:
+> sort_at_path(tbl, c("RACE", "*", "STRATA1", "*", "AGE"), scorefun)
+Called from: scorefun(x)
+Browse[1]> tt ### THIS IS THE LEAF LEVEL -> DataRow ###
+[DataRow indent_mod 0]: median 30.0 33.0 36.0
+Browse[1]> row_values(tt) ### Extraction of values -> It will be a named list! ###
+$`A: Drug X`
+[1] 30
+
+$`B: Placebo`
+[1] 33
+
+$`C: Combination`
+[1] 36
+
+Browse[1]> sum(unlist(row_values(tt))) ### Final value we want to give back to sort_at_path ###
+[1] 99
+We can see how powerful and pragmatic it can be to change the sorting principle from within a custom scoring function. We show this by selecting a specific column to sort by. Looking at the pre-defined function cont_n_onecol() gives us an insight into how to proceed.
+cont_n_onecol
# function (j)
+# {
+# function(tt) {
+# ctab <- content_table(tt)
+# if (NROW(ctab) == 0) {
+# stop("cont_n_allcols score function used at subtable [",
+# obj_name(tt), "] that has no content table.")
+# }
+# row_values(tree_children(ctab)[[1]])[[j]][1]
+# }
+# }
+# <bytecode: 0x55b1d6e545f8>
+# <environment: namespace:rtables>
+We see that a function similar to cont_n_allcols() is wrapped by one that accepts a parameter j used to select a specific column. We will do the same here to select which column we want to sort by.
+scorefun_onecol <- function(colpath) {
+ function(tt) {
+ # Here we could use browser()
+ unlist(cell_values(tt, colpath = colpath), use.names = FALSE)[1] # Modified to lose the list names
+ }
+}
+sort_at_path(tbl, c("RACE", "*", "STRATA1", "*", "AGE"), scorefun_onecol(colpath = c("ARM", "A: Drug X")))
# A: Drug X B: Placebo C: Combination
+# ————————————————————————————————————————————————————————————————————
+# ASIAN 79 (65.3%) 68 (64.2%) 84 (65.1%)
+# A 27 (22.3%) 20 (18.9%) 31 (24.0%)
+# mean 32.2 33.9 36.8
+# median 30.0 33.0 36.0
+# B 24 (19.8%) 29 (27.4%) 22 (17.1%)
+# mean 34.1 31.6 34.7
+# median 32.5 32.0 34.0
+# C 28 (23.1%) 19 (17.9%) 31 (24.0%)
+# median 36.5 34.0 33.0
+# mean 36.2 33.0 32.4
+# BLACK OR AFRICAN AMERICAN 28 (23.1%) 24 (22.6%) 27 (20.9%)
+# A 6 (5.0%) 7 (6.6%) 8 (6.2%)
+# median 32.0 29.0 32.5
+# mean 31.5 28.6 33.6
+# B 10 (8.3%) 6 (5.7%) 12 (9.3%)
+# mean 35.6 30.8 33.7
+# median 33.0 30.0 33.5
+# C 12 (9.9%) 11 (10.4%) 7 (5.4%)
+# mean 35.5 34.2 35.0
+# median 33.0 36.0 32.0
+In the above table we see that the mean and median rows are reordered +by their values in the first column, compared to the raw table, as +desired.
+With this function we can also do the same for columns that are +nested within larger splits:
+
+# Simpler table
+tbl <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_cols_by("SEX",
+ split_fun = drop_and_remove_levels(c("U", "UNDIFFERENTIATED"))
+ ) %>%
+ analyze("AGE", afun = more_analysis_fnc) %>%
+ build_table(DM) %>%
+ prune_table() %>%
+ print()
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# —————————————————————————————————————————————————————————
+# median 32.0 35.0 33.0 31.0 35.0 32.0
+# mean 33.7 36.5 33.8 32.1 34.9 34.3
+
+sort_at_path(tbl, c("AGE"), scorefun_onecol(colpath = c("ARM", "B: Placebo", "SEX", "F")))
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# —————————————————————————————————————————————————————————
+# mean 33.7 36.5 33.8 32.1 34.9 34.3
+# median 32.0 35.0 33.0 31.0 35.0 32.0
+Pruning criteria and scoring functions map TableTree or TableRow objects to a Boolean value (for pruning criteria) or a sortable scalar value (for scoring functions). To do this we currently need to interact with the structure of the objects more than usual. Indeed, we have already shown how sorting can become very complicated if the concepts of tree-like structure and pathing are not well understood. It is important, though, to keep in mind the following functions, which can be used in any pruning or scoring function to retrieve the relevant information from the table.
+- cell_values() - Retrieves a named list of a TableRow or TableTree object’s values. It accepts rowpath and colpath to restrict which cell values are returned.
+- obj_name() - Retrieves the name of an object. Note this can differ from the label that is displayed (if any is) when printing. This will match the element in the path.
+- obj_label() - Retrieves the display label of an object. Note this can differ from the name that appears in the path.
+- content_table() - Retrieves a TableTree object’s content table (which contains its summary rows).
+- tree_children() - Retrieves a TableTree object’s direct children (either subtables, rows, or possibly a mix thereof, though that should not happen in practice).
+In this case, for convenience/simplicity, we use the name of the table element, but any logic which returns a single string could be used here.
+We sort the ethnicities into alphabetical order (in practice undoing our previous sorting by ethnicity above).
+
+silly_name_scorer <- function(tt) {
+ nm <- obj_name(tt)
+ print(nm)
+ nm
+}
+
+sort_at_path(ethsort, "RACE", silly_name_scorer) # Now, it is sorted alphabetically!
# [1] "WHITE"
+# [1] "BLACK OR AFRICAN AMERICAN"
+# [1] "ASIAN"
+# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 44 (62.9%) 35 (68.6%) 37 (66.1%) 31 (62.0%) 40 (65.6%) 44 (64.7%)
+# A 15 (21.4%) 12 (23.5%) 14 (25.0%) 6 (12.0%) 15 (24.6%) 16 (23.5%)
+# Mean 30.40 34.42 35.43 30.33 37.40 36.25
+# B 16 (22.9%) 8 (15.7%) 13 (23.2%) 16 (32.0%) 10 (16.4%) 12 (17.6%)
+# Mean 33.75 34.88 32.46 30.94 33.30 35.92
+# C 13 (18.6%) 15 (29.4%) 10 (17.9%) 9 (18.0%) 15 (24.6%) 16 (23.5%)
+# Mean 36.92 35.60 34.00 31.89 33.47 31.38
+# BLACK OR AFRICAN AMERICAN 18 (25.7%) 10 (19.6%) 12 (21.4%) 12 (24.0%) 13 (21.3%) 14 (20.6%)
+# A 5 (7.1%) 1 (2.0%) 5 (8.9%) 2 (4.0%) 4 (6.6%) 4 (5.9%)
+# Mean 31.20 33.00 28.00 30.00 30.75 36.50
+# B 7 (10.0%) 3 (5.9%) 3 (5.4%) 3 (6.0%) 6 (9.8%) 6 (8.8%)
+# Mean 36.14 34.33 29.67 32.00 36.33 31.00
+# C 6 (8.6%) 6 (11.8%) 4 (7.1%) 7 (14.0%) 3 (4.9%) 4 (5.9%)
+# Mean 31.33 39.67 34.50 34.00 33.00 36.50
+# WHITE 8 (11.4%) 6 (11.8%) 7 (12.5%) 7 (14.0%) 8 (13.1%) 10 (14.7%)
+# A 2 (2.9%) 1 (2.0%) 3 (5.4%) 3 (6.0%) 1 (1.6%) 5 (7.4%)
+# Mean 34.00 45.00 29.33 33.33 35.00 32.80
+# B 4 (5.7%) 3 (5.9%) 1 (1.8%) 4 (8.0%) 3 (4.9%) 1 (1.5%)
+# Mean 37.00 43.67 48.00 36.75 34.33 36.00
+# C 2 (2.9%) 2 (3.9%) 3 (5.4%) 0 (0.0%) 4 (6.6%) 4 (5.9%)
+# Mean 35.50 44.00 44.67 NA 38.50 35.00
+NOTE: Generally this would be more appropriately done using the reorder_split_levels() function within the layout, rather than as a sort post-processing step, but other character scorers may or may not map as easily to layouting directives.
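As a sketch of that layout-time alternative (assuming the RACE levels shown above, and using reorder_split_levels() as documented in ?split_funcs), the ordering would be declared directly in the layout rather than applied afterwards:

```r
## Sketch: impose the RACE ordering at layout time instead of
## sorting the built table afterwards
lyt <- basic_table() %>%
  split_rows_by(
    "RACE",
    split_fun = reorder_split_levels(
      neworder = c("ASIAN", "BLACK OR AFRICAN AMERICAN", "WHITE")
    )
  ) %>%
  analyze("AGE")
```

The rest of the layout and build_table() call would be unchanged.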
+We will sort the strata within each ethnicity by the percent difference in counts between males and females in arm C. That is, we need the F and M percents, for Arm C only (i.e., columns 5 and 6), differenced.
+Note: this is not statistically meaningful at all, and is in fact a terrible idea because it reorders the strata seemingly (but not actually) at random within each race, but it illustrates the various things we need to do inside custom sorting functions.
+
+silly_gender_diffcount <- function(tt) {
+ ## (1st) content row has same name as object (STRATA1 level)
+ rpath <- c(obj_name(tt), "@content", obj_name(tt))
+ ## the [1] below is cause these are count (pct%) cells
+ ## and we only want the count part!
+ mcount <- unlist(cell_values(
+ tt,
+ rowpath = rpath,
+ colpath = c("ARM", "C: Combination", "SEX", "M")
+ ))[1]
+ fcount <- unlist(cell_values(
+ tt,
+ rowpath = rpath,
+ colpath = c("ARM", "C: Combination", "SEX", "F")
+ ))[1]
+ (mcount - fcount) / fcount
+}
+
+sort_at_path(pruned, c("RACE", "*", "STRATA1"), silly_gender_diffcount)
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 44 (62.9%) 35 (68.6%) 37 (66.1%) 31 (62.0%) 40 (65.6%) 44 (64.7%)
+# B 16 (22.9%) 8 (15.7%) 13 (23.2%) 16 (32.0%) 10 (16.4%) 12 (17.6%)
+# Mean 33.75 34.88 32.46 30.94 33.30 35.92
+# A 15 (21.4%) 12 (23.5%) 14 (25.0%) 6 (12.0%) 15 (24.6%) 16 (23.5%)
+# Mean 30.40 34.42 35.43 30.33 37.40 36.25
+# C 13 (18.6%) 15 (29.4%) 10 (17.9%) 9 (18.0%) 15 (24.6%) 16 (23.5%)
+# Mean 36.92 35.60 34.00 31.89 33.47 31.38
+# BLACK OR AFRICAN AMERICAN 18 (25.7%) 10 (19.6%) 12 (21.4%) 12 (24.0%) 13 (21.3%) 14 (20.6%)
+# C 6 (8.6%) 6 (11.8%) 4 (7.1%) 7 (14.0%) 3 (4.9%) 4 (5.9%)
+# Mean 31.33 39.67 34.50 34.00 33.00 36.50
+# A 5 (7.1%) 1 (2.0%) 5 (8.9%) 2 (4.0%) 4 (6.6%) 4 (5.9%)
+# Mean 31.20 33.00 28.00 30.00 30.75 36.50
+# B 7 (10.0%) 3 (5.9%) 3 (5.4%) 3 (6.0%) 6 (9.8%) 6 (8.8%)
+# Mean 36.14 34.33 29.67 32.00 36.33 31.00
+# WHITE 8 (11.4%) 6 (11.8%) 7 (12.5%) 7 (14.0%) 8 (13.1%) 10 (14.7%)
+# A 2 (2.9%) 1 (2.0%) 3 (5.4%) 3 (6.0%) 1 (1.6%) 5 (7.4%)
+# Mean 34.00 45.00 29.33 33.33 35.00 32.80
+# C 2 (2.9%) 2 (3.9%) 3 (5.4%) 0 (0.0%) 4 (6.6%) 4 (5.9%)
+# Mean 35.50 44.00 44.67 NA 38.50 35.00
+# B 4 (5.7%) 3 (5.9%) 1 (1.8%) 4 (8.0%) 3 (4.9%) 1 (1.5%)
+# Mean 37.00 43.67 48.00 36.75 34.33 36.00
+vignettes/split_functions.Rmd
+By default, split_*_by(varname, ...) generates a facet for each level the variable varname takes in the data - including unobserved ones in the factor case. This behavior can be customized in various ways.
+The most straightforward way to customize which facets are generated by a split is with one of the split functions or split function families provided by rtables.
These predefined split functions and function factories implement +commonly desired customization patterns of splitting behavior (i.e., +faceting behavior). They include:
+- remove_split_levels - remove specified levels from the data for facet generation.
+- keep_split_levels - keep only specified levels in the data for facet generation (removing all others).
+- drop_split_levels - drop levels that are unobserved within the data being split, i.e., associated with the parent facet.
+- reorder_split_levels - reorder the levels (and thus the generated facets) to the specified order.
+- trim_levels_in_group - drop unobserved levels of another variable independently within the data associated with each facet generated by the current split.
+- add_overall_level, add_combo_levels - add additional "virtual" levels which combine two or more levels of the variable being split. See the following section.
+- trim_levels_to_map - trim the levels of multiple variables to a pre-specified set of value combinations. See the following section.
+The first four of these are fairly self-describing, so for brevity we refer our readers to ?split_funcs for details, including working examples.
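As a quick illustration (a sketch using the DM data set shipped with rtables), keep_split_levels restricts the generated facets to the named levels:

```r
library(rtables)

## Facet rows only on the ASIAN and WHITE levels of RACE;
## all other levels are dropped before facet generation
lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("RACE", split_fun = keep_split_levels(c("ASIAN", "WHITE"))) %>%
  analyze("AGE")

build_table(lyt, DM)
```

The resulting table contains only the ASIAN and WHITE row facets.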
Often with nested splitting involving multiple variables, the values +of the variables in question are logically nested; meaning that +certain values of the inner variable are only coherent in combination +with a specific value or values of the outer variable.
+As an example, suppose we have a variable vehicle_class, which can take the values "automobile" and "boat", and a variable vehicle_type, which can take the values "car", "truck", "suv", "sailboat", and "cruiseliner". The combination ("automobile", "cruiseliner") does not make sense and will never occur in any (correctly cleaned) data set; nor does the combination ("boat", "truck").
We will showcase strategies to deal with this in the next sections +using the following artificial data:
+
+set.seed(0)
+levs_type <- c("car", "truck", "suv", "sailboat", "cruiseliner")
+
+vclass <- sample(c("auto", "boat"), 1000, replace = TRUE)
+auto_inds <- which(vclass == "auto")
+vtype <- rep(NA_character_, 1000)
+vtype[auto_inds] <- sample(
+ c("car", "truck"), ## suv missing on purpose
+ length(auto_inds),
+ replace = TRUE
+)
+vtype[-auto_inds] <- sample(
+ c("sailboat", "cruiseliner"),
+ 1000 - length(auto_inds),
+ replace = TRUE
+)
+
+vehic_data <- data.frame(
+ vehicle_class = factor(vclass),
+ vehicle_type = factor(vtype, levels = levs_type),
+ color = sample(
+ c("white", "black", "red"), 1000,
+ prob = c(1, 2, 1),
+ replace = TRUE
+ ),
+ cost = ifelse(
+ vclass == "boat",
+ rnorm(1000, 100000, sd = 5000),
+ rnorm(1000, 40000, sd = 5000)
+ )
+)
+head(vehic_data)
## vehicle_class vehicle_type color cost
+## 1 boat sailboat black 100393.81
+## 2 auto car white 38150.17
+## 3 boat sailboat white 98696.13
+## 4 auto truck white 37677.16
+## 5 auto truck black 38489.27
+## 6 boat cruiseliner black 108709.72
+trim_levels_in_group
+The trim_levels_in_group split function factory creates split functions which deal with this issue empirically; any combination which is observed in the data being tabulated will appear as nested facets within the table, while combinations that are not observed will not.
If we use default level-based faceting, we get several logically +incoherent cells within our table:
+
+library(rtables)
+
+lyt <- basic_table() %>%
+ split_cols_by("color") %>%
+ split_rows_by("vehicle_class") %>%
+ split_rows_by("vehicle_type") %>%
+ analyze("cost")
+
+build_table(lyt, vehic_data)
## black white red
+## ————————————————————————————————————————————————
+## auto
+## car
+## Mean 40431.92 40518.92 38713.14
+## truck
+## Mean 40061.70 40635.74 40024.41
+## suv
+## Mean NA NA NA
+## sailboat
+## Mean NA NA NA
+## cruiseliner
+## Mean NA NA NA
+## boat
+## car
+## Mean NA NA NA
+## truck
+## Mean NA NA NA
+## suv
+## Mean NA NA NA
+## sailboat
+## Mean 99349.69 99996.54 101865.73
+## cruiseliner
+## Mean 100212.00 99340.25 100363.52
+This is obviously not the table we want, as the majority of its space is taken up by meaningless combinations. If we use trim_levels_in_group to trim the levels of vehicle_type separately within each level of vehicle_class, we get a table which only has meaningful combinations:
+lyt2 <- basic_table() %>%
+ split_cols_by("color") %>%
+ split_rows_by("vehicle_class", split_fun = trim_levels_in_group("vehicle_type")) %>%
+ split_rows_by("vehicle_type") %>%
+ analyze("cost")
+
+build_table(lyt2, vehic_data)
## black white red
+## ————————————————————————————————————————————————
+## auto
+## car
+## Mean 40431.92 40518.92 38713.14
+## truck
+## Mean 40061.70 40635.74 40024.41
+## boat
+## sailboat
+## Mean 99349.69 99996.54 101865.73
+## cruiseliner
+## Mean 100212.00 99340.25 100363.52
+Note, however, that it does not contain all meaningful combinations, only those that were actually observed in our data, which happens not to include the perfectly valid ("auto", "suv") combination.
+To restrict level combinations to those which are valid regardless of whether the combination was observed, we must use trim_levels_to_map() instead.
trim_levels_to_map
+trim_levels_to_map
is similar to trim_levels_in_group in that its purpose is to avoid combinatorial explosion when nesting splits on logically nested variables. Unlike its sibling function, however, with trim_levels_to_map we define the exact set of allowed combinations a priori, and that exact set of combinations is produced in the resulting table, regardless of whether they are observed or not.
+library(tibble)
+map <- tribble(
+ ~vehicle_class, ~vehicle_type,
+ "auto", "truck",
+ "auto", "suv",
+ "auto", "car",
+ "boat", "sailboat",
+ "boat", "cruiseliner"
+)
+
+lyt3 <- basic_table() %>%
+ split_cols_by("color") %>%
+ split_rows_by("vehicle_class", split_fun = trim_levels_to_map(map)) %>%
+ split_rows_by("vehicle_type") %>%
+ analyze("cost")
+
+build_table(lyt3, vehic_data)
## black white red
+## ————————————————————————————————————————————————
+## auto
+## car
+## Mean 40431.92 40518.92 38713.14
+## truck
+## Mean 40061.70 40635.74 40024.41
+## suv
+## Mean NA NA NA
+## boat
+## sailboat
+## Mean 99349.69 99996.54 101865.73
+## cruiseliner
+## Mean 100212.00 99340.25 100363.52
+Now we see that the ("auto", "suv") combination is again present, even though it is populated with NAs (because there is no data in that category), but the logically invalid combinations are still absent.
Another very common manipulation of faceting in a table context is +the introduction of combination levels that are not explicitly modeled +in the data. Most often, this involves the addition of an “overall” +category, but in both principle and practice it can involve any +arbitrary combination of levels.
+rtables
explicitly supports this via the add_overall_level (for the "all" case) and add_combo_levels split function factories.
add_overall_level
+add_overall_level
accepts valname, which is the name of the new level, as well as label, and first (whether the new level should come first, if TRUE, or last, if FALSE, in the ordering).
Building further on our arbitrary vehicles table, we can use this to +create an “all colors” category:
+
+lyt4 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("color", split_fun = add_overall_level("allcolors", label = "All Colors")) %>%
+ split_rows_by("vehicle_class", split_fun = trim_levels_to_map(map)) %>%
+ split_rows_by("vehicle_type") %>%
+ analyze("cost")
+
+build_table(lyt4, vehic_data)
## All Colors black white red
+## (N=1000) (N=521) (N=251) (N=228)
+## —————————————————————————————————————————————————————————————
+## auto
+## car
+## Mean 40095.49 40431.92 40518.92 38713.14
+## truck
+## Mean 40194.68 40061.70 40635.74 40024.41
+## suv
+## Mean NA NA NA NA
+## boat
+## sailboat
+## Mean 100133.22 99349.69 99996.54 101865.73
+## cruiseliner
+## Mean 100036.76 100212.00 99340.25 100363.52
+With the column counts turned on, we can see that the “All Colors” +column encompasses the full 1000 (completely fake) vehicles in our data +set.
+To add more arbitrary combinations, we use add_combo_levels.
add_combo_levels
+add_combo_levels
allows us to add one or more arbitrary
+combination levels to the faceting structure of our table.
+We do this by defining a combination data.frame which describes the levels we want to add. A combination data.frame has the following columns, and one row for each combination to add:
+- valname - string indicating the name of the value, which will appear in paths.
+- label - a string indicating the label which should be displayed when rendering.
+- levelcombo - character vector of the individual levels to be combined in this combination level.
+- exargs - a list (usually list()) of extra arguments which should be passed to analysis and content functions when tabulated within this column or row.
+
+combodf <- tribble(
+ ~valname, ~label, ~levelcombo, ~exargs,
+ "non-white", "Non-White", c("black", "red"), list(),
+ "blackwhite", "Black or White", c("black", "white"), list()
+)
+
+
+lyt5 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("color", split_fun = add_combo_levels(combodf)) %>%
+ split_rows_by("vehicle_class", split_fun = trim_levels_to_map(map)) %>%
+ split_rows_by("vehicle_type") %>%
+ analyze("cost")
+
+build_table(lyt5, vehic_data)
## black white red Non-White Black or White
+## (N=521) (N=251) (N=228) (N=749) (N=772)
+## —————————————————————————————————————————————————————————————————————————————
+## auto
+## car
+## Mean 40431.92 40518.92 38713.14 39944.93 40460.77
+## truck
+## Mean 40061.70 40635.74 40024.41 40050.66 40243.57
+## suv
+## Mean NA NA NA NA NA
+## boat
+## sailboat
+## Mean 99349.69 99996.54 101865.73 100179.72 99567.50
+## cruiseliner
+## Mean 100212.00 99340.25 100363.52 100258.56 99937.47
+Beyond the ability to select common splitting customizations from the
+split functions and split function factories rtables
+provides, we can also fully customize every aspect of splitting behavior
+by creating our own split functions. While it is possible to do so by
+hand, the primary way we do this is via the
+make_split_fun()
function, which accepts functions
+implementing different component behaviors and combines them into a
+split function which can be used in a layout.
+Splitting, or faceting as it is done in rtables, can be thought of as the combination of 3 steps: pre-processing the incoming data, performing the core split of that data into facets, and post-processing the resulting set of facets.
+The make_split_fun() function allows us to specify custom behaviors for each of these steps independently when defining custom splitting behavior, via the pre, core_split, and post arguments, which dictate the above steps, respectively.
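Schematically, that is (my_pre and my_post here are hypothetical component functions, standing in for real ones with the signatures described below):

```r
## Sketch only: my_pre and my_post stand in for real component functions
custom_split <- make_split_fun(
  pre = list(my_pre),   # step 1: manipulate the incoming data
  ## core_split omitted: the default level-based faceting is kept
  post = list(my_post)  # step 3: adjust the generated facets
)
## custom_split can then be passed as split_fun to
## split_rows_by() or split_cols_by()
```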
+The pre argument accepts zero or more pre-processing functions, which must accept df, spl, vals, and labels, and can optionally accept .spl_context. They then manipulate df (the incoming data for the split) and return a modified data.frame. This modified data.frame must contain all columns present in the incoming data.frame, but can add columns if necessary. Note, however, that these new columns cannot be used in the layout as split or analysis variables, because they will not be present when validity checking is done.
The pre-processing component is useful for things such as +manipulating factor levels, e.g., to trim unobserved ones or to reorder +levels based on observed counts, etc.
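For instance, a sketch of a pre-processing component (hypothetical, not shipped with rtables) that reorders the split variable's levels by decreasing observed count:

```r
## Hypothetical pre component: order the split variable's levels
## by decreasing observed count before facets are generated
order_levels_by_count <- function(df, spl, vals, labels, ...) {
  var <- spl_variable(spl)  # the variable being split on
  counts <- table(df[[var]])
  df[[var]] <- factor(
    as.character(df[[var]]),
    levels = names(sort(counts, decreasing = TRUE))
  )
  df
}
## e.g., make_split_fun(pre = list(order_levels_by_count))
```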
+For a more detailed discussion of what custom split functions do, and an example of a custom split function not implemented via make_split_fun(), see ?custom_split_funs.
+Here we will implement an arbitrary, custom split function where we specify both pre- and post-processing instructions. It is unusual for users to need to override the core splitting logic - which, in fact, is currently only supported in row space - so we leave this out of our example here, but we provide another narrow example of that usage below.
+First, we define two aspects of ‘pre-processing step’ behavior:
+
+## reverse order of levels
+
+rev_lev <- function(df, spl, vals, labels, ...) {
+ ## in the split_rows_by() and split_cols_by() cases,
+ ## spl_variable() gives us the variable
+ var <- spl_variable(spl)
+ vec <- df[[var]]
+ levs <- if (is.character(vec)) unique(vec) else levels(vec)
+ df[[var]] <- factor(vec, levels = rev(levs))
+ df
+}
+
+rem_lev_facet <- function(torem) {
+ function(df, spl, vals, labels, ...) {
+ var <- spl_variable(spl)
+ vec <- df[[var]]
+ bad <- vec == torem
+ df <- df[!bad, ]
+ levs <- if (is.character(vec)) unique(vec) else levels(vec)
+ df[[var]] <- factor(as.character(vec[!bad]), levels = setdiff(levs, torem))
+ df
+ }
+}
Finally we implement our post-processing function. Here we will +reorder the facets based on the amount of data each of them +represents.
+
+sort_them_facets <- function(splret, spl, fulldf, ...) {
+ ord <- order(sapply(splret$datasplit, nrow))
+ make_split_result(
+ splret$values[ord],
+ splret$datasplit[ord],
+ splret$labels[ord]
+ )
+}
Finally, we construct our custom split function and use it to create +our table:
+
+silly_splfun1 <- make_split_fun(
+ pre = list(
+ rev_lev,
+ rem_lev_facet("white")
+ ),
+ post = list(sort_them_facets)
+)
+
+lyt6 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("color", split_fun = silly_splfun1) %>%
+ split_rows_by("vehicle_class", split_fun = trim_levels_to_map(map)) %>%
+ split_rows_by("vehicle_type") %>%
+ analyze("cost")
+
+build_table(lyt6, vehic_data)
## red black
+## (N=228) (N=521)
+## —————————————————————————————————————
+## auto
+## car
+## Mean 38713.14 40431.92
+## truck
+## Mean 40024.41 40061.70
+## suv
+## Mean NA NA
+## boat
+## sailboat
+## Mean 101865.73 99349.69
+## cruiseliner
+## Mean 100363.52 100212.00
+Currently, overriding core split behavior is only supported +in functions used for row splits.
+Next, we write a custom core-splitting function which divides the observations into 4 groups: the first 100, observations 101-500, observations 501-900, and the last 100. We could claim this is to test for structural bias in the first and last observations, but really it's simply to illustrate overriding the core splitting machinery; it has no meaningful statistical purpose.
+
+silly_core_split <- function(spl, df, vals, labels, .spl_context) {
+ make_split_result(
+ c("first", "lowmid", "highmid", "last"),
+ datasplit = list(
+ df[1:100, ],
+ df[101:500, ],
+ df[501:900, ],
+ df[901:1000, ]
+ ),
+ labels = c(
+ "first 100",
+ "obs 101-500",
+ "obs 501-900",
+ "last 100"
+ )
+ )
+}
We can use this to construct a splitting function. This can be +combined with pre- and post-processing functions, as each of the stages +is performed independently, but in this case, we won’t, because our core +splitting behavior is such that pre- or post-processing do not make much +sense.
+
+even_sillier_splfun <- make_split_fun(core_split = silly_core_split)
+
+lyt7 <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("color") %>%
+ split_rows_by("vehicle_class", split_fun = even_sillier_splfun) %>%
+ split_rows_by("vehicle_type") %>%
+ analyze("cost")
+
+build_table(lyt7, vehic_data)
## black white red
+## (N=521) (N=251) (N=228)
+## —————————————————————————————————————————————————
+## first 100
+## car
+## Mean 40496.05 37785.41 37623.17
+## truck
+## Mean 41094.17 40437.29 37866.81
+## suv
+## Mean NA NA NA
+## sailboat
+## Mean 100560.80 102017.05 101185.96
+## cruiseliner
+## Mean 100838.12 96952.27 100610.71
+## obs 101-500
+## car
+## Mean 39350.88 41185.98 37978.72
+## truck
+## Mean 40166.87 41385.32 39885.72
+## suv
+## Mean NA NA NA
+## sailboat
+## Mean 98845.47 99563.02 101462.79
+## cruiseliner
+## Mean 101558.62 99039.91 97335.05
+## obs 501-900
+## car
+## Mean 40721.82 40379.48 38681.26
+## truck
+## Mean 39951.92 39846.89 39840.39
+## suv
+## Mean NA NA NA
+## sailboat
+## Mean 99533.20 100347.18 102732.12
+## cruiseliner
+## Mean 99140.43 100074.43 101994.99
+## last 100
+## car
+## Mean 45204.44 40626.95 41214.33
+## truck
+## Mean 38920.70 40620.47 42899.14
+## suv
+## Mean NA NA NA
+## sailboat
+## Mean 99380.21 97644.77 101691.92
+## cruiseliner
+## Mean 100017.53 99581.94 100751.30
+make_split_fun
+Pre-processing and post-processing functions in the custom-splitting +context are best thought of as (and implemented as) independent, atomic +building blocks for the desired overall behavior. This allows them to be +reused in a flexible mix-and-match way.
+rtables provides several behavior components, implemented as either functions or function factories:
drop_facet_levels
- drop unobserved levels in the
+variable being splittrim_levels_in_facets
- provides
+trim_levels_in_group
behavioradd_overall_facet
- add a combination facet for the
+full dataadd_combo_facet
- add a single combination facet (can
+be used more than once in a single make_split_fun
+call).
+vignettes/subsetting_tables.Rmd
+TableTree objects are based on a tree data structure, as the name indicates. The package is written such that the user does not need to walk trees for many basic table manipulations. Walking trees will still be necessary for certain manipulations, and will be the subject of a different vignette.
In this vignette we show some methods to subset tables and to extract +cell values.
+We will use the following table for illustrative purposes:
+
+library(rtables)
+library(dplyr)
+
+lyt <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX", split_fun = drop_split_levels) %>%
+ analyze(c("AGE", "STRATA1"))
+
+tbl <- build_table(lyt, ex_adsl %>% filter(SEX %in% c("M", "F")))
+tbl
# A: Drug X B: Placebo C: Combination
+# ———————————————————————————————————————————————————
+# F
+# AGE
+# Mean 32.76 34.12 35.20
+# STRATA1
+# A 21 24 18
+# B 25 27 21
+# C 33 26 27
+# M
+# AGE
+# Mean 35.57 37.44 35.38
+# STRATA1
+# A 16 19 20
+# B 21 17 21
+# C 14 19 19
+[
+The [ and [<- accessor functions operate largely the same as their data.frame cousins:
+- [ and [<- both treat Tables as rectangular objects (rather than trees)
+- [ accepts both column and row absolute positions, and missing arguments mean "all indexes in that dimension"
+- negative indices are supported; as in [.data.frame, they cannot be mixed with positive ones
+- [ always returns the same class as the object being subset unless drop = TRUE
+- [ , drop = TRUE returns the raw (possibly multi-element) value associated with the cell
-
+absolute position cannot currently be used to reorder columns or rows.
+Note in general the result of such an ordering is unlikely to be
+structurally valid. To change the order of values, please read sorting
+and pruning vignette or relevant function
+(sort_at_path()
). - character
indices are
+treated as paths, not vectors of names in both [
and
+[<-
The [
accessor function always returns an
+TableTree
object if drop=TRUE
is not set. The
+first argument are the row indices and the second argument the column
+indices. Alternatively logical subsetting can be used. The indices are
+based on visible rows and not on the tree structure. So:
+tbl[1, 1]
# A: Drug X
+# —————————————
+# F
+is a table with an empty cell because the first row is a label row. +We need to access a cell with actual cell data:
+
+tbl[3, 1]
# A: Drug X
+# ————————————————
+# Mean 32.76
+To retrieve the value, we use drop = TRUE
:
+tbl[3, 1, drop = TRUE]
# [1] 32.75949
+One can access multiple rows and columns:
+
+tbl[1:3, 1:2]
# A: Drug X B: Placebo
+# —————————————————————————————————
+# F
+# AGE
+# Mean 32.76 34.12
+Note that we do not repeat label rows for descending children, +e.g.
+
+tbl[2:4, ]
# A: Drug X B: Placebo C: Combination
+# —————————————————————————————————————————————————
+# AGE
+# Mean 32.76 34.12 35.20
+# STRATA1
+does not show that the first row is derived from AGE. In order to repeat content/label information, one should use the pagination feature. Please read the related vignette.
+Character indices are interpreted as paths (see below), NOT elements to be matched against names(tbl):
+tbl[, c("ARM", "A: Drug X")]
# Note: method with signature 'VTableTree#missing#ANY' chosen for function '[',
+# target signature 'TableTree#missing#character'.
+# "VTableTree#ANY#character" would also be valid
+# A: Drug X
+# —————————————————————
+# F
+# AGE
+# Mean 32.76
+# STRATA1
+# A 21
+# B 25
+# C 33
+# M
+# AGE
+# Mean 35.57
+# STRATA1
+# A 16
+# B 21
+# C 14
+By default, no additional information is kept after subsetting. Here, using a more complete table, we show how it is still possible to keep the (possibly) relevant information.
+
+top_left(tbl) <- "SEX"
+main_title(tbl) <- "Table 1"
+subtitles(tbl) <- c("Authors:", " - Abcd Zabcd", " - Cde Zbcd")
+
+main_footer(tbl) <- "Please regard this table as an example of smart subsetting"
+prov_footer(tbl) <- "Do remember where you read this though"
+
+fnotes_at_path(tbl, rowpath = c("M", "AGE", "Mean"), colpath = c("ARM", "A: Drug X")) <- "Very important mean"
+Normal subsetting loses all the information shown above.
+
+tbl[3, 3]
# C: Combination
+# —————————————————————
+# Mean 35.20
+If all the rows are kept, the top-left information is also kept. This can also be imposed by adding keep_topleft = TRUE to the subsetting, as follows:
+tbl[, 2:3]
# SEX B: Placebo C: Combination
+# ———————————————————————————————————————
+# F
+# AGE
+# Mean 34.12 35.20
+# STRATA1
+# A 24 18
+# B 27 21
+# C 26 27
+# M
+# AGE
+# Mean 37.44 35.38
+# STRATA1
+# A 19 20
+# B 17 21
+# C 19 19
+
+tbl[1:3, 3, keep_topleft = TRUE]
# SEX C: Combination
+# —————————————————————————
+# F
+# AGE
+# Mean 35.20
+If the referenced entry is present in the subset, the referential footnote will also appear. Please consider reading the relevant vignette about referential footnotes. When subsetting, referential footnotes are by default re-indexed, as if the produced table were a new one.
+
+tbl[10, 1]
# A: Drug X
+# ————————————————
+# Mean 35.57 {1}
+# ————————————————
+#
+# {1} - Very important mean
+# ————————————————
+
+col_paths_summary(tbl) # Use these to find the right path to value or label
# label path
+# —————————————————————————————————————
+# A: Drug X ARM, A: Drug X
+# B: Placebo ARM, B: Placebo
+# C: Combination ARM, C: Combination
+
+row_paths_summary(tbl) #
# rowname node_class path
+# —————————————————————————————————————————————
+# F LabelRow SEX, F
+# AGE LabelRow SEX, F, AGE
+# Mean DataRow SEX, F, AGE, Mean
+# STRATA1 LabelRow SEX, F, STRATA1
+# A DataRow SEX, F, STRATA1, A
+# B DataRow SEX, F, STRATA1, B
+# C DataRow SEX, F, STRATA1, C
+# M LabelRow SEX, M
+# AGE LabelRow SEX, M, AGE
+# Mean DataRow SEX, M, AGE, Mean
+# STRATA1 LabelRow SEX, M, STRATA1
+# A DataRow SEX, M, STRATA1, A
+# B DataRow SEX, M, STRATA1, B
+# C DataRow SEX, M, STRATA1, C
+
+# To select column value, use `NULL` for `rowpath`
+fnotes_at_path(tbl, rowpath = NULL, colpath = c("ARM", "A: Drug X")) <- "Interesting"
+tbl[3, 1]
# A: Drug X {1}
+# ————————————————————
+# Mean 32.76
+# ————————————————————
+#
+# {1} - Interesting
+# ————————————————————
+
+# reindexing of {2} as {1}
+fnotes_at_path(tbl, rowpath = c("M", "AGE", "Mean"), colpath = NULL) <- "THIS mean"
+tbl # {1}, {2}, and {3} are present
# Table 1
+# Authors:
+# - Abcd Zabcd
+# - Cde Zbcd
+#
+# ——————————————————————————————————————————————————————————
+# SEX A: Drug X {1} B: Placebo C: Combination
+# ——————————————————————————————————————————————————————————
+# F
+# AGE
+# Mean 32.76 34.12 35.20
+# STRATA1
+# A 21 24 18
+# B 25 27 21
+# C 33 26 27
+# M
+# AGE
+# Mean {2} 35.57 {3} 37.44 35.38
+# STRATA1
+# A 16 19 20
+# B 21 17 21
+# C 14 19 19
+# ——————————————————————————————————————————————————————————
+#
+# {1} - Interesting
+# {2} - THIS mean
+# {3} - Very important mean
+# ——————————————————————————————————————————————————————————
+#
+# Please regard this table as an example of smart subsetting
+#
+# Do remember where you read this though
+
+tbl[10, 2] # only {1} which was previously {2}
# B: Placebo
+# —————————————————————
+# Mean {1} 37.44
+# —————————————————————
+#
+# {1} - THIS mean
+# —————————————————————
+Similar to what we did to keep the top-left information, we can specify that more information from the original table be kept. As standard, the footnotes are always present if the titles are kept.
+
+tbl[1:3, 2:3, keep_titles = TRUE]
# Table 1
+# Authors:
+# - Abcd Zabcd
+# - Cde Zbcd
+#
+# ——————————————————————————————————————
+# B: Placebo C: Combination
+# ——————————————————————————————————————
+# F
+# AGE
+# Mean 34.12 35.20
+# ——————————————————————————————————————
+#
+# Please regard this table as an example of smart subsetting
+#
+# Do remember where you read this though
+
+tbl[1:3, 2:3, keep_titles = FALSE, keep_footers = TRUE]
# B: Placebo C: Combination
+# ——————————————————————————————————————
+# F
+# AGE
+# Mean 34.12 35.20
+# ——————————————————————————————————————
+#
+# Please regard this table as an example of smart subsetting
+#
+# Do remember where you read this though
+
+# Referential footnotes are not influenced by `keep_footers = FALSE`
+tbl[1:3, keep_titles = TRUE, keep_footers = FALSE]
# Table 1
+# Authors:
+# - Abcd Zabcd
+# - Cde Zbcd
+#
+# ——————————————————————————————————————————————————————
+# A: Drug X {1} B: Placebo C: Combination
+# ——————————————————————————————————————————————————————
+# F
+# AGE
+# Mean 32.76 34.12 35.20
+# ——————————————————————————————————————————————————————
+#
+# {1} - Interesting
+# ——————————————————————————————————————————————————————
+Tables can be subset or modified in a structurally aware manner via +pathing.
+Paths define semantically meaningful positions within a constructed +table that correspond to the logic of the layout used to create it.
+A path is an ordered set of split names, the names of subgroups
+generated by the split, and the @content
directive, which
+steps into a position’s content (or row group summary) table.
We can see the row and column paths of an existing table via the
+row_paths()
, col_paths()
,
+row_paths_summary()
, and col_paths_summary()
+functions, or as a portion of the more general
+make_row_df()
function output.
+lyt2 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_cols_by("SEX", split_fun = drop_split_levels) %>%
+ split_rows_by("RACE", split_fun = drop_split_levels) %>%
+ summarize_row_groups() %>%
+ analyze(c("AGE", "STRATA1"))
+
+tbl2 <- build_table(lyt2, ex_adsl %>% filter(SEX %in% c("M", "F") & RACE %in% (levels(RACE)[1:3])))
+tbl2
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN 41 (53.9%) 25 (54.3%) 36 (52.2%) 30 (60.0%) 39 (60.9%) 32 (57.1%)
+# AGE
+# Mean 31.22 34.60 35.06 38.63 36.44 37.66
+# STRATA1
+# A 11 10 14 10 11 7
+# B 11 9 15 7 11 14
+# C 19 6 7 13 17 11
+# BLACK OR AFRICAN AMERICAN 18 (23.7%) 12 (26.1%) 16 (23.2%) 12 (24.0%) 14 (21.9%) 14 (25.0%)
+# AGE
+# Mean 34.06 34.58 33.88 36.33 33.21 34.21
+# STRATA1
+# A 5 2 5 6 3 7
+# B 6 5 3 4 4 4
+# C 7 5 8 2 7 3
+# WHITE 17 (22.4%) 9 (19.6%) 17 (24.6%) 8 (16.0%) 11 (17.2%) 10 (17.9%)
+# AGE
+# Mean 34.12 40.00 32.41 34.62 33.00 30.80
+# STRATA1
+# A 5 3 3 3 3 5
+# B 5 4 8 4 5 2
+# C 7 2 6 1 3 3
+So the column paths are as follows:
+
+col_paths_summary(tbl2)
# label path
+# —————————————————————————————————————————————
+# A: Drug X ARM, A: Drug X
+# F ARM, A: Drug X, SEX, F
+# M ARM, A: Drug X, SEX, M
+# B: Placebo ARM, B: Placebo
+# F ARM, B: Placebo, SEX, F
+# M ARM, B: Placebo, SEX, M
+# C: Combination ARM, C: Combination
+# F ARM, C: Combination, SEX, F
+# M ARM, C: Combination, SEX, M
+and the row paths are as follows:
+
+row_paths_summary(tbl2)
# rowname node_class path
+# ———————————————————————————————————————————————————————————————————————————————————————————————————————————————
+# ASIAN ContentRow RACE, ASIAN, @content, ASIAN
+# AGE LabelRow RACE, ASIAN, AGE
+# Mean DataRow RACE, ASIAN, AGE, Mean
+# STRATA1 LabelRow RACE, ASIAN, STRATA1
+# A DataRow RACE, ASIAN, STRATA1, A
+# B DataRow RACE, ASIAN, STRATA1, B
+# C DataRow RACE, ASIAN, STRATA1, C
+# BLACK OR AFRICAN AMERICAN ContentRow RACE, BLACK OR AFRICAN AMERICAN, @content, BLACK OR AFRICAN AMERICAN
+# AGE LabelRow RACE, BLACK OR AFRICAN AMERICAN, AGE
+# Mean DataRow RACE, BLACK OR AFRICAN AMERICAN, AGE, Mean
+# STRATA1 LabelRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1
+# A DataRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, A
+# B DataRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, B
+# C DataRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, C
+# WHITE ContentRow RACE, WHITE, @content, WHITE
+# AGE LabelRow RACE, WHITE, AGE
+# Mean DataRow RACE, WHITE, AGE, Mean
+# STRATA1 LabelRow RACE, WHITE, STRATA1
+# A DataRow RACE, WHITE, STRATA1, A
+# B DataRow RACE, WHITE, STRATA1, B
+# C DataRow RACE, WHITE, STRATA1, C
+To get a semantically meaningful subset of our table, then, we can
+use [
 (or tt_at_path()
, which underlies it) with paths:
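+The subsetting call itself did not survive rendering; judging from the output below (the ASIAN rows under the C: Combination column), it was presumably a path-based call along these lines (a reconstruction, not verbatim from the original):

```r
# Hypothetical reconstruction: path-based subsetting with `[`,
# selecting the ASIAN rows and the C: Combination column subtree.
tbl2[c("RACE", "ASIAN"), c("ARM", "C: Combination")]
```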
# C: Combination
+# F M
+# ———————————————————————————————————
+# ASIAN 39 (60.9%) 32 (57.1%)
+# AGE
+# Mean 36.44 37.66
+# STRATA1
+# A 11 7
+# B 11 14
+# C 17 11
+We can also retrieve individual cell values via the
+value_at()
 convenience function, which takes a pair of row
+and column paths that together resolve to an individual cell,
+e.g. the average age of Asian female patients in arm A:
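+The code chunk producing the output below was lost in rendering; based on the paths used with cell_values() just after, it was presumably:

```r
# Presumed call (original chunk lost in rendering); these are the same
# row/column paths used with cell_values() below.
value_at(tbl2, c("RACE", "ASIAN", "AGE", "Mean"), c("ARM", "A: Drug X", "SEX", "F"))
```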
# [1] 31.21951
+You can also request information from non-cell-specific paths with
+the cell_values()
function:
+cell_values(tbl2, c("RACE", "ASIAN", "AGE", "Mean"), c("ARM", "A: Drug X"))
# $`A: Drug X.F`
+# [1] 31.21951
+#
+# $`A: Drug X.M`
+# [1] 34.6
+Note the return value of cell_values()
is always a list
+even if you specify a path to a cell:
+cell_values(tbl2, c("RACE", "ASIAN", "AGE", "Mean"), c("ARM", "A: Drug X", "SEX", "F"))
# $`A: Drug X.F`
+# [1] 31.21951
+vignettes/tabulation_concepts.Rmd
In this vignette we will introduce some theory behind using layouts +for table creation. Much of the theory also holds true when using other +table packages. For this vignette we will use the following +packages:
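+The package-loading chunk did not survive rendering; given the functions used in this vignette (basic_table(), tibble(), the pipe, etc.), it presumably loaded at least:

```r
# Presumed setup chunk (original lost in rendering): the vignette uses
# rtables for tabulation and dplyr/tibble for data manipulation.
library(rtables)
library(dplyr)
```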
+The data we use is the following, created with random number generators:
+
+add_subgroup <- function(x) paste0(tolower(x), sample(1:3, length(x), TRUE))
+
+set.seed(1)
+
+df <- tibble(
+ x = rnorm(100),
+ c1 = factor(sample(c("A", "B", "C"), 100, replace = TRUE), levels = c("A", "B", "C")),
+ r1 = factor(sample(c("U", "V", "W"), 100, replace = TRUE), levels = c("U", "V", "W"))
+) %>%
+ mutate(
+ c2 = add_subgroup(c1),
+ r2 = add_subgroup(r1),
+ y = as.numeric(2 * as.numeric(c1) - 3 * as.numeric(r1))
+ ) %>%
+ select(c1, c2, r1, r2, x, y)
+
+df
# # A tibble: 100 × 6
+# c1 c2 r1 r2 x y
+# <fct> <chr> <fct> <chr> <dbl> <dbl>
+# 1 B b2 U u3 -0.626 1
+# 2 A a3 V v2 0.184 -4
+# 3 B b1 V v2 -0.836 -2
+# 4 B b3 V v2 1.60 -2
+# 5 B b1 U u1 0.330 1
+# 6 C c1 U u3 -0.820 3
+# 7 A a3 U u3 0.487 -1
+# 8 B b1 U u3 0.738 1
+# 9 C c3 V v2 0.576 0
+# 10 C c3 U u2 -0.305 3
+# # ℹ 90 more rows
+Let’s look at a table that has 3 columns and 3 rows. Each row
+represents a different analysis (functions foo
,
+bar
, zoo
that return an rcell()
+object):
A B C
+------------------------------------------------
+foo_label foo(df_A) foo(df_B) foo(df_C)
+bar_label bar(df_A) bar(df_B) bar(df_C)
+zoo_label zoo(df_A) zoo(df_B) zoo(df_C)
+The data passed to the analysis functions is the subset defined by the respective column:
+
+df_A <- df %>% filter(c1 == "A")
+df_B <- df %>% filter(c1 == "B")
+df_C <- df %>% filter(c1 == "C")
Let’s do this on the concrete data with analyze()
:
+foo <- prod
+bar <- sum
+zoo <- mean
+
+lyt <- basic_table() %>%
+ split_cols_by("c1") %>%
+ analyze("x", function(df) foo(df$x), var_labels = "foo label", format = "xx.xx") %>%
+ analyze("x", function(df) bar(df$x), var_labels = "bar label", format = "xx.xx") %>%
+ analyze("x", function(df) zoo(df$x), var_labels = "zoo label", format = "xx.xx")
+
+tbl <- build_table(lyt, df)
# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: root
+
+tbl
# A B C
+# ——————————————————————————————————
+# foo label
+# foo label 0.00 -0.00 -0.00
+# bar label
+# bar label 1.87 4.37 4.64
+# zoo label
+# zoo label 0.05 0.13 0.18
+or if we wanted the x
variable instead of the data
+frame:
A B C
+------------------------------------------------
+foo_label foo(x_A) foo(x_B) foo(x_C)
+bar_label bar(x_A) bar(x_B) bar(x_C)
+zoo_label zoo(x_A) zoo(x_B) zoo(x_C)
+where:
+
+x_A <- df_A$x
+x_B <- df_B$x
+x_C <- df_C$x
The function passed to afun
is evaluated using argument
+matching. If afun
has an argument x
, the
+analysis variable specified in vars
in
+analyze()
is passed to the function, and if
+afun
has an argument df
then a subset of the
+dataset is passed to afun
:
+lyt2 <- basic_table() %>%
+ split_cols_by("c1") %>%
+ analyze("x", foo, var_labels = "foo label", format = "xx.xx") %>%
+ analyze("x", bar, var_labels = "bar label", format = "xx.xx") %>%
+ analyze("x", zoo, var_labels = "zoo label", format = "xx.xx")
+
+tbl2 <- build_table(lyt2, df)
# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: root
+
+tbl2
# A B C
+# ————————————————————————————————
+# foo label
+# foo 0.00 -0.00 -0.00
+# bar label
+# bar 1.87 4.37 4.64
+# zoo label
+# zoo 0.05 0.13 0.18
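+Incidentally, the warning emitted by build_table() above names its own remedy: give each analyze() call a distinct table_names value. A sketch (the names themselves are arbitrary):

```r
# Same layout as lyt2, but with explicit table_names so the sibling
# analysis tables are unique and build_table() emits no warning.
lyt2b <- basic_table() %>%
  split_cols_by("c1") %>%
  analyze("x", foo, var_labels = "foo label", format = "xx.xx", table_names = "foo_x") %>%
  analyze("x", bar, var_labels = "bar label", format = "xx.xx", table_names = "bar_x") %>%
  analyze("x", zoo, var_labels = "zoo label", format = "xx.xx", table_names = "zoo_x")

tbl2b <- build_table(lyt2b, df)
```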
+Note that it is also possible for a function to return multiple rows
+with in_rows()
:
+lyt3 <- basic_table() %>%
+ split_cols_by("c1") %>%
+ analyze("x", function(x) {
+ in_rows(
+ "row 1" = rcell(mean(x), format = "xx.xx"),
+ "row 2" = rcell(sd(x), format = "xx.xxx")
+ )
+ }, var_labels = "foo label") %>%
+ analyze("x", function(x) {
+ in_rows(
+ "more rows 1" = rcell(median(x), format = "xx.x"),
+ "even more rows 1" = rcell(IQR(x), format = "xx.xx")
+ )
+ }, var_labels = "bar label", format = "xx.xx")
+
+tbl3 <- build_table(lyt3, df)
# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: root
+
+tbl3
# A B C
+# ——————————————————————————————————————————
+# foo label
+# row 1 0.05 0.13 0.18
+# row 2 0.985 0.815 0.890
+# bar label
+# more rows 1 -0.0 0.2 0.3
+# even more rows 1 1.20 1.15 1.16
+This is how we recommend you specify the row names explicitly.
+Let’s say we would like to create the following table:
+ A B C
+--------------------------------------
+U foo(df_UA) foo(df_UB) foo(df_UC)
+V foo(df_VA) foo(df_VB) foo(df_VC)
+W foo(df_WA) foo(df_WB) foo(df_WC)
+where df_*
are subsets of df
as
+follows:
+df_UA <- df %>% filter(r1 == "U", c1 == "A")
+df_VA <- df %>% filter(r1 == "V", c1 == "A")
+df_WA <- df %>% filter(r1 == "W", c1 == "A")
+df_UB <- df %>% filter(r1 == "U", c1 == "B")
+df_VB <- df %>% filter(r1 == "V", c1 == "B")
+df_WB <- df %>% filter(r1 == "W", c1 == "B")
+df_UC <- df %>% filter(r1 == "U", c1 == "C")
+df_VC <- df %>% filter(r1 == "V", c1 == "C")
+df_WC <- df %>% filter(r1 == "W", c1 == "C")
Further note that df_*
are of the same class as
+df
, i.e. tibble
s. Hence foo
+aggregates the subset of our data into a cell value.
Given a function foo
(ignore the ...
for
+now):
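+The chunk redefining foo was lost in rendering; judging from the "17 x 6"-style output below (and the later redefinition that wraps the same expression in rcell()), it was presumably along these lines:

```r
# Presumed definition (original chunk lost in rendering): returns the
# dimensions of the data subset as a "rows x cols" string.
foo <- function(df, ...) {
  paste(dim(df), collapse = " x ")
}
```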
we can start calculating the cell values individually:
+
+foo(df_UA)
# [1] "17 x 6"
+
+foo(df_VA)
# [1] "9 x 6"
+
+foo(df_WA)
# [1] "14 x 6"
+
+foo(df_UB)
# [1] "13 x 6"
+
+foo(df_VB)
# [1] "15 x 6"
+
+foo(df_WB)
# [1] "6 x 6"
+
+foo(df_UC)
# [1] "10 x 6"
+
+foo(df_VC)
# [1] "5 x 6"
+
+foo(df_WC)
# [1] "11 x 6"
+Now we are still missing the table structure:
+
+matrix(
+ list(
+ foo(df_UA),
+ foo(df_VA),
+ foo(df_WA),
+ foo(df_UB),
+ foo(df_VB),
+ foo(df_WB),
+ foo(df_UC),
+ foo(df_VC),
+ foo(df_WC)
+ ),
+ byrow = FALSE, ncol = 3
+)
# [,1] [,2] [,3]
+# [1,] "17 x 6" "13 x 6" "10 x 6"
+# [2,] "9 x 6" "15 x 6" "5 x 6"
+# [3,] "14 x 6" "6 x 6"  "11 x 6"
+In rtables
this type of tabulation is done with
+layouts
:
+lyt4 <- basic_table() %>%
+ split_cols_by("c1") %>%
+ split_rows_by("r1") %>%
+ analyze("x", foo)
+
+tbl4 <- build_table(lyt4, df)
+tbl4
# A B C
+# ————————————————————————————————
+# U
+# foo 17 x 6 13 x 6 10 x 6
+# V
+# foo 9 x 6 15 x 6 5 x 6
+# W
+# foo 14 x 6 6 x 6 11 x 6
+or, if we did not want to see the foo
 label, we would
+use:
+lyt5 <- basic_table() %>%
+ split_cols_by("c1") %>%
+ split_rows_by("r1") %>%
+ summarize_row_groups(cfun = foo, format = "xx")
+
+tbl5 <- build_table(lyt5, df)
+tbl5
# A B C
+# ———————————————————————————
+# 17 x 6 13 x 6 10 x 6
+# 9 x 6 15 x 6 5 x 6
+# 14 x 6 6 x 6 11 x 6
+but now the row labels have disappeared. This is because
+cfun
needs to define its row label. So let’s redefine
+foo
:
+foo <- function(df, labelstr) {
+ rcell(paste(dim(df), collapse = " x "), format = "xx", label = labelstr)
+}
+
+lyt6 <- basic_table() %>%
+ split_cols_by("c1") %>%
+ split_rows_by("r1") %>%
+ summarize_row_groups(cfun = foo)
+
+tbl6 <- build_table(lyt6, df)
+tbl6
# A B C
+# ————————————————————————————
+# U 17 x 6 13 x 6 10 x 6
+# V 9 x 6 15 x 6 5 x 6
+# W 14 x 6 6 x 6 11 x 6
+Now let’s calculate the mean of df$y
for pattern I:
+foo <- function(df, labelstr) {
+ rcell(mean(df$y), label = labelstr, format = "xx.xx")
+}
+
+lyt7 <- basic_table() %>%
+ split_cols_by("c1") %>%
+ split_rows_by("r1") %>%
+ summarize_row_groups(cfun = foo)
+
+tbl7 <- build_table(lyt7, df)
+tbl7
# A B C
+# —————————————————————————
+# U -1.00 1.00 3.00
+# V -4.00 -2.00 0.00
+# W -7.00 -5.00 -3.00
+Note that foo
 has the variable information hard-coded
+in the function body. Let’s try some alternatives returning to
+analyze()
:
+lyt8 <- basic_table() %>%
+ split_cols_by("c1") %>%
+ split_rows_by("r1") %>%
+ analyze("y", afun = mean)
+
+tbl8 <- build_table(lyt8, df)
+tbl8
# A B C
+# —————————————————————
+# U
+# mean -1 1 3
+# V
+# mean -4 -2 0
+# W
+# mean -7 -5 -3
+Note that the subset of the y
variable is passed as the
+x
argument to mean()
. We could also get the
+data.frame
instead of the variable:
+lyt9 <- basic_table() %>%
+ split_cols_by("c1") %>%
+ split_rows_by("r1") %>%
+ analyze("y", afun = function(df) mean(df$y))
+
+tbl9 <- build_table(lyt9, df)
+tbl9
# A B C
+# ——————————————————
+# U
+# y -1 1 3
+# V
+# y -4 -2 0
+# W
+# y -7 -5 -3
+which is in contrast to:
+
+lyt10 <- basic_table() %>%
+ split_cols_by("c1") %>%
+ split_rows_by("r1") %>%
+ analyze("y", afun = function(x) mean(x))
+
+tbl10 <- build_table(lyt10, df)
+tbl10
# A B C
+# ——————————————————
+# U
+# y -1 1 3
+# V
+# y -4 -2 0
+# W
+# y -7 -5 -3
+where the function receives the subset of y
.
Pattern I is an interesting one as we can add more row structure +(with further splits). Consider the following table:
+ A B C
+--------------------------------------
+U
+ u1 foo(<>) foo(<>) foo(<>)
+ u2 foo(<>) foo(<>) foo(<>)
+ u3 foo(<>) foo(<>) foo(<>)
+V
+ v1 foo(<>) foo(<>) foo(<>)
+ v2 foo(<>) foo(<>) foo(<>)
+ v3 foo(<>) foo(<>) foo(<>)
+W
+ w1 foo(<>) foo(<>) foo(<>)
+ w2 foo(<>) foo(<>) foo(<>)
+ w3 foo(<>) foo(<>) foo(<>)
+where <>
 represents the data subset corresponding
+to the cell. So for the cell U > u1, A we would have the
+subset:
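+The filter call producing the output below was dropped in rendering; it is the same subset the vignette spells out later when discussing content rows, namely:

```r
# Subset for the cell U > u1, A (this same filter is quoted verbatim
# later in the vignette in the content-rows discussion).
df %>% filter(r1 == "U", r2 == "u1", c1 == "A")
```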
# # A tibble: 2 × 6
+# c1 c2 r1 r2 x y
+# <fct> <chr> <fct> <chr> <dbl> <dbl>
+# 1 A a2 U u1 1.12 -1
+# 2 A a1 U u1 0.594 -1
+and so on. We can get this table as follows:
+
+lyt11 <- basic_table() %>%
+ split_cols_by("c1") %>%
+ split_rows_by("r1") %>%
+ split_rows_by("r2") %>%
+ summarize_row_groups(cfun = function(df, labelstr) {
+ rcell(mean(df$x), format = "xx.xx", label = paste("mean x for", labelstr))
+ })
+
+tbl11 <- build_table(lyt11, df)
+tbl11
# A B C
+# ———————————————————————————————————————
+# U
+# mean x for u3 -0.04 0.36 -0.25
+# mean x for u1 0.86 0.32 NA
+# mean x for u2 -0.28 0.38 0.08
+# V
+# mean x for v2 0.01 0.55 0.60
+# mean x for v3 -0.03 -0.30 1.06
+# mean x for v1 0.56 -0.27 -0.54
+# W
+# mean x for w1 -0.58 0.42 0.67
+# mean x for w3 0.56 0.69 -0.39
+# mean x for w2 -1.99 -0.10 0.53
+or, if we wanted to calculate two summaries per row split:
+
+s_mean_sd <- function(x) {
+ in_rows("mean (sd)" = rcell(c(mean(x), sd(x)), format = "xx.xx (xx.xx)"))
+}
+
+s_range <- function(x) {
+ in_rows("range" = rcell(range(x), format = "xx.xx - xx.xx"))
+}
+
+lyt12 <- basic_table() %>%
+ split_cols_by("c1") %>%
+ split_rows_by("r1") %>%
+ split_rows_by("r2") %>%
+ analyze("x", s_mean_sd, show_labels = "hidden") %>%
+ analyze("x", s_range, show_labels = "hidden")
+
+tbl12 <- build_table(lyt12, df)
# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[U]->r2[u3]
+# Warning in min(x): no non-missing arguments to min; returning Inf
+# Warning in max(x): no non-missing arguments to max; returning -Inf
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[U]->r2[u1]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[U]->r2[u2]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[V]->r2[v2]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[V]->r2[v3]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[V]->r2[v1]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[W]->r2[w1]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[W]->r2[w3]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[W]->r2[w2]
+
+tbl12
# A B C
+# ———————————————————————————————————————————————————————————
+# U
+# u3
+# mean (sd) -0.04 (1.18) 0.36 (1.41) -0.25 (0.72)
+# range -1.80 - 1.47 -1.28 - 2.40 -0.82 - 0.56
+# u1
+# mean (sd) 0.86 (0.38) 0.32 (0.51) NA
+# range 0.59 - 1.12 -0.48 - 0.94 Inf - -Inf
+# u2
+# mean (sd) -0.28 (0.96) 0.38 (0.67) 0.08 (0.91)
+# range -1.52 - 1.43 -0.39 - 0.82 -0.93 - 1.51
+# V
+# v2
+# mean (sd) 0.01 (0.25) 0.55 (1.14) 0.60 (0.03)
+# range -0.16 - 0.18 -0.84 - 1.60 0.58 - 0.62
+# v3
+# mean (sd) -0.03 (0.37) -0.30 (0.36) 1.06 (NA)
+# range -0.41 - 0.33 -0.62 - 0.03 1.06 - 1.06
+# v1
+# mean (sd) 0.56 (1.10) -0.27 (0.73) -0.54 (1.18)
+# range -0.16 - 2.17 -1.22 - 0.59 -1.38 - 0.29
+# W
+# w1
+# mean (sd) -0.58 (0.85) 0.42 (NA) 0.67 (0.39)
+# range -1.25 - 0.61 0.42 - 0.42 0.37 - 1.21
+# w3
+# mean (sd) 0.56 (0.85) 0.69 (NA) -0.39 (1.68)
+# range -0.71 - 1.98 0.69 - 0.69 -2.21 - 1.10
+# w2
+# mean (sd) -1.99 (NA) -0.10 (0.47) 0.53 (0.60)
+# range -1.99 - -1.99 -0.61 - 0.39 -0.10 - 1.16
+Which has the following structure:
+ A B C
+---------------------------------------------------------
+U
+ u1
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+ u2
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+ u3
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+V
+ v1
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+ v2
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+ v3
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+W
+ w1
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+ w2
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+ w3
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+The rows U
, u1
, u2
, …,
+W
, w1
, w2
, w3
are
+label rows and the other rows (with mean_sd
and
+range
) are data rows. Currently we do not have content rows
+in the table. Content rows summarize the data defined by their splitting
+(e.g. V > v1, B
). So if we wanted to add content rows at
+the r2
split level then we would get:
A B C
+---------------------------------------------------------
+U
+ u1 s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+ u2 s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+ u3 s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+V
+ v1 s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+ v2 s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+ v3 s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+W
+ w1 s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+ w2 s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+ w3 s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+where s_cfun_2
is the content function and either
+returns one row via rcell()
or multiple rows via
+in_rows()
. The data represented by <>
+for the content rows is the same data as for its descendant, i.e. for the
+U > u1, A
content row cell it is
+df %>% filter(r1 == "U", r2 == "u1", c1 == "A")
. Note
+that content functions cfun
operate only on data frames and
+not on vectors/variables, so they must take the df
argument.
+Further, a cfun
must also have the labelstr
+argument, which gives the split level's label. This way, the cfun
can
+define its own row name. In order to get the table above we can use the
+layout framework as follows:
+s_mean_sd <- function(x) {
+ in_rows("mean (sd)" = rcell(c(mean(x), sd(x)), format = "xx.xx (xx.xx)"))
+}
+
+s_range <- function(x) {
+ in_rows("range" = rcell(range(x), format = "xx.xx - xx.xx"))
+}
+
+s_cfun_2 <- function(df, labelstr) {
+ rcell(nrow(df), format = "xx", label = paste(labelstr, "(n)"))
+}
+
+lyt13 <- basic_table() %>%
+ split_cols_by("c1") %>%
+ split_rows_by("r1") %>%
+ split_rows_by("r2") %>%
+ summarize_row_groups(cfun = s_cfun_2) %>%
+ analyze("x", s_mean_sd, show_labels = "hidden") %>%
+ analyze("x", s_range, show_labels = "hidden")
+
+tbl13 <- build_table(lyt13, df)
# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[U]->r2[u3]
+# Warning in min(x): no non-missing arguments to min; returning Inf
+# Warning in max(x): no non-missing arguments to max; returning -Inf
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[U]->r2[u1]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[U]->r2[u2]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[V]->r2[v2]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[V]->r2[v3]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[V]->r2[v1]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[W]->r2[w1]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[W]->r2[w3]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[W]->r2[w2]
+
+tbl13
# A B C
+# ———————————————————————————————————————————————————————————
+# U
+# u3 (n) 6 5 3
+# mean (sd) -0.04 (1.18) 0.36 (1.41) -0.25 (0.72)
+# range -1.80 - 1.47 -1.28 - 2.40 -0.82 - 0.56
+# u1 (n) 2 5 0
+# mean (sd) 0.86 (0.38) 0.32 (0.51) NA
+# range 0.59 - 1.12 -0.48 - 0.94 Inf - -Inf
+# u2 (n) 9 3 7
+# mean (sd) -0.28 (0.96) 0.38 (0.67) 0.08 (0.91)
+# range -1.52 - 1.43 -0.39 - 0.82 -0.93 - 1.51
+# V
+# v2 (n) 2 4 2
+# mean (sd) 0.01 (0.25) 0.55 (1.14) 0.60 (0.03)
+# range -0.16 - 0.18 -0.84 - 1.60 0.58 - 0.62
+# v3 (n) 3 4 1
+# mean (sd) -0.03 (0.37) -0.30 (0.36) 1.06 (NA)
+# range -0.41 - 0.33 -0.62 - 0.03 1.06 - 1.06
+# v1 (n) 4 7 2
+# mean (sd) 0.56 (1.10) -0.27 (0.73) -0.54 (1.18)
+# range -0.16 - 2.17 -1.22 - 0.59 -1.38 - 0.29
+# W
+# w1 (n) 4 1 4
+# mean (sd) -0.58 (0.85) 0.42 (NA) 0.67 (0.39)
+# range -1.25 - 0.61 0.42 - 0.42 0.37 - 1.21
+# w3 (n) 9 1 3
+# mean (sd) 0.56 (0.85) 0.69 (NA) -0.39 (1.68)
+# range -0.71 - 1.98 0.69 - 0.69 -2.21 - 1.10
+# w2 (n) 1 4 4
+# mean (sd) -1.99 (NA) -0.10 (0.47) 0.53 (0.60)
+# range -1.99 - -1.99 -0.61 - 0.39 -0.10 - 1.16
+In the same manner, if we want content rows for the r1
+split we can do so as follows:
+lyt14 <- basic_table() %>%
+ split_cols_by("c1") %>%
+ split_rows_by("r1") %>%
+ summarize_row_groups(cfun = s_cfun_2) %>%
+ split_rows_by("r2") %>%
+ summarize_row_groups(cfun = s_cfun_2) %>%
+ analyze("x", s_mean_sd, show_labels = "hidden") %>%
+ analyze("x", s_range, show_labels = "hidden")
+
+tbl14 <- build_table(lyt14, df)
# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[U]->r2[u3]
+# Warning in min(x): no non-missing arguments to min; returning Inf
+# Warning in max(x): no non-missing arguments to max; returning -Inf
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[U]->r2[u1]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[U]->r2[u2]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[V]->r2[v2]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[V]->r2[v3]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[V]->r2[v1]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[W]->r2[w1]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[W]->r2[w3]
+# Warning: Non-unique sibling analysis table names. Using Labels instead. Use the table_names argument to analyze to avoid this when analyzing the same variable multiple times.
+# occured at (row) path: r1[W]->r2[w2]
+
+tbl14
# A B C
+# ———————————————————————————————————————————————————————————
+# U (n) 17 13 10
+# u3 (n) 6 5 3
+# mean (sd) -0.04 (1.18) 0.36 (1.41) -0.25 (0.72)
+# range -1.80 - 1.47 -1.28 - 2.40 -0.82 - 0.56
+# u1 (n) 2 5 0
+# mean (sd) 0.86 (0.38) 0.32 (0.51) NA
+# range 0.59 - 1.12 -0.48 - 0.94 Inf - -Inf
+# u2 (n) 9 3 7
+# mean (sd) -0.28 (0.96) 0.38 (0.67) 0.08 (0.91)
+# range -1.52 - 1.43 -0.39 - 0.82 -0.93 - 1.51
+# V (n) 9 15 5
+# v2 (n) 2 4 2
+# mean (sd) 0.01 (0.25) 0.55 (1.14) 0.60 (0.03)
+# range -0.16 - 0.18 -0.84 - 1.60 0.58 - 0.62
+# v3 (n) 3 4 1
+# mean (sd) -0.03 (0.37) -0.30 (0.36) 1.06 (NA)
+# range -0.41 - 0.33 -0.62 - 0.03 1.06 - 1.06
+# v1 (n) 4 7 2
+# mean (sd) 0.56 (1.10) -0.27 (0.73) -0.54 (1.18)
+# range -0.16 - 2.17 -1.22 - 0.59 -1.38 - 0.29
+# W (n) 14 6 11
+# w1 (n) 4 1 4
+# mean (sd) -0.58 (0.85) 0.42 (NA) 0.67 (0.39)
+# range -1.25 - 0.61 0.42 - 0.42 0.37 - 1.21
+# w3 (n) 9 1 3
+# mean (sd) 0.56 (0.85) 0.69 (NA) -0.39 (1.68)
+# range -0.71 - 1.98 0.69 - 0.69 -2.21 - 1.10
+# w2 (n) 1 4 4
+# mean (sd) -1.99 (NA) -0.10 (0.47) 0.53 (0.60)
+# range -1.99 - -1.99 -0.61 - 0.39 -0.10 - 1.16
+In pagination, content rows and label rows are repeated if a page
+break falls within a descendant of a content row. So, for example, if we were to
+split the following table at ***
:
A B C
+---------------------------------------------------------
+U
+ u1 (n) s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+***
+ range s_range(<>) s_range(<>) s_range(<>)
+ u2 (n) s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+Then we would get the following two tables:
+ A B C
+---------------------------------------------------------
+U
+ u1 (n) s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+and
+ A B C
+---------------------------------------------------------
+U
+ u1 (n) s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
+ u2 (n) s_cfun_2(<>) s_cfun_2(<>) s_cfun_2(<>)
+ mean_sd s_mean_sd(<>) s_mean_sd(<>) s_mean_sd(<>)
+ range s_range(<>) s_range(<>) s_range(<>)
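+A minimal sketch of requesting such pagination in code, assuming paginate_table() with a small lpp (lines per page) to force a break; exactly where the break lands depends on the rendered line count:

```r
# Sketch: paginate tbl14 into pages of at most 20 printed lines; label
# and content rows of the enclosing splits are repeated on each page.
pages <- paginate_table(tbl14, lpp = 20)
length(pages) # number of resulting page tables
```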
+vignettes/tabulation_dplyr.Rmd
In this vignette, we would like to discuss the similarities and
+differences between dplyr
 and rtables
.
Much of the rtables
framework focuses on
+tabulation/summarizing of data and then the visualization of the table.
+In this vignette, we focus on summarizing data using dplyr
+and contrast it to rtables
. We won’t pay attention to the
+table visualization/markup and just derive the cell content.
Using dplyr
to summarize data and gt
to
+visualize the table is a good approach if the tabulation is of a certain
+nature or complexity. However, there are tables such as the table
+created in the introduction
+vignette that take some effort to create with dplyr
. Part
+of the effort is due to the fact that when using dplyr
the
+table data is stored in data.frame
s or tibble
s
+which is not the most natural way to represent a table, as we will show
+in this vignette.
If you know a more elegant way of deriving the table content with
+dplyr
, please let us know and we will update the
+vignette.
Here is the table and data used in the introduction
+vignette:
+n <- 400
+
+set.seed(1)
+
+df <- tibble(
+ arm = factor(sample(c("Arm A", "Arm B"), n, replace = TRUE), levels = c("Arm A", "Arm B")),
+ country = factor(sample(c("CAN", "USA"), n, replace = TRUE, prob = c(.55, .45)), levels = c("CAN", "USA")),
+ gender = factor(sample(c("Female", "Male"), n, replace = TRUE), levels = c("Female", "Male")),
+ handed = factor(sample(c("Left", "Right"), n, prob = c(.6, .4), replace = TRUE), levels = c("Left", "Right")),
+ age = rchisq(n, 30) + 10
+) %>% mutate(
+ weight = 35 * rnorm(n, sd = .5) + ifelse(gender == "Female", 140, 180)
+)
+
+lyt <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("arm") %>%
+ split_cols_by("gender") %>%
+ split_rows_by("country") %>%
+ summarize_row_groups() %>%
+ split_rows_by("handed") %>%
+ summarize_row_groups() %>%
+ analyze("age", afun = mean, format = "xx.x")
+
+tbl <- build_table(lyt, df)
+tbl
# Arm A Arm B
+# Female Male Female Male
+# (N=96) (N=105) (N=92) (N=107)
+# ————————————————————————————————————————————————————————————
+# CAN 45 (46.9%) 64 (61.0%) 46 (50.0%) 62 (57.9%)
+# Left 32 (33.3%) 42 (40.0%) 26 (28.3%) 37 (34.6%)
+# mean 38.9 40.4 40.3 37.7
+# Right 13 (13.5%) 22 (21.0%) 20 (21.7%) 25 (23.4%)
+# mean 36.6 40.2 40.2 40.6
+# USA 51 (53.1%) 41 (39.0%) 46 (50.0%) 45 (42.1%)
+# Left 34 (35.4%) 19 (18.1%) 25 (27.2%) 25 (23.4%)
+# mean 40.4 39.7 39.2 40.1
+# Right 17 (17.7%) 22 (21.0%) 21 (22.8%) 20 (18.7%)
+# mean 36.9 39.8 38.5 39.0
We will start by deriving the first data cell in row 3 (note, rows 1 and 2 contain content cells; see the introduction vignette). Cell (3, 1) contains the mean age for left-handed female Canadians in “Arm A”:
+mean(df$age[df$country == "CAN" & df$arm == "Arm A" & df$gender == "Female" & df$handed == "Left"])
# [1] 38.86979
or with dplyr:
+df %>%
+ filter(country == "CAN", arm == "Arm A", gender == "Female", handed == "Left") %>%
+ summarise(mean_age = mean(age))
# # A tibble: 1 × 1
+# mean_age
+# <dbl>
+# 1 38.9
Further, dplyr gives us other verbs to easily get the average age of left-handed Canadians for each group defined by the 4 columns:
+df %>%
+ group_by(arm, gender) %>%
+ filter(country == "CAN", handed == "Left") %>%
+ summarise(mean_age = mean(age))
# `summarise()` has grouped output by 'arm'. You can override using the `.groups`
+# argument.
+# # A tibble: 4 × 3
+# # Groups: arm [2]
+# arm gender mean_age
+# <fct> <fct> <dbl>
+# 1 Arm A Female 38.9
+# 2 Arm A Male 40.4
+# 3 Arm B Female 40.3
+# 4 Arm B Male 37.7
+We can further get to all the average age cell values with:
+
+average_age <- df %>%
+ group_by(arm, gender, country, handed) %>%
+ summarise(mean_age = mean(age))
# `summarise()` has grouped output by 'arm', 'gender', 'country'. You can
+# override using the `.groups` argument.
+
+average_age
# # A tibble: 16 × 5
+# # Groups: arm, gender, country [8]
+# arm gender country handed mean_age
+# <fct> <fct> <fct> <fct> <dbl>
+# 1 Arm A Female CAN Left 38.9
+# 2 Arm A Female CAN Right 36.6
+# 3 Arm A Female USA Left 40.4
+# 4 Arm A Female USA Right 36.9
+# 5 Arm A Male CAN Left 40.4
+# 6 Arm A Male CAN Right 40.2
+# 7 Arm A Male USA Left 39.7
+# 8 Arm A Male USA Right 39.8
+# 9 Arm B Female CAN Left 40.3
+# 10 Arm B Female CAN Right 40.2
+# 11 Arm B Female USA Left 39.2
+# 12 Arm B Female USA Right 38.5
+# 13 Arm B Male CAN Left 37.7
+# 14 Arm B Male CAN Right 40.6
+# 15 Arm B Male USA Left 40.1
+# 16 Arm B Male USA Right 39.0
In rtables syntax, we need the following code to get the same content:
+lyt <- basic_table() %>%
+ split_cols_by("arm") %>%
+ split_cols_by("gender") %>%
+ split_rows_by("country") %>%
+ split_rows_by("handed") %>%
+ analyze("age", afun = mean, format = "xx.x")
+
+tbl <- build_table(lyt, df)
+tbl
# Arm A Arm B
+# Female Male Female Male
+# ————————————————————————————————————————
+# CAN
+# Left
+# mean 38.9 40.4 40.3 37.7
+# Right
+# mean 36.6 40.2 40.2 40.6
+# USA
+# Left
+# mean 40.4 39.7 39.2 40.1
+# Right
+# mean 36.9 39.8 38.5 39.0
As mentioned in the introduction to this vignette, please ignore the differences in arrangement and formatting of the data: it is possible to condense the rtable further, and it is possible to make the tibble look more like the reference table using the gt R package.
In terms of tabulation, for this example there was arguably not much added by rtables over dplyr.
Unlike in rtables, in dplyr the different levels of summarization are discrete computations, which we will then need to combine.
We first focus on the count and percentage information for handedness within each country (for each arm-gender pair), along with the analysis row mean values:
+
+c_h_df <- df %>%
+ group_by(arm, gender, country, handed) %>%
+ summarize(mean = mean(age), c_h_count = n()) %>%
+ ## we need the sum below to *not* be by country, so that we're dividing by the column counts
+ ungroup(country) %>%
+ # now the `handed` grouping has been removed, therefore we can calculate percent now:
+ mutate(n_col = sum(c_h_count), c_h_percent = c_h_count / n_col)
# `summarise()` has grouped output by 'arm', 'gender', 'country'. You can
+# override using the `.groups` argument.
+
+c_h_df
# # A tibble: 16 × 8
+# # Groups: arm, gender [4]
+# arm gender country handed mean c_h_count n_col c_h_percent
+# <fct> <fct> <fct> <fct> <dbl> <int> <int> <dbl>
+# 1 Arm A Female CAN Left 38.9 32 96 0.333
+# 2 Arm A Female CAN Right 36.6 13 96 0.135
+# 3 Arm A Female USA Left 40.4 34 96 0.354
+# 4 Arm A Female USA Right 36.9 17 96 0.177
+# 5 Arm A Male CAN Left 40.4 42 105 0.4
+# 6 Arm A Male CAN Right 40.2 22 105 0.210
+# 7 Arm A Male USA Left 39.7 19 105 0.181
+# 8 Arm A Male USA Right 39.8 22 105 0.210
+# 9 Arm B Female CAN Left 40.3 26 92 0.283
+# 10 Arm B Female CAN Right 40.2 20 92 0.217
+# 11 Arm B Female USA Left 39.2 25 92 0.272
+# 12 Arm B Female USA Right 38.5 21 92 0.228
+# 13 Arm B Male CAN Left 37.7 37 107 0.346
+# 14 Arm B Male CAN Right 40.6 25 107 0.234
+# 15 Arm B Male USA Left 40.1 25 107 0.234
+# 16 Arm B Male USA Right 39.0 20 107 0.187
which has 16 rows (cells), like the average_age data frame defined above. Next, we will derive the group information for countries:
+c_df <- df %>%
+ group_by(arm, gender, country) %>%
+ summarize(c_count = n()) %>%
+ # now the `handed` grouping has been removed, therefore we can calculate percent now:
+ mutate(n_col = sum(c_count), c_percent = c_count / n_col)
# `summarise()` has grouped output by 'arm', 'gender'. You can override using the
+# `.groups` argument.
+
+c_df
# # A tibble: 8 × 6
+# # Groups: arm, gender [4]
+# arm gender country c_count n_col c_percent
+# <fct> <fct> <fct> <int> <int> <dbl>
+# 1 Arm A Female CAN 45 96 0.469
+# 2 Arm A Female USA 51 96 0.531
+# 3 Arm A Male CAN 64 105 0.610
+# 4 Arm A Male USA 41 105 0.390
+# 5 Arm B Female CAN 46 92 0.5
+# 6 Arm B Female USA 46 92 0.5
+# 7 Arm B Male CAN 62 107 0.579
+# 8 Arm B Male USA 45 107 0.421
Finally, we left_join() the two levels of summary to get a data frame containing the full set of values which make up the body of our table (note, however, that they are not in the same order):
# Joining with `by = join_by(arm, gender, country, n_col)`
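The joining code itself is hidden in this rendering; a minimal sketch consistent with the join message above and with the full_dplyr object used in the spot checks further down:

```r
## Combine the two levels of summary. dplyr infers the join columns
## from the common names (arm, gender, country, n_col), as the
## message above reports.
full_dplyr <- left_join(c_h_df, c_df)
```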
Alternatively, we could calculate only the counts in c_h_df and use mutate() after the left_join() to divide the counts by the n_col values, which are more naturally calculated within c_df. This would simplify the creation of c_h_df somewhat by not requiring the explicit ungroup(), but it would prevent each level of summarization from being a self-contained set of computations.
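A sketch of that alternative (not shown in the vignette; c_df is assumed to be built exactly as above, carrying the n_col column):

```r
## Counts only at the handedness level; no ungroup() needed here
c_h_counts <- df %>%
  group_by(arm, gender, country, handed) %>%
  summarize(mean = mean(age), c_h_count = n())

## Percents are computed after the join, using n_col from c_df
full_alt <- c_df %>%
  left_join(c_h_counts) %>%
  mutate(c_h_percent = c_h_count / n_col)
```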
The rtables call, in contrast, is:
+lyt <- basic_table(show_colcounts = TRUE) %>%
+ split_cols_by("arm") %>%
+ split_cols_by("gender") %>%
+ split_rows_by("country") %>%
+ summarize_row_groups() %>%
+ split_rows_by("handed") %>%
+ summarize_row_groups() %>%
+ analyze("age", afun = mean, format = "xx.x")
+
+tbl <- build_table(lyt, df)
+tbl
# Arm A Arm B
+# Female Male Female Male
+# (N=96) (N=105) (N=92) (N=107)
+# ————————————————————————————————————————————————————————————
+# CAN 45 (46.9%) 64 (61.0%) 46 (50.0%) 62 (57.9%)
+# Left 32 (33.3%) 42 (40.0%) 26 (28.3%) 37 (34.6%)
+# mean 38.9 40.4 40.3 37.7
+# Right 13 (13.5%) 22 (21.0%) 20 (21.7%) 25 (23.4%)
+# mean 36.6 40.2 40.2 40.6
+# USA 51 (53.1%) 41 (39.0%) 46 (50.0%) 45 (42.1%)
+# Left 34 (35.4%) 19 (18.1%) 25 (27.2%) 25 (23.4%)
+# mean 40.4 39.7 39.2 40.1
+# Right 17 (17.7%) 22 (21.0%) 21 (22.8%) 20 (18.7%)
+# mean 36.9 39.8 38.5 39.0
We can now spot check that the values are the same:
+
+frm_rtables_h <- cell_values(
+ tbl,
+ rowpath = c("country", "CAN", "handed", "Right", "@content"),
+ colpath = c("arm", "Arm B", "gender", "Female")
+)[[1]]
+frm_rtables_h
# [1] 20.0000000 0.2173913
+
+frm_dplyr_h <- full_dplyr %>%
+ filter(country == "CAN" & handed == "Right" & arm == "Arm B" & gender == "Female") %>%
+ select(c_h_count, c_h_percent)
+
+frm_dplyr_h
# # A tibble: 1 × 2
+# c_h_count c_h_percent
+# <int> <dbl>
+# 1 20 0.217
+
+frm_rtables_c <- cell_values(
+ tbl,
+ rowpath = c("country", "CAN", "@content"),
+ colpath = c("arm", "Arm A", "gender", "Male")
+)[[1]]
+
+frm_rtables_c
# [1] 64.0000000 0.6095238
+
+frm_dplyr_c <- full_dplyr %>%
+ filter(country == "CAN" & arm == "Arm A" & gender == "Male") %>%
+ select(c_count, c_percent)
+
+frm_dplyr_c
# # A tibble: 2 × 2
+# c_count c_percent
+# <int> <dbl>
+# 1 64 0.610
+# 2 64 0.610
Further, for this particular table, deriving the cell values is hopefully a bit more straightforward with the rtables syntax than with dplyr.
In this vignette we learned that:
- tables can be derived with dplyr, using data.frame or tibble as the data structure
- dplyr keeps simple things simple
- rtables streamlines the construction of complex tables
We recommend that you continue reading the clinical_trials vignette, where we create a number of more advanced tables using layouts.
vignettes/title_footer.Rmd
An rtables table can be annotated with three types of header (title) information, as well as three types of footer information.
Header information comes in two forms that are specified directly +(main title and subtitles), as well as one that is populated +automatically as necessary (page title, which we will see in the next +section).
+Similarly, footer materials come with two directly specified +components: main footer and provenance footer, in addition to one that +is computed when necessary: referential footnotes.
basic_table() accepts the values for each static title and footer element during layout construction:
+library(rtables)
+library(dplyr)
+lyt <- basic_table(
+ title = "Study XXXXXXXX",
+ subtitles = c("subtitle YYYYYYYYYY", "subtitle2 ZZZZZZZZZ"),
+ main_footer = "Analysis was done using cool methods that are correct",
+ prov_footer = "file: /path/to/stuff/that/lives/there HASH:1ac41b242a"
+) %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX", split_fun = drop_split_levels) %>%
+ split_rows_by("STRATA1") %>%
+ analyze("AGE", mean, format = "xx.x")
+
+tbl <- build_table(lyt, DM)
+cat(export_as_txt(tbl, paginate = TRUE, page_break = "\n\n\n"))
# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+#
+# ——————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————
+# F
+# A
+# mean 30.9 32.9 36.0
+# B
+# mean 34.9 32.9 34.4
+# C
+# mean 35.2 36.0 34.3
+# M
+# A
+# mean 35.1 31.1 35.6
+# B
+# mean 36.6 32.1 34.4
+# C
+# mean 37.4 32.8 32.8
+# ——————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
We often want to split tables based on the values of one or more variables (e.g., lab measurement) and then paginate separately within each of those table subsections. In rtables we do this via page-by row splits.
Row splits can be declared as page-by splits by setting page_by = TRUE in the split_rows_by*() call, as below.
When page-by splits are present, page titles are generated automatically by appending the split value (typically a factor level, though it need not be) to the page_prefix, separated by a ":". By default, page_prefix is the name of the variable being split.
+lyt2 <- basic_table(
+ title = "Study XXXXXXXX",
+ subtitles = c("subtitle YYYYYYYYYY", "subtitle2 ZZZZZZZZZ"),
+ main_footer = "Analysis was done using cool methods that are correct",
+ prov_footer = "file: /path/to/stuff/that/lives/there HASH:1ac41b242a"
+) %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX", page_by = TRUE, page_prefix = "Patient Subset - Gender", split_fun = drop_split_levels) %>%
+ split_rows_by("STRATA1") %>%
+ analyze("AGE", mean, format = "xx.x")
+
+tbl2 <- build_table(lyt2, DM)
+cat(export_as_txt(tbl2, paginate = TRUE, page_break = "\n\n~~~~ Page Break ~~~~\n\n"))
# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+# Patient Subset - Gender: F
+#
+# ——————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————
+# A
+# mean 30.9 32.9 36.0
+# B
+# mean 34.9 32.9 34.4
+# C
+# mean 35.2 36.0 34.3
+# ——————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
+#
+#
+# ~~~~ Page Break ~~~~
+#
+# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+# Patient Subset - Gender: M
+#
+# ——————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————
+# A
+# mean 35.1 31.1 35.6
+# B
+# mean 36.6 32.1 34.4
+# C
+# mean 37.4 32.8 32.8
+# ——————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
Page-by row splits can be nested, but only within other page-by splits; they cannot be nested within traditional row splits. In this case, a page title for each page-by split will be present on every resulting page, as seen below:
+
+lyt3 <- basic_table(
+ title = "Study XXXXXXXX",
+ subtitles = c("subtitle YYYYYYYYYY", "subtitle2 ZZZZZZZZZ"),
+ main_footer = "Analysis was done using cool methods that are correct",
+ prov_footer = "file: /path/to/stuff/that/lives/there HASH:1ac41b242a"
+) %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX", page_by = TRUE, page_prefix = "Patient Subset - Gender", split_fun = drop_split_levels) %>%
+ split_rows_by("STRATA1", page_by = TRUE, page_prefix = "Stratification - Strata") %>%
+ analyze("AGE", mean, format = "xx.x")
+
+tbl3 <- build_table(lyt3, DM)
+cat(export_as_txt(tbl3, paginate = TRUE, page_break = "\n\n~~~~ Page Break ~~~~\n\n"))
# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+# Patient Subset - Gender: F
+# Stratification - Strata: A
+#
+# ——————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————
+# mean 30.9 32.9 36.0
+# ——————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
+#
+#
+# ~~~~ Page Break ~~~~
+#
+# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+# Patient Subset - Gender: F
+# Stratification - Strata: B
+#
+# ——————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————
+# mean 34.9 32.9 34.4
+# ——————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
+#
+#
+# ~~~~ Page Break ~~~~
+#
+# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+# Patient Subset - Gender: F
+# Stratification - Strata: C
+#
+# ——————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————
+# mean 35.2 36.0 34.3
+# ——————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
+#
+#
+# ~~~~ Page Break ~~~~
+#
+# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+# Patient Subset - Gender: M
+# Stratification - Strata: A
+#
+# ——————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————
+# mean 35.1 31.1 35.6
+# ——————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
+#
+#
+# ~~~~ Page Break ~~~~
+#
+# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+# Patient Subset - Gender: M
+# Stratification - Strata: B
+#
+# ——————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————
+# mean 36.6 32.1 34.4
+# ——————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
+#
+#
+# ~~~~ Page Break ~~~~
+#
+# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+# Patient Subset - Gender: M
+# Stratification - Strata: C
+#
+# ——————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————
+# mean 37.4 32.8 32.8
+# ——————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
+Referential footnotes are footnotes associated with a particular +component of a table: a column, a row, or a cell. They can be added +during tabulation via analysis functions, but they can also be added +post-hoc once a table is created.
+They are rendered as a number within curly braces within the table +body, row, or column labels, followed by a message associated with that +number printed below the table during rendering.
+
+afun <- function(df, .var, .spl_context) {
+ val <- .spl_context$value[NROW(.spl_context)]
+ rw_fnotes <- if (val == "C") list("This is strata level C for these patients") else list()
+ cl_fnotes <- if (val == "B" && df[1, "ARM", drop = TRUE] == "C: Combination") {
+ list("these Strata B patients got the drug combination")
+ } else {
+ list()
+ }
+
+ in_rows(
+ mean = mean(df[[.var]]),
+ .row_footnotes = rw_fnotes,
+ .cell_footnotes = cl_fnotes,
+ .formats = c(mean = "xx.x")
+ )
+}
+
+lyt <- basic_table(
+ title = "Study XXXXXXXX",
+ subtitles = c("subtitle YYYYYYYYYY", "subtitle2 ZZZZZZZZZ"),
+ main_footer = "Analysis was done using cool methods that are correct",
+ prov_footer = "file: /path/to/stuff/that/lives/there HASH:1ac41b242a"
+) %>%
+ split_cols_by("ARM") %>%
+ split_rows_by("SEX", page_by = TRUE, page_prefix = "Patient Subset - Gender", split_fun = drop_split_levels) %>%
+ split_rows_by("STRATA1") %>%
+ analyze("AGE", afun, format = "xx.x")
+
+tbl <- build_table(lyt, DM)
+cat(export_as_txt(tbl, paginate = TRUE, page_break = "\n\n\n"))
# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+# Patient Subset - Gender: F
+#
+# ——————————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————————
+# A
+# mean 30.9 32.9 36.0
+# B
+# mean 34.9 32.9 34.4 {1}
+# C
+# mean {2} 35.2 36.0 34.3
+# ——————————————————————————————————————————————————————
+#
+# {1} - these Strata B patients got the drug combination
+# {2} - This is strata level C for these patients
+# ——————————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
+#
+#
+#
+# Study XXXXXXXX
+# subtitle YYYYYYYYYY
+# subtitle2 ZZZZZZZZZ
+# Patient Subset - Gender: M
+#
+# ——————————————————————————————————————————————————————
+# A: Drug X B: Placebo C: Combination
+# ——————————————————————————————————————————————————————
+# A
+# mean 35.1 31.1 35.6
+# B
+# mean 36.6 32.1 34.4 {1}
+# C
+# mean {2} 37.4 32.8 32.8
+# ——————————————————————————————————————————————————————
+#
+# {1} - these Strata B patients got the drug combination
+# {2} - This is strata level C for these patients
+# ——————————————————————————————————————————————————————
+#
+# Analysis was done using cool methods that are correct
+#
+# file: /path/to/stuff/that/lives/there HASH:1ac41b242a
We note that, typically, the type of footnote added within the analysis function would depend on the computations done to calculate the cell value(s) (e.g., a model failing to converge); here we simply use context information as an illustrative proxy for that.
+The procedure for adding footnotes to content (summary row) rows or +cells is identical to the above, when done within a content +function.
+In addition to inserting referential footnotes at tabulation time +within our analysis functions, we can also annotate our tables with them +post-hoc.
+This is also the only way to add footnotes to column +labels, as those cannot be controlled within an analysis or content +function.
+
+## from ?tolower example slightly modified
+.simpleCap <- function(x) {
+ if (length(x) > 1) {
+ return(sapply(x, .simpleCap))
+ }
+ s <- strsplit(tolower(x), " ")[[1]]
+ paste(toupper(substring(s, 1, 1)), substring(s, 2), sep = "", collapse = " ")
+}
+
+adsl2 <- ex_adsl %>%
+ filter(SEX %in% c("M", "F") & RACE %in% (levels(RACE)[1:3])) %>%
+ ## we trim the level names here solely due to space considerations
+ mutate(ethnicity = .simpleCap(gsub("(.*)OR.*", "\\1", RACE)), RACE = factor(RACE))
+
+lyt2 <- basic_table() %>%
+ split_cols_by("ARM") %>%
+ split_cols_by("SEX", split_fun = drop_split_levels) %>%
+ split_rows_by("RACE", labels_var = "ethnicity", split_fun = drop_split_levels) %>%
+ summarize_row_groups() %>%
+ analyze(c("AGE", "STRATA1"))
+
+tbl2 <- build_table(lyt2, adsl2)
+tbl2
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ———————————————————————————————————————————————————————————————————————————————————————
+# Asian 41 (53.9%) 25 (54.3%) 36 (52.2%) 30 (60.0%) 39 (60.9%) 32 (57.1%)
+# AGE
+# Mean 31.22 34.60 35.06 38.63 36.44 37.66
+# STRATA1
+# A 11 10 14 10 11 7
+# B 11 9 15 7 11 14
+# C 19 6 7 13 17 11
+# Black 18 (23.7%) 12 (26.1%) 16 (23.2%) 12 (24.0%) 14 (21.9%) 14 (25.0%)
+# AGE
+# Mean 34.06 34.58 33.88 36.33 33.21 34.21
+# STRATA1
+# A 5 2 5 6 3 7
+# B 6 5 3 4 4 4
+# C 7 5 8 2 7 3
+# White 17 (22.4%) 9 (19.6%) 17 (24.6%) 8 (16.0%) 11 (17.2%) 10 (17.9%)
+# AGE
+# Mean 34.12 40.00 32.41 34.62 33.00 30.80
+# STRATA1
+# A 5 3 3 3 3 5
+# B 5 4 8 4 5 2
+# C 7 2 6 1 3 3
We do this with the fnotes_at_path<- function, which accepts a row path, a column path, and a value for the full set of footnotes at the specified location (NULL or a character vector).
A non-NULL row path with a NULL column path specifies that the footnote(s) should be attached to the row, while a NULL row path with a non-NULL column path indicates that they go with the column. Both being non-NULL indicates a cell (and the paths must resolve to an individual cell).
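The code producing the output below is hidden in this rendering; a plausible sketch, consistent with footnotes {1} and {2} in the output (the row path matches the row_paths_summary() output shown further down):

```r
## Row footnotes: rowpath given, colpath left NULL, so both
## footnotes attach to the "Asian" content row
fnotes_at_path(
  tbl2,
  rowpath = c("RACE", "ASIAN", "@content", "Asian")
) <- c("hi", "there")
tbl2
```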
# A: Drug X B: Placebo C: Combination
+# F M F M F M
+# ——————————————————————————————————————————————————————————————————————————————————————————
+# Asian {1, 2} 41 (53.9%) 25 (54.3%) 36 (52.2%) 30 (60.0%) 39 (60.9%) 32 (57.1%)
+# AGE
+# Mean 31.22 34.60 35.06 38.63 36.44 37.66
+# STRATA1
+# A 11 10 14 10 11 7
+# B 11 9 15 7 11 14
+# C 19 6 7 13 17 11
+# Black 18 (23.7%) 12 (26.1%) 16 (23.2%) 12 (24.0%) 14 (21.9%) 14 (25.0%)
+# AGE
+# Mean 34.06 34.58 33.88 36.33 33.21 34.21
+# STRATA1
+# A 5 2 5 6 3 7
+# B 6 5 3 4 4 4
+# C 7 5 8 2 7 3
+# White 17 (22.4%) 9 (19.6%) 17 (24.6%) 8 (16.0%) 11 (17.2%) 10 (17.9%)
+# AGE
+# Mean 34.12 40.00 32.41 34.62 33.00 30.80
+# STRATA1
+# A 5 3 3 3 3 5
+# B 5 4 8 4 5 2
+# C 7 2 6 1 3 3
+# ——————————————————————————————————————————————————————————————————————————————————————————
+#
+# {1} - hi
+# {2} - there
+# ——————————————————————————————————————————————————————————————————————————————————————————
+
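The column-footnote code is likewise hidden; a plausible sketch, consistent with the {NA} annotation on the "B: Placebo" column label in the output below:

```r
## Column footnote: NULL rowpath with a non-NULL colpath attaches
## the footnote to the column label (rendered as {NA} in this output)
fnotes_at_path(
  tbl2,
  rowpath = NULL,
  colpath = c("ARM", "B: Placebo")
) <- "this is a placebo"
tbl2
```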
+# A: Drug X B: Placebo {NA} C: Combination
+# F M F M F M
+# ——————————————————————————————————————————————————————————————————————————————————————————
+# Asian {1, 2} 41 (53.9%) 25 (54.3%) 36 (52.2%) 30 (60.0%) 39 (60.9%) 32 (57.1%)
+# AGE
+# Mean 31.22 34.60 35.06 38.63 36.44 37.66
+# STRATA1
+# A 11 10 14 10 11 7
+# B 11 9 15 7 11 14
+# C 19 6 7 13 17 11
+# Black 18 (23.7%) 12 (26.1%) 16 (23.2%) 12 (24.0%) 14 (21.9%) 14 (25.0%)
+# AGE
+# Mean 34.06 34.58 33.88 36.33 33.21 34.21
+# STRATA1
+# A 5 2 5 6 3 7
+# B 6 5 3 4 4 4
+# C 7 5 8 2 7 3
+# White 17 (22.4%) 9 (19.6%) 17 (24.6%) 8 (16.0%) 11 (17.2%) 10 (17.9%)
+# AGE
+# Mean 34.12 40.00 32.41 34.62 33.00 30.80
+# STRATA1
+# A 5 3 3 3 3 5
+# B 5 4 8 4 5 2
+# C 7 2 6 1 3 3
+# ——————————————————————————————————————————————————————————————————————————————————————————
+#
+# {1} - hi
+# {2} - there
+# {NA} - this is a placebo
+# ——————————————————————————————————————————————————————————————————————————————————————————
+Note to step into a content row we must add that to the path, even +though we didn’t need it to put a footnote on the full row.
Currently, content rows are by default named with the label rather than the name of the corresponding facet. This is reflected in the output of, e.g., row_paths_summary().
+row_paths_summary(tbl2)
# rowname node_class path
+# ———————————————————————————————————————————————————————————————————————————
+# Asian ContentRow RACE, ASIAN, @content, Asian
+# AGE LabelRow RACE, ASIAN, AGE
+# Mean DataRow RACE, ASIAN, AGE, Mean
+# STRATA1 LabelRow RACE, ASIAN, STRATA1
+# A DataRow RACE, ASIAN, STRATA1, A
+# B DataRow RACE, ASIAN, STRATA1, B
+# C DataRow RACE, ASIAN, STRATA1, C
+# Black ContentRow RACE, BLACK OR AFRICAN AMERICAN, @content, Black
+# AGE LabelRow RACE, BLACK OR AFRICAN AMERICAN, AGE
+# Mean DataRow RACE, BLACK OR AFRICAN AMERICAN, AGE, Mean
+# STRATA1 LabelRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1
+# A DataRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, A
+# B DataRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, B
+# C DataRow RACE, BLACK OR AFRICAN AMERICAN, STRATA1, C
+# White ContentRow RACE, WHITE, @content, White
+# AGE LabelRow RACE, WHITE, AGE
+# Mean DataRow RACE, WHITE, AGE, Mean
+# STRATA1 LabelRow RACE, WHITE, STRATA1
+# A DataRow RACE, WHITE, STRATA1, A
+# B DataRow RACE, WHITE, STRATA1, B
+# C DataRow RACE, WHITE, STRATA1, C
+So we can add our footnotes to the cell like so:
+
+fnotes_at_path(
+ tbl2,
+ rowpath = c("RACE", "ASIAN", "@content", "Asian"),
+ colpath = c("ARM", "B: Placebo", "SEX", "F")
+) <- "These asian women got placebo treatments"
+tbl2
# A: Drug X B: Placebo {NA} C: Combination
+# F M F M F M
+# ——————————————————————————————————————————————————————————————————————————————————————————————
+# Asian {1, 2} 41 (53.9%) 25 (54.3%) 36 (52.2%) {3} 30 (60.0%) 39 (60.9%) 32 (57.1%)
+# AGE
+# Mean 31.22 34.60 35.06 38.63 36.44 37.66
+# STRATA1
+# A 11 10 14 10 11 7
+# B 11 9 15 7 11 14
+# C 19 6 7 13 17 11
+# Black 18 (23.7%) 12 (26.1%) 16 (23.2%) 12 (24.0%) 14 (21.9%) 14 (25.0%)
+# AGE
+# Mean 34.06 34.58 33.88 36.33 33.21 34.21
+# STRATA1
+# A 5 2 5 6 3 7
+# B 6 5 3 4 4 4
+# C 7 5 8 2 7 3
+# White 17 (22.4%) 9 (19.6%) 17 (24.6%) 8 (16.0%) 11 (17.2%) 10 (17.9%)
+# AGE
+# Mean 34.12 40.00 32.41 34.62 33.00 30.80
+# STRATA1
+# A 5 3 3 3 3 5
+# B 5 4 8 4 5 2
+# C 7 2 6 1 3 3
+# ——————————————————————————————————————————————————————————————————————————————————————————————
+#
+# {1} - hi
+# {2} - there
+# {3} - These asian women got placebo treatments
+# {NA} - this is a placebo
+# ——————————————————————————————————————————————————————————————————————————————————————————————
+