diff --git a/NAMESPACE b/NAMESPACE index caf664eb2..a53328608 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -141,7 +141,6 @@ export(ref_msg) export(ref_symbol) export(remove_split_levels) export(reorder_split_levels) -export(result_df_specs) export(rheader) export(rm_all_colcounts) export(row_cells) diff --git a/NEWS.md b/NEWS.md index c3f2b54c3..700823eb7 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,11 +1,17 @@ ## rtables 0.6.10.9004 -### Miscellaneous - * Split `docx` document generation to the new package [`rtables.officer`](https://github.com/insightsengineering/rtables.officer). - ### New Features * Experimental pagination is now possible in `tt_as_flextable()` and `export_as_docx()`. * Added handling of widths in `tt_as_flextable()`. Now it is possible to change column widths for `.docx` exports. + * Initialized vignette about quality control outputs of `as_result_df()`. + * Initialized parameter `make_ard` output for single-line statistical outputs. + +### Miscellaneous + * Split `docx` document generation to the new package [`rtables.officer`](https://github.com/insightsengineering/rtables.officer). + * Refactored `as_result_df()` parameters `as_strings` and `as_viewer` into `data_format = c("full_precision", "strings", "numeric")` following the same outputs. + * Refactored `as_result_df()` to have a standard behavior, with all the relevant parameters, and a possibility to add personalized `spec`. + * Removed `result_df_specs()`, because `as_result_df()` was a too shallow wrapper. + * Merged behavior of `as_result_df()` parameters `as_is` and `simplify` parameters to remove structural information. ### Bug Fixes * Fixed bug that was keeping indentation space characters in top left information when making a `flextable` from a `TableTree` object. diff --git a/R/tt_as_df.R b/R/tt_as_df.R index 911d5145d..409ad660a 100644 --- a/R/tt_as_df.R +++ b/R/tt_as_df.R @@ -1,38 +1,29 @@ -# data.frame output ------------------------------------------------------------ - +# as_result_df ------------------------------------------------------------ #' Generate a result data frame #' #' Collection of utilities to extract `data.frame` objects from `TableTree` objects. #' #' @inheritParams gen_args -#' @param spec (`string`)\cr the specification to use to extract the result data frame. See Details below. -#' @param simplify (`flag`)\cr whether the result data frame should only have labels and result columns visible. -#' @param ... additional arguments passed to spec-specific result data frame conversion function. Currently it can be -#' one or more of the following parameters (valid only for `v0_experimental` spec. for now): -#' - `expand_colnames`: when `TRUE`, the result data frame will have expanded column names above the usual -#' output. This is useful when the result data frame is used for further processing. -#' - `simplify`: when `TRUE`, the result data frame will have only visible labels and result columns. -#' - `as_strings`: when `TRUE`, the result data frame will have all values as strings, as they appear -#' in the final table (it can also be retrieved from `matrix_form(tt)$strings`). This is also true for -#' column counts if `expand_colnames = TRUE`. -#' - `as_viewer`: when `TRUE`, the result data frame will have all values as they appear in the final table, -#' i.e. with the same precision and numbers, but in easy-to-use numeric form. -#' - `keep_label_rows`: when `TRUE`, the result data frame will have all labels as they appear in the -#' final table. -#' - `as_is`: when `TRUE`, the result data frame will have all the values as they appear in the final table, -#' but without information about the row structure. Row labels will be assigned to rows so to work well -#' with [df_to_tt()]. -#' -#' @details `as_result_df()`: Result data frame specifications may differ in the exact information -#' they include and the form in which they represent it. Specifications whose names end in "_experimental" -#' are subject to change without notice, but specifications without the "_experimental" -#' suffix will remain available *including any bugs in their construction* indefinitely. +#' @param spec (`function`)\cr function that generates the result data frame from a table (`TableTree`). +#' It defaults to `NULL`, for standard processing. +#' @param expand_colnames (`flag`)\cr when `TRUE`, the result data frame will have expanded column +#' names above the usual output. This is useful when the result data frame is used for further processing. +#' @param data_format (`string`)\cr the format of the data in the result data frame. It can be one value +#' between `"full_precision"` (default), `"strings"`, and `"numeric"`. The last two values show the numeric +#' data with the visible precision. +#' @param make_ard (`flag`)\cr when `TRUE`, the result data frame will have only one statistic per row. +#' @param keep_label_rows (`flag`)\cr when `TRUE`, the result data frame will have all labels +#' as they appear in the final table. +#' @param simplify (`flag`)\cr when `TRUE`, the result data frame will have only visible labels and +#' result columns. Consider showing also label rows with `keep_label_rows = TRUE`. This output can be +#' used again to create a `TableTree` object with [df_to_tt()]. +#' @param ... additional arguments passed to spec-specific result data frame function (`spec`). #' #' @return #' * `as_result_df` returns a result `data.frame`. #' -#' @seealso [df_to_tt()] when using `as_is = TRUE` and [formatters::make_row_df()] to have a comprehensive view of the -#' hierarchical structure of the rows. +#' @seealso [df_to_tt()] when using `simplify = TRUE` and [formatters::make_row_df()] to have a +#' comprehensive view of the hierarchical structure of the rows. #' #' @examples #' lyt <- basic_table() %>% @@ -41,220 +32,214 @@ #' analyze(c("AGE", "BMRKR2")) #' #' tbl <- build_table(lyt, ex_adsl) -#' as_result_df(tbl) +#' as_result_df(tbl, simplify = TRUE) #' #' @name data.frame_export #' @export -as_result_df <- function(tt, spec = "v0_experimental", simplify = FALSE, ...) { +as_result_df <- function(tt, spec = NULL, + data_format = c("full_precision", "strings", "numeric"), + make_ard = FALSE, + expand_colnames = FALSE, + keep_label_rows = FALSE, + simplify = FALSE, + ...) { + data_format <- data_format[[1]] checkmate::assert_class(tt, "VTableTree") - checkmate::assert_string(spec) + checkmate::assert_function(spec, null.ok = TRUE) + checkmate::assert_choice(data_format[[1]], choices = eval(formals(as_result_df)[["data_format"]])) + checkmate::assert_flag(make_ard) + checkmate::assert_flag(expand_colnames) + checkmate::assert_flag(keep_label_rows) checkmate::assert_flag(simplify) if (nrow(tt) == 0) { return(sanitize_table_struct(tt)) } - result_df_fun <- lookup_result_df_specfun(spec) - out <- result_df_fun(tt, ...) - - if (simplify) { - out <- .simplify_result_df(out) + if (make_ard) { + simplify <- FALSE + expand_colnames <- TRUE + keep_label_rows <- FALSE } - out -} - -# Function that selects specific outputs from the result data frame -.simplify_result_df <- function(df) { - col_df <- colnames(df) - row_names_col <- which(col_df == "row_name") - result_cols <- seq(which(col_df == "node_class") + 1, length(col_df)) - - df[, c(row_names_col, result_cols)] -} - -# Not used in rtables -# .split_colwidths <- function(ptabs, nctot, colwidths) { -# ret <- list() -# i <- 1L -# -# rlw <- colwidths[1] -# colwidths <- colwidths[-1] -# donenc <- 0 -# while (donenc < nctot) { -# curnc <- NCOL(ptabs[[i]]) -# ret[[i]] <- c(rlw, colwidths[seq_len(curnc)]) -# colwidths <- colwidths[-1 * seq_len(curnc)] -# donenc <- donenc + curnc -# i <- i + 1 -# } -# ret -# } - -#' @describeIn data.frame_export A list of functions that extract result data frames from `TableTree`s. -#' -#' @return -#' * `result_df_specs()` returns a named list of result data frame extraction functions by "specification". -#' -#' @examples -#' result_df_specs() -#' -#' @export -result_df_specs <- function() { - list(v0_experimental = result_df_v0_experimental) -} - -lookup_result_df_specfun <- function(spec) { - if (!(spec %in% names(result_df_specs()))) { - stop( - "unrecognized result data frame specification: ", - spec, - "If that specification is correct you may need to update your version of rtables" - ) - } - result_df_specs()[[spec]] -} + if (is.null(spec)) { + raw_cvals <- cell_values(tt) + ## if the table has one row and multiple columns, sometimes the cell values returns a list of the cell values + ## rather than a list of length 1 representing the single row. This is bad but may not be changeable + ## at this point. + if (nrow(tt) == 1 && length(raw_cvals) > 1) { + raw_cvals <- list(raw_cvals) + } -result_df_v0_experimental <- function(tt, - as_viewer = FALSE, - as_strings = FALSE, - expand_colnames = FALSE, - keep_label_rows = FALSE, - as_is = FALSE) { - checkmate::assert_flag(as_viewer) - checkmate::assert_flag(as_strings) - checkmate::assert_flag(expand_colnames) - checkmate::assert_flag(keep_label_rows) - checkmate::assert_flag(as_is) + # Flatten the list of lists (rows) of cell values into a data frame + cellvals <- as.data.frame(do.call(rbind, raw_cvals)) + row.names(cellvals) <- NULL - if (as_is) { - keep_label_rows <- TRUE - expand_colnames <- FALSE - } + if (nrow(tt) == 1 && ncol(tt) == 1) { + colnames(cellvals) <- names(raw_cvals) + } - raw_cvals <- cell_values(tt) - ## if the table has one row and multiple columns, sometimes the cell values returns a list of the cell values - ## rather than a list of length 1 representing the single row. This is bad but may not be changeable - ## at this point. - if (nrow(tt) == 1 && length(raw_cvals) > 1) { - raw_cvals <- list(raw_cvals) - } + if (data_format %in% c("strings", "numeric")) { + # we keep previous calculations to check the format of the data + mf_tt <- matrix_form(tt) + mf_result_chars <- mf_strings(mf_tt)[-seq_len(mf_nlheader(mf_tt)), -1] + mf_result_chars <- .remove_empty_elements(mf_result_chars) + mf_result_numeric <- as.data.frame( + .make_numeric_char_mf(mf_result_chars) + ) + mf_result_chars <- as.data.frame(mf_result_chars) + if (!setequal(dim(mf_result_numeric), dim(cellvals)) || !setequal(dim(mf_result_chars), dim(cellvals))) { + stop( + "The extracted numeric data.frame does not have the same dimension of the", + " cell values extracted with cell_values(). This is a bug. Please report it." + ) # nocov + } + if (data_format == "strings") { + colnames(mf_result_chars) <- colnames(cellvals) + cellvals <- mf_result_chars + } else { + colnames(mf_result_numeric) <- colnames(cellvals) + cellvals <- mf_result_numeric + } + } - # Flatten the list of lists (rows) of cell values into a data frame - cellvals <- as.data.frame(do.call(rbind, raw_cvals)) - row.names(cellvals) <- NULL + rdf <- make_row_df(tt) + cinfo_df <- col_info(tt) + ci_coltree <- coltree(cinfo_df) + column_split_names <- .get_column_split_name(ci_coltree) # used only in make_ard - if (nrow(tt) == 1 && ncol(tt) == 1) { - colnames(cellvals) <- names(raw_cvals) - } - - if (as_viewer || as_strings) { - # we keep previous calculations to check the format of the data - mf_tt <- matrix_form(tt) - mf_result_chars <- mf_strings(mf_tt)[-seq_len(mf_nlheader(mf_tt)), -1] - mf_result_chars <- .remove_empty_elements(mf_result_chars) - mf_result_numeric <- as.data.frame( - .make_numeric_char_mf(mf_result_chars) + df <- rdf[, c("name", "label", "abs_rownumber", "path", "reprint_inds", "node_class")] + # Removing initial root elements from path (out of the loop -> right maxlen) + df$path <- lapply(df$path, .remove_root_elems_from_path, + which_root_name = c("root", "rbind_root"), + all = TRUE ) - mf_result_chars <- as.data.frame(mf_result_chars) - if (!setequal(dim(mf_result_numeric), dim(cellvals)) || !setequal(dim(mf_result_chars), dim(cellvals))) { - stop( - "The extracted numeric data.frame does not have the same dimension of the", - " cell values extracted with cell_values(). This is a bug. Please report it." - ) # nocov - } - if (as_strings) { - colnames(mf_result_chars) <- colnames(cellvals) - cellvals <- mf_result_chars + maxlen <- max(lengths(df$path)) + + # Loop for metadata (path and details from make_row_df) + metadf <- do.call( + rbind.data.frame, + lapply( + seq_len(NROW(df)), + function(ii) { + handle_rdf_row(df[ii, ], maxlen = maxlen) + } + ) + ) + + # Should we keep label rows with NAs instead of values? + if (keep_label_rows) { + cellvals_mat_struct <- as.data.frame( + matrix(NA, nrow = nrow(rdf), ncol = ncol(cellvals)) + ) + colnames(cellvals_mat_struct) <- colnames(cellvals) + cellvals_mat_struct[metadf$node_class != "LabelRow", ] <- cellvals + ret <- cbind(metadf, cellvals_mat_struct) } else { - colnames(mf_result_numeric) <- colnames(cellvals) - cellvals <- mf_result_numeric + ret <- cbind( + metadf[metadf$node_class != "LabelRow", ], + cellvals + ) } - } - rdf <- make_row_df(tt) + # If we want to expand colnames + if (expand_colnames) { + col_name_structure <- .get_formatted_colnames(clayout(tt)) + number_of_non_data_cols <- which(colnames(ret) == "node_class") + if (NCOL(ret) - number_of_non_data_cols != NCOL(col_name_structure)) { + stop( + "When expanding colnames structure, we were not able to find the same", + " number of columns as in the result data frame. This is a bug. Please report it." + ) # nocov + } - df <- rdf[, c("name", "label", "abs_rownumber", "path", "reprint_inds", "node_class")] - # Removing initial root elements from path (out of the loop -> right maxlen) - df$path <- lapply(df$path, .remove_root_elems_from_path, - which_root_name = c("root", "rbind_root"), - all = TRUE - ) - maxlen <- max(lengths(df$path)) + buffer_rows_for_colnames <- matrix( + rep("", number_of_non_data_cols * NROW(col_name_structure)), + nrow = NROW(col_name_structure) + ) + + header_colnames_matrix <- cbind(buffer_rows_for_colnames, data.frame(col_name_structure)) + colnames(header_colnames_matrix) <- colnames(ret) + + count_row <- NULL + if (disp_ccounts(tt)) { + ccounts <- col_counts(tt) + if (data_format == "strings") { + ccounts <- mf_strings(mf_tt)[mf_nlheader(mf_tt), ] + ccounts <- .remove_empty_elements(ccounts) + } + count_row <- c(rep("", number_of_non_data_cols), ccounts) + header_colnames_matrix <- rbind(header_colnames_matrix, count_row) + } + ret <- rbind(header_colnames_matrix, ret) + } - # Loop for metadata (path and details from make_row_df) - metadf <- do.call( - rbind.data.frame, - lapply( - seq_len(NROW(df)), - function(ii) { - handle_rdf_row(df[ii, ], maxlen = maxlen) + # ARD part for one stat per row + if (make_ard) { + # Unnecessary columns + ret_tmp <- ret[, !colnames(ret) %in% c("row_num", "is_group_summary", "node_class")] + + # Indexes of real columns (visible in the output, but no row names) + only_col_indexes <- seq(which(colnames(ret_tmp) == "label_name") + 1, ncol(ret_tmp)) + + # Core row names + col_label_rows <- grepl("", number_of_non_data_cols * NROW(col_name_structure)), - nrow = NROW(col_name_structure) - ) + # take out rownames + rownames(out) <- NULL + } else { + # Applying specs + out <- spec(tt, ...) + } - header_colnames_matrix <- cbind(buffer_rows_for_colnames, data.frame(col_name_structure)) - colnames(header_colnames_matrix) <- colnames(ret) + out +} - count_row <- NULL - if (disp_ccounts(tt)) { - ccounts <- col_counts(tt) - if (as_strings) { - ccounts <- mf_strings(mf_tt)[mf_nlheader(mf_tt), ] - ccounts <- .remove_empty_elements(ccounts) - } - count_row <- c(rep("", number_of_non_data_cols), ccounts) - header_colnames_matrix <- rbind(header_colnames_matrix, count_row) - } - ret <- rbind(header_colnames_matrix, ret) +# Helper function to get column split names +.get_column_split_name <- function(ci_coltree) { + # ci stands for column information + if (is(ci_coltree, "LayoutAxisTree")) { + kids <- tree_children(ci_coltree) + return(unlist(lapply(kids, .get_column_split_name))) } + sapply(pos_splits(tree_pos(ci_coltree)), spl_payload) +} - # Using only labels for row names and losing information about paths - if (as_is) { - tmp_rownames <- ret$label_name - ret <- ret[, -seq_len(which(colnames(ret) == "node_class"))] - if (length(unique(tmp_rownames)) == length(tmp_rownames)) { - rownames(ret) <- tmp_rownames - } else { - ret <- cbind("label_name" = tmp_rownames, ret) - rownames(ret) <- NULL - } - } else { - rownames(ret) <- NULL +# Function that selects specific outputs from the result data frame +.simplify_result_df <- function(df) { + col_df <- colnames(df) + if (!all(c("label_name", "node_class") %in% col_df)) { + stop("Please simplify the result data frame only when it has 'label_name' and 'node_class' columns.") } + label_names_col <- which(col_df == "label_name") + result_cols <- seq(which(col_df == "node_class") + 1, length(col_df)) - ret + df[, c(label_names_col, result_cols)] } .remove_empty_elements <- function(char_df) { @@ -407,7 +392,8 @@ handle_rdf_row <- function(rdfrow, maxlen) { return(ret) } } - +# path_enriched_df ------------------------------------------------------------ +# #' @describeIn data.frame_export Transform a `TableTree` object to a path-enriched `data.frame`. #' #' @param path_fun (`function`)\cr function to transform paths into single-string row/column names. diff --git a/_pkgdown.yml b/_pkgdown.yml index 1e610c4db..257eeee4b 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -66,6 +66,7 @@ articles: contents: - clinical_trials - baseline + - ard_how_to - title: Table Customization navbar: Table Customization @@ -298,7 +299,6 @@ reference: desc: Functions for generating result data frames from rtables TableTree objects contents: - as_result_df - - result_df_specs - title: internal desc: Internal documented methods diff --git a/man/data.frame_export.Rd b/man/data.frame_export.Rd index 4e952446d..f87dc8db0 100644 --- a/man/data.frame_export.Rd +++ b/man/data.frame_export.Rd @@ -3,40 +3,45 @@ \name{data.frame_export} \alias{data.frame_export} \alias{as_result_df} -\alias{result_df_specs} \alias{path_enriched_df} \title{Generate a result data frame} \usage{ -as_result_df(tt, spec = "v0_experimental", simplify = FALSE, ...) - -result_df_specs() +as_result_df( + tt, + spec = NULL, + data_format = c("full_precision", "strings", "numeric"), + make_ard = FALSE, + expand_colnames = FALSE, + keep_label_rows = FALSE, + simplify = FALSE, + ... +) path_enriched_df(tt, path_fun = collapse_path, value_fun = collapse_values) } \arguments{ \item{tt}{(\code{TableTree} or related class)\cr a \code{TableTree} object representing a populated table.} -\item{spec}{(\code{string})\cr the specification to use to extract the result data frame. See Details below.} +\item{spec}{(\code{function})\cr function that generates the result data frame from a table (\code{TableTree}). +It defaults to \code{NULL}, for standard processing.} -\item{simplify}{(\code{flag})\cr whether the result data frame should only have labels and result columns visible.} +\item{data_format}{(\code{string})\cr the format of the data in the result data frame. It can be one value +between \code{"full_precision"} (default), \code{"strings"}, and \code{"numeric"}. The last two values show the numeric +data with the visible precision.} -\item{...}{additional arguments passed to spec-specific result data frame conversion function. Currently it can be -one or more of the following parameters (valid only for \code{v0_experimental} spec. for now): -\itemize{ -\item \code{expand_colnames}: when \code{TRUE}, the result data frame will have expanded column names above the usual -output. This is useful when the result data frame is used for further processing. -\item \code{simplify}: when \code{TRUE}, the result data frame will have only visible labels and result columns. -\item \code{as_strings}: when \code{TRUE}, the result data frame will have all values as strings, as they appear -in the final table (it can also be retrieved from \code{matrix_form(tt)$strings}). This is also true for -column counts if \code{expand_colnames = TRUE}. -\item \code{as_viewer}: when \code{TRUE}, the result data frame will have all values as they appear in the final table, -i.e. with the same precision and numbers, but in easy-to-use numeric form. -\item \code{keep_label_rows}: when \code{TRUE}, the result data frame will have all labels as they appear in the -final table. -\item \code{as_is}: when \code{TRUE}, the result data frame will have all the values as they appear in the final table, -but without information about the row structure. Row labels will be assigned to rows so to work well -with \code{\link[=df_to_tt]{df_to_tt()}}. -}} +\item{make_ard}{(\code{flag})\cr when \code{TRUE}, the result data frame will have only one statistic per row.} + +\item{expand_colnames}{(\code{flag})\cr when \code{TRUE}, the result data frame will have expanded column +names above the usual output. This is useful when the result data frame is used for further processing.} + +\item{keep_label_rows}{(\code{flag})\cr when \code{TRUE}, the result data frame will have all labels +as they appear in the final table.} + +\item{simplify}{(\code{flag})\cr when \code{TRUE}, the result data frame will have only visible labels and +result columns. Consider showing also label rows with \code{keep_label_rows = TRUE}. This output can be +used again to create a \code{TableTree} object with \code{\link[=df_to_tt]{df_to_tt()}}.} + +\item{...}{additional arguments passed to spec-specific result data frame function (\code{spec}).} \item{path_fun}{(\code{function})\cr function to transform paths into single-string row/column names.} @@ -48,10 +53,6 @@ with \code{\link[=df_to_tt]{df_to_tt()}}. \item \code{as_result_df} returns a result \code{data.frame}. } -\itemize{ -\item \code{result_df_specs()} returns a named list of result data frame extraction functions by "specification". -} - \itemize{ \item \code{path_enriched_df()} returns a \code{data.frame} of \code{tt}'s cell values (processed by \code{value_fun}, with columns named by the full column paths (processed by \code{path_fun} and an additional \code{row_path} column with the row paths (processed @@ -61,16 +62,8 @@ by \code{path_fun}). \description{ Collection of utilities to extract \code{data.frame} objects from \code{TableTree} objects. } -\details{ -\code{as_result_df()}: Result data frame specifications may differ in the exact information -they include and the form in which they represent it. Specifications whose names end in "_experimental" -are subject to change without notice, but specifications without the "_experimental" -suffix will remain available \emph{including any bugs in their construction} indefinitely. -} \section{Functions}{ \itemize{ -\item \code{result_df_specs()}: A list of functions that extract result data frames from \code{TableTree}s. - \item \code{path_enriched_df()}: Transform a \code{TableTree} object to a path-enriched \code{data.frame}. }} @@ -81,9 +74,7 @@ lyt <- basic_table() \%>\% analyze(c("AGE", "BMRKR2")) tbl <- build_table(lyt, ex_adsl) -as_result_df(tbl) - -result_df_specs() +as_result_df(tbl, simplify = TRUE) lyt <- basic_table() \%>\% split_cols_by("ARM") \%>\% @@ -94,6 +85,6 @@ path_enriched_df(tbl) } \seealso{ -\code{\link[=df_to_tt]{df_to_tt()}} when using \code{as_is = TRUE} and \code{\link[formatters:make_row_df]{formatters::make_row_df()}} to have a comprehensive view of the -hierarchical structure of the rows. +\code{\link[=df_to_tt]{df_to_tt()}} when using \code{simplify = TRUE} and \code{\link[formatters:make_row_df]{formatters::make_row_df()}} to have a +comprehensive view of the hierarchical structure of the rows. } diff --git a/tests/testthat/test-binding.R b/tests/testthat/test-binding.R index 12b30cea6..98886b790 100644 --- a/tests/testthat/test-binding.R +++ b/tests/testthat/test-binding.R @@ -213,7 +213,10 @@ test_that("count visibility syncing works when cbinding", { ## cause problems with any of the column info checks test_that("equivalent split funs withs differrent environments dont' block rbinding", { - combodf <- tibble::tribble( + skip_if_not_installed("tibble") + require(tibble, quietly = TRUE) + + combodf <- tribble( ~valname, ~label, ~levelcombo, ~exargs, "A_B", "Arms A+B", c("A: Drug X", "B: Placebo"), list(), "A_C", "Arms A+C", c("A: Drug X", "C: Combination"), list() diff --git a/tests/testthat/test-result_data_frame.R b/tests/testthat/test-result_data_frame.R index 9a806e206..0e34647fb 100644 --- a/tests/testthat/test-result_data_frame.R +++ b/tests/testthat/test-result_data_frame.R @@ -1,13 +1,11 @@ context("Result Data Frames") test_that("Result Data Frame generation works v0", { - ## change here (only) when v0 is crystalized (no longer experimental) - spec_version <- "v0_experimental" lyt <- make_big_lyt() tbl <- build_table(lyt, rawdat) - result_df <- as_result_df(tbl, spec_version) + result_df <- as_result_df(tbl) expect_identical( result_df[2, "ARM1.M"][[1]], c(37, 37 / 256) @@ -30,7 +28,7 @@ test_that("Result Data Frame generation works v0", { analyze(c("AGE", "BMRKR2")) tbl2 <- build_table(lyt, ex_adsl) - result_df2 <- as_result_df(tbl2, spec_version) + result_df2 <- as_result_df(tbl2) ## regression test expect_false(any(is.na(result_df2$spl_var_1))) @@ -46,7 +44,7 @@ test_that("Result Data Frame generation works v0", { analyze_colvars(afun = length, inclNAs = TRUE) tbl3 <- build_table(lyt3, test) - result_df3 <- as_result_df(tbl3, spec_version) + result_df3 <- as_result_df(tbl3) expect_identical(nrow(result_df3), 1L) @@ -67,27 +65,29 @@ test_that("Result Data Frame generation works v0", { ) }) -test_that("as_result_df works with visual output (as_viewer)", { +test_that("as_result_df works with visual output (data_format as numeric)", { lyt <- make_big_lyt() tbl <- build_table(lyt, rawdat) - res <- expect_silent(as_result_df(tbl, simplify = TRUE, as_viewer = TRUE)) + res <- expect_silent(as_result_df(tbl, simplify = TRUE, data_format = "numeric")) expect_equal(res$ARM1.M[[1]], c(116.0, 45.3)) - res <- expect_silent(as_result_df(tbl, simplify = TRUE, as_viewer = TRUE, as_strings = TRUE)) + res <- expect_silent(as_result_df(tbl, data_format = "strings", simplify = TRUE)) expect_equal(res$ARM1.M[[1]], "116 (45.3%)") mf <- matrix_form(tbl) string_tbl <- mf_strings(mf)[-seq_len(mf_nlheader(mf)), ] - string_tbl <- string_tbl[nzchar(string_tbl[, 2]), ] + string_tbl <- as.data.frame(string_tbl[nzchar(string_tbl[, 2]), ]) colnames(string_tbl) <- colnames(res) - expect_equal(res, data.frame(string_tbl)) + rownames(string_tbl) <- NULL + expect_equal(res, string_tbl) - res <- expect_silent(as_result_df(tbl, simplify = TRUE, as_strings = TRUE, expand_colnames = TRUE)) + res <- expect_silent(as_result_df(tbl, simplify = TRUE, data_format = "strings", expand_colnames = TRUE)) string_tbl <- mf_strings(mf) string_tbl <- data.frame(string_tbl[nzchar(string_tbl[, 2]), ]) colnames(string_tbl) <- colnames(res) - string_tbl$row_name[seq_len(mf_nlheader(mf))] <- res$row_name[seq_len(mf_nlheader(mf))] + string_tbl$label_name[seq_len(mf_nlheader(mf))] <- res$label_name[seq_len(mf_nlheader(mf))] + rownames(string_tbl) <- NULL expect_equal(res, string_tbl) expect_silent(basic_table() %>% build_table(DM) %>% as_result_df()) @@ -97,11 +97,11 @@ test_that("as_result_df works with visual output (as_viewer)", { build_table(DM) expect_equal(as_result_df(tbl)$`all obs`, 5.851948, tolerance = 1e-6) expect_equal( - as_result_df(tbl, as_viewer = TRUE)$`all obs`, - as.numeric(as_result_df(tbl, as_strings = TRUE)$`all obs`) + as_result_df(tbl, data_format = "numeric")$`all obs`, + as.numeric(as_result_df(tbl, data_format = "strings")$`all obs`) ) expect_equal(as_result_df(tbl, expand_colnames = TRUE)$`all obs`[2], "356") - expect_equal(as_result_df(tbl, expand_colnames = TRUE, as_strings = TRUE)$`all obs`[2], "(N=356)") + expect_equal(as_result_df(tbl, expand_colnames = TRUE, data_format = "strings")$`all obs`[2], "(N=356)") # Test for integer extraction and ranges @@ -111,7 +111,7 @@ test_that("as_result_df works with visual output (as_viewer)", { analyze("AGE", afun = function(x) list(a = mean(x), b = range(x))) tbl <- build_table(lyt, ex_adsl) - expect_equal(as_result_df(tbl, simplify = TRUE, as_viewer = TRUE)[2, 2][[1]], c(24, 46)) + expect_equal(as_result_df(tbl, simplify = TRUE, data_format = "numeric")[2, 2][[1]], c(24, 46)) # Test for tables with less than 3 rows tbl <- rtable( @@ -159,8 +159,8 @@ test_that("as_result_df keeps label rows", { rd1 <- as_result_df(tbl, keep_label_rows = TRUE) rd2 <- as_result_df(tbl, keep_label_rows = TRUE, expand_colnames = TRUE) - rd3 <- as_result_df(tbl, keep_label_rows = TRUE, expand_colnames = TRUE, as_strings = TRUE) - rd4 <- as_result_df(tbl, keep_label_rows = TRUE, expand_colnames = TRUE, as_viewer = TRUE) + rd3 <- as_result_df(tbl, keep_label_rows = TRUE, expand_colnames = TRUE, data_format = "strings") + rd4 <- as_result_df(tbl, keep_label_rows = TRUE, expand_colnames = TRUE, data_format = "numeric") expect_equal(nrow(rd1), nrow(rd2) - 2) expect_equal(nrow(rd1), nrow(rd3) - 2) @@ -193,12 +193,12 @@ test_that("as_result_df keeps label rows", { ) }) -test_that("as_result_df as_is is producing a data.frame that is compatible with df_to_tt", { +test_that("as_result_df simplify is producing a data.frame that is compatible with df_to_tt", { # More challenging labels lyt <- make_big_lyt() tbl <- build_table(lyt, rawdat) - ard_out <- as_result_df(tbl, as_is = TRUE) + ard_out <- as_result_df(tbl, simplify = TRUE, keep_label_rows = TRUE) mf_tbl <- matrix_form(tbl) # Label works @@ -214,7 +214,7 @@ test_that("as_result_df as_is is producing a data.frame that is compatible with init_tbl <- df_to_tt(mtcars) end_tbl <- init_tbl %>% - as_result_df(as_is = TRUE) %>% + as_result_df(simplify = TRUE) %>% df_to_tt() expect_equal( diff --git a/vignettes/ard_how_to.Rmd b/vignettes/ard_how_to.Rmd new file mode 100644 index 000000000..22bf7d118 --- /dev/null +++ b/vignettes/ard_how_to.Rmd @@ -0,0 +1,92 @@ +--- +title: "How to generate QC-ready result data frames from tables" +author: "Davide Garolini" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{How to generate QC-ready result data frames from tables} + %\VignetteEncoding{UTF-8} + %\VignetteEngine{knitr::rmarkdown} +editor_options: + chunk_output_type: console +--- + +```{r, include = FALSE} +suggested_dependent_pkgs <- c("dplyr") +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + eval = all(vapply( + suggested_dependent_pkgs, + requireNamespace, + logical(1), + quietly = TRUE + )) +) +``` + +```{r, echo=FALSE} +knitr::opts_chunk$set(comment = "#") +``` + + +```{css, echo=FALSE} +.reveal .r code { + white-space: pre; +} +``` +# Disclaimer + +This vignette is a work in progress. + +## Create the example table + +First of all we need a table to retrieve all the necessary information. Borrowing one from the [vignette](https://insightsengineering.github.io/rtables/latest-tag/articles/clinical_trials.html) about clinical trials. + +```{r} +library(rtables) +ADSL <- ex_adsl # Example ADSL dataset +mean_sd_custom <- function(x) { + mean <- mean(x, na.rm = FALSE) + sd <- sd(x, na.rm = FALSE) + + rcell(c(mean, sd), label = "Mean (SD)", format = "xx.x (xx.x)") +} +counts_percentage_custom <- function(x) { + # browser() + cnts <- table(x) + out <- lapply(cnts, function(x) { + perc <- x / sum(cnts) + rcell(c(x, perc), format = "xx. (xx.%)") + }) + in_rows(.list = as.list(out), .labels = names(cnts)) +} + +lyt <- basic_table(show_colcounts = TRUE, colcount_format = "N=xx") %>% + # split_rows_by("STRATA1", split_fun = keep_split_levels(c("A"))) %>% + # split_cols_by("STRATA2") %>% + split_cols_by("ARM", split_fun = keep_split_levels(c("A: Drug X", "B: Placebo"))) %>% + analyze(vars = "AGE", afun = mean_sd_custom) %>% + analyze(vars = "SEX", afun = counts_percentage_custom) + +tbl <- build_table(lyt, ADSL) +tbl +``` + +## Convert the table to a result data frame + +The `as_result_df` function is the one that converts a table to a result data frame. The result data frame is a data frame that contains the result of the summary table and is ready to be used for quality control purposes. This may differ for different standard and lets see how to produce different outputs. Final goal is having clearly one result for row. Lets play with different options. + +```{r} +as_result_df(tbl) + +as_result_df(tbl, data_format = "strings") +as_result_df(tbl, simplify = TRUE) +as_result_df(tbl, simplify = TRUE, keep_label_rows = TRUE) +as_result_df(tbl, simplify = TRUE, keep_label_rows = TRUE, expand_colnames = TRUE) +``` + +Now lets get the final `ARD` output. This is the one that is ready to be used for quality control purposes. +```{r} +as_result_df(tbl, make_ard = TRUE) +```