insightsengineering · edelarua · Sep 5, 2024 · Nov 14, 2023
diff --git a/NEWS.md b/NEWS.md
@@ -19,6 +19,7 @@
 
 ### Miscellaneous
 * Began deprecation of the confusing functions `summary_formats` and `summary_labels`.
+* Enhanced general descriptions of analyze and summarize functions throughout package documentation.
 
 # tern 0.9.5
 

diff --git a/R/abnormal.R b/R/abnormal.R
@@ -1,13 +1,21 @@
-#' Patient counts with abnormal range values
+#' Count patients with abnormal range values
 #'
 #' @description `r lifecycle::badge("stable")`
 #'
-#' Primary analysis variable `.var` indicates the abnormal range result (`character` or `factor`)
-#' and additional analysis variables are `id` (`character` or `factor`) and `baseline` (`character` or
-#' `factor`). For each direction specified in `abnormal` (e.g. high or low) count patients in the
-#' numerator and denominator as follows:
-#'   * `num` : The number of patients with this abnormality recorded while on treatment.
-#'   * `denom`: The number of patients with at least one post-baseline assessment.
+#' The analyze function [count_abnormal()] creates a layout element to count patients with abnormal analysis range
+#' values in each direction.
+#'
+#' This function analyzes the primary analysis variable `var`, which indicates abnormal range results.
+#' Additional analysis variables that can be supplied as a list via the `variables` parameter are
+#' `id` (defaults to `USUBJID`), a variable to indicate unique subject identifiers, and `baseline`
+#' (defaults to `BNRIND`), a variable to indicate baseline reference ranges.
+#'
+#' For each direction specified via the `abnormal` parameter (e.g. High or Low), a fraction of
+#' patient counts is returned, with numerator and denominator calculated as follows:
+#'   * `num`: The number of patients with this abnormality recorded while on treatment.
+#'   * `denom`: The total number of patients with at least one post-baseline assessment.
+#'
+#' This function assumes that `df` has been filtered to only include post-baseline records.
 #'
 #' @inheritParams argument_convention
 #' @param abnormal (named `list`)\cr list identifying the abnormal range level(s) in `var`. Defaults to
@@ -19,11 +27,12 @@
 #'   to see available statistics for this function.
 #'
 #' @note
-#' * `count_abnormal()` only works with a single variable containing multiple abnormal levels.
-#' * `df` should be filtered to include only post-baseline records.
-#' * the denominator includes patients that might have other abnormal levels at baseline,
-#'   and patients with missing baseline. Patients with these abnormalities at
-#'   baseline can be optionally excluded from numerator and denominator.
+#' * `count_abnormal()` only considers a single variable that contains multiple abnormal levels.
+#' * `df` should be filtered to only include post-baseline records.
+#' * The denominator includes patients that may have other abnormal levels at baseline,
+#'   and patients missing baseline records. Patients with these abnormalities at
+#'   baseline can be optionally excluded from numerator and denominator via the
+#'   `exclude_base_abn` parameter.
 #'
 #' @name abnormal
 #' @include formatting_functions.R

diff --git a/R/abnormal_by_baseline.R b/R/abnormal_by_baseline.R
@@ -1,20 +1,31 @@
-#' Patient counts with abnormal range values by baseline status
+#' Count patients with abnormal analysis range values by baseline status
 #'
 #' @description `r lifecycle::badge("stable")`
 #'
-#' Primary analysis variable `.var` indicates the abnormal range result (`character` or `factor`), and additional
-#' analysis variables are `id` (`character` or `factor`) and `baseline` (`character` or `factor`). For each
-#' direction specified in `abnormal` (e.g. high or low) we condition on baseline range result and count
-#' patients in the numerator and denominator as follows:
-#'   * `Not <Abnormal>`
-#'     * `denom`: the number of patients without abnormality at baseline (excluding those with missing baseline)
-#'     * `num`:  the number of patients in `denom` who also have at least one abnormality post-baseline
-#'   * `<Abnormal>`
-#'     * `denom`: the number of patients with abnormality at baseline
-#'     * `num`: the number of patients in `denom` who also have at least one abnormality post-baseline
+#' The analyze function [count_abnormal_by_baseline()] creates a layout element to count patients with abnormal
+#' analysis range values, categorized by baseline status.
+#'
+#' This function analyzes the primary analysis variable `var`, which indicates abnormal range results. Additional
+#' analysis variables that can be supplied as a list via the `variables` parameter are `id` (defaults to
+#' `USUBJID`), a variable to indicate unique subject identifiers, and `baseline` (defaults to `BNRIND`), a
+#' variable to indicate baseline reference ranges.
+#'
+#' For each direction specified via the `abnormal` parameter (e.g. High or Low), we condition on baseline
+#' range result and count patients in the numerator and denominator for each of the following
+#' categories:
+#'   * `Not <abnormality>`
+#'     * `num`: The number of patients without abnormality at baseline (excluding those with missing baseline)
+#'       and with at least one abnormality post-baseline.
+#'     * `denom`: The number of patients without abnormality at baseline (excluding those with missing baseline).
+#'   * `<Abnormality>`
+#'     * `num`: The number of patients with abnormality at baseline and at least one abnormality post-baseline.
+#'     * `denom`: The number of patients with abnormality at baseline.
 #'   * `Total`
-#'     * `denom`: the number of patients with at least one valid measurement post-baseline
-#'     * `num`: the number of patients in `denom` who also have at least one abnormality post-baseline
+#'     * `num`: The number of patients with at least one post-baseline record and at least one abnormality
+#'       post-baseline.
+#'     * `denom`: The number of patients with at least one post-baseline record.
+#'
+#' This function assumes that `df` has been filtered to only include post-baseline records.
 #'
 #' @inheritParams argument_convention
 #' @param abnormal (`character`)\cr values identifying the abnormal range level(s) in `.var`.
@@ -23,8 +34,8 @@
 #'
 #' @note
 #' * `df` should be filtered to include only post-baseline records.
-#' * If the baseline variable or analysis variable contains `NA`, it is expected that `NA` has been
-#'   conveyed to `na_level` appropriately beforehand with [df_explicit_na()] or [explicit_na()].
+#' * If the baseline variable or analysis variable contains `NA` records, it is expected that `df` has been
+#'   pre-processed using [df_explicit_na()] or [explicit_na()].
 #'
 #' @seealso Relevant description function [d_count_abnormal_by_baseline()].
 #'

diff --git a/R/abnormal_by_marked.R b/R/abnormal_by_marked.R
@@ -2,14 +2,28 @@
 #'
 #' @description `r lifecycle::badge("stable")`
 #'
-#' Primary analysis variable `.var` indicates whether single, replicated or last marked laboratory
-#' abnormality was observed (`factor`). Additional analysis variables are `id` (`character` or `factor`)
-#' and `direction` (`factor`) indicating the direction of the abnormality. Denominator is number of
-#' patients with at least one valid measurement during the analysis.
-#'   * For `Single, not last` and `Last or replicated`: Numerator is number of patients
-#'     with `Single, not last` and `Last or replicated` levels, respectively.
-#'   * For `Any`: Numerator is the number of patients with either single or
-#'     replicated marked abnormalities.
+#' The analyze function [count_abnormal_by_marked()] creates a layout element to count patients with marked laboratory
+#' abnormalities for each direction of abnormality, categorized by parameter value.
+#'
+#' This function analyzes the primary analysis variable `var`, which indicates whether a single, replicated,
+#' or last marked laboratory abnormality was observed. Levels of `var` to include for each marked lab
+#' abnormality (`single` and `last_replicated`) can be supplied via the `category` parameter. Additional
+#' analysis variables that can be supplied as a list via the `variables` parameter are `id` (defaults
+#' to `USUBJID`), a variable to indicate unique subject identifiers, `param` (defaults to `PARAM`), a
+#' variable to indicate parameter values, and `direction` (defaults to `abn_dir`), a variable to indicate
+#' abnormality directions.
+#'
+#' For each combination of `param` and `direction` levels, marked lab abnormality counts are calculated
+#' as follows:
+#'   * `Single, not last` & `Last or replicated`: The number of patients with `Single, not last`
+#'     and `Last or replicated` values, respectively.
+#'   * `Any`: The number of patients with either single or replicated marked abnormalities.
+#'
+#' Fractions are calculated by dividing the above counts by the number of patients with at least one
+#' valid measurement recorded during the analysis.
+#'
+#' Prior to using this function in your table layout, you must use [rtables::split_rows_by()] to create two
+#' row splits, one on variable `param` and one on variable `direction`.
 #'
 #' @inheritParams argument_convention
 #' @param category (`list`)\cr a list with different marked category names for single

diff --git a/R/abnormal_by_worst_grade.R b/R/abnormal_by_worst_grade.R
@@ -1,20 +1,31 @@
-#' Patient counts with the most extreme post-baseline toxicity grade per direction of abnormality
+#' Count patients by most extreme post-baseline toxicity grade per direction of abnormality
 #'
 #' @description `r lifecycle::badge("stable")`
 #'
-#' Primary analysis variable `.var` indicates the toxicity grade (`factor`), and additional
-#' analysis variables are `id` (`character` or `factor`), `param` (`factor`) and `grade_dir` (`factor`).
-#' The pre-processing steps are crucial when using this function.
-#' For a certain direction (e.g. high or low) this function counts
-#' patients in the denominator as number of patients with at least one valid measurement during treatment,
-#' and patients in the numerator as follows:
-#'   * `1` to `4`: Numerator is number of patients with worst grades 1-4 respectively;
-#'   * `Any`: Numerator is number of patients with at least one abnormality, which means grade is different from 0.
+#' The analyze function [count_abnormal_by_worst_grade()] creates a layout element to count patients by highest (worst)
+#' analysis toxicity grade post-baseline for each direction, categorized by parameter value.
+#'
+#' This function analyzes the primary analysis variable `var`, which indicates toxicity grades. Additional
+#' analysis variables that can be supplied as a list via the `variables` parameter are `id` (defaults to
+#' `USUBJID`), a variable to indicate unique subject identifiers, `param` (defaults to `PARAM`), a variable
+#' to indicate parameter values, and `grade_dir` (defaults to `GRADE_DIR`), a variable to indicate directions
+#' (e.g. High or Low) for each toxicity grade supplied in `var`.
+#'
+#' For each combination of `param` and `grade_dir` levels, patient counts by worst
+#' grade are calculated as follows:
+#'   * `1` to `4`: The number of patients with worst grades 1-4, respectively.
+#'   * `Any`: The number of patients with at least one abnormality (i.e. grade is not 0).
+#'
+#' Fractions are calculated by dividing the above counts by the number of patients with at least one
+#' valid measurement recorded during treatment.
 #'
 #' Pre-processing is crucial when using this function and can be done automatically using the
 #' [h_adlb_abnormal_by_worst_grade()] helper function. See the description of this function for details on the
 #' necessary pre-processing steps.
 #'
+#' Prior to using this function in your table layout, you must use [rtables::split_rows_by()] to create two row
+#' splits, one on variable `param` and one on variable `grade_dir`.
+#'
 #' @inheritParams argument_convention
 #' @param .stats (`character`)\cr statistics to select for the table. Run `get_stats("abnormal_by_worst_grade")`
 #'   to see available statistics for this function.

diff --git a/R/abnormal_by_worst_grade_worsen.R b/R/abnormal_by_worst_grade_worsen.R
@@ -1,8 +1,27 @@
-#' Patient counts for laboratory events (worsen from baseline) by highest grade post-baseline
+#' Count patients with toxicity grades that have worsened from baseline by highest grade post-baseline
 #'
 #' @description `r lifecycle::badge("stable")`
 #'
-#' Patient count and fraction for laboratory events (worsen from baseline) shift table.
+#' The analyze function [count_abnormal_lab_worsen_by_baseline()] creates a layout element to count patients with
+#' analysis toxicity grades which have worsened from baseline, categorized by highest (worst) grade post-baseline.
+#'
+#' This function analyzes the primary analysis variable `var`, which indicates analysis toxicity grades. Additional
+#' analysis variables that can be supplied as a list via the `variables` parameter are `id` (defaults to `USUBJID`),
+#' a variable to indicate unique subject identifiers, `baseline_var` (defaults to `BTOXGR`), a variable to indicate
+#' baseline toxicity grades, and `direction_var` (defaults to `GRADDIR`), a variable to indicate toxicity grade
+#' directions of interest to include (e.g. `"H"` (high), `"L"` (low), or `"B"` (both)).
+#'
+#' For the direction(s) specified in `direction_var`, patient counts by worst grade for patients who have
+#' worsened from baseline are calculated as follows:
+#'   * `1` to `4`: The number of patients who have worsened from their baseline grades with worst
+#'     grades 1-4, respectively.
+#'   * `Any`: The total number of patients who have worsened from their baseline grades.
+#'
+#' Fractions are calculated by dividing the above counts by the number of patients whose analysis toxicity grades
+#' have worsened from baseline toxicity grades during treatment.
+#'
+#' Prior to using this function in your table layout, you must use [rtables::split_rows_by()] to create a row
+#' split on variable `direction_var`.
 #'
 #' @inheritParams argument_convention
 #' @param variables (named `list` of `string`)\cr list of additional analysis variables including:
@@ -12,7 +31,8 @@
 #' @param .stats (`character`)\cr statistics to select for the table. Run `get_stats("abnormal_by_worst_grade_worsen")`
 #'   to see all available statistics.
 #'
-#' @seealso Relevant helper functions [h_adlb_worsen()] and [h_worsen_counter()]
+#' @seealso Relevant helper functions [h_adlb_worsen()] and [h_worsen_counter()], which are used within
+#' [s_count_abnormal_lab_worsen_by_baseline()] to process input data.
 #'
 #' @name abnormal_by_worst_grade_worsen
 #' @order 1

diff --git a/R/analyze_colvars_functions.R b/R/analyze_colvars_functions.R
@@ -1,4 +1,4 @@
-#' Analyze functions on columns
+#' Analyze functions in columns
 #'
 #' @description
 #' These functions are wrappers of [rtables::analyze_colvars()] which apply corresponding `tern`

diff --git a/R/analyze_functions.R b/R/analyze_functions.R
@@ -6,6 +6,7 @@
 #' to add an analysis to a given table layout:
 #'
 #' * [analyze_num_patients()]
+#' * [analyze_vars()]
 #' * [compare_vars()]
 #' * [count_abnormal()]
 #' * [count_abnormal_by_baseline()]
@@ -32,14 +33,13 @@
 #'   leverage `analyze_colvars` to have the context split in rows and the analysis
 #'   methods in columns.
 #' * [summarize_change()]
-#' * [analyze_vars()]
 #' * [surv_time()]
 #' * [surv_timepoint()]
 #' * [test_proportion_diff()]
 #'
 #' @seealso
-#'   * [summarize_functions] for functions which are wrappers for [rtables::summarize_row_groups()].
 #'   * [analyze_colvars_functions] for functions that are wrappers for [rtables::analyze_colvars()].
+#'   * [summarize_functions] for functions which are wrappers for [rtables::summarize_row_groups()].
 #'
 #' @name analyze_functions
 NULL
diff --git a/R/analyze_variables.R b/R/analyze_variables.R
@@ -32,11 +32,11 @@ control_analyze_vars <- function(conf_level = 0.95,
 #'
 #' @description `r lifecycle::badge("stable")`
 #'
-#' The analyze function [analyze_vars()] generates a summary of one or more variables, using the S3 generic function
-#' [s_summary()] to calculate a list of summary statistics. A list of all available statistics for numeric
-#' variables can be viewed by running `get_stats("analyze_vars_numeric")` and for non-numeric variables by running
-#' `get_stats("analyze_vars_counts")`. Use the `.stats` parameter to specify the statistics to include in your output
-#' summary table.
+#' The analyze function [analyze_vars()] creates a layout element to summarize one or more variables, using the S3
+#' generic function [s_summary()] to calculate a list of summary statistics. A list of all available statistics for
+#' numeric variables can be viewed by running `get_stats("analyze_vars_numeric")` and for non-numeric variables by
+#' running `get_stats("analyze_vars_counts")`. Use the `.stats` parameter to specify the statistics to include in your
+#' output summary table.
 #'
 #' @details
 #' **Automatic digit formatting:** The number of digits to display can be automatically determined from the analyzed

diff --git a/R/analyze_vars_in_cols.R b/R/analyze_vars_in_cols.R
@@ -1,10 +1,12 @@
-#' Summarize numeric variables in columns
+#' Analyze numeric variables in columns
 #'
 #' @description `r lifecycle::badge("experimental")`
 #'
-#' Layout-creating function which can be used for creating column-wise summary tables.
-#' This function sets the analysis methods as column labels and is a wrapper for
-#' [rtables::analyze_colvars()]. It was designed principally for PK tables.
+#' The layout-creating function [analyze_vars_in_cols()] creates a layout element to generate a column-wise
+#' analysis table.
+#'
+#' This function sets the analysis methods as column labels and is a wrapper for [rtables::analyze_colvars()].
+#' It was designed principally for PK tables.
 #'
 #' @inheritParams argument_convention
 #' @inheritParams rtables::analyze_colvars
@@ -35,16 +37,13 @@
 #' Adding this function to an `rtable` layout will summarize the given variables, arrange the output
 #' in columns, and add it to the table layout.
 #'
-#' @note This is an experimental implementation of [rtables::summarize_row_groups()] and
-#'   [rtables::analyze_colvars()] that may be subjected to changes as `rtables` extends its
-#'   support to more complex analysis pipelines on the column space. For the same reasons,
-#'   we encourage to read the examples carefully and file issues for cases that differ from
-#'   them.
-#'
-#'   Here `labelstr` behaves differently than usual. If it is not defined (default as `NULL`),
-#'   row labels are assigned automatically to the split values in case of `rtables::analyze_colvars`
-#'   (`do_summarize_row_groups = FALSE`, the default), and to the group label for
-#'   `do_summarize_row_groups = TRUE`.
+#' @note
+#' * This is an experimental implementation of [rtables::summarize_row_groups()] and [rtables::analyze_colvars()]
+#'   that may be subject to change as `rtables` extends its support to more complex analysis pipelines in the
+#'   column space. We encourage users to read the examples carefully and file issues for use cases they do not cover.
+#' * In this function, `labelstr` behaves atypically. If `labelstr = NULL` (the default), row labels are assigned
+#'   automatically as the split values if `do_summarize_row_groups = FALSE` (the default), and as the group label
+#'   if `do_summarize_row_groups = TRUE`.
 #'
 #' @seealso [analyze_vars()], [rtables::analyze_colvars()].
 #'