Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

261 multiple improvements to get_code function and it's documentation #263

Merged
merged 47 commits into from
Jan 24, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
c8a448c
test for ; case
m7pr Jan 18, 2024
a942d60
merge
m7pr Jan 18, 2024
d276477
Merge branch 'main' of https://github.com/insightsengineering/teal.data
m7pr Jan 22, 2024
e6c163f
fix if statement for single function call detection
m7pr Jan 22, 2024
c3b1146
Merge e6c163f40bf0585195d6db42a4e80a992aa30754 into cab0e418d8e1350cc…
m7pr Jan 22, 2024
a902355
[skip actions] Restyle files
github-actions[bot] Jan 22, 2024
8ec9e57
Empty-Commit
m7pr Jan 22, 2024
790722f
[skip actions] Roxygen Man Pages Auto Update
dependabot-preview[bot] Jan 22, 2024
040d88b
update roxygen
m7pr Jan 22, 2024
96e944c
Merge branch '261_get_code_fun_definition@main' of https://github.com…
m7pr Jan 22, 2024
3950d07
Empty-Commit
m7pr Jan 22, 2024
0201311
docs update
gogonzo Jan 22, 2024
75a2363
do not return downstream effects for objects
m7pr Jan 22, 2024
07c17a1
Merge 75a23639f65bb9eebd8c16a8048e7c7f150183a2 into cab0e418d8e1350cc…
m7pr Jan 22, 2024
ea3f6f1
[skip actions] Restyle files
github-actions[bot] Jan 22, 2024
3bb617b
change name of the test
m7pr Jan 22, 2024
c71ce5f
Merge branch '261_get_code_fun_definition@main' of https://github.com…
m7pr Jan 22, 2024
614280d
typo
m7pr Jan 22, 2024
ad3a9af
fix docs
gogonzo Jan 22, 2024
06b2837
Merge branch '261_get_code_fun_definition@main' of https://github.com…
gogonzo Jan 22, 2024
f7766e1
fix docs
gogonzo Jan 22, 2024
605dbd3
simplify examples
gogonzo Jan 22, 2024
a74707d
another proposition
gogonzo Jan 22, 2024
5dcf34f
update documentation
Jan 23, 2024
36d4c55
Merge branch 'main' into 261_get_code_fun_definition@main
m7pr Jan 23, 2024
cf78609
265 fix issue for using code in `quote` function with `get_code` (#266)
m7pr Jan 23, 2024
c334217
update documentation
Jan 23, 2024
43128b3
Revert quote detection from 265 (#269)
m7pr Jan 23, 2024
03fc5a4
update documentation
Jan 23, 2024
3f4b60c
unify dataset use
Jan 23, 2024
12ac157
271 fix `assign` function detection for cases with more than 2 argume…
m7pr Jan 24, 2024
4623736
Merge 12ac15761666889d4495dcdc8164016ea12d7bea into 7c83774d6a6574384…
m7pr Jan 24, 2024
6458b6e
[skip actions] Restyle files
github-actions[bot] Jan 24, 2024
127000e
Empty-Commit
m7pr Jan 24, 2024
1ce4ad2
cleanup tests
m7pr Jan 24, 2024
bfb3fdd
Merge 1ce4ad20ed957d2c7890a82dd8f25f8a61c4f49d into 7c83774d6a6574384…
m7pr Jan 24, 2024
a464e02
[skip actions] Restyle files
github-actions[bot] Jan 24, 2024
8e5b211
Empty-Commit
m7pr Jan 24, 2024
5917a92
267 fix `@linksto` tag with `eval()` in last line of the evaluated co…
m7pr Jan 24, 2024
50b2825
Merge 5917a92c1d1f790a163de4faa1c062c05e6146e6 into 7c83774d6a6574384…
m7pr Jan 24, 2024
4e5785e
[skip actions] Restyle files
github-actions[bot] Jan 24, 2024
aabd2b9
restart cicd
gogonzo Jan 24, 2024
de59c13
Update R/teal_data-get_code.R
m7pr Jan 24, 2024
34ebcaf
missing comment
m7pr Jan 24, 2024
14f1099
fix issue with ; usage in code
m7pr Jan 24, 2024
3ef0707
latest change in get_children appends a new row in calls_pd with the …
m7pr Jan 24, 2024
1cb250c
simplify code
gogonzo Jan 24, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ Encoding: UTF-8
Language: en-US
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.0
RoxygenNote: 7.3.1
Collate:
'cdisc_data.R'
'data.R'
Expand Down
81 changes: 73 additions & 8 deletions R/teal_data-get_code.R
Original file line number Diff line number Diff line change
Expand Up @@ -3,23 +3,86 @@
#' Retrieve code from `teal_data` object.
#'
#' Retrieve code stored in `@code`, which (in principle) can be used to recreate all objects found in `@env`.
#' Use `datanames` to limit the code to one or more of the data sets enumerated in `@datanames`.
#' Use `datanames` to limit the code to one or more of the datasets enumerated in `@datanames`.
#' If the code has not passed verification (with [`verify()`]), a warning will be prepended.
#'
#' @section Notes for Developers:
#' To learn more about how a subset of code needed to reproduce a specific data set is extracted from all code,
#' see [`get_code_dependency()`].
#' @section Extracting dataset-specific code:
#' When `datanames` is specified, the code returned will be limited to the lines needed to _create_
#' the requested datasets. The code stored in the `@code` slot is analyzed statically to determine
#' which lines the datasets of interest depend upon. The analysis works well when objects are created
chlebowa marked this conversation as resolved.
Show resolved Hide resolved
#' with standard infix assignment operators (see `?assignOps`) but it can fail in some situations.
chlebowa marked this conversation as resolved.
Show resolved Hide resolved
#'
#' Consider the following examples:
#'
#' _Case 1: Usual assignments._
#' ```r
#' data <- teal_data() |>
#' within({
#' foo <- function(x) {
#' x + 1
#' }
#' x <- 0
#' y <- foo(x)
#' })
#' get_code(data, datanames = "y")
#' ```
#' `x` has no dependencies, so `get_code(data, datanames = "x")` will return only the second call.\cr
#' `y` depends on `x` and `foo`, so `get_code(data, datanames = "y")` will contain all three calls.
#'
#' _Case 2: Some objects are created by a function's side effects._
#' ```r
#' data <- teal_data() |>
#' within({
#' foo <- function() {
#' x <<- x + 1
#' }
#' x <- 0
#' foo()
#' y <- x
#' })
#' get_code(data, datanames = "y")
#' ```
#' Here, `y` depends on `x` but `x` is modified by `foo` as a side effect (not by reassignment)
#' and so `get_code(data, datanames = "y")` will not return the `foo()` call.\cr
#' To overcome this limitation, code dependencies can be specified manually.
#' Lines where side effects occur can be flagged by adding "`# @linksto <object name>`" at the end.\cr
#' Note that `within` evaluates code passed to `expr` as is and comments are ignored.
#' In order to include comments in code one must use the `eval_code` function instead.
#'
#' ```r
#' data <- teal_data() |>
#' eval_code("
#' foo <- function() {
#' x <<- x + 1
#' }
#' x <- 0
#' foo() # @linksto x
#' y <- x
#' ")
#' get_code(data, datanames = "y")
#' ```
#' Now the `foo()` call will be properly included in the code required to recreate `y`.
#'
#' Note that two functions that create objects as side effects, `assign` and `data`, are handled automatically.
#'
#' Here are known cases where manual tagging is necessary:
#' - non-standard assignment operators, _e.g._ `%<>%`
#' - objects used as conditions in `if` statements: `if (<condition>)`
#' - objects used to iterate over in `for` loops: `for(i in <sequence>)`
#' - creating and evaluating language objects, _e.g._ `eval(<call>)`
#'
#'
#' @param object (`teal_data`)
#' @param datanames `r lifecycle::badge("experimental")` (`character`) vector of data set names to return the code for.
#' @param datanames `r lifecycle::badge("experimental")` (`character`) vector of dataset names to return the code for.
m7pr marked this conversation as resolved.
Show resolved Hide resolved
#' For more details see the "Extracting dataset-specific code" section.
#' @param deparse (`logical`) flag specifying whether to return code as `character` (`deparse = TRUE`) or as
#' `expression` (`deparse = FALSE`).
#'
#' @return
#' Either string or an expression representing code used to create the requested data sets.
#' Either a character string or an expression. If `datanames` is used to request a specific dataset,
#' only code that _creates_ that dataset (not code that uses it) is returned. Otherwise, all contents of `@code`.
chlebowa marked this conversation as resolved.
Show resolved Hide resolved
#'
#' @examples
#'
#' tdata1 <- teal_data()
#' tdata1 <- within(tdata1, {
#' a <- 1
Expand All @@ -33,8 +96,10 @@
#' tdata2 <- teal_data(x1 = iris, code = "x1 <- iris")
#' get_code(tdata2)
#' get_code(verify(tdata2))
#'
#' @rdname get_code
#' @aliases get_code,teal_data-method
#' @aliases get_code
#'
#' @export
setMethod("get_code", signature = "teal_data", definition = function(object, deparse = TRUE, datanames = NULL) {
checkmate::assert_character(datanames, min.len = 1L, null.ok = TRUE)
Expand Down
100 changes: 78 additions & 22 deletions R/utils-get_code_dependency.R
Original file line number Diff line number Diff line change
Expand Up @@ -99,9 +99,19 @@ find_call <- function(call_pd, text) {
#' @keywords internal
#' @noRd
extract_calls <- function(pd) {
calls <- lapply(pd[pd$parent == 0, "id"], get_children, pd = pd)
calls <- lapply(
pd[pd$parent == 0, "id"],
function(parent) {
rbind(
pd[pd$id == parent, c("token", "text", "id", "parent")],
get_children(pd = pd, parent = parent)
)
}
)
calls <- Filter(function(call) !(nrow(call) == 1 && call$token == "';'"), calls)
calls <- Filter(Negate(is.null), calls)
fix_comments(calls)
calls <- fix_shifted_comments(calls)
fix_arrows(calls)
}

#' @keywords internal
Expand All @@ -118,19 +128,43 @@ get_children <- function(pd, parent) {
}
}

#' Fixes edge case of comments being shifted to the next call.
#' @keywords internal
#' @noRd
fix_comments <- function(calls) {
# If the first token is a COMMENT, then it belongs to the previous call.
fix_shifted_comments <- function(calls) {
# If the first or the second token is a @linksto COMMENT,
# then it belongs to the previous call.
if (length(calls) >= 2) {
for (i in 2:length(calls)) {
if (grepl("@linksto", calls[[i]][1, "text"])) {
calls[[i - 1]] <- rbind(calls[[i - 1]], calls[[i]][1, ])
calls[[i]] <- calls[[i]][-1, ]
comment_idx <- grep("@linksto", calls[[i]][, "text"])
if (isTRUE(comment_idx[1] <= 2)) {
calls[[i - 1]] <- rbind(
calls[[i - 1]],
calls[[i]][seq_len(comment_idx[1]), ]
)
calls[[i]] <- calls[[i]][-seq_len(comment_idx[1]), ]
}
}
}
calls
Filter(nrow, calls)
}

#' Fixes edge case of `<-` assignment operator being called as function,
#' which is \code{`<-`(y,x)} instead of traditional `y <- x`.
#' @keywords internal
#' @noRd
fix_arrows <- function(calls) {
lapply(
calls,
function(call) {
call[call$token == "SYMBOL_FUNCTION_CALL" & call$text == "`<-`", c("token", "text")] <- c("LEFT_ASSIGN", "<-")
call[call$token == "SYMBOL_FUNCTION_CALL" & call$text == "`->`", c("token", "text")] <- c("RIGHT_ASSIGN", "->")
call[call$token == "SYMBOL_FUNCTION_CALL" & call$text == "`<<-`", c("token", "text")] <- c("LEFT_ASSIGN", "<-")
call[call$token == "SYMBOL_FUNCTION_CALL" & call$text == "`->>`", c("token", "text")] <- c("RIGHT_ASSIGN", "->")
call[call$token == "SYMBOL_FUNCTION_CALL" & call$text == "`=`", c("token", "text")] <- c("LEFT_ASSIGN", "<-")
call
}
)
}

# code_graph ----
Expand Down Expand Up @@ -199,29 +233,50 @@ extract_occurrence <- function(calls_pd) {
data_call <- find_call(call_pd, "data")
if (data_call) {
sym <- call_pd[data_call + 1, "text"]
return(c(gsub("^['\"]|['\"]$", "", sym), "<-", "data"))
return(c(gsub("^['\"]|['\"]$", "", sym), "<-"))
}
# Handle assign().
# Handle assign(x = ).
assign_call <- find_call(call_pd, "assign")
if (assign_call) {
# Check if parameters were named.
# "','" is for unnamed parameters, where "SYMBOL_SUB" is for named.
# "EQ_SUB" is for `=` appearing after the name of the named parameter.
if (any(call_pd$token == "SYMBOL_SUB")) {
params <- call_pd[call_pd$token == "SYMBOL_SUB", "text"]
pos <- match("x", params, nomatch = length(params) + 1L)
params <- call_pd[call_pd$token %in% c("SYMBOL_SUB", "','", "EQ_SUB"), "text"]
# Remove sequence of "=", ",".
if (length(params > 1)) {
remove <- integer(0)
for (i in 2:length(params)) {
if (params[i - 1] == "=" & params[i] == ",") {
remove <- c(remove, i - 1, i)
}
}
if (length(remove)) params <- params[-remove]
}
pos <- match("x", setdiff(params, ","), nomatch = match(",", params, nomatch = 0))
if (!pos) {
return(character(0L))
}
# pos is indicator of the place of 'x'
# 1. All parameters are named, but none is 'x' - return(character(0L))
# 2. Some parameters are named, 'x' is in named parameters: match("x", setdiff(params, ","))
# - check "x" in params being just a vector of named parameters.
# 3. Some parameters are named, 'x' is not in named parameters
# - check first appearance of "," (unnamed parameter) in vector parameters.
} else {
# Object is the first entry after 'assign'.
pos <- 1
}
sym <- call_pd[assign_call + pos, "text"]
return(c(gsub("^['\"]|['\"]$", "", sym), "<-", "assign"))
return(c(gsub("^['\"]|['\"]$", "", sym), "<-"))
}

# What occurs in a function body is not tracked.
x <- call_pd[!is_in_function(call_pd), ]
sym_cond <- which(x$token %in% c("SYMBOL", "SYMBOL_FUNCTION_CALL"))

if (length(sym_cond) == 0) {
return(character(0))
return(character(0L))
}
# Watch out for SYMBOLS after $ and @. For x$a x@a: x is object, a is not.
# For x$a, a's ID is $'s ID-2 so we need to remove all IDs that have ID = $ID - 2.
Expand All @@ -232,20 +287,21 @@ extract_occurrence <- function(calls_pd) {
sym_cond <- setdiff(sym_cond, which(x$id %in% after_dollar))
}

# If there was an assignment operation detect direction of it.
ass_cond <- grep("ASSIGN", x$token)
if (length(ass_cond)) { # NOTE 1
sym_cond <- sym_cond[sym_cond > ass_cond] # NOTE 2
if (!length(ass_cond)) {
return(c("<-", unique(x[sym_cond, "text"])))
}
if ((length(ass_cond) && x$text[ass_cond] == "->") || !length(ass_cond)) { # NOTE 3

sym_cond <- sym_cond[sym_cond > ass_cond] # NOTE 1
# If there was an assignment operation detect direction of it.
if (unique(x$text[ass_cond]) == "->") { # NOTE 2
sym_cond <- rev(sym_cond)
}

append(unique(x[sym_cond, "text"]), "<-", after = 1)

### NOTE 3: What if there are 2+ assignments, e.g. a <- b -> c or e.g. a <- b <- c.
### NOTE 2: For cases like 'eval(expression(b <- b + 2))' removes 'eval(expression('.
### NOTE 1: Cases like 'data(iris)' that do not have an assignment operator.
### NOTE 1: Then they are parsed as c("iris", "<-", "data")
### NOTE 2: What if there are 2 assignments: e.g. a <- b -> c.
### NOTE 1: For cases like 'eval(expression(b <- b + 2))' removes 'eval(expression('.
}
)
}
Expand Down
50 changes: 0 additions & 50 deletions man/get_code-teal_data-method.Rd

This file was deleted.

Loading