Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

topological_sort for datanames(teal_data()) #318

Closed
wants to merge 24 commits into from
Closed
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
c8a448c
test for ; case
m7pr Jan 18, 2024
a942d60
merge
m7pr Jan 18, 2024
d276477
Merge branch 'main' of https://github.com/insightsengineering/teal.data
m7pr Jan 22, 2024
d2727d4
Merge branch 'main' of https://github.com/insightsengineering/teal.data
m7pr Jan 24, 2024
3fa1a8d
Merge branch 'main' of https://github.com/insightsengineering/teal.data
m7pr Jan 26, 2024
b4d1d9f
Merge branch 'main' of https://github.com/insightsengineering/teal.data
m7pr Feb 1, 2024
fc2e59e
Merge branch 'main' of https://github.com/insightsengineering/teal.data
m7pr Feb 5, 2024
484915e
Merge branch 'main' of https://github.com/insightsengineering/teal.data
m7pr Feb 6, 2024
a53e3a1
Merge branch 'main' of https://github.com/insightsengineering/teal.data
m7pr Feb 22, 2024
3c98a37
Merge branch 'main' of https://github.com/insightsengineering/teal.data
m7pr Mar 20, 2024
f195d1c
Merge branch 'main' of https://github.com/insightsengineering/teal.data
m7pr May 8, 2024
c0d8065
Merge branch 'main' of https://github.com/insightsengineering/teal.data
m7pr May 14, 2024
4eeca16
Merge branch 'main' of https://github.com/insightsengineering/teal.data
m7pr Jul 30, 2024
5cdb5f7
WIP - topological sort for datanames
m7pr Jul 30, 2024
621cfb5
Update R/teal_data-datanames.R
m7pr Jul 31, 2024
96228a0
add two tests for topological_sort
m7pr Jul 31, 2024
784965b
WIP - topological sort for datanames
m7pr Jul 31, 2024
2cbff27
Merge branch 'topological_sort@669_insertUI@main' of https://github.c…
m7pr Jul 31, 2024
3050093
Update R/teal_data-datanames.R
m7pr Jul 31, 2024
c8ebd82
[skip style] [skip vbump] Restyle files
github-actions[bot] Jul 31, 2024
8a84286
datanames() test for join_keys set during teal_data creation
m7pr Jul 31, 2024
e58c298
Merge branch 'topological_sort@669_insertUI@main' of https://github.c…
m7pr Jul 31, 2024
ee389a5
[skip roxygen] [skip vbump] Roxygen Man Pages Auto Update
dependabot-preview[bot] Jul 31, 2024
f6be311
set safely
gogonzo Jul 31, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions R/join_keys.R
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,7 @@ join_keys.teal_data <- function(...) {
#' join_keys(td)
`join_keys<-.teal_data` <- function(x, value) {
join_keys(x@join_keys) <- value
x@datanames <- sort_datanames(x@datanames, x@join_keys)
gogonzo marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reload datanames

Suggested change
x@datanames <- sort_datanames(x@datanames, x@join_keys)
datanames(x) <- x@datanames

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really nice, but needed to think a couple of time what it does.
If we have x@datanames <- sort_datanames(x@datanames, x@join_keys) then you know sorting is done.
If we have datanames(x) <- x@datanames then only if you know behavior datanames(x) you know that sorting is applied.

So maybe, to make it clear, let's have a comment:

datanames(x) <- x@datanames  #datanames() triggers sort_datanames()/topological_sort()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one thing missing. If we want to set datanames and include parent-datanames we need to setValidity on @datanames ~ names(@env) consistency

x
}

Expand Down
2 changes: 2 additions & 0 deletions R/teal_data-class.R
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,8 @@ new_teal_data <- function(data,
new_env <- rlang::env_clone(list2env(data), parent = parent.env(.GlobalEnv))
lockEnvironment(new_env, bindings = TRUE)

datanames <- sort_datanames(datanames, join_keys)

methods::new(
"teal_data",
env = new_env,
Expand Down
15 changes: 15 additions & 0 deletions R/teal_data-datanames.R
Original file line number Diff line number Diff line change
Expand Up @@ -43,10 +43,25 @@ setGeneric("datanames<-", function(x, value) standardGeneric("datanames<-"))
setMethod("datanames<-", signature = c("teal_data", "character"), definition = function(x, value) {
checkmate::assert_subset(value, names(x@env))
x@datanames <- value
x@datanames <- sort_datanames(x@datanames, x@join_keys)
gogonzo marked this conversation as resolved.
Show resolved Hide resolved
methods::validObject(x)
x
})
setMethod("datanames<-", signature = c("qenv.error", "character"), definition = function(x, value) {
methods::validObject(x)
x
})


#' @keywords internal
sort_datanames <- function(datanames, joinkeys) {

child_parent <- sapply(
datanames,
function(name) teal.data::parent(joinkeys, name),
m7pr marked this conversation as resolved.
Show resolved Hide resolved
USE.NAMES = TRUE,
simplify = FALSE
)

union(unlist(topological_sort(child_parent)), datanames)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
union(unlist(topological_sort(child_parent)), datanames)
union(
intersect(unlist(topological_sort(child_parent)), datanames), # join_keys can be set for inexisting datanames
datanames
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does performing intersect has any consequence to datanames?

With this suggestion, datanames always remains the same

Copy link
Contributor Author

@m7pr m7pr Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this union(intersect(A, B), B) is just B

is_B <- function(A, B) {
  unionAB <- union(intersect(A, B), B)
  cat('union', paste(unionAB, collapse = ','), '\n')
  cat('B',     paste(B,       collapse = ','), '\n')
  
  identical(
    sort(unionAB),
    sort(B)
  )
}

is_B(letters[sample(1:10, size = 5)], letters[sample(1:10, size = 3)])

table(
  replicate(
    100,
    is_B(letters[sample(1:10, size = 5)], letters[sample(1:10, size = 3)])
  )
)
> TRUE 
>  100

Copy link
Contributor

@gogonzo gogonzo Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@averissimo I don't think so, this can be translated into

# sorted but always a subset of datanames
# this is needed because join_keys can possibly contain different datanames
# as join_keys are not so stricltly protected
# - also insersect defines order based on the first argument
#   intersect(c("c", "b"), c("a", "b", "c"))
#   [1] "c" "b"
relational_datanames <- intersect(unlist(topological_sort(child_parent)), datanames)

# sorted relational_datanames are combined with unsorted 
# with relational_datanames being prior
union(relational_datanames, datanames)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the set is the same, but the order is different

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a separate branch https://github.com/insightsengineering/teal.data/tree/datanames_getter%40topological_sort%40669_insertUI%40main so it's easier to switch

  • sort_datanames() uses union(intersect(
  • sort_datanames() only applied on a getter of datanames()

And the result is the same - intersect( limited the set so the parent is not appended to datanames()

data <- teal_data() |> within(b <- data.frame(), a <- data.frame())
join_keys(data) <- join_keys(join_key("a", "b", key = "id"))
datanames(data) <- "b"
datanames(data)
> [1] "b"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@m7pr result is not the same when one specifies unsorted datanames

pkgload::load_all("teal.data")
data <- teal_data() |> within({
  a <- data.frame()
  b <- data.frame()
}) 
join_keys(data) <- join_keys(join_key("a", "b", key = "id")) 
datanames(data) <- c("b", "a")
datanames(data)

Which solves your issue in teal, where:

  • you wanted to sort datanames topologically
  • but not add datasets based on parent-child relationship

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If datanames() can potentially return names of objects which doesn't exist in @env then when the error should occur?

Yeah, let's extend the setter, so it checks for non-existing names in ls(get_env(teal_data()) and then we can go with a regular union(unlist(topological_sort(), datanames) without the interesct() needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but not add datasets based on parent-child relationship

But I thought we WANT to add parents. If we don't want to add parents, then intersect( is needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this

image

I get "a" "b" which is what we wanted. Ordered by the join_keys order

}
31 changes: 31 additions & 0 deletions tests/testthat/test-datanames.R
Original file line number Diff line number Diff line change
Expand Up @@ -41,3 +41,34 @@ testthat::test_that("datanames<- called on qenv.error does not change qenv.error
qec <- qe
testthat::expect_identical(qe, qec)
})

# topological_order ----
testthat::test_that("datanames return topological order of datasets once join_keys are specified", {
data <- within(teal_data(), {
ADTTE <- teal.data::rADTTE
iris <- iris
ADSL <- teal.data::rADSL
})
datanames(data) <- c("ADTTE", "iris", "ADSL")
join_keys(data) <- default_cdisc_join_keys[c("ADSL", "ADTTE")]
testthat::expect_identical(
datanames(data),
c("ADSL", "ADTTE", "iris")
)
})

testthat::test_that("datanames return topological order of datasets after datanames are caled after join_keys", {
data <- within(teal_data(), {
ADTTE <- teal.data::rADTTE
iris <- iris
ADSL <- teal.data::rADSL
})

join_keys(data) <- default_cdisc_join_keys[c("ADSL", "ADTTE")]
datanames(data) <- c("ADTTE", "iris", "ADSL")

testthat::expect_identical(
datanames(data),
c("ADSL", "ADTTE", "iris")
)
})
Loading