Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: Change JoinKeys to something simpler #78

Comments

@gogonzo
Copy link
Contributor

gogonzo commented Jun 30, 2022

!!UPDATE!!

Read this comment for updated instruction on how to move forward with this issue:
insightsengineering/teal.gallery#75 (comment)


Feature description

JoinKeys is a R6 class which is not user friendly. JoinKeys should be a simple list or S4 class acting like a list. Reasons:

  1. join_keys should behave like a list so that teal.slice and teal.transform doesn't need
    teal.data as dependency. It might need for examples, but extracting objects by join_keys[["data1"]][["data2"]]
    will work with join_keys and with a simple list

  2. join_keys should be a named list a dataname. Each list item should be a named list of keys between datasets. It could be represented by such yaml:

orders:
  orders: 
    - order_id
  products: 
    - product_id
products:
  products: 
    - product_id
  orders: 
    - product_id

or when keys names are different

orders:
  orders: 
    - id
  products: 
    product_id: id
products:
  products: 
    - id
  orders: 
    id: product_id
  1. join_keys should be able to add keys symmetrically. It means that adding key between data1 and data2 should add the key between data2 and data1.
  2. Primary keys should be the first on the list.
  3. When parent dataset is specified for given dataset, then it should be placed as second element in the list.
  4. Each dataset can have only one parent dataset. Keys should be DAG (directed acyclic graph).
  5. Each time list is modified it should be validated (if no cycles, if keys are symmetrical, if keys are unique)
  6. join keys for given dataset can be empty list. It means that dataset is not connected with any other dataset. This could take over @datanames slot, as all datasets could be listed in join_keys (do we need this?)

Exported functions

  • join_keys(...) constructor
  • join_keys(data) getter
  • join_keys(data) <- setter

Constructor

jk <- join_keys(
  orders = list(
    orders = "id",
    products = c(product_id = "id")
  ),
  products = list(
    products = "id",
    studies = c(id = "product_id")
  )
)

Setters

jk[["orders"]][["products"]] <- c(product_id = "id") # update products -> orders
jk[["orders"]] <- modifyList(jk[["orders"]], list(products = c(product_id = "id")) # update accordingly 

jk[["orders"]] <- list(
  orders = list(
    orders = "id",
    products = c(product_id = "id")
  )
)

Having setter jk[["orders"]][["products"]]<- doesn't seem to be possible, as it will be challenging to check symmetry in the whole list of keys when setting only one item (parent list methods are not invoked). It means that we probably should stay with jk["orders", "products"]

Getters

join_keys(data) # returns join_keys from teal_data
join_keys(data)[["orders"]] # returns list of keys from orders dataset
join_keys(data)[["orders"]][["products"]] # returns vector of keys from orders to products

Changing JoinKeys will require change at least in these packages:

  • teal.slice
  • teal.transform
  • teal
@nikolas-burkoff
Copy link
Contributor

See insightsengineering/teal.slice#71 (comment) - we should make sure join keys object returns what we want it to in edge cases (and to have appropriate validation when using it outside of teal_data...)

@nikolas-burkoff
Copy link
Contributor

See here insightsengineering/teal#683 (comment)

I agree passing JoinKeys makes more sense. We can register [[, [, $ methods for JoinKeys to return a list for a given datanames so datasets$get_join_keys()$get()[datanames] won't bother anyone in the future.

@donyunardi
Copy link
Contributor

Need to reassess this. Team huddle after filter panel

@gogonzo
Copy link
Contributor Author

gogonzo commented Oct 27, 2023

Needs a quick implementation, please follow the comment

insightsengineering/teal.gallery#75 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment