Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gdr 2682 #132

Merged
merged 13 commits into from
Sep 30, 2024
Merged

Gdr 2682 #132

merged 13 commits into from
Sep 30, 2024

Conversation

darsoo
Copy link
Contributor

@darsoo darsoo commented Sep 17, 2024

Description

What changed?

Related JIRA issue:

Why was it changed?

Checklist for sustainable code base

  • I added tests for any code changed/added
  • I added documentation for any code changed/added
  • I made sure naming of any new functions is self-explanatory and consistent

Logistic checklist

  • Package version bumped
  • Changelog updated

Screenshots (optional)

@darsoo darsoo requested a review from a team as a code owner September 17, 2024 09:17
@darsoo darsoo requested review from j-smola and bczech and removed request for a team September 17, 2024 09:17
@bczech
Copy link
Contributor

bczech commented Sep 19, 2024

Please reoxygenate the documentation

@j-smola
Copy link
Contributor

j-smola commented Sep 25, 2024

@darsoo Please reoxygenate the documentation

Copy link
Contributor

@j-smola j-smola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reoxygenate the documentation

duplicated_ids <- setdiff(duplicated_ids, duplicated_ids_with_clid)
} else {
duplicated_ids <- col_data[[cellline_name]][duplicated(col_data[[cellline_name]])]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why there is a different procedure for colData for SE and for data.table?
(I understand that in colData we do not have information about Drug, but it is misleading: the same content as input but different output depending on format - in the context of presence of cell line name duplicates)

>   dt <- data.table::data.table(
   DrugName = c("DrugA", "DrugB", "DrugC", "DrugD", "DrugC", "DrugD"), 
   Gnumber = c("G1", "G2", "G3", "G4", "G3", "G4"),
   CellLineName = c("ID1", "ID1", "ID2", "ID2", "ID2", "ID2"), 
   clid = c("C1", "C2", "C3", "C4", "C5", "C6")
  )
>   res_dt <- set_unique_cl_names_dt(dt)
> res_dt
   DrugName Gnumber CellLineName   clid
     <char>  <char>       <char> <char>
1:    DrugA      G1          ID1     C1
2:    DrugB      G2          ID1     C2
3:    DrugC      G3     ID2 (C3)     C3
4:    DrugD      G4     ID2 (C4)     C4
5:    DrugC      G3     ID2 (C5)     C5
6:    DrugD      G4     ID2 (C6)     C6

  dt <- S4Vectors::DataFrame(
    DrugName = c("DrugA", "DrugB", "DrugC", "DrugD", "DrugC", "DrugD"), 
    Gnumber = c("G1", "G2", "G3", "G4", "G3", "G4"),
    CellLineName = c("ID1", "ID1", "ID2", "ID2", "ID2", "ID2"), 
    clid = c("C1", "C2", "C3", "C4", "C5", "C6")
  )
>   res_S4 <- set_unique_cl_names_dt(dt)
> res_S4
DataFrame with 6 rows and 4 columns
     DrugName     Gnumber CellLineName        clid
  <character> <character>  <character> <character>
1       DrugA          G1     ID1 (C1)          C1
2       DrugB          G2     ID1 (C2)          C2
3       DrugC          G3     ID2 (C3)          C3
4       DrugD          G4     ID2 (C4)          C4
5       DrugC          G3     ID2 (C5)          C5
6       DrugD          G4     ID2 (C6)          C6

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, but in my opinion it's a bit pointless. From a practical point of view DataFrame is only used for colData or rowData in SE, where there is no possibility for such a situation to occur. I don't see the need to complicate this logic.
More complex logic for data.table objects is required due to data specificity.

Copy link
Contributor

@j-smola j-smola Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO now is complicated and not consistent.
I would vote for the function always returning the same result, regardless of format. The user may want to use this feature in a context other than within the application.

The only thing to change is - instead of checking format of col_data input - just check whether unique_col_names is "CellLineName".
If true - just add suffix, if not - check other columns and add suffix accordingly.
You have that code already written.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but duplicated works in different way for data.table and DataFrame.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the same page as @j-smola. It would be great to have consistent logic regardless of the data format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Added fix and test

R/standardize_MAE.R Outdated Show resolved Hide resolved
@j-smola
Copy link
Contributor

j-smola commented Sep 26, 2024

Please, reoxygenate the documentation (run devtools::document("./")) (You changed param description)

@darsoo
Copy link
Contributor Author

darsoo commented Sep 26, 2024

Please, reoxygenate the documentation (run devtools::document("./")) (You changed param description)

done

@darsoo darsoo merged commit 2875659 into main Sep 30, 2024
3 of 4 checks passed
@darsoo darsoo deleted the GDR-2682 branch September 30, 2024 12:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants