-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gdr 2682 #132
Conversation
Please reoxygenate the documentation |
@darsoo Please reoxygenate the documentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please reoxygenate the documentation
duplicated_ids <- setdiff(duplicated_ids, duplicated_ids_with_clid) | ||
} else { | ||
duplicated_ids <- col_data[[cellline_name]][duplicated(col_data[[cellline_name]])] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why there is a different procedure for colData for SE and for data.table?
(I understand that in colData we do not have information about Drug, but it is misleading: the same content as input but different output depending on format - in the context of presence of cell line name duplicates)
> dt <- data.table::data.table(
DrugName = c("DrugA", "DrugB", "DrugC", "DrugD", "DrugC", "DrugD"),
Gnumber = c("G1", "G2", "G3", "G4", "G3", "G4"),
CellLineName = c("ID1", "ID1", "ID2", "ID2", "ID2", "ID2"),
clid = c("C1", "C2", "C3", "C4", "C5", "C6")
)
> res_dt <- set_unique_cl_names_dt(dt)
> res_dt
DrugName Gnumber CellLineName clid
<char> <char> <char> <char>
1: DrugA G1 ID1 C1
2: DrugB G2 ID1 C2
3: DrugC G3 ID2 (C3) C3
4: DrugD G4 ID2 (C4) C4
5: DrugC G3 ID2 (C5) C5
6: DrugD G4 ID2 (C6) C6
dt <- S4Vectors::DataFrame(
DrugName = c("DrugA", "DrugB", "DrugC", "DrugD", "DrugC", "DrugD"),
Gnumber = c("G1", "G2", "G3", "G4", "G3", "G4"),
CellLineName = c("ID1", "ID1", "ID2", "ID2", "ID2", "ID2"),
clid = c("C1", "C2", "C3", "C4", "C5", "C6")
)
> res_S4 <- set_unique_cl_names_dt(dt)
> res_S4
DataFrame with 6 rows and 4 columns
DrugName Gnumber CellLineName clid
<character> <character> <character> <character>
1 DrugA G1 ID1 (C1) C1
2 DrugB G2 ID1 (C2) C2
3 DrugC G3 ID2 (C3) C3
4 DrugD G4 ID2 (C4) C4
5 DrugC G3 ID2 (C5) C5
6 DrugD G4 ID2 (C6) C6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, but in my opinion it's a bit pointless. From a practical point of view DataFrame
is only used for colData or rowData in SE, where there is no possibility for such a situation to occur. I don't see the need to complicate this logic.
More complex logic for data.table objects is required due to data specificity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO now is complicated and not consistent.
I would vote for the function always returning the same result, regardless of format. The user may want to use this feature in a context other than within the application.
The only thing to change is - instead of checking format of col_data
input - just check whether unique_col_names
is "CellLineName"
.
If true - just add suffix, if not - check other columns and add suffix accordingly.
You have that code already written.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but duplicated
works in different way for data.table and DataFrame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on the same page as @j-smola. It would be great to have consistent logic regardless of the data format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Added fix and test
Co-authored-by: j-smola <[email protected]>
Please, reoxygenate the documentation (run |
done |
Description
What changed?
Related JIRA issue:
Why was it changed?
Checklist for sustainable code base
Logistic checklist
Screenshots (optional)