String_200 #32

Open: wants to merge 13 commits into base: main

madhan0923
Collaborator

This file splits the variables in a data frame whose values are longer than 200 characters into meaningful string variables, together with SUPP variables.

Thank you for your Pull Request! We have developed this task checklist from the
Development Process Guide to help with the final steps of the process.
Completing the below tasks helps to ensure our reviewers can maximize their time
on your code as well as making sure the oak codebase remains robust and consistent.

Please check off each taskbox as an acknowledgment that you completed the task
or check off that it is not relevant to your Pull Request. This checklist is
part of the GitHub Actions workflows and the Pull Request will not be merged into
the devel branch until you have checked off each task.

  • Place Closes #<insert_issue_number> into the beginning of your Pull
    Request Title (Use Edit button in top-right if you need to update)
  • Code is formatted according to the
    tidyverse style guide. Run
    styler::style_file() to style R and Rmd files
  • Updated relevant unit tests or have written new unit tests, which should
    consider realistic data scenarios and edge cases, e.g. empty datasets, errors,
    boundary cases etc. - See
    Unit Test Guide
  • If you removed/replaced any function and/or function parameters, did you
    fully follow the
    deprecation guidance?
  • Update to all relevant roxygen headers and examples, including keywords
    and families. Refer to the
    categorization of functions to tag appropriate keyword/family.
  • Run devtools::document() so all .Rd files in the man folder and the
    NAMESPACE file in the project root are updated appropriately
  • Address any updates needed for vignettes and/or templates
  • Update NEWS.md if the changes pertain to a user-facing function (i.e. it
    has an @export tag) or documentation aimed at users (rather than developers)
  • Build oak site pkgdown::build_site() and check that all affected
    examples are displayed correctly and that all new functions occur on the "Reference" page.
  • Address or fix all lintr warnings and errors - lintr::lint_package()
  • Run R CMD check locally and address all errors and warnings - devtools::check()
  • Link the issue in the Development Section on the right hand side.
  • Address all merge conflicts and resolve appropriately
  • Pat yourself on the back for a job well done! Much love to your accomplishment!


github-actions bot commented Dec 8, 2023

Code Coverage

Package    Line Rate         Health
sdtm.oak   83%
Summary    83% (258 / 311)

@galachad galachad self-requested a review December 8, 2023 17:48
Member

@galachad galachad left a comment

PTAL, I added some initial comments. @madhan0923 can we consider using S3 methods for different classes?
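
The S3 suggestion above could look roughly like the following. This is only a sketch under stated assumptions: the generic name `split_long_values()` is hypothetical and not part of sdtm.oak, and the fixed-width cut is a stand-in for whatever splitting rule the PR finally adopts.

```r
# Hypothetical generic; the name is illustrative, not an sdtm.oak function.
split_long_values <- function(x, max_length_out = 200, ...) {
  UseMethod("split_long_values")
}

# Character vectors: cut each string into pieces of at most
# max_length_out characters (fixed-width cut, assumed for illustration).
split_long_values.character <- function(x, max_length_out = 200, ...) {
  lapply(x, function(s) {
    if (is.na(s) || nchar(s) <= max_length_out) {
      return(s)
    }
    starts <- seq(1, nchar(s), by = max_length_out)
    substring(s, starts, pmin(starts + max_length_out - 1, nchar(s)))
  })
}

# Factors: convert to character and reuse the character method.
split_long_values.factor <- function(x, max_length_out = 200, ...) {
  split_long_values(as.character(x), max_length_out = max_length_out, ...)
}

# Any other class (numeric, logical, ...): nothing to split.
split_long_values.default <- function(x, max_length_out = 200, ...) {
  as.list(x)
}
```

With this layout, each column class gets its own method, and supporting a new class means adding one method rather than another branch inside a single function.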

@@ -0,0 +1,60 @@
R_SPLIT = function(domain_dataset,max_length_out = 200){
Member

  • Please use the tidyverse style guide and snake_case: https://style.tidyverse.org/syntax.html
  • Please add roxygen documentation, with examples
  • Please add test cases
  • Please add namespace:: prefixes to all function calls from tidyverse packages
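
As a rough illustration of the four points above, the first step of the function might be rewritten as follows. The name `select_long_vars()` is hypothetical, and the strict `>` comparison follows the PR description ("length > 200") rather than the `>=` in the diff, which is an assumption.

```r
#' Select character variables that exceed a maximum length
#'
#' @param domain_dataset A data frame.
#' @param max_length_out Maximum number of characters allowed in a value.
#'
#' @return A data frame holding only the character columns with at least
#'   one value longer than `max_length_out`.
#'
#' @examples
#' select_long_vars(data.frame(a = "short", b = strrep("x", 250)))
#'
#' @export
select_long_vars <- function(domain_dataset, max_length_out = 200) {
  # Namespaced call, snake_case name, `<-` assignment, spaces after commas.
  dplyr::select_if(
    domain_dataset,
    ~ is.character(.x) && any(nchar(.x) > max_length_out, na.rm = TRUE)
  )
}
```

The `is.character()` guard also keeps numeric columns from being measured by `nchar()`, which the original `select_if()` call does not check.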

char_200 = domain_dataset %>% select_if(~ max(nchar(.)) >= max_length_out)

#string split function
split_var <- function(string,max_length_out = 200) {
Member

Please do not define a function inside another function.
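
A minimal sketch of the requested layout, with the helper defined at the top level. The body of `split_var()` here is an assumption: it uses base R `strwrap()` to break at whitespace, which does not break single words longer than the limit, so it only stands in for the PR's real splitting logic. The name `r_split()` is a hypothetical snake_case version of `R_SPLIT`.

```r
# Top-level helper (assumed body): wrap at whitespace so that each piece
# has at most max_length_out characters.
split_var <- function(string, max_length_out = 200) {
  strwrap(string, width = max_length_out + 1)
}

# The main function calls the helper instead of defining it in its body.
r_split <- function(domain_dataset, max_length_out = 200) {
  lapply(domain_dataset, function(column) {
    lapply(as.character(column), split_var, max_length_out = max_length_out)
  })
}
```

Keeping `split_var()` at the top level means it can be documented and unit tested on its own.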

split_var <- function(string,max_length_out = 200) {

# Pattern spot
pattern = names(which.max(table(str_extract_all(string, "[:punct:]|[:blank:]")))) %>%
Member

The pattern can be used for splitting, but it will not work as the sep argument of paste().
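
A small base R illustration of this point, assuming the pattern ends up holding a regular expression (the rest of the pipeline is not shown in the diff). `strsplit()` interprets the pattern as a regex, while `paste()` inserts `sep` as literal text.

```r
text <- "a.b.c"
pattern <- "\\."  # regular expression matching a literal dot

# Splitting interprets the pattern as a regex and works as intended.
parts <- strsplit(text, pattern)[[1]]

# paste() inserts sep literally, so the backslash ends up in the result.
rejoined_with_pattern <- paste(parts[1], parts[2], parts[3], sep = pattern)

# Using the plain delimiter restores the original string.
rejoined_with_literal <- paste(parts[1], parts[2], parts[3], sep = ".")

print(rejoined_with_pattern)
print(rejoined_with_literal)
```

One option is to keep two values: the literal delimiter for joining and an escaped version of it for splitting.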

Comment on lines 46 to 54
outt <- map(char_200, ~ {
split_list <- map(.x, ~ {
cv <- as.data.frame(split_var(.x, max_length_out))
names(cv) <- seq_along(cv)
cv
})
split_df <- bind_rows(split_list)
split_df
}) %>% imap(.,~set_names(.x,.y)) %>% bind_cols()
Member

Can we consider using tidyr?
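
One possible shape of a tidyr-based version, offered as a sketch only. The data, the column names `USUBJID` and `AETERM`, the small limit of 4, and the fixed-width cut are all assumptions for illustration; the resulting column names are tidyr's defaults for `names_sep = "_"`, not an SDTM SUPP naming convention.

```r
# Toy data; USUBJID and AETERM are illustrative column names.
ae <- data.frame(USUBJID = c("01", "02"), AETERM = c("abcdefghij", "abc"))
max_length_out <- 4  # small limit so the effect is visible

# Replace the long variable with a list-column of fixed-width pieces.
ae$AETERM <- lapply(ae$AETERM, function(s) {
  starts <- seq(1, max(nchar(s), 1), by = max_length_out)
  substring(s, starts, starts + max_length_out - 1)
})

# Spread the pieces into one column per piece; shorter values are padded
# with NA.
wide <- tidyr::unnest_wider(ae, AETERM, names_sep = "_")
print(wide)
```

This replaces the nested `map()`, `bind_rows()`, `imap()` and `bind_cols()` chain with a single reshaping call per variable.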

@madhan0923 madhan0923 self-assigned this Dec 18, 2023
@rammprasad
Collaborator

@madhan0923 - Can you address the review comments? Also, merge main into this feature branch and make sure the pipeline is green.

Labels: None yet
Projects: Status: In review
3 participants