Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update templates #2193

Merged
merged 3 commits into from
Oct 26, 2023
Merged

update templates #2193

merged 3 commits into from
Oct 26, 2023

Conversation

dgrassellyb
Copy link
Contributor

Thank you for your Pull Request! We have developed this task checklist from the Development Process Guide to help with the final steps of the process. Completing the below tasks helps to ensure our reviewers can maximize their time on your code as well as making sure the admiral codebase remains robust and consistent.

Please check off each taskbox as an acknowledgment that you completed the task or check off that it is not relevant to your Pull Request. This checklist is part of the Github Action workflows and the Pull Request will not be merged into the devel branch until you have checked off each task.

  • Place Closes #<insert_issue_number> into the beginning of your Pull Request Title (Use Edit button in top-right if you need to update)
  • Code is formatted according to the tidyverse style guide. Run styler::style_file() to style R and Rmd files
  • Updated relevant unit tests or have written new unit tests, which should consider realistic data scenarios and edge cases, e.g. empty datasets, errors, boundary cases etc. - See Unit Test Guide
  • If you removed/replaced any function and/or function parameters, did you fully follow the deprecation guidance?
  • Update to all relevant roxygen headers and examples, including keywords and families. Refer to the categorization of functions to tag appropriate keyword/family.
  • Run devtools::document() so all .Rd files in the man folder and the NAMESPACE file in the project root are updated appropriately
  • Address any updates needed for vignettes and/or templates
  • Update NEWS.md under the header # admiral (development version) if the changes pertain to a user-facing function (i.e. it has an @export tag) or documentation aimed at users (rather than developers)
  • Build admiral site pkgdown::build_site() and check that all affected examples are displayed correctly and that all new functions occur on the "Reference" page.
  • Address or fix all lintr warnings and errors - lintr::lint_package()
  • Run R CMD check locally and address all errors and warnings - devtools::check()
  • Link the issue in the Development Section on the right hand side.
  • Address all merge conflicts and resolve appropriately
  • Pat yourself on the back for a job well done! Much love to your accomplishment!

@github-actions
Copy link

github-actions bot commented Oct 26, 2023

Code Coverage

Package Line Rate Health
admiral 98%
Summary 98% (4404 / 4473)

@manciniedoardo manciniedoardo merged commit ae0fcdb into main Oct 26, 2023
@manciniedoardo manciniedoardo deleted the 618_templates_action_v2 branch October 26, 2023 10:16
@bms63
Copy link
Collaborator

bms63 commented Oct 26, 2023

@dgrassellyb @manciniedoardo Hi all I raised this in the previous PR. I don't think we should be using getwd() in templates that users have access to. Newbies could create datasets in places they are not supposed to be creating stuff in depending on where their current directory is set. While I think they might have trouble finding the temporary directory, I feel like getwd() can also be a bit of a challenge as well but will have a bigger impact.

What was wrong with the previous tempdir() call?

@bms63
Copy link
Collaborator

bms63 commented Oct 26, 2023

oh also could i get taken off the PR review auto-assignment? I think @kaz462 should take my place

@manciniedoardo
Copy link
Collaborator

What was wrong with the previous tempdir() call?

I believe @dgrassellyb needed to change something to get the templates to update pharmaverseadam? A

@dgrassellyb @manciniedoardo Hi all I raised this in the previous PR. I don't think we should be using getwd() in templates that users have access to. Newbies could create datasets in places they are not supposed to be creating stuff in depending on where their current directory is set. While I think they might have trouble finding the temporary directory, I feel like getwd() can also be a bit of a challenge as well but will have a bigger impact.

I don't know if this is a massive issue. If users are running the template for the first time then the first thing they'll usually do is alter those final lines of code to save the resulting dataset wherever they need it?

@dgrassellyb
Copy link
Contributor Author

dgrassellyb commented Oct 26, 2023

@dgrassellyb @manciniedoardo Hi all I raised this in the previous PR. I don't think we should be using getwd() in templates that users have access to. Newbies could create datasets in places they are not supposed to be creating stuff in depending on where their current directory is set. While I think they might have trouble finding the temporary directory, I feel like getwd() can also be a bit of a challenge as well but will have a bigger impact.

What was wrong with the previous tempdir() call?

Hi @bms63 I will remove you from the automatic PR reviewers, sorry for the spam !
And about this tempdir it was an issue for me (because it's creating a temp dir with random ID and it was more work to retrieve the data created from the github action)
But I have a quick solution for this : from the templates I can detect if it's running through github action or not (in the case of users local run I can in this case use tempdir) - would you be ok with this ?

@bms63
Copy link
Collaborator

bms63 commented Oct 26, 2023

What was wrong with the previous tempdir() call?

I believe @dgrassellyb needed to change something to get the templates to update pharmaverseadam? A

@dgrassellyb @manciniedoardo Hi all I raised this in the previous PR. I don't think we should be using getwd() in templates that users have access to. Newbies could create datasets in places they are not supposed to be creating stuff in depending on where their current directory is set. While I think they might have trouble finding the temporary directory, I feel like getwd() can also be a bit of a challenge as well but will have a bigger impact.

I don't know if this is a massive issue. If users are running the template for the first time then the first thing they'll usually do is alter those final lines of code to save the resulting dataset wherever they need it?

I actually don't think they will alter any lines - they will just run the entire code and see what happens - this is what I would do. They could end making datasets in their study folders and since we just use standard ADaM names overwrite their study datasets. Apologies for being nitpicky on this but I really don't think we should assume a user is going to change this part of the template. There is so much code already for them to work through.

@dgrassellyb I don't fully understand your solution. I'd prefer not to alter the template code. Could you inject/delete some temporary code into the templates to help you with your workflow?

@dgrassellyb
Copy link
Contributor Author

What was wrong with the previous tempdir() call?

I believe @dgrassellyb needed to change something to get the templates to update pharmaverseadam? A

@dgrassellyb @manciniedoardo Hi all I raised this in the previous PR. I don't think we should be using getwd() in templates that users have access to. Newbies could create datasets in places they are not supposed to be creating stuff in depending on where their current directory is set. While I think they might have trouble finding the temporary directory, I feel like getwd() can also be a bit of a challenge as well but will have a bigger impact.

I don't know if this is a massive issue. If users are running the template for the first time then the first thing they'll usually do is alter those final lines of code to save the resulting dataset wherever they need it?

I actually don't think they will alter any lines - they will just run the entire code and see what happens - this is what I would do. They could end making datasets in their study folders and since we just use standard ADaM names overwrite their study datasets. Apologies for being nitpicky on this but I really don't think we should assume a user is going to change this part of the template. There is so much code already for them to work through.

@dgrassellyb I don't fully understand your solution. I'd prefer not to alter the template code. Could you inject/delete some temporary code into the templates to help you with your workflow?

My solution would be the following before saving results to the folder:

if (Sys.getenv("GITHUB_ACTIONS") != "") {
     dir <- file.path(getwd(), "tmp")
}
else {
    dir <- tempdir()
}

GITHUB_ACTIONS is an env variable defined from github workflows (we are already using this kind of logic in some other workflows to make distinction between local exec / user local env

@manciniedoardo
Copy link
Collaborator

Hmmm, not sure I like this being at the bottom of the template - this could indeed confuse users i feel.

@dgrassellyb
Copy link
Contributor Author

@bms63 / @manciniedoardo I will create an issue and make some updates to handle this from the workflow (so putting back also temp dir)

@manciniedoardo
Copy link
Collaborator

Thanks @dgrassellyb !

@dgrassellyb
Copy link
Contributor Author

dgrassellyb commented Oct 27, 2023

@manciniedoardo / @bms63 :
Just made some tests and I can see the tempdir() instruction is creating each time a new temp directory .. is this really the behavior you want ? (having a temp directory for each template ?) -
some examples of folders created :
"/tmp/Rtmpfl9Z0x" (ad_adae)
"/tmp/RtmpMUrqVF" (ad_adcm) (...)

(I am not completely aware of the final use of these data from users but it seems complex I guess to navigate every subfolders to get the produced data.. )

@dgrassellyb
Copy link
Contributor Author

dgrassellyb commented Oct 27, 2023

Just making also a quick test but something to consider also but tempdir() is creating a temporary folder that will be deleted after the R session - it seems that it is the behavior of tempdir func :

root@bcf2f57b6eda:/workspace# ls /tmp
root@bcf2f57b6eda:/workspace# R
> tempdir()
[1] "/tmp/RtmpixCuDB"
> list.files("/tmp")
[1] "RtmpixCuDB"
> q()
root@bcf2f57b6eda:/workspace# ls /tmp 

it will be only available during the R session (during the template execution) and then erased at the end (so we are not keeping rda data) (I don't know if it's a behavior depending of users os but it's what's happening for me)
But also agree with you @bms63 that I am not fan of getcwd(), I will suggest maybe keeping the tmp folder but creating it in a different way :

tmp_dir <- dirname(tempdir())
dir.create(file.path(tmp_dir, "templates_data"))

(I made some tests and templates_data is not removed after R session) (and naming it templates_data would allow to get data in a fixed path and not with random ids generated)

@bundfussr
Copy link
Collaborator

I think the intention of using tempdir() is to avoid that the template overwrites any files in the user's directory. It is expected that the users update dir if they want to keep the dataset permanently.

@cicdguy
Copy link
Collaborator

cicdguy commented Oct 30, 2023

I think the intention of using tempdir() is to avoid that the template overwrites any files in the user's directory. It is expected that the users update dir if they want to keep the dataset permanently.

If the files do get overwritten, then they should represent an up-to-date dataset at all times if the code and content is truly reproducible. In effect, if a user does overwrite content in that directory, the content shouldn't really change.

Moreover, if a content is being written to a temporary directory, and if retention is specified, it'll lead to crowding of data on disk and could potentially lead to the user running out of disk space if this happens very frequently.

Lastly, temporary directory locations are very difficult to infer during a GitHub Actions workflow, as @dgrassellyb mentioned in #2193 (comment). This PR proposes the deterministic locations for easier and definitive inference.

@bundfussr
Copy link
Collaborator

There are two use cases for the templates.

The primary use case is providing a starting point for an ADaM script. By default it should not overwrite or add anything to the user's directory. It should be a conscious decision of the user to write something to the directory.

The secondary use case is to create the ADaM datasets in pharmaverseadam via a workflow.

I would prefer to keep the template code as it is and modify the template on the fly when running it in the workflow. Is this possible?

@cicdguy
Copy link
Collaborator

cicdguy commented Oct 30, 2023

I would prefer to keep the template code as it is and modify the template on the fly when running it in the workflow. Is this possible?

Maybe let's follow @dgrassellyb's advice here about tracking this in a separate issue. One possible solution has already been proposed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants