-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add script recommendations #2888
Changes from all commits
85f274c
fb0eb4b
be7200e
7ca3c26
22ecd56
650cf00
08ac34a
99aafc3
0107df0
7c0cf58
9233dd5
dd0ba6b
6490837
e742ed7
c438115
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -324,6 +324,73 @@ See the [Bash manual on file operators](https://tldp.org/LDP/abs/html/fto.html) | |
|
||
Alternate suggestions include using `grep -c` to search for a valid string match, or other tool which will appropriately error when the expected output is not successfully created. | ||
|
||
### Script inclusion | ||
|
||
Using module templates helps distinguish between changes made to the scientific logic within the script and those affecting the workflow-specific logic in the module. This separation improves the code's clarity and maintainability. | ||
If a module's `script:` block contains a script rather than command invocations, regardless of the language (e.g., Bash, R, Python), and the content is more than a readable length (as a rule of thumb, approximately 20 lines), it MUST be provided through a [Nextflow module template](https://www.nextflow.io/docs/latest/module.html#module-templates). | ||
|
||
:::note | ||
pinin4fjords marked this conversation as resolved.
Show resolved
Hide resolved
|
||
We recommend use of Nextflow templates as they are the most portable method of separating custom script content and execution across all execution contexts | ||
::: | ||
|
||
:::note | ||
Where script content in a module becomes particularly extensive, we strongly encourage packaging and hosting the code externally and provisioning via Conda/Docker as a standalone tool(kit). | ||
::: | ||
|
||
#### Inline script code | ||
|
||
If the script content remains at a readable length, the code MAY be embedded directly in the module without a dedicated template file. However, they should still follow the guidance content as with a template. | ||
|
||
#### Module template location | ||
|
||
The template MUST go into a directory called `templates/` in the same directory as the module `main.nf`. | ||
|
||
The template file MUST be named after the module itself with a language-appropriate file suffix. For example, the `deseq2/differential` nf-core module will use the `deseq2_differential.R`. | ||
|
||
The template file can then be referred to within the module using the template function: | ||
|
||
```nextflow | ||
script: | ||
template 'deseq2_differential.R' | ||
``` | ||
|
||
See [`deseq2/differential`](https://github.com/nf-core/modules/blob/master/modules/nf-core/deseq2/differential/main.nf#L47) for an example of a template in an nf-core pipeline. | ||
|
||
The resulting structure would look like this. | ||
|
||
```tree | ||
deseq2 | ||
└── differential | ||
├── environment.yml | ||
├── main.nf | ||
├── meta.yml | ||
├── templates | ||
│ └── deqseq2_differential.R | ||
└── tests | ||
├── main.nf.test | ||
├── main.nf.test.snap | ||
└── tags.yml | ||
``` | ||
|
||
#### Template or inline script-code contents | ||
|
||
:::warning | ||
Be aware that in any script template that Nextflow needs to be escaped in the same way you would in a standard bash `script:` block! | ||
::: | ||
|
||
The script template file or inline script code (used when at a readable length) MUST generate a `versions.yml` file in the language-appropriate way that contains versions of the base language and all relevant libraries and packages. | ||
|
||
The generated `versions.yml` MUST have the same structure as a standard nf-core module `versions.yml`. | ||
|
||
See the [`deseq2/differential` module](https://github.com/nf-core/modules/blob/4c2d06a5e79abf08ba7f04c58e39c7dad75f094d/modules/nf-core/deseq2/differential/templates/deseq_de.R#L509-L534) for an example using R. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nothing is mentioned on the Also an alternative suggestion for writing the R version:
Think it's more easier to read then:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
What do you mean? I don't follow, sorry. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the second comment @Joon-Klaps please feel free to update the moduel/example :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the mentioned example lines of code there is a small section dedicated into exporting the
However, nothing is mentioned of this in these guidelines. So is it necessary or not? I'm guessing so given it was included in the referenced lines but would also explicitly mention it then in the guidelines. Although not certain about the added value of the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah got you. I don't think that makes sense putting 'how' to generate the yaml file in the guideline per se - we only care that it exists. We can point to other examples though if it helps (feel free to make an additional PR). Maybe a dedicated tutorial page how to write such a template module would make sense due to the intricacies of the variable escaping? What do you think @pinin4fjords ? |
||
|
||
#### Stubs in templated modules | ||
|
||
A templated module MUST have a stub block in the same way as any other module. For example, using `touch` to generate empty files and versions. See [`deseq2/differential` module](https://github.com/nf-core/modules/blob/4c2d06a5e79abf08ba7f04c58e39c7dad75f094d/modules/nf-core/deseq2/differential/main.nf#L34-L49) for an example in an nf-core module. | ||
|
||
An inline command to call the version for libraries for the `versions.yml` MAY be used in this case. | ||
For an R example see [deseq2/differential](https://github.com/nf-core/modules/blob/4c2d06a5e79abf08ba7f04c58e39c7dad75f094d/modules/nf-core/deseq2/differential/main.nf#L47). | ||
|
||
### Stubs | ||
|
||
#### Stub block must exist | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Nextflow docs discourages the use of templates for non-bash scripts: https://nextflow.io/docs/latest/process.html#template
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I guess the module binaries feature requires Wave when the work directory is in object storage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the requirement of wave is why we opted not to do so as then it's not portable :/ (and use the examples from Jon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bentsherman any chance of module binaries becoming a universal solution in future, like workflow bin? My last discussion with our lord and master suggested not, but this is is the unavoidable consequence (as far as I can tell), unless we embed massive chunks of code in the process definition, which I don't think is a better way.