Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script recommendations #2888

Merged
merged 15 commits into from
Dec 18, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
67 changes: 67 additions & 0 deletions sites/docs/src/content/docs/guidelines/components/modules.md
Original file line number Diff line number Diff line change
Expand Up @@ -324,6 +324,73 @@ See the [Bash manual on file operators](https://tldp.org/LDP/abs/html/fto.html)

Alternate suggestions include using `grep -c` to search for a valid string match, or other tool which will appropriately error when the expected output is not successfully created.

### Script inclusion

Using module templates helps distinguish between changes made to the scientific logic within the script and those affecting the workflow-specific logic in the module. This separation improves the code's clarity and maintainability.
If a module's `script:` block contains a script rather than command invocations, regardless of the language (e.g., Bash, R, Python), and the content is more than a readable length (as a rule of thumb, approximately 20 lines), it MUST be provided through a [Nextflow module template](https://www.nextflow.io/docs/latest/module.html#module-templates).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Nextflow docs discourages the use of templates for non-bash scripts: https://nextflow.io/docs/latest/process.html#template

Template scripts are recommended only for Bash scripts. Languages that do not prefix variables with $ (e.g. Python and R) can’t be executed directly as a template script.

The best practice for using a custom script is to embed it in the process definition at first and move it to a separate file with its own command line interface once the code matures.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I guess the module binaries feature requires Wave when the work directory is in object storage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the requirement of wave is why we opted not to do so as then it's not portable :/ (and use the examples from Jon)

Copy link
Member Author

@pinin4fjords pinin4fjords Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bentsherman any chance of module binaries becoming a universal solution in future, like workflow bin? My last discussion with our lord and master suggested not, but this is is the unavoidable consequence (as far as I can tell), unless we embed massive chunks of code in the process definition, which I don't think is a better way.


:::note
pinin4fjords marked this conversation as resolved.
Show resolved Hide resolved
We recommend use of Nextflow templates as they are the most portable method of separating custom script content and execution across all execution contexts
:::

:::note
Where script content in a module becomes particularly extensive, we strongly encourage packaging and hosting the code externally and provisioning via Conda/Docker as a standalone tool(kit).
:::

#### Inline script code

If the script content remains at a readable length, the code MAY be embedded directly in the module without a dedicated template file. However, they should still follow the guidance content as with a template.

#### Module template location

The template MUST go into a directory called `templates/` in the same directory as the module `main.nf`.

The template file MUST be named after the module itself with a language-appropriate file suffix. For example, the `deseq2/differential` nf-core module will use the `deseq2_differential.R`.

The template file can then be referred to within the module using the template function:

```nextflow
script:
template 'deseq2_differential.R'
```

See [`deseq2/differential`](https://github.com/nf-core/modules/blob/master/modules/nf-core/deseq2/differential/main.nf#L47) for an example of a template in an nf-core pipeline.

The resulting structure would look like this.

```tree
deseq2
└── differential
├── environment.yml
├── main.nf
├── meta.yml
├── templates
│   └── deqseq2_differential.R
└── tests
├── main.nf.test
├── main.nf.test.snap
└── tags.yml
```

#### Template or inline script-code contents

:::warning
Be aware that in any script template that Nextflow needs to be escaped in the same way you would in a standard bash `script:` block!
:::

The script template file or inline script code (used when at a readable length) MUST generate a `versions.yml` file in the language-appropriate way that contains versions of the base language and all relevant libraries and packages.

The generated `versions.yml` MUST have the same structure as a standard nf-core module `versions.yml`.

See the [`deseq2/differential` module](https://github.com/nf-core/modules/blob/4c2d06a5e79abf08ba7f04c58e39c7dad75f094d/modules/nf-core/deseq2/differential/templates/deseq_de.R#L509-L534) for an example using R.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing is mentioned on the R_sessionInfo.log that is created in the template there.

Also an alternative suggestion for writing the R version:

paste(version[['major']],version[['minor']], sep = ".")

Think it's more easier to read then:

strsplit(version[['version.string']], ' ')[[1]][3]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing is mentioned on the R_sessionInfo.log that is created in the template there.

What do you mean? I don't follow, sorry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second comment @Joon-Klaps please feel free to update the moduel/example :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the mentioned example lines of code there is a small section dedicated into exporting the R.sessionInfo().

sink(paste(opt\$output_prefix, "R_sessionInfo.log", sep = '.'))
print(sessionInfo())
sink()

However, nothing is mentioned of this in these guidelines. So is it necessary or not? I'm guessing so given it was included in the referenced lines but would also explicitly mention it then in the guidelines. Although not certain about the added value of the R.session() info.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got you. I don't think that makes sense putting 'how' to generate the yaml file in the guideline per se - we only care that it exists.

We can point to other examples though if it helps (feel free to make an additional PR). Maybe a dedicated tutorial page how to write such a template module would make sense due to the intricacies of the variable escaping? What do you think @pinin4fjords ?


#### Stubs in templated modules

A templated module MUST have a stub block in the same way as any other module. For example, using `touch` to generate empty files and versions. See [`deseq2/differential` module](https://github.com/nf-core/modules/blob/4c2d06a5e79abf08ba7f04c58e39c7dad75f094d/modules/nf-core/deseq2/differential/main.nf#L34-L49) for an example in an nf-core module.

An inline command to call the version for libraries for the `versions.yml` MAY be used in this case.
For an R example see [deseq2/differential](https://github.com/nf-core/modules/blob/4c2d06a5e79abf08ba7f04c58e39c7dad75f094d/modules/nf-core/deseq2/differential/main.nf#L47).

### Stubs

#### Stub block must exist
Expand Down
Loading