Merge pull request #6 from EBI-Metagenomics/feature/restructure_outputs
Feature/restructure outputs
KateSakharova authored May 30, 2024
2 parents ce57e24 + 1963f75 commit 36bc2f0
Showing 48 changed files with 1,207 additions and 476 deletions.
11 changes: 7 additions & 4 deletions .github/CONTRIBUTING.md
@@ -23,8 +23,11 @@ If you're not used to this workflow with git, you can start with some [docs from

## Tests

You can optionally test your changes by running the pipeline locally. Then it is recommended to use the `debug` profile to
receive warnings about process selectors and other debug info. Example: `nextflow run . -profile debug,test,docker --outdir <OUTDIR>`.
You have the option to test your changes locally by running the pipeline. For receiving warnings about process selectors and other `debug` information, it is recommended to use the debug profile. Execute all the tests with the following command:

```bash
nf-test test --profile debug,test,docker --verbose
```

When you create a pull request with changes, [GitHub Actions](https://github.com/features/actions) will run automatic tests.
Typically, pull-requests are only fully reviewed when these tests are passing, though of course we can help out before then.
@@ -40,7 +43,7 @@ If any failures or warnings are encountered, please follow the listed URL for mo

### Pipeline tests

Each `nf-core` pipeline should be set up with a minimal set of test-data.
Each of the Microbiome Informatics pipelines should be set up with a minimal set of test-data.
`GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully.
If there are any failures then the automated tests fail.
These tests are run both with the latest available version of `Nextflow` and also the minimum required version that is stated in the pipeline code.
@@ -82,7 +85,7 @@ Once there, use `nf-core schema build` to add to `nextflow_schema.json`.

Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels.

The process resources can be passed on to the tool dynamically within the process with the `${task.cpu}` and `${task.memory}` variables in the `script:` block.
The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block.

### Naming schemes

36 changes: 36 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,36 @@
name: nf-test CI
on:
  push:
    branches:
      - dev
  pull_request:
  release:
    types: [published]

env:
  NXF_ANSI_LOG: false
  NFTEST_VER: "0.8.4"

jobs:
  test:
    name: Run pipeline with test data
    runs-on: ubuntu-latest

    steps:
      - name: Check out pipeline code
        uses: actions/checkout@v4

      - uses: actions/setup-java@99b8673ff64fbf99d8d325f52d9a5bdedb8483e9 # v4
        with:
          distribution: "temurin"
          java-version: "17"

      - name: Setup Nextflow
        uses: nf-core/setup-nextflow@v2

      - name: Install nf-test
        uses: nf-core/setup-nf-test@v1

      - name: Run pipeline with test data
        run: |
          nf-test test
5 changes: 4 additions & 1 deletion .gitignore
@@ -9,5 +9,8 @@ testing*
results/

*.pyc
.pytest_cache/

assets/fetch_tool_credentials.json
assets/fetch_tool_credentials.json
.nf-test.log
.nf-test/
30 changes: 23 additions & 7 deletions .nf-core.yml
@@ -1,32 +1,48 @@
repository_type: pipeline
template:
prefix: ebi-metagenomics
skip:
- ci
- github_badges
lint:
files_exist:
- CODE_OF_CONDUCT.md
- assets/nf-core-miassembler_logo_light.png
- docs/images/nf-core-miassembler_logo_light.png
- docs/images/nf-core-miassembler_logo_dark.png
- docs/output.md
- docs/usage.md
- .github/ISSUE_TEMPLATE/config.yml
- .github/workflows/awstest.yml
- .github/workflows/awsfulltest.yml
- .github/workflows/branch.yml
- .github/workflows/ci.yml
- .github/workflows/linting_comment.yml
- .github/workflows/linting.yml
- conf/test_full.config
- lib/Utils.groovy
- lib/WorkflowMain.groovy
- lib/NfcoreTemplate.groovy
- lib/WorkflowMiassembler.groovy
- lib/nfcore_external_java_deps.jar
files_unchanged:
- CODE_OF_CONDUCT.md
- assets/nf-core-miassembler_logo_light.png
- docs/images/nf-core-miassembler_logo_light.png
- docs/images/nf-core-miassembler_logo_dark.png
- .github/ISSUE_TEMPLATE/bug_report.yml
- .github/CONTRIBUTING.md
- LICENSE
- docs/README.md
- .gitignore
multiqc_config:
- report_comment
nextflow_config:
nextflow_config: False
- params.input
- params.validationSchemaIgnoreParams
- params.custom_config_version
- params.custom_config_base
- manifest.name
- manifest.homePage
readme:
- nextflow_badge
repository_type: pipeline
template:
prefix: ebi-metagenomics
skip:
- ci
- github_badges
56 changes: 52 additions & 4 deletions README.md
@@ -10,6 +10,9 @@

This pipeline is still in early development. It's mostly a direct port of the mi-automation assembly generation pipeline. Some of the bespoke scripts used to remove contaminated contigs or to calculate the coverage of the assembly were replaced with tools provided by the community ([SeqKit](https://doi.org/10.1371/journal.pone.0163962) and [quast](https://doi.org/10.1093/bioinformatics/btu153) respectively).

> [!NOTE]
> This pipeline uses the nf-core template with some tweaks, but it's not part of nf-core.
## Usage

> [!WARNING]
@@ -23,12 +26,21 @@ nextflow run ebi-metagenomics/miassembler --help
Input/output options
--study_accession [string] The ENA Study secondary accession
--reads_accession [string] The ENA Run primary accession
--assembler [string] The short reads assembler (accepted: spades, metaspades, megahit) [default: metaspades for PE, megahit for SE]
--private_study [boolean] To use if the ENA study is private [default: false]
--assembler [string] The short reads assembler (accepted: spades, metaspades, megahit) [default: metaspades]
--reference_genome [string] The genome to be used to clean the assembly, the genome will be taken from the Microbiome Informatics internal
directory (accepted: chicken.fna, salmon.fna, cod.fna, pig.fna, cow.fna, mouse.fna, honeybee.fna,
rainbow_trout.fna, ...) [default: human+phiX]
--reference_genomes_folder [string] The folder with the reference genome blast indexes, defaults to the Microbiome Informatics internal directory
[default: /nfs/production/rdf/metagenomics/pipelines/prod/assembly-pipeline/blast_dbs/]
rainbow_trout.fna, rat.fna, ...)
--blast_reference_genomes_folder [string] The folder with the reference genome blast indexes, defaults to the Microbiome Informatics internal
directory.
--bwamem2_reference_genomes_folder [string] The folder with the reference genome bwa-mem2 indexes, defaults to the Microbiome Informatics internal
directory.
--remove_human_phix [boolean] Remove human and phiX reads pre assembly, and contigs matching those genomes. [default: true]
--human_phix_blast_index_name [string] Combined Human and phiX BLAST db. [default: human_phix]
--human_phix_bwamem2_index_name [string] Combined Human and phiX bwa-mem2 index. [default: human_phix]
--min_contig_length [integer] Minimum contig length filter. [default: 500]
--assembly_memory [integer] Default memory allocated for the assembly process. [default: 100]
--spades_only_assembler [boolean] Run SPAdes/metaSPAdes without the error correction step. [default: true]
--outdir [string] The output directory where the results will be saved. You have to use absolute paths to storage on Cloud
infrastructure.
--email [string] Email address for completion summary.
@@ -50,7 +62,43 @@ nextflow run ebi-metagenomics/miassembler \
--reads_accession SRR1631361
```
## Outputs
The outputs of the pipeline are organized as follows:
```
results/SRP1154
└── SRP115494
    └── SRR6180
        └── SRR6180434
            ├── assembly
            │   └── metaspades
            │       └── 3.15.5
            │           ├── coverage
            │           ├── decontamination
            │           └── qc
            │               ├── multiqc
            │               └── quast
            └── qc
                ├── fastp
                └── fastqc
```
The nested structure based on ENA Study and Reads accessions was created to suit the Microbiome Informatics team’s needs. The benefit of this structure is that results from different runs of the same study won’t overwrite any results.
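
The accession-based nesting in the tree above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's code: the function name `nested_output_dir` is hypothetical, and the 7-character prefix rule is an assumption inferred from the single example (`SRP115494` under `SRP1154`, `SRR6180434` under `SRR6180`). The rule the pipeline actually applies is not shown in this diff.

```python
from pathlib import PurePosixPath

# Assumption: each accession sits under a folder named after its first
# 7 characters, as the example tree suggests (SRP115494 -> SRP1154).
PREFIX_LENGTH = 7


def nested_output_dir(outdir: str, study_accession: str, reads_accession: str) -> str:
    """Build <outdir>/<study prefix>/<study>/<reads prefix>/<reads>."""
    study_prefix = study_accession[:PREFIX_LENGTH]
    reads_prefix = reads_accession[:PREFIX_LENGTH]
    return str(
        PurePosixPath(outdir, study_prefix, study_accession, reads_prefix, reads_accession)
    )


print(nested_output_dir("results", "SRP115494", "SRR6180434"))
# results/SRP1154/SRP115494/SRR6180/SRR6180434
```

Because the reads accession is part of the path, two runs from the same study end up in separate directories, which is why their results cannot overwrite each other.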
## Tests
There is a very small test data set ready to use:
```bash
nextflow run main.nf -resume -profile test,docker
```
### End to end tests
Two end-to-end tests can be launched (with megahit and metaspades) with the following command:
```bash
pytest tests/workflows/ --verbose
```
2 changes: 1 addition & 1 deletion assets/email_template.html
@@ -12,7 +12,7 @@

<img src="cid:nfcorepipelinelogo">

<h1>ebi-metagenomics/miassembler v${version}</h1>
<h1>ebi-metagenomics/miassembler ${version}</h1>
<h2>Run Name: $runName</h2>

<% if (!success){
14 changes: 4 additions & 10 deletions assets/methods_description_template.yml
@@ -3,27 +3,21 @@ description: "Suggested text and references to use when describing pipeline usag
section_name: "ebi-metagenomics/miassembler Methods Description"
section_href: "https://github.com/ebi-metagenomics/miassembler"
plot_type: "html"
## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
## You inject any metadata in the Nextflow '${workflow}' object
data: |
<h4>Methods</h4>
<p>Data was processed using ebi-metagenomics/miassembler v${workflow.manifest.version} ${doi_text} of the nf-core collection of workflows (<a href="https://doi.org/10.1038/s41587-020-0439-x">Ewels <em>et al.</em>, 2020</a>), utilising reproducible software environments from the Bioconda (<a href="https://doi.org/10.1038/s41592-018-0046-7">Grüning <em>et al.</em>, 2018</a>) and Biocontainers (<a href="https://doi.org/10.1093/bioinformatics/btx192">da Veiga Leprevost <em>et al.</em>, 2017</a>) projects.</p>
<p>Data is processed using MGnify ebi-metagenomics/miassembler v${workflow.manifest.version} ${doi_text}. Supported assemblers are MEGAHIT, SPAdes and metaSPAdes (default). Single-end reads are assembled only using MEGAHIT and metatranscriptomic data only with SPAdes. Pipeline uses a set of custom functions and modules from nf-core collection (<a href="https://doi.org/10.1038/s41587-020-0439-x">Ewels <em>et al.</em>, 2020</a>), utilising reproducible software environments from the Bioconda (<a href="https://doi.org/10.1038/s41592-018-0046-7">Grüning <em>et al.</em>, 2018</a>) and Biocontainers (<a href="https://doi.org/10.1093/bioinformatics/btx192">da Veiga Leprevost <em>et al.</em>, 2017</a>) projects.</p>
<p>The pipeline was executed with Nextflow v${workflow.nextflow.version} (<a href="https://doi.org/10.1038/nbt.3820">Di Tommaso <em>et al.</em>, 2017</a>) with the following command:</p>
<pre><code>${workflow.commandLine}</code></pre>
<p>${tool_citations}</p>
<h4>References</h4>
<ul>
<li>Richardson LJ, Allen B, Baldi G, Beracochea M, Bileschi M, Burdett T, Burgin J, Caballero-Pérez J, Cochrane G, Colwell L, Curtis T, Escobar-Zepeda A, Gurbich T, Kale V, Korobeynikov A, Raj S, Rogers AB, Sakharova E, Sanchez S, Wilkinson D and Finn RD. (2023) MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Research. doi: <a href="https://academic.oup.com/nar/article/51/D1/D753/6880769">10.1093/nar/gkac1080</a></li>
<li>Li, D., Liu, C-M., Luo, R., Sadakane, K., and Lam, T-W. (2015). MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. doi: <a href="https://doi.org/10.1093/bioinformatics/btv033">10.1093/bioinformatics/btv033</a></li>
<li>Prjibelski A., Antipov D., Meleshko D., Lapidus A., Korobeynikov A. (2020). Using SPAdes De Novo Assembler. Current Protocols. doi: <a href="https://doi.org/10.1002/cpbi.102">10.1002/cpbi.102</a></li>
<li>Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. doi: <a href="https://doi.org/10.1038/nbt.3820">10.1038/nbt.3820</a></li>
<li>Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. doi: <a href="https://doi.org/10.1038/s41587-020-0439-x">10.1038/s41587-020-0439-x</a></li>
<li>Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins-Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. doi: <a href="https://doi.org/10.1038/s41592-018-0046-7">10.1038/s41592-018-0046-7</a></li>
<li>da Veiga Leprevost, F., Grüning, B. A., Alves Aflitos, S., Röst, H. L., Uszkoreit, J., Barsnes, H., Vaudel, M., Moreno, P., Gatto, L., Weber, J., Bai, M., Jimenez, R. C., Sachsenberg, T., Pfeuffer, J., Vera Alvarez, R., Griss, J., Nesvizhskii, A. I., & Perez-Riverol, Y. (2017). BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics (Oxford, England), 33(16), 2580–2582. doi: <a href="https://doi.org/10.1093/bioinformatics/btx192">10.1093/bioinformatics/btx192</a></li>
${tool_bibliography}
</ul>
<div class="alert alert-info">
<h5>Notes:</h5>
<ul>
${nodoi_text}
<li>The command above does not include parameters contained in any configs or profiles that may have been used. Ensure the config file is also uploaded with your publication!</li>
<li>You should also cite all software used within this run. Check the "Software Versions" of this report to get version information.</li>
</ul>
</div>
Binary file added assets/mgnify_logo.png
7 changes: 4 additions & 3 deletions assets/multiqc_config.yml
@@ -1,16 +1,17 @@
report_comment: >
This report has been generated by the <a href="https://github.com/ebi-metagenomics/miassembler/tree/dev" target="_blank">ebi-metagenomics/miassembler</a>
This report has been generated by the <a href="https://github.com/ebi-metagenomics/miassembler/" target="_blank">ebi-metagenomics/miassembler</a>
analysis pipeline.
report_section_order:
"ebi-metagenomics-miassembler-methods-description":
order: -1000
software_versions:
order: -1001
"ebi-metagenomics-miassembler-summary":
order: -1002

export_plots: true

skip_versions_section: true

top_modules:
- fastqc
- quast
3 changes: 0 additions & 3 deletions assets/samplesheet.csv

This file was deleted.
