Some tweaks on top of the restructure_outputs branch. #7

Merged
11 changes: 7 additions & 4 deletions .github/CONTRIBUTING.md
@@ -23,8 +23,11 @@ If you're not used to this workflow with git, you can start with some [docs from

## Tests

You can optionally test your changes by running the pipeline locally. Then it is recommended to use the `debug` profile to
receive warnings about process selectors and other debug info. Example: `nextflow run . -profile debug,test,docker --outdir <OUTDIR>`.
You can optionally test your changes by running the pipeline locally. To receive warnings about process selectors and other debug information, use the `debug` profile. Run all the tests with the following command:

```bash
nf-test test --profile debug,test,docker --verbose
```

When you create a pull request with changes, [GitHub Actions](https://github.com/features/actions) will run automatic tests.
Typically, pull-requests are only fully reviewed when these tests are passing, though of course we can help out before then.
@@ -40,7 +43,7 @@ If any failures or warnings are encountered, please follow the listed URL for mo

### Pipeline tests

Each `nf-core` pipeline should be set up with a minimal set of test-data.
Each of the Microbiome Informatics pipelines should be set up with a minimal set of test-data.
`GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully.
If there are any failures then the automated tests fail.
These tests are run both with the latest available version of `Nextflow` and also the minimum required version that is stated in the pipeline code.
@@ -82,7 +85,7 @@ Once there, use `nf-core schema build` to add to `nextflow_schema.json`.

Sensible defaults for process resource requirements (CPUs / memory / time) should be defined in `conf/base.config`. These should generally be specified generically with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels, which should be followed where possible, can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config); it defines the default process as a single-core process, followed by multi-core configurations with increasingly large memory requirements, each under a standardised label.

The process resources can be passed on to the tool dynamically within the process with the `${task.cpu}` and `${task.memory}` variables in the `script:` block.
The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block.

### Naming schemes

36 changes: 36 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,36 @@
name: nf-test CI
on:
  push:
    branches:
      - dev
  pull_request:
  release:
    types: [published]

env:
  NXF_ANSI_LOG: false
  NFTEST_VER: "0.8.4"

jobs:
  test:
    name: Run pipeline with test data
    runs-on: ubuntu-latest

    steps:
      - name: Check out pipeline code
        uses: actions/checkout@v4

      - uses: actions/setup-java@99b8673ff64fbf99d8d325f52d9a5bdedb8483e9 # v4
        with:
          distribution: "temurin"
          java-version: "17"

      - name: Setup Nextflow
        uses: nf-core/setup-nextflow@v2

      - name: Install nf-test
        uses: nf-core/setup-nf-test@v1

      - name: Run pipeline with test data
        run: |
          nf-test test
5 changes: 4 additions & 1 deletion .gitignore
@@ -9,5 +9,8 @@ testing*
results/

*.pyc
.pytest_cache/

assets/fetch_tool_credentials.json
assets/fetch_tool_credentials.json
.nf-test.log
.nf-test/
30 changes: 23 additions & 7 deletions .nf-core.yml
@@ -1,32 +1,48 @@
repository_type: pipeline
template:
prefix: ebi-metagenomics
skip:
- ci
- github_badges
lint:
files_exist:
- CODE_OF_CONDUCT.md
- assets/nf-core-miassembler_logo_light.png
- docs/images/nf-core-miassembler_logo_light.png
- docs/images/nf-core-miassembler_logo_dark.png
- docs/output.md
- docs/usage.md
- .github/ISSUE_TEMPLATE/config.yml
- .github/workflows/awstest.yml
- .github/workflows/awsfulltest.yml
- .github/workflows/branch.yml
- .github/workflows/ci.yml
- .github/workflows/linting_comment.yml
- .github/workflows/linting.yml
- conf/test_full.config
- lib/Utils.groovy
- lib/WorkflowMain.groovy
- lib/NfcoreTemplate.groovy
- lib/WorkflowMiassembler.groovy
- lib/nfcore_external_java_deps.jar
files_unchanged:
- CODE_OF_CONDUCT.md
- assets/nf-core-miassembler_logo_light.png
- docs/images/nf-core-miassembler_logo_light.png
- docs/images/nf-core-miassembler_logo_dark.png
- .github/ISSUE_TEMPLATE/bug_report.yml
- .github/CONTRIBUTING.md
- LICENSE
- docs/README.md
- .gitignore
multiqc_config:
- report_comment
nextflow_config:
nextflow_config: False
- params.input
- params.validationSchemaIgnoreParams
- params.custom_config_version
- params.custom_config_base
- manifest.name
- manifest.homePage
readme:
- nextflow_badge
repository_type: pipeline
template:
prefix: ebi-metagenomics
skip:
- ci
- github_badges
56 changes: 52 additions & 4 deletions README.md
@@ -10,6 +10,9 @@

This pipeline is still in early development. It's mostly a direct port of the mi-automation assembly generation pipeline. Some of the bespoke scripts used to remove contaminated contigs or to calculate the coverage of the assembly were replaced with tools provided by the community ([SeqKit](https://doi.org/10.1371/journal.pone.0163962) and [quast](https://doi.org/10.1093/bioinformatics/btu153) respectively).

> [!NOTE]
> This pipeline uses the nf-core template with some tweaks, but it's not part of nf-core.

## Usage

> [!WARNING]
@@ -23,12 +26,21 @@ nextflow run ebi-metagenomics/miassembler --help
Input/output options
--study_accession [string] The ENA Study secondary accession
--reads_accession [string] The ENA Run primary accession
--assembler [string] The short reads assembler (accepted: spades, metaspades, megahit) [default: metaspades for PE, megahit for SE]
--private_study [boolean] To use if the ENA study is private [default: false]
--assembler [string] The short reads assembler (accepted: spades, metaspades, megahit) [default: metaspades]
--reference_genome [string] The genome to be used to clean the assembly, the genome will be taken from the Microbiome Informatics internal
directory (accepted: chicken.fna, salmon.fna, cod.fna, pig.fna, cow.fna, mouse.fna, honeybee.fna,
rainbow_trout.fna, ...) [default: human+phiX]
--reference_genomes_folder [string] The folder with the reference genome blast indexes, defaults to the Microbiome Informatics internal directory
[default: /nfs/production/rdf/metagenomics/pipelines/prod/assembly-pipeline/blast_dbs/]
rainbow_trout.fna, rat.fna, ...)
--blast_reference_genomes_folder [string] The folder with the reference genome blast indexes, defaults to the Microbiome Informatics internal
directory.
--bwamem2_reference_genomes_folder [string] The folder with the reference genome bwa-mem2 indexes, defaults to the Microbiome Informatics internal
directory.
--remove_human_phix [boolean] Remove human and phiX reads pre assembly, and contigs matching those genomes. [default: true]
--human_phix_blast_index_name [string] Combined Human and phiX BLAST db. [default: human_phix]
--human_phix_bwamem2_index_name [string] Combined Human and phiX bwa-mem2 index. [default: human_phix]
--min_contig_length [integer] Minimum contig length filter. [default: 500]
--assembly_memory [integer] Default memory allocated for the assembly process. [default: 100]
--spades_only_assembler [boolean] Run SPAdes/metaSPAdes without the error correction step. [default: true]
--outdir [string] The output directory where the results will be saved. You have to use absolute paths to storage on Cloud
infrastructure.
--email [string] Email address for completion summary.
@@ -50,7 +62,43 @@ nextflow run ebi-metagenomics/miassembler \
--reads_accession SRR1631361
```

## Outputs

The outputs of the pipeline are organized as follows:

```
results/SRP1154
└── SRP115494
    └── SRR6180
        └── SRR6180434
            ├── assembly
            │   └── metaspades
            │       └── 3.15.5
            │           ├── coverage
            │           ├── decontamination
            │           └── qc
            │               ├── multiqc
            │               └── quast
            └── qc
                ├── fastp
                └── fastqc

```

The nested structure based on ENA Study and Reads accessions was created to suit the Microbiome Informatics team’s needs. The benefit of this structure is that results from different runs of the same study won’t overwrite any results.
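
As a rough illustration of that layout, the sketch below builds the directory for a single run. It assumes, based only on the example tree above and not on the pipeline code, that each parent directory is the first seven characters of the accession beneath it. The `result_dir` helper is hypothetical and is not part of the pipeline.

```bash
# Hypothetical helper, not part of miassembler.
# Assumption inferred from the example tree: each prefix directory is the
# first 7 characters of the accession it contains.
result_dir() {
    # $1 = output directory, $2 = study accession, $3 = run accession
    # %.7s truncates the argument to its first 7 characters (POSIX printf).
    printf '%s/%.7s/%s/%.7s/%s\n' "$1" "$2" "$2" "$3" "$3"
}

result_dir results SRP115494 SRR6180434
```

Because the run accession is the innermost directory, two runs from the same study share the study-level folders but write to separate leaf directories.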

## Tests

There is a very small test data set ready to use:

```bash
nextflow run main.nf -resume -profile test,docker
```

### End to end tests

Two end-to-end tests can be launched (with megahit and metaspades) with the following command:

```bash
pytest tests/workflows/ --verbose
```
2 changes: 1 addition & 1 deletion assets/email_template.html
@@ -12,7 +12,7 @@

<img src="cid:nfcorepipelinelogo">

<h1>ebi-metagenomics/miassembler v${version}</h1>
<h1>ebi-metagenomics/miassembler ${version}</h1>
<h2>Run Name: $runName</h2>

<% if (!success){
Binary file added assets/mgnify_logo.png
7 changes: 4 additions & 3 deletions assets/multiqc_config.yml
@@ -1,16 +1,17 @@
report_comment: >
This report has been generated by the <a href="https://github.com/ebi-metagenomics/miassembler/tree/dev" target="_blank">ebi-metagenomics/miassembler</a>
This report has been generated by the <a href="https://github.com/ebi-metagenomics/miassembler/" target="_blank">ebi-metagenomics/miassembler</a>
analysis pipeline.

report_section_order:
"ebi-metagenomics-miassembler-methods-description":
order: -1000
software_versions:
order: -1001
"ebi-metagenomics-miassembler-summary":
order: -1002

export_plots: true

skip_versions_section: true

top_modules:
- fastqc
- quast
3 changes: 0 additions & 3 deletions assets/samplesheet.csv

This file was deleted.
