Usage
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet <path to sample sheet>
Getting an assembled bacterial genome takes many steps. Donut Falls supports Nanopore sequencing of isolates with and without corresponding Illumina fastq files. For metagenomic samples, we recommend nf-core's mag or other workflows. Our typical use-case is sequencing isolates on a GridION and using MinKNOW for basecalling and fastq.gz file generation.
---
Basic nanopore workflow
---
flowchart LR
A[isolate bacteria] --> D[sequence]
D --> E[basecalling]
E --> F[Donut Falls]
F --> G[analysis]
Final results are placed in the directory named by params.outdir (default = 'donut_falls'), which can be adjusted on the command line or in an input file.
We tried matching Illumina reads to Nanopore reads automatically in several different ways, but decided this was too difficult to maintain. Instead, a sample sheet that pairs Nanopore reads with Illumina reads is used as input.
The sample file has two required columns and two optional columns:
- 'sample' designates the name used for the isolate that was sequenced
- 'fastq' designates the Nanopore fastq.gz file
- 'fastq_1' and 'fastq_2' are optional and designate the forward and reverse Illumina reads
A typical sample file with both Nanopore and Illumina reads
sample,fastq,fastq_1,fastq_2
test,nanopore.fastq.gz,illumina_1.fastq.gz,illumina_2.fastq.gz
An acceptable sample file for just Nanopore reads
sample,fastq
test,long_reads_low_depth.fastq.gz
An acceptable sample file where one sample does not have Illumina reads
sample,fastq,fastq_1,fastq_2
sample1,sample1.fastq.gz,sample1_R1.fastq.gz,sample1_R2.fastq.gz
sample2,sample2.fastq.gz,,
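A sample sheet in this layout can also be generated rather than typed by hand. The following is a minimal sketch, not part of Donut Falls: the function name make_sample_sheet is hypothetical, and it assumes all reads sit in one directory with Nanopore files named <sample>.fastq.gz and Illumina files named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz (the naming used in the example above). Samples without both Illumina files get empty fastq_1 and fastq_2 fields.

```shell
# Sketch: print a Donut Falls sample sheet for the reads in one directory.
# ASSUMPTION (hypothetical naming scheme, adjust to your own files):
#   Nanopore reads : <sample>.fastq.gz
#   Illumina reads : <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
make_sample_sheet() {
  dir="$1"
  echo "sample,fastq,fastq_1,fastq_2"
  for fq in "$dir"/*.fastq.gz; do
    # Skip the literal pattern when the directory has no fastq.gz files.
    [ -e "$fq" ] || continue
    # Illumina files are attached to their Nanopore sample, not listed alone.
    case "$fq" in
      *_R1.fastq.gz|*_R2.fastq.gz) continue ;;
    esac
    sample="$(basename "$fq" .fastq.gz)"
    r1="$dir/${sample}_R1.fastq.gz"
    r2="$dir/${sample}_R2.fastq.gz"
    if [ -f "$r1" ] && [ -f "$r2" ]; then
      echo "$sample,$fq,$r1,$r2"
    else
      # No matching Illumina pair: leave the optional columns empty.
      echo "$sample,$fq,,"
    fi
  done
}
```

For example, `make_sample_sheet reads > sample_sheet.csv` would write the sheet for a directory named reads, which can then be passed with --sample_sheet.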
The default workflow should run just fine, but adjusting a few parameters can improve performance. To stay portable, Donut Falls has few input parameters, because those would need to be inherited from any larger workflow that it is part of. Instead, ext.args can be adjusted with a config file. Admittedly, this does make customization more difficult. Sorry.
WARNING: Changing ext.args via config files changes those values for every sample in the workflow. Samples that need different values must be run separately.
The default workflow assumes a genome size of 5M for rasusa subsampling to 150X coverage. The recommended coverage for assembly is 100X, but we needed the base values to work for most use-cases. Although this works for many organisms sequenced at UPHL and in public health in general (e.g. Escherichia coli, Salmonella enterica, and even Pseudomonas aeruginosa), it may be problematic for much larger genomes (like Sorangium cellulosum with 13M bases) or much smaller ones (like Campylobacter jejuni with 1.7M).
For these cases, we recommend adjusting the ext.args for rasusa in a config file.
process {
withName: rasusa {
ext.args = "--genome-size 8.5mb --coverage 150"
}
}
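To see why the assumed genome size matters, it helps to work through the arithmetic. The sketch below assumes that rasusa keeps roughly genome size × coverage bases; the helper names target_bases and effective_coverage are hypothetical and only illustrate the calculation with the genome sizes mentioned above.

```shell
# Sketch of the subsampling arithmetic.
# ASSUMPTION: the subsampled read set holds about (genome size x coverage) bases.

# target_bases GENOME_SIZE COVERAGE
# Number of bases kept when subsampling to COVERAGE for GENOME_SIZE.
target_bases() {
  echo $(( $1 * $2 ))
}

# effective_coverage ASSUMED_SIZE TRUE_SIZE COVERAGE
# Coverage actually obtained on a genome of TRUE_SIZE when the subsampling
# was configured with ASSUMED_SIZE (integer division, rounded down).
effective_coverage() {
  echo $(( $1 * $3 / $2 ))
}

# Default settings (5M genome, 150X) applied to Campylobacter jejuni (1.7M)
effective_coverage 5000000 1700000 150    # prints 441
# Default settings applied to Sorangium cellulosum (13M)
effective_coverage 5000000 13000000 150   # prints 57
```

Under this assumption, the defaults leave a small genome with far more reads than needed and a large genome well below the recommended 100X, which is why setting --genome-size for such organisms is worthwhile.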
Medaka performs best when told which basecaller model was used. The model name generally has the format {pore}_{device}_{caller variant}_{caller version} and is specified with -m.
- Example for data from MinION R9.4.1 flowcells using the fast Guppy basecaller version 3.0.3: '-m r941_min_fast_g303'
process {
withName: medaka {
ext.args = "-m r941_min_fast_g303"
}
}
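As an illustration of the naming pattern, the sketch below assembles a model name from its four parts and writes the matching config file. The helper names medaka_model and write_medaka_config and the file name medaka.config are hypothetical. The script does not check that the resulting name is a model medaka actually ships, so confirm it against the model list of your medaka version before use.

```shell
# Sketch: build a medaka model name and write a config that passes it via -m.
# ASSUMPTION: the name follows {pore}_{device}_{caller variant}_{caller version};
# this does NOT verify that medaka knows the resulting model.

# medaka_model PORE DEVICE VARIANT VERSION
medaka_model() {
  printf '%s_%s_%s_%s\n' "$1" "$2" "$3" "$4"
}

# write_medaka_config MODEL OUTPUT_FILE
write_medaka_config() {
  model="$1"
  out="$2"
  cat > "$out" <<EOF
process {
    withName: medaka {
        ext.args = "-m ${model}"
    }
}
EOF
}
```

For example, `write_medaka_config "$(medaka_model r941 min fast g303)" medaka.config` reproduces the config above, which can then be supplied with -c medaka.config.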
There are currently several options available for Donut Falls that are specified by 'params.assembler'.
De novo assembly of nanopore reads (with or without polishing):
Hybrid assembly (requires Illumina reads)
Each assembler only has to be listed once.
Donut Falls has two profiles for "easy" command line container management.
- docker : uses Docker to manage containers in the workflow
docker.enabled = true
docker.runOptions = "-u \$(id -u):\$(id -g)"
- singularity : uses Singularity to manage containers in the workflow
singularity.enabled = true
singularity.autoMounts = true
Config files are a reproducible way to ensure that the same parameters are shared each time a workflow is run. It is common to specify paths to databases and solidify parameter values in config files.
To get a copy of a template config file with every editable parameter, run the following command
nextflow run UPHL-BioNGS/Donut_Falls --config_file true
This will create a config file named edit_me.config in the current directory. This file can be renamed and edited without altering the original workflow. The parameters (also known as params) in this file are all preceded by //, which indicates that they are not in use. The // must be removed for that line to be taken into account by the workflow.
To use this config file during runtime, simply specify the config file with -c on the command line.
nextflow run UPHL-BioNGS/Donut_Falls -c edit_me.config
This master config file can also be found at Donut_Falls/configs/donut_falls_template.config.
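Removing the leading // can be done in any text editor, but it can also be scripted. The following is a minimal sketch under the assumption that each template line looks like `//params.<name> = <value>`, possibly with spaces around the //. The function name enable_param is hypothetical. A backup of the original file is kept with a .bak suffix.

```shell
# Sketch: uncomment one parameter in a copy of the template config.
# ASSUMPTION: template lines have the form  //params.<name> = <value>
# enable_param NAME FILE
enable_param() {
  name="$1"
  file="$2"
  # Strip the leading // (and surrounding spaces) only from the line that
  # sets params.NAME; every other line is left untouched.
  sed -i.bak -E "s#^[[:space:]]*//[[:space:]]*(params\.${name}[[:space:]]*=)#\1#" "$file"
}
```

For example, `enable_param outdir edit_me.config` would activate the params.outdir line, after which its value can be edited as needed.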
// optional: input summary file from nanopore sequencing run
params.sequencing_summary = ''
// sample sheet with information about samples and their corresponding files
params.sample_sheet = ''
// specifies which subworkflow to use (default = 'flye'); options include 'flye', 'raven', and 'unicycler'
params.assembler = 'flye'
// where the results are saved (default = 'donut_falls')
params.outdir = 'donut_falls'
// specifies if test files should be downloaded
params.test = false
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity,test
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet sample_sheet.csv --assembler flye
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet SampleSheet.csv --assembler flye,raven
Hybrid assembly with unicycler using docker to manage containers and a sample sheet named 'SampleSheet.csv'
nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet SampleSheet.csv --assembler unicycler
The config file
docker.enabled = true
docker.runOptions = "-u \$(id -u):\$(id -g)"
params.assembler = 'flye'
params.flye_options = '--meta'
params.sample_sheet = 'SampleSheet.csv'
The command line
nextflow run UPHL-BioNGS/Donut_Falls -c config.config