Home
Welcome to the nf-ducken wiki!
`nf-ducken` is a workflow for end-to-end amplicon processing for meta-analyses.
This amplicon sequencing pipeline integrates the microbiome bioinformatics platform QIIME 2 (Bolyen et al., 2019) and the workflow manager Nextflow (Di Tommaso et al., 2017). The aim is to establish a computational pipeline that performs the necessary analyses while adhering to good data engineering practices by being:
- scalable, from hundreds to thousands of samples
- reproducible, keeping record of software versions and environments
- automated, to minimize manual interaction and human error, and
- standardized, so that multiple datasets can be analyzed through a uniform workflow
A Nextflow pipeline enabling high-throughput, parallelized meta-analysis of marker gene data.
No more need to manually launch one analysis step after another. With `nf-ducken`, execute a single workflow that will take you through all necessary pre-processing steps:
- Data import: Import local FASTQ files (`qiime tools import`) or download from the SRA (`q2-fondue`)
- Optional adapter trimming: `q2-cutadapt`
- Initial quality control and denoising: `q2-dada2`
- Optional closed-reference OTU clustering: `q2-vsearch`
- Taxonomy classification: `q2-feature-classifier`
- Collapse to taxon of interest: `q2-taxa`
📊 These steps are illustrated in the flowchart below.
For a detailed introduction to `nf-ducken`, see the tutorial.
`nf-ducken` can be run using Conda or Singularity/Docker environments. Please ensure Nextflow is installed in your base environment. Launch a Conda environment-based run using `-profile conda` when running the workflow script; alternatively, launch container-based runs with Docker or Singularity using `-profile docker` or `-profile singularity`, respectively.
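Before launching, it can help to confirm that the tools a profile relies on are actually on your `PATH`. The sketch below is a minimal illustration, not part of `nf-ducken`: the helper names `require_tools` and `check_profile` are hypothetical, and the executable names (`conda`, `docker`, `singularity`) are assumptions about a typical installation.

```shell
# Return 0 if every named executable is on PATH; list the missing ones otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Map a profile name to the executables it is assumed to need.
# Nextflow itself is required for every profile.
check_profile() {
    case "$1" in
        conda)       require_tools nextflow conda ;;
        docker)      require_tools nextflow docker ;;
        singularity) require_tools nextflow singularity ;;
        *)           echo "unknown profile: $1" >&2; return 2 ;;
    esac
}
```

For example, `check_profile docker && echo "ready"` prints `ready` only when both `nextflow` and `docker` are found.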
Note for users with newer Apple processors (M1/M2): Conda environments require emulation using Rosetta, due to the lack of certain packages for the ARM64 architecture otherwise available with Intel processors. Please follow the installation and setup instructions here for details.
Download the following reference files, either pre-processed with RESCRIPt or obtained from the QIIME 2 data resource center:
- Pretrained taxonomy classifier: Naive Bayes taxonomy classifier trained on SILVA 138 99% OTUs
- Reference sequences: SILVA 138 SSURef NR99 sequences
- Reference taxonomy: SILVA 138 SSURef NR99 taxonomy
If sequence files are available locally, construct an input FASTQ manifest file with the following instructions.
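As a rough illustration of what building such a manifest can look like, the sketch below writes a tab-separated, paired-end manifest. It assumes the manifest follows the QIIME 2 paired-end manifest layout (columns `sample-id`, `forward-absolute-filepath`, `reverse-absolute-filepath`) and that files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`; both are assumptions, and the function name `make_manifest` is hypothetical. Check the linked instructions for the format the workflow actually expects.

```shell
# Write a paired-end manifest for files named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
# Usage: make_manifest <fastq_dir> <manifest_path>
make_manifest() {
    fastq_dir="$(cd "$1" && pwd)" || return 1   # resolve to an absolute path
    manifest="$2"

    # Header row (assumed QIIME 2 paired-end manifest columns)
    printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$manifest"

    for fwd in "$fastq_dir"/*_R1.fastq.gz; do
        [ -e "$fwd" ] || continue               # glob matched nothing
        sample="$(basename "$fwd" _R1.fastq.gz)"
        rev="$fastq_dir/${sample}_R2.fastq.gz"
        if [ ! -e "$rev" ]; then
            echo "skipping $sample: no reverse read file" >&2
            continue
        fi
        printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev" >> "$manifest"
    done
}
```

Calling `make_manifest fastq manifest.tsv` would write one row per sample that has both read files in the (hypothetical) `fastq` directory; samples missing a reverse file are skipped with a warning.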
Generate an input configuration file (`run.config`) with your desired parameters. More details on parameterization can be found in the usage docs; the required parameters are listed below. An example configuration file can be found here.
- `read_type`: `"single"` or `"paired"`; whether your FASTQs are single- or paired-end
- `pipeline_type`: `"import"` or `"download"`; former if FASTQs are available locally, latter if FASTQs need to be downloaded from the SRA
- `outdir`: File name of output directory
- `fastq_manifest`: File path of manifest file
- `otu_ref_file`: File path of reference sequence file
- `taxonomy_ref_file`: File path of reference taxonomy file
- `trained_classifier`: File path of pretrained taxonomy classifier
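The required parameters above could be collected into a file along the following lines. This is a minimal sketch that assumes the parameters are set in the standard Nextflow `params` scope; every path and the output directory name are placeholders to replace with your own, and the example configuration linked above remains the authoritative template.

```shell
# Write a minimal run.config; all values below are illustrative placeholders.
cat > run.config <<'EOF'
params {
    read_type          = "paired"     // or "single"
    pipeline_type      = "import"     // or "download" to fetch FASTQs from the SRA
    outdir             = "results"
    fastq_manifest     = "/path/to/manifest.tsv"
    otu_ref_file       = "/path/to/reference-seqs.qza"
    taxonomy_ref_file  = "/path/to/reference-taxonomy.qza"
    trained_classifier = "/path/to/pretrained-classifier.qza"
}
EOF
```

Quoting the heredoc delimiter (`'EOF'`) keeps the shell from expanding anything inside the file, so the contents are written exactly as shown.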
Run your workflow with the sample execution script:
nextflow run bokulich-lab/nf-ducken -c run.config -profile conda
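To switch between the Conda and container profiles without retyping the command, a small wrapper can help. The sketch below is illustrative only: `run_ducken` and the `DRY_RUN` variable are hypothetical names, and the command it builds is the one shown above with only the profile and config path varied.

```shell
# Launch nf-ducken with a chosen profile; set DRY_RUN=1 to print the command instead.
# Usage: run_ducken [profile] [config]
run_ducken() {
    profile="${1:-conda}"
    config="${2:-run.config}"
    case "$profile" in
        conda|docker|singularity) ;;
        *) echo "unknown profile: $profile" >&2; return 2 ;;
    esac
    # Rebuild the positional parameters as the full command line
    set -- nextflow run bokulich-lab/nf-ducken -c "$config" -profile "$profile"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running `DRY_RUN=1 run_ducken singularity` prints the command that would be executed, which is a cheap way to check the arguments before starting a long run.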
- `outDir/taxonomy.qza`: Artifact containing frequencies for features collapsed to a given level (default genus).
- `outDir/taxonomy.qzv`: Visualization containing frequencies for features collapsed to a given level (default genus).
- `outDir/feature_table.qza`: Artifact containing table of represented features by sample.
- `outDir/stats/`: Directory containing QC metrics, including FastQC, clustering statistics, denoising statistics, etc.
- `outDir/trace/`: Directory containing runtime metrics with an execution report and a pipeline DAG.
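After a run finishes, a quick check that the outputs listed above exist can catch a partial or failed run early. The following is a minimal sketch; `check_outputs` is a hypothetical helper, and it only tests for the presence of the files and directories named above, not their contents.

```shell
# Report which of the documented outputs are present under an output directory.
# Usage: check_outputs <outdir>; returns non-zero if anything is missing.
check_outputs() {
    out="$1"
    rc=0
    for f in taxonomy.qza taxonomy.qzv feature_table.qza; do
        if [ -f "$out/$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            rc=1
        fi
    done
    for d in stats trace; do
        if [ -d "$out/$d" ]; then
            echo "ok       $d/"
        else
            echo "missing  $d/"
            rc=1
        fi
    done
    return "$rc"
}
```

Pass the same directory you set as `outdir` in `run.config`, for example `check_outputs results`.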
- Bolyen et al., 2019, Nature Biotechnology: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2
- Di Tommaso et al., 2017, Nature Biotechnology: Nextflow enables reproducible computational workflows