Home

Welcome to the nf-ducken wiki!

Introduction

nf-ducken is a workflow for end-to-end amplicon processing for meta-analyses.

This amplicon sequencing pipeline integrates the microbiome bioinformatics platform QIIME 2 (Bolyen et al., 2019) with the workflow manager Nextflow (Di Tommaso et al., 2017). The aim is a computational pipeline that performs the necessary analyses while adhering to good data engineering practices by being:

scalable, from many hundreds to thousands of samples
reproducible, keeping record of software versions and environments
automated, to minimize manual interaction and human error, and
standardized, in order that multiple datasets may be analyzed through a uniform workflow

Summary

A Nextflow pipeline enabling high-throughput, parallelized meta-analysis of marker gene data.

No more need to manually launch one analysis step after another. With nf-ducken, execute a single workflow that will take you through all necessary pre-processing steps:

Data import: Import local FASTQ files (qiime tools import) or download from the SRA (q2-fondue)
Optional adapter trimming: q2-cutadapt
Initial quality control and denoising: q2-dada2
Optional closed-reference OTU clustering: q2-vsearch
Taxonomy classification: q2-feature-classifier
Collapse to taxon of interest: q2-taxa

📊 These steps are illustrated in the flowchart below.

Quick Start

For a detailed introduction to nf-ducken, see the tutorial.

Setup

nf-ducken can be run using Conda or Singularity/Docker environments. Please ensure Nextflow is installed in your base environment. Launch a Conda environment-based run using -profile conda when running the workflow script; alternatively, launch container-based runs with Docker or Singularity using -profile docker or -profile singularity, respectively.

Note for users with newer Apple processors (M1/M2): Conda environments require emulation using Rosetta, due to the lack of certain packages for the ARM64 architecture otherwise available with Intel processors. Please follow the installation and setup instructions here for details.

Inputs

Download the following reference files, either pre-processed with RESCRIPt or obtained from the QIIME 2 data resource center:

Pretrained taxonomy classifier: Naive Bayes taxonomy classifier trained on SILVA 138 99% OTUs
Reference sequences: SILVA 138 SSURef NR99 sequences
Reference taxonomy: SILVA 138 SSURef NR99 taxonomy

If sequence files are available locally, construct an input FASTQ manifest file with the following instructions.
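
If your FASTQ files follow a regular naming scheme, the manifest can be generated rather than typed by hand. The sketch below is not part of nf-ducken. It assumes the QIIME 2 paired-end manifest layout (tab-separated columns sample-id, forward-absolute-filepath, reverse-absolute-filepath) and a hypothetical naming pattern of <sample>_R1.fastq.gz and <sample>_R2.fastq.gz. The function name build_paired_manifest is illustrative, and the linked manifest instructions remain the authority on the exact format the pipeline expects.

```python
import csv
import re
from pathlib import Path

# Assumed (hypothetical) naming convention: <sample>_R1.fastq[.gz] / <sample>_R2.fastq[.gz]
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq(\.gz)?$")


def build_paired_manifest(fastq_dir, manifest_path):
    """Write a tab-separated paired-end manifest and return the sorted sample IDs."""
    pairs = {}
    for path in sorted(Path(fastq_dir).iterdir()):
        match = READ_PATTERN.match(path.name)
        if match:
            # Manifests require absolute paths, so resolve each file.
            pairs.setdefault(match["sample"], {})[match["read"]] = path.resolve()

    # Every sample needs both a forward (R1) and a reverse (R2) file.
    incomplete = sorted(s for s, reads in pairs.items() if set(reads) != {"1", "2"})
    if incomplete:
        raise ValueError(f"Samples missing a read file: {incomplete}")

    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(
            ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
        )
        for sample in sorted(pairs):
            writer.writerow([sample, pairs[sample]["1"], pairs[sample]["2"]])
    return sorted(pairs)
```

Calling build_paired_manifest("fastq/", "manifest.tsv") (both paths are placeholders) writes one row per sample and raises an error if any sample lacks its mate file, which catches incomplete uploads before the pipeline starts.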

Generate an input configuration file (run.config) with your desired parameters. More details on parameterization can be found in the usage docs; the required parameters are listed below. An example configuration file can be found here.

read_type: "single" or "paired"; whether your FASTQs are single- or paired-end
pipeline_type: "import" or "download"; former if FASTQs are available locally, latter if FASTQs need to be downloaded from the SRA
outdir: File name of output directory
fastq_manifest: File path of manifest file
otu_ref_file: File path of reference sequence file
taxonomy_ref_file: File path of reference taxonomy file
trained_classifier: File path of pretrained taxonomy classifier
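
To show how the required parameters above fit together, here is a minimal sketch that validates them and renders the text of a run.config. It assumes the parameters live in a standard Nextflow params scope; the helper name render_run_config is hypothetical, and the example configuration file linked above is the authoritative reference for layout and optional settings.

```python
# Required parameter names, as listed in this section.
REQUIRED = (
    "read_type",
    "pipeline_type",
    "outdir",
    "fastq_manifest",
    "otu_ref_file",
    "taxonomy_ref_file",
    "trained_classifier",
)

# Allowed values for the two enumerated parameters.
ALLOWED = {
    "read_type": {"single", "paired"},
    "pipeline_type": {"import", "download"},
}


def render_run_config(params):
    """Return run.config text (assumed `params { ... }` scope) for a dict of settings."""
    missing = [key for key in REQUIRED if key not in params]
    if missing:
        raise ValueError(f"Missing required parameters: {missing}")
    for key, allowed in ALLOWED.items():
        if params[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {params[key]!r}")

    lines = ["params {"]
    lines += [f'    {key} = "{params[key]}"' for key in REQUIRED]
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Writing the returned string to run.config gives a file that can be passed to the workflow with -c. All file paths supplied to the function are your own; none are implied by the pipeline.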

Run your workflow with the sample execution script:

nextflow run bokulich-lab/nf-ducken -c run.config -profile conda

Outputs

outDir/taxonomy.qza: Artifact containing frequencies for features collapsed to a given level (default genus).
outDir/taxonomy.qzv: Visualization containing frequencies for features collapsed to a given level (default genus).
outDir/feature_table.qza: Artifact containing table of represented features by sample.
outDir/stats/: Directory containing QC metrics, including FastQC, clustering statistics, denoising statistics, etc.
outDir/trace/: Directory containing runtime metrics with an execution report and a pipeline DAG.
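
After a run finishes, a quick check that the outputs listed above exist can catch a silently incomplete run. This is a minimal sketch, assuming the files sit directly under the output directory as shown in the list; missing_outputs is a hypothetical helper, not part of nf-ducken.

```python
from pathlib import Path

# Output names taken from the list above, relative to the output directory.
EXPECTED_OUTPUTS = (
    "taxonomy.qza",
    "taxonomy.qzv",
    "feature_table.qza",
    "stats",
    "trace",
)


def missing_outputs(outdir):
    """Return the expected output names that are absent from outdir."""
    root = Path(outdir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).exists()]
```

An empty list means every expected file and directory is present; any names returned point to the step worth inspecting in the trace/ reports.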

References

Bolyen et al., 2019, Nature Biotechnology: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2
Di Tommaso et al., 2017, Nature Biotechnology: Nextflow enables reproducible computational workflows
