Snakemake workflow: FastQC and MultiQC of Next-generation Sequencing Data

Motivation

This is a Snakemake pipeline for quality control of Illumina next-generation sequencing data. It runs FastQC on the raw FASTQ files and merges the resulting FastQC reports into a single report with MultiQC.

Prerequisites

- Conda is a package, dependency and environment management system used to install software packages and manage their dependencies. It runs on Linux, macOS and Windows. It was created for Python programs, but it can package and distribute software for any language. Install conda for your operating system (Linux or macOS).
- Snakemake is a workflow management system for creating reproducible and scalable data analyses.
- FastQC is a quality control tool for high-throughput sequence data.
- MultiQC is a tool that aggregates bioinformatics results across many samples into a single report.

Configuration

The configuration file is located in config/config.yaml. This file contains paths to input files and directories, output directories, and other settings.

Usage

Clone the repository:

git clone https://github.com/kevin-wamae/fastqc-multiqc-pipeline.git

Navigate into the cloned directory using the following command:

cd fastqc-multiqc-pipeline

Create a conda environment (named fastqc-multiqc-pipeline) for the pipeline:

conda env create --file workflow/envs/environment.yaml

Activate the conda environment. This needs to be done every time you restart your terminal and want to re-run the pipeline:

conda activate fastqc-multiqc-pipeline
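
Once the environment is active, it can help to confirm that the required commands are on the PATH before launching a run. The following is a minimal sketch and not part of the repository: `missing_tools` is a hypothetical helper name, and the assumption that `snakemake` is provided by the fastqc-multiqc-pipeline environment is inferred from the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the name of every
# command given as an argument that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumption: conda and snakemake should both be available after activation.
missing=$(missing_tools conda snakemake)
if [ -n "$missing" ]; then
    # Unquoted on purpose so that several names are joined on one line.
    echo "Not found on PATH:" $missing
else
    echo "conda and snakemake are available"
fi
```

If a name is reported as missing, re-running the conda activate command above is the first thing to try.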

Run the pipeline with Snakemake:

snakemake --cores <number_of_cores> --use-conda

The --cores option specifies the number of cores to use, and the --use-conda option tells Snakemake to use the specified conda environments.
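
Before a full run, a dry run shows which jobs Snakemake would execute without actually running them. The sketch below is illustrative only: `detect_cores` is a hypothetical helper that assumes `nproc` or `getconf` is available on the machine, and the command is printed rather than executed so that it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: report the number of CPU cores, falling back to 1
# when neither nproc nor getconf can provide it.
detect_cores() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
        getconf _NPROCESSORS_ONLN
    else
        echo 1
    fi
}

cores=$(detect_cores)
# --dry-run (short form: -n) lists the planned jobs without executing them.
echo "snakemake --cores ${cores} --use-conda --dry-run"
```

Dropping --dry-run from the printed command gives the real run shown above.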

Finally, the config file is located in config/config.yaml.
- This file contains paths to input files and directories, output directories, and other settings such as the number of cores to use.
- You can edit this file to suit your needs.
- For example, you can change the number of cores to use by editing the extra:threads: parameter.
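
To find a setting such as threads before editing it, a small search over the config file may be enough. This is a sketch under assumptions: `show_setting` is a hypothetical helper, and it only matches lines of the form `key:` at any indentation, so it says nothing about how the file is nested.

```shell
#!/bin/sh
# Hypothetical helper: print matching "key:" lines from a YAML file together
# with their line numbers, so the setting is easy to find in an editor.
show_setting() {
    file="$1"
    key="$2"
    grep -n "^[[:space:]]*${key}:" "$file"
}

# Assumption: run from the repository root, where config/config.yaml lives.
if [ -f config/config.yaml ]; then
    show_setting config/config.yaml threads || echo "no 'threads:' key found"
fi
```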

Output

The pipeline generates HTML reports for each sample in the fastqc directory and a merged HTML report in the multiqc directory.
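
After a run, the reports can be listed from the command line. The following is a minimal sketch, not part of the pipeline: `list_reports` is a hypothetical helper, and the directories are passed as arguments because their location depends on the output directory set in config/config.yaml.

```shell
#!/bin/sh
# Hypothetical helper: list every .html file below a directory, sorted by path.
list_reports() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "directory not found: $dir" >&2
        return 1
    fi
    find "$dir" -type f -name '*.html' | sort
}

# List reports for each directory given on the command line; a missing
# directory is reported on stderr and the loop continues with the next one.
for report_dir in "$@"; do
    list_reports "$report_dir" || true
done
```

Saved as a script, it would be called with the paths to the fastqc and multiqc directories as its two arguments.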

Dependencies

This pipeline uses conda environments to manage dependencies for each rule. The environments are defined in envs/fastqc.yaml and envs/multiqc.yaml.

Contact

Report any issues or bugs by opening an issue in the repository, or contact me via email (wamaekevin[at]gmail.com).
