The BasicQC tutorial functionality has been moved to our new EPI2ME Labs product. Please see the EPI2ME Labs documentation at [https://labs.epi2me.io] and have a look at the product's GitHub pages at [https://github.com/epi2me-labs].
The EPI2ME Labs product provides a collection of tutorials and best-practice guidelines for processing Nanopore sequence data. The product is provided in a maintained Docker container and interactive tutorials are provided through Jupyter notebooks. The Jupyter experience has been customised and provides exciting new material through interactive menus, genome browsers and more.
This repository is now unsupported and we do not recommend its use. Please contact Oxford Nanopore: [email protected] for help with your application if it is not possible to upgrade to our new resources, or we are missing key features.
The Summary Statistics and QC tutorial is intended as a functional guide to help assess the quality characteristics of a single Nanopore sequence run. This tutorial aims to enable an objective assessment of the performance of a Nanopore flowcell run and to assess the sequence characteristics to benchmark quality.
Sufficient information is provided in the tutorial such that the workflow can be tested, validated, and replicated. The tutorial is provided with an example dataset from a barcoded sequence library. The tutorial is intended to address important questions:
- how many reads (and how many gigabases) were sequenced?
- what fraction of my sequence collection is good quality?
- how are longer sequence reads represented in my sample?
- how uniform is the representation of different barcodes?
This tutorial uses the R markdown contained within this Github repository, a sequencing_summary.txt file from the Guppy base-calling software, and optionally a barcoding_summary.txt file from Guppy barcoding as input. Example summary files are included within the repository. The result of the tutorial will be a report in HTML format. This workflow can also process the sequencing_summary.txt file prepared by the Albacore base-calling software.
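The first two questions above can be illustrated with a few lines of Python. This is only a minimal sketch and not the tutorial's own R code: it assumes the summary file is tab-separated and carries the columns sequence_length_template and mean_qscore_template (check the header of your own file), and the quality threshold of 7 is an illustrative choice rather than a recommendation.

```python
import csv
import io


def summarise(summary_file, min_qscore=7.0):
    """Return read count, gigabases and the fraction of reads at or above min_qscore.

    Column names are assumed to match the Guppy summary file header.
    """
    reader = csv.DictReader(summary_file, delimiter="\t")
    reads = 0
    bases = 0
    good = 0
    for row in reader:
        reads += 1
        bases += int(row["sequence_length_template"])
        if float(row["mean_qscore_template"]) >= min_qscore:
            good += 1
    return {
        "reads": reads,
        "gigabases": bases / 1e9,
        "good_fraction": good / reads if reads else 0.0,
    }


# Tiny in-memory stand-in for a real summary file (values are made up).
example = io.StringIO(
    "read_id\tsequence_length_template\tmean_qscore_template\n"
    "r1\t1000\t9.5\n"
    "r2\t4000\t6.2\n"
    "r3\t5000\t11.0\n"
)
stats = summarise(example)
print(stats)
```

For a real run, an opened file handle can be passed in place of the in-memory example, for instance the handle returned by open() on your own summary file.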
This tutorial requires a computer running Linux (CentOS 7, Ubuntu 18.10, Fedora 29); 8 GB of memory is recommended. The tutorial has been tested on minimal server installs of these operating systems.
Other dependencies include:
- Conda is required by this tutorial and orchestrates and manages the installation of other required software
- R is statistical analysis software and is used for the analysis and reporting of the sequence summary data
- Rstudio is a graphical user interface to R and provides much of the required reporting framework
- git is used to download the tutorial from its Github repository
- git-lfs is required to download the sequence and metadata files provided with the tutorial
- Most software dependencies are managed through conda; install it as described at https://conda.io/docs/install/quick.html.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
bash
- Download the Nanopore QC tutorial & example files into a folder named QCTutorial. This tutorial requires the git-lfs large file support capabilities; this should be installed first through conda
conda install -c conda-forge git-lfs
git lfs install
git clone https://github.com/nanoporetech/ont_tutorial_basicqc.git QCTutorial
- Change working directory into the new QCTutorial folder
cd QCTutorial
- Install conda software dependencies with
conda env create --name BasicQC --file environment.yaml
- Initialise conda environment with
source activate BasicQC
This tutorial does not contain software that requires compilation.
In your Conda environment, and in the tutorial working directory,
- optionally edit the provided config.yaml file to match your own study design
- render the tutorial report using the command
R --slave -e 'rmarkdown::render("Nanopore_SumStatQC_Tutorial.Rmd", "html_document")'
The provided Rmarkdown tutorial script can also be opened directly in Rstudio
rstudio Nanopore_SumStatQC_Tutorial.Rmd
The report can be prepared by "knit" from the GUI as shown in the figure
This tutorial workflow will produce a rich description of your sequence characteristics as observed from the starting sequencing_summary.txt file. Please visit the tutorial page at https://community.nanoporetech.com/knowledge/bioinformatics for more information.
© 2019 Oxford Nanopore Technologies Ltd.
Bioinformatics-Tutorials are distributed by Oxford Nanopore Technologies under the terms of the MPL-2.0 license.
- knit: the command to render an Rmarkdown file. The knitr package is used to embed code, the results of R analyses and their figures within the typeset text from the document.
- L50: the number of sequences (or contigs etc) that are longer than, or equal to, the N50 length and therefore include half the bases of the assembly.
- N50: the length such that sequences (or contigs etc) of this length or longer include half the bases of the sequence collection.
- Rmarkdown: an extension to markdown. Functional R code can be embedded in a plain-text document and subsequently rendered to other formats including the PDF format of this report.
- QV: the quality value, -10 × log10(p), where p is the probability that any given base is incorrect. QV may be either at the individual base level, or may be averaged across whole sequences.
- sequencing_summary.txt: a summary file describing sequence characteristics following base calling with the Guppy / Albacore software.
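
The N50, L50 and QV definitions above can be made concrete with a short Python sketch. It is an illustration only and not taken from the tutorial's R code; the function names are hypothetical. L50 is taken here as the smallest number of longest sequences needed to reach half the bases, which agrees with the glossary wording when there are no tied lengths at the N50 boundary. Averaging QVs through the mean error probability is one common approach and is assumed here, not confirmed as what Guppy reports.

```python
import math


def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must not be empty")


def qv_to_error_prob(qv):
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-qv / 10)


def mean_qv(qvs):
    """Average quality values via the mean error probability (an assumption)."""
    if not qvs:
        raise ValueError("qvs must not be empty")
    mean_p = sum(qv_to_error_prob(q) for q in qvs) / len(qvs)
    return -10 * math.log10(mean_p)


# Made-up read lengths: 20 bases in total, so half is 10.
print(n50_l50([2, 3, 4, 5, 6]))
print(round(qv_to_error_prob(20), 4))
print(round(mean_qv([10, 20]), 2))
```

Note that the probability-space mean of QV 10 and QV 20 is lower than their arithmetic mean of 15, because the lower-quality base dominates the expected error rate.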