A set of scripts for running the ARTIC pipeline from fast5 ONT data to filtered SARS-CoV-2 variants and coverage plots

MaestSi/Covid19_ONT_Artic

Covid19_ONT_Artic

Covid19_ONT_Artic is a set of scripts for running the ARTIC pipeline from fast5 ONT data to filtered SARS-CoV-2 variants and coverage plots.

Getting started

Prerequisites

  • Miniconda3. Tested with conda 4.9.2. Running which conda should return the path to the executable. If you don't have Miniconda3 installed, you can download and install it with:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod 755 Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh

Then, after completing the Covid19_ONT_Artic installation, set the MINICONDA_DIR variable in config_Covid19_ONT_Artic.R to the full path of the miniconda3 directory.

  • Guppy, the software for basecalling and demultiplexing provided by ONT. Tested with Guppy v4.2. If you don't have Guppy installed, choose an appropriate version and install it. For example, you could download and unpack the archive with:
wget /path/to/ont-guppy-cpu_version_of_interest.tar.gz
tar -xf ont-guppy-cpu_version_of_interest.tar.gz

This should create a directory named ont-guppy-cpu in your current directory. Then, after completing the Covid19_ONT_Artic installation, set the BASECALLER_DIR variable in config_Covid19_ONT_Artic.R to the full path of the ont-guppy-cpu/bin directory.
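
Before installing, it can help to confirm that both prerequisites are reachable. The following is a minimal sketch, not part of the repository: the helper names check_on_path and check_in_dir are hypothetical, the default value of BASECALLER_DIR is only a placeholder for wherever you unpacked Guppy, and it assumes the Guppy bin directory contains the guppy_basecaller executable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command is available on PATH.
check_on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1 -> $(command -v "$1")"
  else
    echo "missing: $1"
    return 1
  fi
}

# Hypothetical helper: report whether an executable named $2 exists in directory $1.
check_in_dir() {
  if [ -x "$1/$2" ]; then
    echo "found: $1/$2"
  else
    echo "missing: $1/$2"
    return 1
  fi
}

# Miniconda3: same check as `which conda`.
check_on_path conda || true

# Guppy: placeholder path, adjust to the location of your ont-guppy-cpu/bin directory.
BASECALLER_DIR="${BASECALLER_DIR:-$PWD/ont-guppy-cpu/bin}"
check_in_dir "$BASECALLER_DIR" guppy_basecaller || true
```

If either line reports "missing", fix the prerequisite before running install.sh.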

Installation

git clone https://github.com/MaestSi/Covid19_ONT_Artic.git
cd Covid19_ONT_Artic
chmod 755 *
./install.sh

The installer creates a conda environment named Covid19_ONT_Artic_env, in which seqtk, NanoFilt, Bedtools and R with the packages Biostrings, ggplot2 and scales are installed, and a second conda environment named pycoQC_env, in which pycoQC is installed. Then open the config_Covid19_ONT_Artic.R file with a text editor and set the variables PIPELINE_DIR and MINICONDA_DIR to the values suggested by the installation step.
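
To confirm that installation produced both environments, something like the sketch below may be useful. It is an illustration rather than part of the repository: has_env is a hypothetical helper that looks for an environment name in the first column of conda env list output, and the final grep only displays the lines of the config file that mention the two variables, without assuming how they are assigned.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the environment name given as $1 appears
# in the first column of `conda env list` output read from standard input.
has_env() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if command -v conda >/dev/null 2>&1; then
  for env in Covid19_ONT_Artic_env pycoQC_env; do
    if conda env list | has_env "$env"; then
      echo "ok: $env"
    else
      echo "missing: $env"
    fi
  done
else
  echo "conda not found on PATH"
fi

# Show the config lines that mention the variables to set (run from the repository directory).
CONFIG="config_Covid19_ONT_Artic.R"
if [ -f "$CONFIG" ]; then
  grep -n -E 'PIPELINE_DIR|MINICONDA_DIR' "$CONFIG" || true
fi
```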

Usage

Launch_Covid19_ONT_Artic.sh

Usage: Launch_Covid19_ONT_Artic.sh <fast5_dir>

Input:

  • <fast5_dir>: directory containing raw fast5 files

Outputs (saved in <fast5_dir>_analysis/analysis):

  • logfile.txt: log file describing the progress of the workflow
  • <sample_name> directory containing:
    • Files generated by ARTIC pipeline
    • <sample_name>_linear.dat: file containing average coverage across regions defined in BED_COV file
    • <sample_name>.dat: file containing average coverage in log scale across regions defined in BED_COV file
    • <sample_name>_PrimerCov.png: plot representing average coverage in log scale across regions defined in BED_COV file
    • <sample_name>.pass_intersected_BED_VAR.vcf: file obtained by intersecting the <sample_name>.pass.vcf.gz file with the variants of interest defined in BED_VAR file
    • <sample_name>_NOT_GENOTYPED_VAR.bed: file containing variants of interest defined in BED_VAR file which could not be genotyped
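
As a quick check of the per-sample results, the number of records in the two variant files can be counted. This is a minimal sketch under stated assumptions: count_records is a hypothetical helper, ANALYSIS_DIR and SAMPLE are placeholders, and a record is taken to be any non-empty line that does not start with # (so VCF header lines, if present, are ignored).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count non-empty lines that do not start with '#'.
# Prints NA if the file does not exist.
count_records() {
  if [ -f "$1" ]; then
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
  else
    echo "NA"
  fi
}

# Placeholders: adjust to your run and sample name.
ANALYSIS_DIR="${ANALYSIS_DIR:-/path/to/fast5_dir_analysis/analysis}"
SAMPLE="${SAMPLE:-sample_name}"

echo "variants of interest genotyped: $(count_records "$ANALYSIS_DIR/$SAMPLE/$SAMPLE.pass_intersected_BED_VAR.vcf")"
echo "variants of interest not genotyped: $(count_records "$ANALYSIS_DIR/$SAMPLE/${SAMPLE}_NOT_GENOTYPED_VAR.bed")"
```

A count of NA means the file was not found at the assumed location.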

Outputs (saved in <fast5_dir>_analysis/qc):

  • Read length distributions and pycoQC report

Outputs (saved in <fast5_dir>_analysis/basecalling):

  • Temporary files for basecalling

Outputs (saved in <fast5_dir>_analysis/preprocessing):

  • Temporary files for demultiplexing, filtering based on read length and adapter trimming
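
The output layout described above can be sketched as follows. This is an illustration only: expected_dirs is a hypothetical helper, FAST5_DIR is a placeholder, and stripping a trailing slash from the input path is an assumption of this sketch rather than documented behaviour of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the four output directories for a given fast5 directory.
expected_dirs() {
  local fast5_dir="${1%/}"   # drop one trailing slash, if present
  local sub
  for sub in analysis qc basecalling preprocessing; do
    echo "${fast5_dir}_analysis/${sub}"
  done
}

# Placeholder: the directory passed to Launch_Covid19_ONT_Artic.sh.
FAST5_DIR="${FAST5_DIR:-/path/to/fast5_dir}"

# Report which output directories exist so far.
expected_dirs "$FAST5_DIR" | while read -r dir; do
  if [ -d "$dir" ]; then
    echo "present: $dir"
  else
    echo "not yet created: $dir"
  fi
done

# Show the last lines of the workflow log, if it exists.
LOG="${FAST5_DIR%/}_analysis/analysis/logfile.txt"
if [ -f "$LOG" ]; then
  tail -n 5 "$LOG"
fi
```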

Auxiliary scripts

The auxiliary scripts run by Launch_Covid19_ONT_Artic.sh are listed below. These scripts should not be called directly.

Covid19_ONT_Artic.R

Note: script run by Launch_Covid19_ONT_Artic.sh.

config_Covid19_ONT_Artic.R

Note: configuration script, must be modified before running Launch_Covid19_ONT_Artic.sh.

subsample_fast5.sh

Note: script run by Covid19_ONT_Artic.R if do_subsampling_flag variable is set to 1 in config_Covid19_ONT_Artic.R.

Testing data

Example data for sample SP1, the first case in Sao Paulo (Brazil), can be downloaded from here; analysing them should produce the following coverage plot.

[Figure: expected coverage plot for sample SP1]

Citation

If this tool is useful for your work, please consider citing our manuscript.

Maestri S., Grosso V., Alfano M., Lavezzari D., Piubelli C., Bisoffi Z., Rossato M., Delledonne M. STArS (STrain-Amplicon-Seq), a targeted Nanopore sequencing workflow for SARS-CoV-2 diagnostics and genotyping. Biology Methods and Protocols, 2022, bpac020. https://doi.org/10.1093/biomethods/bpac020
