Covid19_ONT_Artic is a set of scripts for running the ARTIC pipeline from fast5 ONT data to filtered SARS-CoV-2 variants and coverage plots.
Prerequisites
- Miniconda3.
Tested with conda 4.9.2. Running
which conda
should return the path to the executable. If you don't have Miniconda3 installed, you can download and install it with:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod 755 Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
Then, after completing the Covid19_ONT_Artic installation, set the MINICONDA_DIR variable in config_Covid19_ONT_Artic.R to the full path of the miniconda3 directory.
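As a quick check of the Miniconda3 prerequisite, the following minimal sketch reports whether conda is reachable and prints its version and base directory. The function name report_conda is only illustrative and is not part of the repository. The base directory printed by conda info --base is a plausible candidate for MINICONDA_DIR, but compare it with the value suggested by the installation step.

```shell
#!/bin/sh
# Hypothetical helper (not part of Covid19_ONT_Artic): report whether conda
# is reachable and, if so, print its version and base directory.
report_conda() {
    if command -v conda >/dev/null 2>&1; then
        conda --version
        # The base directory is a candidate value for MINICONDA_DIR.
        conda info --base
    else
        echo "conda not found on PATH"
    fi
}

report_conda
```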
- Guppy, the software for basecalling and demultiplexing provided by ONT. Tested with Guppy v4.2. If you don't have Guppy installed, choose an appropriate version and install it. For example, you could download and unpack the archive with:
wget /path/to/ont-guppy-cpu_version_of_interest.tar.gz
tar -xf ont-guppy-cpu_version_of_interest.tar.gz
This should create a directory named ont-guppy-cpu in your current directory. Then, after completing the Covid19_ONT_Artic installation, set the BASECALLER_DIR variable in config_Covid19_ONT_Artic.R to the full path of the ont-guppy-cpu/bin directory.
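Before writing a path into BASECALLER_DIR, it can help to confirm that the directory really contains the Guppy executables. The sketch below assumes that the pipeline needs guppy_basecaller and guppy_barcoder, since Guppy is used here for both basecalling and demultiplexing. The function name check_basecaller_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Covid19_ONT_Artic).
# Return 0 if <dir> contains executable guppy_basecaller and guppy_barcoder
# (assumed to be the two Guppy programs the pipeline calls), 1 otherwise.
check_basecaller_dir() {
    for exe in guppy_basecaller guppy_barcoder; do
        if [ ! -x "$1/$exe" ]; then
            echo "missing or not executable: $1/$exe" >&2
            return 1
        fi
    done
    echo "OK: $1"
}

# Run the check only when a directory is given on the command line.
if [ "$#" -ge 1 ]; then
    check_basecaller_dir "$1"
fi
```

For example, saving the block as check_guppy.sh (an arbitrary name) and running sh check_guppy.sh "$PWD/ont-guppy-cpu/bin" from the directory where the archive was unpacked prints the path if both executables are found.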
Installation
git clone https://github.com/MaestSi/Covid19_ONT_Artic.git
cd Covid19_ONT_Artic
chmod 755 *
./install.sh
The installation creates a conda environment named Covid19_ONT_Artic_env, containing seqtk, NanoFilt, Bedtools and R with the packages Biostrings, ggplot2 and scales, and a second conda environment named pycoQC_env, containing pycoQC. Afterwards, open the config_Covid19_ONT_Artic.R file with a text editor and set the variables PIPELINE_DIR and MINICONDA_DIR to the values suggested by the installation step.
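To confirm that install.sh created both environments, you can look for their names in the output of conda env list. The following sketch assumes the environments were created by name, so that the name appears in the first column of the listing. The helper env_listed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Covid19_ONT_Artic).
# Read the output of 'conda env list' on stdin and succeed if an environment
# called <name> appears in the first column.
env_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Check the two environments named in this README, if conda is available.
if command -v conda >/dev/null 2>&1; then
    for env in Covid19_ONT_Artic_env pycoQC_env; do
        if conda env list | env_listed "$env"; then
            echo "found: $env"
        else
            echo "not found: $env"
        fi
    done
fi
```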
Launch_Covid19_ONT_Artic.sh
Usage: Launch_Covid19_ONT_Artic.sh <fast5_dir>
Input:
- <fast5_dir>: directory containing raw fast5 files
Outputs (saved in <fast5_dir>_analysis/analysis):
- logfile.txt: log file describing the progress of the workflow
- <sample_name> directory containing:
  - Files generated by the ARTIC pipeline
  - <sample_name>_linear.dat: file containing average coverage across regions defined in BED_COV file
  - <sample_name>.dat: file containing average coverage in log scale across regions defined in BED_COV file
  - <sample_name>_PrimerCov.png: plot representing average coverage in log scale across regions defined in BED_COV file
  - <sample_name>.pass_intersected_BED_VAR.vcf: file obtained by intersecting the <sample_name>.pass.vcf.gz file with the variants of interest defined in BED_VAR file
  - <sample_name>_NOT_GENOTYPED_VAR.bed: file containing the variants of interest defined in BED_VAR file that could not be genotyped
Outputs (saved in <fast5_dir>_analysis/qc):
- Read length distributions and pycoQC report
Outputs (saved in <fast5_dir>_analysis/basecalling):
- Temporary files for basecalling
Outputs (saved in <fast5_dir>_analysis/preprocessing):
- Temporary files for demultiplexing, filtering based on read length and adapter trimming
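Once a run has finished, the per-sample results can be summarised from the command line. The sketch below is an illustration and not part of the pipeline. It relies on the output layout and file names listed above, and prints one tab-separated line per sample: the sample name, the number of records in the intersected VCF and the number of variants of interest that could not be genotyped. It assumes that the BED file has no header lines, and it prints NA when a file is missing. The function name summarise_analysis is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Covid19_ONT_Artic).
# Usage: summarise_analysis <fast5_dir>_analysis/analysis
# Prints: <sample_name> <TAB> <genotyped variants> <TAB> <not genotyped variants>
summarise_analysis() {
    for sample_dir in "$1"/*/; do
        [ -d "$sample_dir" ] || continue
        sample_dir=${sample_dir%/}
        sample=$(basename "$sample_dir")
        vcf="$sample_dir/$sample.pass_intersected_BED_VAR.vcf"
        bed="$sample_dir/${sample}_NOT_GENOTYPED_VAR.bed"
        n_var=NA
        n_missing=NA
        # Count VCF records, skipping header lines that start with '#'.
        if [ -f "$vcf" ]; then
            n_var=$(awk '!/^#/ && NF { n++ } END { print n + 0 }' "$vcf")
        fi
        # Count non-empty BED lines (assumes the file has no header lines).
        if [ -f "$bed" ]; then
            n_missing=$(awk 'NF { n++ } END { print n + 0 }' "$bed")
        fi
        printf '%s\t%s\t%s\n' "$sample" "$n_var" "$n_missing"
    done
}

# Run the summary only when a directory is given on the command line.
if [ "$#" -ge 1 ]; then
    summarise_analysis "$1"
fi
```

The function only loops over directories, so logfile.txt in the analysis directory is skipped.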
The auxiliary scripts run by Launch_Covid19_ONT_Artic.sh are listed below. These scripts should not be called directly.
Covid19_ONT_Artic.R
Note: script run by Launch_Covid19_ONT_Artic.sh.
config_Covid19_ONT_Artic.R
Note: configuration script, must be modified before running Launch_Covid19_ONT_Artic.sh.
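Since three path variables in this file must be set by hand (PIPELINE_DIR, MINICONDA_DIR and BASECALLER_DIR), a quick way to review them before launching is to print the lines that assign them. The sketch below assumes each variable is assigned at the start of a line with <- or =, which may not match the actual layout of the file. The function name show_config_paths is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Covid19_ONT_Artic).
# Print, with line numbers, the lines of <config_file> that assign the three
# path variables named in this README (assumes 'NAME <- value' or 'NAME = value').
show_config_paths() {
    grep -nE '^[[:space:]]*(PIPELINE_DIR|MINICONDA_DIR|BASECALLER_DIR)[[:space:]]*(<-|=)' "$1"
}

# Run the check only when a configuration file is given on the command line.
if [ "$#" -ge 1 ]; then
    show_config_paths "$1"
fi
```

If fewer than three lines are printed, a variable is either missing or written in a form the pattern does not cover, so open the file and check it directly.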
subsample_fast5.sh
Note: script run by Covid19_ONT_Artic.R if do_subsampling_flag variable is set to 1 in config_Covid19_ONT_Artic.R.
Example data for sample SP1, the first case in Sao Paulo (Brazil), can be downloaded from here, and their analysis should result in the following coverage plot.
If this tool is useful for your work, please consider citing our manuscript.
Maestri S., Grosso V., Alfano M., Lavezzari D., Piubelli C., Bisoffi Z., Rossato M., Delledonne M. STArS (STrain-Amplicon-Seq), a targeted Nanopore sequencing workflow for SARS-CoV-2 diagnostics and genotyping. Biology Methods and Protocols, 2022, bpac020. https://doi.org/10.1093/biomethods/bpac020