This repository has been archived by the owner on Feb 19, 2024. It is now read-only.

Comparing changes

base repository: MaestSi/ONTrack
base: v1.4.0
head repository: MaestSi/ONTrack
compare: master

Commits on Sep 11, 2019

  1. Update MinION_mobile_lab.R (MaestSi, b130e48)
  2. Update version.txt (MaestSi, c7374ee)

Commits on Sep 12, 2019

  1. b11ede3

Commits on Sep 21, 2019

  1. 3bbd41e
  2. Update version.txt (MaestSi, 24031a4)

Commits on Sep 26, 2019

  1. Update README.md (MaestSi, ac7a7ef)

Commits on Oct 23, 2019

  1. Update install.sh (MaestSi, 2fa7be4)

Commits on Dec 18, 2019

  1. c41fcac
  2. 906186b
  3. Update README.md (MaestSi, 1247e4d)
  4. Update Sanger_check.sh (MaestSi, 5681c71)
  5. Update Sanger_check.sh (MaestSi, 3656ff1)
  6. fe47635
  7. 8af48a8
  8. Update MetatONTrack.sh (MaestSi, 7c68a78)
  9. Update README.md (MaestSi, d0f7edb)

Commits on Jan 8, 2020

  1. Update install.sh (MaestSi, 0af0bda)

Commits on Jan 11, 2020

  1. Update README.md (MaestSi, 7c6f9d3)
  2. ae3a2be
  3. Update install.sh (MaestSi, 53d56b5)
  4. ecb2263
  5. Update install.sh (MaestSi, c2c032b)
  6. 03d2933

Commits on Jan 23, 2020

  1. Update install.sh (MaestSi, dbd42bc)

Commits on Apr 5, 2020

  1. 8e2bebd
  2. Update README.md (MaestSi, bbfebed)

Commits on Apr 10, 2020

  1. Minor logging bugfix (MaestSi, e9ccbbf)

Commits on Jul 7, 2020

  1. Update README.md (MaestSi, 030ce8b)

Commits on Oct 21, 2020

  1. Update README.md (MaestSi, 00cfcd4)

Commits on Dec 9, 2020

  1. 2626dc0

Commits on Jan 8, 2021

  1. Update install.sh (MaestSi, c7ea07c)

Commits on Apr 17, 2021

  1. Create LICENSE (MaestSi, f4d345c)
  2. Merge pull request #5 from MaestSi/add-license-1: Create LICENSE (MaestSi, 453685a)

Commits on Apr 28, 2021

  1. Update ONTrack.R (MaestSi, 26c538d)

Commits on May 9, 2021

  1. Update README.md (MaestSi, 8e9a6fa)
  2. Add files via upload (MaestSi, c0ebfc8)

Commits on Jun 6, 2021

  1. Update decONT.sh (MaestSi, 7a4b826)

Commits on Jun 12, 2021

  1. Update MetatONTrack.sh (MaestSi, ca0b604)
  2. Update README.md (MaestSi, 98c1f2c)

Commits on Jun 26, 2021

  1. Add files via upload (MaestSi, 2cae4e7)

Commits on Sep 22, 2021

  1. Update README.md (MaestSi, 211d430)

Commits on Dec 9, 2021

  1. Update install.sh (MaestSi, dff58a8)
  2. Update install.sh (MaestSi, 2b18784)

Commits on Mar 25, 2022

  1. Added info for docker image (MaestSi, 8d1dba7)
  2. 24118cb

Commits on Mar 29, 2022

  1. Add Dockerfile (MaestSi, 880b727)
  2. Delete Dockerfile (MaestSi, 153526b)
  3. Add Dockerfile (MaestSi, b6c4350)

Commits on Apr 4, 2022

  1. Update README.md (MaestSi, 7adbbcd)

Commits on Apr 27, 2022

  1. 6a48d3b
Showing with 808 additions and 87 deletions.
  1. +2 −2 Calculate_error_rate.sh
  2. +2 −2 Calculate_mapping_rate.sh
  3. +41 −0 Dockerfile
  4. BIN Figures/ONTrack_pipeline_flowchart.png
  5. +674 −0 LICENSE
  6. +5 −5 MetatONTrack.sh
  7. +31 −27 MinION_mobile_lab.R
  8. +9 −3 ONTrack.R
  9. +27 −12 README.md
  10. +1 −1 Sanger_check.sh
  11. +5 −8 config_MinION_mobile_lab.R
  12. +3 −21 decONT.sh
  13. +7 −5 install.sh
  14. +1 −1 version.txt
4 changes: 2 additions & 2 deletions Calculate_error_rate.sh
@@ -19,8 +19,8 @@
READS=$1
REFERENCE=$2

-MINIMAP2=/path/to/minimap2
-SAMTOOLS=/path/to/samtools
+MINIMAP2=minimap2 #specify full path if you want to use a version of the program that is not in your PATH
+SAMTOOLS=samtools #specify full path if you want to use a version of the program that is not in your PATH

SAMPLE_NAME=$(echo $(basename $READS) | sed 's/\.fast.//g')
WORKING_DIR=$(dirname $(realpath $READS))
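For context, a typical invocation of the updated script might look like the sketch below; the file names are placeholders and ONTrack_env is the conda environment created by install.sh:

```
# Compute the error rate of reads against a reference (placeholder file names);
# minimap2 and samtools are picked up from PATH once ONTrack_env is active.
source activate ONTrack_env
./Calculate_error_rate.sh BC01.fastq reference.fasta
```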
4 changes: 2 additions & 2 deletions Calculate_mapping_rate.sh
@@ -20,8 +20,8 @@ READS=$1
DRAFT_READS=$2
CONTIG=$3

-MINIMAP2=/path/to/minimap2
-SAMTOOLS=/path/to/samtools
+MINIMAP2=minimap2 #specify full path if you want to use a version of the program that is not in your PATH
+SAMTOOLS=samtools #specify full path if you want to use a version of the program that is not in your PATH

wdir=$(realpath $(pwd))
reads_full=$wdir"/"$READS
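Likewise, a possible invocation of the updated script; all three file names are placeholders:

```
# Compute the mapping rate for a sample (placeholder file names).
source activate ONTrack_env
./Calculate_mapping_rate.sh BC01.fastq BC01_draft_reads.fasta BC01.contigs.fasta
```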
41 changes: 41 additions & 0 deletions Dockerfile
@@ -0,0 +1,41 @@
FROM continuumio/miniconda3

########### set variables
ENV DEBIAN_FRONTEND noninteractive

########## generate working directories
RUN mkdir /home/tools

######### dependencies
RUN apt-get update -qq \
&& apt-get install -y \
build-essential \
wget \
unzip \
bzip2 \
git \
libidn11* \
nano \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

############################################################ install ONTrack
WORKDIR /home/tools/

RUN git clone https://github.com/MaestSi/ONTrack.git
WORKDIR /home/tools/ONTrack
RUN chmod 755 *

RUN sed -i 's/PIPELINE_DIR <- .*/PIPELINE_DIR <- \"\/home\/tools\/ONTrack\/\"/' config_MinION_mobile_lab.R
RUN sed -i 's/MINICONDA_DIR <- .*/MINICONDA_DIR <- \"\/opt\/conda\/\"/' config_MinION_mobile_lab.R

RUN conda config --add channels bioconda && \
conda config --add channels anaconda && \
conda config --add channels r && \
conda config --add channels conda-forge
RUN conda create -n ONTrack_env -c bioconda bioconductor-biostrings
RUN conda install -n ONTrack_env python blast emboss vsearch seqtk mafft minimap2 samtools=1.15 nanopolish bedtools ncurses ont_vbz_hdf_plugin
ENV HDF5_PLUGIN_PATH=/opt/conda/envs/ONTrack_env/hdf5/lib/plugin
RUN /opt/conda/envs/ONTrack_env/bin/pip install pycoQC

WORKDIR /home/
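As a rough usage sketch, the image can be built locally or pulled with the tag referenced later in this diff's README changes (maestsi/ontrack:latest); the bind-mount path is a placeholder:

```
# Build the image from this Dockerfile, or pull the prebuilt one from Docker Hub.
docker build -t maestsi/ontrack:latest .
# Run it interactively with a host data directory mounted (placeholder path).
docker run -it -v /path/to/data:/data maestsi/ontrack:latest /bin/bash
```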
Binary file modified Figures/ONTrack_pipeline_flowchart.png
674 changes: 674 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions MetatONTrack.sh
@@ -29,8 +29,8 @@ FASTQ_READS_FULL=$(realpath $FASTQ_READS)
######################################################################
THREADS=8
DB=/path/to/NCBI_Blast-indexed_database #e.g. PRJNA33175 BioProject for Bacterial 16S
-BLASTN=/path/to/blastn
-SEQTK=/path/to/seqtk
+BLASTN=blastn
+SEQTK=seqtk
######################################################################

SAMPLE_NAME=$(echo $(basename $FASTQ_READS_FULL) | sed 's/\.fastq.*//')
@@ -39,13 +39,13 @@ OUTPUT_DIR=$WORKING_DIR"/MetatONTrack_output"
LOGS_DIR=$WORKING_DIR"/MetatONTrack_output_logs"

$SEQTK seq -A $FASTQ_READS_FULL > $WORKING_DIR"/"$SAMPLE_NAME".fasta"
-$BLASTN -db $DB -query $WORKING_DIR"/"$SAMPLE_NAME".fasta" -num_threads $THREADS -outfmt "6 qseqid evalue salltitles" -max_target_seqs 1 -perc_identity 0.77 -qcov_hsp_perc 0.3 \
+$BLASTN -db $DB -query $WORKING_DIR"/"$SAMPLE_NAME".fasta" -num_threads $THREADS -outfmt "6 qseqid evalue salltitles" -max_target_seqs 1 -perc_identity 0.85 -qcov_hsp_perc 0.8 \
> $WORKING_DIR"/"$SAMPLE_NAME"_top_Blast_hits_tmp.txt"
cat $WORKING_DIR"/"$SAMPLE_NAME"_top_Blast_hits_tmp.txt" | sort -u -k1,1 -s > $WORKING_DIR"/"$SAMPLE_NAME"_top_Blast_hits.txt"
rm $WORKING_DIR"/"$SAMPLE_NAME"_top_Blast_hits_tmp.txt"
cat $WORKING_DIR"/"$SAMPLE_NAME"_top_Blast_hits.txt" | cut -f3 | sort | uniq -c | sort -nr > $WORKING_DIR"/"$SAMPLE_NAME"_Blast_taxa_counts.txt"
-cat $WORKING_DIR"/"$SAMPLE_NAME"_top_Blast_hits.txt" | cut -f3 | cut -d" " -f2,3 | sort | uniq -c | sort -nr > $WORKING_DIR"/"$SAMPLE_NAME"_Blast_species_counts.txt"
-cat $WORKING_DIR"/"$SAMPLE_NAME"_top_Blast_hits.txt" | cut -f3 | cut -d" " -f2 | sort | uniq -c | sort -nr > $WORKING_DIR"/"$SAMPLE_NAME"_Blast_genera_counts.txt"
+cat $WORKING_DIR"/"$SAMPLE_NAME"_top_Blast_hits.txt" | cut -f3 | cut -d" " -f1,2 | sort | uniq -c | sort -nr > $WORKING_DIR"/"$SAMPLE_NAME"_Blast_species_counts.txt"
+cat $WORKING_DIR"/"$SAMPLE_NAME"_top_Blast_hits.txt" | cut -f3 | cut -d" " -f1 | sort | uniq -c | sort -nr > $WORKING_DIR"/"$SAMPLE_NAME"_Blast_genera_counts.txt"
cat $WORKING_DIR"/"$SAMPLE_NAME"_Blast_species_counts.txt" | awk -v var="$MIN_READS" '{if ($1 > var) {print $2" "$3}}' > $WORKING_DIR"/"$SAMPLE_NAME"_detected_species.txt"

mkdir $OUTPUT_DIR
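A possible invocation of the updated script, assuming ONTrack_env is active; the input file name and the minimum read count per species (here 50) are placeholders:

```
# Assign reads to taxa with blastn and report species above the read-count threshold.
source activate ONTrack_env
./MetatONTrack.sh metabarcoding_reads.fastq 50
```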
58 changes: 31 additions & 27 deletions MinION_mobile_lab.R
@@ -39,6 +39,10 @@ if (!exists("fast_basecalling_flag") || flowcell == "FLO-MIN107") {
fast_basecalling_flag <- 0
}

if (!exists("require_two_barcodes_flag")) {
require_two_barcodes_flag <- 0
}

if (!exists("amplicon_length")) {
amplicon_length <- 710
}
@@ -55,10 +59,6 @@ if (!exists("pair_strands_flag") || flowcell != "FLO-MIN107") {
pair_strands_flag <- 0
}

if (!exists("disable_porechop_demu_flag")) {
disable_porechop_demu_flag <- 0
}

if (do_subsampling_flag == 1) {
#d2 is the directory which is going to include processed reads
d2 <- paste0(dirname(d1_tmp), "/", basename(d1_tmp), "_", num_fast5_files, "_subsampled_fast5_files_analysis")
@@ -88,7 +88,7 @@ if (pair_strands_flag == 1) {

demultiplexer <- paste0(BASECALLER_DIR, "/guppy_barcoder")

-basecaller_version <- system(paste0(basecaller, " --version"), intern = TRUE)
+basecaller_version <- system(paste0(basecaller, " --version"), intern = TRUE)[1]

if (!dir.exists(d2)) {
dir.create(d2)
@@ -138,13 +138,12 @@ if (!dir.exists(d2)) {
cat(text = "Basecalling model: high-accuracy", file = logfile, sep = "\n", append = TRUE)
cat(text = "Basecalling model: high-accuracy", sep = "\n")
}
cat(text = paste0("Demultiplexing is going to be performed by guppy_barcoder after basecalling"), file = logfile, sep = ", ", append = TRUE)
cat(text = paste0("Demultiplexing is going to be performed by guppy_barcoder after basecalling"), sep = ", ")
if (disable_porechop_demu_flag == 1) {
cat(text = "\n", file = logfile, append = TRUE)
cat(text = "\n")
cat(text = paste0("The second round of demultiplexing by Porechop is going to be be skipped"), file = logfile, sep = ", ", append = TRUE)
cat(text = paste0("The second round of demultiplexing by Porechop is going to be be skipped"), sep = ", ")
if (require_two_barcodes_flag == 1) {
cat(text = paste0("Demultiplexing is going to be performed by guppy_barcoder after basecalling, keeping only reads with barcodes at both ends of the read"), file = logfile, sep = ", ", append = TRUE)
cat(text = paste0("Demultiplexing is going to be performed by guppy_barcoder after basecalling, keeping only reads with barcodes at both ends of the read"), sep = ", ")
} else {
cat(text = paste0("Demultiplexing is going to be performed by guppy_barcoder after basecalling"), file = logfile, sep = ", ", append = TRUE)
cat(text = paste0("Demultiplexing is going to be performed by guppy_barcoder after basecalling"), sep = ", ")
}
cat(text = "\n", file = logfile, append = TRUE)
cat(text = "\n")
@@ -167,9 +166,9 @@ cat(text = paste0("Basecalling started at ", date()), sep = "\n")

num_threads_caller <- round(num_threads/4)
if (fast_basecalling_flag == 1) {
-system(paste0(basecaller, " -r -i ", d1, " --cpu_threads_per_caller ", num_threads_caller, " --num_callers 4", " -c dna_r9.4.1_450bps_fast.cfg --hp_correct TRUE --fast5_out -s ", d2_basecalling, " --disable_pings"))
+system(paste0(basecaller, " -r -i ", d1, " --cpu_threads_per_caller ", num_threads_caller, " --num_callers 4", " -c dna_r9.4.1_450bps_fast.cfg --fast5_out -s ", d2_basecalling, " --disable_pings"))
} else {
-system(paste0(basecaller, " -r -i ", d1, " --cpu_threads_per_caller ", num_threads_caller, " --num_callers 4", " --flowcell ", flowcell, " --kit ", kit, " --hp_correct TRUE --fast5_out -s ", d2_basecalling, " --disable_pings"))
+system(paste0(basecaller, " -r -i ", d1, " --cpu_threads_per_caller ", num_threads_caller, " --num_callers 4", " --flowcell ", flowcell, " --kit ", kit, " --fast5_out -s ", d2_basecalling, " --disable_pings"))
}

if (pair_strands_flag == 1) {
@@ -184,7 +183,11 @@ cat(text = "\n")

cat(text = paste0("Demultiplexing started at ", date()), file = logfile, sep = "\n", append = TRUE)
cat(text = paste0("Demultiplexing started at ", date()), sep = "\n")
-system(paste0(demultiplexer, " -r -i ", d2_basecalling, " -t ", num_threads, " -s ", d2_preprocessing, " --barcode_kits \"", paste0(barcode_kits, collapse = " "), "\"", " --kit ", kit))
+if (require_two_barcodes_flag == 1) {
+system(paste0(demultiplexer, " -r -i ", d2_basecalling, " -t ", num_threads, " -s ", d2_preprocessing, " --enable_trim_barcodes --require_barcodes_both_ends --barcode_kits \"", paste0(barcode_kits, collapse = " "), "\""))
+} else {
+system(paste0(demultiplexer, " -r -i ", d2_basecalling, " -t ", num_threads, " -s ", d2_preprocessing, " --enable_trim_barcodes --barcode_kits \"", paste0(barcode_kits, collapse = " "), "\""))
+}
cat(text = paste0("Demultiplexing finished at ", date()), file = logfile, sep = "\n", append = TRUE)
cat(text = paste0("Demultiplexing finished at ", date()), sep = "\n")
cat(text = "\n", file = logfile, append = TRUE)
@@ -222,25 +225,17 @@ if (pair_strands_flag == 1) {
system(paste0(PYCOQC, " -f ", d2_basecalling, "/sequencing_summary.txt -b ", d2_preprocessing, "/barcoding_summary.txt -o ", d2, "/qc/pycoQC_report.html"))
}
demu_files <- list.files(path = d2_preprocessing, pattern = "BC", full.names = TRUE)

for (i in 1:length(demu_files)) {
BC_val_curr <- substr(x = basename(demu_files[i]), start = 3, stop = 4)
if (paste0("BC", BC_val_curr) %in% BC_int) {
cat(text = paste0("Now trimming adapters with Porechop for sample BC", BC_val_curr), file = logfile, sep = "\n", append = TRUE)
cat(text = paste0("Now trimming adapters with Porechop for sample BC", BC_val_curr), sep = "\n")
if (disable_porechop_demu_flag == 1) {
system(paste0("mkdir ", d2_preprocessing, "/BC", BC_val_curr, "_porechop_dir_tmp"))
system(paste0(PORECHOP, " -i ", d2_preprocessing, "/BC", BC_val_curr, "_tmp1.fastq -o ", d2_preprocessing, "/BC", BC_val_curr, "_porechop_dir_tmp/BC", BC_val_curr, ".fastq --extra_end_trim ", primers_length))
} else {
system(paste0(PORECHOP, " -i ", d2_preprocessing, "/BC", BC_val_curr, "_tmp1.fastq -b ", d2_preprocessing, "/BC", BC_val_curr, "_porechop_dir_tmp --require_two_barcodes"))
}
fastq_file_curr <- list.files(path = paste0(d2_preprocessing, "/BC", BC_val_curr, "_porechop_dir_tmp"), pattern = paste0("BC", BC_val_curr, "\\.fastq"), full.names = TRUE)
fastq_file_curr <- list.files(path = paste0(d2_preprocessing), pattern = paste0("BC", BC_val_curr, "_tmp1\\.fastq"), full.names = TRUE)
if (length(fastq_file_curr) == 0) {
BC_int <- setdiff(BC_int, paste0("BC", BC_val_curr))
BC_trash <- c(BC_trash, paste0("BC", BC_val_curr))
next
}
system(paste0("cp ", d2_preprocessing, "/BC", BC_val_curr, "_porechop_dir_tmp/BC", BC_val_curr, ".fastq ", d2_preprocessing, "/BC", BC_val_curr, "_tmp2.fastq"))
system(paste0(SEQTK, " seq -A ", d2_preprocessing, "/BC", BC_val_curr, "_tmp2.fastq > ", d2_preprocessing, "/BC", BC_val_curr, "_tmp1.fasta"))
system(paste0(SEQTK, " seq -A ", d2_preprocessing, "/BC", BC_val_curr, "_tmp1.fastq > ", d2_preprocessing, "/BC", BC_val_curr, "_tmp1.fasta"))
sequences <- readDNAStringSet(paste0(d2_preprocessing, "/BC", BC_val_curr, "_tmp1.fasta"), "fasta")
ws <- width(sequences)
read_length <- ws
@@ -257,8 +252,17 @@ for (i in 1:length(demu_files)) {
read_length_ok <- ws_ok
cat(text = paste0("Now filtering out reads shorter than ", sprintf("%.0f", lb), " and longer than ", sprintf("%.0f", ub), " bp for sample BC", BC_val_curr), file = logfile, sep = "\n", append = TRUE)
cat(text = paste0("Now filtering out reads shorter than ", sprintf("%.0f", lb), " and longer than ", sprintf("%.0f", ub), " bp for sample BC", BC_val_curr), sep = "\n")
system(paste0("cat ", d2_preprocessing, "/BC", BC_val_curr, "_tmp2.fastq | ", remove_long_short, " ", lb, " ", ub, " > ", d3, "/BC", BC_val_curr, ".fastq"))
system(paste0("cat ", d2_preprocessing, "/BC", BC_val_curr, "_tmp1.fastq | ", remove_long_short, " ", lb, " ", ub, " > ", d3, "/BC", BC_val_curr, ".fastq"))
system(paste0(SEQTK, " seq -A ", d3, "/BC", BC_val_curr, ".fastq > ", d3, "/BC", BC_val_curr, ".fasta"))
+if (length(grep(x = readLines(paste0( d3, "/BC", BC_val_curr, ".fasta")), pattern = "^>")) < 2) {
+cat(text = paste0("WARNING: skipping sample BC", BC_val_curr, ", since no reads survived the length filtering!"), file = logfile, sep = "\n", append = TRUE)
+cat(text = paste0("WARNING: skipping sample BC", BC_val_curr, ", since no reads survived the length filtering!"), sep = "\n")
+cat(text = "\n", file = logfile, append = TRUE)
+cat(text = "\n")
+system(paste0("rm ", d3, "/BC", BC_val_curr, ".fastq"))
+system(paste0("rm ", d3, "/BC", BC_val_curr, ".fasta"))
+next
+}
cat(text = paste0("Mean read length for sample BC", BC_val_curr, " after filtering: ", sprintf("%.0f", mean(ws_ok)), " (", sprintf("%.0f", sd(ws_ok)), ")"), file = logfile, sep = "\n", append = TRUE)
cat(text = paste0("Mean read length for sample BC", BC_val_curr, " after filtering: ", sprintf("%.0f", mean(ws_ok)), " (", sprintf("%.0f", sd(ws_ok)), ")"), sep = "\n")
png(paste0(d2, "/qc/hist_BC", BC_val_curr, "_unfiltered.png"))
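For reference, the two guppy_barcoder command shapes produced by the new require_two_barcodes_flag branch look roughly as follows; the directories, thread count and barcode kit are placeholders for the values the script assembles at runtime:

```
# require_two_barcodes_flag == 1: keep only reads with a barcode at both ends
guppy_barcoder -r -i basecalling_dir -t 8 -s preprocessing_dir \
  --enable_trim_barcodes --require_barcodes_both_ends --barcode_kits "EXP-NBD104"
# require_two_barcodes_flag == 0: trim barcodes, keep all demultiplexed reads
guppy_barcoder -r -i basecalling_dir -t 8 -s preprocessing_dir \
  --enable_trim_barcodes --barcode_kits "EXP-NBD104"
```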
12 changes: 9 additions & 3 deletions ONTrack.R
@@ -123,7 +123,13 @@ for (i in 1:length(fasta_files)) {
num_reads_mac <- as.double(system(paste0("cat ", decont_fasta, " | grep \"^>\" | wc -l"), intern=TRUE))
target_reads_contig <- 200
target_reads_polishing <- 200


+if (num_reads_mac < 3) {
+cat(text = paste0("WARNING: Only ", num_reads_mac, " reads available for sample ", sample_name, "; skipping"), sep = "\n")
+cat(text = paste0("WARNING: Only ", num_reads_mac, " reads available for sample ", sample_name, "; skipping"), file = logfile, sep = "\n", append = TRUE)
+next
+}

if (num_reads_mac < target_reads_contig) {
target_reads_contig <- num_reads_mac
target_reads_polishing <- num_reads_mac
@@ -248,13 +254,13 @@ for (i in 1:length(fasta_files)) {
if (do_blast_flag ==1 ) {
cat(text = paste0("BLASTing consensus for sample ", sample_name, " and saving results to ", basename(blast_results)), sep = "\n")
if (ONtoBAR_compatibility != 1) {
system(paste0(BLASTN, " -db ", NTDB, " -query ", final_contig, " > ", blast_results))
system(paste0(BLASTN, " -num_threads ", num_threads, " -db ", NTDB, " -query ", final_contig, " > ", blast_results))
} else {
blast_results_alt_sort <- paste0(sample_dir, "/", sample_name, "_sorted_by_perc_id.blastn.txt")
blast_results_raw <- paste0(sample_dir, "/", sample_name, ".blastn_output.txt")
system(paste0(BLASTN, " -db ", NTDB, " -query ", final_contig, " > ", blast_results_raw))
cat(text = paste0("BLASTing consensus for sample ", sample_name, " and saving results to ", basename(blast_results)), sep = "\n")
system(paste0(BLASTN, " -db ", NTDB, " -query ", final_contig, " > ", blast_results_raw))
system(paste0(BLASTN, " -num_threads ", num_threads, " -db ", NTDB, " -query ", final_contig, " > ", blast_results_raw))
input_file <- file(blast_results_raw, open = "r")
input <- readLines(input_file)
close(input_file)
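Following the usage shown in the README (Rscript ONTrack.R <home_dir> <fast5_dir> <sequencing_summary.txt>), a possible standalone invocation of the updated script is sketched below; all paths are placeholders and the last two arguments are optional:

```
# Standalone run sketch (placeholder paths); blastn now uses num_threads threads.
source activate ONTrack_env
Rscript ONTrack.R /path/to/analysis /path/to/fast5 /path/to/sequencing_summary.txt
```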
39 changes: 27 additions & 12 deletions README.md
@@ -1,3 +1,4 @@
+*ONTrack is no longer mantained. Please use [ONTrack2](https://github.com/MaestSi/ONTrack2) pipeline instead.*

# ONTrack

@@ -23,28 +24,31 @@ chmod 755 Miniconda3-latest-Linux-x86_64.sh

Then, after completing _ONTrack_ installation, set the _MINICONDA_DIR_ variable in **config_MinION_mobile_lab.R** to the full path to miniconda3 directory.

-* Guppy, the software for basecalling and demultiplexing provided by ONT. Tested with Guppy v2.3 (ONTrack-v1.0), Guppy v3.0 (ONTrack-v1.1) and Guppy v3.1.
+* Guppy, the software for basecalling and demultiplexing provided by ONT. Tested with Guppy v5.0.
If you don't have [Guppy](https://community.nanoporetech.com/downloads) installed, choose an appropriate version and install it.
For example, you could download and unpack the archive with:
```
-wget https://mirror.oxfordnanoportal.com/software/analysis/ont-guppy-cpu_version_of_interest.tar.gz
+wget /path/to/ont-guppy-cpu_version_of_interest.tar.gz
tar -xf ont-guppy-cpu_version_of_interest.tar.gz
```
A directory _ont-guppy-cpu_ should have been created in your current directory.
Then, after completing _ONTrack_ installation, set the _BASECALLER_DIR_ variable in **config_MinION_mobile_lab.R** to the full path to _ont-guppy-cpu/bin_ directory.

* NCBI nt database (optional, in case you want to perform a local Blast analysis of your consensus sequences).

-For downloading the database (~65 GB):
+For downloading the database (~210 GB):

```
mkdir NCBI_nt_db
cd NCBI_nt_db
echo `date +%Y-%m-%d` > download_date.txt
wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt*
-targz_files=$(find . | grep \.tar\.gz$ | sed 's/\.\///g')
-for f in $targz_files; do tar -xzvf $f; done
-rm $targz_files
+targz_files=$(find . | grep "\\.tar\\.gz$")
+for f in $targz_files; do
+tar -xzvf $f;
+rm $f;
+rm $f".md5";
+done
```

Then, after completing the _ONTrack_ installation, set the _NTDB_ variable in **config_MinION_mobile_lab.R** to the full path to NCBI_nt_db/nt
@@ -58,7 +62,13 @@ chmod 755 *
./install.sh
```

-A conda environment named _ONTrack_env_ is created, where blast, emboss, vsearch, seqtk, mafft, porechop, minimap2, samtools, nanopolish, bedtools, pycoQC and R with package Biostrings are installed.
+Otherwise, you can download a docker image with:

+```
+docker pull maestsi/ontrack:latest
+```

+A conda environment named _ONTrack_env_ is created, where blast, emboss, vsearch, seqtk, mafft, minimap2, samtools, nanopolish, bedtools, pycoQC and R with package Biostrings are installed.
Then, you can open the **config_MinION_mobile_lab.R** file with a text editor and set the variables _PIPELINE_DIR_ and _MINICONDA_DIR_ to the value suggested by the installation step.

## Overview
@@ -81,7 +91,7 @@ Usage: Rscript ONTrack.R \<home_dir\> \<fast5_dir\> \<sequencing_summary.txt\>
Note: Activate the virtual environment with ```source activate ONTrack_env``` before running. The script is run by **MinION_mobile_lab.R**, but can be also run as a main script if you have already basecalled and demultiplexed your sequences. If less than 200 reads are available after contaminants removal, a warning message is printed out, but still a consensus sequence is produced.

Inputs:
-* \<home_dir\>: directory containing fastq and fasta files for each sample
+* \<home_dir\>: directory containing fastq and fasta files for each sample named BC\<numbers\>.fast*
* \<fast5_dir\>: directory containing raw fast5 files for nanopolish polishing, optional
* \<sequencing_summary.txt\>: sequencing summary file generated during base-calling, used to speed-up polishing, optional

@@ -144,7 +154,7 @@ Note: script run by _ONTrack.R_ for clustering reads at 70% identity and keeping

Usage: Sanger_check.sh \<consensus dir\> \<sanger dir\>

-Note: set _BLASTN_ variable to blastn executable inside the script; sample name should contain the sample id (e.g. BC01)
+Note: Activate the virtual environment with ```source activate ONTrack_env``` before running; sample name should contain the sample id (e.g. BC01)

Inputs:
* \<consensus dir\>: directory containing files "sample_name".contigs.fasta obtained with the _ONTrack_ pipeline
@@ -158,7 +168,7 @@ Output (saved in \<contigs dir\>):

Usage: Calculate_mapping_rate.sh \<reads\> \<draft reads\> \<consensus sequence\>

-Note: set _MINIMAP2_ and _SAMTOOLS_ variables to minimap2 and samtools executables inside the script
+Note: Activate the virtual environment with ```source activate ONTrack_env``` before running.

Inputs:
* \<reads\>: MinION reads in fastq or fasta format
@@ -172,7 +182,7 @@ Output (saved in current directory):

Usage: Calculate_error_rate.sh \<reads\> \<reference\>

-Note: set _MINIMAP2_ and _SAMTOOLS_ variables to minimap2 and samtools executables inside the script
+Note: Activate the virtual environment with ```source activate ONTrack_env``` before running.

Inputs:
* \<reads\>: MinION reads in fastq or fasta format
@@ -215,7 +225,8 @@ The **MetatONTrack.sh** script reproduces what the EPI2ME 16S workflow does, bla

Usage: MetatONTrack.sh \<fastq reads\> \<min num reads\>

-Note: set _BLASTN_, _SEQTK_ and _DB_ variables to blastn, seqtk executables and to an NCBI Blast-indexed database respectively inside the script; if using NCBI nt database, change _cut -d" " -f2,3_ (line 47) and _cut -d" " -f2_ (line 48) to _cut -d" " -f1,2_ and _cut -d" " -f1_ respectively
+Note: Activate the virtual environment with ```source activate ONTrack_env``` before running.
+Set _DB_ variable to an NCBI Blast-indexed database inside the script.

Inputs:
* \<fastq reads\>: MinION fastq reads from a meta-barcoding experiment
@@ -231,6 +242,10 @@ If this tool is useful for your work, please consider citing our [manuscript](ht

Maestri S, Cosentino E, Paterno M, Freitag H, Garces JM, Marcolungo L, Alfano M, Njunjić I, Schilthuizen M, Slik F, Menegon M, Rossato M, Delledonne M. A Rapid and Accurate MinION-Based Workflow for Tracking Species Biodiversity in the Field. Genes. 2019; 10(6):468.

+For further information and insights into pipeline development, please have a look at my [doctoral thesis](https://iris.univr.it/retrieve/handle/11562/1042782/205364/PhD_thesis_Simone_Maestri.pdf).

+Maestri, S (2021). Development of novel bioinformatic pipelines for MinION-based DNA barcoding (Doctoral thesis, Università degli Studi di Verona, Verona, Italy). Retrieved from https://iris.univr.it/retrieve/handle/11562/1042782/.

## Side notes

As a real-life _Pokédex_, the workflow described in our [manuscript](https://www.mdpi.com/2073-4425/10/6/468) will facilitate tracking biodiversity in remote and biodiversity-rich areas. For instance, during a [Taxon Expedition](https://taxonexpeditions.com/) to Borneo, our analysis confirmed the novelty of a [beetle](https://www.theguardian.com/science/2018/apr/30/new-beetle-species-named-after-leonardo-dicaprio) species named after Leonardo DiCaprio.
2 changes: 1 addition & 1 deletion Sanger_check.sh
@@ -21,7 +21,7 @@
CONTIGS_DIR=$1
SANGER_DIR=$2

-BLASTN=/path/to/blastn
+BLASTN=blastn #specify full path if you want to use a version of the program that is not in your PATH

contigs_files=$(find $CONTIGS_DIR -maxdepth 1 | grep "\\.contigs\\.fasta")
sanger_files=$(find $SANGER_DIR -maxdepth 1 | grep "reference.*\\.fasta")
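A possible invocation of the updated script, with blastn resolved from the active conda environment; both directory paths are placeholders:

```
# Compare ONTrack consensus sequences with Sanger references (placeholder paths).
source activate ONTrack_env
./Sanger_check.sh /path/to/consensus_dir /path/to/sanger_dir
```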
13 changes: 5 additions & 8 deletions config_MinION_mobile_lab.R
@@ -21,7 +21,7 @@
#if do_subsampling_flag <- 1, subsampling of num_fast5_files fast5 files is performed; otherwise set do_subsampling_flag <- 0
do_subsampling_flag <- 0
#num_fast5_files is the number of fast5 files to be subsampled/analysed (if do_subsampling_flag <- 1)
-num_fast5_files <- 25
+num_fast5_files <- 50000
#BC_int are the barcodes used in the experiment
#BC_int <- c("BC01", "BC02", "BC03", "BC04", "BC05", "BC06", "BC07", "BC08", "BC09", "BC10", "BC11", "BC12")
BC_int <- c("BC01", "BC02", "BC03", "BC04", "BC05", "BC06", "BC07", "BC08", "BC09", "BC10", "BC11", "BC12")
@@ -34,22 +34,22 @@ kit <- "SQK-LSK109"
flowcell <- "FLO-MIN106"
#fast_basecalling_flag <- 1 if you want to use the fast basecalling algorithm; otherwise set fast_basecalling_flag <- 0 if you want to use the accurate but slow one (FLO-MIN106 only)
fast_basecalling_flag <- 1
+#require_two_barcodes_flag <- 1 if you want to keep only reads with a barcode (tag) at both ends of the read; otherwise set require_two_barcodes_flag <- 0
+require_two_barcodes_flag <- 0
#pair_strands_flag <- 1 if, in case a 1d2 kit and FLO-MIN107 flow-cell have been used, you want to perform 1d2 basecalling; otherwise set pair_strands_flag <- 0
pair_strands_flag <- 0
#save_space_flag <- 1 if you want temporary files to be automatically deleted; otherwise set save_space_flag <- 0
save_space_flag <- 0
#set the maximum number of threads to be used
num_threads <- 30
#set a mean amplicon length [bp]
-amplicon_length <- 700
+amplicon_length <- 710
#fixed_lenfil_flag <- 1 if you want to keep reads in the range [amplicon_length - lenfil_tol/2; amplicon_length + lenfil_tol/2]; otherwise set fixed_lenfil_flag <- 1 if you want to keep reads in the range [mean_length -2*sd; mean_length + 2*sd] where mean_length and sd are evaluated on a sample basis
fixed_lenfil_flag <- 0
#if fixed_lenfil_flag <- 1, lenfil_tol [bp] is the size of the window centered in amplicon_length for reads to be kept
lenfil_tol <- 300
#set primers length [bp]
primers_length <- 25
-#if disable_porechop_demu_flag <- 1 porechop is only used for adapters trimming and not for doing a second round of demultiplexing; otherwise set disable_porechop_demu_flag <- 0
-disable_porechop_demu_flag <- 0
#do_blast_flag <- 1 if you want to perform blast analysis of consensus sequences; otherwise set do_blast_flag <- 0
do_blast_flag <- 1
#do_clustering_flag <- 1 if you want to perform preliminary clustering for getting rid of contaminants; otherwise set do_clustering_flag <- 0
@@ -66,7 +66,7 @@ MINICONDA_DIR <- "/path/to/miniconda3"
BASECALLER_DIR <- "/path/to/ont-guppy-cpu/bin/"
#NCBI nt database
NTDB <- "/path/to/NCBI_nt_db/nt"
########################################################################################################
############ End of user editable region ###############################################################
#load BioStrings package
suppressMessages(library(Biostrings))
#path to ONTrack.R
@@ -77,7 +77,6 @@ DECONT <- paste0(PIPELINE_DIR, "/decONT.sh")
remove_long_short <- paste0(PIPELINE_DIR, "/remove_long_short.pl")
#path to subsample fast5
subsample_fast5 <- paste0(PIPELINE_DIR, "/subsample_fast5.sh")
#########################################################################################################
#MAFFT
MAFFT <- paste0(MINICONDA_DIR, "/envs/ONTrack_env/bin/mafft")
#VSEARCH
@@ -94,7 +93,5 @@ SEQTK <- paste0(MINICONDA_DIR, "/envs/ONTrack_env/bin/seqtk")
MINIMAP2 <- paste0(MINICONDA_DIR, "/envs/ONTrack_env/bin/minimap2")
#SAMTOOLS
SAMTOOLS <- paste0(MINICONDA_DIR, "/envs/ONTrack_env/bin/samtools")
-#PORECHOP
-PORECHOP <- paste0(MINICONDA_DIR, "/envs/ONTrack_env/bin/porechop")
#PYCOQC
PYCOQC <- paste0(MINICONDA_DIR, "/envs/ONTrack_env/bin/pycoQC")
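One possible way to point these two variables at a local installation is to reuse the sed commands from the Dockerfile above; the paths below are placeholders:

```
# Set PIPELINE_DIR and MINICONDA_DIR in the config (placeholder paths),
# mirroring the sed lines used in the Dockerfile.
sed -i 's|PIPELINE_DIR <- .*|PIPELINE_DIR <- "/home/user/ONTrack/"|' config_MinION_mobile_lab.R
sed -i 's|MINICONDA_DIR <- .*|MINICONDA_DIR <- "/home/user/miniconda3/"|' config_MinION_mobile_lab.R
```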
24 changes: 3 additions & 21 deletions decONT.sh
@@ -48,24 +48,6 @@ mkdir $tmp_dir
mv $wdir"/"$sample_id"_ids_mac.txt" $tmp_dir
mv $wdir"/consensus_"$sample_id".fasta" $tmp_dir

-clusters_0=$(find $wdir -maxdepth 1 -mindepth 1 | grep $prefix_bn"0")
-clusters_1=$(find $wdir -maxdepth 1 -mindepth 1 | grep $prefix_bn"1")
-clusters_2=$(find $wdir -maxdepth 1 -mindepth 1 | grep $prefix_bn"2")
-clusters_3=$(find $wdir -maxdepth 1 -mindepth 1 | grep $prefix_bn"3")
-clusters_4=$(find $wdir -maxdepth 1 -mindepth 1 | grep $prefix_bn"4")
-clusters_5=$(find $wdir -maxdepth 1 -mindepth 1 | grep $prefix_bn"5")
-clusters_6=$(find $wdir -maxdepth 1 -mindepth 1 | grep $prefix_bn"6")
-clusters_7=$(find $wdir -maxdepth 1 -mindepth 1 | grep $prefix_bn"7")
-clusters_8=$(find $wdir -maxdepth 1 -mindepth 1 | grep $prefix_bn"8")
-clusters_9=$(find $wdir -maxdepth 1 -mindepth 1 | grep $prefix_bn"9")

-mv $clusters_0 $tmp_dir
-mv $clusters_1 $tmp_dir
-mv $clusters_2 $tmp_dir
-mv $clusters_3 $tmp_dir
-mv $clusters_4 $tmp_dir
-mv $clusters_5 $tmp_dir
-mv $clusters_6 $tmp_dir
-mv $clusters_7 $tmp_dir
-mv $clusters_8 $tmp_dir
-mv $clusters_9 $tmp_dir
+for cluster in $(find $wdir -maxdepth 1 -mindepth 1 | grep $prefix_bn); do
+mv $cluster $tmp_dir;
+done
12 changes: 7 additions & 5 deletions install.sh
@@ -21,13 +21,15 @@
PIPELINE_DIR=$(realpath $( dirname "${BASH_SOURCE[0]}" ))
MINICONDA_DIR=$(which conda | sed 's/bin.*$//')
conda config --add channels bioconda
-conda config --add channels conda-forge
conda config --add channels anaconda
conda config --add channels r
-conda create -n ONTrack_env python=3.6 blast emboss vsearch seqtk mafft porechop minimap2 samtools nanopolish bedtools r bioconductor-biostrings
+conda config --add channels conda-forge
+conda create -n ONTrack_env -c bioconda bioconductor-biostrings
+conda install -n ONTrack_env python blast emboss vsearch seqtk mafft minimap2 samtools=1.15 nanopolish bedtools ncurses
source activate ONTrack_env
pip install pycoQC
echo -e "\n"
echo "Modify variables PIPELINE_DIR and MINICONDA_DIR in config_MinION_mobile_lab.R"
echo -e "PIPELINE_DIR: $PIPELINE_DIR"
echo -e "MINICONDA_DIR: $MINICONDA_DIR \n"

echo -e "PIPELINE_DIR <- \"$PIPELINE_DIR\""
echo -e "MINICONDA_DIR <- \"$MINICONDA_DIR\""
echo -e "\n"
2 changes: 1 addition & 1 deletion version.txt
@@ -1 +1 @@
-ONTrack-v1.4.0
+ONTrack-v1.5.0