ONToHap

ONToHap is an ONT-based pipeline for performing haplotype phasing and for evaluating haplotype phasing accuracy on amplicon data. It currently supports the aligners BWA and Minimap2 and the phasers WhatsHap, Hapchat and HapCUT2.

Getting started

Prerequisites

Miniconda3 (tested with conda 4.8.13). Running which conda should return the path to the executable. If you don't have Miniconda3 installed, you can download and install it with:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod 755 Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh

Installation

git clone https://github.com/MaestSi/ONToHap.git
cd ONToHap
chmod 755 *
./install.sh

The installer creates a conda environment named ONToHap_env, in which seqtk, minimap2, bwa, whatshap, hapcut2, samtools and R are installed. Afterwards, open the config_ONToHap.R file with a text editor and set the variables PIPELINE_DIR and SEQTK to the values suggested by the installation step.
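As a quick sanity check after installation, a small sketch like the one below can report where each tool resolves on PATH once the environment is active. The function name report_tools is hypothetical and not part of ONToHap, and the executable names are assumed to match the package names (Rscript for R). The hapcut2 package is left out because its executable names are not stated here. The path printed for seqtk may help when filling in SEQTK, but the value suggested by install.sh should take precedence.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ONToHap): print where each named
# executable resolves on PATH, or flag it as missing.
# Returns non-zero if at least one tool is missing.
report_tools() {
  missing=0
  for tool in "$@"; do
    if tool_path=$(command -v "$tool" 2>/dev/null); then
      printf '%s: %s\n' "$tool" "$tool_path"
    else
      printf '%s: NOT FOUND\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example (run inside the activated environment; tool names are assumptions):
#   conda activate ONToHap_env
#   report_tools seqtk minimap2 bwa whatshap samtools Rscript
```

A non-zero return status means that at least one of the listed tools could not be found, which usually indicates that the environment is not active.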

Accuracy test overview

Usage

The ONToHap pipeline can be used either to phase variants stored in a VCF file using ONT long reads or to evaluate the accuracy of variant phasing tools, comparing the obtained results with a ground-truth phase. In both cases, the first step of the pipeline requires you to open the config_ONToHap.R file with a text editor and to modify it according to your preferences.

Note: since the pipeline is meant to work with reads generated from amplicons, all phased variants are assumed to belong to the same haplo-block.

Launch_ONToHap.sh

Usage: Launch_ONToHap.sh -f <fastq_reads> -u <unphased_vcf> -r <reference_fasta> -o <output_dir>

Inputs:

<fastq_reads>: fastq file containing ONT reads for one sample
<unphased_vcf>: VCF file storing variants to be phased
<reference_fasta>: fasta file containing the sequence corresponding to the region under study
<output_dir>: output directory where results are going to be stored
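The inputs above can be checked before launching the pipeline. The following is a minimal sketch, not part of ONToHap: the function name build_ontohap_cmd is hypothetical, and it assumes the launcher is run from the repository directory as ./Launch_ONToHap.sh. Only the -f, -u, -r and -o flags come from the usage line. The function prints the command instead of running it, so that it can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of ONToHap): verify that the three input
# files exist and are non-empty, then print the Launch_ONToHap.sh command.
build_ontohap_cmd() {
  fastq="$1"; vcf="$2"; ref="$3"; outdir="$4"
  for f in "$fastq" "$vcf" "$ref"; do
    if [ ! -s "$f" ]; then
      echo "Missing or empty input: $f" >&2
      return 1
    fi
  done
  # Assumption: the launcher is invoked from the repository directory.
  printf './Launch_ONToHap.sh -f %s -u %s -r %s -o %s\n' \
    "$fastq" "$vcf" "$ref" "$outdir"
}

# Example with placeholder file names:
#   build_ontohap_cmd sample.fastq sample_unphased.vcf amplicon.fasta results
```

Printing rather than executing keeps the sketch free of side effects. Paths containing spaces would need quoting before the printed command is pasted into a shell.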

Outputs:

<sample_name>_ONToHap_results: folder containing the file consensus_haplotype.vcf, which stores the phased variants, and subfolders with the reads subsampled at each iteration and the corresponding VCF files
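To get a quick impression of a phased VCF such as consensus_haplotype.vcf, the genotypes can be tallied by separator. This is a sketch under stated assumptions: it relies on the general VCF convention that phased genotypes use "|" and unphased ones use "/", that the file holds a single sample in column 10, and that GT is the first FORMAT key. Whether the ONToHap output follows exactly this layout has not been verified here, and the function name count_phased is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count phased vs. unphased genotypes in a
# single-sample VCF. Assumes GT is the first key of the FORMAT column.
count_phased() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      split($10, fields, ":")          # sample column, GT comes first
      if (index(fields[1], "|") > 0) phased++
      else unphased++
    }
    END { printf "phased=%d unphased=%d\n", phased + 0, unphased + 0 }
  ' "$1"
}

# Example with a placeholder path:
#   count_phased results/sample_ONToHap_results/consensus_haplotype.vcf
```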

Launch_ONToHap_accuracy_test.sh

Usage: Launch_ONToHap_accuracy_test.sh -f <fastq_reads> -u <unphased_vcf> -p <ground_truth_phased_vcf> -r <reference_fasta> -o <output_dir>

Inputs:

<fastq_reads>: fastq file containing ONT reads for one sample
<unphased_vcf>: VCF file storing variants to be phased
<ground_truth_phased_vcf>: VCF file storing ground-truth phased variants, used for evaluating accuracy of ONT-based phasing
<reference_fasta>: fasta file containing the sequence corresponding to the region under study
<output_dir>: output directory where results are going to be stored

Outputs:

<sample_name>_ONToHap_results: folder containing the file Report_<aligner>_<phaser>_<num_reads>_reads_<num_iterations>_iterations.txt, which stores the phasing accuracy, and subfolders with the reads subsampled at each iteration and the corresponding VCF files
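When several accuracy tests have been run with different settings, the report file name itself encodes the configuration. The sketch below splits a name of the form Report_<aligner>_<phaser>_<num_reads>_reads_<num_iterations>_iterations.txt into its fields. The function name parse_report_name is hypothetical, and the pattern assumes that aligner and phaser names contain no underscores, which holds for the tools listed in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract aligner, phaser, read count and iteration
# count from an ONToHap accuracy report file name.
parse_report_name() {
  base=$(basename "$1" .txt)
  parsed=$(printf '%s\n' "$base" | sed -n -E \
    's/^Report_([^_]+)_([^_]+)_([0-9]+)_reads_([0-9]+)_iterations$/aligner=\1 phaser=\2 num_reads=\3 num_iterations=\4/p')
  if [ -z "$parsed" ]; then
    echo "Unrecognized report name: $1" >&2
    return 1
  fi
  printf '%s\n' "$parsed"
}

# Example with placeholder values:
#   parse_report_name Report_Minimap2_WhatsHap_100_reads_50_iterations.txt
```

The key=value output is easy to feed into a loop over many report files, for example when collecting results before plotting.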

Plotting results

After running Launch_ONToHap_accuracy_test.sh, you may want to plot the phasing accuracy results. The Plot_phasing_accuracy_tests.R script can serve as a starting point. Such a plot can show how the accuracy of reconstructing the full haplotype varies with the number of input reads.

To see which variants are the most problematic to phase, you may also want to plot the phasing accuracy split by variant.

Citation

If this tool is useful for your work, please consider citing our manuscript.

Maestri, S.; Maturo, M.G.; Cosentino, E.; Marcolungo, L.; Iadarola, B.; Fortunati, E.; Rossato, M.; Delledonne, M. A Long-Read Sequencing Approach for Direct Haplotype Phasing in Clinical Settings. Int. J. Mol. Sci. 2020, 21, 9177.
