MosaicViewer_HTT

MosaicViewer_HTT is a pipeline for the schematic visualization of alleles with somatic mosaicism. Because of mosaicism, long sequencing reads cannot be collapsed into an accurate consensus sequence, so repeat annotation can only be performed on each read individually. MosaicViewer_HTT integrates a tool for repeat annotation of noisy long reads, aligns reads to the left and right flanking regions, and generates "simplified" reads that make alternative motifs easier to identify in IGV. The pipeline has only been used on HTT alleles, but it can be extended to other loci with minor modifications.

Getting started

Prerequisites

Miniconda3. Tested with conda 4.10.3. The command which conda should return the path to the executable. If Miniconda3 is not installed, you can download and install it with:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod 755 Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh

A fastq file containing reads from one sample.
A fasta file containing the reference sequence (e.g. hg38).
Coordinates of the regions flanking the repeat in the reference sequence.
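Before installing, it can help to confirm that the input files are actually in place. The sketch below is not part of the pipeline: check_inputs is a hypothetical helper, and the file names in the example are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MosaicViewer_HTT):
# report every argument that is not an existing, non-empty file.
check_inputs() {
  local f status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "Missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Example with placeholder file names:
#   check_inputs sample1.fastq hg38.fasta
```

The function returns a non-zero status if any file is missing, so it can gate a later step with &&.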

Installation

git clone https://github.com/MaestSi/MosaicViewer_HTT.git
cd MosaicViewer_HTT
chmod 755 *
./install.sh

The installation creates a conda environment named MosaicViewer_env, containing seqtk, minimap2, samtools, NoiseCancellingRepeatFinder, BBMap and R with the Biostrings package, and a second conda environment named NanoFilt_env, containing NanoFilt. Afterwards, open the config_MosaicViewer.sh file with a text editor and set the variables PIPELINE_DIR and MINICONDA_DIR to the values suggested by the installation step.
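One way to confirm that the installation worked is to check that the expected executables are on the PATH once the environment is active. This is a minimal sketch rather than part of install.sh: missing_tools is a hypothetical helper, and the executable names in the example (in particular Rscript for R) are assumptions about how the conda packages expose their binaries.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of install.sh):
# print each tool that is not found on PATH; return non-zero if any is missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example (run inside the environment created by install.sh):
#   conda activate MosaicViewer_env
#   missing_tools seqtk minimap2 samtools Rscript
```

If nothing is printed, all the listed tools were found.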

Usage

As a first step, open the config_MosaicViewer.sh file with a text editor and set all the variables. The in-silico PCR primers and the flanking regions used for left or right alignment are extracted according to the reference coordinates set in this file.

MosaicViewer.sh

Usage: ./MosaicViewer.sh

Note: the file config_MosaicViewer.sh must be in the same directory as MosaicViewer.sh. The pipeline currently supports the CAG, CGG and CAA repeat motifs.

Outputs:

$SAMPLE_NAME"_trimmed_"$SIDE".bam": bam file containing expanded reads aligned to $GENE_NAME"_masked_reference_"$SIDE".fasta"
$SAMPLE_NAME"_trimmed_simplified_"$SIDE"_final.bam": bam file containing simplified version of expanded reads aligned to $GENE_NAME"_masked_reference_"$SIDE".fasta", where the sequence of each identified repeat has been replaced with a single repeated nucleotide (CAG -> C; CGG -> GGG; CAA -> AAA; other -> N)
Other temporary files
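The output names above are built from the variables set in config_MosaicViewer.sh. As a rough illustration, the sketch below prints the BAM file names to expect for a given sample. The function expected_outputs is hypothetical, and the values left and right for $SIDE are an assumption based on the left and right alignments described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the BAM files expected for one sample.
# Assumes $SIDE takes the values "left" and "right".
expected_outputs() {
  local sample="$1" side
  for side in left right; do
    echo "${sample}_trimmed_${side}.bam"
    echo "${sample}_trimmed_simplified_${side}_final.bam"
  done
}

# "sample1" is a placeholder for the value of SAMPLE_NAME.
expected_outputs "sample1"
```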

Results visualization

For example, this is what the right alignment of trimmed reads looks like in IGV, with or without colouring based on annotated repeats.
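IGV needs an index file next to each BAM file before it can load it. If the pipeline has not already produced one, samtools index can create it. The sketch below wraps that call in a hypothetical helper, index_bams, with a DRY_RUN switch that only prints the commands. The file names in the example assume a sample called sample1 and a $SIDE value of right.

```shell
#!/usr/bin/env bash
# Hypothetical helper: index each BAM so that a genome browser can load it.
# With DRY_RUN=1 the commands are only printed, not executed.
index_bams() {
  local bam
  for bam in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "samtools index $bam"
    else
      samtools index "$bam"
    fi
  done
}

# Example with placeholder file names:
#   index_bams sample1_trimmed_right.bam sample1_trimmed_simplified_right_final.bam
```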
