A pipeline for schematic IGV visualization of HTT alleles with somatic mosaicism

MosaicViewer_HTT

MosaicViewer_HTT is a pipeline for schematic visualization of alleles with somatic mosaicism. Due to mosaicism, long sequencing reads cannot be collapsed into an accurate consensus sequence, so repeat annotation can only be performed on each read individually. MosaicViewer_HTT integrates a tool for repeat annotation of noisy long reads, aligns reads to the left and right flanking regions, and generates "simplified" reads that make alternative motifs easier to identify in IGV. The pipeline has only been used for HTT alleles, but it can be extended to other loci with minor modifications.

Getting started

Prerequisites

  • Miniconda3. Tested with conda 4.10.3. Running which conda should return the path to the executable. If you don't have Miniconda3 installed, you can download and install it with:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod 755 Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
  • A fastq file containing the reads from one sample.
  • A fasta file containing the reference sequence (e.g. hg38).
  • Coordinates of the flanking regions (e.g. the regions flanking the repeat).
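
Before installing, it can help to confirm that the inputs listed above are in place. The following is a minimal sketch, not part of the pipeline: check_inputs is a hypothetical helper, the file names passed to it are placeholders, and it assumes uncompressed fastq and fasta files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MosaicViewer_HTT): sanity-check the
# fastq and fasta inputs. Assumes both files are uncompressed.
check_inputs() {
  local fastq="$1" fasta="$2"
  # Both files must exist and be non-empty.
  [ -s "$fastq" ] || { echo "missing or empty fastq: $fastq" >&2; return 1; }
  [ -s "$fasta" ] || { echo "missing or empty fasta: $fasta" >&2; return 1; }
  # A fastq record starts with '@', a fasta record with '>'.
  [ "$(head -c 1 "$fastq")" = "@" ] || { echo "not fastq: $fastq" >&2; return 1; }
  [ "$(head -c 1 "$fasta")" = ">" ] || { echo "not fasta: $fasta" >&2; return 1; }
  echo "inputs look OK"
}

# conda must be on PATH; warn (without failing) if it is not.
command -v conda >/dev/null 2>&1 || echo "conda not found on PATH" >&2
```

A call such as check_inputs reads.fastq hg38.fasta prints "inputs look OK" when both files pass, and returns a non-zero status with a message on stderr otherwise.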

Installation

git clone https://github.com/MaestSi/MosaicViewer_HTT.git
cd MosaicViewer_HTT
chmod 755 *
./install.sh

The installer creates a conda environment named MosaicViewer_env, containing seqtk, minimap2, samtools, NoiseCancellingRepeatFinder, BBMap and R with the Biostrings package, and a second conda environment named NanoFilt_env, containing NanoFilt. Afterwards, open the config_MosaicViewer.sh file with a text editor and set the variables PIPELINE_DIR and MINICONDA_DIR to the values suggested by the installation step.
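
One way to confirm that both environments were created is to look for their names in the output of conda env list. The sketch below is an illustration under that assumption; env_listed is a hypothetical helper and is not part of install.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads 'conda env list' output on stdin and succeeds
# if an environment with the given name appears in the first column.
env_listed() {
  local name="$1"
  awk -v n="$name" '$1 == n { found = 1 } END { exit !found }'
}

# Report the two environments named in this README, if conda is available.
if command -v conda >/dev/null 2>&1; then
  for env in MosaicViewer_env NanoFilt_env; do
    if conda env list | env_listed "$env"; then
      echo "$env: found"
    else
      echo "$env: missing"
    fi
  done
fi
```

If either environment is reported as missing, re-running ./install.sh and reading its output is probably the quickest way to see what went wrong.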

Usage

As a first step, open the config_MosaicViewer.sh file with a text editor and set all the variables. Based on the reference coordinates set in that file, the pipeline extracts the in-silico PCR primers and the flanking regions used for left or right alignment.
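
To make "extracting a region from reference coordinates" concrete, here is a dependency-free sketch. It is an illustration only and not the pipeline's own extraction code: extract_region is a hypothetical helper, it assumes 1-based inclusive coordinates, and it only reads the first record of the fasta file.

```shell
#!/usr/bin/env bash
# Illustration only: print bases start..end (1-based, inclusive) of the
# first record in a FASTA file. Not the pipeline's own extraction code.
extract_region() {
  local fasta="$1" start="$2" end="$3"
  awk -v s="$start" -v e="$end" '
    /^>/ { if (seen++) exit; next }   # skip first header, stop at a second one
    { seq = seq $0 }                  # join wrapped sequence lines
    END { print substr(seq, s, e - s + 1) }
  ' "$fasta"
}
```

For a toy reference whose sequence is AAACAGCAGCAGTTT, extract_region toy.fasta 1 3 prints the left flank AAA and extract_region toy.fasta 13 15 prints the right flank TTT. On a real genome, samtools faidx with a chr:start-end region is the usual tool for this kind of lookup.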

MosaicViewer.sh

Usage: ./MosaicViewer.sh

Note: the file config_MosaicViewer.sh must be in the same directory as MosaicViewer.sh. The pipeline currently supports the CAG, CGG and CAA repeat motifs.
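
Because the script expects its config file alongside it, a small guard can catch a misplaced file before a run starts. This is a minimal sketch; ready_to_run is a hypothetical helper and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if both the script and its config
# file sit in the given directory.
ready_to_run() {
  local dir="$1"
  [ -f "$dir/MosaicViewer.sh" ] && [ -f "$dir/config_MosaicViewer.sh" ]
}

# Example (commented out so that pasting this block runs nothing):
# if ready_to_run .; then
#   ./MosaicViewer.sh
# else
#   echo "config_MosaicViewer.sh not found next to MosaicViewer.sh" >&2
# fi
```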

Outputs:

  • $SAMPLE_NAME"_trimmed_"$SIDE".bam": bam file containing expanded reads aligned to $GENE_NAME"_masked_reference_"$SIDE".fasta"
  • $SAMPLE_NAME"_trimmed_simplified_"$SIDE"_final.bam": bam file containing simplified version of expanded reads aligned to $GENE_NAME"_masked_reference_"$SIDE".fasta", where the sequence of each identified repeat has been replaced with a single repeated nucleotide (CAG -> C; CGG -> GGG; CAA -> AAA; other -> N)
  • Other temporary files
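
The output names above are written as shell concatenations of variables and literal strings. The sketch below shows how they expand for one combination of values; output_names is a hypothetical helper, sample01 is a placeholder sample name, and right is an assumed value for SIDE.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the documented file names for one
# sample/side/gene combination, using the same variable names as the README.
output_names() {
  local SAMPLE_NAME="$1" SIDE="$2" GENE_NAME="$3"
  echo "${SAMPLE_NAME}_trimmed_${SIDE}.bam"
  echo "${SAMPLE_NAME}_trimmed_simplified_${SIDE}_final.bam"
  echo "${GENE_NAME}_masked_reference_${SIDE}.fasta"
}

output_names sample01 right HTT
```

With these values, the two bam files are sample01_trimmed_right.bam and sample01_trimmed_simplified_right_final.bam, both aligned to HTT_masked_reference_right.fasta.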

Results visualization

For example, this is what the right alignment of trimmed reads looks like in IGV, with or without colouring based on annotated repeats.

[Figure: right alignment of trimmed reads, with and without colouring based on annotated repeats]
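
IGV needs an index file next to each bam file it loads. Whether the pipeline already writes one is not stated here, so the sketch below simply checks for an index and shows, as a commented-out example, how one could be created with samtools index. has_bam_index is a hypothetical helper and the file name is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if an index (.bam.bai or .bai) already
# sits next to the given BAM file.
has_bam_index() {
  local bam="$1"
  [ -f "${bam}.bai" ] || [ -f "${bam%.bam}.bai" ]
}

# Example (commented out; the file name is a placeholder):
# bam="sample01_trimmed_right.bam"
# has_bam_index "$bam" || samtools index "$bam"
```

The matching masked reference fasta can then be loaded as the genome in IGV, followed by the bam file.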
