Skip to content

Latest commit

 

History

History
59 lines (49 loc) · 1.59 KB

README.md

File metadata and controls

59 lines (49 loc) · 1.59 KB

trim_isoseq_polyA

This is a program to trim the polyA tails of DNA sequences in a FastQ format, using HMM.

Prerequisite

  • C++11
  • Boost
  • pthread
  • Zlib
  • BZib2
    Zlib and BZlib2 are needed if you need to support gzip/bzip2 compressed input files.

Install

mkdir build && \
cd build && \
cmake ../ -DBOOST_ROOT=PATH_TO_YOUR_BOOST_ROOT_DIR -DCMAKE_BUILD_TYPE=Release && \
make
Executable `trim_isoseq_polyA` will be saved in directory trim_isoseq_polyA/bin.
Please replace PATH_TO_YOUR_BOOST_ROOT_DIR with your own boost root directory.
e.g., -DBOOST_ROOT=~/mylib/boost/boost_1_60_0/

With native support for gzip/bzip2 compressed input

mkdir build && \
cd build && \
cmake ../ -DBOOST_ROOT=PATH_TO_YOUR_BOOST_ROOT_DIR -DCMAKE_BUILD_TYPE=Release \
    -DSUPPORT_COMPRESSED_INPUT=ON && \
make

Build with unit tests

mkdir buildwtest && \
cd buildwtest && \
cmake ../ -DBOOST_ROOT=PATH_TO_YOUR_BOOST_ROOT_DIR -DTrimIsoseqPolyA_build_tests=ON && \
make && make test

Usage

To process Iso-Seq classfy output

trim_isoseq_polyA -i isoseq.flnc.fq -t 8 > isoseq.flnc.atrim.fq 2> isoseq.flnc.atrim.log

To process generic fasta files

trim_isoseq_polyA -i input.fq -t 8 -G > input.atrim.fq 2> input.atrim.log

input.atrim.fq file contain the fasta entries with polyA trimmed, based on a default HMM model trained with PacBio data.

input.atrim.log is a tab file with length of polyA been trimmed.

To visualize polyA (colored red when visualized by cat)

trim_isoseq_polyA -i isoseq.flnc.fq -t 8 -c 2>/dev/null