Genomes assembled from Nanopore data are imperfect. Even at 99.9% consensus accuracy, about 1 error per 1000 bp remains. Since a typical bacterial gene is ~1000 bp, this means roughly 1 error per gene on average. The most common Nanopore errors are indels, usually in homopolymer regions. An indel shifts the reading frame of the gene, causing annotation tools like Prokka to find two partial genes instead of one intact gene.
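The arithmetic behind that estimate can be checked with a quick one-liner. This is only a sketch using the two figures quoted above (99.9% accuracy, ~1000 bp per gene); the variable names are illustrative and are not part of polyfix.

```shell
# Expected errors per gene = (1 - consensus accuracy) x gene length.
accuracy=0.999   # consensus accuracy quoted above
gene_len=1000    # typical bacterial gene length in bp, quoted above

# LC_ALL=C keeps the decimal separator as "." regardless of locale.
errors_per_gene=$(LC_ALL=C awk -v a="$accuracy" -v n="$gene_len" \
    'BEGIN { printf "%.1f", (1 - a) * n }')

echo "expected errors per gene: $errors_per_gene"
```

Changing `accuracy` or `gene_len` shows how quickly the expected number of errors per gene grows for longer genes or lower-accuracy assemblies.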
Ideally one would also sequence with Illumina and polish the assembly, as
Illumina data does not suffer from homopolymer issues. However, if you are
unable to do that, polyfix is designed to take your draft Nanopore
assembly and compare it to one or more "trusted reference" genomes, and
polish out likely homopolymer errors to remove the frame-shifts.
% polyfix --version
polyfix 0.0.1
% polyfix --help
conda install -c conda-forge -c bioconda -c defaults polyfix # COMING SOON
Install Homebrew (macOS) or Linuxbrew (Linux).
brew install brewsci/bio/polyfix # COMING SOON
This will install the latest version directly from GitHub.
You'll need to add the polyfix bin directory to your $PATH,
and also ensure all the dependencies are installed.
cd $HOME
git clone https://github.com/tseemann/polyfix.git
$HOME/polyfix/bin/polyfix --help
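To run polyfix without typing the full path, the bin directory can be added to your $PATH. A minimal sketch, assuming the repository was cloned to $HOME/polyfix as in the commands above (adjust the path if you cloned elsewhere):

```shell
# Assumes the clone lives in $HOME/polyfix, as in the commands above.
export PATH="$HOME/polyfix/bin:$PATH"

# Check whether the shell can now find the executable.
command -v polyfix || echo "polyfix not found on PATH"
```

To make the change permanent, the `export` line can be placed in your shell start-up file, such as `~/.bashrc` or `~/.zshrc`.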
perl >= 5.26
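One way to check whether the installed Perl meets this requirement is sketched below. It relies only on Perl's built-in `use VERSION` check; the `perl_status` variable is illustrative.

```shell
# "use 5.026;" makes perl exit with an error if it is older than 5.26.
if perl -e 'use 5.026;' 2>/dev/null; then
    perl_status="ok"
else
    perl_status="too old or missing"
fi

echo "perl >= 5.26: $perl_status"
```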
polyfix is free software, released under the GPL 3.0.
Please submit suggestions and bug reports to the Issue Tracker.