Canu v1.9
These are release notes for Canu version 1.9, which was released on November 4th, 2019. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.
This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS and are the recommended way to install Canu. The source code distribution contains everything you need to create a binary distribution for your own specific OS.
Citation
- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).
- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. De novo assembly of haplotype-resolved genomes with trio binning. Nature Biotechnology. (2018).
Minimum Requirements
- 8GB minimum memory; 16GB strongly suggested
- Perl 5.12.0, or File::Path 2.08
- Java SE 8
- GCC 4.5 (for compilation only); GCC 7 or newer strongly recommended
- macOS 10.10 Yosemite (for macOS/Darwin binaries only)
- gnuplot 5.2 (optional, for generating diagnostic graphs)
Installation
Users can download Canu as source code or as pre-compiled binaries. The binary distribution is the recommended install method, assuming it is available for your platform. The source code package needs to be compiled and installed before it can be used.
To install from a binary distribution (recommended installation method):
tar -xJf canu-1.9.*.tar.xz
To install from source code (the file can be named either canu-v1.9.tar.gz or just v1.9.tar.gz, depending on how it is downloaded):
gunzip -dc canu-v1.9.tar.gz | tar -xf -
cd canu-1.9/src
make -j 8
cd ..
In both cases, Canu is installed in directory canu-1.9/<OS>-<architecture>, for example, canu-1.9/Linux-amd64. You can run the assembler with:
canu-1.9/*/bin/canu
Changes
This release includes several major bug fixes and improves repeat separation and consensus quality for assemblies.
Canu v1.9 IS NOT compatible with assemblies started with any previous version.
- Preliminary support for HiFi data using option '-pacbio-hifi'. This skips the correction and trimming phases and sets options appropriate for high-quality reads.
- Improved detection of indel errors in overlaps used for creating contigs. Fixed several errors that all but disabled detection of errors in these overlaps.
- Fixed an error in consensus generation that was effectively disabling consensus on large contigs.
- Significantly improved the speed of reading overlaps, for example during trimming.
- 'N' bases at either end of a read are now trimmed (they tended to obscure true overlaps), and 'N' bases in the middle of a read are treated as don't-care matches during consensus.
- Support for the DNAnexus platform.
- Output file 'contigs.gfa' was removed because it was misleading.
- Parameter 'saveOverlaps': By default, the 'correction' and 'trimming' overlap stores are removed when they are no longer needed. Set saveOverlaps=true to retain them.
- Parameter 'purgeOverlaps': Controls when intermediate overlap data is removed: never; normal (the default, once all overlaps are loaded into an overlap store); aggressive (as soon as it is safe); dangerous (as soon as possible, even if unsafe).
- Parameter 'gridEngineResourceOption': A combination of gridEngineThreadsOption and gridEngineMemoryOption, useful for grid schedulers that use one option for requesting both memory and CPUs.
- Parameter 'hapUnknownFraction': Don't include 'unassigned' reads in the haplotype assemblies if they amount to less than this fraction of the total reads. Default 0.05 (5%).
- Option '-haplotype': Will stop Canu after haplotyped reads are generated. No assemblies will be started.
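A minimal sketch of how some of these options might be combined on a Canu 1.9 command line. The prefix, output directories, genome size (4.8m) and read file names are hypothetical placeholders, not values taken from these notes.

```shell
#!/bin/sh
# Hypothetical inputs: prefix "asm", a 4.8 Mbp genome, and the read file names below.

# HiFi reads: -pacbio-hifi skips the correction and trimming phases.
canu -p asm -d asm-hifi genomeSize=4.8m \
  -pacbio-hifi hifi_reads.fastq.gz

# Raw PacBio reads: keep the correction and trimming overlap stores
# (saveOverlaps=true) and never remove intermediate overlap data (purgeOverlaps=never).
canu -p asm -d asm-raw genomeSize=4.8m \
  saveOverlaps=true purgeOverlaps=never \
  -pacbio-raw raw_reads.fastq.gz
```

Parameters such as saveOverlaps and purgeOverlaps are given as key=value pairs, while the read type is given as a flag followed by the read files.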
Bug Fixes
- A variety of bug fixes that nobody will really care about (unless your assembly crashed, in which case you already know it's fixed) and will be tedious to list, so they aren't listed.
Known Issues
See the issues page for up-to-date open issues, or to report a problem.
- Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, but may produce slightly less contiguous assemblies on genomes larger than 1 Gbp in size. It is recommended for nanopore genomes smaller than 1 Gbp.
- No support for trio binning of HiFi data. As a workaround, specify the HiFi data as -pacbio-raw and run only the haplotyping step (-haplotype) followed by assembly of the partitioned reads.
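A sketch of the two workarounds described above. Genome sizes, directory names, read files and the haplotype labels (Mother, Father, supplied as -haplotype<label> options) are hypothetical. The path of the partitioned reads in the last step is an assumption as well; check the haplotyping output directory for the file names Canu actually writes.

```shell
#!/bin/sh
# Nanopore genome smaller than 1 Gbp: -fast selects the faster algorithm.
# Hypothetical genome size and read file.
canu -fast -p asm -d asm-ont genomeSize=100m \
  -nanopore-raw ont_reads.fastq.gz

# HiFi trio binning workaround, step 1: supply the child's HiFi reads as
# -pacbio-raw and stop after haplotyping (-haplotype).
# Parental read files and the labels Mother/Father are hypothetical.
canu -haplotype -p asm -d asm-trio genomeSize=3.1g \
  -haplotypeMother mother_reads.fastq.gz \
  -haplotypeFather father_reads.fastq.gz \
  -pacbio-raw child_hifi_reads.fastq.gz

# Step 2: assemble one partitioned read set (repeat for the other haplotype).
# ASSUMED path: substitute the file actually written by step 1.
canu -p mother -d asm-mother genomeSize=3.1g \
  -pacbio-hifi asm-trio/haplotype/haplotype-Mother.fasta.gz
```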
See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.
Legal
Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.