Canu v1.7.1
These are release notes for Canu version 1.7.1, which was released on June 18th, 2018. Canu is specialized for assembly of single-molecule high-noise sequences. Full documentation can be found at http://canu.readthedocs.org/.
This release provides a stable, tested, and documented version of the software. The binary distributions should work on any relatively recent version of the respective OS. The source code distribution contains everything you need to create a binary distribution for your own specific OS.
Citation
- Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research. (2017).
- Koren S, Rhie A, Walenz BP, Dilthey AT, Bickhart DM, Kingan SB, Hiendleder S, Williams JL, Smith TPL, Phillippy AM. Complete assembly of parental haplotypes with trio binning. bioRxiv. (2018).
Minimum Requirements
- Perl 5.12.0, or File::Path 2.08
- Java SE 8
- GCC 4.5 (for compilation only); GCC 6 recommended
- macOS 10.10 Yosemite (for macOS/Darwin binaries only)
- gnuplot 5.2 (optional, for generating diagnostic graphs)
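Before building or running Canu, it can help to confirm that the tools listed above are present. The following is a minimal sketch, not part of the Canu distribution: check_tool is a hypothetical helper that only reports whether each tool is on the PATH and prints the first line of its version output. It does not compare version numbers against the minimums above, so read the output by eye.

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not shipped with Canu).
# Usage: check_tool <tool> <version-flag>
check_tool() {
    tool=$1
    flag=$2
    if command -v "$tool" >/dev/null 2>&1; then
        # Some tools (e.g. java) print their version on stderr, so merge
        # the streams and keep only the first non-empty line.
        version=$("$tool" "$flag" 2>&1 | sed -n '/./{p;q;}')
        echo "found: $tool ($version)"
    else
        echo "missing: $tool"
    fi
}

check_tool perl --version
check_tool java -version
check_tool gcc --version       # needed for compilation only
check_tool gnuplot --version   # optional
```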
Installation
Users can download Canu as source code or as pre-compiled binaries. The source code package needs to be compiled and installed before it can be used. The binary distributions need only be unpacked, but they are not available for all platforms.
To install from source code (the file can be named either canu-v1.7.1.tar.gz or just v1.7.1.tar.gz, depending on how it is downloaded):
gunzip -dc canu-v1.7.1.tar.gz | tar -xf -
cd canu-1.7.1/src
make -j 8
cd ..
To install from a binary distribution:
xz -dc canu-1.7.1.*.tar.xz | tar -xf -
In both cases, canu is installed in the directory canu-1.7.1/<OS>-<architecture>, for example, canu-1.7.1/Linux-amd64. You can run the assembler with:
canu-1.7.1/*/bin/canu
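The wildcard above works when exactly one platform directory exists. As a sketch for scripts that want an explicit path, the hypothetical helper find_canu below (not part of Canu) prints the first executable matching the same canu-1.7.1/*/bin/canu pattern and fails if there is none. The install root canu-1.7.1 is taken from the commands above and is assumed to be relative to the current directory.

```shell
#!/bin/sh
# find_canu is a hypothetical helper (not shipped with Canu): print the
# first executable matching <root>/*/bin/canu, or return 1 if none exists.
find_canu() {
    root=$1
    for candidate in "$root"/*/bin/canu; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: locate the executable under the unpacked release directory.
if canu_path=$(find_canu canu-1.7.1); then
    echo "using $canu_path"
else
    echo "no canu executable found under canu-1.7.1" >&2
fi
```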
Changes
This release contains only bug fixes made since Canu v1.7 was released. No features were added or removed.
Canu v1.7.1 is compatible with assemblies started with Canu v1.7.
Canu v1.7 and v1.7.1 ARE NOT compatible with assemblies started with Canu v1.6.
Bug Fixes
- Fix many bogart issues, including the dreaded "Assertion `cnt > 0' failed". Issues #930, #874, #873, #844, #718, #546. Backported from 6f3c375.
- Fix Read Error Detection (RED) configuration to prevent single-read jobs. Issues #935, #854, #831, #815. Backported from eeef601.
- Fix excessive memory usage when loading evalues into the ovlStore. Issues #956, #758, #755. Backported from 858eff8.
- Fix a (potential) performance problem when computing overlaps for large assemblies: don't set a one-size-fits-all ovlHashBits, base it on the genome size. Backported from a580131.
- Fix a compilation error with GCC 8. Issue #927. Backported from f251336.
Known Issues
- Downloads before 22 June 2018 incorrectly reported the version as "1.7".
See the issues page for up-to-date open issues, or to report a problem.
- Large memory usage and runtime for long reads (e.g., Nanopore) when using the overlapper=ovl algorithm, and during Overlap Error Adjustment. The -fast option enables a significantly faster algorithm, but may produce slightly less contiguous assemblies on genomes larger than 1 Gbp in size. It is recommended for nanopore genomes smaller than 1 Gbp.
- TrioCanu is not yet optimized for memory usage; as a result, it requires higher than default memory for large genomes. The options gridOptionsExecutive="--mem=250g" gridOptionsMeryl='--partition=largemem --mem=1000g' (or the equivalent memory request on your grid) should be sufficient for a 3 Gbp genome.
- Bubbles are not captured in the contig graph, but are included in the unitig graph. No attempt at marking bubbles is made.
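The 1 Gbp guidance for the fast option can be encoded in a wrapper script. The sketch below is illustrative only: fast_flag is a hypothetical helper, the prefix, directory, and read file names are placeholders, and the command is printed rather than executed so it can be reviewed before use.

```shell
#!/bin/sh
# fast_flag is a hypothetical helper (not shipped with Canu): print "-fast"
# when the genome size in base pairs is below 1 Gbp, and nothing otherwise.
fast_flag() {
    genome_bp=$1
    if [ "$genome_bp" -lt 1000000000 ]; then
        echo "-fast"
    fi
}

# Dry run: print the command for a 500 Mbp nanopore genome.
# asm, asm-dir, and reads.fastq.gz are placeholder names.
genome_bp=500000000
echo canu -p asm -d asm-dir genomeSize="$genome_bp" $(fast_flag "$genome_bp") -nanopore-raw reads.fastq.gz
```

For a genome at or above 1 Gbp the helper prints nothing, so the command falls back to the default algorithm.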
See the FAQ for many suggestions, including suggestions for specific data types, e.g., Nanopore r9 reads.
Legal
Canu is derived from Celera Assembler and includes code from many other projects. Most, but not all, of the code is GPL licensed. See the README.licenses file and individual source code files for details.