1.5.20240305
Pre-release
Nanopore's R10.4 flow cells have forced our hand: we needed to adjust to their higher-quality reads.
This is a pre-release. Not all features have been tested or finalized, but everything should work. We've been struggling to find test datasets, and generating our own is time-consuming.
Things that needed adjusting:
- Circlator hated us and failed 80% of the time (or more). We have replaced Circlator with DNAAPLER, which has made life better.
- Donut Falls is now a single file. This should make it easier to use as a subworkflow elsewhere (we never did finish the documentation about using git submodules). More information can be found here: https://github.com/UPHL-BioNGS/Donut_Falls/wiki/Linking
- Even though this file does not adhere to the nf-core template, the labels and basic process template should be compatible with nf-core's config files. They follow much of the same logic.
- Filtlong has been replaced with fastp and rasusa. We were noticing irregularities with filtlong: for example, we would specify a coverage but not achieve it, or we would specify a read length and still have too-short reads go through. Fastp is now used as a hard cutoff: nanopore reads must be longer than 1,000 bases and have a quality score greater than Q12 to pass. This should not be a problem for R10.4 runs. Rasusa then subsamples the passing reads to 150X coverage (based on a genome size of 5M bases). We have some example config files for smaller genomes. We recommend running unicycler on "older" datasets.
- The tests have changed. Many of the old datasets won't pass the fastp filter.
- Medaka has been reduced to 1 CPU. We were noticing irregularities with medaka polishing at higher CPU usage.
- Multiple assemblers can be specified at once. We were finding that we needed to run flye and unicycler on most of our samples, so this simplified things for us.
- We needed more QC metrics; specifically, we needed coverage estimates on plasmids. These help tell whether the plasmids are real.
- POLCA has been replaced with pypolca. It seems to perform as well, is faster, and the container is smaller.
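
To make the fastp and rasusa numbers in the list above concrete, here is a minimal Python sketch of the arithmetic. It is not how Donut Falls, fastp, or rasusa are implemented: the function names are hypothetical, the use of a per-read mean quality is an assumption, and the random subsampling loop only approximates the idea of drawing reads until a base budget is met.

```python
import random

# Thresholds taken from the release notes above.
MIN_LENGTH = 1_000        # reads must be longer than this many bases
MIN_QUALITY = 12          # reads must score higher than this (Phred)
TARGET_COVERAGE = 150     # desired depth, in X
GENOME_SIZE = 5_000_000   # default genome size assumed for subsampling


def passes_hard_cutoff(length, mean_quality):
    """Hypothetical helper: True if a read clears both cutoffs.

    Assumes the quality cutoff is applied to a per-read mean quality.
    """
    return length > MIN_LENGTH and mean_quality > MIN_QUALITY


def target_bases(coverage=TARGET_COVERAGE, genome_size=GENOME_SIZE):
    """Number of bases needed to reach `coverage` over `genome_size`."""
    return coverage * genome_size


def subsample(read_lengths, budget, seed=0):
    """Hypothetical helper: pick reads in random order until `budget` bases.

    Returns the indices of the kept reads and the total bases kept.
    """
    rng = random.Random(seed)
    order = list(range(len(read_lengths)))
    rng.shuffle(order)
    kept, total = [], 0
    for i in order:
        if total >= budget:
            break
        kept.append(i)
        total += read_lengths[i]
    return kept, total


if __name__ == "__main__":
    print(target_bases())                       # default 5M genome
    print(target_bases(genome_size=2_000_000))  # a smaller genome
```

With the defaults, 150X of a 5M genome works out to 750 million bases. For a much smaller genome the same base budget would give far more than 150X, which is presumably why separate example configs exist for smaller genomes.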
We dropped support for the following:
- Trycycler : removing reads that don't conform wasn't best practice and we were getting a proliferation of plasmids.
- Dragonflye : the current version won't run on our local system (likely due to some perl weirdness) and we are wary of supporting options that we can't test.
- Miniasm : if Flye, Unicycler, or Raven couldn't close a genome, miniasm couldn't close it either