From 082883e98bd77139afa54868ca269fe3de49c3b0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Armin=20To=CC=88pfer?=
Date: Mon, 9 Sep 2019 12:25:05 +0200
Subject: [PATCH] Version 1.10.0

* Output N barcodes per subdirectory with `--files-per-directory N` and output splitting
* BioSample awareness for XML input and split output and allow ignoring them with `--ignore-biosamples`
* Increase `--window-size-mult` to `3` to allow longer spacers
* Do not report no adapter hits as too short inserts
* Increase `--guess` barcode score to `75` if `--peek-guess --ccs` are combined
* Enable double demux of CCS data
* Print run time, CPU time, and peak memory consumption with `--log-level INFO`
* New CLI UX
---
 README.md | 40 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 605578e..1bbaef0 100644
--- a/README.md
+++ b/README.md
@@ -115,7 +115,7 @@ to use `--no-pbi`, omit the pbi index file, to minimize time to result.
 
 ## Input data
 Input data is either raw unaligned subreads, straight from a Sequel, or
-unaligned CCS reads, generated by [CCS 2](https://github.com/PacificBiosciences/unanimity);
+unaligned CCS reads, generated by [CCS](https://github.com/PacificBiosciences/ccs);
 both in the PacBio enhanced BAM format. If you want to demux RSII data, first
 use SMRT Link or bax2bam to convert h5 to BAM. In addition, a `datastore.json`
 with one file entry, either a SubreadSet or ConsensusReadSet, is also allowed.
@@ -462,7 +462,7 @@ there is no way to distinguish between pairs `bc1001--bc1002` and `bc1002--bc100
 Score and tag per subread, instead per ZMW.
 
 ### `--window-size-mult`
-The candidate region size multiplier: `barcode_length * multiplier`, default `1.5`.
+The candidate region size multiplier: `barcode_length * multiplier`, default `3`.
 Optionally, you can specify the region size in base pairs with `--window-size-bp`.
 
 ### Alignment options
@@ -986,9 +986,43 @@ false positives.
 * Calls barcodes per barcode region and does not enforce adapter coupling
 * Nice reports for QC
 
+### Can I remove PCR primers after demultiplexing?
+Yes! After demultiplexing, just lima on the output again with your PCR primer(s).
+
+### Can I limit the output files per directory?
+If you use output BAM splitting, it can happen that you get a lot of output files.
+Using `--files-per-directory N` creates subdirectories and outputs at most `N`
+barcodes per directory.
+
+### `--peek-guess` does not work with XML input!
+If your input XML file contains ``, lima will deactivate barcode
+inference via `--peek-guess` and only output barcodes specified in this section.
+The assumption is that you know exactly which barcodes have been used and need no
+inference. If this assumption is wrong, like the barcodes in the XML are wrong,
+you can either just use BAM as input or use `--ignore-biosamples`.
+
+### Help, I get `ERROR: Could not find matching barcodes!`
+If you happen to get following error message
+
+    ERROR: Could not find matching barcodes! Check that the set of barcodes contains the used sequences and the correct mode has been selected: same or different.
+
+then your XML input contains BioSamples with different barcode names than the
+provided `barcode.fasta` file. Please check that you've used the correct
+barcodes. You can ignore barcodes specified in the XML with `--ignore-biosamples`.
+
+
 ## Full Changelog
 
- * **1.9.0**:
+ * **1.10.0**:
+   * Output N barcodes per subdirectory with `--files-per-directory N` and output splitting
+   * BioSample awareness for XML input and split output and allow ignoring them with `--ignore-biosamples`
+   * Increase `--window-size-mult` to `3` to allow longer spacers
+   * Do not report no adapter hits as too short inserts
+   * Increase `--guess` barcode score to `75` if `--peek-guess --ccs` are combined
+   * Enable double demux of CCS data
+   * Print run time, CPU time, and peak memory consumption with `--log-level INFO`
+   * New CLI UX
+ * 1.9.0:
    * Add `--bad-adapter-ratio` to remove ZMWs with molecularly missing adapters
    * Fix rare case, where a read only matches one barcode and not a single alternative
    * Fix `--no-bam` to automatically omit pbi