From 5cc846f75b309cc8ad5e554c17a7f33842eecdd0 Mon Sep 17 00:00:00 2001
From: Mike Jackson
Date: Thu, 31 Oct 2019 05:52:24 -0700
Subject: [PATCH] Clarified Selecting the representative read documentation

A user queried the use of quality in which read is kept upon deduplication (#261)

> your selection procedure does not seem to take into
> account read sequencing quality (only mapping quality).
>
> In other words, if 2 reads have the same high score
> mapping quality (i.e. unique mappers), one being long and
> with good base scores, the other short with errors, it will
> select randomly among these, too, right?

The response was:

> Yes, that's correct.

Updated umi-tools/dedup.py "Selecting the representative read" comment
section to clarify that the read quality is not used.
---
 umi_tools/dedup.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/umi_tools/dedup.py b/umi_tools/dedup.py
index bb48fd1e..cd3fb78f 100644
--- a/umi_tools/dedup.py
+++ b/umi_tools/dedup.py
@@ -23,7 +23,10 @@
 
 1. The read with the lowest number of mapping coordinates (see
 ``--multimapping-detection-method`` option)
-2. The read with the highest mapping quality
+2. The read with the highest mapping quality. Note that this is not
+the read sequencing quality and that if two reads have the same
+mapping quality then one will be picked at random regardless of the
+read quality.
 
 Otherwise a read is chosen at random.
 
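
Reviewer note: the selection rule described in the docstring can be sketched as below. This is only an illustration, not the code in umi_tools/dedup.py. The `Read` class, its field names, and `select_representative` are hypothetical, and applying the criteria in the order the list gives them is an assumption about precedence that the actual implementation may not share.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    # Hypothetical stand-in for an aligned read; not a pysam or UMI-tools type.
    name: str
    n_mapping_coords: int     # number of mapping coordinates reported for the read
    mapq: int                 # mapping quality
    mean_base_quality: float  # sequencing quality; deliberately never consulted


def select_representative(reads, rng=random):
    """Pick one read from a group of duplicates (illustrative sketch only)."""
    if not reads:
        raise ValueError("empty duplicate group")
    # Criterion 1: keep the reads with the lowest number of mapping coordinates.
    fewest = min(r.n_mapping_coords for r in reads)
    candidates = [r for r in reads if r.n_mapping_coords == fewest]
    # Criterion 2: among those, keep the reads with the highest mapping quality.
    best_mapq = max(r.mapq for r in candidates)
    candidates = [r for r in candidates if r.mapq == best_mapq]
    # Otherwise: choose at random; base (sequencing) quality plays no part.
    return rng.choice(candidates)


if __name__ == "__main__":
    short_noisy = Read("short_noisy", 1, 60, 12.0)
    long_clean = Read("long_clean", 1, 60, 38.0)
    # Both reads tie on criteria 1 and 2, so either may be returned.
    print(select_representative([short_noisy, long_clean]).name)
```

With two reads that tie on both criteria, as in the question quoted in the commit message, the sketch returns either one regardless of `mean_base_quality`, which is the behaviour the added docstring text describes.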