-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rescue using partial hits, even in non-MCS mode #476
base: base-hash
Are you sure you want to change the base?
Conversation
Comparing accuracy (paf) - sim38f79881 Version 0.15.0
Average difference se: +0.2454 Comparing accuracy (paf) - sim58f79881 Version 0.15.0
Average difference se: +0.9486 |
Here are some initial runtime measurements. Comparing user time in seconds (paf) - sim58f79881 Version 0.15.0
|
Here’s a comparison between v0.15.0 using Comparing accuracy (paf) - sim3
Average difference se: -0.0467 |
f230178
to
0026eba
Compare
Really nice -- great idea @Itolstoganov ! Am I interpreting it correctly that:
Also, because of this very interesting result, should we now benchmark three versions in the paper:
|
(To clarify: A while ago, you mentioned that Ivan tried this approach in order to speed up MCS mode - at least that is how I understood it at the time. That is where I got it from.)
Yes. (But more accurately, we wouldn’t even have a "pure" non-MCS mode anymore.)
My first thought is that this is too much - we already have too many plots. The difference wouldn’t be visible anyway (maybe except for read length 50). We should focus on one MCS approach and present that. Maybe add a sentence or perhaps a single plot that shows what the difference is in results between approach 1 and 2. Alternatively, maybe we can find a combined approach that is (nearly) as fast as the first approach and as accurate as the second.
Yes, but we could also choose to abandon approach 1 entirely and change (However, for the next release, we would definitely want to enable MCS approach 1 by default, but we could do that afterwards.) |
Ok, I agree.
You mean the opposite, right? That is, "as fast as the second approach and as accurate as the first"?
hmm, you mean "definitely want to enable MCS approach 2 by default", right? because according to this PR (which is approach 2), approach 2 is both faster and more accurate. Approach 1 (the "previous" mcs in the paper) still has the accuracy-speed tradeoff, making it a more subjective decition. |
Yeah, I mixed them up.
Yes, same mixup. |
Credit for this idea goes to @Itolstoganov.
PR #472 changes the index so that the first syncmer is always chosen as the base. This improves accuracy in MCS, but reduces mapping rate (and thus accuracy) in non-MCS mode. Apparently, some reads with many errors no longer get any hits at all.
However, we can easily detect when a query doesn’t get any hits and then - even if MCS isn’t explicitly enabled - just use the method we already have of looking up partial seeds, as a different kind of MCS rescue.
To be clear, only using partial seeds when no regular hits can be found was an idea by @Itolstoganov.
This actually works much better than I would have expected: Both accuracy and speed get improve!
Details below.