-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments regarding speedup to O(n+m+s^2)
#5
Comments
O(n+m+s^2)
The log factor in Myers' O(ND) paper actually comes from the least common ancestor data structure rather than the suffix tree. However, since publishing the preprint, I had discovered that the Landau-Vishkin algorithm uses a similar approach. Considering that it didn't provide a practical speed up, I think I will probably remove that aspect from the manuscript. |
They write:
and they basically repeat this later, so my understanding is still that the
Right makes sense. Nethertheless, do you have an idea how large |
Oh, nevermind then, I must be misremembering. My expectation is that the tradeoff has less to do with N or S than with the properties of the sequences. Because the constants are so much higher on the suffix tree algorithm, there would only be a substantial speed-up if there are many long matches that can be reduced to O(1) queries. In high entropy sequences, the lengths of spurious matches are very short (expected length ~4/3 in uniform random sequences). To have many long matches, the sequences need to have a high rate of internal repetition. |
The section on the speedup to
O(n+m+s^2)
should have a remark on whether you are assuming a finite or infinite alphabet.Implicitly in the context of pairwise alignment, I would assume a finite alphabet.
However, The Myers'86 paper already writes that building the suffix tree takes linear time for finite alphabets, giving the required
O(N+M+D^2)
in their terms. They don't explicitly mention this (I think), because they are supposedly doing LCS in the context of diff algorithms, where lines of code actually give an infinite alphabet.Similarly, Landau-Vishkin'89 writes that the complexity is
O(nk)
for finding all matches with up tok
mistakes of a pattern in a text for finite alphabets, reduced fromO(nm)
by using suffix trees as well. Their paper is not about global alignment, but applying their result to Ukkonen'85s global alignment algorithm directly gives theO(n+m+s^2)
runtime.The text was updated successfully, but these errors were encountered: