Docs: blastn does search the full database every time #207
It's hard to share a benchmark in a comment, but: using a small …

    blastn -query rep_seq/dna-sequences.fasta -db ../dbs/ncbi_16S_db/16S_ribosomal_RNA \
      -outfmt '6' -max_target_seqs 10 -perc_identity ?? > blast_output_??.txt

We would expect poor results for …
And what's different once we request >83% similarity?

    git diff --no-index blast_output_50.txt blast_output_83.txt

We are missing just 4 hits, all of which are <83% similar.
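The same comparison can be scripted in a few lines of Python. This is a minimal sketch that assumes both runs used the default `-outfmt 6` column order, so percent identity is the third tab-separated field. The helper names `read_hits`, `dropped_hits`, `pident`, and `summarize` are hypothetical. Unlike `git diff`, the sketch ignores line order. It also reports how many of the dropped hits fall below the identity cutoff, instead of leaving that count to be checked by eye.

```python
from typing import List

# Assumption: default -outfmt 6 layout, where pident (percent identity) is the 3rd field.
PIDENT_COL = 2


def read_hits(path: str) -> List[str]:
    """Return the non-empty lines of a tabular (-outfmt 6) BLAST output file."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def dropped_hits(loose: List[str], strict: List[str]) -> List[str]:
    """Hits reported by the looser run but absent from the stricter run.

    Whole lines are compared as strings, so this behaves like a line diff
    that ignores ordering.
    """
    strict_set = set(strict)
    return [line for line in loose if line not in strict_set]


def pident(line: str) -> float:
    """Percent identity of a single tabular hit line."""
    return float(line.split("\t")[PIDENT_COL])


def summarize(loose: List[str], strict: List[str], cutoff: float) -> str:
    """Count dropped hits and how many of them fall below the identity cutoff."""
    missing = dropped_hits(loose, strict)
    below = [line for line in missing if pident(line) < cutoff]
    return f"{len(missing)} hits dropped; {len(below)} below {cutoff:g}% identity"
```

For example, `summarize(read_hits("blast_output_50.txt"), read_hits("blast_output_83.txt"), 83)` would print the same "4 hits, all below 83%" finding as a one-line summary if the files match the runs described above.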
https://www.ncbi.nlm.nih.gov/books/NBK279684/#_appendices_Outline_of_the_BLAST_process_
This exhaustive search explains why blast has a reputation for being slow compared to usearch's fail-fast heuristics.
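To make the contrast concrete, here is a toy Python model. It is not BLAST's or usearch's actual code.

- `exhaustive` scores every subject and only then applies the identity cutoff and the hit limit.
- `fail_fast` walks candidates in an order assumed to be pre-ranked. It stops once `max_accepts` hits pass or `max_rejects` candidates fail.

The parameter names echo usearch's `-maxaccepts`/`-maxrejects` options. However, the functions, the candidate ordering, and the returned "subjects scored" count are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[str, float]  # (subject id, percent identity)


def exhaustive(subjects: Sequence[str], identity: Callable[[str], float],
               min_identity: float, max_hits: int) -> Tuple[List[Hit], int]:
    """Score every subject, then filter by identity and keep the best max_hits.

    Returns the hits and how many subjects were scored (the cost proxy).
    """
    scored = [(s, identity(s)) for s in subjects]  # always len(subjects) scores
    kept = [hit for hit in scored if hit[1] >= min_identity]
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[:max_hits], len(scored)


def fail_fast(subjects: Sequence[str], identity: Callable[[str], float],
              min_identity: float, max_accepts: int,
              max_rejects: int) -> Tuple[List[Hit], int]:
    """Score subjects in the given order and stop early.

    Stops after max_accepts subjects pass the cutoff or max_rejects fail it.
    Returns the accepted hits and how many subjects were scored.
    """
    accepted: List[Hit] = []
    rejects = 0
    evaluated = 0
    for s in subjects:
        evaluated += 1
        pid = identity(s)
        if pid >= min_identity:
            accepted.append((s, pid))
            if len(accepted) >= max_accepts:
                break
        else:
            rejects += 1
            if rejects >= max_rejects:
                break
    return accepted, evaluated
```

In this model, raising `min_identity` never reduces the number of subjects `exhaustive` scores. That is why a stricter cutoff alone would not make an exhaustive search faster. `fail_fast` trades that completeness for speed, and its result depends on how well the candidate order is ranked.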
Hi @colinbrislawn, the parameter description given here is based on this behavior. This might have been quietly changed in the past 6 years, but at the time the devs evidently saw this as a feature, not a bug. Such a change probably would have made a bit more noise, but maybe I just wasn't paying enough attention 😁 Do you have any evidence (e.g., release notes from NCBI) that this behavior has changed?
None. I'm as surprised as you! Maybe it changed in response to that 2019 paper? It may not be searching the full database. I first noticed when I went to sort blast results and discovered that the hits were already sorted by something. When I dropped …

edit: My interpretation is that yes, the full database is being searched and …
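The "already sorted" observation can be checked on your own output with a short Python script. This sketch assumes the default `-outfmt 6` columns, where the e-value and bit score are the 11th and 12th fields. It tests only one hypothesis for what the "something" is: within each contiguous block of lines sharing a `qseqid`, e-values never decrease and bit scores never increase. The function names are hypothetical.

```python
from itertools import groupby
from typing import Iterable, List

# Assumption: default -outfmt 6 layout (0-based indices below).
EVALUE_COL = 10    # 11th field: e-value
BITSCORE_COL = 11  # 12th field: bit score


def is_non_decreasing(values: List[float]) -> bool:
    """True if each value is <= the next one."""
    return all(a <= b for a, b in zip(values, values[1:]))


def already_sorted(lines: Iterable[str]) -> bool:
    """Check whether hits are ordered best-first within each query block.

    Consecutive lines with the same qseqid form one block. Within a block,
    e-values must be non-decreasing and bit scores non-increasing.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    for _qseqid, block_iter in groupby(rows, key=lambda row: row[0]):
        block = list(block_iter)
        evalues = [float(row[EVALUE_COL]) for row in block]
        bitscores = [float(row[BITSCORE_COL]) for row in block]
        if not is_non_decreasing(evalues):
            return False
        # Negating bit scores turns "non-increasing" into "non-decreasing".
        if not is_non_decreasing([-b for b in bitscores]):
            return False
    return True
```

For example, `already_sorted(open("blast_output_50.txt"))` returns `True` if that file follows this ordering. A `False` result would point to some other sort key.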
I think that's the case for usearch / vsearch, but not for blast