Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible shortcuts for fast filter #6

Closed
abstractqqq opened this issue Feb 7, 2024 · 3 comments
Closed

Possible shortcuts for fast filter #6

abstractqqq opened this issue Feb 7, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@abstractqqq
Copy link

I wonder if it is possible to have a function that returns early when the levenshtein distance is gauranteed to be < an input k, and returns a boolean. I think this might speed up queries that only ask for words that are close but do not care exactly how close they are.

Any thoughts? Thanks in advance.

@maxbachmann maxbachmann added the enhancement New feature or request label Feb 8, 2024
@maxbachmann
Copy link
Member

maxbachmann commented Feb 8, 2024

I did think about adding these in Python a while ago: rapidfuzz/RapidFuzz#188.

The main reason I didn't do so yet is that I haven't really seen any cases, where:

  1. this is the wanted behaviour. In most cases you still want to know the similarity. One case I could think of is something like deduplication.
  2. the score_cutoff is relatively high. For a low score_cutoff this wouldn't really help, since we would essentially need to calculate pretty much the complete distance anyway. In fact the additional check of the exit condition would cause slowdowns when we can't exit early.

Do you have an example where this is useful?

@abstractqqq
Copy link
Author

No on second thought I think you are right. The cases I have all have a small cutoff and may not worth the early return

@maxbachmann
Copy link
Member

For small cutoffs there I already use Ukkonens algorithm according to https://www.sciencedirect.com/science/article/pii/S0019995885800462 which helps significantly when comparing long texts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants