-
-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: fuzzy matching completions #671
Conversation
e5e6fd7
to
43e103c
Compare
This looks great - thank you so much! As you noted, there aren't unit tests for the completers, but there are functional/snapshot tests here:
It would be great to tweak that test to test fuzzy matches, in addition to exact matches; in particular I want to make sure that we don't just append the match to the buffer, but replace the input characters. The snapshot tests can be a little bit of a pain, so let me know if you can't figure it out or don't want to, and I can take it from here. Little more info here: https://harlequin.sh/docs/contributing/index#inspecting-and-updating-snapshot-tests
I think to address that, fuzzy matches should be ordered better -- right now they are listed alphabetically; would be good to order fuzzy matches by edit distance, or order all matches by:
A final optimization related to the ordering -- it might make sense to only compute fuzzy matches if the length of the list of exact and startswith matches is below some threshold. i.e., no reason to fuzzy match on a single character input that already returns dozens of results. Let me know if you want to contribute the next iteration here. Thanks so much for getting this started! |
I am happy to contribute the new version! (will get it done at some point this week) |
Well in despite the tests being broken for me RN ...
I implemented this idea with some minor variations ... I is Exact > starswith > fuzzy and I am sorting the fuzzy matches by length. This results in "more fuzzy" matches generally going last (which I feel is a good approximation). I am also skipping the fuzzy matching if there are already more than 20 direct matches. Also took the freedom to add an unit test for it. I bundled the completions from a dummy database with just the 'iris' dataset added to it and stored the LMK what you think and if you have any ideas on how to fix the testing issue. Full traceback after killing a pytest session
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a typo in your ordering, otherwise this looks good to me.
re: tests, you probably need to run poetry install --sync
, since a lot of dependencies were updated with the latest commits.
I'll take the unit test, although it would be better if the .json
was smaller (like a dozen or two values) and therefore easier to maintain and tweak to test the exact behavior you're looking for.
# Sort in ascending length. | ||
# I am assuming here that more insertions are less likely to be | ||
# the "right" match. | ||
matches.sort(key=lambda c: len(c.match_val), reverse=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
After nuking the env it works fine again! 571ad25 Adds the review suggestions and updates the .ambr files. Also noticed it now shows a "Tx: Auto" in several of the snapshots. Which I don't believe are related with my PR. Thanks a lot for the feedback! |
The Tx: Auto is added when using python 3.12+ with SQLite. For developing and testing I use 3.8 (soon 3.9) |
@tconbeer I Updated the snapshots to be on py 3.10 (<=3.9 does not install in m1 macs ... no version of |
the last few failures are from snapshots that aren't running on macs; I can help clean up those final snapshots in the next few days |
Remaining failures appear to be due to a sqlite3 version mismatch (causing the completers to return a different number of keywords). ("works on my machine"). Going to ship this. Thanks so much for your contribution, @jspaezp !! |
Wohoo! Thank you very much for helping throughout the process (and for the amazing project)! It has been an absolute pleasure @tconbeer |
Closes #601
What are the key elements of this solution?
startswith
is used, if I have a column named "team_sales", typing "sales" is not a match, since it does not share a prefix. This adds support for it.Why did you design your solution this way? Did you assess any alternatives? Are there tradeoffs?
Does this PR require a change to Harlequin's docs?
Did you add or update tests for this change?
Please complete the following checklist:
CHANGELOG.md
, under the[Unreleased]
section heading. That entry references the issue closed by this PR.