Fuzzy searching #4339
Replies: 4 comments
-
I'll add the little bit that I know to try to explain why.

The git2 crate's text search vector (minus the README) looks like:

You'll notice that the values for

I like the idea of a hybrid approach, but I'd be curious about how that would affect the query speed. This would be done by ordering on a function of the `ts_rank_cd` result and the other pieces (all-time downloads, etc.). I'm curious if Diesel can do this.

Alternatively, in this case, #1266 would have included the title in the ranking, as all trigrams of the search were in the package title. This could still miss relevant searches though, so it's not a catch-all.

As another alternative, you could search by keyword
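To make the trigram point above concrete, here is a minimal sketch of trigram matching between a search term and a crate name. It is not the crates.io or PostgreSQL implementation: the helper names `trigrams` and `similarity` are hypothetical, the padding (two leading spaces, one trailing space per word) is assumed to mirror the `pg_trgm` convention, and words are split on whitespace only as a simplification.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of lowercased, space-padded trigrams of each word."""
    result: set[str] = set()
    for word in text.lower().split():
        # Assumed pg_trgm-style padding: two spaces before, one after.
        padded = f"  {word} "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(sorted(trigrams("git")))
    print(similarity("git", "git2"))
```

Under these assumptions a short query such as "git" shares most of its trigrams with a name like "git2", so a title-aware ranking could surface it even when the full-text rank alone does not.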
-
Yes, it can.
I'm happy to experiment. Can you give me some specific queries you'd like tried?
It is. https://crates.io/keywords/git. I'm definitely open to suggestions for better exposing that.
-
Probably related: A search for "ssh" only returns the probably most mature
-
Another example: searching for "same file" does not seem to find

There are also some similar closed issues which don't have links here: #2746 #3450
-
I recently searched crates.io for "git" using the default sorting of relevance. I expected to find the git2 crate, but instead found the git crate.
Below is a screenshot of the exact match currently found when searching with https://crates.io/search?q=git.
Searching for "git2" directly with https://crates.io/search?q=git2 produces the desired result with an exact match.
I took a look at the source for crates.io briefly last night, and it looks like the search controller uses the PostgreSQL `ts_rank_cd` text search function for the default search. I'm not familiar enough with the Cover Density Ranking algorithm to explain why or whether this produces the results above, but that might be a starting point for digging deeper into this.

Relevance seems like a tricky term here. The default search probably does produce the most relevant package from a text similarity standpoint, but not necessarily to me as a programmer looking for a git library to use. Maybe a hybrid approach that considers text relevance, all-time downloads, and recent downloads would produce something closer to what I expected.
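As a rough illustration of the hybrid idea, the sketch below combines a text-relevance score with log-scaled download counts and sorts by the result. Everything in it is an assumption for illustration: the `Crate` fields, the weights, and the use of `log10` are hypothetical, and the sample values are made up rather than taken from crates.io.

```python
import math
from dataclasses import dataclass


@dataclass
class Crate:
    name: str
    text_rank: float        # e.g. a ts_rank_cd-style relevance score
    downloads: int          # all-time downloads
    recent_downloads: int   # downloads in some recent window


def hybrid_score(c: Crate, w_text: float = 1.0,
                 w_all: float = 0.1, w_recent: float = 0.2) -> float:
    """Weighted sum of text relevance and log-scaled popularity (hypothetical weights)."""
    return (w_text * c.text_rank
            + w_all * math.log10(1 + c.downloads)
            + w_recent * math.log10(1 + c.recent_downloads))


def rank(crates: list[Crate]) -> list[Crate]:
    """Order crates from highest to lowest hybrid score."""
    return sorted(crates, key=hybrid_score, reverse=True)


if __name__ == "__main__":
    # Made-up numbers: an exact text match with few downloads
    # versus a weaker text match with many downloads.
    sample = [
        Crate("crate_a", text_rank=0.9, downloads=10_000, recent_downloads=500),
        Crate("crate_b", text_rank=0.6, downloads=5_000_000, recent_downloads=400_000),
    ]
    for c in rank(sample):
        print(c.name, round(hybrid_score(c), 2))
```

The log scaling keeps very popular crates from drowning out text relevance entirely. Where the weights should sit is exactly the kind of thing that would need experimenting with real queries.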