-
Notifications
You must be signed in to change notification settings - Fork 2
Track‐query matching
When you search for a song on a music service, you expect to see the song you're looking for as the first result. But what if you misspell the song title? Or what if there are many songs with the same title from different artists?
It's not realistic to expect a user to specify the exact song and artist every time they search, and music services understandably spend a lot of time and effort on making sure that users can find the song they're looking for. Blanco should be no different.
Before Release 0.4.5, Blanco used the following logic to check whether a human search query (say, mona lisa
) closely matched a song title and artist (say, Mona Lisa Dominic Fike
):
from difflib import get_close_matches
def check_similarity(actual: str, candidate: str) -> float:
"""
Checks the similarity between two strings. Meant for comparing
song titles and artists with search results.
:param actual: The actual string.
:param candidate: The candidate string, i.e. from a search result.
:return: A float between 0 and 1, where 1 is a perfect match.
"""
actual_words = set(actual.lower().split(' '))
candidate_words = set(candidate.lower().split(' '))
intersection = actual_words.intersection(candidate_words)
difference = actual_words.difference(candidate_words)
# Include close matches
for word in difference:
close_matches = get_close_matches(word, candidate_words, cutoff=0.8)
if len(close_matches) > 0:
intersection.add(close_matches[0])
return len(intersection) / len(actual_words)
The actual
variable contains the raw search query from the user, and the candidate
variable is expected to be a string of the format <song title> <artist>
.
The gist of the algorithm is this:
- Turn the strings lowercase.
- Split the actual string (the search query from the user) and candidate string (a search result from Spotify) into sets of words.
- Find the intersection of the two sets, i.e., the words that are in both sets.
- Find the difference of the two sets, i.e., the words that are in the actual set but not the candidate set.
- If there are any words in the difference set that are close matches to words in the candidate set, add them to the intersection set.
This worked pretty well for the most part, but it had two glaring problems:
- It didn't take into account the words that were in the candidate set but not the actual set.
- It didn't take into account the ranking from Spotify, which more than likely accounts for popularity and trends.
For example, if the user searched for nightstand
and the top search results were NIGHTSTAND
by Kxllswxtch
and Nightstand
by Justus Bennetts
, the algorithm would return a similarity score of 1.0 for both, even though the two songs are completely different. In fact, it will return a similarity score of 1.0 for any song with the word nightstand
in the title, regardless of the artist.
Blanco releases 0.4.5 and newer build upon check_similarity()
above, using smarter logic to match human queries against search results. The new algorithm is as follows:
from thefuzz import fuzz
def check_similarity_weighted(actual: str, candidate: str, candidate_rank: int) -> int:
"""
Checks the similarity between two strings using a weighted average
of a given similarity score and the results of multiple fuzzy string
matching algorithms. Meant for refining search results that are
already ranked.
:param actual: The actual string.
:param candidate: The candidate string, i.e. from a search result.
:param candidate_rank: The rank of the candidate, from 0 to 100.
:return: An integer from 0 to 100, where 100 is the closest match.
"""
naive = check_similarity(actual, candidate) * 100
tsr = fuzz.token_set_ratio(actual, candidate)
tsor = fuzz.token_sort_ratio(actual, candidate)
ptsr = fuzz.partial_token_sort_ratio(actual, candidate)
return int(
(naive * 0.7) +
(tsr * 0.12) +
(candidate_rank * 0.08) +
(tsor * 0.06) +
(ptsr * 0.04)
)
Like the old algorithm, the actual
variable contains the raw search query from the user, and the candidate
variable is expected to be a string of the format <song title> <artist>
.
The algorithm still takes into account how much of the user's query is present in the search result, but it now also takes into account the ranking from Spotify and the results of a few fuzzy string matching algorithms (namely, Token Set Ratio, Token Sort Ratio, and Partial Token Sort Ratio). All of these factors are combined into a weighted average.
This release updates the threshold used in check_similarity()
from 0.8
to 0.9
and changes the candidate_rank
values passed to check_similarity_weighted()
.
If Spotify gives us 10 results, Blanco generates the weights [100, 90, ..., 10]
for the results in order, which will then be used as candidate_rank
values. This linear scale, however, removes much of the information that Spotify's ranking gives implicitly. If multiple tracks are returned for a query, and each track varies significantly (the title could be completely different for example), this variation is not communicated by weights that differ by a constant value.
Release 0.4.7 changes the weights to an exponential scale using a factor of 0.8. That is, the first weight remains at 100, then 100 * 0.8 = 80
, then 100 * 0.8 * 0.8 = 64
, and so on. This gives each result significantly more weight than the result after it.
In the test results below, the original index
column refers to the index of the song in Spotify's search API results, and the candidate (title + artist)
column refers to the candidate search result from Spotify. The weighted
column refers to the similarity score returned by check_similarity_weighted()
, and the difflib
column refers to the similarity score returned by check_similarity()
.
Spotify rank
is just 100 - (original index * 10)
(so that the first result has a rank of 100). The partial token sort ratio
, token set ratio
, and token sort ratio
columns refer to the results of the fuzzy string matching algorithms from thefuzz
.
The first table in each section is meant to simulate a vague search query, and the second table is meant to simulate a more specific search query. The rows are then sorted by the weighted
values. In production, Blanco picks the result with the top weighted
value.
As you will see, the ranking is still not perfect, but it's a lot better than if we were to rank using difflib
alone.
Search results for "cherry wine"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
0 | Cherry Wine grentperez | 97 | 100 | 100 | 78 | 100 | 67 |
1 | Cherry Wine - Live Hozier | 95 | 100 | 80 | 78 | 100 | 65 |
2 | Cherry Wine - Live Hozier | 94 | 100 | 64 | 78 | 100 | 65 |
3 | Cherry Wine Nas Amy Winehouse | 92 | 100 | 51 | 73 | 100 | 55 |
5 | cherry wine Zachary Knowles | 91 | 100 | 32 | 80 | 100 | 58 |
4 | Cherry Wine - Live from Spotify SXSW ... | 90 | 100 | 40 | 64 | 100 | 39 |
6 | Cherry Wine Jasmine Thompson | 90 | 100 | 26 | 78 | 100 | 56 |
9 | Cherry Wine - Live Hozier | 90 | 100 | 13 | 78 | 100 | 65 |
7 | Cherry Waves Deftones | 51 | 50 | 20 | 80 | 71 | 56 |
8 | Like Real People Do Hozier | 9 | 0 | 16 | 55 | 32 | 32 |
Search results for "cherry wine hozier"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
0 | Cherry Wine - Live Hozier | 98 | 100 | 100 | 89 | 100 | 88 |
1 | Cherry Wine - Live Hozier | 97 | 100 | 80 | 89 | 100 | 88 |
4 | Cherry Wine - Live Hozier | 94 | 100 | 40 | 89 | 100 | 88 |
2 | Cherry Wine - Live from Spotify SXSW ... | 93 | 100 | 64 | 72 | 100 | 56 |
7 | Cherry Wine - Live Hozier | 92 | 100 | 20 | 89 | 100 | 88 |
5 | Cherry Wine - Live in Greystones, Cou... | 89 | 100 | 32 | 67 | 100 | 44 |
3 | Cherry Wine grentperez | 66 | 66 | 51 | 64 | 76 | 70 |
9 | Francesca Hozier | 37 | 33 | 13 | 62 | 59 | 53 |
6 | Like Real People Do Hozier | 36 | 33 | 26 | 62 | 50 | 45 |
8 | Autumn Tree Milo Greene | 7 | 0 | 16 | 34 | 24 | 24 |
Search results for "nightstand"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
0 | NIGHTSTAND Kxllswxtch | 97 | 100 | 100 | 100 | 100 | 65 |
1 | Nightstand Justus Bennetts | 95 | 100 | 80 | 100 | 100 | 56 |
2 | Nightstand Dev Lemons | 95 | 100 | 64 | 100 | 100 | 65 |
4 | Nightstand Lil Candy Paint | 92 | 100 | 40 | 100 | 100 | 56 |
5 | Nightstand dethcaps | 92 | 100 | 32 | 100 | 100 | 69 |
6 | Nightstand YNG Martyr YNG ONE | 91 | 100 | 26 | 100 | 100 | 51 |
7 | Nightstand Dev Lemons | 91 | 100 | 20 | 100 | 100 | 65 |
8 | Nightstand K. Michelle | 91 | 100 | 16 | 100 | 100 | 65 |
3 | One Night Standards Ashley McBryde | 14 | 0 | 51 | 60 | 45 | 45 |
9 | Night Stalkers Megadeth ICE-T | 11 | 0 | 13 | 80 | 41 | 41 |
Search results for "nightstand kxllswxtch"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
0 | NIGHTSTAND Kxllswxtch | 100 | 100 | 100 | 100 | 100 | 100 |
2 | Nightstand Justus Bennetts | 54 | 50 | 64 | 73 | 65 | 55 |
6 | Nightstand Dev Lemons | 51 | 50 | 26 | 73 | 65 | 62 |
1 | Disaster KSLV Noh | 12 | 0 | 80 | 31 | 26 | 26 |
3 | Final Stage KSLV Noh | 12 | 0 | 51 | 42 | 39 | 39 |
4 | Nightstalker RBX Krayzie Bone | 12 | 0 | 40 | 48 | 40 | 40 |
5 | Psycho KSLV Noh | 10 | 0 | 32 | 40 | 33 | 33 |
9 | Blindspot LXST CXNTURY | 9 | 0 | 13 | 38 | 37 | 37 |
8 | Walk In Lxzt | 8 | 0 | 16 | 35 | 30 | 30 |
7 | Disaster - Slowed + Reverb KSLV Noh | 7 | 0 | 20 | 29 | 27 | 27 |
Search results for "mona lisa"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
1 | Mona Lisa Dominic Fike | 95 | 100 | 80 | 100 | 100 | 58 |
0 | Mona Lisa (Spider-Man: Across the Spi... | 94 | 100 | 100 | 78 | 100 | 27 |
3 | mona lisa mxmtoon | 94 | 100 | 51 | 100 | 100 | 69 |
2 | Mona Lisa (feat. Kendrick Lamar) Lil ... | 92 | 100 | 64 | 100 | 100 | 29 |
4 | Mona Lisa Brentrambo LUCKI | 91 | 100 | 40 | 80 | 100 | 51 |
6 | Mona Lisa, Mona Lisa FINNEAS | 91 | 100 | 26 | 100 | 100 | 50 |
5 | The Ballad of Mona Lisa Panic! At The... | 90 | 100 | 32 | 100 | 100 | 35 |
8 | Mona Lisa Nat King Cole | 90 | 100 | 16 | 100 | 100 | 56 |
9 | Mona Lisa ONLY1 THEORY | 90 | 100 | 13 | 100 | 100 | 58 |
7 | Mona Lisas And Mad Hatters Elton John | 49 | 50 | 20 | 78 | 62 | 39 |
Search results for "mona lisa dominic fike"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
1 | Mona Lisa Dominic Fike | 98 | 100 | 80 | 100 | 100 | 100 |
0 | Mona Lisa (Spider-Man: Across the Spi... | 97 | 100 | 100 | 91 | 100 | 56 |
2 | Bodies Dominic Fike | 55 | 50 | 64 | 77 | 77 | 59 |
4 | Mama's Boy Dominic Fike | 54 | 50 | 40 | 78 | 71 | 71 |
3 | Mona Lisa (feat. Kendrick Lamar) Lil ... | 51 | 50 | 51 | 59 | 58 | 45 |
8 | Phone Numbers Dominic Fike Kenny Beats | 51 | 50 | 16 | 68 | 71 | 57 |
6 | Think Fast (feat. Weezer) Dominic Fik... | 50 | 50 | 26 | 63 | 71 | 47 |
9 | mona lisa mxmtoon | 49 | 50 | 13 | 69 | 69 | 46 |
7 | Mona Lisa val Peter Fenn Tray Haggerty | 48 | 50 | 20 | 59 | 58 | 43 |
5 | Monalisa Lojay Sarz Chris Brown | 12 | 0 | 32 | 55 | 45 | 45 |
Search results for "forever sophie"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
0 | Forever - SOPHIE Remix FLETCHER SOPHIE | 96 | 100 | 100 | 73 | 100 | 56 |
3 | Forever in your arms Sophie Nichols | 92 | 100 | 51 | 64 | 100 | 57 |
5 | Forever Yours Sophie Deas | 92 | 100 | 32 | 100 | 100 | 72 |
8 | I Want A Mom (That Will Last Forever)... | 87 | 100 | 16 | 64 | 100 | 31 |
1 | Darkness Forever - Sophie's Version S... | 59 | 50 | 80 | 71 | 100 | 47 |
2 | Faceshopping SOPHIE | 53 | 50 | 64 | 67 | 60 | 55 |
4 | Sophie Bear's Den | 51 | 50 | 40 | 67 | 60 | 58 |
6 | Darkness Forever Soccer Mommy | 50 | 50 | 26 | 64 | 67 | 51 |
7 | Infatuation SOPHIE | 50 | 50 | 20 | 70 | 60 | 56 |
9 | Not Okay - Alone Mix SOPHIE | 48 | 50 | 13 | 67 | 60 | 46 |
Search results for "forever sophie remix"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
0 | Forever - SOPHIE Remix FLETCHER SOPHIE | 98 | 100 | 100 | 100 | 100 | 71 |
1 | Forever - SOPHIE Remix FLETCHER SOPHIE | 96 | 100 | 80 | 100 | 100 | 71 |
3 | Forever - SOPHIE Remix FLETCHER SOPHIE | 94 | 100 | 51 | 100 | 100 | 71 |
5 | Forever - SOPHIE Remix FLETCHER SOPHIE | 92 | 100 | 32 | 100 | 100 | 71 |
7 | Forever - SOPHIE Remix FLETCHER SOPHIE | 91 | 100 | 20 | 100 | 100 | 71 |
9 | Forever - SOPHIE Remix FLETCHER SOPHIE | 91 | 100 | 13 | 100 | 100 | 71 |
4 | Forever (Solomun Remix) Weval Solomun | 66 | 66 | 40 | 89 | 79 | 62 |
2 | Forever Wavey | 43 | 33 | 64 | 78 | 70 | 55 |
6 | Forever Trit95 | 40 | 33 | 26 | 78 | 67 | 59 |
8 | forever - Slowed & Reverb L0WS | 38 | 33 | 16 | 59 | 61 | 61 |
Search results for "she charles"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
1 | She Charles Aznavour Bryan Ferry | 94 | 100 | 80 | 82 | 100 | 51 |
2 | She - Tous les visages de l’amour Cha... | 91 | 100 | 64 | 82 | 100 | 25 |
4 | She (Tous Les Visages De L’Amour) Cha... | 90 | 100 | 40 | 82 | 100 | 25 |
0 | She Cares Styx | 59 | 50 | 100 | 90 | 72 | 72 |
7 | Jesse Charles Wesley Godwin | 52 | 50 | 20 | 84 | 78 | 53 |
3 | She Cares Patrick Dorgan | 51 | 50 | 51 | 71 | 51 | 51 |
6 | She Dennis van Aarssen | 48 | 50 | 26 | 59 | 48 | 48 |
9 | She Elvis Costello | 48 | 50 | 13 | 63 | 55 | 55 |
5 | She Changes the Weather Swim Deep | 47 | 50 | 32 | 67 | 43 | 41 |
8 | She Changes Your Mind Copeland | 46 | 50 | 16 | 63 | 44 | 44 |
Search results for "she charles aznavour"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
0 | She (Tous Les Visages De L’Amour) Cha... | 96 | 100 | 100 | 90 | 100 | 41 |
1 | She Charles Aznavour Bryan Ferry | 95 | 100 | 80 | 70 | 100 | 77 |
2 | She - Tous les visages de l’amour Cha... | 93 | 100 | 64 | 90 | 100 | 41 |
5 | She Charles Aznavour Herbert Kretzmer... | 92 | 100 | 32 | 92 | 100 | 62 |
8 | She Engelbert Humperdinck Charles Azn... | 91 | 100 | 16 | 95 | 100 | 65 |
6 | She - She / English Version 1 - Theme... | 89 | 100 | 26 | 90 | 100 | 37 |
4 | Emmenez-moi Charles Aznavour | 68 | 66 | 40 | 95 | 89 | 75 |
7 | Venecia sin ti - Que c'est triste Ven... | 65 | 66 | 20 | 90 | 89 | 51 |
9 | Mes emmerdes - Remastered 2014 Charle... | 65 | 66 | 13 | 90 | 89 | 58 |
3 | She Dennis van Aarssen | 38 | 33 | 51 | 53 | 48 | 48 |
Search results for "run away with me"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
0 | Run Away With Me Carly Rae Jepsen | 97 | 100 | 100 | 81 | 100 | 65 |
1 | Run Away With Me Carly Rae Jepsen | 95 | 100 | 80 | 81 | 100 | 65 |
2 | Run Away With Me Cold War Kids | 94 | 100 | 64 | 75 | 100 | 70 |
3 | Run away with me Mingginyu | 93 | 100 | 51 | 74 | 100 | 76 |
4 | Run Away With Me Ben Fankhauser | 92 | 100 | 40 | 86 | 100 | 68 |
6 | Run Away with Me Michael Arden | 91 | 100 | 26 | 79 | 100 | 70 |
8 | Run Away With Me Paradise Blossom | 90 | 100 | 16 | 77 | 100 | 65 |
9 | Run Away with Me - Live Aaron Tveit | 89 | 100 | 13 | 69 | 100 | 65 |
5 | Run Away with You Big & Rich | 72 | 75 | 32 | 62 | 90 | 67 |
7 | Run Away With You Logan Crosby | 71 | 75 | 20 | 69 | 90 | 61 |
Search results for "run away with me carly rae jepsen"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
0 | Run Away With Me Carly Rae Jepsen | 100 | 100 | 100 | 100 | 100 | 100 |
1 | Run Away With Me Carly Rae Jepsen | 98 | 100 | 80 | 100 | 100 | 100 |
2 | Run Away With Me - ASTR Remix Carly R... | 95 | 100 | 64 | 82 | 100 | 86 |
4 | Run Away With Me - Y2K Remix Carly Ra... | 94 | 100 | 40 | 88 | 100 | 87 |
9 | Run Away With Me Carly Rae Jepsen | 93 | 100 | 13 | 100 | 100 | 100 |
6 | Run Away With Me - Patrick Stump Remi... | 91 | 100 | 26 | 81 | 100 | 66 |
8 | Run Away With Me - Cyril Hahn Remix C... | 91 | 100 | 16 | 74 | 100 | 80 |
3 | Kollage Carly Rae Jepsen | 50 | 42 | 51 | 75 | 80 | 63 |
5 | Kamikaze Carly Rae Jepsen | 49 | 42 | 32 | 78 | 78 | 66 |
7 | The Loneliest Time (feat. Rufus Wainw... | 44 | 42 | 20 | 61 | 65 | 49 |
Search results for "the dress"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
0 | The Dress Dijon | 98 | 100 | 100 | 100 | 100 | 75 |
1 | The Dress Looks Nice on You Sufjan St... | 93 | 100 | 80 | 80 | 100 | 35 |
3 | The Dress 猫 シ Corp. luxury elite | 91 | 100 | 51 | 78 | 100 | 45 |
5 | The Dress The Dudes | 91 | 100 | 32 | 80 | 100 | 64 |
7 | The Dress Alan Menken | 90 | 100 | 20 | 78 | 100 | 60 |
9 | The Dress Blonde Redhead | 89 | 100 | 13 | 78 | 100 | 55 |
8 | The Dress That My Mother Wore - Demo ... | 87 | 100 | 16 | 67 | 100 | 22 |
2 | Girl Anachronism The Dresden Dolls | 51 | 50 | 64 | 67 | 50 | 37 |
6 | Dress Taylor Swift | 51 | 50 | 26 | 80 | 71 | 52 |
4 | Skin Dijon | 8 | 0 | 40 | 27 | 21 | 21 |
Search results for "the dress dijon"
original index | candidate (title + artist) | weighted | difflib | spotify rank | partial token sort ratio | token set ratio | token sort ratio |
---|---|---|---|---|---|---|---|
0 | The Dress Dijon | 100 | 100 | 100 | 100 | 100 | 100 |
6 | The Dress Alan Menken | 63 | 66 | 26 | 64 | 75 | 61 |
9 | The Dress Blonde Redhead | 63 | 66 | 13 | 67 | 75 | 62 |
8 | The Dress The Dudes | 62 | 66 | 16 | 60 | 75 | 53 |
4 | jesse Dijon | 43 | 33 | 40 | 82 | 77 | 77 |
1 | Dijon Antoine Stavelot | 41 | 33 | 80 | 62 | 54 | 49 |
2 | Dijon Angel 'Pocho' Gatti Orchestra | 39 | 33 | 64 | 47 | 50 | 46 |
3 | Dress Down Kaoru Akimoto | 39 | 33 | 51 | 67 | 51 | 51 |
5 | Magic Loop DJDS Dijon | 37 | 33 | 32 | 69 | 50 | 50 |
7 | Spanish Ladies The Dreadnoughts | 35 | 33 | 20 | 55 | 48 | 48 |