Commit

Sort candidate items before getting similarity scores for consistent results

Some of the snapshots here have changed, but the fuzzy match scores for these items have not. What changed is the order of the candidate items passed into the similarity scorer once group-by metrics were added to the list of candidate items. Items with tied scores are returned in the order they were passed in, so the new input order reshuffled the ties in these snapshots. Sort the inputs to ensure consistent results in the future.
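The tie behavior described above can be illustrated with a minimal sketch. It does not use rapidfuzz: `toy_score` and `top_matches` are hypothetical stand-ins, and the ranking relies on the stability of Python's built-in `sorted()` to mimic "ties come back in input order". The point is only that sorting the candidates first makes the result independent of how the caller ordered them.

```python
from typing import Iterable, List


def toy_score(query: str, candidate: str) -> int:
    """Hypothetical scorer (not rapidfuzz): number of distinct shared characters."""
    return len(set(query) & set(candidate))


def top_matches(
    query: str, candidates: Iterable[str], limit: int, sort_first: bool
) -> List[str]:
    """Return the `limit` best candidates, optionally sorting the inputs first."""
    items = sorted(candidates) if sort_first else list(candidates)
    # sorted() is stable, so candidates with equal scores keep their
    # relative order from `items`, even with reverse=True.
    ranked = sorted(items, key=lambda c: toy_score(query, c), reverse=True)
    return ranked[:limit]


# Same candidates, two different input orders; all three tie on score.
order_a = ["cab", "bca", "abc"]
order_b = ["abc", "cab", "bca"]

unsorted_a = top_matches("abc", order_a, limit=2, sort_first=False)
unsorted_b = top_matches("abc", order_b, limit=2, sort_first=False)
sorted_a = top_matches("abc", order_a, limit=2, sort_first=True)
sorted_b = top_matches("abc", order_b, limit=2, sort_first=True)
```

Without the initial sort, the two input orders yield different top-2 lists even though every score is identical; with it, both calls return the same list.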
courtneyholcomb committed May 13, 2024
1 parent c838681 commit 223b0c1
Showing 5 changed files with 9 additions and 5 deletions.
@@ -23,6 +23,10 @@ def top_fuzzy_matches(
Return scores from -1 -> 0 inclusive.
"""
+ # In the case of a tie in score, items will be returned in the order they were passed in.
+ # Sort candidate item inputs first for consistent results.
+ sorted_candidate_items = sorted(candidate_items)

scored_items = []

# Rank choices by edit distance score.
@@ -31,7 +35,7 @@ def top_fuzzy_matches(
rapidfuzz.process.extract(
# This scorer seems to return the best results.
item,
- list(candidate_items),
+ sorted_candidate_items,
limit=max_matches,
scorer=rapidfuzz.fuzz.token_set_ratio,
),
@@ -16,8 +16,8 @@ Error #1:
[
"Dimension('listing__capacity_latest')",
"TimeDimension('listing__created_at', 'day')",
- "TimeDimension('listing__ds', 'day')",
"Dimension('listing__is_lux_latest')",
+ "TimeDimension('listing__ds', 'day')",
"TimeDimension('user__created_at', 'day')",
"TimeDimension('user__ds_latest', 'day')",
]
@@ -16,8 +16,8 @@ Error #1:
[
"Dimension('listing__capacity_latest')",
"TimeDimension('listing__created_at', 'day')",
- "TimeDimension('listing__ds', 'day')",
"Dimension('listing__is_lux_latest')",
+ "TimeDimension('listing__ds', 'day')",
"Dimension('listing__country_latest')",
"TimeDimension('user__created_at', 'day')",
]
@@ -6,7 +6,7 @@ Error #1:
The given input does not exactly match any known metrics.

Suggestions:
- ['bookings', 'booking_fees', 'booking_value', 'instant_bookings', 'booking_payments', 'max_booking_value']
+ ['bookings', 'booking_fees', 'booking_value', 'booking_payments', 'instant_bookings', 'booking_value_p99']

Query Input:

@@ -19,7 +19,7 @@ Error #1:
'listing__lux_listing',
'listing__is_lux_latest',
'listing__country_latest',
- 'listing__created_at__day',
+ 'listing__capacity_latest',
]

Query Input: