rephrase one paragraph based on feedback, minor formatting changes
Signed-off-by: wrigleyDan <[email protected]>
wrigleyDan committed Dec 17, 2024
1 parent 2f9f403 commit 62a5ee7
Showing 1 changed file with 7 additions and 8 deletions.
15 changes: 7 additions & 8 deletions _posts/2024-12-xx-hybrid-search-optimization.md
@@ -116,19 +116,18 @@ These are the results running the test set of both query sets independently:
| NDCG@10 | 0.24 | 0.23 |
| Precision@10 | 0.27 | 0.24 |

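The NDCG@10 and Precision@10 figures above can be reproduced from graded relevance judgments per query. A minimal sketch in plain Python (the judgment values are hypothetical, not the post's actual data):

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain over the top-k results.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of an ideally ordered result list.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def precision_at_k(relevances, k=10, threshold=1):
    # Fraction of the top-k results judged relevant.
    return sum(1 for rel in relevances[:k] if rel >= threshold) / k

# Hypothetical graded judgments for one query's top-10 results
judgments = [3, 2, 0, 1, 0, 2, 0, 0, 1, 0]
print(round(ndcg_at_k(judgments), 2))  # → 0.94
print(precision_at_k(judgments))       # → 0.5
```

The reported metrics are then the mean of these per-query values over all queries in the test set.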
-We applied an 80/20 split on the query sets to have a training and test dataset for the upcoming optimization steps. For the baseline we used the test set to calculate the search metrics. Every optimization step uses the 80% training part of the query and the 20% test part for calculating and comparing the search metrics.
+We applied an 80/20 split on the query sets to obtain a training and a test dataset. Every optimization step uses the queries of the training set, while search metrics are calculated and compared on the test set. For the baseline, we calculated the metrics on the test set only, since no actual training takes place.
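The 80/20 split can be sketched as follows (plain Python; the query list and random seed are illustrative assumptions):

```python
import random

def train_test_split(queries, train_fraction=0.8, seed=42):
    # Shuffle a copy so the split is random but reproducible.
    shuffled = list(queries)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical query set
queries = [f"query_{i}" for i in range(100)]
train, test = train_test_split(queries)
print(len(train), len(test))  # → 80 20
```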

These numbers are now the starting point for our optimization journey. We want to maximize these metrics and see how far we get when looking for the best global hybrid search configuration in the next step.

## Identifying the best hybrid search configuration

With that starting point, we can set off to explore the parameter space that hybrid search offers. Our global hybrid search optimization notebook tries out 66 parameter combinations drawn from the following sets:

-* Normalization technique: [l2, min_max]
-* Combination technique: [arithmetic_mean, harmonic_mean, geometric_mean]
-* Keyword search weight: [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
-* Neural search weight: [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
+* Normalization technique: [`l2`, `min_max`]
+* Combination technique: [`arithmetic_mean`, `harmonic_mean`, `geometric_mean`]
+* Keyword search weight: [`0.0`, `0.1`, `0.2`, `0.3`, `0.4`, `0.5`, `0.6`, `0.7`, `0.8`, `0.9`, `1.0`]
+* Neural search weight: [`1.0`, `0.9`, `0.8`, `0.7`, `0.6`, `0.5`, `0.4`, `0.3`, `0.2`, `0.1`, `0.0`]

Neural and keyword search weights always add up to 1.0, so a keyword search weight of 0.1 automatically comes with a neural search weight of 0.9, a keyword search weight of 0.2 comes with a neural search weight of 0.8, and so on.
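The 66 combinations follow from 2 normalization techniques × 3 combination techniques × 11 complementary weight pairs. A minimal sketch enumerating the grid (plain Python; variable names are illustrative):

```python
from itertools import product

normalizations = ["l2", "min_max"]
combinations = ["arithmetic_mean", "harmonic_mean", "geometric_mean"]
# Keyword weight runs 0.0..1.0 in steps of 0.1; the neural weight is its complement.
keyword_weights = [round(i / 10, 1) for i in range(11)]

grid = [
    (norm, combi, kw, round(1.0 - kw, 1))
    for norm, combi, kw in product(normalizations, combinations, keyword_weights)
]
print(len(grid))  # → 66
```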

Check warning on line 132 in _posts/2024-12-xx-hybrid-search-optimization.md (GitHub Actions / style-job): [vale] reported by reviewdog 🐶 [OpenSearch.LatinismsElimination] Using 'etc.' is unnecessary. Remove.

@@ -172,9 +171,9 @@ Here is a template of the temporary search pipelines we use for our hybrid search
}
```

-norm is the variable for the normalization technique, combi the variable for the combination technique, keywordness is the keyword search weight and neuralness is the neural search weight.
+`norm` is the variable for the normalization technique, `combi` the variable for the combination technique, `keywordness` is the keyword search weight, and `neuralness` is the neural search weight.
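A pipeline body of this shape can be built programmatically before each evaluation run. A minimal sketch using OpenSearch's `normalization-processor` layout (the description string and the concrete parameter values are illustrative assumptions):

```python
import json

def build_pipeline_body(norm, combi, keywordness, neuralness):
    # Fill the template: normalization technique, combination technique,
    # and the [keyword, neural] weight pair.
    return {
        "description": "Temporary search pipeline for hybrid search evaluation",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": norm},
                    "combination": {
                        "technique": combi,
                        "parameters": {"weights": [keywordness, neuralness]},
                    },
                }
            }
        ],
    }

body = build_pipeline_body("min_max", "arithmetic_mean", 0.3, 0.7)
print(json.dumps(body, indent=2))
```

Each of the 66 parameter combinations yields one such body, which is then sent to the `_search/pipeline` API as a temporary pipeline.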

-The neural part of the hybrid query is searching in a field with embeddings that were created based on the title of a product with the model all-MiniLM-L6-v2:
+The neural part of the hybrid query is searching in a field with embeddings that were created based on the title of a product with the model `all-MiniLM-L6-v2`:

```
{
  ...
}
```
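The query body above is cut off by the diff view. The neural clause of such a hybrid query can be sketched as follows (the field name, model ID placeholder, and `k` value are illustrative assumptions, not the post's actual values):

```python
import json

def build_neural_clause(query_text, field="title_embedding", model_id="<model-id>", k=50):
    # Neural query clause searching the embedding field; the embeddings
    # were generated from product titles with all-MiniLM-L6-v2.
    return {
        "neural": {
            field: {
                "query_text": query_text,
                "model_id": model_id,
                "k": k,
            }
        }
    }

print(json.dumps(build_neural_clause("laptop sleeve"), indent=2))
```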
