-
-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"custom_street" synonyms should be applied to name.* and phrase.* indices #449
Conversation
Oh nice, yeah, this duplication has always been annoying for us, and has lead to much confusion for Pelias users. |
I'm happy to merge this now, but the thing is that the @Joxit 👍 or 👎 ? |
Yep, I'm ok with this 😄 this will help me for future builds ! |
Do we need this PR after #453? I don't think so, but just wanted to confirm. Feel free to close if it's no longer needed. |
oh... actually... the multiplexer still doesn't list |
I'd actually have preferred to delete I didn't drop support for I'm not going to do any further work on this topic but I'd be happy to review and merge a PR from a developer who is actively using custom synonyms and can test it out on their setup. |
…nd phrase.* indices
d4fceaf
to
f14061f
Compare
This issue came up via a client today, I've modernized the code as per #449 (comment) and rebased it against In addition to including I'd like to solicit some feedback before merging this, I suspect it will not have any issues besides the pervasive 'viral synonyms' issue related to multi-word synonyms that we inherited from Lucene and have always had to deal with. A good example of where this is relevant is "MLK" vs "Martin Luther King" vs "Martin Luther King Jr" which is a common synonym in the United States but also unique enough in a global context to possibly not include in the core Pelias synonyms files. |
@Joxit this got put on hold when I was doing the recent synonyms work but is open for testing and merging now. It's slightly different what what you tested last time because I've rebased it against master to bring in those changes. We don't use custom synonyms so any feedback you could provide would be helpful to confirm it's good to go. I was hoping that it would be able to replace Joxit@514fb02 |
In fact, since #453 includes all the synonyms I need, I am no longer using my fork (I did a full build yesterday). I might need the custom street again someday 🤷♂️ |
I confirmed this works as expected by adding the following line to
And then running an elyzer --index pelias --analyzer peliasPhrase 'MLK Boulevard'
CHAR_FILTER: punctuation
{0:MLK Boulevard}
CHAR_FILTER: nfkc_normalizer
{0:MLK Boulevard}
TOKENIZER: peliasTokenizer
{0:MLK} {1:Boulevard}
TOKEN_FILTER: lowercase
{0:mlk} {1:boulevard}
TOKEN_FILTER: trim
{0:mlk} {1:boulevard}
TOKEN_FILTER: remove_duplicate_spaces
{0:mlk} {1:boulevard}
TOKEN_FILTER: synonyms/custom_name/multiword
{0:mlk} {1:boulevard}
TOKEN_FILTER: synonyms/custom_street/multiword
{0:mlk,martin,ml,m} {1:boulevard,luther,king,l} {2:king,king}
TOKEN_FILTER: synonyms/custom_admin/multiword
{0:mlk,martin,ml,m} {1:boulevard,luther,king,l} {2:king,king}
TOKEN_FILTER: name_synonyms_multiplexer
{0:mlk,martin,ml,mall,mill,m} {1:boulevard,bd,blvd,bde,blv,bl,blvde,blvrd,boulavard,boul,boulv,bvd,boulevarde,bld,luther,king,kg,l} {2:king,kg}
TOKEN_FILTER: icu_folding
{0:mlk,martin,ml,mall,mill,m} {1:boulevard,bd,blvd,bde,blv,bl,blvde,blvrd,boulavard,boul,boulv,bvd,boulevarde,bld,luther,king,kg,l} {2:king,kg}
TOKEN_FILTER: remove_ordinals
{0:mlk,martin,ml,mall,mill,m} {1:boulevard,bd,blvd,bde,blv,bl,blvde,blvrd,boulavard,boul,boulv,bvd,boulevarde,bld,luther,king,kg,l} {2:king,kg}
TOKEN_FILTER: unique_only_same_position
{0:mlk,martin,ml,mall,mill,m} {1:boulevard,bd,blvd,bde,blv,bl,blvde,blvrd,boulavard,boul,boulv,bvd,boulevarde,bld,luther,king,kg,l} {2:king,kg}
TOKEN_FILTER: notnull
{0:mlk,martin,ml,mall,mill,m} {1:boulevard,bd,blvd,bde,blv,bl,blvde,blvrd,boulavard,boul,boulv,bvd,boulevarde,bld,luther,king,kg,l} {2:king,kg}
TOKEN_FILTER: flatten_graph
{0:mlk,martin,ml,mall,mill,m} {1:boulevard,bd,blvd,bde,blv,bl,blvde,blvrd,boulavard,boul,boulv,bvd,boulevarde,bld,luther,king,kg,l} {2:king,kg} |
The above also highlights the issue with multi-word synonyms where |
The inverse is true for when the data provider gives us elyzer --index pelias --analyzer peliasPhrase 'Martin Luther King Boulevard'
CHAR_FILTER: punctuation
{0:Martin Luther King Boulevard}
CHAR_FILTER: nfkc_normalizer
{0:Martin Luther King Boulevard}
TOKENIZER: peliasTokenizer
{0:Martin} {1:Luther} {2:King} {3:Boulevard}
TOKEN_FILTER: lowercase
{0:martin} {1:luther} {2:king} {3:boulevard}
TOKEN_FILTER: trim
{0:martin} {1:luther} {2:king} {3:boulevard}
TOKEN_FILTER: remove_duplicate_spaces
{0:martin} {1:luther} {2:king} {3:boulevard}
TOKEN_FILTER: synonyms/custom_name/multiword
{0:martin} {1:luther} {2:king} {3:boulevard}
TOKEN_FILTER: synonyms/custom_street/multiword
{0:martin,ml,m,mlk} {1:luther,king,l} {2:king,king} {3:boulevard}
TOKEN_FILTER: synonyms/custom_admin/multiword
{0:martin,ml,m,mlk} {1:luther,king,l} {2:king,king} {3:boulevard}
TOKEN_FILTER: name_synonyms_multiplexer
{0:martin,ml,mall,mill,m,mlk} {1:luther,king,kg,l} {2:king,kg} {3:boulevard,bd,blvd,bde,blv,bl,blvde,blvrd,boulavard,boul,boulv,bvd,boulevarde,bld}
TOKEN_FILTER: icu_folding
{0:martin,ml,mall,mill,m,mlk} {1:luther,king,kg,l} {2:king,kg} {3:boulevard,bd,blvd,bde,blv,bl,blvde,blvrd,boulavard,boul,boulv,bvd,boulevarde,bld}
TOKEN_FILTER: remove_ordinals
{0:martin,ml,mall,mill,m,mlk} {1:luther,king,kg,l} {2:king,kg} {3:boulevard,bd,blvd,bde,blv,bl,blvde,blvrd,boulavard,boul,boulv,bvd,boulevarde,bld}
TOKEN_FILTER: unique_only_same_position
{0:martin,ml,mall,mill,m,mlk} {1:luther,king,kg,l} {2:king,kg} {3:boulevard,bd,blvd,bde,blv,bl,blvde,blvrd,boulavard,boul,boulv,bvd,boulevarde,bld}
TOKEN_FILTER: notnull
{0:martin,ml,mall,mill,m,mlk} {1:luther,king,kg,l} {2:king,kg} {3:boulevard,bd,blvd,bde,blv,bl,blvde,blvrd,boulavard,boul,boulv,bvd,boulevarde,bld}
TOKEN_FILTER: flatten_graph
{0:martin,ml,mall,mill,m,mlk} {1:luther,king,kg,l} {2:king,kg} {3:boulevard,bd,blvd,bde,blv,bl,blvde,blvrd,boulavard,boul,boulv,bvd,boulevarde,bld} In this case |
As mentioned in #446 (comment)
If developers extend the synonyms lists by editing
synonyms/custom_street.txt
the synonyms are only being applied to theaddress_parts.street
field and not toname.*
&phrase.*
indices.The problem with this is that we use the
name.*
&phrase.*
indices as top-levelMUST
conditions in most of our queries so adding custom street synonyms has very little effect.The current workaround is for developer to manually add the synonyms filter to the indices as per:
Joxit@514fb02
I think this was an oversight at the time and worth fixing.