Summary:
There are some Chinese characters that are not detected when searching for titles.
Japanese also uses kanji. In this case, the Chinese title is “少女终末旅行” and the Japanese title is “少女終末旅行”; they differ in only one character (“终” vs. “終”). There are other examples like this, although they don't come up very often in practice, for example “伤物语” and “伤物語” (“语” vs. “語”).
There are many similar examples:
千年女優 and 千年女优, 未来日記 and 未来日记, 魔女の宅急便 and 魔女的宅急便, 狼と香辛料 and 狼与香辛料
These seem to search fine:
电波女与青春男 and 電波女と青春男, 血界戦線 and 血界战线, 虚構推理 and 虚构推理
Or 狼与香辛料 and 狼と香辛料, where only one Japanese character differs.
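To make the mismatch concrete, here is a minimal Python sketch that lists the positions where two equal-length titles differ. The helper name `diff_chars` is hypothetical and only for illustration; it shows that the simplified and traditional/Japanese forms are distinct Unicode code points, so an exact character match cannot treat them as equal.

```python
def diff_chars(a: str, b: str) -> list[tuple[str, str]]:
    """Return the (a_char, b_char) pairs at positions where the strings differ.

    Only meaningful for titles of equal length; zip() stops at the shorter one.
    """
    return [(x, y) for x, y in zip(a, b) if x != y]


if __name__ == "__main__":
    pairs = [
        ("少女终末旅行", "少女終末旅行"),
        ("伤物语", "伤物語"),
    ]
    for chinese, japanese in pairs:
        for x, y in diff_chars(chinese, japanese):
            # Print both characters with their code points to show they are distinct.
            print(f"{x} (U+{ord(x):04X}) vs {y} (U+{ord(y):04X})")
```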
For Chinese tokenization, Typesense needs a field-level indicator that the field contains Chinese characters. The problem at hand could be solved with language detection: if the title contains Chinese characters, it would go into a new Typesense field carrying the "ch" flag, which tells Typesense to use Chinese tokenization. As far as I know, that would fix this issue.
So we can try to fix it on our end, but I have no idea if this would work :worrythink: https://threads.typesense.org/2L156
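A rough sketch of the idea above, in Python, under several assumptions: the field name `title_zh` is hypothetical, the schema fragment uses a per-field `locale` setting with the ISO 639-1 code `zh` (the exact value Typesense expects should be checked against its documentation), and "language detection" is reduced to a simple Unicode-range heuristic. It is not a tested fix for this issue.

```python
import re

# CJK Unified Ideographs (basic block) and the hiragana/katakana blocks.
HAN = re.compile(r"[\u4e00-\u9fff]")
KANA = re.compile(r"[\u3040-\u30ff]")

# Hypothetical schema fragment for the extra field; "locale": "zh" is an assumption.
TITLE_ZH_FIELD = {"name": "title_zh", "type": "string", "optional": True, "locale": "zh"}


def contains_han(text: str) -> bool:
    return bool(HAN.search(text))


def contains_kana(text: str) -> bool:
    return bool(KANA.search(text))


def build_document(doc_id: str, title: str) -> dict:
    """Copy the title into the hypothetical title_zh field when it looks Chinese.

    Heuristic: Han characters present and no kana. A kanji-only Japanese
    title (e.g. 少女終末旅行) also passes this check, so it is only approximate.
    """
    doc = {"id": doc_id, "title": title}
    if contains_han(title) and not contains_kana(title):
        doc["title_zh"] = title
    return doc
```

With this heuristic, 狼与香辛料 would be copied into `title_zh`, while 狼と香辛料 (which contains kana) would not. Whether the different tokenization actually makes 终 match 終 is a separate question that this sketch does not answer.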
As reported here on Discord: https://discord.com/channels/460491088004907029/732509662280417360/1286222187304456256
Noted by kimigaiiwuyi on discord
Noted by @pushrbx
This is something that would require further triage. If anyone knows the best way to handle this, please feel free to contribute.