new analyzer and tokenizer ik_max_word_char #854

Open: wants to merge 1 commit into base: master

@hamo commented Jan 18, 2021

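Adds a new tokenization strategy, max_word_char. Under this strategy, every single character is also emitted as a separate token, which fixes the problem of single characters not being recalled (matched) after segmentation.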
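To show why the extra single-character tokens matter for recall, here is a minimal Python sketch of an inverted index. It is not the plugin's code. The word-only list is assumed to be what the existing max-word strategy yields for 中华人民共和国 (it reuses the CN_WORD tokens from the sample output in this thread), and `build_index` is a hypothetical helper.

```python
# Illustrative sketch only, not IK code: compare recall of a one-character
# query against an index built from word tokens vs. words plus single chars.

def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, tokens in docs.items():
        for tok in tokens:
            index.setdefault(tok, set()).add(doc_id)
    return index

# Assumed word-only segmentation of 中华人民共和国 (CN_WORD tokens only).
words_only = ["中华人民共和国", "中华人民", "中华", "华人",
              "人民共和国", "人民", "共和国", "共和"]
# The max_word_char idea: the same words plus every single character.
with_chars = words_only + list("中华人民共和国")

idx_words = build_index({1: words_only})
idx_chars = build_index({1: with_chars})

query = "国"  # a one-character query
print(idx_words.get(query, set()))  # set(): the document is missed
print(idx_chars.get(query, set()))  # {1}: the document is recalled
```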

@frsweety

Great, I was just looking for a feature like this.

@hamo (Author) commented Jan 18, 2021

{
"tokens" : [
{
"token" : "中华人民共和国",
"start_offset" : 0,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "中华人民",
"start_offset" : 0,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "中华",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "中",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 3
},
{
"token" : "华人",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "华",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 5
},
{
"token" : "人民共和国",
"start_offset" : 2,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "人民",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 7
},
{
"token" : "人",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_CHAR",
"position" : 8
},
{
"token" : "民",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_CHAR",
"position" : 9
},
{
"token" : "共和国",
"start_offset" : 4,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 10
},
{
"token" : "共和",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 11
},
{
"token" : "共",
"start_offset" : 4,
"end_offset" : 5,
"type" : "CN_CHAR",
"position" : 12
},
{
"token" : "和",
"start_offset" : 5,
"end_offset" : 6,
"type" : "CN_CHAR",
"position" : 13
},
{
"token" : "国",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 14
}
]
}
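The token order above follows a simple pattern: tokens are sorted by start offset, longer tokens come first at each offset, and every character also appears as a CN_CHAR token. The Python sketch below reproduces that ordering under loud assumptions. It is NOT the IK implementation (IK uses its own dictionary segmenter and ambiguity handling); `max_word_char`, `Token`, and the toy dictionary are hypothetical, chosen only so the output matches the sample above.

```python
# Hypothetical sketch of a max_word_char-style tokenizer, NOT the IK code.
# It brute-forces every dictionary word in the text, adds every single
# character, then orders tokens by start offset and, within one offset,
# longest first, which matches the sample output in this thread.
from typing import NamedTuple


class Token(NamedTuple):
    token: str
    start_offset: int
    end_offset: int
    type: str
    position: int


def max_word_char(text, dictionary):
    spans = set()
    max_len = max(map(len, dictionary), default=1)
    for start in range(len(text)):
        # Every single character becomes its own token.
        spans.add((start, start + 1))
        # Every dictionary word (2+ chars) starting here becomes a token.
        for end in range(start + 2, min(len(text), start + max_len) + 1):
            if text[start:end] in dictionary:
                spans.add((start, end))
    ordered = sorted(spans, key=lambda s: (s[0], -(s[1] - s[0])))
    return [
        Token(text[s:e], s, e, "CN_CHAR" if e - s == 1 else "CN_WORD", pos)
        for pos, (s, e) in enumerate(ordered)
    ]


# Toy dictionary (an assumption), limited to the words in the sample.
dictionary = {"中华人民共和国", "中华人民", "中华", "华人",
              "人民共和国", "人民", "共和国", "共和"}
tokens = max_word_char("中华人民共和国", dictionary)
for t in tokens:
    print(t.token, t.start_offset, t.end_offset, t.type, t.position)
```

Sorting on `(start, -length)` is what interleaves the single characters with the words; a real tokenizer would stream tokens instead of collecting and sorting them.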

@frsweety

Where can I download this branch?

@hamo (Author) commented Jan 19, 2021

https://github.com/hamo/elasticsearch-analysis-ik/tree/ik_max_word_char

@frsweety

Thanks.

@frsweety commented Feb 3, 2021

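Hi, do you have a prebuilt 7.10.1 version available for download? I'm a .NET developer, so building it myself isn't very convenient. If it's no trouble, could you send me a 7.10.1 build?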

@hamo (Author) commented Feb 3, 2021

Leave your email address and I'll send it to you.

@frsweety commented Feb 4, 2021

[email protected] Thanks a lot.
