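For example, with “哈尔滨市”, both Complex and Simple modes produce only the single token “哈尔滨市” instead of both “哈尔滨市” and “哈尔滨”, while MaxWord splits it into “哈”, “尔”, “滨”, “市”. How can this be solved? Thanks to the author.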
As a result, when a user searches for “哈尔滨”, articles that contain “哈尔滨市” do not show up in the results.
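@liuxm6 I spent some time studying the dictionary loading logic. When a custom dicPath is not loaded correctly (or no dicPath is defined), the three default dictionaries are loaded: chars.dic, units.dic, and words.dic. They live inside the mmseg4j-core-1.10.2.jar file: mmseg4j-core-1.10.2.jar!\data*. words.dic already defines "哈尔滨市" (line 35887) as a complete word, and line 35884 defines “哈尔滨” as a complete word as well, which is why you see the behavior you describe.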
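To illustrate why the presence of both entries matters, here is a minimal sketch of greedy forward maximum matching. It is not mmseg4j's actual code: the class `MaxMatchDemo`, the method `segment`, and the two-word dictionary are hypothetical and only show how a longest-match segmenter prefers “哈尔滨市” whenever that entry exists.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; a simplified stand-in for longest-match segmentation.
public class MaxMatchDemo {

    // At each position, take the longest dictionary word of up to maxLen chars;
    // fall back to a single character when nothing matches.
    static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            String match = text.substring(i, i + 1); // single-char fallback
            for (int j = end; j > i + 1; j--) {
                String candidate = text.substring(i, j);
                if (dict.contains(candidate)) {
                    match = candidate;
                    break; // longest candidate wins
                }
            }
            tokens.add(match);
            i += match.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>();
        dict.add("哈尔滨");
        dict.add("哈尔滨市");
        System.out.println(segment("哈尔滨市", dict, 4)); // [哈尔滨市]

        dict.remove("哈尔滨市"); // simulate deleting the entry from words.dic
        System.out.println(segment("哈尔滨市", dict, 4)); // [哈尔滨, 市]
    }
}
```

With the longer entry gone, the segmenter falls back to “哈尔滨” plus “市”, so a search for “哈尔滨” can match.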
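Fix: download the mmseg4j-core source and delete line 35887 from words.dic. Complex and Simple modes then segment the text correctly. Build and install the resulting jar yourself.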
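Since line numbers can shift between versions, removing the entry by its text may be more robust than deleting a fixed line. The following is a rough sketch, assuming words.dic is a UTF-8 file with one word per line. The class `DicEntryRemover`, the method `removeEntry`, and the file paths are hypothetical and not part of mmseg4j.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; assumes a UTF-8 dictionary with one word per line.
public class DicEntryRemover {

    // Copies the dictionary from in to out without lines equal to word.
    // Returns the number of lines that were dropped.
    static int removeEntry(Path in, Path out, String word) throws IOException {
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        List<String> kept = lines.stream()
                .filter(line -> !line.trim().equals(word))
                .collect(Collectors.toList());
        Files.write(out, kept, StandardCharsets.UTF_8);
        return lines.size() - kept.size();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths; point these at your own copy of the dictionary.
        Path in = Paths.get(args.length > 0 ? args[0] : "words.dic");
        Path out = Paths.get(args.length > 1 ? args[1] : "words.patched.dic");
        int removed = removeEntry(in, out, "哈尔滨市");
        System.out.println("Removed " + removed + " entries");
    }
}
```

The patched file still has to be packaged back into the jar, or loaded through a custom dicPath, for the change to take effect.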
To make debugging easier, I am providing a jar with this change already applied: link