Update AnalyzeContext.java
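With ik_smart, 金力泰合同审批 is segmented as (金  力  泰  合同  审批), but with ik_max_word the result is (金  力  泰合  合同  审批  批). Because of this mismatch, searches for 金力泰 or 金力泰合同审批 return no hits. Reading the source shows that 泰 is not in the dictionary while 泰合 and 合同 are. During smart-mode disambiguation, 泰合 is dropped under the rule that gives the reverse (right-to-left) segmentation the higher probability, so 泰 is output as a single character. The fix is a check at output time: when the dictionary has no entry for the single character but lexemes conflict, also emit the single character(s) of the earlier of the two intersecting lexemes. This resolves the problem.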
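The check described above can be sketched in isolation as follows. This is a minimal illustration, not the code from the commit: `OverlapSingleCharSketch`, `Span`, and `emit` are hypothetical names, `Span` stands in for IK's `Lexeme`, and the three-lexeme path in `main` is an assumed, simplified one. The sketch also measures offsets from each lexeme's own begin position, whereas the diff uses the running `index`, so the two may behave differently on some inputs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class OverlapSingleCharSketch {

    /** Hypothetical stand-in for IK's Lexeme: covers [begin, begin + length). */
    static final class Span {
        final int begin;
        final int length;

        Span(int begin, int length) {
            this.begin = begin;
            this.length = length;
        }
    }

    /**
     * Emits every lexeme on the path. When the next lexeme starts inside the
     * current one, the character just before that start is emitted as well.
     */
    static List<String> emit(String text, Deque<Span> path) {
        List<String> out = new ArrayList<>();
        Span l = path.pollFirst();
        while (l != null) {
            out.add(text.substring(l.begin, l.begin + l.length));
            Span next = path.peekFirst();
            // Walk the positions inside the current lexeme, after its first character.
            for (int inner = l.begin + 1; inner < l.begin + l.length; inner++) {
                if (next != null && inner == next.begin) {
                    // Overlap found: output the single character preceding it.
                    out.add(text.substring(inner - 1, inner));
                }
            }
            l = path.pollFirst();
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed path for the tail of the example: 泰合 (2,2), 合同 (3,2), 审批 (5,2).
        Deque<Span> path = new ArrayDeque<>();
        path.add(new Span(2, 2));
        path.add(new Span(3, 2));
        path.add(new Span(5, 2));
        System.out.println(emit("金力泰合同审批", path)); // prints [泰合, 泰, 合同, 审批]
    }
}
```

With 泰 now present in the ik_max_word output, a query that ik_smart splits into 金, 力, 泰 has a matching term for each token.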
kepmov authored Nov 21, 2018
1 parent 9495315 commit 9873489
Showing 1 changed file with 9 additions and 0 deletions.
9 changes: 9 additions & 0 deletions src/main/java/org/wltea/analyzer/core/AnalyzeContext.java
@@ -267,6 +267,15 @@ void outputToResult(){
 				Lexeme l = path.pollFirst();
 				while(l != null){
 					this.results.add(l);
+					// No single-character entry in the dictionary, but lexemes conflict: emit the single character(s) of the earlier of the intersecting lexemes
+					int innerIndex = index + 1;
+					for (; innerIndex < index + l.getLength(); innerIndex++) {
+						Lexeme innerL = path.peekFirst();
+						if (innerL != null && innerIndex == innerL.getBegin()) {
+							this.outputSingleCJK(innerIndex - 1);
+						}
+					}
+
 					// Move index to just past the lexeme
 					index = l.getBegin() + l.getLength();
 					l = path.pollFirst();
