IK tokenization problem

Elasticsearch | Author: pengwei | Posted: 2017-05-05 | Views: 3420

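While using Elasticsearch, I found that Chinese tokenization is sometimes incorrect.

For example, the tokenization results for "莫西" and "莫西林" are as follows.

Tokenization result for "莫西":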
{
  "tokens": [
    {
      "token": "莫西",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "莫",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "西",
      "start_offset": 1,
      "end_offset": 2,
      "type": "CN_CHAR",
      "position": 2
    }
  ]
}

Tokenization result for "莫西林":
{
  "tokens": [
    {
      "token": "莫西",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "莫",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "西林",
      "start_offset": 1,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 2
    }
  ]
}
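
Output in this shape is what the Elasticsearch `_analyze` API returns. For anyone who wants to reproduce the two results, here is a minimal sketch. It assumes a node at `http://localhost:9200` with the IK plugin installed and the `ik_max_word` analyzer; the post states neither, so both are guesses. `build_analyze_request` and `analyze` are hypothetical helper names.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="ik_max_word",
                          base_url="http://localhost:9200"):
    """Build a POST request for the Elasticsearch _analyze API.

    The analyzer name and base_url are assumptions, not taken from the post.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, **kwargs):
    """Send the request and return only the token strings, in order."""
    with urllib.request.urlopen(build_analyze_request(text, **kwargs)) as resp:
        return [t["token"] for t in json.load(resp)["tokens"]]


# Example (needs a running node with the IK plugin):
# print(analyze("莫西"), analyze("莫西林"))
```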

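Why doesn't the tokenization result for "莫西林" fully contain the tokenization result for "莫西"?

I tried "中华人民共和国" and "中华人民", and there it works: the result for the longer string contains the result for the shorter one.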
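
The containment check described here can be written out as a small sketch. The token lists are copied from the two outputs in this post; `missing_tokens` is a hypothetical helper name, and the comparison looks only at token text, ignoring offsets and positions.

```python
# Token strings copied from the two analyze outputs in the post.
tokens_moxi = ["莫西", "莫", "西"]
tokens_moxilin = ["莫西", "莫", "西林"]


def missing_tokens(shorter, longer):
    """Return the tokens of `shorter` that do not appear in `longer`."""
    longer_set = set(longer)
    return [t for t in shorter if t not in longer_set]


missing = missing_tokens(tokens_moxi, tokens_moxilin)
print(missing)  # ['西']
```

So the only token lost is the single character "西": in "莫西林" it is output only as part of "西林". A document indexed with "莫西林" would therefore not contain the term "西" that a query for "莫西" produces.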

@medcl

0 replies
