
IK analyzer: custom dictionary entries mixing Chinese and English do not take effect

Elasticsearch | Author: tj646 | Published 2019-07-05 | Views: 3803

ES version: 6.2.4
The startup log shows that the custom dictionary configuration loaded successfully, and the dictionary file is UTF-8 without a BOM.
For example, the custom dictionary contains the entry "Dj小光".
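For reference, a typical extension-dictionary setup for the elasticsearch-analysis-ik plugin is sketched below; the file name custom.dic is a placeholder, and the config path (usually config/analysis-ik/IKAnalyzer.cfg.xml or plugins/ik/config/IKAnalyzer.cfg.xml) varies by installation. Each dictionary file listed in ext_dict holds one entry per line, saved as UTF-8 without a BOM.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- ext_dict lists custom dictionary files; each file holds one word per line, e.g. "Dj小光" -->
    <entry key="ext_dict">custom.dic</entry>
</properties>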

POST _analyze
{
  "text": "Dj小光全中文国粤语逼格Club音乐精选突然Dj阿龙",
  "tokenizer": "ik_smart"
}

Result:

"tokens": [
{
"token": "dj",
"start_offset": 0,
"end_offset": 2,
"type": "ENGLISH",
"position": 0
}
,
{
"token": "小光",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 1
}
,
{
"token": "全中文",
"start_offset": 4,
"end_offset": 7,
"type": "CN_WORD",
"position": 2
}
,
{
"token": "国",
"start_offset": 7,
"end_offset": 8,
"type": "CN_CHAR",
"position": 3
}
,
{
"token": "粤语",
"start_offset": 8,
"end_offset": 10,
"type": "CN_WORD",
"position": 4
}
,
{
"token": "逼",
"start_offset": 10,
"end_offset": 11,
"type": "CN_CHAR",
"position": 5
}
,
{
"token": "格",
"start_offset": 11,
"end_offset": 12,
"type": "CN_CHAR",
"position": 6
}

The custom dictionary entry is not taking effect: as the output shows, "Dj小光" is split into dj and 小光 instead of being emitted as a single token. What could be the cause? Or am I using it incorrectly?
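One diagnostic worth trying (a suggestion, not a confirmed answer): run _analyze on the dictionary entry by itself. If it still splits into dj and 小光, the entry is simply not being matched. IK is commonly reported to lowercase Latin letters before dictionary lookup, so an uppercase entry like "Dj小光" may never match the lowercased token stream; adding a lowercase entry such as "dj小光" is a reasonable check (this behavior is an assumption to verify against your plugin version).

POST _analyze
{
  "text": "Dj小光",
  "tokenizer": "ik_smart"
}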
