Elasticsearch version: 6.2.4
On startup, the log shows that the custom dictionary configuration was loaded successfully. The dictionary file is UTF-8 without a BOM.
For example, I added "Dj小光" to the custom dictionary.
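For context, the custom dictionary is normally wired into the IK plugin via its config file. A minimal sketch of what that looks like (the dictionary path `custom/mydict.dic` is an assumption here; the actual file name and config location depend on your install):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<!-- IKAnalyzer.cfg.xml, typically under the IK plugin's config directory -->
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- ext_dict points at the custom dictionary file(s), relative to this config;
         "custom/mydict.dic" is a placeholder path -->
    <entry key="ext_dict">custom/mydict.dic</entry>
    <entry key="ext_stopwords"></entry>
</properties>
```

The dictionary file itself is plain text, one entry per line (e.g. a line containing `Dj小光`), UTF-8 without a BOM.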
POST _analyze
{
  "text": "Dj小光全中文国粤语逼格Club音乐精选突然Dj阿龙",
  "tokenizer": "ik_smart"
}
Result:
"tokens": [
{
"token": "dj",
"start_offset": 0,
"end_offset": 2,
"type": "ENGLISH",
"position": 0
}
,
{
"token": "小光",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 1
}
,
{
"token": "全中文",
"start_offset": 4,
"end_offset": 7,
"type": "CN_WORD",
"position": 2
}
,
{
"token": "国",
"start_offset": 7,
"end_offset": 8,
"type": "CN_CHAR",
"position": 3
}
,
{
"token": "粤语",
"start_offset": 8,
"end_offset": 10,
"type": "CN_WORD",
"position": 4
}
,
{
"token": "逼",
"start_offset": 10,
"end_offset": 11,
"type": "CN_CHAR",
"position": 5
}
,
{
"token": "格",
"start_offset": 11,
"end_offset": 12,
"type": "CN_CHAR",
"position": 6
}
The custom dictionary entry did not take effect ("Dj小光" is still split into "dj" and "小光"). What could be the reason? Or am I using it incorrectly?