Highlighting goes wrong after adding synonyms in Elasticsearch 6.2.2

Elasticsearch | Author: hezhiqiang | Posted: 2020-02-06 | Views: 1788

The synonym rule is: 英文绘本,原版绘本,英语绘本,英文图画书 (variants of "English picture book" / "original-edition picture book")

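When I search for 英文绘本, the highlighted result is: 2-8岁英文原版绘本大推荐
The highlighted parts are 英文原版绘本 and 推荐. 推荐 ("recommend") is unrelated to any of the synonyms and should not be highlighted.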

The index template configuration:
PUT _template/template_default
{
  "index_patterns": ["*"],
  "order": 0,
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "ik_syno": {
            "filter": ["my_synonym_filter"],
            "tokenizer": "ik_max_word",
            "type": "custom"
          },
          "ik_syno_smart": {
            "filter": ["my_synonym_filter"],
            "tokenizer": "ik_smart",
            "type": "custom"
          }
        },
        "filter": {
          "my_synonym_filter": {
            "synonyms_path": "synonyms.txt",
            "type": "synonym"
          }
        }
      }
    }
  }
}

Tokenization test:
POST /viw_experience/_analyze
{
  "analyzer": "ik_syno",
  "text": "2-8岁英文原版绘本大推荐"
}
Tokenization result:
{
  "tokens": [
    {
      "token": "2-8",
      "start_offset": 0,
      "end_offset": 3,
      "type": "LETTER",
      "position": 0
    },
    {
      "token": "8岁",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "英文原版绘本",
      "start_offset": 4,
      "end_offset": 10,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "英文原版",
      "start_offset": 4,
      "end_offset": 8,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "英文",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "英语",
      "start_offset": 4,
      "end_offset": 6,
      "type": "SYNONYM",
      "position": 4
    },
    {
      "token": "原版绘本",
      "start_offset": 6,
      "end_offset": 10,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "英文绘本",
      "start_offset": 6,
      "end_offset": 10,
      "type": "SYNONYM",
      "position": 5
    },
    {
      "token": "英语绘本",
      "start_offset": 6,
      "end_offset": 10,
      "type": "SYNONYM",
      "position": 5
    },
    {
      "token": "英文",
      "start_offset": 6,
      "end_offset": 10,
      "type": "SYNONYM",
      "position": 5
    },
    {
      "token": "原版",
      "start_offset": 6,
      "end_offset": 8,
      "type": "CN_WORD",
      "position": 6
    },
    {
      "token": "英文绘",
      "start_offset": 6,
      "end_offset": 8,
      "type": "SYNONYM",
      "position": 6
    },
    {
      "token": "英语",
      "start_offset": 6,
      "end_offset": 8,
      "type": "SYNONYM",
      "position": 6
    },
    {
      "token": "图画书",
      "start_offset": 6,
      "end_offset": 8,
      "type": "SYNONYM",
      "position": 6
    },
    {
      "token": "绘本",
      "start_offset": 8,
      "end_offset": 10,
      "type": "CN_WORD",
      "position": 7
    },
    {
      "token": "英文",
      "start_offset": 8,
      "end_offset": 10,
      "type": "SYNONYM",
      "position": 7
    },
    {
      "token": "绘本",
      "start_offset": 8,
      "end_offset": 10,
      "type": "SYNONYM",
      "position": 7
    },
    {
      "token": "图画",
      "start_offset": 8,
      "end_offset": 10,
      "type": "SYNONYM",
      "position": 7
    },
    {
      "token": "推荐",
      "start_offset": 11,
      "end_offset": 13,
      "type": "CN_WORD",
      "position": 8
    },
    {
      "token": "绘本",
      "start_offset": 11,
      "end_offset": 13,
      "type": "SYNONYM",
      "position": 8
    }
  ]
}
This last token is wrong: it is a synonym token, but it carries the offsets of 推荐 (11 to 13):
{
  "token": "绘本",
  "start_offset": 11,
  "end_offset": 13,
  "type": "SYNONYM",
  "position": 8
}
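Reading the output above, the multi-token synonym expansions (for example 英文图画书, which ik_max_word splits into several terms) appear to be laid out over the positions of the following tokens and to inherit those tokens' offsets. The stray 绘本 at position 8 ends up with the offsets of 推荐, which would explain why the highlighter marks 推荐. The legacy synonym filter does not record how many positions a multi-token synonym spans, whereas synonym_graph does. To compare the two filters on the same text, a request along these lines can be sent as the body of POST /_analyze. This is a sketch under assumptions: it assumes the IK plugin's ik_max_word tokenizer is installed and that your version accepts an inline filter definition in _analyze; if it does not, define the filter in the index settings and call the index-level _analyze as in the test above. Changing "type" to "synonym" lets you check whether the stray token reappears.

```json
{
  "tokenizer": "ik_max_word",
  "filter": [
    {
      "type": "synonym_graph",
      "synonyms": ["英文绘本,原版绘本,英语绘本,英文图画书"]
    }
  ],
  "text": "2-8岁英文原版绘本大推荐"
}
```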

Reply from hezhiqiang (the original poster):

Solved: switching the synonym filter's type from synonym to synonym_graph fixed it.
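A minimal sketch of what the fixed template might look like, assuming the only change is the filter type (the reply does not show the final configuration). It is the body for PUT _template/template_default from the question, with my_synonym_filter switched to synonym_graph. Two caveats apply. First, the Elasticsearch reference describes synonym_graph as intended for search analyzers only, so ik_syno / ik_syno_smart are best set as a field's search_analyzer, with an analyzer that has no synonym graph filter (for example plain ik_max_word) used for indexing. Second, index templates only apply to indices created after the template is stored, so an existing index such as viw_experience would have to be recreated (or closed, have its analysis settings updated, and reopened) before the change takes effect.

```json
{
  "index_patterns": ["*"],
  "order": 0,
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "ik_syno": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["my_synonym_filter"]
          },
          "ik_syno_smart": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": ["my_synonym_filter"]
          }
        },
        "filter": {
          "my_synonym_filter": {
            "type": "synonym_graph",
            "synonyms_path": "synonyms.txt"
          }
        }
      }
    }
  }
}
```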