Tokenization matching and scoring: how can I improve precision?

Elasticsearch | Author: zp9755 | Posted December 14, 2021 | Views: 1111

Using ik_smart, I ran a matchQuery for 盛和小区2-201 and got the following results:
{
  "score": 12.894995,
  "createTime": "2021-12-13T08:11:01.625Z",
  "xqmc": "寺巷镇集镇2－201号",
  "pageSize": 10,
  "sqbm": "321203006003",
  "pageNum": 1,
  "addressId": "257546"
},
{
  "score": 5.7983465,
  "createTime": "2021-12-13T08:10:22.658Z",
  "xqmc": "凤凰东路99号盛和花园33幢",
  "pageSize": 10,
  "sqbm": "321203004012",
  "pageNum": 1,
  "addressId": "144808"
}
How can I get the second result to rank first?

2 replies

hapjin

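Use ES for recall and treat the ES score as a coarse-ranking score. Pass the recalled results to a rank service and implement whatever ranking-intervention logic you need there. :)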
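
A minimal sketch of that recall-then-rerank idea, in Python. The hits are the two results from the question; the `rerank` function, its `must_contain` rule, and the `bonus` value are all hypothetical stand-ins for whatever logic a real rank service would apply.

```python
from typing import Dict, List

# Hits copied from the question; "score" is the coarse score returned by ES.
recalled = [
    {"score": 12.894995, "xqmc": "寺巷镇集镇2－201号", "addressId": "257546"},
    {"score": 5.7983465, "xqmc": "凤凰东路99号盛和花园33幢", "addressId": "144808"},
]


def rerank(hits: List[Dict], must_contain: str, bonus: float = 10.0) -> List[Dict]:
    """Re-sort recalled hits, adding `bonus` when xqmc contains `must_contain`.

    Both the rule and the bonus value are hypothetical; a real rank service
    would plug in its own business logic here.
    """
    def fine_score(hit: Dict) -> float:
        boost = bonus if must_contain in hit["xqmc"] else 0.0
        return hit["score"] + boost

    return sorted(hits, key=fine_score, reverse=True)


reranked = rerank(recalled, must_contain="盛和")
print([hit["addressId"] for hit in reranked])
```

With these inputs the hit containing "盛和" moves to the front, because its boosted score exceeds the other hit's ES score. How large the bonus should be, and which phrase to require, depends on the data.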

xiaowuge - post-90s generation

Create the index (one shard):
PUT /test_score
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_doc": {
      "properties": {
        "xqmc": {
          "type": "text",
          "analyzer": "ik_smart"
        }
      }
    }
  }
}

Analyze text one (the first document) with ik_smart:
GET /test_score/_analyze
{
  "analyzer": "ik_smart",
  "text": ["寺巷镇集镇2－201"]
}

{
  "tokens": [
    { "token": "寺", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0 },
    { "token": "巷", "start_offset": 1, "end_offset": 2, "type": "CN_CHAR", "position": 1 },
    { "token": "镇", "start_offset": 2, "end_offset": 3, "type": "CN_CHAR", "position": 2 },
    { "token": "集镇", "start_offset": 3, "end_offset": 5, "type": "CN_WORD", "position": 3 },
    { "token": "2-201", "start_offset": 5, "end_offset": 10, "type": "LETTER", "position": 4 }
  ]
}

Analyze text two (the second document) with ik_smart:
GET /test_score/_analyze
{
  "analyzer": "ik_smart",
  "text": ["凤凰东路99号盛和花园33幢"]
}

{
  "tokens": [
    { "token": "凤凰", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 },
    { "token": "东路", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 1 },
    { "token": "99号", "start_offset": 4, "end_offset": 7, "type": "TYPE_CQUAN", "position": 2 },
    { "token": "盛", "start_offset": 7, "end_offset": 8, "type": "CN_CHAR", "position": 3 },
    { "token": "和", "start_offset": 8, "end_offset": 9, "type": "CN_CHAR", "position": 4 },
    { "token": "花园", "start_offset": 9, "end_offset": 11, "type": "CN_WORD", "position": 5 },
    { "token": "33幢", "start_offset": 11, "end_offset": 14, "type": "TYPE_CQUAN", "position": 6 }
  ]
}

Analyze text three (the query string) with ik_smart:
GET /test_score/_analyze
{
  "analyzer": "ik_smart",
  "text": ["盛和小区2-201"]
}

{
  "tokens": [
    { "token": "盛", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0 },
    { "token": "和", "start_offset": 1, "end_offset": 2, "type": "CN_CHAR", "position": 1 },
    { "token": "小区", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 2 },
    { "token": "2-201", "start_offset": 4, "end_offset": 9, "type": "LETTER", "position": 3 }
  ]
}

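The analysis shows that texts two and three share two tokens, "盛" and "和", while texts one and three share only one, "2-201". In this two-document test index each of those tokens occurs in exactly one document, so they carry the same IDF weight, and text two, with two matching tokens, scores higher.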
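
To make that reasoning concrete, here is a rough Python sketch that counts the shared tokens and scores both documents with a BM25 formula. It assumes the default parameters k1 = 1.2 and b = 0.75 and the BM25 variant that keeps the (k1 + 1) factor. It also assumes document one is indexed with a sixth token, "号", because the indexed value is "寺巷镇集镇2－201号" while the _analyze call above left that character out.

```python
import math

K1, B = 1.2, 0.75  # assumed default BM25 parameters

# Tokens taken from the _analyze output above. The trailing "号" token of
# document one is an assumption: the indexed text ends in "号", but the
# _analyze request omitted it.
docs = {
    "one": ["寺", "巷", "镇", "集镇", "2-201", "号"],
    "two": ["凤凰", "东路", "99号", "盛", "和", "花园", "33幢"],
}
query = ["盛", "和", "小区", "2-201"]


def bm25(query_tokens, doc_tokens, all_docs):
    """Sum the BM25 contribution of every query token found in doc_tokens."""
    n_docs = len(all_docs)
    avg_len = sum(len(d) for d in all_docs) / n_docs
    score = 0.0
    for term in query_tokens:
        freq = doc_tokens.count(term)
        if freq == 0:
            continue  # token does not occur in this document
        doc_freq = sum(1 for d in all_docs if term in d)
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        norm = K1 * (1 - B + B * len(doc_tokens) / avg_len)
        score += idf * freq * (K1 + 1) / (freq + norm)
    return score


scores = {name: bm25(query, toks, list(docs.values())) for name, toks in docs.items()}
for name, value in scores.items():
    shared = [t for t in query if t in docs[name]]
    print(name, shared, round(value, 4))
```

Under these assumptions the sketch gives roughly 0.7157 for document one and 1.3440 for document two, in line with the _score values in the search result of this test. In a larger index the document frequencies differ per token, so the ordering can change.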

Finally, run the query directly:
GET /test_score/_search
{
  "query": {
    "match": {
      "xqmc": "盛和小区2-201"
    }
  }
}

Result:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.3440006,
    "hits": [
      {
        "_index": "test_score",
        "_type": "_doc",
        "_id": "iXzdyn0BRxwSc_uunfyp",
        "_score": 1.3440006,
        "_source": {
          "xqmc": "凤凰东路99号盛和花园33幢"
        }
      },
      {
        "_index": "test_score",
        "_type": "_doc",
        "_id": "iHzdyn0BRxwSc_uukvym",
        "_score": 0.7156682,
        "_source": {
          "xqmc": "寺巷镇集镇2－201号"
        }
      }
    ]
  }
}

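So my initial guess is that sharding is what causes your ranking problem: by default each shard computes term statistics such as IDF from its own documents only. You can add "explain": true to the query to see how each hit is scored, like this:
GET /test_score/_search
{
  "query": {
    "match": {
      "xqmc": "盛和小区2-201"
    }
  },
  "explain": true
}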
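
With "explain": true, each hit carries an `_explanation` object: a nested tree of `value`, `description`, and `details` entries. The Python sketch below flattens such a tree into indented lines so the per-token contributions are easier to compare. The `flatten_explanation` helper is hypothetical, and the `sample` dict is a hand-written, abbreviated illustration, not real server output.

```python
from typing import Dict, List


def flatten_explanation(node: Dict, depth: int = 0) -> List[str]:
    """Turn a nested _explanation node into indented 'value  description' lines."""
    lines = [f"{'  ' * depth}{node['value']:.4f}  {node['description']}"]
    for child in node.get("details", []):
        lines.extend(flatten_explanation(child, depth + 1))
    return lines


# Hand-written, abbreviated sample for illustration only (not server output).
sample = {
    "value": 0.7156682,
    "description": "weight(xqmc:2-201 in 0), result of:",
    "details": [
        {"value": 0.6931472, "description": "idf", "details": []},
        {"value": 1.032491, "description": "tfNorm", "details": []},
    ],
}

for line in flatten_explanation(sample):
    print(line)
```

If the explanation shows that IDF values differ from shard to shard, one option to try is sending the search with `?search_type=dfs_query_then_fetch`, which asks Elasticsearch to score with index-wide term statistics instead of per-shard ones.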
