
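Pinyin search: I set up a first-letter (abbreviated pinyin) analyzer and the tokenization looks fine, but queries for some terms return nothing. What could be the cause?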

Elasticsearch | Author: tygcs | Posted: 2018-11-21 | Views: 3371

The pinyin analysis settings are as follows (only the relevant part is shown):
"filter": {
    "pinyin_simple_filter": {
        "type": "pinyin",
        "keep_first_letter": true,
        "keep_separate_first_letter": false,
        "keep_full_pinyin": false,
        "keep_original": false,
        "limit_first_letter_length": 50,
        "lowercase": true
    },
    "pinyin_full_filter": {
        "type": "pinyin",
        "keep_first_letter": false,
        "keep_separate_first_letter": false,
        "keep_full_pinyin": true,
        "none_chinese_pinyin_tokenize": true,
        "keep_original": false,
        "limit_first_letter_length": 50,
        "lowercase": true
    }
},
"analyzer": {
    "pinyiSimpleSearchAnalyzer": {
        "tokenizer": "ik_max_word",
        "filter": ["pinyin_simple_filter", "lowercase"]
    },
    "pinyiFullSearchAnalyzer": {
        "tokenizer": "ik_max_word",
        "filter": ["pinyin_full_filter", "lowercase"]
    }
}

 
The docName field is defined as follows:
 
"docName": {
    "type": "text",
    "analyzer": "k_analyzer",
    "search_analyzer": "k2_analyzer",
    "fields": {
        "f_pinyin": {
            "type": "text",
            "analyzer": "pinyiFullSearchAnalyzer",
            "search_analyzer": "pinyiFullSearchAnalyzer"
        },
        "s_pinyin": {
            "type": "text",
            "analyzer": "pinyiSimpleSearchAnalyzer",
            "search_analyzer": "pinyiSimpleSearchAnalyzer"
        },
        "std": {
            "type": "text",
            "analyzer": "std_analyzer",
            "search_analyzer": "std2_analyzer"
        }
    }
},

The analysis result is as follows:
curl -H "Content-Type: application/json" -XGET 'http://localhost:9200/pinyin_index/_analyze?pretty=true' -d '{
"analyzer":"pinyiSimpleSearchAnalyzer",
"text":"大黑牛2018套餐"
}'


{
"tokens" : [
{
"token" : "d",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "h",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "n",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_CHAR",
"position" : 2
},
{
"token" : "2018",
"start_offset" : 3,
"end_offset" : 7,
"type" : "ARABIC",
"position" : 3
},
{
"token" : "tc",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "t",
"start_offset" : 7,
"end_offset" : 8,
"type" : "COUNT",
"position" : 5
},
{
"token" : "c",
"start_offset" : 8,
"end_offset" : 9,
"type" : "CN_CHAR",
"position" : 6
}
]
}

The query is built as follows:
  curl  -H "Content-Type: application/json" -XGET 'localhost:9200/pinyin_index/_search?pretty' -d '
{
"query": {
"query_string": {
"query": "tc",
"default_field": "docName.s_pinyin",
"default_operator":"AND"
}
},
"size": 100,
"highlight": {
"pre_tags": ["<h1>"],
"post_tags": ["</h1>"],
"fields": {
"docName.s_pinyin": {}
}
}
}'

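The results do not include the document analyzed above, "大黑牛2018套餐", even though searching for "taocan" or "套餐" does find it.
Searching for "dhn" or "2018" also finds it, but "tc" does not. No custom dictionaries of any kind have been configured.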

I'd like to understand why searching for "tc" returns no results. Could anyone help me figure out where the problem is?
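One way to narrow this down is to look at the Lucene query that query_string actually builds for "tc". The sketch below uses the _validate/query API with explain=true, assuming the host, index, and field names from the post. The helper names validate_body and explain_query are made up for illustration, and the body builder does no JSON escaping, so it only suits simple inputs like "tc".

```shell
# Build the same query_string body as in the post, for a given query text.
# Note: no JSON escaping is done; intended for plain inputs such as "tc".
validate_body() {
  printf '{"query":{"query_string":{"query":"%s","default_field":"docName.s_pinyin","default_operator":"AND"}}}' "$1"
}

# Ask Elasticsearch to explain how the query is rewritten, without running it.
# Host and index (localhost:9200, pinyin_index) are taken from the post.
explain_query() {
  curl -s -H "Content-Type: application/json" \
    -XGET 'http://localhost:9200/pinyin_index/_validate/query?explain=true&pretty' \
    -d "$(validate_body "$1")"
}

# Usage: explain_query tc
```

The explanation in the response can then be compared with the indexed tokens shown above ("tc" at position 4, "t" at position 5, "c" at position 6) to see whether the terms or positions differ.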
 
 
 
 

medcl - Fighting tigers tonight.


Run "tc" itself through the analyzer as the input text and see what tokens come out.
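A minimal sketch of that check, reusing the _analyze request from the question with "tc" as the text. The helper names analyze_body and analyze are hypothetical, the host and index are the ones from the question, and the body builder does no JSON escaping.

```shell
# Build an _analyze request body for a given analyzer and input text.
# Note: no JSON escaping is done; intended for plain inputs such as "tc".
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

# Run the text through the analyzer on the index from the question.
analyze() {
  curl -s -H "Content-Type: application/json" \
    -XGET 'http://localhost:9200/pinyin_index/_analyze?pretty=true' \
    -d "$(analyze_body "$1" "$2")"
}

# Usage: analyze pinyiSimpleSearchAnalyzer tc
```

Since docName.s_pinyin uses pinyiSimpleSearchAnalyzer as its search_analyzer, the tokens returned for "tc" are what the query is matched with, so any difference from the indexed token "tc" would show up here.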
