For example, 我在漫步人生 is tokenized as:
{
"tokens": [
{
"token": "w",
"start_offset": 0,
"end_offset": 1,
"type": "word",
"position": 0
},
{
"token": "wo",
"start_offset": 0,
"end_offset": 1,
"type": "word",
"position": 1
},
{
"token": "z",
"start_offset": 1,
"end_offset": 2,
"type": "word",
"position": 2
},
{
"token": "zai",
"start_offset": 1,
"end_offset": 2,
"type": "word",
"position": 3
},
{
"token": "m",
"start_offset": 2,
"end_offset": 3,
"type": "word",
"position": 4
},
{
"token": "man",
"start_offset": 2,
"end_offset": 3,
"type": "word",
"position": 5
},
{
"token": "b",
"start_offset": 3,
"end_offset": 4,
"type": "word",
"position": 6
},
{
"token": "bu",
"start_offset": 3,
"end_offset": 4,
"type": "word",
"position": 7
},
{
"token": "r",
"start_offset": 4,
"end_offset": 5,
"type": "word",
"position": 8
},
{
"token": "ren",
"start_offset": 4,
"end_offset": 5,
"type": "word",
"position": 9
},
{
"token": "s",
"start_offset": 5,
"end_offset": 6,
"type": "word",
"position": 10
},
{
"token": "sheng",
"start_offset": 5,
"end_offset": 6,
"type": "word",
"position": 11
},
{
"token": "wozaimanburensheng",
"start_offset": 0,
"end_offset": 18,
"type": "word",
"position": 12
},
{
"token": "wzmbrs",
"start_offset": 0,
"end_offset": 6,
"type": "word",
"position": 13
}
]
}
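Question 1:
A phrase search for "wozaiman" does not match; I have to write it as "wwozzaimman".
Question 2:
The positions I would expect:
w, z, m should be 0, 1, 2
wo, zai, man should also be 0, 1, 2
wozaimanbu should be 0,
wzmb should be 0.
Could someone explain why this is not the case? Thanks.
I don't want wozaiman to be indexed only as w, z, m; I want to keep the full syllables wo, zai, man so that no precision is lost. I would also like a search for wozmbu to match. If the corresponding terms shared the same positions, that should be achievable, right?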
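To make the position argument concrete, here is a minimal sketch of a zero-slop phrase match over a positional index. It is a simplified model, not Lucene's or the pinyin plugin's actual implementation, and the names `build_positions`, `phrase_match`, `observed`, and `stacked` are made up for illustration. It also assumes the query has already been split into terms; how the search analyzer splits the query string is not modeled here.

```python
from collections import defaultdict


def build_positions(tokens):
    """Map each term to the set of positions at which it occurs."""
    index = defaultdict(set)
    for term, position in tokens:
        index[term].add(position)
    return index


def phrase_match(index, terms):
    """Return True if the terms occur at consecutive positions (slop 0)."""
    if not terms:
        return False
    return any(
        all(start + i in index.get(term, ()) for i, term in enumerate(terms))
        for start in index.get(terms[0], ())
    )


# Positions as reported in the analyze output above: every token gets its own slot.
observed = build_positions([
    ("w", 0), ("wo", 1), ("z", 2), ("zai", 3), ("m", 4), ("man", 5),
    ("b", 6), ("bu", 7), ("r", 8), ("ren", 9), ("s", 10), ("sheng", 11),
])

# Hypothetical "ideal" layout: first letter and full syllable share one position.
stacked = build_positions([
    ("w", 0), ("wo", 0), ("z", 1), ("zai", 1), ("m", 2), ("man", 2),
    ("b", 3), ("bu", 3), ("r", 4), ("ren", 4), ("s", 5), ("sheng", 5),
])

observed_full = phrase_match(observed, ["wo", "zai", "man"])
observed_doubled = phrase_match(observed, ["w", "wo", "z", "zai", "m", "man"])
stacked_full = phrase_match(stacked, ["wo", "zai", "man"])
stacked_mixed = phrase_match(stacked, ["wo", "z", "m", "bu"])
```

Under this model, wo, zai, man sit at positions 1, 3, 5 in the observed layout, so they are not adjacent and only the doubled sequence w, wo, z, zai, m, man lines up. With the stacked layout, both the full syllables and a mixed sequence such as wo, z, m, bu occupy consecutive positions.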
3 replies
kepmoving - 90后
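I don't want wozaiman to be indexed only as w, z, m; I want to keep the full syllables wo, zai, man so that no precision is lost. I would also like a search for wozmbu to match. If the corresponding terms shared the same positions, that should be achievable, right?
If the tokenizer does not emit the first letters, how would your wozmbu ever match?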
kennywu76 - Wood
wmj