{
  "name": "阿尔托莉雅・潘德拉贡",
  "class": "Saber",
  "description": "不列颠传说中的王,被称为骑士王。阿尔托莉雅是幼名,从成为王的那一天起就被称为亚瑟王。在那个骑士道如花般凋零的时代,用手中的圣剑为不列颠带来了短暂的和平与最后的繁荣。虽然史实上是男性,但在这个世界似乎是男装的丽人。"
}
{
  "name": "吉尔伽美什",
  "class": "Archer",
  "description": "公元以前统治着苏美尔的都市国家乌鲁克的半神半人的王者。不仅仅是传说而是真实存在的人物,记述于人类最古的叙事诗《吉尔伽美什叙事诗》中的王。"
}

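I'm using the smartcn analyzer plugin that ships with ES. My search query is “不列颠传说中的王” ("the king of British legend"), which the analyzer splits into “不列颠/传说/中/的/王”.
Score for “吉尔伽美什”: 2.447756
Score for “阿尔托莉雅・潘德拉贡”: 1.9885943
Addendum: this is the result of an explain request against the 阿尔托莉雅 document: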
{
  "_index": "fgo",
  "_type": "servant",
  "_id": "1",
  "matched": true,
  "explanation": {
    "value": 1.9885945,
    "description": "sum of:",
    "details": [
      {
        "value": 1.9885945,
        "description": "sum of:",
        "details": [
          {
            "value": 0.39341488,
            "description": "weight(_all:不列颠 in 0) [PerFieldSimilarity], result of:",
            "details": [
              {
                "value": 0.39341488,
                "description": "score(doc=0,freq=2.0 = termFreq=2.0\n), product of:",
                "details": [
                  {
                    "value": 0.2876821,
                    "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                    "details": [
                      { "value": 1, "description": "docFreq", "details": [] },
                      { "value": 1, "description": "docCount", "details": [] }
                    ]
                  },
                  {
                    "value": 1.3675334,
                    "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                    "details": [
                      { "value": 2, "description": "termFreq=2.0", "details": [] },
                      { "value": 1.2, "description": "parameter k1", "details": [] },
                      { "value": 0.75, "description": "parameter b", "details": [] },
                      { "value": 82, "description": "avgFieldLength", "details": [] },
                      { "value": 83.591835, "description": "fieldLength", "details": [] }
                    ]
                  }
                ]
              }
            ]
          },
          {
            "value": 0.28541544,
            "description": "weight(_all:传说 in 0) [PerFieldSimilarity], result of:",
            "details": [
              {
                "value": 0.28541544,
                "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                "details": [
                  {
                    "value": 0.2876821,
                    "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                    "details": [
                      { "value": 1, "description": "docFreq", "details": [] },
                      { "value": 1, "description": "docCount", "details": [] }
                    ]
                  },
                  {
                    "value": 0.992121,
                    "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                    "details": [
                      { "value": 1, "description": "termFreq=1.0", "details": [] },
                      { "value": 1.2, "description": "parameter k1", "details": [] },
                      { "value": 0.75, "description": "parameter b", "details": [] },
                      { "value": 82, "description": "avgFieldLength", "details": [] },
                      { "value": 83.591835, "description": "fieldLength", "details": [] }
                    ]
                  }
                ]
              }
            ]
          },
          {
            "value": 0.28541544,
            "description": "weight(_all:中 in 0) [PerFieldSimilarity], result of:",
            "details": [
              {
                "value": 0.28541544,
                "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                "details": [
                  {
                    "value": 0.2876821,
                    "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                    "details": [
                      { "value": 1, "description": "docFreq", "details": [] },
                      { "value": 1, "description": "docCount", "details": [] }
                    ]
                  },
                  {
                    "value": 0.992121,
                    "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                    "details": [
                      { "value": 1, "description": "termFreq=1.0", "details": [] },
                      { "value": 1.2, "description": "parameter k1", "details": [] },
                      { "value": 0.75, "description": "parameter b", "details": [] },
                      { "value": 82, "description": "avgFieldLength", "details": [] },
                      { "value": 83.591835, "description": "fieldLength", "details": [] }
                    ]
                  }
                ]
              }
            ]
          },
          {
            "value": 0.53913236,
            "description": "weight(_all:的 in 0) [PerFieldSimilarity], result of:",
            "details": [
              {
                "value": 0.53913236,
                "description": "score(doc=0,freq=7.0 = termFreq=7.0\n), product of:",
                "details": [
                  {
                    "value": 0.2876821,
                    "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                    "details": [
                      { "value": 1, "description": "docFreq", "details": [] },
                      { "value": 1, "description": "docCount", "details": [] }
                    ]
                  },
                  {
                    "value": 1.874056,
                    "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                    "details": [
                      { "value": 7, "description": "termFreq=7.0", "details": [] },
                      { "value": 1.2, "description": "parameter k1", "details": [] },
                      { "value": 0.75, "description": "parameter b", "details": [] },
                      { "value": 82, "description": "avgFieldLength", "details": [] },
                      { "value": 83.591835, "description": "fieldLength", "details": [] }
                    ]
                  }
                ]
              }
            ]
          },
          {
            "value": 0.48521632,
            "description": "weight(_all:王 in 0) [PerFieldSimilarity], result of:",
            "details": [
              {
                "value": 0.48521632,
                "description": "score(doc=0,freq=4.0 = termFreq=4.0\n), product of:",
                "details": [
                  {
                    "value": 0.2876821,
                    "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                    "details": [
                      { "value": 1, "description": "docFreq", "details": [] },
                      { "value": 1, "description": "docCount", "details": [] }
                    ]
                  },
                  {
                    "value": 1.6866407,
                    "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                    "details": [
                      { "value": 4, "description": "termFreq=4.0", "details": [] },
                      { "value": 1.2, "description": "parameter k1", "details": [] },
                      { "value": 0.75, "description": "parameter b", "details": [] },
                      { "value": 82, "description": "avgFieldLength", "details": [] },
                      { "value": 83.591835, "description": "fieldLength", "details": [] }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "value": 0,
        "description": "match on required clause, product of:",
        "details": [
          { "value": 0, "description": "# clause", "details": [] },
          {
            "value": 1,
            "description": "*:*, product of:",
            "details": [
              { "value": 1, "description": "boost", "details": [] },
              { "value": 1, "description": "queryNorm", "details": [] }
            ]
          }
        ]
      }
    ]
  }
}

"description": "weight(_all:不列颠 in 0) [PerFieldSimilarity], result of:",

But why does it say "in 0"? My original description clearly contains 不列颠.

Add explain to your query and you can see how the score is computed. The default scoring model is now BM25.
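
To make the explain output above concrete, here is a minimal Python sketch that recomputes each term's BM25 contribution from the numbers in the dump (k1 = 1.2, b = 0.75, fieldLength = 83.591835, avgFieldLength = 82, docFreq = docCount = 1). The function names are my own and not part of any Elasticsearch API; the formulas are copied from the `description` strings in the dump. As far as I understand, the `in 0` in each `weight(...)` line is the Lucene-internal document number inside the shard, not a match count, so it does not mean the term was found zero times.

```python
import math

def bm25_idf(doc_freq, doc_count):
    # Formula taken from the "idf, computed as ..." line of the explain output
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    # Formula taken from the "tfNorm, computed as ..." line of the explain output
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )

# Term frequencies as reported by the explain output for document _id 1
TERM_FREQS = {"不列颠": 2, "传说": 1, "中": 1, "的": 7, "王": 4}

def score(term_freqs, doc_freq=1, doc_count=1,
          field_length=83.591835, avg_field_length=82.0):
    # Every term has docFreq = docCount = 1 in the dump, so idf is shared
    idf = bm25_idf(doc_freq, doc_count)
    return sum(
        idf * bm25_tf_norm(freq, field_length, avg_field_length)
        for freq in term_freqs.values()
    )

if __name__ == "__main__":
    for term, freq in TERM_FREQS.items():
        contribution = bm25_idf(1, 1) * bm25_tf_norm(freq, 83.591835, 82.0)
        print(term, round(contribution, 6))
    print("total", round(score(TERM_FREQS), 4))  # total 1.9886
```

Note that every term reports docCount = 1, which suggests the shard holding this document contains only that one document. That is consistent with the shard-local statistics described in the answer quoted below.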


The problem lies within the distributed score calculation.

You create a new index with default settings, that is, 5 shards. Each shard is its own Lucene index. When you index your data, Elasticsearch needs to decide to which shard the document should go and it does so by hashing on the _id (in absence of the routing parameter).
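
The routing step can be pictured with a small sketch. This is only an illustration of the "hash, then modulo the number of primary shards" idea: Elasticsearch uses its own Murmur3-based hash internally, whereas the sketch below swaps in `zlib.crc32` as a stand-in, so the shard numbers it prints will not match a real cluster. `pick_shard` is a hypothetical helper name.

```python
import zlib

def pick_shard(routing_value: str, num_primary_shards: int = 5) -> int:
    # Stand-in hash for illustration only; NOT the hash Elasticsearch uses.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    # The shard is the hash modulo the number of primary shards.
    return hashed % num_primary_shards

if __name__ == "__main__":
    # With 5 shards, a handful of documents is likely to be spread out,
    # so each shard computes its term statistics from very few documents.
    for doc_id in ["1", "2", "3", "4"]:
        print(doc_id, "->", pick_shard(doc_id))
    # With a single shard, every document lands in the same Lucene index.
    print(all(pick_shard(d, 1) == 0 for d in ["1", "2", "3", "4"]))
```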

So, by shifting the IDs, you eventually distributed the documents to different shards. As written above, each shard is its own Lucene index and when you search across multiple shards, you have to combine the different scores of each separate shard and due to the different routing, the individual scores are different.

You can verify this by adding explain to your query. For Sand Roger, the idf is calculated as idf(docFreq=1, maxDocs=1) = 0.30685282 and idf(docFreq=1, maxDocs=2) = 1 respectively, which yields the different results.
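
The two idf values quoted above can be reproduced with a short sketch. It assumes the classic Lucene TF-IDF formula, idf = 1 + ln(maxDocs / (docFreq + 1)), which matches both numbers in the answer; the BM25 formula from the explain output earlier in the thread is included to show that the same shard-local effect applies to the current default model. Function names are my own.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Classic TF-IDF idf (assumed here because it reproduces the quoted values)
    return 1 + math.log(max_docs / (doc_freq + 1))

def bm25_idf(doc_freq, doc_count):
    # BM25 idf as printed in the explain output of the original question
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

if __name__ == "__main__":
    # Same term, same docFreq, but the shard holds 1 vs. 2 documents
    print(round(classic_idf(1, 1), 6))  # 0.306853
    print(round(classic_idf(1, 2), 6))  # 1.0
    # BM25 shows the same dependence on how many documents share the shard
    print(round(bm25_idf(1, 1), 6))     # 0.287682
    print(round(bm25_idf(1, 2), 6))     # 0.693147
```

In both models the idf of a term depends on how many documents live in the same shard, so two otherwise comparable documents can receive different scores purely because of where they were routed.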

You can change either the number of shards to 1 or the search type to a dfs type. Searching against http://localhost:9200/test/use ... fetch will give you correct scores, because of its

http://stackoverflow.com/quest ... ethod
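
As a rough sketch of the two remedies from the quoted answer, applied to the `fgo` index from the question: the code below only builds the HTTP requests and sends nothing. `number_of_shards` and `search_type=dfs_query_then_fetch` are real Elasticsearch settings; the helper names, the `match` query on `_all`, and the choice of `localhost:9200` as host are assumptions for illustration, since the original query is not shown in the thread.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumed local node

def create_index_request(index, shards=1):
    # Remedy 1: create the index with a single primary shard.
    # The shard count is fixed at creation time, so an existing index
    # would have to be recreated and its data reindexed.
    url = f"{BASE_URL}/{index}"
    body = json.dumps({"settings": {"index": {"number_of_shards": shards}}})
    return "PUT", url, body

def dfs_search_request(index, query_text, field="_all"):
    # Remedy 2: keep the shards, but ask for global term statistics
    # to be gathered before scoring.
    params = urlencode({"search_type": "dfs_query_then_fetch"})
    url = f"{BASE_URL}/{index}/_search?{params}"
    # Hypothetical query shape; adjust to the query actually in use.
    body = json.dumps({"query": {"match": {field: query_text}}},
                      ensure_ascii=False)
    return "GET", url, body

if __name__ == "__main__":
    for method, url, body in (
        create_index_request("fgo"),
        dfs_search_request("fgo", "不列颠传说中的王"),
    ):
        print(method, url)
        print(body)
```

A single shard is the simpler option for a small data set like this one; the dfs search type adds an extra round trip per query but works without reindexing.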
