Optimizing query filters on low-cardinality, fixed-value fields such as status

Elasticsearch | Author: poettian | Posted 2020-01-03 | Views: 2261

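ES version: 7.4.2

Field in the index mapping: status: byte

The status field only takes four values: 1, 2, 3, and 4. The value 3 is by far the most common, accounting for 75% of all documents.

When I run a bool query like this: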

{
  "query": {
    "bool": {
      "should": [
        { "term": { "teacher_uid": "2576312" } },
        { "term": { "assistant_uid": "2576312" } },
        { "term": { "student_uid": "2576312" } }
      ],
      "minimum_should_match": 1,
      "filter": [
        { "term": { "status": 3 } }
      ]
    }
  }
}

Profiling the query shows that most of the time is spent on the status:3 clause:

"took" : 593



{
  "type" : "PointRangeQuery",
  "description" : "status:[3 TO 3]",
  "time_in_nanos" : 520765186,
  "breakdown" : {
    "set_min_competitive_score_count" : 0,
    "match_count" : 0,
    "shallow_advance_count" : 0,
    "set_min_competitive_score" : 0,
    "next_doc" : 0,
    "match" : 0,
    "next_doc_count" : 0,
    "score_count" : 0,
    "compute_max_score_count" : 0,
    "compute_max_score" : 0,
    "advance" : 2728,
    "advance_count" : 19,
    "score" : 0,
    "build_scorer_count" : 55,
    "create_weight" : 131,
    "shallow_advance" : 0,
    "create_weight_count" : 1,
    "build_scorer" : 520762252
  }
}

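However, the query has to filter on this field, and I have not come up with a good way to optimize it. From what I found searching online, my understanding is that the cost comes from intersecting the other clauses with the posting list for status:3. If that is the case, is there really no way around it?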
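
As a quick sanity check on the profile output above, the following Python sketch picks out which phase dominates the status clause. The numbers are copied from the breakdown; the variable names are only illustrative, and the `*_count` entries are left out because they are call counts rather than durations.

```python
# Timing entries (nanoseconds) copied from the profile breakdown of the
# "status:[3 TO 3]" PointRangeQuery node. Call counts are intentionally omitted.
timings_ns = {
    "set_min_competitive_score": 0,
    "shallow_advance": 0,
    "next_doc": 0,
    "match": 0,
    "score": 0,
    "compute_max_score": 0,
    "advance": 2728,
    "create_weight": 131,
    "build_scorer": 520762252,
}
total_ns = 520765186  # "time_in_nanos" reported for the same node

# Find the phase with the largest timing and its share of the node total.
dominant = max(timings_ns, key=timings_ns.get)
share = timings_ns[dominant] / total_ns

print(f"dominant phase: {dominant}")
print(f"time: {timings_ns[dominant] / 1e6:.1f} ms")
print(f"share of this query node: {share:.4%}")
```

In other words, almost all of the time for this clause goes into building the scorer, while advancing through the matches costs next to nothing.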

5 replies

xiaoyanghapi - Elasticsearch enthusiast

Try changing status and the *_uid fields (teacher_uid, assistant_uid, student_uid) to keyword. That should improve performance somewhat.
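
A minimal sketch of what this suggestion might look like as a mapping, written as a Python dict so it can be serialized and sent with whatever client is in use. The index name `lessons_v2` is hypothetical (the thread does not name the index), and because the type of an existing field cannot be changed in place, this assumes creating a new index and reindexing the data into it.

```python
import json

# Hypothetical index name; the real one is not given in the thread.
NEW_INDEX = "lessons_v2"

# The fields used in the query above, remapped to keyword as suggested.
KEYWORD_FIELDS = ["status", "teacher_uid", "assistant_uid", "student_uid"]

# Request body for creating the new index (PUT /<index>).
create_index_body = {
    "mappings": {
        "properties": {name: {"type": "keyword"} for name in KEYWORD_FIELDS}
    }
}

print(f"PUT /{NEW_INDEX}")
print(json.dumps(create_index_body, indent=2))
```

Whether this actually helps for a value that matches 75% of the documents would still need to be confirmed by profiling the query again on the new index.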

Charele - Cisco4321

I also have the feeling that keyword might be a bit faster.
Only "might", though; in practice it is hard to say.

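Take int as an example: at the lowest level Lucene has no int type as such; values are converted to BytesRef for storage.
(Have a look at the IntPoint class and it will become clear.)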
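
To make the point about byte storage concrete, here is a small Python sketch of a sortable byte encoding for 32-bit ints: flip the sign bit and write the value as 4 big-endian bytes, so that comparing the bytes gives the same order as comparing the numbers. As far as I understand, this is the idea behind how int points are encoded, but the code below is only an illustration written for this thread, not Lucene's implementation, and `int_to_sortable_bytes` is a made-up helper name.

```python
import struct


def int_to_sortable_bytes(value: int) -> bytes:
    """Encode a signed 32-bit int so that byte order equals numeric order."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value does not fit in a signed 32-bit int")
    # Flipping the sign bit maps [-2^31, 2^31) onto [0, 2^32) and keeps the order.
    flipped = (value & 0xFFFFFFFF) ^ 0x80000000
    return struct.pack(">I", flipped)


values = [-5, 0, 1, 2, 3, 4, 1000]
encoded = [int_to_sortable_bytes(v) for v in values]

# Byte-wise comparison of the encoded values follows the numeric order.
print(encoded == sorted(encoded))
print(int_to_sortable_bytes(3).hex())  # 80000003
```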

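Of course, turning everything into keyword is not a real solution either; some code becomes awkward to write.
Take numbers, for example: as a number, 10 sorts after 9,
but as a string, "10" sorts before "9".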
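
The ordering difference is easy to reproduce. This short Python sketch sorts the same made-up sample values once as numbers and once as strings; the zero-padded variant at the end is one common workaround and not something proposed in this thread.

```python
# Made-up sample values, used only to show the ordering difference.
ids = [9, 10, 2, 100]

numeric_order = sorted(ids)                      # compared as numbers
string_order = sorted(str(i) for i in ids)       # compared character by character
padded_order = sorted(f"{i:03d}" for i in ids)   # fixed width restores numeric order

print(numeric_order)  # [2, 9, 10, 100]
print(string_order)   # ['10', '100', '2', '9']
print(padded_order)   # ['002', '009', '010', '100']
```

So a range or sort on a keyword field follows string order, which matters if the field is ever used for anything beyond exact matching.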