Inaccurate Elasticsearch aggregation statistics: how can this be fixed?

Elasticsearch | Author: jianjianhe | Posted: 2018-06-07 | Views: 24993

The front end runs the SQL select count(distinct external_id) from HUMAN_INFO where facelib_id=2, and the ES query executed underneath is:

{
  "query": {
    "bool": {
      "must": {
        "term": {
          "facelib_id": 2
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "distinct": {
      "cardinality": {
        "field": "external_id"
      }
    }
  }
}

ES returns:

{
  "took": 384,
  "timed_out": false,
  "_shards": {
    "total": 1109,
    "successful": 1109,
    "failed": 0
  },
  "hits": {
    "total": 39999915,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "distinct": {
      "value": 40044804
    }
  }
}

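The result is inaccurate: the distinct count is larger than the total number of hits. I understand why ES aggregations don't support pagination (each shard produces a differently ordered list), but here I am only computing a distinct count, with no pagination at all.
Is there a way to make ES aggregations more precise? If high-precision aggregation over a large data volume is required, does that rule ES out and leave an MPP database as the only option?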
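Assuming external_id holds a single value per document, the number of distinct values among the matching documents cannot exceed hits.total, so any excess must come from the cardinality estimate itself. A quick check of how large that excess is, using only the numbers from the response above:

```python
# Numbers copied from the response above.
hits_total = 39_999_915   # hits.total: documents matching facelib_id = 2
estimate = 40_044_804     # aggregations.distinct.value from the cardinality agg

overshoot = estimate - hits_total
relative = overshoot / hits_total
print(overshoot, f"{relative:.3%}")  # prints: 44889 0.112%
```

An error of about 0.1% is small in relative terms; it only stands out here because it pushes the estimate past a hard upper bound.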

3 replies

jianjianhe

Upvoted by: novia

I re-read the official documentation and found that ES's cardinality aggregation has an accuracy setting, precision_threshold. Here is the original text from the docs:
The precision_threshold options allows to trade memory for accuracy, and defines a unique count below which counts are expected to be close to accurate. Above this value, counts might become a bit more fuzzy. The maximum supported value is 40000, thresholds above this number will have the same effect as a threshold of 40000. The default value is 3000.
In other words, this API has a built-in margin of error. I then added precision_threshold to the query, shown below, but the problem remained, presumably because the data volume is too large. I'll keep looking for a better solution.

{
  "query": {
    "bool": {
      "must": {
        "term": {
          "facelib_id": 2
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "distinct": {
      "cardinality": {
        "field": "external_id",
        "precision_threshold": 40000
      }
    }
  }
}
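The Elasticsearch documentation describes the cardinality aggregation as based on the HyperLogLog++ algorithm. To show why the answer stays approximate once the true count is far above what the sketch can store exactly, here is a minimal sketch of plain HyperLogLog. It is not the ++ variant and not Elasticsearch's code: the SHA-1-derived 64-bit hash is an assumption for illustration (Elasticsearch hashes with MurmurHash3), and the class name and the register-count parameter p are hypothetical, not the same thing as precision_threshold. With m = 2**p registers the standard error is roughly 1.04 / sqrt(m). With about 40 million distinct values, far above the 40000 cap on precision_threshold, no setting of that parameter makes the count exact.

```python
import hashlib
import math


def _hash64(value: str) -> int:
    # Assumption: 64-bit hash taken from SHA-1 (Elasticsearch itself uses MurmurHash3).
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")


class HyperLogLog:
    """Plain HyperLogLog with 2**p registers (illustrative, not ES's HyperLogLog++)."""

    def __init__(self, p: int = 14):
        if not 7 <= p <= 18:  # the alpha formula below assumes m >= 128
            raise ValueError("p must be in [7, 18]")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        h = _hash64(value)
        width = 64 - self.p
        idx = h >> width                  # first p bits pick a register
        rest = h & ((1 << width) - 1)     # remaining bits
        rank = width - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: linear counting on empty registers.
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    true_count = 100_000
    hll = HyperLogLog(p=12)  # 4096 registers, standard error about 1.6%
    for i in range(true_count):
        hll.add(f"external_id-{i}")
    est = hll.estimate()
    print(true_count, round(est), f"{(est - true_count) / true_count:+.2%}")
```

Re-adding values that were already seen never changes the registers, which is why duplicates do not inflate the count; the error comes purely from the estimator, and it can land above or below the true value.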

Elastic_Cdw

Upvoted by: jiangjiang2

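Has anyone found a solution to this? I'm hitting the same problem. With "precision_threshold": 40000 configured, there are 9853 documents in total, and the actual count of distinct order numbers should be 9815, but ES returns 9816.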

laoyang360 - author of 《一本书讲透Elasticsearch》, Elastic Certified Engineer; [死磕Elasitcsearch] Knowledge Planet (知识星球): http://t.cn/RmwM3N9; WeChat official account: 铭毅天下; blog: https://elastic.blog.csdn.net

Aggregations are inherently imprecise. Try increasing shard_size.
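Note that shard_size is a parameter of the terms aggregation rather than of cardinality. If an exact distinct count is a hard requirement, one option not raised in this thread is to page through every distinct external_id with a composite aggregation (Elasticsearch 6.1+) and count the buckets. The sketch below assumes external_id is a keyword field; search is a hypothetical callable that sends a request body to the index's _search endpoint and returns the decoded JSON, and the agg name "ids" and both function names are illustrative. It uses after_key when the response provides it and otherwise falls back to the last bucket's key. Cost grows with the number of distinct values (about 40,000 requests for 40 million values at 1000 per page), so this is far more expensive than cardinality.

```python
from typing import Any, Callable, Dict, Optional

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def composite_body(facelib_id: int, page_size: int,
                   after: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """One page request: the original filter, with a composite agg instead of cardinality."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [{"external_id": {"terms": {"field": "external_id"}}}],
    }
    if after is not None:
        composite["after"] = after  # resume after the last key of the previous page
    return {
        "size": 0,
        "query": {"bool": {"must": {"term": {"facelib_id": facelib_id}}}},
        "aggs": {"ids": {"composite": composite}},
    }


def exact_distinct_count(search: SearchFn, facelib_id: int, page_size: int = 1000) -> int:
    """Count distinct external_id values exactly by walking every composite page."""
    total = 0
    after: Optional[Dict[str, Any]] = None
    while True:
        agg = search(composite_body(facelib_id, page_size, after))["aggregations"]["ids"]
        buckets = agg["buckets"]
        if not buckets:
            return total
        total += len(buckets)  # each bucket is one distinct external_id
        after = agg.get("after_key", buckets[-1]["key"])
```

With a real client, search would wrap whatever call your client library uses to run a _search request against the index.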
