The front end executes the SQL: select count(distinct external_id) from HUMAN_INFO where facelib_id=2. Under the hood this runs the following ES query:
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "facelib_id": 2
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "distinct": {
      "cardinality": {
        "field": "external_id"
      }
    }
  }
}
{
  "took": 384,
  "timed_out": false,
  "_shards": {
    "total": 1109,
    "successful": 1109,
    "failed": 0
  },
  "hits": {
    "total": 39999915,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "distinct": {
      "value": 40044804
    }
  }
}
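The precision of the response is off: the distinct count (40044804) is larger than the total number of documents matched by the query (hits.total, 39999915). I understand why ES aggregations do not support pagination, namely that each shard has its own sorted list, but here I am only deduplicating, not paginating. Is there any way to improve the precision of ES aggregation queries? If a high-precision aggregation is needed over a large volume of data, does that mean ES cannot be used and only an MPP database will do?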
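To put a number on the discrepancy, here is a small arithmetic sketch using the two figures from the response. It assumes every document holds exactly one external_id value (the post does not say so; a multi-valued field could legitimately produce more distinct values than documents). Under that assumption the true distinct count cannot exceed hits.total, so the difference is a lower bound on the overcount. The function name is made up for illustration.

```python
# Figures copied from the response above.
hits_total = 39_999_915       # hits.total: documents matching facelib_id = 2
approx_distinct = 40_044_804  # aggregations.distinct.value


def min_overcount(doc_count: int, distinct_estimate: int) -> int:
    """Smallest possible overcount of the estimate.

    ASSUMPTION: each document carries a single value for the field, so the
    true distinct count is at most doc_count.
    """
    return max(0, distinct_estimate - doc_count)


overcount = min_overcount(hits_total, approx_distinct)
relative = overcount / hits_total  # lower bound on the relative error

print(f"overcount >= {overcount}, relative error >= {relative:.4%}")
```

Under that assumption the estimate is too high by at least 44889, roughly 0.11% of the matched documents.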
3 replies
jianjianhe
Upvoted by: novia
The precision_threshold options allows to trade memory for accuracy, and defines a unique count below which counts are expected to be close to accurate. Above this value, counts might become a bit more fuzzy. The maximum supported value is 40000, thresholds above this number will have the same effect as a threshold of 40000. The default value is 3000.
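In other words, this API has some inherent error. I later added the precision_threshold parameter to the query, but that still did not solve the problem, probably because the data volume is too large. I will keep looking into whether there is a better solution.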
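For reference, a minimal sketch of how the query body from the question might look with precision_threshold added. The helper name and its arguments are hypothetical; the body is only built and printed here, and sending it to a cluster is left out. The threshold is capped at 40000 because the quoted documentation says larger values have no further effect.

```python
import json

# Upper limit stated in the documentation quoted above.
MAX_PRECISION_THRESHOLD = 40000


def build_distinct_query(filter_field, filter_value, distinct_field,
                         precision_threshold=MAX_PRECISION_THRESHOLD):
    """Build a cardinality-aggregation request body (hypothetical helper)."""
    # Values above the limit behave like the limit, so clamp explicitly.
    threshold = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return {
        "query": {"bool": {"must": {"term": {filter_field: filter_value}}}},
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "distinct": {
                "cardinality": {
                    "field": distinct_field,
                    "precision_threshold": threshold,
                }
            }
        },
    }


body = build_distinct_query("facelib_id", 2, "external_id")
print(json.dumps(body, indent=2))
```

Since the field here has on the order of 40 million distinct values, far above the 40000 ceiling, the count is expected to remain approximate even at the maximum setting, which is consistent with what this reply reports.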
Elastic_Cdw
Upvoted by: jiangjiang2
laoyang360 - author of 《一本书讲透Elasticsearch》, Elastic Certified Engineer; [死磕Elasticsearch] Knowledge Planet: http://t.cn/RmwM3N9; WeChat official account: 铭毅天下; blog: https://elastic.blog.csdn.net
Upvoted by: