文章 - 搜索客，搜索人自己的社区

社区日报第749期 (2019-10-12）

2.Elasticsearch各版本升级核心内容必看 http://t.cn/AiupttNg

3.基于ELK的数据分析平台 http://t.cn/RQv63PX

继续阅读 »

社区日报第748期 (2019-10-11)

1、Elasticsearch中的身份验证和授权使用解读
https://tinyurl.com/y235bygv
2、kafka连接Elasticsearch7.X
https://tinyurl.com/y56v2h26
3、ElasticSearch的JAVA API使用教程
https://tinyurl.com/y2s4d3d6

编辑：铭毅天下
归档：https://ela.st/cn-daily-all
订阅：https://ela.st/cn-daily-sub
沙龙：https://ela.st/cn-meetup

继续阅读 »

社区日报第747期 (2019-10-10)

1、Elasticsearch 性能调优，让你的集群飞起来
http://tinyurl.com/y44asjr9
2、对 Golang 代码调用 Elasticsearch 进行单元测试
http://tinyurl.com/y4zvkt4r
3、Kibana如何制作出好看酷炫的图表
http://tinyurl.com/yxgajzkz

编辑：江水
归档：https://ela.st/cn-daily-all
订阅：https://ela.st/cn-daily-sub
沙龙：https://ela.st/cn-meetup

继续阅读 »

前言

我们在使用聚合时总是有各种各样的聚合需求，其中一个比较常用的就是根据聚合的结果过滤聚合的桶，例如：1、每个IP登录次数超过5次的IP；2、每个IP登录人数超过2的IP。还有我之前的一个案例，访问量超过1000的人数，这些都是很常见的统计需求。

案例需求

我们在使用聚合计算的时候一般都有两类，一种是计算文档的数量，另一种是计算文档内字段的值的数量（去重计算）或者值的数学计算。两种聚合计算在过滤的时候采用不同的方法来计算。

我们使用以下案例来说明两种过滤的不同：用户每次登录都会记录一个登录记录:

{"userID":"a","IP":"10.70.25.1","time":"2019-10-10 12:12:12.222"}

然后提出以下两个需求： 1、每个IP登录次数超过5次的IP； 2、每个IP登录人数超过2的IP。

实现

每个IP登录次数超过5次的IP

这个是对登录记录个数的桶聚合统计，然后过滤。使用IP做term聚合，就可以得出每个IP的登录次数，然后term聚合中有一个参数min_doc_count这个字段就可以对文档数量进行过滤，具体的语句如下： 查询语句

{
  "aggs": {
    "IP": {
      "terms": {
        "field": "IP",
        "size": 3000,
        "order": {
          "_count": "desc"
        },
        "min_doc_count": 5
      }
    }
  },
  "size": 0
}

结果

{
  "took" : 614,
  "timed_out" : false,
  "num_reduce_phases" : 3,
  "_shards" : {
    "total" : 1105,
    "successful" : 1105,
    "skipped" : 75,
    "failed" : 0
  },
  "hits" : {
    "total" : 2826,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "IP" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "10.25.90.139",
          "doc_count" : 61
        },
        {
          "key" : "10.25.78.146",
          "doc_count" : 45
        },
        {
          "key" : "10.25.94.22",
          "doc_count" : 21
        },
        {
          "key" : "10.25.75.52",
          "doc_count" : 18
        },
        {
          "key" : "10.25.89.32",
          "doc_count" : 13
        },
        {
          "key" : "10.25.93.243",
          "doc_count" : 10
        },
        {
          "key" : "10.25.78.189",
          "doc_count" : 9
        },
        {
          "key" : "10.25.90.82",
          "doc_count" : 8
        },
        {
          "key" : "10.25.91.240",
          "doc_count" : 8
        },
        {
          "key" : "10.25.90.57",
          "doc_count" : 7
        },
        {
          "key" : "10.25.91.251",
          "doc_count" : 7
        },
        {
          "key" : "10.25.95.166",
          "doc_count" : 6
        },
        {
          "key" : "10.25.89.33",
          "doc_count" : 5
        },
        {
          "key" : "10.25.90.88",
          "doc_count" : 5
        },
        {
          "key" : "10.25.92.53",
          "doc_count" : 5
        }
      ]
    }
  }
}

每个IP登录人数超过2的IP

这个是对登录记录用户ID的去重数聚合，然后过滤。对用户ID进行去重可以使用Cardinality Aggregation聚合，然后再使用Bucket Selector Aggregation聚合过滤器过滤数据。具体内容如下： 查询语句

{
  "aggs": {
    "IP": {
      "terms": {
        "field": "IP",
        "size": 3000,
        "order": {
          "distinct": "desc"
        },
        "min_doc_count": 5
      },
      "aggs": {
        "distinct": {
          "cardinality": {
            "field": "IP.keyword"
          }
        },
        "dd":{
          "bucket_selector": {
            "buckets_path": {"userCount":"distinct"},
            "script": "params.userCount > 2"
          }
        }
      }
    }
  },
  "size": 0
}

结果

{
  "took" : 317,
  "timed_out" : false,
  "num_reduce_phases" : 3,
  "_shards" : {
    "total" : 1105,
    "successful" : 1105,
    "skipped" : 75,
    "failed" : 0
  },
  "hits" : {
    "total" : 2826,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "IP" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "10.25.75.52",
          "doc_count" : 18,
          "distinct" : {
            "value" : 4
          }
        },
        {
          "key" : "10.25.78.146",
          "doc_count" : 45,
          "distinct" : {
            "value" : 3
          }
        },
        {
          "key" : "10.25.90.139",
          "doc_count" : 61,
          "distinct" : {
            "value" : 3
          }
        },
        {
          "key" : "10.25.91.240",
          "doc_count" : 8,
          "distinct" : {
            "value" : 3
          }
        },
        {
          "key" : "10.25.94.22",
          "doc_count" : 21,
          "distinct" : {
            "value" : 3
          }
        }
      ]
    }
  }
}

桶聚合选择器： https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-aggregations-pipeline-bucket-selector-aggregation.html

继续阅读 »

前言

我们在使用聚合时总是有各种各样的聚合需求，其中一个比较常用的就是根据聚合的结果过滤聚合的桶，例如：1、每个IP登录次数超过5次的IP；2、每个IP登录人数超过2的IP。还有我之前的一个案例，访问量超过1000的人数，这些都是很常见的统计需求。

案例需求

我们在使用聚合计算的时候一般都有两类，一种是计算文档的数量，另一种是计算文档内字段的值的数量（去重计算）或者值的数学计算。两种聚合计算在过滤的时候采用不同的方法来计算。

我们使用以下案例来说明两种过滤的不同：用户每次登录都会记录一个登录记录:

{"userID":"a","IP":"10.70.25.1","time":"2019-10-10 12:12:12.222"}

然后提出以下两个需求： 1、每个IP登录次数超过5次的IP； 2、每个IP登录人数超过2的IP。

实现

每个IP登录次数超过5次的IP

这个是对登录记录个数的桶聚合统计，然后过滤。使用IP做term聚合，就可以得出每个IP的登录次数，然后term聚合中有一个参数min_doc_count这个字段就可以对文档数量进行过滤，具体的语句如下： 查询语句

{
  "aggs": {
    "IP": {
      "terms": {
        "field": "IP",
        "size": 3000,
        "order": {
          "_count": "desc"
        },
        "min_doc_count": 5
      }
    }
  },
  "size": 0
}

结果

{
  "took" : 614,
  "timed_out" : false,
  "num_reduce_phases" : 3,
  "_shards" : {
    "total" : 1105,
    "successful" : 1105,
    "skipped" : 75,
    "failed" : 0
  },
  "hits" : {
    "total" : 2826,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "IP" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "10.25.90.139",
          "doc_count" : 61
        },
        {
          "key" : "10.25.78.146",
          "doc_count" : 45
        },
        {
          "key" : "10.25.94.22",
          "doc_count" : 21
        },
        {
          "key" : "10.25.75.52",
          "doc_count" : 18
        },
        {
          "key" : "10.25.89.32",
          "doc_count" : 13
        },
        {
          "key" : "10.25.93.243",
          "doc_count" : 10
        },
        {
          "key" : "10.25.78.189",
          "doc_count" : 9
        },
        {
          "key" : "10.25.90.82",
          "doc_count" : 8
        },
        {
          "key" : "10.25.91.240",
          "doc_count" : 8
        },
        {
          "key" : "10.25.90.57",
          "doc_count" : 7
        },
        {
          "key" : "10.25.91.251",
          "doc_count" : 7
        },
        {
          "key" : "10.25.95.166",
          "doc_count" : 6
        },
        {
          "key" : "10.25.89.33",
          "doc_count" : 5
        },
        {
          "key" : "10.25.90.88",
          "doc_count" : 5
        },
        {
          "key" : "10.25.92.53",
          "doc_count" : 5
        }
      ]
    }
  }
}

每个IP登录人数超过2的IP

这个是对登录记录用户ID的去重数聚合，然后过滤。对用户ID进行去重可以使用Cardinality Aggregation聚合，然后再使用Bucket Selector Aggregation聚合过滤器过滤数据。具体内容如下： 查询语句

{
  "aggs": {
    "IP": {
      "terms": {
        "field": "IP",
        "size": 3000,
        "order": {
          "distinct": "desc"
        },
        "min_doc_count": 5
      },
      "aggs": {
        "distinct": {
          "cardinality": {
            "field": "IP.keyword"
          }
        },
        "dd":{
          "bucket_selector": {
            "buckets_path": {"userCount":"distinct"},
            "script": "params.userCount > 2"
          }
        }
      }
    }
  },
  "size": 0
}

结果

{
  "took" : 317,
  "timed_out" : false,
  "num_reduce_phases" : 3,
  "_shards" : {
    "total" : 1105,
    "successful" : 1105,
    "skipped" : 75,
    "failed" : 0
  },
  "hits" : {
    "total" : 2826,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "IP" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "10.25.75.52",
          "doc_count" : 18,
          "distinct" : {
            "value" : 4
          }
        },
        {
          "key" : "10.25.78.146",
          "doc_count" : 45,
          "distinct" : {
            "value" : 3
          }
        },
        {
          "key" : "10.25.90.139",
          "doc_count" : 61,
          "distinct" : {
            "value" : 3
          }
        },
        {
          "key" : "10.25.91.240",
          "doc_count" : 8,
          "distinct" : {
            "value" : 3
          }
        },
        {
          "key" : "10.25.94.22",
          "doc_count" : 21,
          "distinct" : {
            "value" : 3
          }
        }
      ]
    }
  }
}

桶聚合选择器： https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-aggregations-pipeline-bucket-selector-aggregation.html

收起阅读 »

社区日报第746期 (2019-10-09)

1.通过某瓣真实案例看Elasticsearch优化
http://tinyurl.com/y6l7qm6d
2.使用 Elastic Beats 搜集日志到 Pulsa
http://tinyurl.com/y2vm89vu
3.Filebeat的Registry文件越来越大？
http://tinyurl.com/y37pst27

编辑：金桥

Elastic中文社区民意调查，期待您的参与！
http://tinyurl.com/y35mtwes
归档：https://ela.st/cn-daily-all
订阅：https://ela.st/cn-daily-sub
沙龙：https://ela.st/cn-meetup

继续阅读 »

社区日报第745期 (2019-10-08)

1、使用CloudWatch metricset管理AWS服务。
http://tinyurl.com/y527yjdm
2、在Elastic SIEM中集成地图。
http://tinyurl.com/y5zluwz9
3、利用Open Distro中的Performance Analyzer和PerfTop进行轻量级调试。
http://tinyurl.com/yxpuo2nt

Elastic中文社区民意调查，期待您的参与！
https://www.wjx.cn/m/46684393. ... d%3D0

编辑：叮咚光军

归档：https://ela.st/cn-daily-all
订阅：https://ela.st/cn-daily-sub
沙龙：https://ela.st/cn-meetup

继续阅读 »

1.基于spark集群的券商个性化推荐系统架构设计最佳实践;
http://1t.click/a2mv
2.图辅助的搜索;
http://1t.click/a2mw
3.k8s nginx ingress日志收集;
http://1t.click/a2mx

PS：举国欢庆，大家快乐；国庆后见~~
编辑：wt
归档：https://ela.st/cn-daily-all
订阅：https://ela.st/cn-daily-sub

继续阅读 »

社区日报第743期 (2019-09-29)

1.Logstash教程：快速入门指南。
https://tinyurl.com/y6gb3sen
2.X-Pack替代方案比较。
https://tinyurl.com/y8andhyf
3.(自备梯子)非洲正在建立一个不像硅谷的人工智能行业。
https://tinyurl.com/yxfwsk7v

线上活动：
1.10月16日，ELK初学者入门
http://1t.click/aywj
2.10月30日，Elastic SIEM实操网络研讨会
http://1t.click/ayxu

Elastic Meetingup：
1.10月25日，Elastic教育行业分享会
http://1t.click/aywm

编辑：至尊宝
归档：https://ela.st/cn-daily-all
订阅：https://ela.st/cn-daily-sub
沙龙：https://ela.st/cn-meetup

继续阅读 »

社区日报第742期 (2019-09-28）

1.ES写报错的解决方法 http://t.cn/Ain8Bv55 2.腾讯对ES进行写优化的步骤 http://t.cn/Ain8Bv5V 3.Elasticsearch X-Pack 系列之 Machine Learning 解析 http://t.cn/Ain8Bv5b

继续阅读 »

社区日报第741期 (2019-09-27)

1、开源：Elasticsearch Site&App Search PHP 客户端1.0发布
https://tinyurl.com/yxw9zoku
2、Elasticsearch实时数据监控实战
https://tinyurl.com/y2s2gk7q
3、视频：Elasticsearch集群性能的5个基础认知
https://tinyurl.com/y48vcr9a

编辑：铭毅天下
归档：https://ela.st/cn-daily-all
订阅：https://ela.st/cn-daily-sub
沙龙：https://ela.st/cn-meetup

继续阅读 »

社区日报第740期 (2019-09-26)

1、如何基于 Kibana 进行数据探索和机器学习实践
http://tinyurl.com/y2j3vthx
2、Kibana analyze api ui 升级到最新 7.3 版本
http://tinyurl.com/y3s64klj
3、Elastic Cloud On Kubernetes 介绍
http://tinyurl.com/y2sx2f4z

编辑：rockybean
归档：https://ela.st/cn-daily-all
订阅：https://ela.st/cn-daily-sub
沙龙：https://ela.st/cn-meetup

继续阅读 »

社区日报第739期 (2019-09-25)

1、基于 ElasticStack 实现 Mulesoft 的可观察性
https://tinyurl.com/yy7xwe3c
2、Elasticsearch 7.3 新功能 DataFrame 实战演练
https://tinyurl.com/yxqgxsdz
3、Elastic 在金融领域的发展和应用趋势探讨
https://tinyurl.com/y5fufsy7

编辑：rockybean
归档：https://ela.st/cn-daily-all
订阅：https://ela.st/cn-daily-sub
沙龙：https://ela.st/cn-meetup

继续阅读 »

社区日报第738期 (2019-09-24)

1、Elasticsearch高级调优方法论之——根治慢查询！
http://1t.click/ausC
2、Elasticsearch高并发写入优化的开源协同经历。
http://1t.click/ausD
3、Elasticsearch由浅入深（十一）内核原理。
http://1t.click/ausE

编辑：叮咚光军

归档：https://ela.st/cn-daily-all
订阅：https://ela.st/cn-daily-sub
沙龙：https://ela.st/cn-meetup

继续阅读 »

聚合元数据——对聚合结果进行打标签

背景

在我们的项目中需要对聚合后的结果进行二次的terms的聚合。实际需求就是有n个模型，需要统计每个模型在每段时间的调用次数，然后需要查询指定m个模型的调用总次数。我们要为每个模型建立一个索引，然后为每个模型查一次这段时间内的使用次数。

注：我们记录的值只能从结果中拿取。

实现过程

第一次设计

每个模型在第一次设计的时候是两个字段，【调用次数】、【错误次数】，使用以下语句：

{
  "aggs": {
    "error": {
      "filters": {
        "filters": {
          "error": {
            "query_string": {
              "query": "state:-1",
              "analyze_wildcard": true,
              "default_field": "*"
            }
          }
        }
      }
    }
  },
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "li": "D003" //说明这是一次模型调用
                }
              }
            ],
            "minimum_should_match": 1
          }
        },
        {
          "match_phrase": {
            "modelId..keyword": {
              "query": "modelId01"
            }
          }
        },
        {
          "range": {
            "x_st": {
              "gte": "now/h-1h-10s",
              "lte": "now/h-10s",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}

结果为：

{
  "took" : 215,
  "timed_out" : false,
  "_shards" : {
    "total" : 1095,
    "successful" : 1095,
    "skipped" : 1053,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "error" : {
      "buckets" : {
        "error" : {
          "doc_count" : 0
        }
      }
    }
  }
}

解析后保存：

{"@timestamp":"2019-09-23T12:23:23.333","count":0,"error":0}

这种情况可以统计每个模型在某段时间内的调用次数，使用索引名来区分每个model。但是在统计指定m个模型的时候就不行了，使用索引名来查询的时候由于是指定m个，前缀不能使用* 匹配，并且不能罗列所有m个索引来查询，就无法达到统计的效果。第一次设计因为不能在结果中记录模型ID导致不能统计指定的m个模型的数量，以失败告终。

第二次设计

既然需要在统计结果中记录模型ID，那就使用terms聚合来进行操作，先使用模型ID过滤一下数据，然后使用聚合唯一的模型ID，这样就有了模型ID。查询语句如下：

{
  "aggs": {
    "modelId": {
      "terms": {
        "field": "modelId.keyword",
        "size": 5,
        "order": {
          "_count": "desc"
        }
      }
    }
  },
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "modelId.keyword": {
              "query": "modelId01"
            }
          }
        },
        {
          "range": {
            "x_st": {
              "gte": "now/h-1h",
              "lte": "now/h",
              "format": "epoch_millis"
            }
          }
        },
        {
          "match_phrase": {
            "modelId.keyword": {
              "query": "modelId01"
            }
          }
        }
      ]
    }
  }
}

结果为：

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 140,
    "successful" : 140,
    "skipped" : 135,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "modelId" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ ]
    }
  }
}

在有模型调用的时候这个方法还好用，但是在无模型调用的时候这个返回结果就如上面的一样，是不包含任何信息的，错误次数的0和模型ID都没有了。第二次设计因为在无模型调用的时候导致模型ID不能记录，然后也是不能实现指定m个模型的查询次数统计，也以失败告终。

第三次设计

经过两次失败的实际案例，我发现现有的知识已经不能满足需求了，我需要新的方法，我需要一个能给查询结果添加字段的方法，所以我去查询官方文档，让我找到了这个方法聚合元数据也就是对聚合结果进行打标签。我使用第一次的设计方案，然后添加上对聚合结果打标签的方法，就可以记录一次统计值的模型ID了。查询语句如下：

{
  "aggs": {
    "error": {
      "filters": {
        "filters": {
          "error": {
            "query_string": {
              "query": "state:-1",
              "analyze_wildcard": true,
              "default_field": "*"
            }
          }
        }
      },
      "meta": {
        "modelId": "modelId01"
      }
    }
  },
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "li": "D003" //说明这是一次模型调用
                }
              }
            ],
            "minimum_should_match": 1
          }
        },
        {
          "match_phrase": {
            "modelId..keyword": {
              "query": "modelId01"
            }
          }
        },
        {
          "range": {
            "x_st": {
              "gte": "now/h-1h-10s",
              "lte": "now/h-10s",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}

查询结果为：

{
  "took" : 88,
  "timed_out" : false,
  "_shards" : {
    "total" : 1095,
    "successful" : 1095,
    "skipped" : 1056,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "error" : {
      "meta" : {
        "modelId" : "modelId01"
      },
      "buckets" : {
        "error" : {
          "doc_count" : 0
        }
      }
    }
  }
}

至此完成了统计需求。

总结

使用聚合元数据方法，可以对聚合的结果进行打标签，可以使用在聚合结果保存后再次进行terms聚合的时候使用，或者通过标签进行各种其他查询。

继续阅读 »

背景

在我们的项目中需要对聚合后的结果进行二次的terms的聚合。实际需求就是有n个模型，需要统计每个模型在每段时间的调用次数，然后需要查询指定m个模型的调用总次数。我们要为每个模型建立一个索引，然后为每个模型查一次这段时间内的使用次数。

注：我们记录的值只能从结果中拿取。

实现过程

第一次设计

每个模型在第一次设计的时候是两个字段，【调用次数】、【错误次数】，使用以下语句：

{
  "aggs": {
    "error": {
      "filters": {
        "filters": {
          "error": {
            "query_string": {
              "query": "state:-1",
              "analyze_wildcard": true,
              "default_field": "*"
            }
          }
        }
      }
    }
  },
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "li": "D003" //说明这是一次模型调用
                }
              }
            ],
            "minimum_should_match": 1
          }
        },
        {
          "match_phrase": {
            "modelId..keyword": {
              "query": "modelId01"
            }
          }
        },
        {
          "range": {
            "x_st": {
              "gte": "now/h-1h-10s",
              "lte": "now/h-10s",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}

结果为：

{
  "took" : 215,
  "timed_out" : false,
  "_shards" : {
    "total" : 1095,
    "successful" : 1095,
    "skipped" : 1053,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "error" : {
      "buckets" : {
        "error" : {
          "doc_count" : 0
        }
      }
    }
  }
}

解析后保存：

{"@timestamp":"2019-09-23T12:23:23.333","count":0,"error":0}

这种情况可以统计每个模型在某段时间内的调用次数，使用索引名来区分每个model。但是在统计指定m个模型的时候就不行了，使用索引名来查询的时候由于是指定m个，前缀不能使用* 匹配，并且不能罗列所有m个索引来查询，就无法达到统计的效果。第一次设计因为不能在结果中记录模型ID导致不能统计指定的m个模型的数量，以失败告终。

第二次设计

既然需要在统计结果中记录模型ID，那就使用terms聚合来进行操作，先使用模型ID过滤一下数据，然后使用聚合唯一的模型ID，这样就有了模型ID。查询语句如下：

{
  "aggs": {
    "modelId": {
      "terms": {
        "field": "modelId.keyword",
        "size": 5,
        "order": {
          "_count": "desc"
        }
      }
    }
  },
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "modelId.keyword": {
              "query": "modelId01"
            }
          }
        },
        {
          "range": {
            "x_st": {
              "gte": "now/h-1h",
              "lte": "now/h",
              "format": "epoch_millis"
            }
          }
        },
        {
          "match_phrase": {
            "modelId.keyword": {
              "query": "modelId01"
            }
          }
        }
      ]
    }
  }
}

结果为：

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 140,
    "successful" : 140,
    "skipped" : 135,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "modelId" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ ]
    }
  }
}

在有模型调用的时候这个方法还好用，但是在无模型调用的时候这个返回结果就如上面的一样，是不包含任何信息的，错误次数的0和模型ID都没有了。第二次设计因为在无模型调用的时候导致模型ID不能记录，然后也是不能实现指定m个模型的查询次数统计，也以失败告终。

第三次设计

经过两次失败的实际案例，我发现现有的知识已经不能满足需求了，我需要新的方法，我需要一个能给查询结果添加字段的方法，所以我去查询官方文档，让我找到了这个方法聚合元数据也就是对聚合结果进行打标签。我使用第一次的设计方案，然后添加上对聚合结果打标签的方法，就可以记录一次统计值的模型ID了。查询语句如下：

{
  "aggs": {
    "error": {
      "filters": {
        "filters": {
          "error": {
            "query_string": {
              "query": "state:-1",
              "analyze_wildcard": true,
              "default_field": "*"
            }
          }
        }
      },
      "meta": {
        "modelId": "modelId01"
      }
    }
  },
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "li": "D003" //说明这是一次模型调用
                }
              }
            ],
            "minimum_should_match": 1
          }
        },
        {
          "match_phrase": {
            "modelId..keyword": {
              "query": "modelId01"
            }
          }
        },
        {
          "range": {
            "x_st": {
              "gte": "now/h-1h-10s",
              "lte": "now/h-10s",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}

查询结果为：

{
  "took" : 88,
  "timed_out" : false,
  "_shards" : {
    "total" : 1095,
    "successful" : 1095,
    "skipped" : 1056,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "error" : {
      "meta" : {
        "modelId" : "modelId01"
      },
      "buckets" : {
        "error" : {
          "doc_count" : 0
        }
      }
    }
  }
}

至此完成了统计需求。

总结

使用聚合元数据方法，可以对聚合的结果进行打标签，可以使用在聚合结果保存后再次进行terms聚合的时候使用，或者通过标签进行各种其他查询。

收起阅读 »

社区日报第737期 (2019-09-23)

1、将使用elasticsearch的成本降低90-99%
http://t.cn/EKCScjC

2、Elasticsearch调优实践
http://t.cn/AincWu86

3、Elasticsearch解决问题之道——请亮出你的DSL！
http://t.cn/AincnvSQ

编辑：cyberdak
归档：https://ela.st/cn-daily-all
订阅：https://ela.st/cn-daily-sub
沙龙：https://ela.st/cn-meetup

继续阅读 »

社区日报第749期 (2019-10-12）

社区日报第748期 (2019-10-11)

社区日报第747期 (2019-10-10)

ES Aggs根据聚合的结果（数值）进行过滤

前言

案例需求

实现

每个IP登录次数超过5次的IP

每个IP登录人数超过2的IP

前言

案例需求

实现

每个IP登录次数超过5次的IP

每个IP登录人数超过2的IP

社区日报第746期 (2019-10-09)

社区日报第745期 (2019-10-08)

社区日报第744期 (2019-09-30)

社区日报第743期 (2019-09-29)

社区日报第742期 (2019-09-28）

社区日报第741期 (2019-09-27)

社区日报第740期 (2019-09-26)

社区日报第739期 (2019-09-25)

社区日报第738期 (2019-09-24)

聚合元数据——对聚合结果进行打标签

背景

实现过程

第一次设计

第二次设计

第三次设计

总结

背景

实现过程

第一次设计

第二次设计

第三次设计

总结

社区日报第737期 (2019-09-23)

活动推荐

热门文章

热门话题