Automatic recovery from circuit_breaking_exception

Elasticsearch | Author: liujiacheng | Posted on December 13, 2021 | Views: 3355

Software version: ES 7.13.4 with the bundled OpenJDK 16, G1 GC
Environment: CentOS 7
Description:
Program A writes data to ES and tripped the memory circuit breaker:
Elasticsearch exception [type=circuit_breaking_exception, reason=[parent]
Data too large, data for [indices:data/write/bulk[s]] would be [13773081266/12.8gb],
which is larger than the limit of [13743895347/12.7gb],
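That part is fine; it matches my circuit breaker setting (Xmx 16gb * 0.8 = 12.8gb).
The problem is that after a later GC, when ES heap usage had dropped to 12gb, program A kept reporting the error and never recovered.

Is the client caching the breaker result, or is there some other cause?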
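
The limit quoted in the error message can be checked with a little arithmetic. The sketch below assumes the heap is exactly 16 GiB and that the limit is the heap size times the configured percentage, rounded down to whole bytes. The helper names are illustrative, and truncating to one decimal place is only a guess at why the message shows 12.7gb rather than 12.8gb.

```python
import math

GIB = 1024 ** 3

def breaker_limit_bytes(heap_bytes: int, percent: int) -> int:
    """Heap size times the configured percentage, rounded down to whole bytes."""
    return heap_bytes * percent // 100

def truncate_gib(num_bytes: int) -> float:
    """Express bytes in GiB, truncated (not rounded) to one decimal place."""
    return math.floor(num_bytes / GIB * 10) / 10

limit = breaker_limit_bytes(16 * GIB, 80)
print(limit)                      # 13743895347, the limit in the error message
print(truncate_gib(limit))        # 12.7
print(truncate_gib(13773081266))  # 12.8
```

Under these assumptions the 80% limit comes out a fraction of a byte below 12.8 GiB, which would explain the 12.7gb in the message.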

3 replies

liujiacheng

Upvoted by: kin122

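The client-side problem of recovering from a PERMANENT trip is still unresolved, but the following configuration change works around the permanent tripping for now:
indices.breaker.total.limit: changed from 80% to 95%.

95% is the default for this setting, so why were we using 80%? The value was inherited from ES 6.3.
When we upgraded from 6.3 to 7.13, almost all cluster settings were copied over to 7.13 unchanged,
but the memory circuit breaker works completely differently in 6.x and 7.x: 7.x uses the real memory circuit breaker (see "Improving node resiliency with the real memory circuit breaker"),
so continuing to use 80% is no longer appropriate.

Next I will look into how the client handles the 429 permanent circuit breaker exception.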
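
A minimal sketch of how such a change could be submitted through the cluster update settings API (`PUT /_cluster/settings`). The base URL, the helper name, and the choice of `persistent` rather than `transient` are assumptions, since the thread does not say how the setting was applied. The request is only built here, not sent.

```python
import json
import urllib.request

def breaker_limit_request(base_url, limit):
    """Build (but do not send) a PUT /_cluster/settings request.

    Passing limit=None sends JSON null, which resets the setting to its default.
    """
    body = {"persistent": {"indices.breaker.total.limit": limit}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# "http://localhost:9200" is a placeholder address, not the cluster in the thread.
req = breaker_limit_request("http://localhost:9200", "95%")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To apply it against a reachable cluster: urllib.request.urlopen(req)
```

Sending `null` instead of `"95%"` removes the inherited override entirely, which may be preferable when the goal is simply to return to the 7.x default.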

liujiacheng

  Suppressed: org.elasticsearch.client.ResponseException: method [POST],
  host [https://paas-test2-node1.es.te ... :7200],
  URI [/ntocc_wechat_msg-2021-12-08/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true],
  status line [HTTP/1.1 429 Too Many Requests]
{
    "error":{
        "root_cause":[
            {
                "type":"circuit_breaking_exception",
                "reason":"[parent] Data too large, data for [<http_request>] would be [6156662842/5.7gb], which is larger than the limit of [5988548608/5.5gb], real usage: [6156662328/5.7gb], new bytes reserved: [514/514b], usages [request=0/0b, fielddata=148/148b, in_flight_requests=514/514b, model_inference=0/0b, accounting=28672920/27.3mb]",
                "bytes_wanted":6156662842,
                "bytes_limit":5988548608,
                "durability":"PERMANENT"
            }
        ],
        "type":"circuit_breaking_exception",
        "reason":"[parent] Data too large, data for [<http_request>] would be [6156662842/5.7gb], which is larger than the limit of [5988548608/5.5gb], real usage: [6156662328/5.7gb], new bytes reserved: [514/514b], usages [request=0/0b, fielddata=148/148b, in_flight_requests=514/514b, model_inference=0/0b, accounting=28672920/27.3mb]",
        "bytes_wanted":6156662842,
        "bytes_limit":5988548608,
        "durability":"PERMANENT"
    },
    "status":429
}
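
For anyone who wants to inspect such a response in client code, here is a small sketch that pulls the relevant fields out of the error body. The function name and the shape of the returned summary are illustrative; the field names (`type`, `bytes_wanted`, `bytes_limit`, `durability`) are taken from the response above, and the sample is a shortened copy of it.

```python
import json

def describe_breaker_error(body: str):
    """Summarize a circuit_breaking_exception body; return None for any other error."""
    error = json.loads(body).get("error", {})
    if not isinstance(error, dict) or error.get("type") != "circuit_breaking_exception":
        return None
    wanted = error["bytes_wanted"]
    limit = error["bytes_limit"]
    return {
        "durability": error.get("durability"),
        "over_limit_bytes": wanted - limit,
        "over_limit_percent": round((wanted - limit) / limit * 100, 2),
    }

# Shortened copy of the response body quoted in this post.
sample = '''{"error": {"type": "circuit_breaking_exception",
  "bytes_wanted": 6156662842, "bytes_limit": 5988548608,
  "durability": "PERMANENT"}, "status": 429}'''
print(describe_breaker_error(sample))
```

For the response above, the request was rejected while heap usage was 168114234 bytes over the limit, i.e. less than 3% above it.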

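I found some clues. There are two kinds of breaker trips: PERMANENT and TRANSIENT. A PERMANENT trip requires manual intervention (intervention by operator), whereas a TRANSIENT trip is temporary and the client will retry.

What we hit is a PERMANENT trip. Is restarting the client really the only way to recover?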
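
One way to test whether a restart is really needed is to retry from the application itself. Below is a generic retry-with-exponential-backoff sketch for HTTP 429 responses. It is not a description of how the Elasticsearch Java client behaves, and it rests on the assumption that the server evaluates the breaker afresh on each request, so a later attempt can succeed once heap usage drops. `send` is a hypothetical callable that performs the request and returns `(status_code, body)`.

```python
import time

def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it stops returning HTTP 429 or attempts run out.

    send() is a hypothetical callable returning (status_code, body).
    The delay doubles after every 429: base_delay, 2 * base_delay, ...
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body

# Simulated server: two breaker rejections, then success.
responses = iter([(429, "breaker"), (429, "breaker"), (200, "ok")])
delays = []  # record the delays instead of actually sleeping
print(call_with_backoff(lambda: next(responses), sleep=delays.append))
print(delays)
```

If the retries keep failing long after heap usage has fallen below the limit, that would point at something other than the breaker itself, such as state held on the client side.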

Charele - Cisco4321

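circuit_breaking is an early-warning mechanism that keeps an OOM from bringing down the ES service.

It cannot recover automatically: the request needs memory, and that memory is still occupied.
A few options (not necessarily applicable):
1. Add memory
2. Temporarily disable circuit_breaking
3. Clear the caches to free up some memory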
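
Before trying any of these, it can help to see how close each node is to the parent breaker limit. The sketch below reads the `parent` entry from a `GET /_nodes/stats/breaker` response; caches can then be cleared with `POST /_cache/clear` if that looks worthwhile. The function name is illustrative, and the sample response is hand-written for this example (it reuses the numbers from the error message earlier in the thread), not real cluster output.

```python
def parent_breaker_usage(stats: dict) -> dict:
    """Map node name -> estimated size / limit for the parent breaker."""
    usage = {}
    for node in stats.get("nodes", {}).values():
        parent = node["breakers"]["parent"]
        usage[node["name"]] = round(
            parent["estimated_size_in_bytes"] / parent["limit_size_in_bytes"], 3
        )
    return usage

# Illustrative, hand-written sample; node id and name are made up.
sample_stats = {
    "nodes": {
        "node-id-1": {
            "name": "node1",
            "breakers": {
                "parent": {
                    "limit_size_in_bytes": 5988548608,
                    "estimated_size_in_bytes": 6156662328,
                    "tripped": 1,
                }
            },
        }
    }
}
print(parent_breaker_usage(sample_stats))
```

A ratio above 1.0 means the node is currently over its limit and will reject new requests until heap usage falls.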
