Ran a forced shard-allocation command:
curl -H "Content-Type:application/json" -XPOST 'http://127.0.0.1:9200/_cluster/reroute' -d '{"commands" : [{"allocate_replica" : {"index" : "wa_pk_wb.log_201906","shard":51,"node":"1553096497000058309"}}]}'
It fails with the following error:
{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[1553096497000059109][cluster:admin/reroute]"}],"type":"illegal_argument_exception","reason":"[allocate_replica] allocation of [wa_pk_wb.log_201906][51] on node {1553096497000058309}{diwcFeFXSmGMTIVL87nesA}{wnmvPC68T-Wzop5LtMgvXQ}{1.18.48.42}{1.18.48.42:9302}{temperature=hot, rack=rack_1, xpack.installed=true, set=2, region=2,} is not allowed, reason: [NO(shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-10-04T00:23:47.137Z], failed_attempts[5], delayed=false, details[failed shard on node [Avu6FgVqQkeD9w7JRQkIMA]: failed recovery, failure RecoveryFailedException[[wa_pk_wb.log_201906][51]: Recovery failed from {1566460253000005411}{XRrOUyhWQsqjd-TgWjesjA}{qI1jiSlKSpOk7UZrdeCtvQ}{1.18.48.6}{1.18.48.6:9302}{temperature=hot, rack=rack_1, xpack.installed=true, set=2, region=2, ip=1.18.48.6} into {1568959492000072011}{Avu6FgVqQkeD9w7JRQkIMA}{TJFPQYofT42CPbg-Zzxaxw}{1.18.48.31}{1.18.48.31:9302}{rack=rack_1, xpack.installed=true, set=2, ip=1.18.48.31, temperature=hot, region=2}]; nested: RemoteTransportException[[1566460253000005411][1.18.48.6:9302][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [5392754809/5gb], which is larger than the limit of [5392485580/5gb], usages [request=0/0b, fielddata=503475099/480.1mb, in_flight_requests=861/861b, accounting=4889278849/4.5gb]]; ], allocation_status[no_attempt]]])][YES(primary shard for this replica is already active)][YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)][YES(can allocate replica shard to a node with version [6.4.3] since this is equal-or-newer than the primary version [6.4.3])][YES(the shard is not being snapshotted)][YES(ignored as shard is 
not being recovered from a snapshot)][YES(node passes include/exclude/require filters)][YES(the shard does not exist on the same node)][YES(enough disk for shard on node, free: [2.8tb], shard size: [0b], free after allocating shard: [2.8tb])][YES(below shard recovery limit of outgoing: [0 < 2] incoming: [0 < 2])][YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)][YES(node meets all awareness attribute requirements)]"},"status":400}
The cluster currently has 150 UNASSIGNED shards, and every forced allocation fails with the error above.
Any advice?
5 replies
Ombres
Upvoted by: yaohe
Anonymous user
Upvoted by:
The accounting circuit breaker allows Elasticsearch to limit the memory usage of things held in memory that are not released when a request is completed. This includes things like the Lucene segment memory.
indices.breaker.accounting.limit
Limit for accounting breaker, defaults to 100% of JVM heap. This means that it is bound by the limit configured for the parent circuit breaker.
indices.breaker.accounting.overhead
A constant that all accounting estimations are multiplied with to determine a final estimation. Defaults to 1
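As a quick sanity check before changing anything, the live usage and limits behind these breakers can be inspected with the nodes stats API (host and port assumed from the commands above):

```shell
# Show current circuit-breaker usage (parent, accounting, fielddata, request,
# in_flight_requests) and their limits on every node
curl -s 'http://127.0.0.1:9200/_nodes/stats/breaker?pretty'
```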
I think I roughly see what's happening.
Your ES heap is probably around 8 GB (7 GB?). Under the limits in newer ES versions, the accounting breaker caps how much memory Lucene segments may use.
When you migrated data (allocated the shard) onto this node, it tripped that segment-memory circuit breaker, and the recovery failed.
In your situation the only direct fixes are to enlarge the JVM heap or raise the circuit-breaker limit. I don't recommend just raising the limit, though; at this point you should really be adding machines to scale the cluster out.
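If you do decide to raise the breaker limit as a stopgap, it is a dynamic cluster setting; a sketch (the 80% value here is purely illustrative, not a recommendation):

```shell
# Temporarily raise the parent circuit-breaker limit via a transient
# cluster setting; reverts when the cluster fully restarts
curl -H "Content-Type:application/json" -XPUT 'http://127.0.0.1:9200/_cluster/settings' -d '
{
  "transient": {
    "indices.breaker.total.limit": "80%"
  }
}'
```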
This is all personal speculation; I can't guarantee it's correct.
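Once the memory pressure is relieved (bigger heap or more nodes), the error message itself tells you how to retry the 150 stuck shards:

```shell
# Reset the failed-allocation retry counter and retry all shards that
# hit the max-retries limit, as suggested in the NO(...) decision above
curl -H "Content-Type:application/json" -XPOST 'http://127.0.0.1:9200/_cluster/reroute?retry_failed=true'
```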
Anonymous user
Upvoted by:
That doesn't look like an 18 GB heap, though. The parent limit shown here is 5 GB; at the 70% default that implies roughly a 7 GB heap, and at 95% the heap would be even smaller.
With an 18 GB heap, the parent limit should be 18 × 0.7 = 12.6 GB.
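Reversing that arithmetic on the exact byte count from the error message (a sketch, assuming the 6.x default of parent breaker = 70% of heap):

```shell
# Back-of-the-envelope check, assuming indices.breaker.total.limit = 70% of heap
LC_ALL=C awk 'BEGIN {
  limit = 5392485580   # parent limit from the error message, in bytes
  printf "implied heap: %.1f GB\n", limit / 0.70 / (1024^3)
  printf "an 18 GB heap would give a parent limit of %.1f GB\n", 18 * 0.70
}'
```

Which agrees with the reply above: the node reporting this error is running with roughly a 7 GB heap, not 18 GB.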
Anonymous user
Upvoted by:
Anonymous user
Upvoted by:
https://www.elastic.co/guide/e ... eaker