ES version 5.2.2. _cluster/allocation/explain returns the following:
{
  "node_id": "VCMPiqWZSYW4hnNDj_NExg",
  "node_name": "es-1",
  "transport_address": "127.0.0.1:9300",
  "node_attributes": {
    "tag": "warm"
  },
  "node_decision": "no",
  "store": {
    "in_sync": true,
    "allocation_id": "zZHXkwouS_SPJUmLzg3nWQ",
    "store_exception": {
      "type": "shard_lock_obtain_failed_exception",
      "reason": "[test-test][3]: obtaining shard lock timed out after 5000ms",
      "index_uuid": "HI8Z5vAdTqmM8rfw_JT0Lw",
      "shard": "3",
      "index": "test-test"
    }
  },
  "deciders": [
    {
      "decider": "max_retry",
      "decision": "NO",
      "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2018-04-18T07:09:02.434Z], failed_attempts[10], delayed=false, details[failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[test-test][3]: obtaining shard lock timed out after 5000ms]; ], allocation_status[deciders_no]]]"
    }
  ]
}
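As a reading aid for the entry above, here is a minimal Python sketch that condenses one node entry into the fields that matter for this problem: whether the node holds an in-sync copy, which store exception it reported, and which deciders said NO. The function name summarize_node_decision and the abbreviated sample are illustrative assumptions, not part of Elasticsearch.

```python
import json


def summarize_node_decision(node):
    """Condense one per-node entry from the allocation explain output.

    `node` is a dict shaped like the entry posted above. Missing keys are
    tolerated because not every entry carries a store exception.
    """
    store = node.get("store", {})
    exc = store.get("store_exception") or {}
    blockers = [
        d["decider"]
        for d in node.get("deciders", [])
        if d.get("decision") == "NO"
    ]
    return {
        "node": node.get("node_name"),
        "in_sync": store.get("in_sync", False),
        "store_error": exc.get("type"),
        "blocking_deciders": blockers,
    }


# Abbreviated copy of the entry from the question (long strings omitted).
sample = json.loads("""
{
  "node_name": "es-1",
  "node_decision": "no",
  "store": {
    "in_sync": true,
    "store_exception": {"type": "shard_lock_obtain_failed_exception"}
  },
  "deciders": [{"decider": "max_retry", "decision": "NO"}]
}
""")

print(summarize_node_decision(sample))
```

For the entry shown, the summary points at two separate things: the node does hold an in-sync copy, but reading its store hit the shard lock timeout, and the max_retry decider is blocking further attempts.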
I tried calling /_cluster/reroute?retry_failed=true, but the result is still the same. I also tried:
POST /_cluster/reroute?pretty
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "test-test",
        "shard": 3,
        "node": "es-2",
        "accept_data_loss": true
      }
    }
  ]
}
This also fails:
"unassigned_info": {
  "reason": "ALLOCATION_FAILED",
  "at": "2018-04-18T07:09:02.434Z",
  "failed_attempts": 10,
  "delayed": false,
  "details": "failed to create shard, failure IOException[failed to obtain in-memory shard lock]; nested: ShardLockObtainFailedException[[test-test][3]: obtaining shard lock timed out after 5000ms]; ",
  "allocation_status": "deciders_no"
}
The only thing that works is allocate_empty_primary, but that loses all of the shard's data. I searched around and have not found a good solution.
5 replies
kennywu76 - Wood
Upvoted by: laoyang360, abia, cccthought, 小风, SpadeKing, Atom, dragon434
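If retry_failed does not solve the problem, you can try allocate_stale_primary, provided you know which node holds this shard's primary. If that still does not work and you do not want to lose data, you can also restart that node, which should release the in-memory lock.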
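A minimal Python sketch of the allocate_stale_primary step, under one assumption: that "knowing which node holds the primary" can be approximated by picking the node whose explain entry reports "in_sync": true. The helper name build_stale_primary_command is hypothetical; the request body has the same shape as the one posted in the question.

```python
import json


def build_stale_primary_command(index, shard, node_decisions):
    """Build a _cluster/reroute body that promotes a stale copy to primary.

    `node_decisions` is a list of per-node entries shaped like the explain
    output in the question. Returns None when no entry reports an in-sync
    copy, so the caller does not blindly target an arbitrary node.
    """
    for entry in node_decisions:
        if entry.get("store", {}).get("in_sync"):
            return {
                "commands": [
                    {
                        "allocate_stale_primary": {
                            "index": index,
                            "shard": shard,
                            "node": entry["node_name"],
                            # Required by the API; it acknowledges that
                            # writes missing from the stale copy are lost.
                            "accept_data_loss": True,
                        }
                    }
                ]
            }
    return None


# Entry taken from the question; only the fields used here are kept.
decisions = [{"node_name": "es-1", "store": {"in_sync": True}}]
body = build_stale_primary_command("test-test", 3, decisions)
print(json.dumps(body, indent=2))
```

The printed body would be sent with POST /_cluster/reroute. Note that the explain entry in the question is for es-1, while the command that was tried targets es-2; the thread does not show whether es-2 holds a copy of the shard.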
yayg2008
The current error is that the shard lock cannot be obtained.
Source code:
JackGe
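What I did at the time: first stop the job that writes data to ES, close the index (POST xxxx/_close), wait a while, then reopen it (POST xxxx/_open) and check its status (GET _cat/shards/xxxx). The index then recovered, and finally I re-enabled the write job.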
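The close, wait, reopen sequence above can be sketched in Python roughly as follows. This is only an illustration: the function names, the base URL in the usage comment, and the 60-second default wait are assumptions (the reply only says "a while"), and stopping and restarting the write job is left to the operator.

```python
import time
import urllib.request


def make_sender(base_url):
    """Return a send(method, path) function that talks to one ES node."""
    def send(method, path):
        req = urllib.request.Request(base_url + path, method=method)
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("utf-8")
    return send


def close_wait_open(index, send, wait_seconds=60, sleep=time.sleep):
    """Close the index, wait, reopen it, and return the shard listing.

    `send` is injected so the sequence can be dry-run without a cluster.
    `wait_seconds` is a guess; the original reply gives no exact duration.
    """
    send("POST", f"/{index}/_close")
    sleep(wait_seconds)
    send("POST", f"/{index}/_open")
    return send("GET", f"/_cat/shards/{index}")


# Hypothetical usage against a local node (stop the write job first):
#   send = make_sender("http://127.0.0.1:9200")
#   print(close_wait_open("test-test", send))
```

Passing send as a parameter keeps the order of the three requests easy to verify before running it against a live cluster.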
guopeng7216 - post-90s ops engineer
org.elasticsearch.env.ShardLockObtainFailedException: [user][0]: obtaining shard lock timed out after 5000ms
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:668) ~[elasticsearch-6.1.0.jar:6.1.0]
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:587) ~[elasticsearch-6.1.0.jar:6.1.0]
at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:430) [elasticsearch-6.1.0.jar:6.1.0]
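Did you ever find an effective fix? Stopping writes to ES is not really realistic, since the business has to keep running as usual. I also tried the method above: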
curl -XPOST '172.19.6.127:9200/_cluster/reroute?retry_failed=true&pretty'
but it had no effect.
shwtz - an IT guy who studied physics and wants to be an actor