Elasticsearch crashes on its own after running for a while, and the cluster status turns Red. If I delete the data and restart, it runs for a while (usually a few days) and then crashes again. Any pointers would be much appreciated!
ES version is 5.4.1, the OS is Ubuntu 12 on Aliyun. The disk partition it lives on is 20 GB in total, of which ES uses only about 1.6 GB. Basic cluster info:
{
"cluster_name" : "elasticsearch",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 50.0
}
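For context, JSON like the above is the output of the cluster health API; a minimal sketch of how to fetch it and inspect the unassigned shards (assuming ES listens on the default `localhost:9200`):

```shell
# Cluster health summary (the JSON above is this endpoint's output)
curl -s 'http://localhost:9200/_cluster/health?pretty'

# List individual shards to see which copies are UNASSIGNED
curl -s 'http://localhost:9200/_cat/shards?v'
```

The yellow status with 5 unassigned shards is expected on a single-node cluster with the default of one replica per primary: the replica copies have no second node to be assigned to.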
Here is the log from when the error occurred, with some parts omitted:
[2019-12-07T01:04:08,770][INFO ][o.e.c.r.a.DiskThresholdMonitor] [107room-node-1] rerouting shards: [high disk watermark exceeded on one or more nodes]
[2019-12-07T01:04:38,793][WARN ][o.e.c.r.a.DiskThresholdMonitor] [107room-node-1] high disk watermark [90%] exceeded on [xRIeFFvgTMes53cAJzhcYQ][107room-node-1][/alidata/server/elasticsearch/data/nodes/0] free: 1.3gb[6.6%], shards will be relocated away from this node
[2019-12-07T01:05:08,827][INFO ][o.e.c.r.a.DiskThresholdMonitor] [107room-node-1] rerouting shards: [one or more nodes has gone under the high or low watermark]
[2019-12-07T03:19:25,815][WARN ][o.e.i.e.Engine ] [107room-node-1] [107room][3] failed engine [already closed by tragic event on the translog]
java.nio.file.NoSuchFileException: /alidata/server/elasticsearch/data/nodes/0/indices/4ZnhGezFTlqUwWV1hcvMKQ/3/translog/translog.ckp
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
…
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [elasticsearch-5.4.1.jar:5.4.1]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:627) [elasticsearch-5.4.1.jar:5.4.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.4.1.jar:5.4.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.4.1.jar:5.4.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
[2019-12-07T03:19:25,824][WARN ][o.e.i.c.IndicesClusterStateService] [107room-node-1] [[107room][3]] marking and sending shard failed due to [shard failure, reason [already closed by tragic event on the translog]]
java.nio.file.NoSuchFileException: /alidata/server/elasticsearch/data/nodes/0/indices/4ZnhGezFTlqUwWV1hcvMKQ/3/translog/translog.ckp
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) ~[?:?]
at java.nio.channels.FileChannel.open(FileChannel.java:287) ~[?:1.8.0_161]
at java.nio.channels.FileChannel.open(FileChannel.java:335) ~[?:1.8.0_161]
at org.elasticsearch.index.translog.Checkpoint.write(Checkpoint.java:127) ~[elasticsearch-5.4.1.jar:5.4.1]
…
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.4.1.jar:5.4.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.4.1.jar:5.4.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
[2019-12-07T03:19:25,826][WARN ][o.e.c.a.s.ShardStateAction] [107room-node-1] [107room][3] received shard failed for shard id [[107room][3]], allocation id [MqwuitbpTweTbVPquCRzDg], primary term [0], message [shard failure, reason [already closed by tragic event on the translog]], failure [NoSuchFileException[/alidata/server/elasticsearch/data/nodes/0/indices/4ZnhGezFTlqUwWV1hcvMKQ/3/translog/translog.ckp]]
java.nio.file.NoSuchFileException: /alidata/server/elasticsearch/data/nodes/0/indices/4ZnhGezFTlqUwWV1hcvMKQ/3/translog/translog.ckp
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
…
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.4.1.jar:5.4.1]
at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:627) ~[elasticsearch-5.4.1.jar:5.4.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.4.1.jar:5.4.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.4.1.jar:5.4.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
[2019-12-07T03:19:25,864][INFO ][o.e.c.r.a.AllocationService] [107room-node-1] Cluster health status changed from [YELLOW] to [RED] (reason: [shards failed [[107room][3]] ...]).
[2019-12-07T03:19:25,974][WARN ][o.e.i.c.IndicesClusterStateService] [107room-node-1] [[107room][3]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [107room][3]: Recovery failed on {107room-node-1}{xRIeFFvgTMes53cAJzhcYQ}{mElodbIhS96k-5uqnbX8WQ}{127.0.0.1}{127.0.0.1:9300}
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$1(IndexShard.java:1490) ~[elasticsearch-5.4.1.jar:5.4.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.1.jar:5.4.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
Caused by: org.elasticsearch.index.shard.IndexShardRecoveryException: failed to recover from gateway
at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:365) ~[elasticsearch-5.4.1.jar:5.4.1]
…
... 4 more
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: failed to create engine
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:154) ~[elasticsearch-5.4.1.jar:5.4.1]
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25) ~[elasticsearch-5.4.1.jar:5.4.1]
…
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$1(IndexShard.java:1486) ~[elasticsearch-5.4.1.jar:5.4.1]
... 4 more
Caused by: java.nio.file.NoSuchFileException: /alidata/server/elasticsearch-5.4.1/data/nodes/0/indices/4ZnhGezFTlqUwWV1hcvMKQ/3/translog/translog.ckp
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) ~[?:?]
…
at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1238) ~[elasticsearch-5.4.1.jar:5.4.1]
at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$1(IndexShard.java:1486) ~[elasticsearch-5.4.1.jar:5.4.1]
... 4 more
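The `DiskThresholdMonitor` warnings say the partition itself was down to 1.3 GB (6.6%) free when the high watermark (90%) was exceeded, even though ES's own data is only ~1.6 GB, so something else is filling the 20 GB partition. A diagnostic sketch, using the data path from the log (the watermark setting name below is the 5.x one; treat the exact commands as an assumption about your layout):

```shell
# How full is the partition really? The log reports only 6.6% free.
df -h /alidata

# Find what else is consuming the 20 GB partition
du -sh /alidata/* | sort -h

# Disk usage as Elasticsearch itself sees it, per node
curl -s 'http://localhost:9200/_cat/allocation?v'

# Optionally relax the high watermark temporarily while cleaning up
# (cluster.routing.allocation.disk.watermark.high exists in 5.x)
curl -s -XPUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}'
```

Raising the watermark only buys time; the underlying fix is freeing or growing the partition.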
5 replies
Charele - Cisco4321
From the look of it, the files under the data directory seem to have been physically deleted???
Also, 7.5 is out now; why are you still running a version as old as 5.4.1?
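If deletion of files outside ES is suspected, it may be worth checking, before the next restart, whether the shard's translog directory from the stack trace still exists and whether any cleanup job touches the data path (a sketch; paths taken verbatim from the log above):

```shell
# Does the translog for shard 3 of index 107room still exist?
ls -l /alidata/server/elasticsearch/data/nodes/0/indices/4ZnhGezFTlqUwWV1hcvMKQ/3/translog/

# Look for cron jobs or cleanup scripts that might be purging the data path
crontab -l
grep -r alidata /etc/cron* 2>/dev/null
```

Note that the last stack trace mentions a slightly different path (`elasticsearch-5.4.1/data` rather than `elasticsearch/data`), which could also be worth a look.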