
Three ES machines — it's always the same one that won't stay up. Symptoms below.

Elasticsearch | Author: kyun | Published 2015-12-24 | Views: 14474

We have three ES machines, and it is always the same one that goes down. The symptoms are as follows:
[2015-12-24 11:44:21,596][WARN ][bootstrap ] unable to install syscall filter: prctl(PR_GET_NO_NEW_PRIVS): Invalid argument
[2015-12-24 11:44:21,956][INFO ][node ] [node-1] version[2.1.0], pid[21436], build[72cd1f1/2015-11-18T22:40:03Z]
[2015-12-24 11:44:21,956][INFO ][node ] [node-1] initializing ...
[2015-12-24 11:44:22,037][INFO ][plugins ] [node-1] loaded [], sites []
[2015-12-24 11:44:22,088][INFO ][env ] [node-1] using [1] data paths, mounts [[/data/data1 (/dev/sdb)]], net usable_space [19.2tb], net total_space [27.2tb], spins? [possibly], types [xfs]
[2015-12-24 11:44:24,797][INFO ][node ] [node-1] initialized
[2015-12-24 11:44:24,798][INFO ][node ] [node-1] starting ...
[2015-12-24 11:44:24,861][INFO ][transport ] [node-1] publish_address {10.0.3.97:9300}, bound_addresses {10.0.3.97:9300}
[2015-12-24 11:44:24,869][INFO ][discovery ] [node-1] ksc_s_op/-VLwYHs2T4WzObFZ4-78Aw
[2015-12-24 11:44:27,953][INFO ][cluster.service ] [node-1] detected_master {node-2}{2YiPUh8tSBOUt7XCHX4kMg}{10.0.3.99}{10.0.3.99:9300}, added {{node-3}{V58Pps9ETXe3WRTcFx_fiw}{10.0.3.98}{10.0.3.98:9300},{node-2}{2YiPUh8tSBOUt7XCHX4kMg}{10.0.3.99}{10.0.3.99:9300},}, reason: zen-disco-receive(from master [{node-2}{2YiPUh8tSBOUt7XCHX4kMg}{10.0.3.99}{10.0.3.99:9300}])
[2015-12-24 11:44:28,111][INFO ][http ] [node-1] publish_address {10.0.3.97:9200}, bound_addresses {10.0.3.97:9200}
[2015-12-24 11:44:28,112][INFO ][node ] [node-1] started
[2015-12-24 11:45:01,671][INFO ][node ] [node-1] stopping ...
[2015-12-24 11:45:01,750][INFO ][node ] [node-1] stopped
[2015-12-24 11:45:01,750][INFO ][node ] [node-1] closing ...
[2015-12-24 11:45:01,754][INFO ][node ] [node-1] closed
[liaominke@bjlg-c17-junjiestorage97 bin]$ ./elasticsearch
[2015-12-24 11:51:47,842][WARN ][bootstrap ] unable to install syscall filter: prctl(PR_GET_NO_NEW_PRIVS): Invalid argument
[2015-12-24 11:51:48,575][INFO ][node ] [node-1] version[2.1.0], pid[23194], build[72cd1f1/2015-11-18T22:40:03Z]
[2015-12-24 11:51:48,576][INFO ][node ] [node-1] initializing ...
[2015-12-24 11:51:48,748][INFO ][plugins ] [node-1] loaded [], sites []
[2015-12-24 11:51:48,803][INFO ][env ] [node-1] using [1] data paths, mounts [[/data/data1 (/dev/sdb)]], net usable_space [19.2tb], net total_space [27.2tb], spins? [possibly], types [xfs]
[2015-12-24 11:51:53,252][INFO ][node ] [node-1] initialized
[2015-12-24 11:51:53,252][INFO ][node ] [node-1] starting ...
[2015-12-24 11:51:53,427][INFO ][transport ] [node-1] publish_address {10.0.3.97:9300}, bound_addresses {10.0.3.97:9300}
[2015-12-24 11:51:53,455][INFO ][discovery ] [node-1] ksc_s_op/70nJ1QWuTuSS4vdHWiMuUw
[2015-12-24 11:51:56,533][INFO ][cluster.service ] [node-1] detected_master {node-2}{2YiPUh8tSBOUt7XCHX4kMg}{10.0.3.99}{10.0.3.99:9300}, added {{node-2}{2YiPUh8tSBOUt7XCHX4kMg}{10.0.3.99}{10.0.3.99:9300},{node-3}{V58Pps9ETXe3WRTcFx_fiw}{10.0.3.98}{10.0.3.98:9300},}, reason: zen-disco-receive(from master [{node-2}{2YiPUh8tSBOUt7XCHX4kMg}{10.0.3.99}{10.0.3.99:9300}])
[2015-12-24 11:51:56,665][INFO ][http ] [node-1] publish_address {10.0.3.97:9200}, bound_addresses {10.0.3.97:9200}
[2015-12-24 11:51:56,665][INFO ][node ] [node-1] started
[2015-12-24 11:55:02,068][INFO ][node ] [node-1] stopping ...
[2015-12-24 11:55:02,104][WARN ][action.bulk ] [node-1] [nginx-2015.12.24][0] failed to perform indices:data/write/bulk[s] on node {node-3}{V58Pps9ETXe3WRTcFx_fiw}{10.0.3.98}{10.0.3.98:9300}
SendRequestTransportException[[node-3][10.0.3.98:9300][indices:data/write/bulk[s][r]]]; nested: TransportException[TransportService is closed stopped can't send request];
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:323)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase.performOnReplica(TransportReplicationAction.java:882)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase.doRun(TransportReplicationAction.java:859)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.finishAndMoveToReplication(TransportReplicationAction.java:530)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.performOnPrimary(TransportReplicationAction.java:608)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1.doRun(TransportReplicationAction.java:452)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: TransportException[TransportService is closed stopped can't send request]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:303)
... 10 more
[2015-12-24 11:55:02,110][WARN ][netty.channel.DefaultChannelPipeline] An exception was thrown by an exception handler.
java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at org.jboss.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
at org.jboss.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
at org.jboss.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
at org.jboss.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:577)
at org.jboss.netty.channel.Channels.write(Channels.java:704)
at org.jboss.netty.channel.Channels.write(Channels.java:671)
at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:348)
at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:103)
at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:75)
at org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.doRun(TransportReplicationAction.java:315)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:241)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicaOperationTransportHandler.messageReceived(TransportReplicationAction.java:238)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:299)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-12-24 11:55:02,105][WARN ][action.bulk ] [node-1] [nginx-2015.12.24][2] failed to perform indices:data/write/bulk[s] on node {node-3}{V58Pps9ETXe3WRTcFx_fiw}{10.0.3.98}{10.0.3.98:9300}
SendRequestTransportException[[node-3][10.0.3.98:9300][indices:data/write/bulk[s][r]]]; nested: TransportException[TransportService is closed stopped can't send request];
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:323)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase.performOnReplica(TransportReplicationAction.java:882)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase.doRun(TransportReplicationAction.java:859)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.finishAndMoveToReplication(TransportReplicationAction.java:530)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.performOnPrimary(TransportReplicationAction.java:608)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1.doRun(TransportReplicationAction.java:452)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: TransportException[TransportService is closed stopped can't send request]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:303)
... 10 more
[2015-12-24 11:55:02,104][WARN ][action.bulk ] [node-1] [nginx-2015.12.24][1] failed to perform indices:data/write/bulk[s] on node {node-2}{2YiPUh8tSBOUt7XCHX4kMg}{10.0.3.99}{10.0.3.99:9300}
SendRequestTransportException[[node-2][10.0.3.99:9300][indices:data/write/bulk[s][r]]]; nested: TransportException[TransportService is closed stopped can't send request];
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:323)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase.performOnReplica(TransportReplicationAction.java:882)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase.doRun(TransportReplicationAction.java:859)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.finishAndMoveToReplication(TransportReplicationAction.java:530)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.performOnPrimary(TransportReplicationAction.java:608)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1.doRun(TransportReplicationAction.java:452)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: TransportException[TransportService is closed stopped can't send request]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:303)
... 10 more
[2015-12-24 11:55:02,121][WARN ][cluster.action.shard ] [node-1] failed to send failed shard to {node-2}{2YiPUh8tSBOUt7XCHX4kMg}{10.0.3.99}{10.0.3.99:9300}
SendRequestTransportException[[node-2][10.0.3.99:9300][internal:cluster/shard/failure]]; nested: TransportException[TransportService is closed stopped can't send request];
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:323)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:282)
at org.elasticsearch.cluster.action.shard.ShardStateAction.innerShardFailed(ShardStateAction.java:97)
at org.elasticsearch.cluster.action.shard.ShardStateAction.shardFailed(ShardStateAction.java:87)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase$1.handleException(TransportReplicationAction.java:895)
at org.elasticsearch.transport.TransportService$3.run(TransportService.java:327)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: TransportException[TransportService is closed stopped can't send request]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:303)
... 8 more
[2015-12-24 11:55:02,122][WARN ][cluster.action.shard ] [node-1] failed to send failed shard to {node-2}{2YiPUh8tSBOUt7XCHX4kMg}{10.0.3.99}{10.0.3.99:9300}
SendRequestTransportException[[node-2][10.0.3.99:9300][internal:cluster/shard/failure]]; nested: TransportException[TransportService is closed stopped can't send request];
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:323)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:282)
at org.elasticsearch.cluster.action.shard.ShardStateAction.innerShardFailed(ShardStateAction.java:97)
at org.elasticsearch.cluster.action.shard.ShardStateAction.shardFailed(ShardStateAction.java:87)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase$1.handleException(TransportReplicationAction.java:895)
at org.elasticsearch.transport.TransportService$3.run(TransportService.java:327)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: TransportException[TransportService is closed stopped can't send request]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:303)
... 8 more
[2015-12-24 11:55:02,122][WARN ][cluster.action.shard ] [node-1] failed to send failed shard to {node-2}{2YiPUh8tSBOUt7XCHX4kMg}{10.0.3.99}{10.0.3.99:9300}
SendRequestTransportException[[node-2][10.0.3.99:9300][internal:cluster/shard/failure]]; nested: TransportException[TransportService is closed stopped can't send request];
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:323)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:282)
at org.elasticsearch.cluster.action.shard.ShardStateAction.innerShardFailed(ShardStateAction.java:97)
at org.elasticsearch.cluster.action.shard.ShardStateAction.shardFailed(ShardStateAction.java:87)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReplicationPhase$1.handleException(TransportReplicationAction.java:895)
at org.elasticsearch.transport.TransportService$3.run(TransportService.java:327)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: TransportException[TransportService is closed stopped can't send request]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:303)
... 8 more
[2015-12-24 11:55:02,467][INFO ][node ] [node-1] stopped
[2015-12-24 11:55:02,468][INFO ][node ] [node-1] closing ...
[2015-12-24 11:55:02,476][INFO ][node ] [node-1] closed
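
One detail worth noting in the log above: the node joins the cluster normally and then prints a clean "stopping ... stopped ... closing ... closed" sequence, which means the JVM ran its shutdown hooks after receiving a termination request (e.g. SIGTERM, SIGINT or SIGHUP); a kernel OOM kill delivers SIGKILL and none of those lines would appear. A minimal sketch of host-side checks one might run to see what is stopping the process, assuming a Linux host with auditd available (the audit key name es_kill and the pid-file path are only illustrative):

# How is the node being launched, and is it attached to a login terminal?
# A foreground ./elasticsearch exits when the SSH session that started it goes away.
ps -o pid,ppid,tty,stat,etime,cmd -C java

# If auditd is running, record which process sends signals to the node:
auditctl -a always,exit -F arch=b64 -S kill -k es_kill
ausearch -k es_kill --start recent

# Detach the node from the login shell entirely (Elasticsearch 2.x supports -d and -p):
./elasticsearch -d -p /tmp/elasticsearch.pid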

medcl

1. What is your system configuration?
2. What have you changed the ES JVM settings to?
3. Monitor the server's resources — is it running out of memory?
4. Paste the output of ulimit -a. (A command sketch for gathering items 1-4 follows below.)
5. By "always the same one", do you mean it is always that one specific server, or a random one each time?
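
A rough sketch of how items 1-4 could be gathered on the affected host (assuming a Linux box with a JDK on the PATH; the data path is taken from the mount shown in the log):

# 1. System / hardware overview
uname -a
free -m                         # total vs. free memory
df -h /data/data1               # the data path reported in the log above

# 2. JVM settings actually in effect (ES 2.x reads the ES_HEAP_SIZE environment variable)
echo "ES_HEAP_SIZE=$ES_HEAP_SIZE"
jps -lvm | grep -i elasticsearch

# 3. Did the kernel OOM killer intervene?
dmesg | grep -iE 'out of memory|killed process'

# 4. Per-user limits for the account that starts Elasticsearch
ulimit -a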

kyun

1. 64 GB of RAM and a 5 TB disk.
2. The JVM settings were not changed; we are running ES 2.1.
3. Memory was not exhausted; of the 64 GB there were still about 4 GB free at the time.
4. Output of ulimit -a:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514859
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 10240
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
5. It is always the same one that dies.
The attachments are screenshots of the head plugin's management UI, taken while the node was alive and after it died.
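
For what it's worth, the "max locked memory" limit of 64 KB in the ulimit output above only matters if bootstrap.mlockall: true is set in elasticsearch.yml (Elasticsearch 2.x logs a warning at startup when it cannot lock memory, and no such warning appears in the log above); "open files" at 65535 already looks adequate. A minimal sketch of how memlock is usually raised, assuming the node is started by the user shown in the shell prompt (adjust the user name to your environment):

# Run as root: append limits for the account that launches Elasticsearch.
# Takes effect at that user's next login.
cat >> /etc/security/limits.conf <<'EOF'
liaominke  soft  memlock  unlimited
liaominke  hard  memlock  unlimited
EOF

# After logging in again, verify:
ulimit -l        # should now report "unlimited"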

kyun

Thanks, the problem has been solved.
