
Help: ES 7.1.1 cannot form a cluster when network.host: 0.0.0.0

Elasticsearch | Author: remainsu | Posted: 2019-07-04 | Views: 12608

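I am building a cluster with ES 7.1.1. When network.host listens on the internal IP everything works, but when it listens on 0.0.0.0 the nodes cannot form a cluster.

The error is as follows: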
[2019-07-04T10:11:52,814][WARN ][o.e.c.c.ClusterFormationFailureHelper] [node-2] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [node-3] to bootstrap a cluster: have discovered ; discovery will continue using [10.20.0.6:9300, 10.20.0.7:9300, 10.20.0.8:9300] from hosts providers and [{node-2}{qdHd5pwDQA6x2rL6zLBUbQ}{wtGok7oLS-mAaPh_qxKo9A}{1xx.xx.24.23x}{1xx.xx.24.23x:9300}{ml.machine_memory=16709562368, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0

The full configuration is as follows:
Master node:
cluster.name: es-test
#
node.name: node-3
node.master: true
node.data: true
#
path.data: /data/es/data
path.logs: /data/es/logs
#
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
#
network.host: 0.0.0.0
#
http.port: 9200
discovery.seed_hosts: ["10.20.0.6","10.20.0.7", "10.20.0.8"]
cluster.initial_master_nodes: ["node-3"]

# Whether to enable cross-origin requests (CORS)
http.cors.enabled: true
# * allows all origins
http.cors.allow-origin: "*"

Other nodes:
cluster.name: es-test
#
node.name: node-2
node.master: true
node.data: true
#
path.data: /data/es/data
path.logs: /data/es/logs
#
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
#
network.host: 0.0.0.0
#
http.port: 9200
discovery.seed_hosts: ["10.20.0.6","10.20.0.7", "10.20.0.8"]
cluster.initial_master_nodes: ["node-3"]
#
# Whether to enable cross-origin requests (CORS)
http.cors.enabled: true
# * allows all origins
http.cors.allow-origin: "*"


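P.S. I tried writing cluster.initial_master_nodes as IPs, and also listing several entries, but neither worked.

Checking the cluster state shows only one node; the others simply will not join.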
curl -XGET 'http://10.20.0.8:9200/_cat/health?v'
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1562206789 02:19:49 es-test green 1 1 0 0 0 0 0 0 - 100.0%

remainsu

Upvoted by: tianyaguozhe

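This is solved now. Leaving it here for the record.

You need to add the following line to the configuration. It sets the address used for communication between the machines in the cluster.

network.publish_host: 10.20.0.8 (internal IP)

P.S. This may be caused by something specific to my machines, because the many examples I looked at, including the introduction on the official ES site, never say this setting is required, and those setups run fine without it.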

Ombres

Upvoted by: tianyaguozhe

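Here is my take. When a node starts, it brings up TransportService, the service that handles communication between nodes. It wraps a Transport (an interface with several implementations, the most basic being Netty4Transport), which is initialized during TransportService startup and binds addresses according to the configuration (network.host, transport.host and so on are all relevant settings). When the OP sets network.host to 0.0.0.0, the node listens on all interfaces but still has to pick one "most suitable" address to publish. Note that exactly one address is published. The selection is implemented as follows (and, well, the comments around the sorting are rather amusing):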
// 1. single wildcard address, probably set by network.host: expand to all interface addresses.
if (addresses.length == 1 && addresses[0].isAnyLocalAddress()) {
    HashSet<InetAddress> all = new HashSet<>(Arrays.asList(NetworkUtils.getAllAddresses()));
    addresses = all.toArray(new InetAddress[all.size()]);
}

// 2. try to deal with some (mis)configuration
for (InetAddress address : addresses) {
    // check if its multicast: flat out mistake
    if (address.isMulticastAddress()) {
        throw new IllegalArgumentException("publish address: {" + NetworkAddress.format(address) +
                "} is invalid: multicast address");
    }
    // check if its a wildcard address: this is only ok if its the only address!
    // (if it was a single wildcard address, it was replaced by step 1 above)
    if (address.isAnyLocalAddress()) {
        throw new IllegalArgumentException("publish address: {" + NetworkAddress.format(address) +
                "} is wildcard, but multiple addresses specified: this makes no sense");
    }
}

// 3. if we end out with multiple publish addresses, select by preference.
// don't warn the user, or they will get confused by bind_host vs publish_host etc.
if (addresses.length > 1) {
    List<InetAddress> sorted = new ArrayList<>(Arrays.asList(addresses));
    NetworkUtils.sortAddresses(sorted);
    addresses = new InetAddress[] { sorted.get(0) };
}
return addresses[0];
 
/**
 * Sorts addresses by order of preference. This is used to pick the first one for publishing
 * @deprecated remove this when multihoming is really correct
 */
@Deprecated
// only public because of silly multicast
public static void sortAddresses(List<InetAddress> list) {
    Collections.sort(list, new Comparator<InetAddress>() {
        @Override
        public int compare(InetAddress left, InetAddress right) {
            int cmp = Integer.compare(sortKey(left, PREFER_V6), sortKey(right, PREFER_V6));
            if (cmp == 0) {
                cmp = new BytesRef(left.getAddress()).compareTo(new BytesRef(right.getAddress()));
            }
            return cmp;
        }
    });
}
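
To make step 3 concrete, here is a minimal, self-contained sketch of "sort the candidates, publish the first one". It is not the Elasticsearch code: the class PublishAddressSketch, the helper ip(), and above all the weights inside sortKey() are assumptions made up for illustration (the excerpt above does not show the real sortKey), and Arrays.compareUnsigned stands in for the BytesRef comparison. The sample addresses are placeholders, with 203.0.113.5 standing in for a public IP.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PublishAddressSketch {

    // Hypothetical helper: builds an address from raw octets, so no DNS lookup happens.
    static InetAddress ip(int... octets) {
        byte[] raw = new byte[octets.length];
        for (int i = 0; i < octets.length; i++) {
            raw[i] = (byte) octets[i];
        }
        try {
            return InetAddress.getByAddress(raw);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("expected 4 or 16 octets", e);
        }
    }

    // ASSUMPTION: illustrative weights only; the real sortKey may differ.
    // A lower key means the address is preferred.
    static int sortKey(InetAddress address, boolean preferV6) {
        int key = address.getAddress().length; // 4 for IPv4, 16 for IPv6
        if (preferV6) {
            key = -key;
        }
        if (address.isAnyLocalAddress()) key += 5;
        if (address.isMulticastAddress()) key += 4;
        if (address.isLoopbackAddress()) key += 3;
        if (address.isLinkLocalAddress()) key += 2;
        if (address.isSiteLocalAddress()) key += 1;
        return key;
    }

    // Same shape as step 3 above: sort by preference, then take the first element.
    static InetAddress pickPublishAddress(List<InetAddress> candidates, boolean preferV6) {
        List<InetAddress> sorted = new ArrayList<>(candidates);
        sorted.sort((left, right) -> {
            int cmp = Integer.compare(sortKey(left, preferV6), sortKey(right, preferV6));
            if (cmp == 0) {
                // Tie-break on the raw address bytes, compared as unsigned values.
                cmp = Arrays.compareUnsigned(left.getAddress(), right.getAddress());
            }
            return cmp;
        });
        return sorted.get(0);
    }

    public static void main(String[] args) {
        List<InetAddress> candidates = List.of(ip(127, 0, 0, 1), ip(10, 20, 0, 8), ip(203, 0, 113, 5));
        System.out.println(pickPublishAddress(candidates, false).getHostAddress());
    }
}
```

Under these assumed weights, a machine that has both an internal 10.x address and a public address would publish the public one. That would be consistent with the error log in the question, where node-2 advertises a 1xx.xx.24.23x address instead of its 10.20.0.x address, but treat it as a plausible explanation, not a confirmed one.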

After a successful start, the node prints the address currently used by the transport to the console.
For example:
[2019-07-04T11:32:24,334][INFO ][o.e.t.TransportService   ] [node-1] publish_address {192.168.1.69:9300}, bound_addresses {[::]:9300}
 
A note on publish_host, quoting the official documentation.


network.publish_host
The publish host is the single interface that the node advertises to other nodes in the cluster, so that those nodes can connect to it. Currently an Elasticsearch node may be bound to multiple addresses, but only publishes one. If not specified, this defaults to the “best” address from network.host, sorted by IPv4/IPv6 stack preference, then by reachability. If you set a network.host that results in multiple bind addresses yet rely on a specific address for node-to-node communication, you should explicitly set network.publish_host.


 
 
 
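If you are not sure whether the configuration is correct, there is a reasonably good way to troubleshoot this. First, look at the host and port the service reports at startup. Second, check that every entry in discovery.seed_hosts is reachable. Third, check the host that the other nodes use to reach the master, i.e. the address corresponding to cluster.initial_master_nodes. Then compare whether the hosts match.
Communication between nodes is maintained by the transport layer, and the settings above exist to make sure a node can reach the other nodes through the transport correctly (a simplified way of putting it).
That is usually enough to reach a conclusion.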
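
The "compare whether the hosts match" step can be sketched in a few lines of Java. This is a rough illustration, not an Elasticsearch tool: the class PublishAddressCheck and its methods are hypothetical names, and the regular expression assumes the startup line looks like the TransportService log line quoted earlier in this answer. It pulls the published host out of that line and checks whether it appears in the discovery.seed_hosts list.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PublishAddressCheck {

    // Matches the "publish_address {host:port}" fragment of the startup log line.
    // [^}]+ cannot cross the closing brace, so the bound_addresses part is ignored.
    private static final Pattern PUBLISH =
            Pattern.compile("publish_address \\{([^}]+):(\\d+)\\}");

    // Returns the published host (without the port), if the line contains one.
    static Optional<String> publishHost(String logLine) {
        Matcher m = PUBLISH.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True if the published host is one of the seed hosts (with or without a port).
    static boolean advertisedInSeeds(String logLine, List<String> seedHosts) {
        Optional<String> host = publishHost(logLine);
        if (host.isEmpty()) {
            return false;
        }
        String h = host.get();
        return seedHosts.stream().anyMatch(seed -> seed.equals(h) || seed.startsWith(h + ":"));
    }

    public static void main(String[] args) {
        // Log line taken from this answer; seed hosts taken from the question's config.
        String line = "[2019-07-04T11:32:24,334][INFO ][o.e.t.TransportService   ] [node-1] "
                + "publish_address {192.168.1.69:9300}, bound_addresses {[::]:9300}";
        List<String> seeds = List.of("10.20.0.6", "10.20.0.7", "10.20.0.8");
        System.out.println("published host: " + publishHost(line).orElse("(not found)"));
        System.out.println("listed in seed_hosts: " + advertisedInSeeds(line, seeds));
    }
}
```

If the published host is not one of the addresses the other nodes try to contact, that mismatch is a good candidate for why discovery fails, and setting network.publish_host explicitly is the natural thing to try.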
 
