About the node fault detection (fault_detection) settings in ES 7

Elasticsearch | Author: shwtz | Posted 2023-01-17 | Views: 8581

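Our ES cluster was deployed with an Ansible playbook. In the playbook's elasticsearch.yml.j2 file, several of the node fault detection settings are set far higher than their defaults: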

Specifically:
cluster.fault_detection.follower_check.interval and cluster.fault_detection.leader_check.interval are both 30s (default 1s);
cluster.fault_detection.follower_check.timeout and cluster.fault_detection.leader_check.timeout are both 120s (default 10s).
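
To put a number on how far these values are from the defaults, here is a minimal sketch that compares the two sets of values listed above. The helper names `to_seconds` and `deviations` are hypothetical, and the duration parser only handles plain second values such as `30s`; it is not Elasticsearch's own time-unit parser.

```python
# Defaults and playbook values exactly as listed in the post.
DEFAULTS = {
    "cluster.fault_detection.follower_check.interval": "1s",
    "cluster.fault_detection.leader_check.interval": "1s",
    "cluster.fault_detection.follower_check.timeout": "10s",
    "cluster.fault_detection.leader_check.timeout": "10s",
}

PLAYBOOK = {
    "cluster.fault_detection.follower_check.interval": "30s",
    "cluster.fault_detection.leader_check.interval": "30s",
    "cluster.fault_detection.follower_check.timeout": "120s",
    "cluster.fault_detection.leader_check.timeout": "120s",
}


def to_seconds(value: str) -> int:
    """Parse a duration such as '30s' (hypothetical helper, seconds only)."""
    if not value.endswith("s") or not value[:-1].isdigit():
        raise ValueError(f"unsupported duration in this sketch: {value!r}")
    return int(value[:-1])


def deviations(configured: dict) -> dict:
    """Return configured/default ratio for every setting that differs."""
    result = {}
    for name, default in DEFAULTS.items():
        if name in configured and configured[name] != default:
            result[name] = to_seconds(configured[name]) / to_seconds(default)
    return result


if __name__ == "__main__":
    for name, ratio in deviations(PLAYBOOK).items():
        print(f"{name}: {ratio:g}x the default")
```

With the values from the post, the intervals come out at 30 times the default and the timeouts at 12 times the default.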

The official documentation describes these timeouts as follows (the text quoted here is the entry for the follower check):
(Static) Sets how long the elected master waits for a response to a follower check before considering it to have failed. Defaults to 10s. Changing this setting from the default may cause your cluster to become unstable.

The last sentence states explicitly that changing this timeout from its default may make the cluster unstable.

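In an actual test, I disabled the network interface on the server hosting one of the cluster's nodes. The cluster took about seven and a half minutes to decide that the node was offline, and reads and writes against the cluster failed during that time.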

Those seven and a half minutes match exactly (30s interval + 120s timeout) * 3 (three retries) = 450s.
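
The same arithmetic can be written as a small sketch. It uses the simplified model from the post: a node is only declared failed after three consecutive checks, and each failed check costs one interval plus one full timeout. The function name is hypothetical, and real detection time can differ, for example when a dropped connection is noticed before a check times out.

```python
def detection_seconds(interval_s: int, timeout_s: int, retry_count: int = 3) -> int:
    """Approximate time until a silent node is declared failed.

    Simplified model from the post: each of `retry_count` consecutive
    checks waits `interval_s`, then runs into the full `timeout_s`.
    """
    return (interval_s + timeout_s) * retry_count


if __name__ == "__main__":
    playbook = detection_seconds(interval_s=30, timeout_s=120)
    defaults = detection_seconds(interval_s=1, timeout_s=10)
    print(f"playbook values: {playbook} s = {playbook / 60} min")  # 450 s = 7.5 min
    print(f"default values:  {defaults} s")                        # 33 s
```

Under this model the default values would detect the same failure in about half a minute instead of seven and a half minutes.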

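My question: in the playbook's elasticsearch templates, the node template elasticsearch.yml.j2 sets these parameters far from their defaults. Is this configuration unreasonable, and is it the reason the cluster stays unavailable for so long when a network failure occurs?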

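P.S. I looked at the ansible-elasticsearch git repository under elastic, and the latest version of the elasticsearch.yml.j2 template does not set these parameters at all. I suspect the playbook I am seeing on the server was modified by hand.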

0 replies
