Elasticsearch 5.4 cluster timeout


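There are four nodes, two of which were newly added. The two old nodes form a cluster with each other without problems. After the two new nodes were added, the cluster fails to form, whether all four nodes are configured as one cluster: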

# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["10.96.91.208","10.96.91.209","10.96.91.210","10.96.91.211"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 3
#
# For more information, consult the zen discovery module documentation.
#

or only two nodes (one old, one new) are configured as a cluster:

# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["10.96.91.208","10.96.91.210"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 2
#
# For more information, consult the zen discovery module documentation.
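The minimum_master_nodes values in the two configurations above follow the majority formula quoted in the config comment (master-eligible nodes / 2 + 1). A minimal Python sketch of that arithmetic, assuming every listed node is master-eligible (the post does not say otherwise) and using a hypothetical helper name:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum: (master-eligible nodes / 2) + 1, with integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# The two setups tried in this post: a 2-node and a 4-node cluster.
for n in (2, 4):
    print(n, "master-eligible nodes ->", minimum_master_nodes(n))
```

For 4 nodes this gives 3 and for 2 nodes it gives 2, matching both configurations, so the quorum setting itself is not what keeps the nodes from joining.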

Both setups fail with the following error:

[2017-10-11T13:30:38,240][WARN ][o.e.n.Node               ] [node-03] timed out while waiting for initial discovery state - timeout: 30s
[2017-10-11T13:30:38,254][INFO ][o.e.h.n.Netty4HttpServerTransport] [node-03] publish_address {10.96.91.210:9200}, bound_addresses {10.96.91.210:9200}
[2017-10-11T13:30:38,259][INFO ][o.e.n.Node               ] [node-03] started
[2017-10-11T13:30:41,301][WARN ][o.e.d.z.ZenDiscovery     ] [node-03] failed to connect to master [{node-01}{VwK2Mm2hSDy4avASCpZt5w}{PMslvo9XSRWYESBXqPwz1w}{10.96.91.208}{10.96.91.208:9300}], retrying...
org.elasticsearch.transport.ConnectTransportException: [node-01][10.96.91.208:9300] connect_timeout[30s]
    at org.elasticsearch.transport.netty4.Netty4Transport.connectToChannels(Netty4Transport.java:361) ~[?:?]
    at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:549) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.transport.TcpTransport.connectToNode(TcpTransport.java:473) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:315) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:302) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.discovery.zen.ZenDiscovery.joinElectedMaster(ZenDiscovery.java:468) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.discovery.zen.ZenDiscovery.innerJoinCluster(ZenDiscovery.java:420) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.discovery.zen.ZenDiscovery.access$4100(ZenDiscovery.java:83) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1197) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.3.jar:5.4.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: 10.96.91.208/10.96.91.208:9300
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:267) ~[?:?]
    at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[?:?]
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) ~[?:?]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[?:?]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462) ~[?:?]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) ~[?:?]
    ... 1 more

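The log shows a network problem: node-03 times out after 30s while trying to open a TCP connection to node-01 at 10.96.91.208:9300, the transport port.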
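To pick the failing address out of a long log without reading the whole stack trace, something like the following Python sketch can help. The regular expression is written against the single ConnectTransportException line quoted in this post, so it is an assumption that other Elasticsearch versions format the message the same way; the function name is hypothetical.

```python
import re

# Matches lines such as:
# "...ConnectTransportException: [node-01][10.96.91.208:9300] connect_timeout[30s]"
CONNECT_TIMEOUT = re.compile(
    r"ConnectTransportException: \[(?P<node>[^\]]+)\]"
    r"\[(?P<host>[^\]:]+):(?P<port>\d+)\] connect_timeout\[(?P<timeout>[^\]]+)\]"
)


def find_connect_timeouts(log_text: str):
    """Return (node, host, port, timeout) for every connect timeout in the text."""
    return [
        (m["node"], m["host"], int(m["port"]), m["timeout"])
        for m in CONNECT_TIMEOUT.finditer(log_text)
    ]


# The exception line from the log in this post.
sample = (
    "org.elasticsearch.transport.ConnectTransportException: "
    "[node-01][10.96.91.208:9300] connect_timeout[30s]"
)
print(find_connect_timeouts(sample))
# prints [('node-01', '10.96.91.208', 9300, '30s')]
```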
Troubleshooting the network
Network interface configuration

cd /etc/sysconfig/network/
more ifcfg-eth0

Routing configuration

more routes

DNS resolver configuration

more /etc/resolv.conf

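These settings are essentially the same on all four servers, so the network configuration is not the problem.
Next, check with ping and traceroute.
ping works without problems.
The traceroute output is different: one hop shows no response. Since ping succeeds while the TCP connection to port 9300 times out, the firewall is the likely cause.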
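ping only exercises ICMP, so it can succeed while TCP traffic to the transport port is silently dropped. A direct TCP probe separates the two cases. The Python sketch below uses only the standard library; the helper names and the 3-second timeout are assumptions for illustration, not part of the original troubleshooting.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def probe(hosts, port: int, timeout: float = 3.0) -> dict:
    """Map each host to whether its TCP port accepted a connection."""
    return {host: tcp_port_open(host, port, timeout) for host in hosts}
```

Running `probe(["10.96.91.208", "10.96.91.209", "10.96.91.210", "10.96.91.211"], 9300)` from each node would show which directions are blocked. A host that answers ping but reports False here is consistent with a firewall dropping packets, although a stopped Elasticsearch process gives the same result.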

Check the firewall status

chkconfig --list|grep fire

Stop the firewall

cd /etc/init.d/
./SuSEfirewall2_setup stop
./SuSEfirewall2_init stop

Disable the firewall at boot

chkconfig SuSEfirewall2_setup off
chkconfig SuSEfirewall2_init off

With that, the problem is solved.
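One way to confirm the fix is to ask any node for the cluster health and check that all four nodes have joined. The Python sketch below is only an illustration: it assumes the HTTP port 9200 seen in the log is reachable without authentication, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(host: str, port: int = 9200, timeout: float = 5.0) -> dict:
    """GET /_cluster/health from one node and decode the JSON response."""
    with urlopen(f"http://{host}:{port}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def all_nodes_joined(health: dict, expected_nodes: int) -> bool:
    """True when the cluster reports at least the expected number of nodes."""
    return health.get("number_of_nodes", 0) >= expected_nodes
```

For the four-node setup, `all_nodes_joined(fetch_cluster_health("10.96.91.210"), 4)` should return True once every node can reach the others on port 9300.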