Notes on a disappearing node.lock, installing ELKB 6.0.0, and updating the license


Problem

The disk was found completely full, packed with Elasticsearch error logs. The specific errors were:

[2017-11-24T15:59:53,590][WARN ][o.e.e.NodeEnvironment    ] [es_node1] lock assertion failed
java.nio.file.NoSuchFileException: /tmp/elasticsearch/data/nodes/0/node.lock
[2017-11-24T15:59:53,755][DEBUG][o.e.a.a.c.s.TransportClusterStatsAction] [es_node1] failed to execute on node [PodBWQxJTGqJioeYk7TBsw]
org.elasticsearch.transport.RemoteTransportException: [es_node1][192.168.5.233:9300][cluster:monitor/stats[n]]
Caused by: java.lang.IllegalStateException: environment is not locked
Caused by: java.nio.file.NoSuchFileException: /tmp/elasticsearch/data/nodes/0/node.lock
[2017-11-24T15:59:53,935][ERROR][o.e.x.m.c.n.NodeStatsCollector] [es_node1] collector [node-stats] failed to collect data
org.elasticsearch.action.FailedNodeException: Failed node [PodBWQxJTGqJioeYk7TBsw]
Caused by: org.elasticsearch.transport.RemoteTransportException: [es_node1][192.168.5.233:9300][cluster:monitor/nodes/stats[n]]
Caused by: java.lang.IllegalStateException: environment is not locked
Caused by: java.nio.file.NoSuchFileException: /tmp/elasticsearch/data/nodes/0/node.lock

It is worth noting that the same problem also appeared on another slave node of the ELK cluster, but it did not start reporting this error until the 26th.

Meanwhile, Kibana crashed two days later; its last log entries showed Elasticsearch still initializing Monitoring, followed by a response timeout.

In addition, the Elasticsearch logs showed a license-expiry warning. Without a registered license, Elasticsearch only has a 30-day trial; after it expires, the following restrictions apply:

# License [will expire] on [Thursday, November 30, 2017]. If you have a new license, please update it.
# Otherwise, please reach out to your support contact.
#
# Commercial plugins operate with reduced functionality on license expiration:
# - security
#  - Cluster health, cluster stats and indices stats operations are blocked
#  - All data operations (read and write) continue to work
# - watcher
#  - PUT / GET watch APIs are disabled, DELETE watch API continues to work
#  - Watches execute and write to the history
#  - The actions of the watches don't execute
# - monitoring
#  - The agent will stop collecting cluster and indices metrics
#  - The agent will stop automatically cleaning indices older than [xpack.monitoring.history.duration]
# - graph
#  - Graph explore APIs are disabled
# - ml
#  - Machine learning APIs are disabled
# - deprecation
#  - Deprecation APIs are disabled
# - upgrade
#  - Upgrade API is disabled

Resolution

First, the Kibana crash was probably a consequence of the Elasticsearch failure, and the license issue could wait until the 30th. The most pressing task was to track down the endlessly repeating error. From the messages it is clear the cause is the disappearance of the node.lock file, and indeed, looking inside the data directory, the file was gone.
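As a quick check for this symptom, something like the following (using the data path from the log above) confirms whether the lock file is still present:

```shell
# Quick check: is node.lock still there?
# DATA is the path from the error log above.
DATA=/tmp/elasticsearch/data/nodes/0
if [ -e "$DATA/node.lock" ]; then
  echo "lock present"
else
  echo "lock missing"
fi
```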
I then searched for the cause of the problem and found nothing. On Google only one other person had the same issue; his problem had started in March and was still unsolved in July. His version was 5.2.2, mine was 5.6.2.
In the end, unable to resolve it, I decided to try installing 6.0.0.

Installing ELKB 6.0.0

First, download the latest matching packages from the Elastic site at https://www.elastic.co/cn/products. For now I skipped Logstash and went straight to the lightweight Filebeat:
elasticsearch-6.0.0.tar.gz
filebeat-6.0.0-linux-x86_64.tar.gz
kibana-6.0.0-linux-x86_64.tar.gz
Then extract them on Linux and edit each tool's configuration file.
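The extraction step might look like the following sketch (run in the download directory; filenames are the three archives listed above):

```shell
# Unpack each archive if present (names from the download list above)
for f in elasticsearch-6.0.0.tar.gz \
         filebeat-6.0.0-linux-x86_64.tar.gz \
         kibana-6.0.0-linux-x86_64.tar.gz; do
  if [ -f "$f" ]; then
    tar -xzf "$f"
  fi
done
```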

Elasticsearch first:

vim config/elasticsearch.yml

The contents are essentially the same as in 5.6. I configured:

cluster.name: es_clusterNew
node.name: es_node1
path.data: /tmp/elasticsearch/data
path.logs: /tmp/elasticsearch/logs
network.host: Master
discovery.zen.ping.unicast.hosts: ["Master", "Slave1"]
discovery.zen.ping_timeout: 120s
client.transport.ping_timeout: 60s
xpack.security.enabled: false

Then the JVM memory settings:

vim config/jvm.options

This version's default JVM heap is now 1 GB:

-Xms1g
-Xmx1g
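If the machine has RAM to spare, the defaults can be raised in place. A sketch, assuming you are in the Elasticsearch directory and the stock 1g values are untouched:

```shell
# Sketch: raise the stock 1g heap to 2g in config/jvm.options.
# Keeping -Xms and -Xmx equal avoids heap resizing at runtime.
JVM_OPTS=config/jvm.options
if [ -f "$JVM_OPTS" ]; then
  sed -i 's/^-Xms1g$/-Xms2g/; s/^-Xmx1g$/-Xmx2g/' "$JVM_OPTS"
fi
```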

While we're at it, install the x-pack plugin:

./bin/elasticsearch-plugin install x-pack

Next, Filebeat:

vim filebeat.yml

The biggest change in 6.0 is probably Filebeat: there are many new configuration options, such as the "Filebeat modules" section below. Filebeat now ships with many modules, each with its own config file under modules.d/; reload.enabled must be changed to true for those files to take effect.

#============================= Filebeat modules ===============================
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  # Set to true to enable config reloading
  reload.enabled: false
  # Period on which files under path should be checked for changes

There is also a new Elasticsearch template section:

#==================== Elasticsearch template setting ==========================
setup.template.settings:
  index.number_of_shards: 3
  #index.codec: best_compression
  #_source.enabled: false

These settings control how many shards an index is split into, along with data compression. Before 6.0, Elasticsearch split an index into 5 shards by default. I left this unchanged.
Checking afterwards, data submitted by Filebeat was indeed split into 3 shards; whether Elasticsearch's own default shard count has changed, I have not verified.
There is also a new Kibana section: Beats dashboards can now be loaded via the Kibana API, so just configure your Kibana address here:

#============================== Kibana =====================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "Master:5601"

The rest of the configuration matches earlier versions.
The input (prospector) configuration now has an enabled flag, which must be set to true to take effect:

#=========================== Filebeat prospectors =============================
filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- type: log

  # Change to true to enable this prospector configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /home/hadoop/logs/test/*.log
    #- c:\programdata\elasticsearch\logs\*

Then the multiline settings:

  multiline.pattern: ^\t.*$
  multiline.negate: false
  multiline.match: after
  multiline.max_lines: 10
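With negate: false and match: after, every line matching ^\t.*$ is appended to the preceding event. A quick local demo of which lines that regex catches (the file path and log text are made up):

```shell
# Demo: which lines does the multiline.pattern above match?
# Tab-indented lines (e.g. Java stack-trace frames) are continuations.
demo=/tmp/multiline-demo.log
printf 'ERROR boom\n\tat com.example.Foo.bar(Foo.java:10)\n\tat com.example.Main.main(Main.java:3)\nINFO next event\n' > "$demo"
grep -c "$(printf '^\t')" "$demo"   # prints 2 (the two stack-frame lines)
```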

Output configuration:

#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["Master:9200"]

Then configure Kibana:

vim config/kibana.yml 

Essentially the same as in 5.6; a few simple settings:

server.host: "Master"
server.name: "LogAnalysis"
elasticsearch.url: "http://Master:9200"
xpack.security.enabled: false

Then install x-pack:

./bin/kibana-plugin install x-pack

Start Elasticsearch, Filebeat, and Kibana.
Note that Filebeat can no longer be launched with the start subcommand, and the official start command now includes a -c [config file] argument.
The first startup of Kibana's x-pack in 6.0 is much faster than in 5.6.
Be aware that if you upgrade to 6.0 from an earlier version without changing the data and log directories, Kibana will report an error telling you to upgrade.
Deleting the old data and log folders, or switching to new data and log directories, resolves this.
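The startup sequence can be sketched roughly as follows (paths assume the 6.0.0 tarball layout extracted above; guards make each line a no-op if the binary is missing, so adapt to your setup):

```shell
# Startup sketch for the 6.0.0 tarballs (run from their parent directory).
[ -x elasticsearch-6.0.0/bin/elasticsearch ] && elasticsearch-6.0.0/bin/elasticsearch -d || true
# Filebeat 6.0 has no "start" subcommand; pass the config with -c:
[ -x filebeat-6.0.0-linux-x86_64/filebeat ] && \
  nohup filebeat-6.0.0-linux-x86_64/filebeat -c filebeat-6.0.0-linux-x86_64/filebeat.yml >/dev/null 2>&1 &
[ -x kibana-6.0.0-linux-x86_64/bin/kibana ] && \
  nohup kibana-6.0.0-linux-x86_64/bin/kibana >/dev/null 2>&1 &
```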

License

Installation is now complete; whether the node.lock problem is gone remains to be seen. First, the license: a fresh Elasticsearch install has a 30-day trial, and as expiry approaches, a reminder is logged every 10 minutes.
After the reinstall there were no license reminders in the logs yet, but the license can be queried with:

curl -XGET 'master:9200/_license'

Result:

You can see it really is only a one-month trial.
Register on the official site to apply for a license; it is free, lasts a year, and can be renewed by applying again when it expires.
You will then receive an email; follow it to complete registration and obtain the license. Download the license file for your version; assume here it is license.json.
Per the official docs, load the license into Elasticsearch:

curl -XPUT -u elastic 'master:9200/_xpack/license' -H "Content-Type: application/json" -d @license.json

If x-pack is installed this will prompt for a password; the default is changeme.
If it returns a failure:

{
  "acknowledged": false,
  "license_status": "valid",
  "acknowledge": {
    "message": "This license update requires acknowledgement. To acknowledge the license, please read the following messages and update the license again, this time with the \"acknowledge=true\" parameter:",
    "watcher": ["Watcher will be disabled"],
    "logstash": ["Logstash specific APIs will be disabled, but you can continue to manage and poll stored configurations"],
    "security": [
      "The following X-Pack security functionality will be disabled: authentication, authorization, ip filtering, and auditing. Please restart your node after applying the license.",
      "Field and document level access control will be disabled.",
      "Custom realms will be ignored."
    ],
    "monitoring": [
      "Multi-cluster support is disabled for clusters with [BASIC] license. If you are\nrunning multiple clusters, users won't be able to access the clusters with\n[BASIC] licenses from within a single X-Pack Kibana instance. You will have to deploy a\nseparate and dedicated X-pack Kibana instance for each [BASIC] cluster you wish to monitor.",
      "Automatic index cleanup is locked to 7 days for clusters with [BASIC] license."
    ],
    "graph": ["Graph will be disabled"],
    "ml": ["Machine learning will be disabled"]
  }
}

Then resubmit with acknowledge set to true:

curl -XPUT -u elastic 'master:9200/_xpack/license?acknowledge=true' -H "Content-Type: application/json" -d @license.json

Query the license again:

It now shows a one-year validity.

That completes the 6.0.0 installation and the license update.

The node.lock problem

The node.lock and Kibana issues need further observation. As of 2017-12-02 they had not recurred. But to keep a recurrence from blowing up the disk, a simple shell script can be added as a cron job as a stopgap: "when node.lock is detected missing, create a node.lock."

#!/bin/bash
path=/tmp/elasticsearch/data/nodes/0/
now=`date "+%Y-%m-%d %H:%M:%S"`
logpath=/tmp/elasticsearch/logs/detectLockfile.log
cd $path
echo "${now}" >> $logpath
if test ! -f 'node.lock' ; then
  echo "Lockfile disappeared" >> $logpath
  touch node.lock
else
  echo "All right" >> $logpath
fi

path is the directory from which my node.lock disappeared.
logpath records the detection results.
Add it as a cron job:
crontab -e

0 */4 * * * /tmp/elasticsearch/prevent.sh

This checks every 4 hours.

By December 11 it turned out that node.lock had disappeared twice more, on December 8 and 9. The script created a stand-in lock file, but that file cannot pass Elasticsearch's validation, so errors were still logged, and they persisted even after restarting Elasticsearch. In this version, however, the errors no longer fill the disk: logging of the lock-file error stops after a certain amount, and several days of running produced only a few 1 MB zip archives. The root cause of the disappearing lock file is still unknown.

Postscript

1. After installing Elasticsearch on the other cluster nodes, the license was automatically propagated to them; there is no need to repeat the XPUT step manually.
