YARN routine maintenance: NodeManager health status is false

Source: Internet | Editor: 程序博客网 | Time: 2024/06/01 08:11

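Recently I noticed that the YARN cluster UI was showing only 2 nodes, one fewer than normal. That left me speechless, because it had never been a problem before.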

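So I started troubleshooting. The UI showed that the NM (NodeManager) on machine 206 was missing, so I restarted the NM on 206 and then checked the NM log on 206 and the RM (ResourceManager) log on 207.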

The logs showed nothing wrong: the NM said it had registered with 207, and 207 said it had received the registration from 206. Baffling. WTF.

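After fiddling with it for several hours, I noticed something on the web UI of the NM on 206: NodeHealthyStatus was false, and the health report said the local (data) and log directories were bad.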
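
The same health flag can presumably be read without opening the web UI. Below is a minimal Python sketch. It assumes the NodeManager REST endpoint `http://<nm-host>:8042/ws/v1/node/info` (8042 is the default NM web port, so adjust it for your cluster) and that endpoint's `nodeHealthy` / `healthReport` fields. From the command line, `yarn node -list -all` should also list such a node with state UNHEALTHY rather than as an active node.

```python
import json
import urllib.request


def fetch_node_info(nm_host: str, port: int = 8042, timeout: float = 5.0) -> dict:
    """Fetch the NodeManager info document from its REST API (assumed endpoint)."""
    url = f"http://{nm_host}:{port}/ws/v1/node/info"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def health_summary(payload: dict) -> tuple[bool, str]:
    """Return (nodeHealthy, healthReport) from a /ws/v1/node/info payload."""
    info = payload.get("nodeInfo", {})
    # A missing flag is treated as unhealthy so that problems are not hidden.
    return bool(info.get("nodeHealthy", False)), info.get("healthReport", "")
```

Usage, with a hypothetical host name: `health_summary(fetch_node_info("nm-206.example.com"))`. For the node in this post it would be expected to return False together with the "dirs are bad" report.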

So I googled it and found this:

https://stackoverflow.com/questions/29131449/why-does-hadoop-report-unhealthy-node-local-dirs-and-log-dirs-are-bad

The accepted answer:

The most common cause of "local-dirs are bad" is disk utilization on the node exceeding YARN's max-disk-utilization-per-disk-percentage, whose default value is 90.0%.

Either clean up the disk that the unhealthy node is running on, or increase the threshold in yarn-site.xml:

<property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>98.5</value>
</property>

Avoid disabling the disk check, because your jobs may fail when the disk eventually runs out of space or when there are permission issues. Refer to the Disk Checker section of yarn-site.xml for more details.

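In short: the disk backing the NM's local and log directories (here, the same disk that holds the HDFS data) had passed 90% utilization, so YARN marked the NM as unhealthy. The fix is either to raise the threshold or to delete some data by hand.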
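
To see which disk trips the check, the comparison can be approximated by hand. Below is a minimal Python sketch, not YARN's own code. It assumes you already know the NM's directories (the real ones come from yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs). It also assumes that utilization means 100% minus the share of usable free space, which is my understanding of how the NodeManager computes it; treat that detail as an assumption.

```python
import shutil

# Default of yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
DEFAULT_MAX_UTILIZATION = 90.0


def utilization_percent(path: str) -> float:
    """Percent of the disk holding `path` that is not usable, i.e. 100 - 100*free/total."""
    usage = shutil.disk_usage(path)
    # Assumption: only space usable by non-root users counts as free, as in YARN's check.
    return 100.0 - 100.0 * usage.free / usage.total


def bad_dirs(dirs, max_utilization: float = DEFAULT_MAX_UTILIZATION):
    """Return (dir, utilization) pairs whose disk exceeds the threshold."""
    result = []
    for d in dirs:
        pct = utilization_percent(d)
        if pct > max_utilization:
            result.append((d, round(pct, 1)))
    return result
```

For example, `bad_dirs(["/data1/yarn/local", "/data1/yarn/log"])` (hypothetical paths) would list the directories whose disk is above 90% and which the NM is therefore likely to flag.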

I chose to raise the threshold first.
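
Raising the threshold means adding or updating the property quoted above in yarn-site.xml on the affected NodeManager. Below is a minimal sketch that does this with Python's standard xml.etree.ElementTree. It assumes the usual `<configuration><property><name/><value/></property></configuration>` layout, and the helper name `set_property` is hypothetical. The NodeManager presumably has to be restarted to pick up the new value.

```python
import xml.etree.ElementTree as ET

THRESHOLD_KEY = "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"


def set_property(config_xml: str, name: str, value: str) -> str:
    """Return config_xml with <property> `name` set to `value` (appended if missing)."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property not present yet: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

ElementTree drops comments and the XML declaration when it re-serializes the file, so keep a backup of the original yarn-site.xml or make the edit by hand if those matter.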
