Solution for a Hadoop Cluster Showing 0 Live Nodes
Problem Description
Following a tutorial, I modified the Hadoop configuration files and started the whole cluster with start-all.sh. jps showed that every expected process was running on each node, and I could open the master:50070 page, but its Live Nodes entry showed 0, even though I clearly have two DataNode servers.
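For reference, here is roughly what jps should report on a Hadoop 2.x cluster started by start-all.sh; this is the typical process layout, assuming the SecondaryNameNode and ResourceManager also run on master:

jps
# typical output on master: NameNode, SecondaryNameNode, ResourceManager, Jps
# typical output on slaves: DataNode, NodeManager, Jps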
Troubleshooting Approach
There are many possible causes for a result like this, so simply pasting the symptom into a search engine rarely gets you a satisfying answer quickly. Searching is still useful for basic elimination, though: check whether the NameNode's Cluster ID matches the DataNodes' Cluster ID, whether the firewall is off, whether the relevant ports are open, and so on.
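A quick sketch of those checks; the VERSION paths below assume the default dfs.namenode.name.dir / dfs.datanode.data.dir locations under hadoop.tmp.dir (/tmp/hadoop-$USER), so adjust them to your own hdfs-site.xml:

cat /tmp/hadoop-$USER/dfs/name/current/VERSION   # on the NameNode: note the clusterID
cat /tmp/hadoop-$USER/dfs/data/current/VERSION   # on each DataNode: clusterID must match
systemctl status firewalld                       # firewall state on CentOS 7
sudo ufw status                                  # firewall state on Ubuntu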
After ruling out the common problems that can be found online, the most reliable next step is to read the logs!
Solution
First, look at the NameNode's log:
cat /home/hostname/hadoop/hadoop-2.8.2/logs/hadoop-pangying-namenode-master.log
Note: I actually pulled the log down to my local client and read it there (with sz from the lrzsz package; rz goes the other way, uploading local files to the server).
2017-11-10 18:37:11,685 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2017-11-10 18:37:11,694 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode []
2017-11-10 18:37:12,167 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-11-10 18:37:12,302 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2017-11-10 18:37:12,302 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2017-11-10 18:37:12,328 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: fs.defaultFS is hdfs://master:9000
2017-11-10 18:37:12,334 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Clients are to use master:9000 to access this namenode/service.
2017-11-10 18:37:12,776 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
2017-11-10 18:37:12,795 INFO org.apache.hadoop.hdfs.DFSUtil: Starting Web-server for hdfs at: http://0.0.0.0:50070
2017-11-10 18:37:12,873 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2017-11-10 18:37:12,888 INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2017-11-10 18:37:12,897 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.namenode is not defined
2017-11-10 18:37:12,906 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2017-11-10 18:37:12,913 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context hdfs
2017-11-10 18:37:12,913 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2017-11-10 18:37:12,913 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2017-11-10 18:37:13,080 INFO org.apache.hadoop.http.HttpServer2: Added filter 'org.apache.hadoop.hdfs.web.AuthFilter' (class=org.apache.hadoop.hdfs.web.AuthFilter)
2017-11-10 18:37:13,084 INFO org.apache.hadoop.http.HttpServer2: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
2017-11-10 18:37:13,101 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 50070
2017-11-10 18:37:13,101 INFO org.mortbay.log: jetty-6.1.26
2017-11-10 18:37:13,409 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50070
2017-11-10 18:37:13,445 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
2017-11-10 18:37:13,445 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one namespace edits storage directory (dfs.namenode.edits.dir) configured. Beware of data loss due to lack of redundant storage directories!
2017-11-10 18:37:13,476 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Edit logging is async:false
2017-11-10 18:37:13,486 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: KeyProvider: null
2017-11-10 18:37:13,486 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsLock is fair: true
No errors here, only the two warnings at the end. I searched for their cause and cure but never found a good explanation; in fact they only mean that a single directory is configured for the fsimage (dfs.namenode.name.dir) and for the edit log (dfs.namenode.edits.dir), so there is no redundant copy if that disk fails. They are harmless on a test cluster and have nothing to do with the live-node count. So I moved on to the DataNode's log.
cat /home/pangying/hadoop/hadoop-2.8.2/logs/hadoop-pangying-datanode-slave1.log
Again, I actually saved this log to my local machine before reading it.
2017-11-10 18:18:59,084 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.31.134:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2017-11-10 18:18:59,087 WARN org.apache.hadoop.ipc.Client: Failed to connect to server: master/192.168.31.134:9000: retries get failed due to exceeded maximum allowed retries number: 10
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:682)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:778)
    at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1544)
    at org.apache.hadoop.ipc.Client.call(Client.java:1375)
    at org.apache.hadoop.ipc.Client.call(Client.java:1339)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy15.versionRequest(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.versionRequest(DatanodeProtocolClientSideTranslatorPB.java:274)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:215)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:261)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:750)
    at java.lang.Thread.run(Thread.java:748)
So the error is on the DataNode side: it cannot connect to port 9000 on master (the NameNode). On the DataNode server I ran the
telnet master 9000
command and confirmed that port 9000 on master really was unreachable. Having verified that the process behind the port was running and the firewall was off, I found the problem was my hosts file.
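The telltale detail is which address the NameNode actually listens on. A quick check on master (use netstat or ss, whichever is installed):

netstat -tlnp | grep 9000    # or: ss -tlnp | grep 9000
# If this shows 127.0.0.1:9000 or 127.0.1.1:9000 instead of 192.168.31.134:9000,
# the NameNode is bound to loopback -- because "master" resolves to a loopback
# address in /etc/hosts -- and remote DataNodes get Connection refused.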
sudo vim /etc/hosts
Comment out the first two lines (the loopback entries), then add the machine's real IP rather than 127.0.0.1.
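A sketch of the resulting file; only master's address, 192.168.31.134, appears in the logs above, so the slave IPs here are made-up placeholders:

# 127.0.0.1   localhost    <- commented out
# 127.0.1.1   master       <- commented out: this line made "master" resolve to loopback
192.168.31.134   master
192.168.31.135   slave1     # placeholder IP
192.168.31.136   slave2     # placeholder IP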
The change takes effect as soon as the file is saved; /etc/hosts is a plain data file, not a shell script, so there is nothing to source.
Restart the Hadoop cluster.
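A sketch of the restart (in Hadoop 2.x, stop-all.sh and start-all.sh still work but are deprecated in favor of the start-dfs.sh/start-yarn.sh pair):

stop-all.sh     # stop the half-connected daemons first
start-all.sh    # bring HDFS and YARN back up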
Then visit http://192.168.31.134:50070.
You can see that Live Nodes is now 2.
Visiting http://192.168.31.134:8088 likewise shows 2 active nodes.
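The same can be confirmed from the command line, without the web UI:

hdfs dfsadmin -report    # the summary includes a line like "Live datanodes (2):"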