Assorted Pitfalls of a Hadoop Cluster

Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 01:57

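I have recently been playing with deploying a big-data cluster, and I ran into all sorts of infuriating problems along the way. It was genuinely maddening.

The first problem was port 9000 on the Hadoop master node. I started by searching Chinese and foreign sites for similar reports. The exception log suggested that the IP/port was already in use, and then the process was shut down right away. None of it made much sense, so I went back and worked through the startup output slowly: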

Starting namenodes on [Master]
Master: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Master: @ WARNING: POSSIBLE DNS SPOOFING DETECTED! @
Master: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Master: The ECDSA host key for master has changed,
Master: and the key for the corresponding IP address 192.168.40.245
Master: is unknown. This could either mean that
Master: DNS SPOOFING is happening or the IP address for the host
Master: and its host key have changed at the same time.
Master: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Master: @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
Master: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Master: IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Master: Someone could be eavesdropping on you right now (man-in-the-middle attack)!
Master: It is also possible that a host key has just been changed.
Master: The fingerprint for the ECDSA key sent by the remote host is
Master: cb:7f:7b:bb:12:69:08:92:77:f8:35:64:d5:df:c8:f5.
Master: Please contact your system administrator.
Master: Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Master: Offending ECDSA key in /root/.ssh/known_hosts:4
Master: ECDSA host key for master has changed and you have requested strict checking.
Master: Host key verification failed.

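It was a host key problem. This machine's IP address had been changed. Passwordless SSH login had been set up once before the change and then again afterwards, so the host key recorded earlier no longer matched. The fix is to delete the stale entry from /root/.ssh/known_hosts (the warning above points at line 4 of that file).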
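As a rough sketch of that cleanup, the helper below deletes one line from a known_hosts file after saving a backup. The function name delete_known_hosts_line and the .bak suffix are made up for illustration; the line number 4 comes from the "Offending ECDSA key" message above. A common alternative is ssh-keygen -R master, which removes every recorded key for that host name.

```shell
#!/bin/sh
# Sketch: remove a single offending line from a known_hosts file.
# delete_known_hosts_line is a hypothetical helper, not part of OpenSSH.
delete_known_hosts_line() {
    file="$1"   # path to the known_hosts file
    line="$2"   # line number reported as "Offending ... key in <file>:<line>"
    # Keep a copy of the original so the change can be undone.
    cp "$file" "$file.bak" || return 1
    # Write everything except the offending line back to the original path.
    sed "${line}d" "$file.bak" > "$file"
}

# Usage on the master node (not executed here):
#   delete_known_hosts_line /root/.ssh/known_hosts 4
```

After the stale line is gone, the next SSH connection to the host asks to accept the new key, so it is worth connecting once by hand before rerunning the Hadoop start scripts.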

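At this point port 9000 finally came up normally. Then the next problem appeared: the DataNode service on the two slave nodes would not start, and the web console at http://host:50070 did not list any DataNodes either.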

2017-07-14 22:12:26,150 WARN  [Thread-69] hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hbase/.tmp/hbase.version could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1733)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2496)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:828)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2455

2017-07-14 22:09:25,947 WARN org.apache.hadoop.hdfs.server.common.Util: Path /data/hadoop/name should be specified as a URI in configuration files. Please update hdfs configuration.
2017-07-14 22:09:25,947 WARN org.apache.hadoop.hdfs.server.common.Util: Path /data/hadoop/name should be specified as a URI in configuration files. Please update hdfs configuration.
2017-07-14 22:09:25,948 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
2017-07-14 22:09:25,948 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one namespace edits storage directory (dfs.namenode.edits.dir) configured. Beware of data loss due to lack of redundant storage directories!
2017-07-14 22:09:25,955 WARN org.apache.hadoop.hdfs.server.common.Util: Path /data/hadoop/name should be specified as a URI in configuration files. Please update hdfs configuration.
2017-07-14 22:09:25,956 WARN org.apache.hadoop.hdfs.server.common.Util: Path /data/hadoop/name should be specified as a URI in configuration files. Please update hdfs configuration.

These logs did not pinpoint the actual problem.
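When the logs are this unhelpful, a quicker check is to look at the Java processes on each slave directly. The sketch below assumes the usual jps output format of one "<pid> <name>" pair per line; datanode_listed is a made-up helper name. On the master, hdfs dfsadmin -report gives the NameNode's own view of live DataNodes.

```shell
#!/bin/sh
# Sketch: succeed only if jps-style input on stdin lists a DataNode process.
# datanode_listed is a hypothetical helper name.
datanode_listed() {
    # Field 2 of each jps line is the process name, e.g. "5678 DataNode".
    awk '$2 == "DataNode" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage on a slave node (not executed here):
#   jps | datanode_listed && echo "DataNode is running" || echo "DataNode is NOT running"
```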

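There was one odd thing, though. When I ran hadoop namenode -format on my own machine, the host name it reported was the one I had configured (master/slave), but in this new environment it kept showing the old host name.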

That made me suspect the host name, so I checked the host name of the machine.

Show host name information: hostnamectl

Change the host name: hostnamectl set-hostname Master
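To confirm that the change took effect before touching Hadoop again, a small comparison like the one below can help. It is only a sketch: hostname_matches is a made-up helper, and comparing case-insensitively is my assumption, chosen because the SSH warning above shows the name as both Master and master.

```shell
#!/bin/sh
# Sketch: check that the machine's current host name is the expected one.
# hostname_matches is a hypothetical helper name.
hostname_matches() {
    # Lower-case both sides so that "Master" and "master" compare equal.
    expected=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    actual=$(uname -n | tr '[:upper:]' '[:lower:]')
    [ "$actual" = "$expected" ]
}

# Usage on the master node (not executed here):
#   hostname_matches Master || echo "host name is still $(uname -n)"
```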

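Then delete everything under the hadoop/name directory (/data/hadoop/name in my configuration).

"name" here is the NameNode directory configured in hdfs-site.xml. The NameNode log above refers to this setting as dfs.namenode.name.dir:

<property>
<name>dfs.namenode.name.dir</name>
<value>/data/hadoop/name</value>
</property>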

Then delete the hadoop directory on Slave1 and Slave2 and copy it over again with scp.

Once the copy has finished, run the format step again:

/hadoop/bin/hadoop namenode -format
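Put together, the reset steps above could be scripted roughly as follows. This is a dry-run sketch that only prints the commands. The install path /hadoop, the name directory /data/hadoop/name, and the slave names come from this post; the function name print_reset_plan, the use of ssh for the remote delete, and the assumption that the slaves use the same install path are mine.

```shell
#!/bin/sh
# Dry-run sketch: print the reset steps instead of executing them.
# print_reset_plan is a hypothetical helper; review the output before
# running any of the printed commands by hand.
print_reset_plan() {
    hadoop_home="$1"   # Hadoop install directory, e.g. /hadoop
    name_dir="$2"      # NameNode directory, e.g. /data/hadoop/name
    shift 2            # remaining arguments are the slave host names
    # Step 1: clear the NameNode directory on the master.
    echo "rm -rf ${name_dir:?}/*"
    # Step 2: replace the Hadoop directory on every slave.
    for slave in "$@"; do
        echo "ssh $slave rm -rf $hadoop_home"
        echo "scp -r $hadoop_home $slave:$hadoop_home"
    done
    # Step 3: format the NameNode again.
    echo "$hadoop_home/bin/hadoop namenode -format"
}

print_reset_plan /hadoop /data/hadoop/name Slave1 Slave2
```

Printing rather than executing is deliberate here, because both the rm -rf and the format step destroy existing HDFS metadata.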