Hadoop Cluster Distributed Installation
Source: Internet · Editor: 程序博客网 · Date: 2024/06/06
This article covers the fully distributed installation of Hadoop. Single-node and pseudo-distributed installs are simple and need very little configuration; see the official guide:
http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html
Environment:
1、Three servers running the same Ubuntu release
uname -a
Linux panzha-master 3.16.0-71-generic #92~14.04.1-Ubuntu SMP Thu May 12 23:31:46 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
2、Hadoop 2.7.2
3、JDK 1.7
4、Maven 3.3.9
Hadoop Installation Process
1、Install the SSH service
2、Configure passwordless login on the master
2.1 In /root/.ssh, run the following command to generate a key pair:
ssh-keygen -t rsa    # press Enter through every prompt; leave the file name and passphrase empty
2.2 Append the public key to the authorized_keys file:
cat id_rsa.pub >> authorized_keys
2.3 Set the permissions:
chmod 600 .ssh/authorized_keys    # 600 or 700 both work; no other user or group may have access, otherwise the setup may fail
2.4 Verify that you can SSH into the local machine without a password:
ssh 10.124.22.213
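Since wrong modes on .ssh are the most common reason step 2.4 still prompts for a password, the permission layout can be sanity-checked in a scratch directory first. This is only a sketch: on a real node the directory is /root/.ssh and the key file comes from ssh-keygen.

```shell
# Sketch: reproduce the .ssh permission layout in a temp directory and verify it.
DIR=$(mktemp -d)/.ssh
mkdir -p "$DIR"
touch "$DIR/authorized_keys"      # stands in for the appended public key
chmod 700 "$DIR"                  # directory: owner only
chmod 600 "$DIR/authorized_keys"  # file: owner read/write only
stat -c '%a %n' "$DIR" "$DIR/authorized_keys"
```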
3、Configure passwordless login from the master to slave1 and slave2
Copy the master's public key to slave1 and slave2 and append it to their authorized_keys files, then verify that you can log in to slave1 and slave2 from the master.
4、Configure slave1 and slave2 to log in to the master
Generate a key pair under /root/.ssh on slave1, copy the public key to the master, append it to authorized_keys, then verify. Repeat for slave2.
Summary: the ssh-copy-id command copies a public key straight into a remote machine's authorized_keys file. For convenience, generate a passwordless key pair on every machine first, append all the public keys to the master's authorized_keys file, and then copy the master's authorized_keys file to each slave.
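The summarized flow can be sketched as a loop. The hostnames below are the ones this article later puts in /etc/hosts; adjust them to your cluster. The echo makes this a dry run that only prints the commands (remove it to actually push the keys).

```shell
# Dry run: print the ssh-copy-id command for each slave instead of executing it.
SLAVES="panzha-slave001 panzha-slave002"
for host in $SLAVES; do
  echo ssh-copy-id -i /root/.ssh/id_rsa.pub root@"$host"
done
```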
5、Install Hadoop
1、Install the JDK, Hadoop, and Maven.
1.1 Download a stable Hadoop release from http://hadoop.apache.org/releases.html to the master, then unpack it with tar -zxvf.
1.2 Download Maven:
wget http://www.interior-dsgn.com/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
1.3 Install the JDK (version 1.7 or later).
2、Configure environment variables (my Maven, JDK, and Hadoop are all installed under /BBBB, which keeps them easy to manage)
Add the following to the end of /root/.bashrc:
export JAVA_HOME=/BBBB/jdk1.7
export JRE_HOME=$JAVA_HOME/jre
export M2_HOME=/BBBB/maven3.3.9
export M2=$M2_HOME/bin
export HADOOP_INSTALL=/BBBB/hadoop2.7.2
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$M2_HOME/bin:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
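After appending these lines and reloading the shell, it is worth confirming that the new entries really landed on PATH. The sketch below repeats just two of the exports (using the same /BBBB prefix as above) and checks for the Hadoop sbin directory:

```shell
# Repeat two of the exports and check that PATH contains the Hadoop sbin directory.
export HADOOP_INSTALL=/BBBB/hadoop2.7.2
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
case ":$PATH:" in
  *":$HADOOP_INSTALL/sbin:"*) echo "PATH ok" ;;
  *)                          echo "PATH missing $HADOOP_INSTALL/sbin" ;;
esac
```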
3、Hadoop configuration
1、In hadoop-env.sh under etc/hadoop in the Hadoop installation directory, set the JDK location:
export JAVA_HOME=/BBBB/jdk1.7
2、Configure core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://10.124.22.213:9000</value>
  </property>
  <!--
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/BBBB/hadoop2.7.2/tmp</value>
  </property>
  -->
  <property>
    <name>io.file.buffer.size</name>
    <value>13170</value>
  </property>
</configuration>
3、Configure hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/BBBB/hadoop2.7.2/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/BBBB/hadoop2.7.2/dfs/data</value>
  </property>
</configuration>
4、Configure mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>10.124.22.213:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>10.124.22.213:19888</value>
  </property>
</configuration>
5、Configure yarn-site.xml:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>10.124.22.213:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>10.124.22.213:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>10.124.22.213:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>10.124.22.213:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>10.124.22.213:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>768</value>
  </property>
</configuration>
6、Configure the slaves file:
10.124.10.9
10.75.7.64
7、Configure the hosts and hostname files (as of 2.7.2 the machines communicate by hostname; without this configuration they cannot reach each other)
The hosts and hostname files must be configured on the master, slave1, and slave2.
hosts file:
10.124.22.213 panzha-master
10.124.10.9 panzha-slave001
10.75.7.64 panzha-slave002
hostname file:
Put the machine's own hostname in it (e.g. panzha-master on the master).
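The hosts entries above can be appended idempotently, so re-running the setup never duplicates a line. The sketch below writes to a temp file for illustration; on a real node the target is /etc/hosts (edited as root).

```shell
# Append each cluster entry only if it is not already present.
HOSTS_FILE=$(mktemp)            # stand-in for /etc/hosts
for entry in "10.124.22.213 panzha-master" \
             "10.124.10.9 panzha-slave001" \
             "10.75.7.64 panzha-slave002"; do
  grep -qxF "$entry" "$HOSTS_FILE" || echo "$entry" >> "$HOSTS_FILE"
done
cat "$HOSTS_FILE"
```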
6、Copy the entire configured Hadoop directory to slave1 and slave2
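Step 6 can be scripted as a dry run over the slaves (remove the echo to execute). It assumes root SSH access and the same /BBBB layout on every node; rsync -a would also work and skips unchanged files.

```shell
# Dry run: print the scp command that would push the configured tree to each slave.
for host in panzha-slave001 panzha-slave002; do
  echo scp -r /BBBB/hadoop2.7.2 root@"$host":/BBBB/
done
```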
7、Start Hadoop:
start-dfs.sh
start-yarn.sh
Common Commands
1、Format the HDFS file system:
hadoop namenode -format
2、Check the status of the Hadoop cluster:
hadoop dfsadmin -report
hadoop fs -copyFromLocal /BBBB/b.txt /user/sunny/d.txt    # copy the local b.txt into HDFS as d.txt
hadoop fs -copyToLocal XXX XXX    # copy a file from HDFS to the local file system
hadoop fs -mkdir XXX
hadoop fs -ls
hadoop fs is equivalent to hdfs dfs
Common Errors
1、NameNode not formatted
Run the following command, then restart with start-dfs.sh:
hadoop namenode -format
2、How to resolve "Datanode denied communication with namenode because hostname cannot be resolved"
Error:
org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException: Datanode denied communication with namenode because hostname cannot be resolved (ip=172.31.34.70, hostname=172.31.34.70): DatanodeRegistration(0.0.0.0, datanodeUuid=e218bf57-41b5-46a6-8343-ced968272b9a, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-28174aeb-5c3e-4500-ba25-1d9ca001538f;nsid=122644145;c=0)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:904)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:5085)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1140)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:27329)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
Analysis:
This error is thrown from org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java:
public void registerDatanode(DatanodeRegistration nodeReg)
    throws DisallowedDatanodeException, UnresolvedTopologyException {
  InetAddress dnAddress = Server.getRemoteIp();
  if (dnAddress != null) {
    // Mostly called inside an RPC, update ip and peer hostname
    String hostname = dnAddress.getHostName();
    String ip = dnAddress.getHostAddress();
    if (checkIpHostnameInRegistration && !isNameResolved(dnAddress)) {
      // Reject registration of unresolved datanode to prevent performance
      // impact of repetitive DNS lookups later.
      final String message = "hostname cannot be resolved (ip="
          + ip + ", hostname=" + hostname + ")";
      LOG.warn("Unresolved datanode registration: " + message);
      throw new DisallowedDatanodeException(nodeReg, message);
    }
checkIpHostnameInRegistration is controlled by the hdfs-site.xml property "dfs.namenode.datanode.registration.ip-hostname-check", and isNameResolved(dnAddress) is controlled by reverse DNS and /etc/hosts.
Resolution:
Add the datanode's ip/hostname to /etc/hosts, or enable reverse DNS (make sure you can obtain the hostname via the command `host <ip>`), or add the following property to hdfs-site.xml:
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
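Whether a given datanode IP resolves the way registerDatanode expects can be checked from the namenode with getent, which consults /etc/hosts as well as DNS. The IPs below are the ones used in this article; substitute your own.

```shell
# For each datanode IP, print the name it resolves to, or flag it as unresolved.
for ip in 10.124.10.9 10.75.7.64; do
  getent hosts "$ip" || echo "UNRESOLVED: $ip"
done
```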
3、Safe mode: leave it with
hadoop dfsadmin -safemode leave
4、org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /a.tcl._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
Solution:
This error means no datanodes have registered with the namenode: the namenode must be started first, then the datanodes, and only then the jobtracker and tasktracker. The current workaround is to start the daemons individually:
1. Restart the namenode:
# hadoop-daemon.sh start namenode
starting namenode, logging to /usr/hadoop-0.21.0/bin/../logs/hadoop-root-namenode-www.keli.com.out
2. Restart the datanode:
# hadoop-daemon.sh start datanode
starting datanode, logging to /usr/hadoop-0.21.0/bin/../logs/hadoop-root-datanode-www.keli.com.out
References:
http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/ClusterSetup.html
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/