Installing a Hadoop-2.7.3 Cluster on CentOS 7
The relationship between hostname and /etc/hosts
Many people, when asked to change a machine's hostname, immediately think of editing /etc/hosts, assuming it is the hostname's configuration file. It is not. The hosts file acts like a miniature DNS, mapping IP addresses to hostnames. In the early Internet there were few machines, so a single hosts file could list every connected host; as the Internet grew this no longer scaled, and the distributed DNS system emerged, with DNS servers providing the same IP-to-name mapping (see man hosts for details). Before sending a query to a DNS server, Linux first consults /etc/hosts; if a matching record exists there, it is used. The /etc/hosts file normally contains at least the localhost record.
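This lookup order can be observed directly. On glibc systems, /etc/nsswitch.conf typically contains `hosts: files dns`, so getent answers from the local files before any DNS query is made (a quick check of my own, not from the original post):

```shell
# getent queries the same NSS resolution chain the system resolver uses;
# localhost normally resolves from /etc/hosts without any DNS traffic.
getent hosts localhost
```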
Source: https://my.oschina.net/xhhuang/blog/807914
I. Hardware environment
The hardware I use is a minicloud appliance from 云创, consisting of three nodes (each with 8 GB RAM + a 128 GB SSD + three 3 TB SATA disks) and a gigabit switch.
II. Pre-installation preparation
1. Create a hadoop user on CentOS 7. The official recommendation is to run hadoop, mapreduce, and yarn under separate users; here, to keep things simple, I install everything under the single hadoop user.
2. Download the installation packages:
1) JDK: jdk-8u112-linux-x64.rpm
Download: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
2) Hadoop-2.7.3: hadoop-2.7.3.tar.gz
Download: http://archive.apache.org/dist/hadoop/common/stable2/
3. Remove the OpenJDK bundled with CentOS 7 (as root)
1) First list the OpenJDK packages already on the system:
rpm -qa|grep jdk
The output looks like this:
[hadoop@localhost Desktop]$ rpm -qa|grep jdk
java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64
java-1.8.0-openjdk-headless-1.8.0.101-3.b13.el7_2.x86_64
java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64
java-1.7.0-openjdk-headless-1.7.0.111-2.6.7.2.el7_2.x86_64
2) Remove each of the OpenJDK packages found above:
yum -y remove java-1.7.0-openjdk-1.7.0.111-2.6.7.2.el7_2.x86_64
yum -y remove java-1.8.0-openjdk-headless-1.8.0.101-3.b13.el7_2.x86_64
yum -y remove java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64
yum -y remove java-1.7.0-openjdk-headless-1.7.0.111-2.6.7.2.el7_2.x86_64
4. Install the Oracle JDK (as root)
rpm -ivh jdk-8u112-linux-x64.rpm
After installation, the JDK lives at /usr/java/jdk1.8.0_112.
Next, add the JDK path to the system environment variables:
vi /etc/profile
Append the following at the end of the file:
export JAVA_HOME=/usr/java/jdk1.8.0_112
export JRE_HOME=/usr/java/jdk1.8.0_112/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Save and close the file, then make the settings take effect:
source /etc/profile
Now java -version verifies that the JDK is configured correctly:
[root@localhost jdk1.8.0_112]# java -version
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)
[root@localhost jdk1.8.0_112]#
5. Disable the firewall (as root)
Stop and disable firewalld:
systemctl stop firewalld.service
systemctl disable firewalld.service
Terminal output:
[root@localhost Desktop]# systemctl stop firewalld.service
[root@localhost Desktop]# systemctl disable firewalld.service
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
[root@localhost Desktop]#
6. Change the hostnames and configure the network (as root)
1) Change the hostname
On the master host:
hostnamectl set-hostname master
On the slave1 host:
hostnamectl set-hostname slave1
On the slave2 host:
hostnamectl set-hostname slave2
2) Configure the network
Taking the master host as the example, here is how to configure a static IP and the hosts file.
Each of my nodes has two NICs; I give one of them a static IP for internal cluster communication.
vi /etc/sysconfig/network-scripts/ifcfg-enp7s0
(Note: on my master the NIC to configure is enp7s0, hence the file ifcfg-enp7s0.)
The original contents of ifcfg-enp7s0:
TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=enp7s0
UUID=914595f1-e6f9-4c9b-856a-c4bd79ffe987
DEVICE=enp7s0
ONBOOT=no
Change it to:
TYPE=Ethernet
ONBOOT=yes
DEVICE=enp7s0
UUID=914595f1-e6f9-4c9b-856a-c4bd79ffe987
BOOTPROTO=static
IPADDR=59.71.229.189
GATEWAY=59.71.229.254
DEFROUTE=yes
IPV6INIT=no
IPV4_FAILURE_FATAL=yes
3) Edit /etc/hosts
vi /etc/hosts
Add the following records:
59.71.229.189 master
59.71.229.190 slave1
59.71.229.191 slave2
Apply the network configuration and hosts entries above on every node in the cluster.
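Since the hosts step is repeated on every node, it helps to add the records idempotently, so re-running the setup never duplicates lines. A small helper of my own (run as root; not from the original post):

```shell
# Append a hosts record only if an identical line is not already present.
# -x matches the whole line, -F treats the entry as a fixed string.
add_host_entry() {
  entry="$1"   # e.g. "59.71.229.189 master"
  file="$2"    # normally /etc/hosts
  grep -qxF "$entry" "$file" || echo "$entry" >> "$file"
}

# Usage on each node:
# add_host_entry "59.71.229.189 master" /etc/hosts
# add_host_entry "59.71.229.190 slave1" /etc/hosts
# add_host_entry "59.71.229.191 slave2" /etc/hosts
```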
7. Set up passwordless SSH among the cluster nodes (as the hadoop user)
For convenience, I configure things so that any node can SSH into any other node in the cluster without a password. The steps:
1) On every machine, as the hadoop user, run:
ssh-keygen -t rsa -P ''
Just press Enter at every prompt.
2) On each machine, first append its own public key to authorized_keys so that ssh localhost works without a password:
cat id_rsa.pub >> authorized_keys
3) Then copy each machine's public key to every other machine's authorized_keys staging area; you will be prompted for the other machines' passwords during this step:
master:
scp /home/hadoop/.ssh/id_rsa.pub hadoop@slave1:/home/hadoop/.ssh/id_rsa_master.pub
scp /home/hadoop/.ssh/id_rsa.pub hadoop@slave2:/home/hadoop/.ssh/id_rsa_master.pub
slave1:
scp /home/hadoop/.ssh/id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa_slave1.pub
scp /home/hadoop/.ssh/id_rsa.pub hadoop@slave2:/home/hadoop/.ssh/id_rsa_slave1.pub
slave2:
scp /home/hadoop/.ssh/id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa_slave2.pub
scp /home/hadoop/.ssh/id_rsa.pub hadoop@slave1:/home/hadoop/.ssh/id_rsa_slave2.pub
4) On each host, enter /home/hadoop/.ssh/ and use cat to append every public key other than the locally generated one (id_rsa.pub) to authorized_keys. Then set the permissions on authorized_keys with chmod, and delete all the copied public keys with rm:
master:
cat id_rsa_slave1.pub >> authorized_keys
cat id_rsa_slave2.pub >> authorized_keys
chmod 600 authorized_keys
rm id_rsa*.pub
slave1:
cat id_rsa_master.pub >> authorized_keys
cat id_rsa_slave2.pub >> authorized_keys
chmod 600 authorized_keys
rm id_rsa*.pub
slave2:
cat id_rsa_master.pub >> authorized_keys
cat id_rsa_slave1.pub >> authorized_keys
chmod 600 authorized_keys
rm id_rsa*.pub
After these steps, any machine can SSH into any other machine without a password.
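The per-host merge in step 4) can also be wrapped in a small helper that appends every collected peer key (the id_rsa_<host>.pub files from step 3), fixes the permissions sshd insists on, and removes the copied key files. This is a convenience sketch of my own, equivalent to the commands above:

```shell
# Merge peer public keys (id_rsa_*.pub, i.e. the copies received from other
# hosts) into authorized_keys, tighten permissions, and clean up the copies.
merge_pubkeys() {
  sshdir="$1"                                   # e.g. /home/hadoop/.ssh
  cat "$sshdir"/id_rsa_*.pub >> "$sshdir/authorized_keys"
  chmod 600 "$sshdir/authorized_keys"           # sshd ignores group/world-writable files
  rm -f "$sshdir"/id_rsa_*.pub
}

# Usage on each host:
# merge_pubkeys /home/hadoop/.ssh
```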
III. Installing and configuring Hadoop (all of the following as the hadoop user)
1. Extract hadoop-2.7.3.tar.gz into /home/hadoop/. (In this walkthrough, the archive is on the hadoop user's desktop.) First extract it where it sits:
tar -zxvf hadoop-2.7.3.tar.gz
Then copy the entire extracted hadoop-2.7.3 directory to /home/hadoop, and delete the copy left at the original location:
cp -r /home/hadoop/Desktop/hadoop-2.7.3 /home/hadoop/
2. Configuration:
1) On master, first create the following directories under /home/hadoop/:
mkdir -p /home/hadoop/hadoopdir/name
mkdir -p /home/hadoop/hadoopdir/data
mkdir -p /home/hadoop/hadoopdir/temp
mkdir -p /home/hadoop/hadoopdir/logs
mkdir -p /home/hadoop/hadoopdir/pids
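The five mkdir calls can be collapsed into one loop; a convenience sketch of my own:

```shell
# Create the HDFS/log/pid working directories under a given base directory.
make_hadoop_dirs() {
  base="$1"                          # e.g. /home/hadoop/hadoopdir
  for d in name data temp logs pids; do
    mkdir -p "$base/$d"
  done
}

# Usage:
# make_hadoop_dirs /home/hadoop/hadoopdir
```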
2) Then copy the hadoopdir directory to the other nodes with scp:
scp -r /home/hadoop/hadoopdir hadoop@slave1:/home/hadoop/
scp -r /home/hadoop/hadoopdir hadoop@slave2:/home/hadoop/
3) Go into /home/hadoop/hadoop-2.7.3/etc/hadoop and edit the following files:
hadoop-env.sh:
export JAVA_HOME=/usr/java/jdk1.8.0_112
export HADOOP_LOG_DIR=/home/hadoop/hadoopdir/logs
export HADOOP_PID_DIR=/home/hadoop/hadoopdir/pids

mapred-env.sh:
export JAVA_HOME=/usr/java/jdk1.8.0_112
export HADOOP_MAPRED_LOG_DIR=/home/hadoop/hadoopdir/logs
export HADOOP_MAPRED_PID_DIR=/home/hadoop/hadoopdir/pids

yarn-env.sh:
export JAVA_HOME=/usr/java/jdk1.8.0_112
YARN_LOG_DIR=/home/hadoop/hadoopdir/logs

slaves:
#localhost
slave1
slave2

(Note: if localhost is left uncommented in the slaves file, the local machine will also serve as a DataNode.)
core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:///home/hadoop/hadoopdir/temp</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hadoopdir/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hadoopdir/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>64m</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

mapred-site.xml (created from the template first):
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>master:50030</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://master:9001</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>

yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
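After editing the *-site.xml files it is easy to double-check a value. A rough helper of my own (not part of Hadoop; it naively assumes <name> and <value> sit on adjacent lines, as formatted above — a real tool would use an XML parser):

```shell
# Pull a property's <value> out of a Hadoop *-site.xml file by its <name>.
get_prop() {
  name="$1"; file="$2"
  grep -A1 "<name>$name</name>" "$file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Usage:
# get_prop fs.defaultFS core-site.xml
```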
4) From master, copy the whole /home/hadoop/hadoop-2.7.3 directory to the other nodes:
scp -r /home/hadoop/hadoop-2.7.3 hadoop@slave1:/home/hadoop/
scp -r /home/hadoop/hadoop-2.7.3 hadoop@slave2:/home/hadoop/
5) Go into /home/hadoop/hadoop-2.7.3/bin and format the filesystem:
./hdfs namenode -format
Formatting produces a long stream of terminal output; near the end, "Exiting with status 0" indicates that the format succeeded. If formatting fails, read the logs carefully to find the cause.
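Success can also be checked mechanically by saving the format output and grepping for the NameNode's "successfully formatted" message (a small helper of my own, not from the original post):

```shell
# Returns success if a saved `hdfs namenode -format` log contains the
# NameNode's "successfully formatted" message for the storage directory.
format_ok() {
  grep -q "successfully formatted" "$1"
}

# Usage:
# ./hdfs namenode -format 2>&1 | tee format.log
# format_ok format.log && echo "format succeeded"
```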
Here is an error I ran into, and how to resolve it:

17/10/26 19:44:34 INFO ipc.Client: Retrying connect to server: slave2/192.168.84.202:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/10/26 19:44:34 INFO ipc.Client: Retrying connect to server: slave3/192.168.84.203:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/10/26 19:44:34 INFO ipc.Client: Retrying connect to server: slave1/192.168.84.201:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/10/26 19:44:35 INFO ipc.Client: Retrying connect to server: slave2/192.168.84.202:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/10/26 19:44:35 INFO ipc.Client: Retrying connect to server: slave3/192.168.84.203:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/10/26 19:44:35 INFO ipc.Client: Retrying connect to server: slave1/192.168.84.201:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/10/26 19:44:36 INFO ipc.Client: Retrying connect to server: slave2/192.168.84.202:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/10/26 19:44:36 WARN namenode.NameNode: Encountered exception during format:
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown:
192.168.84.202:8485: Call From master/192.168.84.200 to slave2:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)
    at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:901)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:184)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:988)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
17/10/26 19:44:36 INFO ipc.Client: Retrying connect to server: slave1/192.168.84.201:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/10/26 19:44:36 INFO ipc.Client: Retrying connect to server: slave3/192.168.84.203:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/10/26 19:44:36 ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown:
192.168.84.202:8485: Call From master/192.168.84.200 to slave2:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)
    at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:901)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:184)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:988)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1434)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
17/10/26 19:44:36 INFO util.ExitUtil: Exiting with status 1
17/10/26 19:44:36 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.84.200
************************************************************/

Port 8485 is the JournalNode RPC port, so the "Connection refused" here means the JournalNodes were not running when the format was attempted. (This arises in an HA setup using the QuorumJournalManager; start the JournalNode daemon on every journal host before formatting the NameNode.)
6) Go into /home/hadoop/hadoop-2.7.3/sbin:
./start-dfs.sh
./start-yarn.sh
The two commands above start HDFS and YARN, and the Hadoop cluster is now running. To shut it down, run the following in the sbin directory:
./stop-yarn.sh
./stop-dfs.sh
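After the start scripts, running jps on master should list NameNode, SecondaryNameNode, and ResourceManager (and on the slaves, DataNode and NodeManager). A small sketch of my own that checks a captured jps listing for the master daemons:

```shell
# Given the text output of `jps`, verify the master daemons are present.
check_master_daemons() {
  out="$1"
  for p in NameNode SecondaryNameNode ResourceManager; do
    # -w matches whole words, so "SecondaryNameNode" does not count as "NameNode"
    printf '%s\n' "$out" | grep -qw "$p" || { echo "missing: $p"; return 1; }
  done
  echo "all master daemons running"
}

# Usage:
# check_master_daemons "$(jps)"
```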
7) Checking the web UIs
After running start-dfs.sh, open http://master:50070 to see the cluster summary and per-DataNode information.
After running start-yarn.sh, open http://master:8088 to see the YARN cluster information.