Installing CentOS 7 x86-64 + JDK 1.8 + Hadoop 2.7.2 in a VM
Today I'm writing down the full process of deploying Hadoop 2.7.2 with JDK 1.8 on CentOS 7 x86-64.
Environment overview and cluster plan
[root@master etc]# cat redhat-release
CentOS Linux release 7.2.1511 (Core)
192.168.1.200  master
192.168.1.201  slave1
192.168.1.202  slave2

I'm using CentOS 7 x86-64. To save time, install one virtual machine and then clone it twice to produce the other two. When booting the clones, choose the "I copied it" option (or change the NIC MAC address by hand) so each VM gets a unique MAC. I won't go into the cloning details here; if you need them, plenty of documentation is available online.
With the three VMs ready, let's walk through the deployment together.
1. Configure the hostname and IP address
On CentOS 7, the hostname lives in the file /etc/hostname:
[root@master etc]# vim hostname
master
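On CentOS 7 you can also let systemd make this edit for you: `hostnamectl` rewrites /etc/hostname and applies the new name immediately. A one-line alternative (run as root on each node, with that node's own name):

```shell
# Sets the static hostname and updates /etc/hostname in one step.
hostnamectl set-hostname master
```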
Modify the IP address:
[root@master ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
NAME=ens33
UUID=c8299190-539a-4ba2-b34c-1c4f72d10e09
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.1.200
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DNS1=192.168.1.1
2. Edit the local hosts file
[root@master ~]# vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.200 master
192.168.1.201 slave1
192.168.1.202 slave2
3. Disable the firewall and SELinux
Make sure both are disabled, or Hadoop will fail to start. The firewall commands on CentOS 7 differ slightly from earlier releases:
[root@master ~]# systemctl stop firewalld.service     # stops the firewall now, but it starts again after a reboot
[root@master ~]# systemctl disable firewalld.service  # disables it for good, so it no longer starts at boot
Check the firewall state (either command works):

[root@master ~]# firewall-cmd --state
not running
[root@master ~]# systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)

Disable SELinux:
[root@master ~]# setenforce 0                # disables SELinux for the current boot, but it comes back after a reboot
[root@master ~]# vim /etc/selinux/config     # edit the config file so SELinux stays off across reboots
SELINUX=disabled

Check the SELinux status:
[root@master ~]# sestatus
SELinux status:                 disabled
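If you prefer a non-interactive edit, the vim change above is a one-line sed. A minimal sketch, demonstrated against a scratch copy so it can be run safely; on a real node point CONFIG at /etc/selinux/config and run it as root:

```shell
# Demonstrate the edit on a scratch copy of the config file
# (on a real host: CONFIG=/etc/selinux/config, run as root).
CONFIG=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$CONFIG"

# The actual edit: force the SELINUX= line to "disabled".
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$CONFIG"

grep '^SELINUX=' "$CONFIG"    # prints: SELINUX=disabled
```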
4. Create the hadoop user: grid
[root@master ~]# useradd grid -s /bin/bash
[root@master ~]# passwd grid
Changing password for user grid.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.
Note: steps 1 through 4 must be carried out on all three VMs, with each machine's IP and hostname taken from the plan above.
5. Configure passwordless SSH login
The Hadoop daemons on different nodes need to communicate with each other, and we don't want to type a password for every connection, so we set up passwordless SSH.
First, on master, switch to the grid user and configure passwordless login for it.
1. Generate the public/private key pair. Note: this step must be done on all three VMs.
[root@master ~]# su - grid
[grid@master ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/grid/.ssh/id_rsa): 
Created directory '/home/grid/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/grid/.ssh/id_rsa.
Your public key has been saved in /home/grid/.ssh/id_rsa.pub.
The key fingerprint is:
11:ea:24:ab:26:1f:f2:a6:1f:47:72:76:f8:a0:8f:ec grid@master
The key's randomart image is:
+--[ RSA 2048]----+
|         .       |
|        . .      |
|       . o .     |
|        * .      |
|     . B o S     |
|        B +      |
|    o * . .      |
|       B.*       |
|      .=E .      |
+-----------------+
[grid@master ~]$ cd .ssh
[grid@master .ssh]$ ls
id_rsa  id_rsa.pub
2. Send the public key to a target host (we start with slave2), which creates the authorized_keys file there.
[grid@master .ssh]$ ssh-copy-id -i ./id_rsa.pub slave2
The authenticity of host 'slave2 (192.168.1.202)' can't be established.
ECDSA key fingerprint is df:f0:4e:f2:17:ee:2c:5f:3d:e8:8e:49:ea:dc:3d:2c.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
grid@slave2's password: 
Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'slave2'"
and check to make sure that only the key(s) you wanted were added.

[grid@master .ssh]$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
3. Repeat steps 1 and 2 on slave1 as well, so slave1 can also log in to slave2 without a password.
4. On slave2, append its own ~/.ssh/id_rsa.pub to the authorized_keys file:
[grid@slave2 ~]$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
5. Finally, copy slave2's completed authorized_keys back to master and slave1 with scp:
[grid@slave2 ~]$ scp .ssh/authorized_keys master:/home/grid/.ssh/authorized_keys
authorized_keys                               100% 1179     1.2KB/s   00:00
[grid@slave2 ~]$ scp .ssh/authorized_keys slave1:/home/grid/.ssh/authorized_keys
authorized_keys                               100% 1179     1.2KB/s   00:00
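Steps 2-5 amount to building one authorized_keys file that contains all three public keys, then distributing it to every node, so each node trusts all the others. A local sketch of that merge, using placeholder key lines (the real files hold the contents of each node's id_rsa.pub):

```shell
# Build one authorized_keys with a (placeholder) public key line per node.
DIR=$(mktemp -d)
for host in master slave1 slave2; do
  echo "ssh-rsa AAAAB3...placeholder grid@$host" >> "$DIR/authorized_keys"
done

# One key line per node means every node can authenticate every other node.
wc -l < "$DIR/authorized_keys"    # prints: 3
```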
OK, that's passwordless login done!
6. Install the JDK
We use JDK 1.8, the latest release at the time of writing (2016-02-28), and unpack it under /usr. (Note: the transcript below was captured with build 8u72, while the config files later reference jdk1.8.0_73; whichever build you download, keep the JAVA_HOME paths consistent with it.)
[grid@master usr]$ tar -zxf jdk-8u73-linux-x64.tar.gz
[grid@master .ssh]$ cd /usr
[grid@master usr]$ ls
bin  etc  games  include  jdk1.8.0_72  jdk-8u72-linux-x64.tar.gz  lib  lib64  libexec  local  sbin  share  src  tmp
[grid@master usr]$ java -version
java version "1.8.0_72"
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)
7. Install Hadoop
Download the tarball and extract it into grid's home directory:
[grid@master ~]$ tar -zxf hadoop-2.7.2.tar.gz
[grid@master ~]$ ls
hadoop-2.7.2  hadoop-2.7.2.tar.gz
[grid@master ~]$ cd hadoop-2.7.2
[grid@master hadoop-2.7.2]$ ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
8. Configure Hadoop
This step is the heart of the whole installation, so follow along carefully. The configuration layout in 2.7.2 has changed from earlier versions; the files live in:
/home/grid/hadoop-2.7.2/etc/hadoop/
[grid@master ~]$ cd hadoop-2.7.2/etc/hadoop/
[grid@master hadoop]$ ls
capacity-scheduler.xml      httpfs-env.sh            mapred-env.sh
configuration.xsl           httpfs-log4j.properties  mapred-queues.xml.template
container-executor.cfg      httpfs-signature.secret  mapred-site.xml
core-site.xml               httpfs-site.xml          slaves
hadoop-env.cmd              kms-acls.xml             ssl-client.xml.example
hadoop-env.sh               kms-env.sh               ssl-server.xml.example
hadoop-metrics2.properties  kms-log4j.properties     yarn-env.cmd
hadoop-metrics.properties   kms-site.xml             yarn-env.sh
hadoop-policy.xml           log4j.properties         yarn-site.xml
hdfs-site.xml               mapred-env.cmd
[grid@master hadoop]$ hadoop version
Hadoop 2.7.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41
Compiled by jenkins on 2016-01-26T00:08Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /home/grid/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar
Of these, seven files need to be edited:
====== hadoop-env.sh
export JAVA_HOME=/usr/jdk1.8.0_73

====== yarn-env.sh
export JAVA_HOME=/usr/jdk1.8.0_73

====== slaves
slave1
slave2

====== core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/grid/hadoop-2.7.2/tmp</value>
    <description>A base for other temporary directories</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.groups</name>
    <value>*</value>
  </property>
</configuration>

====== hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/grid/hadoop-2.7.2/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/grid/hadoop-2.7.2/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

====== mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>

====== yarn-site.xml (note: these properties must sit inside the <configuration> root element, same as the other XML files)
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
OK, configuration is done. But before Hadoop is usable we still have to format the HDFS filesystem, and ideally verify that the cluster really works. Let's keep going.
To make the hadoop and jdk commands easy to use, add their bin directories to PATH in /etc/profile:
export JAVA_HOME=/usr/jdk1.8.0_73
export HADOOP_HOME=/home/grid/hadoop-2.7.2
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:/usr/apache-ant-1.9.6/bin
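After editing /etc/profile, reload it in the current shell and spot-check that both toolchains resolve (assuming the paths above match your actual install locations):

```shell
# Reload the profile so the PATH additions take effect in this shell,
# then confirm both commands are found.
source /etc/profile
java -version        # should report a 1.8.0_xx build
hadoop version       # should report Hadoop 2.7.2
```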
Format HDFS
The format command has also changed in 2.7.2: the old form `hadoop namenode -format` is deprecated in favor of the hdfs command:

hdfs namenode -format

If the output contains the words "successfully formatted", the HDFS format succeeded.
Start Hadoop
[grid@master logs]$ start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/grid/hadoop-2.7.2/logs/hadoop-grid-namenode-master.out
slave2: starting datanode, logging to /home/grid/hadoop-2.7.2/logs/hadoop-grid-datanode-slave2.out
slave1: starting datanode, logging to /home/grid/hadoop-2.7.2/logs/hadoop-grid-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/grid/hadoop-2.7.2/logs/hadoop-grid-secondarynamenode-master.out
[grid@master logs]$ jps
11667 NameNode
11960 Jps
11851 SecondaryNameNode
[grid@master logs]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/grid/hadoop-2.7.2/logs/yarn-grid-resourcemanager-master.out
slave2: starting nodemanager, logging to /home/grid/hadoop-2.7.2/logs/yarn-grid-nodemanager-slave2.out
slave1: starting nodemanager, logging to /home/grid/hadoop-2.7.2/logs/yarn-grid-nodemanager-slave1.out
[grid@master logs]$ jps
11667 NameNode
12084 Jps
12006 ResourceManager
11851 SecondaryNameNode

Check the status on the datanodes:
[grid@slave1 hadoop]$ jps
2800 NodeManager
2899 Jps
2687 DataNode
[grid@slave2 hadoop]$ jps
11392 NodeManager
11491 Jps
11279 DataNode
All three VMs now show the expected processes.
9. Verify that Hadoop actually works
Now let's check that Hadoop can really process data; the process listings above alone don't prove the cluster is usable.
We'll use the wordcount example from the hadoop-mapreduce-examples-2.7.2.jar that ships with Hadoop (in a stock layout the jar lives under share/hadoop/mapreduce/).
1. Preparation:
[grid@slave1 ~]$ mkdir input
[grid@slave1 ~]$ ls
hadoop-2.7.2  hadoop-2.7.2.tar.gz  input
[grid@slave1 ~]$ cd input
[grid@slave1 input]$ ls
[grid@slave1 input]$ echo "hello world" > test1.txt
[grid@slave1 input]$ echo "hello hadoop" > test2.txt
[grid@slave1 input]$ cat test1.txt test2.txt
hello world
hello hadoop
[grid@slave1 input]$ cd
[grid@slave1 ~]$ cd hadoop-2.7.2
[grid@slave1 hadoop-2.7.2]$ hadoop fs -ls /
[grid@slave1 hadoop-2.7.2]$ hadoop fs -mkdir /user
[grid@slave1 hadoop-2.7.2]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - grid supergroup          0 2016-02-20 10:57 /user
[grid@slave1 hadoop-2.7.2]$ hadoop fs -mkdir /user/grid
[grid@slave1 hadoop-2.7.2]$ hadoop fs -put ../input ./in
[grid@slave1 hadoop-2.7.2]$ hadoop fs -ls
Found 1 items
drwxr-xr-x   - grid supergroup          0 2016-02-20 10:58 in
[grid@slave1 hadoop-2.7.2]$ hadoop fs -ls ./in/
Found 2 items
-rw-r--r--   2 grid supergroup         12 2016-02-20 10:58 in/test1.txt
-rw-r--r--   2 grid supergroup         13 2016-02-20 10:58 in/test2.txt
2. Run the test:
[grid@slave1 hadoop-2.7.2]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount in out
16/02/20 11:23:01 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.200:8032
16/02/20 11:23:04 INFO input.FileInputFormat: Total input paths to process : 2
16/02/20 11:23:04 INFO mapreduce.JobSubmitter: number of splits:2
16/02/20 11:23:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1455982762238_0001
16/02/20 11:23:05 INFO impl.YarnClientImpl: Submitted application application_1455982762238_0001
16/02/20 11:23:06 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1455982762238_0001/
16/02/20 11:23:06 INFO mapreduce.Job: Running job: job_1455982762238_0001
16/02/20 11:23:31 INFO mapreduce.Job: Job job_1455982762238_0001 running in uber mode : false
16/02/20 11:23:31 INFO mapreduce.Job:  map 0% reduce 0%
16/02/20 11:24:36 INFO mapreduce.Job:  map 100% reduce 0%
16/02/20 11:24:45 INFO mapreduce.Job:  map 100% reduce 100%
16/02/20 11:24:46 INFO mapreduce.Job: Job job_1455982762238_0001 completed successfully
16/02/20 11:24:47 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=55
		FILE: Number of bytes written=353323
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=237
		HDFS: Number of bytes written=25
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=98982
		Total time spent by all reduces in occupied slots (ms)=7305
		Total time spent by all map tasks (ms)=98982
		Total time spent by all reduce tasks (ms)=7305
		Total vcore-milliseconds taken by all map tasks=98982
		Total vcore-milliseconds taken by all reduce tasks=7305
		Total megabyte-milliseconds taken by all map tasks=101357568
		Total megabyte-milliseconds taken by all reduce tasks=7480320
	Map-Reduce Framework
		Map input records=2
		Map output records=4
		Map output bytes=41
		Map output materialized bytes=61
		Input split bytes=212
		Combine input records=4
		Combine output records=4
		Reduce input groups=3
		Reduce shuffle bytes=61
		Reduce input records=4
		Reduce output records=3
		Spilled Records=8
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=678
		CPU time spent (ms)=11250
		Physical memory (bytes) snapshot=476303360
		Virtual memory (bytes) snapshot=6224809984
		Total committed heap usage (bytes)=258678784
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=25
	File Output Format Counters
		Bytes Written=25
[grid@slave1 hadoop-2.7.2]$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - grid supergroup          0 2016-02-20 10:58 in
drwxr-xr-x   - grid supergroup          0 2016-02-20 11:24 out
[grid@slave1 hadoop-2.7.2]$ hadoop fs -ls ./out
Found 2 items
-rw-r--r--   2 grid supergroup          0 2016-02-20 11:24 out/_SUCCESS
-rw-r--r--   2 grid supergroup         25 2016-02-20 11:24 out/part-r-00000
[grid@slave1 hadoop-2.7.2]$ hadoop fs -cat ./out/part-r-00000
hadoop	1
hello	2
world	1
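As a sanity check on the job result: wordcount is just tokenize-and-count, so for two tiny files the same numbers can be reproduced locally with standard tools. This sketch recreates the two test files in a scratch directory and counts the words:

```shell
# Recreate the two input files in a scratch directory.
DIR=$(mktemp -d)
echo "hello world"  > "$DIR/test1.txt"
echo "hello hadoop" > "$DIR/test2.txt"

# Local equivalent of the MapReduce wordcount:
# split into one word per line, sort, count duplicates.
cat "$DIR"/*.txt | tr ' ' '\n' | sort | uniq -c | sort -k2
# prints (counts match part-r-00000 above):
#   1 hadoop
#   2 hello
#   1 world
```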
And with that, the whole process wraps up nicely. Attentive readers will have noticed that I ran wordcount on slave1: analysis jobs can be submitted from any node in the cluster, not only from the namenode.