Installing and Deploying CentOS 7 x86-64 + JDK 1.8 + Hadoop 2.7.2 on VMs


Today I'm documenting the installation and deployment of CentOS 7 x86-64 + JDK 1.8 + Hadoop 2.7.2.

Environment overview and planning

[root@master etc]# cat redhat-release
CentOS Linux release 7.2.1511 (Core) 

192.168.1.200   master
192.168.1.201   slave1
192.168.1.202   slave2

I'm using CentOS 7 x86-64 here. To save time, install one virtual machine and then clone it twice to produce the other two. Note that when the second and third VMs are first booted, choose the "copied it" option in your hypervisor, or change the NIC's MAC address yourself; a minimal sketch of what to check on each clone follows this paragraph. I won't go into the cloning procedure itself here; consult the documentation online if you need it.
Once the three virtual machines are ready, let's work through the deployment below together!
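The exact cloning steps depend on your hypervisor, but a minimal sketch of what to check on each clone (assuming the copy option regenerated the MAC address and that the interface is named ens33, as in this guide) looks like this:

ip link show ens33                               # confirm the clone received a new MAC address
vi /etc/sysconfig/network-scripts/ifcfg-ens33    # remove or update any HWADDR= line; set this node's IPADDR=
vi /etc/hostname                                 # set the hostname to slave1 or slave2
systemctl restart network                        # apply the network changes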

1. Configure the hostname and IP address

In CentOS 7, the hostname is set in the configuration file /etc/hostname:

[root@master etc]# vim hostname
master
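On CentOS 7 you can also do this with hostnamectl instead of editing the file by hand; the following one-liner is equivalent:

[root@master ~]# hostnamectl set-hostname master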

Modify the IP address:
[root@master ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
NAME=ens33
UUID=c8299190-539a-4ba2-b34c-1c4f72d10e09
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.1.200
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DNS1=192.168.1.1
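The new address only takes effect after the network service is restarted (or the machine is rebooted), for example:

[root@master ~]# systemctl restart network
[root@master ~]# ip addr show ens33    # confirm 192.168.1.200 is now assigned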

2. Edit the local hosts resolution file

[root@master ~]# vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.200   master
192.168.1.201   slave1
192.168.1.202   slave2
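A quick way to confirm that the names resolve is to ping each host once from master:

[root@master ~]# ping -c 1 slave1
[root@master ~]# ping -c 1 slave2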

3. Disable the firewall and SELinux

Be sure to disable both, or Hadoop will fail to start. The firewall commands in CentOS 7 differ slightly from earlier versions:

[root@master ~]# systemctl stop firewalld.service      # stop the firewall now; it will come back after a reboot
[root@master ~]# systemctl disable firewalld.service   # disable it permanently so it no longer starts at boot

Check the firewall status (either of these commands works):

[root@master ~]# firewall-cmd --state
not running
[root@master ~]# systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
Disable SELinux:

[root@master ~]# setenforce 0   # disables SELinux for the current session; it will be back after a reboot

[root@master ~]# vim /etc/selinux/config   # make the change persistent so SELinux stays off after a reboot
SELINUX=disabled

Check the SELinux status:

[root@master ~]# sestatus
SELinux status:                 disabled

4. Create the Hadoop user: grid

[root@master ~]# useradd grid -s /bin/bash
[root@master ~]# passwd grid
Changing password for user grid.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.
[root@master ~]# 

Note: the four steps above must be performed on all three virtual machines, using the IP addresses and hostnames from the environment plan at the beginning.

5. Configure passwordless SSH login

The Hadoop nodes need to exchange data with each other, and we don't want to type a password for every connection, so we configure passwordless SSH login.

First, on the master host, switch to the grid user and configure passwordless login for grid.

1. Generate the public/private key pair. Note: this step must be executed on all three virtual machines.

[root@master ~]# su - grid
[grid@master ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/grid/.ssh/id_rsa): 
Created directory '/home/grid/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/grid/.ssh/id_rsa.
Your public key has been saved in /home/grid/.ssh/id_rsa.pub.
The key fingerprint is:
11:ea:24:ab:26:1f:f2:a6:1f:47:72:76:f8:a0:8f:ec grid@master
The key's randomart image is:
+--[ RSA 2048]----+
|        .        |
|       . .       |
|    . o .        |
|     *   .       |
|  . B o S        |
|   B +           |
|o * . .          |
| B.*             |
|.=E .            |
+-----------------+
[grid@master ~]$ cd .ssh
[grid@master .ssh]$ ls
id_rsa  id_rsa.pub
[grid@master .ssh]$ 

2. Send the public key to the target host (here we set up authentication with slave2 first), which creates the authorized_keys file there.

[grid@master .ssh]$ ssh-copy-id -i ./id_rsa.pub slave2
The authenticity of host 'slave2 (192.168.1.202)' can't be established.
ECDSA key fingerprint is df:f0:4e:f2:17:ee:2c:5f:3d:e8:8e:49:ea:dc:3d:2c.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
grid@slave1's password: 
Number of key(s) added: 1
Now try logging into the machine, with:   "ssh 'slave1'"
and check to make sure that only the key(s) you wanted were added.
[grid@master .ssh]$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts

3. Repeat steps 1 and 2 on the slave1 host so that slave1 can also log in to slave2 without a password.

4. On slave2, append its own public key (.ssh/id_rsa.pub in grid's home directory) to the authorized_keys file:

[grid@slave2 ~]$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys

5. Finally, copy the completed authorized_keys file from slave2 back to the master and slave1 hosts with scp:

[grid@slave2 ~]$ scp .ssh/authorized_keys master:/home/grid/.ssh/
authorized_keys                               100% 1179     1.2KB/s   00:00
[grid@slave2 ~]$ scp .ssh/authorized_keys slave1:/home/grid/.ssh/
authorized_keys                               100% 1179     1.2KB/s   00:00

OK, passwordless login is done!
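To confirm it works, run a remote command from master against both slaves; it should complete without prompting for a password. If you still get a password prompt, check that ~/.ssh is mode 700 and ~/.ssh/authorized_keys is mode 600 on every node.

[grid@master ~]$ ssh slave1 hostname   # should print slave1 with no password prompt
[grid@master ~]$ ssh slave2 hostname   # should print slave2 with no password prompt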


6. Install the JDK

We use JDK 1.8, the latest version at the time of writing (2016-02-28), and place it under /usr.

[grid@master usr]$ tar -zxf jdk-8u73-linux-x64.tar.gz
[grid@master .ssh]$ cd /usr
[grid@master usr]$ ls
bin  etc  games  include  jdk1.8.0_72  jdk-8u72-linux-x64.tar.gz  lib  lib64  libexec  local  sbin  share  src  tmp
[grid@master usr]$ java -version
java version "1.8.0_72"
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)

7. Install Hadoop

Download Hadoop and unpack it into grid's home directory:

<pre name="code" class="html">[grid@master ~]$ tar -zxf <span style="font-family: Arial, Helvetica, sans-serif;">hadoop-2.7.2.tar.gz</span>
[grid@master ~]$ lshadoop-2.7.2  hadoop-2.7.2.tar.gz[grid@master ~]$ [grid@master ~]$ cd hadoop-2.7.2[grid@master hadoop-2.7.2]$ lsbin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share[grid@master hadoop-2.7.2]$ 

8. Configure Hadoop

This step is the heart of the whole installation, so follow along carefully. The configuration files in Hadoop 2.7.2 have also changed compared with earlier versions. They are located under:

/home/grid/hadoop-2.7.2/etc/hadoop/
[grid@master ~]$ cd hadoop-2.7.2/etc/hadoop/
[grid@master hadoop]$ ls
capacity-scheduler.xml      httpfs-env.sh            mapred-env.sh
configuration.xsl           httpfs-log4j.properties  mapred-queues.xml.template
container-executor.cfg      httpfs-signature.secret  mapred-site.xml
core-site.xml               httpfs-site.xml          slaves
hadoop-env.cmd              kms-acls.xml             ssl-client.xml.example
hadoop-env.sh               kms-env.sh               ssl-server.xml.example
hadoop-metrics2.properties  kms-log4j.properties     yarn-env.cmd
hadoop-metrics.properties   kms-site.xml             yarn-env.sh
hadoop-policy.xml           log4j.properties         yarn-site.xml
hdfs-site.xml               mapred-env.cmd
[grid@master hadoop]$ hadoop version
Hadoop 2.7.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41
Compiled by jenkins on 2016-01-26T00:08Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /home/grid/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar

Of these, the following seven files need to be configured:

====== hadoop-env.sh
export JAVA_HOME=/usr/jdk1.8.0_73

====== yarn-env.sh
export JAVA_HOME=/usr/jdk1.8.0_73

====== slaves
slave1
slave2

====== core-site.xml
<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/home/grid/hadoop-2.7.2/tmp</value>
  <description>A base for other temporary directories</description>
</property>
<property>
  <name>hadoop.proxyuser.hduser.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hduser.groups</name>
  <value>*</value>
</property>
</configuration>

====== hdfs-site.xml
<configuration>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>master:9001</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/grid/hadoop-2.7.2/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/grid/hadoop-2.7.2/data</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
</configuration>

====== mapred-site.xml
<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
</property>
</configuration>

====== yarn-site.xml (the properties below go inside the <configuration> element)
<configuration>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master:8088</value>
</property>
</configuration>
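Hadoop expects the same installation and configuration on every node. If you only edited these files on master, one way to propagate them is to copy the whole directory to the slaves over the passwordless SSH set up earlier (a sketch; adjust the paths if your layout differs):

[grid@master ~]$ scp -r ~/hadoop-2.7.2 slave1:~/
[grid@master ~]$ scp -r ~/hadoop-2.7.2 slave2:~/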


9. Format HDFS and start the cluster


OK, at this point the configuration is done. Before we can use Hadoop, however, we still have to format the HDFS filesystem, and ideally verify that the cluster really works. Let's continue.

To make the hadoop and jdk commands convenient to use, add their bin directories to /etc/profile:

export JAVA_HOME=/usr/jdk1.8.0_73
export HADOOP_HOME=/home/grid/hadoop-2.7.2
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:/usr/apache-ant-1.9.6/bin
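The change to /etc/profile only takes effect in new login shells; reload it in the current shell (and make the same edit on slave1 and slave2), for example:

[grid@master ~]$ source /etc/profile
[grid@master ~]$ which hadoop    # should print /home/grid/hadoop-2.7.2/bin/hadoop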

Format HDFS

The format command has changed in Hadoop 2.x as well: it used to be hadoop namenode -format, and is now invoked through hdfs:

hdfs namenode -format
After running the command, if the output contains the words "successfully formatted", the HDFS format succeeded.

Start Hadoop

[grid@master logs]$ start-dfs.sh 
Starting namenodes on [master]
master: starting namenode, logging to /home/grid/hadoop-2.7.2/logs/hadoop-grid-namenode-master.out
slave2: starting datanode, logging to /home/grid/hadoop-2.7.2/logs/hadoop-grid-datanode-slave2.out
slave1: starting datanode, logging to /home/grid/hadoop-2.7.2/logs/hadoop-grid-datanode-slave1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/grid/hadoop-2.7.2/logs/hadoop-grid-secondarynamenode-master.out
[grid@master logs]$ jps
11667 NameNode
11960 Jps
11851 SecondaryNameNode
[grid@master logs]$ start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /home/grid/hadoop-2.7.2/logs/yarn-grid-resourcemanager-master.out
slave2: starting nodemanager, logging to /home/grid/hadoop-2.7.2/logs/yarn-grid-nodemanager-slave2.out
slave1: starting nodemanager, logging to /home/grid/hadoop-2.7.2/logs/yarn-grid-nodemanager-slave1.out
[grid@master logs]$ jps
11667 NameNode
12084 Jps
12006 ResourceManager
11851 SecondaryNameNode
Check the status on the datanodes:

[grid@slave1 hadoop]$ jps
2800 NodeManager
2899 Jps
2687 DataNode
[grid@slave2 hadoop]$ jps
11392 NodeManager
11491 Jps
11279 DataNode

All three virtual machines show the expected processes.
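Besides jps, you can ask the NameNode directly whether both DataNodes have registered, and check the web UIs. A quick sanity check (port 50070 is the default NameNode web port in Hadoop 2.x; 8088 is the ResourceManager address we set in yarn-site.xml):

[grid@master ~]$ hdfs dfsadmin -report    # "Live datanodes (2)" should list slave1 and slave2
# Web UIs, reachable from a browser that can reach the VMs:
#   http://192.168.1.200:50070    HDFS NameNode status
#   http://192.168.1.200:8088     YARN ResourceManager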

10. Verify that Hadoop works

Now let's verify that Hadoop can actually analyze data; the process status above alone doesn't prove that the cluster is usable.

We'll use the wordcount example from the hadoop-mapreduce-examples-2.7.2.jar that ships with Hadoop.

1. Preparation:

[grid@slave1 ~]$ mkdir input
[grid@slave1 ~]$ ls
hadoop-2.7.2  hadoop-2.7.2.tar.gz  input
[grid@slave1 ~]$ cd input
[grid@slave1 input]$ ls
[grid@slave1 input]$ echo "hello world" > test1.txt
[grid@slave1 input]$ echo "hello hadoop" > test2.txt
[grid@slave1 input]$ cat test1.txt test2.txt 
hello world
hello hadoop
[grid@slave1 input]$ cd
[grid@slave1 ~]$ cd hadoop-2.7.2
[grid@slave1 hadoop-2.7.2]$ hadoop fs -ls /
[grid@slave1 hadoop-2.7.2]$ hadoop fs -mkdir /user
[grid@slave1 hadoop-2.7.2]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - grid supergroup          0 2016-02-20 10:57 /user
[grid@slave1 hadoop-2.7.2]$ hadoop fs -mkdir /user/grid
[grid@slave1 hadoop-2.7.2]$ hadoop fs -put ../input ./in
[grid@slave1 hadoop-2.7.2]$ hadoop fs -ls
Found 1 items
drwxr-xr-x   - grid supergroup          0 2016-02-20 10:58 in
[grid@slave1 hadoop-2.7.2]$ hadoop fs -ls ./in/
Found 2 items
-rw-r--r--   2 grid supergroup         12 2016-02-20 10:58 in/test1.txt
-rw-r--r--   2 grid supergroup         13 2016-02-20 10:58 in/test2.txt
[grid@slave1 hadoop-2.7.2]$

2. Run the test:

[grid@slave1 hadoop-2.7.2]$ hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount in out
16/02/20 11:23:01 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.200:8032
16/02/20 11:23:04 INFO input.FileInputFormat: Total input paths to process : 2
16/02/20 11:23:04 INFO mapreduce.JobSubmitter: number of splits:2
16/02/20 11:23:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1455982762238_0001
16/02/20 11:23:05 INFO impl.YarnClientImpl: Submitted application application_1455982762238_0001
16/02/20 11:23:06 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1455982762238_0001/
16/02/20 11:23:06 INFO mapreduce.Job: Running job: job_1455982762238_0001
16/02/20 11:23:31 INFO mapreduce.Job: Job job_1455982762238_0001 running in uber mode : false
16/02/20 11:23:31 INFO mapreduce.Job:  map 0% reduce 0%
16/02/20 11:24:36 INFO mapreduce.Job:  map 100% reduce 0%
16/02/20 11:24:45 INFO mapreduce.Job:  map 100% reduce 100%
16/02/20 11:24:46 INFO mapreduce.Job: Job job_1455982762238_0001 completed successfully
16/02/20 11:24:47 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=55
                FILE: Number of bytes written=353323
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=237
                HDFS: Number of bytes written=25
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=98982
                Total time spent by all reduces in occupied slots (ms)=7305
                Total time spent by all map tasks (ms)=98982
                Total time spent by all reduce tasks (ms)=7305
                Total vcore-milliseconds taken by all map tasks=98982
                Total vcore-milliseconds taken by all reduce tasks=7305
                Total megabyte-milliseconds taken by all map tasks=101357568
                Total megabyte-milliseconds taken by all reduce tasks=7480320
        Map-Reduce Framework
                Map input records=2
                Map output records=4
                Map output bytes=41
                Map output materialized bytes=61
                Input split bytes=212
                Combine input records=4
                Combine output records=4
                Reduce input groups=3
                Reduce shuffle bytes=61
                Reduce input records=4
                Reduce output records=3
                Spilled Records=8
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=678
                CPU time spent (ms)=11250
                Physical memory (bytes) snapshot=476303360
                Virtual memory (bytes) snapshot=6224809984
                Total committed heap usage (bytes)=258678784
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=25
        File Output Format Counters 
                Bytes Written=25
[grid@slave1 hadoop-2.7.2]$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - grid supergroup          0 2016-02-20 10:58 in
drwxr-xr-x   - grid supergroup          0 2016-02-20 11:24 out
[grid@slave1 hadoop-2.7.2]$ hadoop fs -ls ./out
Found 2 items
-rw-r--r--   2 grid supergroup          0 2016-02-20 11:24 out/_SUCCESS
-rw-r--r--   2 grid supergroup         25 2016-02-20 11:24 out/part-r-00000
[grid@slave1 hadoop-2.7.2]$ hadoop fs -cat ./out/part-r-00000
hadoop  1
hello   2
world   1
[grid@slave1 hadoop-2.7.2]$ 
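One thing to keep in mind if you rerun the example: MapReduce refuses to write to an output directory that already exists, so delete (or rename) out first, for example:

[grid@slave1 hadoop-2.7.2]$ hadoop fs -rm -r out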

With that, the whole process is complete. If you look carefully at the final verification, you'll notice that I ran wordcount on slave1; in other words, analysis jobs can be submitted from any node in the cluster, not only from the namenode.