Time's Hadoop Study Notes: Building a Three-Node Hadoop Cluster


Building a Hadoop cluster with three hosts:

Ingredients: a laptop, VMware virtual machines, a CentOS image, and the CDH installation packages.

 

1  Preparation: host and network configuration

After installing the Linux systems, for convenience we name the three hosts hadoop0, hadoop1, and hadoop2.

Hostname-to-IP mapping:

 

Hostname    IP
hadoop0     192.168.1.100
hadoop1     192.168.1.101
hadoop2     192.168.1.102

 

Taking hadoop0 as the example:

1.1  Change the hostname

There are two options:

① Temporary: takes effect immediately but reverts to the old value after a reboot (not recommended):

[root@localhost ~]# hostname

localhost.localdomain

[root@localhost ~]# hostname hadoop0

[root@localhost ~]# hostname

hadoop0

② Edit the configuration file (permanent):

[root@localhost ~]# vi /etc/sysconfig/network

NETWORKING=yes  # enable networking

NETWORKING_IPV6=no  # disable IPv6

HOSTNAME=hadoop0  # hostname
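A minimal sketch that combines both approaches, so the new name is active immediately and also survives a reboot (run the equivalent on hadoop1 and hadoop2 with their own names):

hostname hadoop0                                                   # takes effect now, not persistent
sed -i 's/^HOSTNAME=.*/HOSTNAME=hadoop0/' /etc/sysconfig/network   # persists across reboots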

1.2  Configure a static IP (the VMware network adapter is set to Host-only mode)

[root@localhost ~]# setup

 

[root@hadoop0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0

BOOTPROTO=none

HWADDR=00:0c:29:17:e1:ba

IPV6INIT=yes

NM_CONTROLLED=yes

ONBOOT=yes

TYPE=Ethernet

UUID="525eaf08-af4a-411b-986e-990ddff828e4"

IPADDR=192.168.1.100

NETMASK=255.255.255.0

USERCTL=no
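After editing ifcfg-eth0, a quick sanity check is to restart the network service and confirm the address was applied (a sketch; the interface name eth0 matches the file above):

service network restart              # re-read the ifcfg-eth0 configuration
ifconfig eth0 | grep "inet addr"     # should show 192.168.1.100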

1.3  Disable the firewall and SELinux

Disable the firewall:

① Temporary: takes effect immediately:

[root@hadoop0 ~]# service iptables stop

[root@hadoop0 ~]# service iptables status

iptables: Firewall is not running.

② Permanent: takes effect after a reboot:

[root@hadoop0 ~]# chkconfig iptables off

[root@hadoop0 ~]# chkconfig iptables --list

iptables          0:off   1:off   2:off   3:off   4:off   5:off   6:off

Disable SELinux:

① Temporary:

[root@hadoop0 ~]# setenforce 0

[root@hadoop0 ~]# getenforce

Permissive

② Permanent:

Edit the /etc/selinux/config file, change SELINUX=enforcing to SELINUX=disabled, and reboot the machine.

[root@hadoop0 ~]# vi /etc/selinux/config

# This file controls the state of SELinux on the system.

# SELINUX= can take one of these three values:

#     enforcing - SELinux security policy is enforced.

#     permissive - SELinux prints warnings instead of enforcing.

#     disabled - No SELinux policy is loaded.

#SELINUX=enforcing

SELINUX=disabled

# SELINUXTYPE= can take one of these two values:

#     targeted - Targeted processes are protected,

#     mls - Multi Level Security protection.

SELINUXTYPE=targeted

1.4  Edit the cluster hosts file

[root@hadoop0 ~]# vi /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.1.100 hadoop0

192.168.1.101 hadoop1

192.168.1.102 hadoop2
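A quick way to verify the hosts file is that every hostname resolves and answers a ping (a sketch, run from hadoop0):

for h in hadoop0 hadoop1 hadoop2; do ping -c 1 $h; done   # each host should reply once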

1.5  Configure the JDK (optional; the CDH installer can install a JDK for you)

The CDH installation itself includes a JDK installation step, so this section can be skipped.

If you prefer to install it yourself, read on. First check whether your Linux is 32-bit or 64-bit: installing a 32-bit JDK on 64-bit Linux sometimes requires extra dependency packages, so save yourself the headache and just download the 64-bit JDK.

[root@hadoop0 ~]# getconf LONG_BIT

64

[root@hadoop0 ~]# ./jdk-6u45-linux-x64.bin

The installation is complete when it prints Done.
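The .bin installer unpacks into the current directory. If that directory is not /usr, move the extracted folder there so it matches the JAVA_HOME configured below (a sketch; the directory name assumes JDK 6u45):

mv jdk1.6.0_45 /usr/     # so that /usr/jdk1.6.0_45/ exists, as referenced in /etc/profile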

Edit the configuration file:

Experience says to back up any system file before editing it, in case you mess it up; this principle applies throughout and will not be repeated.

[root@hadoop0 usr]# vi /etc/profile

Append at the end:

export JAVA_HOME=/usr/jdk1.6.0_45/

export PATH=$JAVA_HOME/bin:$PATH

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Apply it:

[root@hadoop0 usr]# source /etc/profile

Verify:

[root@hadoop0 usr]# java -version

java version "1.6.0_45"

Java(TM) SE Runtime Environment (build 1.6.0_45-b06)

Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

 

1.6  Configure passwordless SSH access for the management node

[root@hadoop0 ~]#  ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Created directory '/root/.ssh'.

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

d4:76:ba:61:18:a0:bf:61:78:b3:65:a1:b4:a8:3a:47 root@hadoop0

The key's randomart image is:

+--[ RSA 2048]----+
|      .          |
|     . . .       |
|    . . + o .    |
|     = + = o     |
|    o O S +      |
|  E. o B . o     |
| ..   o  .       |
|...              |
|.o               |
+-----------------+

[root@hadoop0 ~]#

[root@hadoop0 ~]# cd /root/.ssh/

[root@hadoop0 .ssh]# ll

total 8

-rw------- 1 root root 1675 Oct 27 14:57 id_rsa

-rw-r--r-- 1 root root  394 Oct 27 14:57 id_rsa.pub

Append the public key to authorized_keys, then copy it to the other nodes:

[root@hadoop0 .ssh]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[root@hadoop0 .ssh]# scp ~/.ssh/authorized_keys root@hadoop1:~/.ssh/

root@hadoop1's password:

authorized_keys                              100%  394    0.4KB/s   00:00

Verify passwordless access:

[root@hadoop0 .ssh]# ssh hadoop2

Last login: Tue Oct 2714:49:38 2015 from 192.168.1.103

[root@hadoop2 ~]#
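A sketch for pushing the key to every other node in one pass (assuming ssh-copy-id is available; it appends the public key to the remote authorized_keys, just like the scp step above):

for host in hadoop1 hadoop2; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub root@$host   # prompts once for each host's root password
done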

 

1.7  Configure a local yum repository

Mount the CD image under the Apache html directory:

[root@hadoop0 html]# mount -o loop CentOS-6.6-x86_64-bin-DVD1.iso /var/www/html/centosIso

[root@hadoop0 html]# pwd

/var/www/html

[root@hadoop0 html]# cp -r centosIso/ linuxiso

Edit the repo configuration files:

[root@hadoop0 yum.repos.d]# pwd

/etc/yum.repos.d

[root@hadoop0 yum.repos.d]# ll

total 24

-rw-r--r--. 1 root root 1991 Oct 23  2014 CentOS-Base.repo

-rw-r--r--. 1 root root  647 Oct 23  2014 CentOS-Debuginfo.repo

-rw-r--r--. 1 root root  289 Oct 23  2014 CentOS-fasttrack.repo

-rw-r--r--. 1 root root  630 Oct 23  2014 CentOS-Media.repo

-rw-r--r--. 1 root root 5394 Oct 23  2014 CentOS-Vault.repo

CentOS-Base.repo is the configuration file for the network yum repositories; CentOS-Media.repo is the configuration file for the local media repository.

Edit CentOS-Media.repo:

[root@hadoop0 yum.repos.d]# vi CentOS-Media.repo

[c6-media]

name=CentOS-$releasever - Media

baseurl=http://192.168.1.100/linuxiso/

gpgcheck=1

enabled=1

gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

The other hosts use this same configuration file so that they pull packages from the HTTP service on hadoop0 (see the sketch below). Also move the network repo configuration out of the way:

[root@hadoop0 yum.repos.d]# mv CentOS-Base.repo CentOS-Base.repo.bak

Otherwise yum install will fail with network connection errors.
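A sketch for pushing the same repo configuration to hadoop1 and hadoop2 (assuming the passwordless SSH from section 1.6 is already in place):

for h in hadoop1 hadoop2; do
  scp /etc/yum.repos.d/CentOS-Media.repo root@$h:/etc/yum.repos.d/
  ssh root@$h "mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak"
done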

[root@hadoop0 yum.repos.d]# yum clean all

[root@hadoop0 yum.repos.d]# yum makecache

Loaded plugins: fastestmirror, refresh-packagekit, security

Determining fastest mirrors

c6-media                                                 | 4.0 kB     00:00

c6-media/group_gz                                        | 216 kB     00:00

c6-media/filelists_db                                    | 6.0 MB     00:00

c6-media/primary_db                                      | 4.5 MB     00:00

c6-media/other_db                                        | 2.8 MB     00:00

Metadata Cache Created

Verify:

[root@hadoop0 yum.repos.d]#  yum list

[root@hadoop0 yum.repos.d]# yum -y install samba

Loaded plugins: fastestmirror, refresh-packagekit, security

Setting up Install Process

Loading mirror speeds from cached hostfile

Resolving Dependencies

--> Running transaction check

---> Package samba.x86_64 0:3.6.23-12.el6 will be installed

--> Finished Dependency Resolution

...... (part of the log omitted)

......

Installed:

  samba.x86_64 0:3.6.23-12.el6

 

Complete!

[root@hadoop0 yum.repos.d]#

 

1.8  Start the httpd service

[root@hadoop0 ~]# chkconfig httpd on

[root@hadoop0 cloudera]# service httpd start

Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.1.100 for ServerName

                                                           [  OK  ]

[root@hadoop0 cloudera]# service httpd status

httpd (pid  2510) is running...

[root@hadoop0 cloudera]#

 

Create a symlink to the Cloudera installation files:

[root@hadoop0 html]# ln -s /var/cloudera/ cloudera
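A quick check that the web server actually serves the repository directories (a sketch, assuming curl is installed; wget works just as well):

curl -I http://192.168.1.100/linuxiso/     # should return HTTP/1.1 200 OK
curl -I http://192.168.1.100/cloudera/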


 

1.9  NTP time synchronization

First check whether NTP is installed:

[root@hadoop0 ~]# rpm -q ntp

ntp-4.2.6p5-1.el6.centos.x86_64

Enable it at boot:

[root@hadoop0 ~]# chkconfig ntpd on

Start it:

[root@hadoop0 ~]# service ntpd start

[root@hadoop0 ~]# ntpstat

unsynchronised

  time server re-starting

   polling server every 8 s

Set the time and write it to the hardware clock (BIOS):

[root@hadoop0 ~]# date -s 11/03/2015

Tue Nov  3 00:00:00 PST 2015

[root@hadoop0 ~]# date -s 10:50:00

Tue Nov  3 10:50:00 PST 2015

[root@hadoop0 ~]# clock -w

Edit the server-side configuration:

[root@hadoop0 ~]# vi /etc/ntp.conf

# the administrative functions.

restrict 127.0.0.1

restrict -6 ::1

# Hosts on local network are less restricted.

restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

server 127.127.1.0

fudge 127.127.1.0 stratum 1

includefile /etc/ntp/crypto/pw

keys /etc/ntp/keys

[root@hadoop0 ~]# service ntpd restart

[root@hadoop0 ~]# ntpstat

synchronised to local net at stratum 2

   time correct to within 7948 ms

   polling server every 64 s
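To see which source ntpd is actually tracking, ntpq (installed together with the ntp package) can be queried (a sketch):

ntpq -p    # the local clock 127.127.1.0 configured above should appear as the selected peer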

Edit the client-side configuration (this can be skipped if you use the crontab approach instead):

[root@hadoop1 ~]# vi /etc/ntp.conf

restrict 127.0.0.1

restrict -6 ::1

server 192.168.1.100

Synchronize manually:

[root@hadoop1 ~]# ntpdate 192.168.1.100

 3 Nov 13:46:52 ntpdate[16993]: step time server 192.168.1.100 offset 57626.250506 sec

[root@hadoop1 ~]# date

Tue Nov  3 13:47:23 PST 2015

Set up a cron job on the clients to synchronize every 5 minutes:

[root@hadoop1 ~]# crontab -e

*/5 * * * * /usr/sbin/ntpdate 192.168.1.100 >/root/ntpdate.log 2>&1
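Once the job has fired a few times, the log it writes is a simple way to confirm the periodic sync is working (a sketch):

tail /root/ntpdate.log    # entries like "adjust time server 192.168.1.100 ..." indicate success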

Common error:

① the NTP socket is in use, exiting

[root@hadoop1 ~]# /usr/sbin/ntpdate 192.168.1.100

 3 Nov 15:23:37 ntpdate[20372]: the NTP socket is in use, exiting

[root@hadoop1 ~]# service ntpd stop

Shutting down ntpd:                                        [  OK  ]

[root@hadoop1 init.d]# /usr/sbin/ntpdate 192.168.1.100

 3 Nov 15:27:50 ntpdate[20474]: adjust time server 192.168.1.100 offset -0.000149 sec
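Since ntpd was stopped to free the NTP socket, remember to start it again once the manual sync has finished (a sketch):

service ntpd start    # resume the ntpd daemon after the one-off ntpdate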

2  Install Cloudera Manager

2.1  Configure the yum repositories for CM and CDH

The output below is from installing the createrepo package and listing the Cloudera packages; the commands that set this up are shown further down in this section.

Loaded plugins: fastestmirror, refresh-packagekit, security

Setting up Install Process

Loading mirror speeds from cached hostfile

cloudera-manager                                         |  951 B     00:00

cloudera-manager/primary                                 | 4.1 kB     00:00

cloudera-manager

......

Installed:

  createrepo.noarch 0:0.9.9-22.el6

Dependency Installed:

  deltarpm.x86_64 0:3.5-0.5.20090913git.el6    python-deltarpm.x86_64 0:3.5-0.5.20090913git.el6

 

Complete!

[root@hadoop0 html]#

[root@hadoop0 html]# yum list | grep cloudera

cloudera-manager-agent.x86_64            5.3.8-1.cm538.p0.271.el6       cloudera-manager

cloudera-manager-daemons.x86_64          5.3.8-1.cm538.p0.271.el6       cloudera-manager

cloudera-manager-server.x86_64           5.3.8-1.cm538.p0.271.el6       cloudera-manager

cloudera-manager-server-db-2.x86_64      5.3.8-1.cm538.p0.271.el6       cloudera-manager

enterprise-debuginfo.x86_64              5.3.8-1.cm538.p0.271.el6       cloudera-manager

jdk.x86_64                              2000:1.6.0_31-fcs             cloudera-manager

oracle-j2sdk1.7.x86_64                   1.7.0+update67-1               cloudera-manager

Configure the yum repository:

Install the createrepo package:

[root@hadoop0 cm5.3.0]# ll

total 516

-rwxr-xr-x 1 root root 514295 Oct 27 21:16 cloudera-manager-installer.bin

drwxr-xr-x 2 root root  4096 Oct 27 23:06 repodata

-rw-r--r-- 1 root root  1690 Oct 27 21:16 RPM-GPG-KEY-cloudera

drwxr-xr-x 4 root root  4096 Oct 27 21:16 RPMS

[root@hadoop0 cloudera]# yum -y install createrepo

In each installation-file directory (here cm5.3.0), run createrepo . to generate the repodata metadata:

[root@hadoop0 cm5.3.0]# createrepo .

Spawning worker 0 with 7 pkgs

Workers Finished

Gathering worker results

 

Saving Primary metadata

Saving file lists metadata

Saving other metadata

Generating sqlite DBs

Sqlite DBs complete

Place the CM repo configuration file cloudera-manager.repo into the /etc/yum.repos.d directory:

[root@hadoop0 yum.repos.d]# more cloudera-manager.repo

[cloudera-manager]

name = Cloudera Manager, Version 5.3.0

baseurl = http://192.168.1.100/cloudera/cm5.3.0

gpgkey = http://192.168.1.100/cloudera/cm5.3.0/RPM-GPG-KEY-cloudera

gpgcheck = 1
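A quick check that the repository metadata is reachable over HTTP at the configured baseurl (a sketch, assuming curl is installed):

curl -I http://192.168.1.100/cloudera/cm5.3.0/repodata/repomd.xml    # should return 200 OK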

With the default configuration, the CDH parcels are downloaded to this local directory:

[root@hadoop0 parcel-repo]# pwd

/opt/cloudera/parcel-repo

[root@hadoop0 parcel-repo]# ll

total 1464928

-rw-r--r-- 1 root root 1500028710 Nov  1 09:05 CDH-5.3.8-1.cdh5.3.8.p0.5-el6.parcel

-rw-r--r-- 1 root root         41 Nov  1 09:05 CDH-5.3.8-1.cdh5.3.8.p0.5-el6.parcel.sha

-rw-r--r-- 1 root root      42655 Nov  1 09:05 manifest.json

drwxr-xr-x 2 root root       4096 Nov  1 19:19 repodata

2.2  Run the installer

[root@hadoop0 cloudera]# ll

total 512

-rw-r--r-- 1 root root 514295 Oct 27 19:37 cloudera-manager-installer.bin

drwxr-xr-x 3 root root  4096 Oct 27 19:37 cm5.3.0

drwxr-xr-x 2 root root  4096 Oct 27 19:37 parcel

[root@hadoop0 cloudera]# chmod 755 cloudera-manager-installer.bin

Run the installer and simply follow the prompts, clicking Next all the way through:

[root@hadoop0 cloudera]# ./cloudera-manager-installer.bin --skip_repo_package=1

At this point, the CM installation is complete.
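Before moving on, it is worth confirming that the server process came up (a sketch; the CM console listens on port 7180 by default, and the initial login is admin/admin):

service cloudera-scm-server status    # should report that the server is running
netstat -tlnp | grep 7180             # the CM web console port should be listening

Then open http://192.168.1.100:7180 in a browser to reach the console.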

 

3  Log in to Cloudera Manager and install CDH

The installation is driven from the CM web console: follow the prompts in the console pages to configure the cluster and add the corresponding services.

 

If, at the end of the installation, the wait for agent heartbeats keeps timing out, the cause is almost always one of the following:

A problem in the hosts file: check carefully that the hostnames and IPs really match. (During my installation a wrongly cased hostname on one machine cost me a long time before I found it.)

The firewall or SELinux has not actually been disabled.

If the points above are all taken care of, the installation should go smoothly. Two things matter most: first, be meticulous and follow the steps, and you will rarely hit big problems; second, when something does go wrong, read the logs. With the logs, some experience, and a bit of searching, almost every problem can be resolved in the end.

After the cluster installation completes, other components such as HBase, HDFS, Hive, and Impala can be added as needed via "Add Service" in Cloudera Manager.

4  Uninstall CM

[root@hadoop0 .ssh]# service cloudera-scm-server stop

Stopping cloudera-scm-server:                              [  OK  ]

[root@hadoop0 .ssh]# service cloudera-scm-server-db stop

waiting for server to shut down.... done

server stopped

[root@hadoop0 .ssh]#

[root@hadoop0 .ssh]# yum remove cloudera-manager-server

[root@hadoop0 .ssh]# yum remove cloudera-manager-server-db

[root@localhost cmf]# service cloudera-scm-agent hard_stop_confirmed

Stopping cloudera-scm-agent:                               [  OK  ]

supervisord is already stopped

[root@hadoop0 .ssh]# yum remove 'cloudera-manager-*' hadoop hue-common 'bigtop-*'

[root@hadoop0 .ssh]# rm -Rf /usr/share/cmf /var/lib/cloudera* /var/cache/yum/cloudera*

[root@hadoop0 .ssh]# rm /tmp/.scm_prepare_node.lock

 

 

 

5  Additional parameter changes:

5.1  Adjust vm.swappiness

The installation wizard shows a warning like this:

Cloudera recommends setting /proc/sys/vm/swappiness to 0. The current setting is 60. Use the sysctl command to change the setting at runtime and edit /etc/sysctl.conf so that the setting is preserved after a reboot. You may continue with the installation, but you may run into problems with Cloudera Manager reporting that your hosts are in poor health because of swapping. The following hosts are affected: hadoop[0-2]

Fix:

[root@hadoop0 ~]# sysctl -q vm.swappiness

vm.swappiness = 60

In other words, roughly speaking, once memory usage reaches about 100 - 60 = 40%, the system may start using swap space. Memory is far faster than disk, so heavy swapping increases system I/O and causes large numbers of pages to be paged in and out, which severely hurts performance. At the operating-system level we therefore want to use RAM as much as possible, so we tune this parameter down.

[root@hadoop0 etc]# vi /etc/sysctl.conf

# append this line at the end

vm.swappiness = 10
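To make the new value effective immediately, without waiting for a reboot, apply it at runtime on every host (a sketch):

sysctl -w vm.swappiness=10    # apply on the running system
sysctl -p                     # or reload everything from /etc/sysctl.conf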

[root@hadoop0 etc]# sysctl -q vm.swappiness

vm.swappiness = 10

5.2  Raise the maximum number of open files

Edit /etc/profile and add the line ulimit -n 65535 to raise the maximum number of concurrently open files to 65535.

Run source /etc/profile to apply the change.
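A minimal sketch of the change and its verification (run as root; note that limits for daemons started by init are usually governed by /etc/security/limits.conf rather than /etc/profile):

echo 'ulimit -n 65535' >> /etc/profile    # append the setting
source /etc/profile                       # apply it in the current shell
ulimit -n                                 # should now print 65535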

 
