Hadoop environment setup


The cluster has been running for several months, but I never had time to write it up; here are my notes.

I. Environment preparation

I am using Linux version 2.6.32-504.1.3.el6.x86_64 with CDH 4.7.7.

1. Turn off the firewall

service iptables stop

chkconfig iptables off (disable it permanently)

service iptables status (check the status)

2. Configure the IP address and DNS

vim /etc/sysconfig/network-scripts/ifcfg-eth0

[root@hadoop6 ~]# cd /etc/sysconfig/network-scripts
[root@hadoop6 network-scripts]# ls
ifcfg-eth0  ifdown       ifdown-ippp  ifdown-post    ifdown-sit     ifup-aliases  ifup-ippp  ifup-plip   ifup-ppp     ifup-tunnel       net.hotplug
ifcfg-eth1  ifdown-bnep  ifdown-ipv6  ifdown-ppp     ifdown-tunnel  ifup-bnep     ifup-ipv6  ifup-plusb  ifup-routes  ifup-wireless     network-functions
ifcfg-lo    ifdown-eth   ifdown-isdn  ifdown-routes  ifup           ifup-eth      ifup-isdn  ifup-post   ifup-sit     init.ipv6-global  network-functions-ipv6
[root@hadoop6 network-scripts]# vi /etc/sysconfig/network-scripts/ifcfg-eth0


DEVICE=eth0
TYPE=Ethernet
UUID=b4d8a22e-f413-4f18-8d7c-5e5dced6e93d
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
IPADDR=192.168.17.134
NETMASK=255.255.255.0
GATEWAY=192.168.17.2


DNS1=211.99.25.1
DNS2=202.106.0.20
DNS3=202.106.46.151
DNS4=8.8.8.8
DEFROUTE=yes
PEERDNS=no
PEERROUTES=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"

service network restart (apply the changes)

ifconfig (check that the configuration took effect)

3. Change the hostname

vim /etc/sysconfig/network (permanent change)

reboot for it to take effect

hostname (check the current hostname)
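A minimal sketch of what the relevant files could look like on the first node. The hostnames hadoop1-hadoop6 are the ones used later in this guide; every IP except hadoop6's 192.168.17.134 (from the ifcfg-eth0 example above) is a placeholder you should replace with your own addresses:

# /etc/sysconfig/network (on hadoop1)
NETWORKING=yes
HOSTNAME=hadoop1

# /etc/hosts (identical on every node; addresses other than .134 are examples)
192.168.17.129 hadoop1
192.168.17.130 hadoop2
192.168.17.131 hadoop3
192.168.17.132 hadoop4
192.168.17.133 hadoop5
192.168.17.134 hadoop6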

4. Disable SELinux

vim /etc/selinux/config (permanent change)


# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
# SELINUX=enforcing
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted


You can also set it temporarily with the setenforce 0 command, but that change does not survive a reboot.
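A quick way to confirm the SELinux state (getenforce only reports Disabled after the reboot; setenforce 0 shows up as Permissive):

getenforce    # Enforcing / Permissive / Disabled
sestatus      # detailed SELinux status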

5. Disable IPv6

Add a file under /etc/modprobe.d containing the line install ipv6 /bin/true, save and exit, then reboot the machine.
vim /etc/modprobe.d/ipv6off.conf
install ipv6 /bin/true

6. Set vm.overcommit_memory and turn the swap frequency (vm.swappiness) down to zero

The kernel parameters are explained below:

overcommit_memory controls the kernel's policy for memory allocation; its value can be 0, 1, or 2.

0: the kernel checks whether enough free memory is available; if so, the allocation succeeds, otherwise it fails and an error is returned to the application.

1: the kernel allows all allocations regardless of the current memory state.

2: the kernel uses strict accounting and refuses allocations that would exceed swap plus a configurable share of physical memory (vm.overcommit_ratio).

vim /etc/sysctl.conf

# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
# sysctl.conf(5) for more details.


# Controls IP packet forwarding
net.ipv4.ip_forward = 0


# Controls source route verification
net.ipv4.conf.default.rp_filter = 1


# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0


# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0


# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1


# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1


# Disable netfilter on bridges.
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0


# Controls the default maximum size of a message queue
kernel.msgmnb = 65536


# Controls the maximum size of a message, in bytes
kernel.msgmax = 65536


# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736


# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296


# control the value of vm.overcommit_memory
vm.overcommit_memory = 1


# control the value of vm.swappiness
vm.swappiness = 0
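After saving sysctl.conf you can apply the settings without rebooting and verify the two values that matter here:

sysctl -p                               # reload /etc/sysctl.conf
cat /proc/sys/vm/overcommit_memory      # should print 1
cat /proc/sys/vm/swappiness             # should print 0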


7. Adjust the ulimit settings (nofile/nproc) on all machines

In the /etc/security/limits.conf file:

hdfs - nofile 32768

hbase - nofile 32768

If this configuration does not take effect, you can also add it in a file under /etc/security/limits.d/.

In my view the right approach is to edit /etc/security/limits.conf directly; it has detailed comments, for example:
* soft nofile 32768
* hard nofile 65536

service sshd restart
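The limits only apply to new login sessions, so log in again and check them (a sanity check, not part of the original steps):

ulimit -n    # open-file limit (nofile); should show the new value
ulimit -u    # max user processes (nproc)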

8. Install the JDK and set the environment variables

I used jdk-8u20-linux-x64.rpm.

rpm -ivh jdk-8u20-linux-x64.rpm

Edit the file: vim /etc/profile

fi


for i in /etc/profile.d/*.sh ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null 2>&1
        fi
    fi
done


unset i
unset -f pathmunge
# set java environment (point JAVA_HOME at the directory your JDK RPM actually installed to)
export JAVA_HOME=/usr/java/jdk1.8.0_20
export PATH=$JAVA_HOME/bin:$PATH
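After editing /etc/profile, reload it and make sure the JDK is picked up:

source /etc/profile
java -version        # should report the version of the JDK you installed
echo $JAVA_HOME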

9. Passwordless SSH login

ssh-keygen -t rsa (press Enter through every prompt; this creates two files under /root/.ssh)

Change into that directory: cd /root/.ssh

Copy the contents of the .pub file into the authorized_keys file: cp id_rsa.pub authorized_keys

Delete the .pub file: rm id_rsa.pub

Collect the contents of every node's authorized_keys into one file, then copy that file back to each machine (the first login to each host requires a yes confirmation).
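One way to automate that distribution, sketched here under the assumption that you have root access to all six hosts, that ssh-copy-id is available, and that id_rsa.pub has not been deleted yet:

# run this on each node; it appends the node's public key to every other node
for host in hadoop1 hadoop2 hadoop3 hadoop4 hadoop5 hadoop6; do
  ssh-copy-id -i /root/.ssh/id_rsa.pub root@$host   # asks for the password once per host
done
# then verify a passwordless login from any node
ssh hadoop2 hostname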

10. Synchronize the time on all machines

(Omitted here.)

II. Installing CDH

$ sudo yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm

$ sudo rpm --import http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera

This uses the online repository; if you are using your own yum repository, change the paths to point at it.

III. ZooKeeper cluster installation and configuration

1. Install ZooKeeper with yum

Because of package dependencies you do not need to install the zookeeper base package separately; it is pulled in automatically. You only need:

yum install zookeeper-server

If you install it another way, you also need: yum install zookeeper



2. Give each node a unique id: /var/lib/zookeeper/myid must contain a number from 1 to 255 (the init command below writes it for you). Initialize and start ZooKeeper on every cluster node:

service zookeeper-server init --myid=1 (use a different id on each node)

service zookeeper-server start

jps

Configure every node in the cluster:

vim /etc/zookeeper/conf/zoo.cfg

server.1=hadoop1:2888:3888

server.2=hadoop2:2888:3888

server.3=hadoop3:2888:3888

server.4=hadoop4:2888:3888

server.5=hadoop5:2888:3888

server.6=hadoop6:2888:3888

3. Verify the cluster

You can check the listening ports with netstat -an | grep 2181 (or 2888 / 3888).

/usr/lib/zookeeper/bin/zkServer.sh status

Don't panic if something fails: zkServer.sh also reports an error while the other nodes are still down. Check the logs for details.
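Another quick liveness check, assuming nc (netcat) is installed; ruok and stat are ZooKeeper's built-in four-letter commands:

echo ruok | nc hadoop1 2181   # a healthy server replies "imok"
echo stat | nc hadoop1 2181   # prints the server mode (leader/follower) and connected clients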

Finally, format the HDFS HA state in ZooKeeper (run once, on one of the NameNodes, after the HDFS configuration in section VI is in place):

hdfs zkfc -formatZK

IV. Install the required packages on each node

Node roles:

Hadoop1 - namenode (management node)
Hadoop2 - datanode (data node)
Hadoop3 - datanode (data node)
Hadoop4 - datanode (data node)
Hadoop5 - datanode (data node)
Hadoop6 - namenode (management node)

Where to install / what to install:

On the JobTracker host:
sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-jobtracker

On the NameNode host:
sudo yum clean all; sudo yum install hadoop-hdfs-namenode

On the Secondary NameNode host (if used):
sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode

On every host except the JobTracker, NameNode, and Secondary (or Standby) NameNode:
sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode

On all cluster hosts:
sudo yum clean all; sudo yum install hadoop-client



V. Deploying MRv1 to the cluster

1. Copy the Hadoop configuration

Copy the default configuration directory conf.dist to a custom directory of your own; run this on every machine in the cluster:

sudo cp -r /etc/hadoop/conf.dist /etc/hadoop/conf.hadoop

$ sudo alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.hadoop 50

$ sudo alternatives --set hadoop-conf /etc/hadoop/conf.hadoop
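To confirm which configuration directory is now active (a quick check):

alternatives --display hadoop-conf   # /etc/hadoop/conf.hadoop should be listed as the current choice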

2. Create the directories

On the NameNode nodes, create the directories:
sudo mkdir -p /data/1/dfs/nn 

sudo chown -R hdfs:hdfs /data/1/dfs/nn 

sudo chmod 700 /data/1/dfs/nn 

chown hdfs:hadoop /tmp

chmod -R 1777 /tmp

mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging

chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging

chown -R mapred /var/lib/hadoop-hdfs/cache/mapred

mkdir -p /tmp/mapred/system

chown mapred:hadoop /tmp/mapred/system 

mkdir -p /user/root

chown root /user/root

On the DataNode nodes, create the directories:

mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn

chown -R hdfs:hdfs /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn

sudo mkdir -p /data/1/dfs/jn 

sudo chown -R hdfs:hdfs /data/1/dfs/jn

mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging

chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging

chown -R mapred /var/lib/hadoop-hdfs/cache/mapred

chown hdfs:hadoop /tmp

chmod -R 1777 /tmp

mkdir -p /tmp/mapred/system

chown mapred:hadoop /tmp/mapred/system

mkdir -p /user/root

chown root /user/root

mkdir -p /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local

chown -R mapred:hadoop /data/1/mapred/local /data/2/mapred/local /data/3/mapred/local /data/4/mapred/local
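Since these commands have to be repeated on every DataNode, a small loop over SSH can save typing. This is only a sketch, assuming the passwordless root SSH set up earlier and the hadoop2-hadoop5 DataNodes from the role table above:

for host in hadoop2 hadoop3 hadoop4 hadoop5; do
  ssh root@$host 'mkdir -p /data/{1,2,3,4}/dfs/dn /data/{1,2,3,4}/mapred/local &&
                  chown -R hdfs:hdfs /data/{1,2,3,4}/dfs/dn &&
                  chown -R mapred:hadoop /data/{1,2,3,4}/mapred/local'
done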

VI. Edit the configuration files

1. core-site.xml

<configuration>
  <!--namespace-->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://wise</value>
  </property>
  <!--directory of tmp-->
  <property>
    <name>hadoop.tmp.dir</name>
        <value>/data/1/tmp</value>
  </property>
  <!--set ha zk-->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181,hadoop4:2181,hadoop5:2181,hadoop6:2181</value>
  </property>
  <!--set ha trash time-->
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>


  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>10080</value>
  </property>
 <!--
  <property>
    <name>io.file.buffer.size</name>
        <value>32768</value>
        <description>default is 4kb we change it to 128kb the buffer size is used for read and write</description>
  </property>
  -->
  <property>
     <name>hadoop.proxyuser.hadoop.hosts</name>
     <value>*</value>
  </property>
  <property>
     <name>hadoop.proxyuser.hadoop.groups</name>
     <value>*</value>
  </property>


</configuration>

2. hdfs-site.xml



<configuration>


  <!--set block replication-->
  <property>
    <name>dfs.replication</name>
        <value>1</value>
  </property>
  <!--set webhdfs enabled-->
  <property>
    <name>dfs.webhdfs.enabled</name>
        <value>true</value>
  </property>
  <!--set dfs permissions-->
  <property>
    <name>dfs.permissions</name>
        <value>false</value>
  </property>
  <!--
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
  </property>
  -->
  <!--set HA-->
  <property>
    <name>dfs.nameservices</name>
    <value>wise</value>
  </property>


  <property>
    <name>dfs.ha.namenodes.wise</name>
    <value>hadoop1,hadoop6</value>
  </property>


  <property>
    <name>dfs.namenode.rpc-address.wise.hadoop1</name>
    <value>hadoop1:8020</value>
  </property>


  <property>
    <name>dfs.namenode.rpc-address.wise.hadoop6</name>
    <value>hadoop6:8020</value>
  </property>


  <property>
    <name>dfs.namenode.http-address.wise.hadoop1</name>
    <value>hadoop1:50070</value>
  </property>


  <property>
    <name>dfs.namenode.http-address.wise.hadoop6</name>
    <value>hadoop6:50070</value>
  </property>


  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop2:8485;hadoop3:8485;hadoop4:8485/wise</value>
  </property>


  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/1/dfs/jn</value>
  </property>


  <property>
    <name>dfs.client.failover.proxy.provider.wise</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>


  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
  </property>
  <!--
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence(/bin/true)</value>
  </property>


  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>20000</value>
  </property>


  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>


  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>8192</value>
  </property>
 <!--
  <property>
    <name>dfs.blocksize</name>
        <value>268435468</value>
  </property>
  -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/1/dfs/nn</value>
  </property>


  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/1/dfs/dn</value>
  </property>


</configuration>

3. mapred-site.xml



<configuration>


  <property>
    <name>mapred.local.dir</name>
    <value>/data/1/mapred/local</value>
  </property>


  <property>
    <name>mapreduce.jobtracker.restart.recover</name>
    <value>true</value>
  </property>


  <property>
    <name>mapred.job.tracker</name>
    <value>logicaljt</value>
    <!-- host:port string is replaced with a logical name -->
  </property>


  <property>
    <name>mapred.jobtrackers.logicaljt</name>
    <value>jt1,jt2</value>
    <description>Comma-separated list of JobTracker IDs.</description>
  </property>


  <property>
    <name>mapred.jobtracker.rpc-address.logicaljt.jt1</name>
    <!-- RPC address for jt1 -->
    <value>hadoop1:8021</value>
  </property>


  <property>
    <name>mapred.jobtracker.rpc-address.logicaljt.jt2</name>
    <!-- RPC address for jt2 -->
    <value>hadoop6:8021</value>
  </property>


  <property>
    <name>mapred.job.tracker.http.address.logicaljt.jt1</name>
    <!-- HTTP bind address for jt1 -->
    <value>hadoop1:50030</value>
  </property>


  <property>
    <name>mapred.job.tracker.http.address.logicaljt.jt2</name>
    <!-- HTTP bind address for jt2 -->
    <value>hadoop6:50030</value>
  </property>


  <property>
    <name>mapred.ha.jobtracker.rpc-address.logicaljt.jt1</name>
    <!-- RPC address for jt1 HA daemon -->
    <value>hadoop1:8023</value>
  </property>


  <property>
    <name>mapred.ha.jobtracker.rpc-address.logicaljt.jt2</name>
    <!-- RPC address for jt2 HA daemon -->
    <value>hadoop6:8023</value>
  </property>


  <property>
    <name>mapred.ha.jobtracker.http-redirect-address.logicaljt.jt1</name>
    <!-- HTTP redirect address for jt1 -->
    <value>hadoop1:50031</value>
  </property>


  <property>
    <name>mapred.ha.jobtracker.http-redirect-address.logicaljt.jt2</name>
    <!-- HTTP redirect address for jt2 -->
    <value>hadoop6:50031</value>
  </property>


  <property>
    <name>mapred.jobtracker.restart.recover</name>
    <value>true</value>
  </property>


  <property>
    <name>mapred.job.tracker.persist.jobstatus.active</name>
    <value>true</value>
  </property>


  <property>
    <name>mapred.job.tracker.persist.jobstatus.hours</name>
    <value>1</value>
  </property>


  <property>
    <name>mapred.job.tracker.persist.jobstatus.dir</name>
    <value>/data/1/jobtracker/jobsInfo</value>
  </property>


  <property>
    <name>mapred.client.failover.proxy.provider.logicaljt</name>
    <value>org.apache.hadoop.mapred.ConfiguredFailoverProxyProvider</value>
  </property>


  <property>
    <name>mapred.client.failover.max.attempts</name>
    <value>15</value>
  </property>


  <property>
    <name>mapred.client.failover.sleep.base.millis</name>
    <value>500</value>
  </property>


  <property>
    <name>mapred.client.failover.sleep.max.millis</name>
    <value>1500</value>
  </property>


  <property>
    <name>mapred.client.failover.connection.retries</name>
    <value>0</value>
  </property>


  <property>
    <name>mapred.client.failover.connection.retries.on.timeouts</name>
    <value>0</value>
  </property>


  <property>
    <name>mapred.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
  </property>


  <property>
    <name>mapred.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>


  <property>
    <name>mapred.ha.zkfc.port</name>
    <value>8018</value>
    <!-- Pick a different port for each failover controller when running one machine-->
  </property>


</configuration>

4. vi slaves

hadoop2
hadoop3
hadoop4
hadoop5

Copy the configuration to every machine.
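A sketch of pushing the whole configuration directory to the other nodes and pointing the alternatives system at it there as well (assumes passwordless root SSH and the conf.hadoop directory created earlier):

for host in hadoop2 hadoop3 hadoop4 hadoop5 hadoop6; do
  scp -r /etc/hadoop/conf.hadoop root@$host:/etc/hadoop/
  ssh root@$host 'alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.hadoop 50 &&
                  alternatives --set hadoop-conf /etc/hadoop/conf.hadoop'
done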

VII. Installation steps

1. First install the namenode, datanode, and journalnode packages and put the configuration files in place.

sudo yum install hadoop-hdfs-namenode 

sudo yum install hadoop-hdfs-datanode

sudo yum install hadoop-hdfs-journalnode (I installed this only on hadoop2, hadoop3, and hadoop4)

sudo service hadoop-hdfs-journalnode start

sudo service hadoop-hdfs-datanode start

Format the NameNode (I did this on hadoop1):

sudo -u hdfs hdfs namenode -format

Start the NameNode on hadoop1:

sudo service hadoop-hdfs-namenode start 

Bootstrap the standby NameNode and start it (on hadoop6):

sudo -u hdfs hdfs namenode -bootstrapStandby

sudo service hadoop-hdfs-namenode start
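At this point you can check the state of the two NameNodes; hadoop1 and hadoop6 are the NameNode IDs defined in dfs.ha.namenodes.wise, and with automatic failover enabled both may still report standby until the ZKFC in step 2 below is running:

sudo -u hdfs hdfs haadmin -getServiceState hadoop1   # active or standby
sudo -u hdfs hdfs haadmin -getServiceState hadoop6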


2. Configure automatic failover
On the NameNode hosts (hadoop1 & hadoop6):
 sudo yum install hadoop-hdfs-zkfc
 sudo service hadoop-hdfs-zkfc start
You can then check the state through the web UI,
and verify failover by killing the active NameNode process with kill -9 <pid>.
3. Install JobTracker HA (on hadoop1 & hadoop6)
sudo yum install hadoop-0.20-mapreduce-jobtrackerha
sudo yum install hadoop-0.20-mapreduce-zkfc
Format the ZooKeeper state:
sudo service hadoop-0.20-mapreduce-zkfc init
or
sudo -u mapred hadoop mrzkfc -formatZK
Start the services:
sudo service hadoop-0.20-mapreduce-zkfc start
sudo service hadoop-0.20-mapreduce-jobtrackerha start

Check the status:
sudo -u mapred hadoop mrhaadmin -getServiceState <id>

sudo -u mapred hadoop mrhaadmin -transitionToActive <id>
sudo -u mapred hadoop mrhaadmin -getServiceState <id>
Verify automatic failover.
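A sketch of a failover test using the jt1/jt2 JobTracker IDs defined in mapred-site.xml; the exact daemon name shown by jps can vary, so treat that part as an assumption:

sudo -u mapred hadoop mrhaadmin -getServiceState jt1   # find which JobTracker is active
sudo -u mapred hadoop mrhaadmin -getServiceState jt2
jps                                                    # on the active node, note the JobTracker HA daemon's pid
kill -9 <pid>                                          # kill the active JobTracker
sudo -u mapred hadoop mrhaadmin -getServiceState jt2   # the standby should become active shortly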













