Hadoop cluster installation and deployment


Goal:
Build the cluster and run one MapReduce example
1. Clone 3 machines to form a cluster
2. Set hostname and IP
3. SSH          --the hard part
4. Configure environment variables
5. Edit the Hadoop configuration files
6. Format the distributed file system


Preparation for the cluster installation:

Clone the Ubuntu VM that already has the single-node (standalone) Hadoop setup.

Clone: VM -> Manage -> Clone


Cloning done!


1. Change the hostname
$sudo gedit /etc/hostname   (or sudo vim /etc/hostname)
Machine 1: change it to master
Machine 2: change it to slave
Machine 3: change it to slave1
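The new name only becomes permanent after a reboot (the reboot later in step 2 covers that); if needed it can also be applied to the current session right away, for example on the first machine:
$sudo hostname master
$hostname        --prints the hostname currently in effect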
2. Static IP
sudo gedit /etc/network/interfaces

1  auto lo
2  iface lo inet loopback
3  auto eth0
4  iface eth0 inet static
5  address 192.168.73.130
6  netmask 255.255.255.0
7  network 192.168.73.0
8  broadcast 192.168.73.255
9  gateway 192.168.73.2
10 dns-nameservers 8.8.8.8 8.8.4.4

Lines 1 and 3 state that the lo and eth0 interfaces are configured automatically at system startup;
Line 2 sets the lo interface up as a local loopback address;
Line 4 states that the eth0 interface has a static IP configuration;
Lines 5-10 set eth0's IP address, netmask, network address, broadcast address, gateway and DNS name servers.

address 192.168.73.130        the host part (130) is your own choice; adding these lines is what makes the address static. The network part (192.168.73) is the subnet shown in VMware under Edit -> Virtual Network Editor -> NAT

netmask 255.255.255.0         subnet mask, leave it alone
network 192.168.73.0          network (subnet) address
broadcast 192.168.73.255      broadcast address, the trailing 255 stays as it is
gateway 192.168.73.2          gateway; the last octet is 2 here (VMware NAT), unlike Windows where it is usually 1
dns-nameservers 8.8.8.8 8.8.4.4    public DNS resolvers

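For example, a second node (slave) would use the same settings with a different host address -- a sketch only; the .131 below is just an arbitrary free address in the same subnet:

auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 192.168.73.131
netmask 255.255.255.0
network 192.168.73.0
broadcast 192.168.73.255
gateway 192.168.73.2
dns-nameservers 8.8.8.8 8.8.4.4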

After making these changes, reboot Ubuntu.

Connectivity check: all three machines should be able to ping www.baidu.com (make sure they can reach the Internet first).
      master can ping slave's IP address
      slave can ping master's IP address

      master and slave can both ping the gateway

     (the same goes for slave1)
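For example, from master (the slave address below is the one assumed above -- substitute whatever you actually assigned):
$ping -c 3 www.baidu.com
$ping -c 3 192.168.73.131      --slave's address
$ping -c 3 192.168.73.2        --the gateway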


3. Edit the hosts file
sudo vim /etc/hosts
Write the name + IP of every machine in the cluster into the file (use the static addresses you actually assigned in step 2; the ones below are just an example):
hosts:
192.168.121.168 master
192.168.121.169 slave
192.168.121.170 slave1

All three names should now be pingable:
ping master
ping slave
ping slave1

4. Passwordless SSH login
Prerequisite: install the SSH server and client

sudo apt-get install ssh


Check that SSH installed correctly and the daemon is running:

ps -e | grep sshd

If sshd does not show up, start the service:

/etc/init.d/ssh start

------------------------------------------------------------------------------------------------------------------------

                                                      SSH installation complete

------------------------------------------------------------------------------------------------------------------------

How it works: asymmetric-key cryptography and digital signatures -- the client proves its identity with its private key, and the server verifies the signature against the matching public key, so no password has to be sent.
key pair -- private, public
authorized_keys -- the list of public keys the server will accept

1. Generate a public/private key pair on every machine
A -- pub, pri
B -- pub, pri
C -- pub, pri
2. On each machine, collect the public keys into an authorized_keys file (the authorization list)

ssh-keygen -t rsa -P ''       --the -P option supplies the passphrase; '' means no passphrase
cd .ssh                       --i.e. /home/hadoop/.ssh
ls -al

Connect to this host itself:
ls -al
cat id_rsa.pub
cat id_rsa.pub > authorized_keys
cd
ssh master
Once the machine can ssh into itself without a password prompt, passwordless login works locally and you can move on to the next step (repeat the same steps on slave and slave1).


3. Send each machine's public key to one machine, master:

(on slave, send the contents of .ssh/id_rsa.pub to the master host, saving it as /home/hadoop/.ssh/id_rsa_slave.pub)

$scp .ssh/id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa_slave.pub
(this also confirms slave can log in to master)
ls -al    (on master; the file id_rsa_slave.pub is now there)
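slave1 goes through the same step; to keep slave's key from being overwritten on master, it is saved under a different name (id_rsa_slave1.pub is just a name chosen here):

$scp .ssh/id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa_slave1.pub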


4. master now has every machine's public key
Use all of them to build one combined authorized_keys
(append id_rsa_slave.pub to authorized_keys)
$cat id_rsa_slave.pub>>authorized_keys
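If slave1's key was copied over as id_rsa_slave1.pub (as sketched above), append it too:
$cat id_rsa_slave1.pub>>authorized_keys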


5. master sends the combined authorized_keys back to every machine in the cluster
(on master, send authorized_keys to all the slave machines)

scp authorized_keys hadoop@slave:/home/hadoop/.ssh/authorized_keys
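The same copy goes to slave1 as well, after which a quick check from master should log in without any password prompt (if a prompt still appears, the usual cause is permissions: ~/.ssh should be 700 and authorized_keys 600):

scp authorized_keys hadoop@slave1:/home/hadoop/.ssh/authorized_keys
ssh slave
exit
ssh slave1
exit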

-------------------------------------------------------------------------------------------------------------

                             SSH configuration complete -- each machine can log in to the others without a password

-------------------------------------------------------------------------------------------------------------

II. Configure the Hadoop cluster
1. Unpack hadoop.tar.gz
2. Edit the Hadoop configuration files -- the key step
For HDFS these are:
core-site.xml
hdfs-site.xml
hadoop-env.sh
slaves
************************************************************************
(1) core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->


<!-- Put site-specific property overrides in this file. -->


<configuration>
<property>
   <name>fs.defaultFS</name>
   <value>hdfs://master:9000</value>
 </property>
<property>
     <name>hadoop.tmp.dir</name>
     <value>/home/hadoop/soft/tmp</value>
 </property> 


</configuration>
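fs.defaultFS tells every HDFS client and daemon where the NameNode listens (master, port 9000), and hadoop.tmp.dir is the base directory for Hadoop's temporary files. Once the hadoop command is on the PATH (step 3 below), the effective value can be double-checked with:

$hdfs getconf -confKey fs.defaultFS
hdfs://master:9000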

(2) hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->


<!-- Put site-specific property overrides in this file. -->


<configuration>
     <property>
       <name>dfs.datanode.ipc.address</name>
       <value>0.0.0.0:50020</value>
     </property>
     <property>
       <name>dfs.datanode.http.address</name>
      <value>0.0.0.0:50075</value>
    </property>
     <property>
       <name>dfs.namenode.name.dir</name>
       <value>file:/home/hadoop/soft/data/namenode</value>
     </property>
     <property>
       <name>dfs.datanode.data.dir</name>
       <value>file:/home/hadoop/soft/data/datanode</value>
     </property>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
  <property>
   <name>dfs.permissions</name>
     <value>false</value>
  </property>
</configuration>

(3) hadoop-env.sh
export JAVA_HOME=/home/hadoop/soft/jdk1.7.0_80
(4) slaves -- one hostname per line, listing the machines that will run DataNodes:
master
slave
(slave1 would also be listed here if it is meant to run a DataNode)



Once the configuration is done, scp the configuration directory to every machine in the cluster:
$scp -r hadoop-2.6.0/etc/hadoop/ liu@slave1:/home/liu/hadoop-2.6.0/etc/
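The same copy goes to slave as well (the liu user and install path simply follow the command above -- use whatever user and path your machines actually have):
$scp -r hadoop-2.6.0/etc/hadoop/ liu@slave:/home/liu/hadoop-2.6.0/etc/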


3. To use the hadoop command, configure the environment variables
sudo vim /etc/profile

export  JAVA_HOME=/home/liu/jdk1.7.0_80   
export  HADOOP_HOME=/home/liu/hadoop-2.6.0

export   CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:$CLASSPATH


export   PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
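After saving, reload the profile in the current shell (or log out and back in) so the new variables take effect, then confirm the hadoop command is found:

$source /etc/profile
$hadoop version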

Format HDFS (run this once, on master):
$hdfs namenode -format

Start DFS:
start-dfs.sh

Verify:
jps
4895 DataNode
4775 NameNode
If DataNode and NameNode show up, the cluster is running.
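As an optional smoke test, the standard HDFS shell commands can be run against the new file system (the NameNode web UI is also reachable at http://master:50070 in Hadoop 2.6):

$hdfs dfs -mkdir -p /test
$hdfs dfs -ls /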


Note: before formatting and starting HDFS, the two directories referenced in the configuration -- data and tmp -- have to be created by hand, on every virtual machine.
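Matching the paths used in core-site.xml and hdfs-site.xml above, that means something like the following on each machine:

$mkdir -p /home/hadoop/soft/tmp
$mkdir -p /home/hadoop/soft/data/namenode
$mkdir -p /home/hadoop/soft/data/datanode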