<Hadoop> Hadoop 2.7.2 Cluster Installation

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://www.edureka.co/blog/setting-up-a-multi-node-cluster-in-hadoop-2.X
http://www.powerxing.com/install-hadoop/
http://jingyan.baidu.com/article/e4d08ffdb3aa090fd2f60df3.html?qq-pf-to=pcqq.c2c
Cluster (reliable): http://stackoverflow.com/questions/34617648/yarnapplicationstate-accepted-waiting-for-am-container-to-be-allocated-launch
Cluster (reliable): http://stackoverflow.com/questions/24481439/cant-run-a-mapreduce-job-on-hadoop-2-4-0/24544207#24544207
Memory (reliable): http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide/#cluster-installation
yarn-site details: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

I. Prerequisites
# Ubuntu 15.10 VM as master, bridged networking, MAC address regenerated, already cloned
0. Change the hostname     # sudo gedit /etc/hostname     # the hostname must be changed, because cloned VMs all share the same name
sudo gedit /etc/hosts   # update the name here as well
  

1.JAVA
# (optional) switch the apt sources, e.g. to an IPv6-reachable mirror
sudo gedit /etc/apt/sources.list
sudo apt-get update
sudo apt-get install openjdk-7-jdk
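A quick check that the JDK is in place and where it lives (the path below is the usual openjdk-7 location on amd64 and matches the JAVA_HOME used later; adjust if yours differs):
java -version
readlink -f /usr/bin/java    # usually resolves under /usr/lib/jvm/java-7-openjdk-amd64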
2. Adding a dedicated Hadoop system user      # a dedicated account is required; without it the SSH setup below will not work
sudo addgroup hadoops
sudo adduser --ingroup hadoops hadoop
3.Configuring SSH
sudo apt-get install openssh-server
su - hadoop
ssh-keygen -t rsa -P ""
press Enter to accept the default path: /home/hadoop/.ssh/id_rsa
cd .ssh
cat id_rsa.pub >> authorized_keys
Log out of the current account and switch to the hadoop account
ssh localhost
the first connection will prompt you to confirm the host key
exit
subsequent logins require no password
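If passwordless login still asks for a password, the most common cause is directory permissions; a minimal fix (standard OpenSSH requirement, not specific to Hadoop):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys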
4. Disabling IPv6      # not strictly necessary
Either disable IPv6 completely, or add export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true" to hadoop-env.sh so that Java prefers IPv4.

Open /etc/sysctl.conf and add the following:
#disable ipv6 
net.ipv6.conf.all.disable_ipv6 = 1 
net.ipv6.conf.default.disable_ipv6 = 1 
net.ipv6.conf.lo.disable_ipv6 = 1
Check whether it is disabled: 1 means disabled, 0 means still enabled
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
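The new sysctl values only take effect after a reload (or a reboot):
sudo sysctl -p    # re-reads /etc/sysctl.conf; then re-check the value above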

II. Hadoop standalone
1. Download
http://www-eu.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
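If you prefer the command line, the same archive can be fetched with wget (same URL as above):
cd ~/Downloads
wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz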
2. Grant sudo rights
su
gedit /etc/sudoers  # add the line "hadoop    ALL=(ALL:ALL) ALL" below "root    ALL=(ALL:ALL) ALL"
exit
3. Standalone install
cd /usr/local
sudo cp ~/Downloads/hadoop-2.7.2.tar.gz .
sudo tar xzf hadoop-2.7.2.tar.gz
sudo mv hadoop-2.7.2 hadoop
sudo chown -R hadoop:hadoops hadoop  # give the hadoop user ownership (instead of a blanket chmod 777)
4. Update $HOME/.bashrc    # user-level environment variables
cd ~
sudo gedit .bashrc
Add:
#######
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
#######
source .bashrc
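A quick sanity check that the new environment is picked up (open a fresh terminal or run right after source .bashrc):
hadoop version      # should report Hadoop 2.7.2
echo $HADOOP_HOME   # should print /usr/local/hadoop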


III. Multi-node cluster configuration
0. Disable the firewall   # not strictly required on Ubuntu
sudo ufw disable
1. Configure six files
cd /usr/local/hadoop/etc/hadoop/

gedit core-site.xml
##### add — with no port in the URI, the NameNode RPC port defaults to 8020 on master (older releases used 9000); fs.default.name is the deprecated alias of fs.defaultFS and still works
<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://master</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
    <value>file:///usr/local/hadoop/tmp</value>
</property>
</configuration>
#####
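To see the value Hadoop actually resolves (handy when a typo silently falls back to the default; fs.default.name maps onto fs.defaultFS):
hdfs getconf -confKey fs.defaultFS    # should print hdfs://master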

gedit hdfs-site.xml
#### If hadoop.tmp.dir is left unset, the system /tmp is used and the cluster breaks after a reboot (see problem 1 below); the replication factor should not exceed the number of datanodes

<configuration>
        <property>
             <name>dfs.replication</name>
             <value>2</value>
        </property>
        <property>
             <name>dfs.namenode.name.dir</name>
             <value>file:///usr/local/hadoop/tmp/dfs/name</value>
        </property>
        <property>
             <name>dfs.datanode.data.dir</name>
             <value>file:///usr/local/hadoop/tmp/dfs/data</value>
        </property>
    <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>
</configuration>

####
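Optionally pre-create the directories named above as the hadoop user so ownership is right from the start (not strictly required; the format and start steps create them, assuming the chown from the standalone section):
mkdir -p /usr/local/hadoop/tmp/dfs/name /usr/local/hadoop/tmp/dfs/data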

gedit yarn-site.xml
### the memory-related settings below are optional
<configuration>
<property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
        <description>The hostname of the RM.</description>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
<property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>128</value>
        <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
        <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
        <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>2</value>
        <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
        <description>Physical memory, in MB, to be made available to running containers</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
        <description>Number of CPU cores that can be allocated for containers.</description>
    </property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>1.5</value>
</property>
</configuration>
###
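A rough check of how the numbers above fit together (derived only from the values in this file): each NodeManager offers 4096 MB and 4 vcores, requests are granted between 128 and 2048 MB, so a 1024 MB container (the map/reduce size configured below) fits four per node. After starting the cluster, the resources each node actually registers can be listed with:
yarn node -list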

cp mapred-site.xml.template  mapred-site.xml
gedit mapred-site.xml
#### the memory-related settings below are optional
<configuration>
<property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx768m</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>Execution framework.</description>
    </property>
    <property>
        <name>mapreduce.map.cpu.vcores</name>
        <value>1</value>
        <description>The number of virtual cores required for each map task.</description>
    </property>
    <property>
        <name>mapreduce.reduce.cpu.vcores</name>
        <value>1</value>
        <description>The number of virtual cores required for each reduce task.</description>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1024</value>
        <description>Larger resource limit for maps.</description>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx768m</value>
        <description>Heap-size for child jvms of maps.</description>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>1024</value>
        <description>Larger resource limit for reduces.</description>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx768m</value>
        <description>Heap-size for child jvms of reduces.</description>
    </property>
    <property>
        <name>mapreduce.jobtracker.address</name>
        <value>master:8021</value>
        <description>Ignored when mapreduce.framework.name is yarn; listed only for completeness.</description>
    </property>
</configuration>
####
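A short worked check of the heap settings above (using only the values configured in these two files):
####
# -Xmx768m inside a 1024 MB container ≈ 0.75x  -> headroom left for JVM native overhead
# AM 1024 MB, map 1024 MB, reduce 1024 MB  -> all inside the 128–2048 MB scheduler window from yarn-site.xml
####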

gedit masters
###
master
###

gedit slaves
######
slave1
slave2
######


2. Install rsync  # not required; skip it
sudo apt-get install rsync

3. Shut down, clone the VM as slave1, and regenerate its MAC address

4. Mutual SSH access
On the slave:
cd .ssh/
rm ./*
ssh-keygen -t rsa -P ""
press Enter to accept the default path: /home/hadoop/.ssh/id_rsa
cat id_rsa.pub >> authorized_keys
scp id_rsa.pub  hadoop@10.103.90.163:/home/hadoop/.ssh/s1.pub
On the master:
cat s1.pub >> authorized_keys
scp id_rsa.pub  hadoop@10.103.90.153:/home/hadoop/.ssh/m.pub
On the slave:
cat m.pub >> authorized_keys
# Success criterion: from a fresh terminal, both machines can ssh into each other without a password
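A quick way to confirm the mutual access (uses the hostnames defined in step 5 below; substitute the IPs if the hosts entries are not in place yet):
ssh hadoop@master hostname    # run from the slave; should print: master
ssh hadoop@slave1 hostname    # run from the master; should print: slave1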

5.hosts
On every node:
sudo gedit /etc/hosts
Add:
#######
10.103.90.163   master
10.103.90.153   slave1
#######

6. Reboot all nodes (reboot)

7. Format the NameNode  # on the master node
hdfs namenode -format

8. Regenerate the SSH keys
On each node:
cd ~/.ssh
rm ./*
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat id_dsa.pub >> authorized_keys    # append to the local authorized_keys
On the slave:
scp id_dsa.pub  hadoop@10.103.90.163:/home/hadoop/.ssh/s1.pub
On the master:
cat s1.pub >> authorized_keys
Then repeat in the other direction, as in step 4.

9. Start the cluster  # on the master node; do as much as possible from the master
start-all.sh
# equivalent to start-dfs.sh plus start-yarn.sh (deprecated in 2.x but still works): the former starts the NameNode and DataNodes, the latter the ResourceManager and NodeManagers

10. Check
jps  # the master should show NameNode, the slaves DataNode          # ps lists processes, jps lists Java processes; end one with kill -9 pid
hdfs dfsadmin -report  # should list both datanodes (the old form, hadoop dfsadmin, still works but is deprecated)
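To pull just the summary out of the report (the exact wording can differ slightly between releases):
hdfs dfsadmin -report | grep -E "Live datanodes|Name:"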




11. Test
# or use the fs alias defined in .bashrc instead of "hadoop fs"
hadoop fs -ls /
hadoop fs -mkdir /input
hadoop fs -put etc/hadoop/* /input    # run from /usr/local/hadoop so the relative path etc/hadoop resolves
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input/ /output0    # note: the output directory must not already exist
fs -cat /output0/*
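The reducer output normally lands in part files; a closer look (part-r-00000 is the usual name with a single reducer):
hadoop fs -ls /output0
hadoop fs -cat /output0/part-r-00000 | head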

12. Check in a browser
http://10.103.90.163:8088    # YARN ResourceManager UI
http://10.103.90.163:50070   # HDFS NameNode UI

13. If you need to reformat
stop-all.sh    # lives in sbin
Delete everything under the tmp directory on all nodes
Run hdfs namenode -format again
This keeps the clusterID consistent across the nodes
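A sketch of that cleanup, assuming hadoop.tmp.dir is /usr/local/hadoop/tmp as configured above (run the rm on every node, the format only on the master):
rm -rf /usr/local/hadoop/tmp/*
hdfs namenode -format    # master only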


14. Remove the test data before packaging the VM
hadoop fs -rm -r /input
hadoop fs -rm -r /output0
hadoop fs -ls /
# the HDFS root is /


15. Stop the cluster before shutting the machines down


16. Starting and stopping the JobHistory server
mr-jobhistory-daemon.sh start historyserver
mr-jobhistory-daemon.sh stop historyserver
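Once started, the history server's web UI is normally reachable on port 19888 (the default for mapreduce.jobhistory.webapp.address):
http://10.103.90.163:19888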




Problem 1:
The NameNode is gone after a reboot.
Set hadoop.tmp.dir (not "hadoop.dir.tmp") in core-site.xml; it must not share the system /tmp.


Problem 2:
Jobs hang while running.
The XML configuration files must be cleanly indented; do not mix tabs and spaces.


Problem 3: swapping machines
Only the IPs in /etc/hosts need updating (no reboot required), plus mutual SSH access (store the master's public key on the slaves), the slaves file, hosts, and hostname.

Problem 4: no job shows up on port 8088, and the output reads map > map
The job ran locally via the LocalJobRunner.
Use the property name mapreduce.framework.name (not the old mapred.framework.name):
<property>
<name>mapreduce.framework.name</name>
     <value>yarn</value>
</property>

Problem 5: stuck at "AM waiting for RM" (AM = ApplicationMaster, RM = ResourceManager), or Connection Refused.
Configure hosts and hostname on every node and reboot, redo the mutual SSH access (put the master's public key on the slaves), and reformat.

Problem 6: how to view the logs
From the web UI, or from the logs directory on disk:
logs/
userlogs holds the application (container) logs
.log files are the complete logs
.out files are the condensed version (daemon stdout)
datanode logs live on the datanode machines
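A sketch for reading the daemon logs from the shell; the file names follow hadoop-<user>-<daemon>-<hostname>.log for HDFS daemons and yarn-<user>-<daemon>-<hostname>.log for YARN daemons, so the exact names depend on the account (hadoop) and hostnames used here:
cd /usr/local/hadoop/logs
ls
tail -n 100 hadoop-hadoop-namenode-master.log        # NameNode log, on the master
tail -n 100 yarn-hadoop-resourcemanager-master.log   # ResourceManager log, on the master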

