Hadoop Notes (Part 1)


Setting up the Java environment:

Check the Java version installed on this machine:
  [xiangkun@hadoop-senior01 ~]$ rpm -qa | grep java
Uninstall the bundled Java:
  [xiangkun@hadoop-senior01 ~]$ rpm -e --nodeps java-1.6.0-
Grant execute permission:
  [xiangkun@hadoop-senior01 softwares]$ chmod u+x ./*
Extract to the modules directory:
  [xiangkun@hadoop-senior01 softwares]$ tar -zxf jdk-1.8.0 -C /opt/modules/
Configure environment variables:
  ###JAVA_HOME
  export JAVA_HOME=/opt/modules/jdk1.8.0_131
  export PATH=$PATH:$JAVA_HOME/bin
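A quick sanity check, assuming the two export lines were appended to a file your shell sources (e.g. /etc/profile or ~/.bashrc — the exact file is a setup choice, not fixed by these notes):

  $ source /etc/profile   # reload the environment (adjust if you used ~/.bashrc)
  $ java -version         # should report version 1.8.0_131
  $ echo $JAVA_HOME       # should print /opt/modules/jdk1.8.0_131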

Installing Hadoop:

Archive of all Apache releases (historical versions): http://archive.apache.org/dist/
Archive of all Hadoop versions: http://archive.apache.org/dist/hadoop/common

Extract:
  [xiangkun@hadoop-senior01 softwares]$ tar -zxf hadoop-2.5.0.tar.gz -C /opt/modules/
Set JAVA_HOME in etc/hadoop/hadoop-env.sh:
  export JAVA_HOME=/opt/modules/jdk1.8.0_131
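To confirm the unpacked distribution picks up the JDK (a minimal check, nothing else configured yet):

  [xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hadoop version
  # should print "Hadoop 2.5.0" plus build details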

The three MapReduce run modes

  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode

Mode 1: Local Mode

cd into the Hadoop install directory, then:
  $ mkdir input
  $ cp etc/hadoop/*.xml input
  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar grep input output 'dfs[a-z.]+'
  $ cat output/*
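With the stock etc/hadoop/*.xml files this grep job typically finds a single match, so the final cat prints something like the following (the exact count depends on your configs):

  1  dfsadmin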


Classic example: counting words with MapReduce


xiangkun@xiangkun-X550LD:/opt/modules/hadoop-2.5.0$ sudo mkdir wcinput

xiangkun@xiangkun-X550LD:/opt/modules/hadoop-2.5.0$ cd wcinput

xiangkun@xiangkun-X550LD:/opt/modules/hadoop-2.5.0/wcinput$ sudo touch wc.input // create a file

xiangkun@xiangkun-X550LD:/opt/modules/hadoop-2.5.0/wcinput$ vim wc.input // edit the file

hadoop hdfs
hadoop yarn
hadoop mapreduce

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount wcinput wcoutput

$ cat wcoutput/*
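Given the three lines in wc.input, the output should be the following (word, tab, count — derived by hand from the input above, not captured from a run):

  hadoop     3
  hdfs       1
  mapreduce  1
  yarn       1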

Mode 2: Pseudo-Distributed Mode

etc/hadoop/core-site.xml:

Config 1:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- this hostname determines which machine the NameNode runs on -->
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Config 2:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoop-2.5.0/data/tmp</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- specifies which machine the SecondaryNameNode runs on -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-senior01.xiangkun:50090</value>
    </property>
</configuration>

Format the NameNode:

  $ bin/hdfs namenode -format
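Formatting writes a fresh fsimage under the hadoop.tmp.dir configured above; a quick check, assuming the format succeeded:

  $ ls data/tmp/dfs/name/current
  # expect fsimage_* files, seen_txid, and a VERSION file containing the new clusterID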

Start the daemons:

[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sudo sbin/hadoop-daemon.sh start namenode
[sudo] password for xiangkun:
starting namenode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-root-namenode-hadoop-senior01.xiangkun.out
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ jps
4001 Jps
3878 NameNode
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-xiangkun-datanode-hadoop-senior01.xiangkun.out
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ jps
4032 DataNode
3878 NameNode
4103 Jps

HDFS: once started, the NameNode web UI listens on port 50070.
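A quick reachability check from the shell (curl is assumed to be available; a browser works just as well):

  $ curl -s http://hadoop-senior01.xiangkun:50070/ | head
  # any HTML in the response means the NameNode web UI is up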
 
 

Create a directory in the HDFS filesystem:
  [xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir -p /usr/xiangkun/mapreduce/wordcount/input
Upload a file into that directory:
  [xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -put wcinput/wc.input /usr/xiangkun/mapreduce/wordcount/input
Run the MapReduce job:
  [xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /usr/xiangkun/mapreduce/wordcount/input /usr/xiangkun/mapreduce/wordcount/output
View the result:
  [xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -cat /usr/xiangkun/mapreduce/wordcount/output/*

Configure YARN, so that MapReduce jobs run on YARN

yarn-env.sh:
  export JAVA_HOME=/opt/modules/jdk1.8.0_131
slaves (determines which machines run the DataNode and NodeManager):
  hadoop-senior01.xiangkun
yarn-site.xml:
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-senior01.xiangkun</value>
    </property>

Start YARN:

[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start resourcemanager
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start nodemanager
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ jps
4032 DataNode
5299 Jps
5268 NodeManager
3878 NameNode
5021 ResourceManager
mapred-env.sh:
  export JAVA_HOME=/opt/modules/jdk1.8.0_131
mapred-site.xml:
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

YARN: once started, the ResourceManager web UI listens on port 8088.
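To confirm the NodeManager registered with the ResourceManager, the standard YARN CLI can be used (no extra setup assumed):

  $ bin/yarn node -list
  # should show hadoop-senior01.xiangkun:<port> in state RUNNING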
 
Run the wordcount job on YARN:

[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -rm -R /usr/xiangkun/mapreduce/wordcount/output/
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /usr/xiangkun/mapreduce/wordcount/input /usr/xiangkun/mapreduce/wordcount/output

tips:

  1. Data flows only through the DataNodes, never through the NameNode; the NameNode stores only metadata.
  2. The built-in *-default.xml files live inside the corresponding jar packages (sketch after this list).

  3. Ways to inspect logs:

    • more: page through a file
    • tail: show the end of a file (run man tail for usage)
    • tail -100f: print the last 100 lines of a file and keep following it
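Two small illustrations for tips 2 and 3 — the jar path matches this 2.5.0 install, and the log file is the .log companion of the .out file named in the datanode startup output above:

  # tip 2: core-default.xml ships inside the hadoop-common jar
  $ jar tf share/hadoop/common/hadoop-common-2.5.0.jar | grep default.xml
  # tip 3: follow the last 100 lines of the DataNode log
  $ tail -100f logs/hadoop-xiangkun-datanode-hadoop-senior01.xiangkun.log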

Starting the YARN history server:

Configure log-aggregation properties:

yarn-site.xml:
<property>
     <name>yarn.log-aggregation-enable</name>
     <value>true</value>
</property>
<!-- how long aggregated logs are retained, in seconds -->
<property>
     <name>yarn.log-aggregation-retain-seconds</name>
     <value>640800</value>
</property>
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sbin/mr-jobhistory-daemon.sh start historyserver
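With log aggregation enabled, a finished job's logs can also be pulled from the command line; the application id below is a hypothetical placeholder — take the real one from the 8088 web UI or the job's console output:

  $ bin/yarn logs -applicationId application_1400000000000_0001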

Configure how long deleted files are kept in the trash

core-site.xml:
<property>
    <name>fs.trash.interval</name>
    <value>640800</value>   <!-- in minutes -->
</property>
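With a non-zero interval, hdfs dfs -rm moves files into the per-user trash instead of deleting them outright (standard HDFS trash layout; the path below is just an example from this walkthrough):

  $ bin/hdfs dfs -rm /usr/xiangkun/mapreduce/wordcount/input/wc.input
  # the file lands under /user/xiangkun/.Trash/Current and can be restored with -mv
  # add -skipTrash to bypass the trash and delete immediately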

Summary of startup methods:

1. Start each service component individually:
     1. HDFS:
        hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode
     2. YARN:
        yarn-daemon.sh start|stop resourcemanager|nodemanager
     3. MapReduce:
        mr-jobhistory-daemon.sh start|stop historyserver
2. Start each module as a whole:
     1. HDFS:
        start-dfs.sh
        stop-dfs.sh
        The NameNode connects over SSH first to itself, then to the other DataNode machines, so configure passwordless SSH login (verification sketch below):
        [xiangkun@hadoop-senior01 .ssh]$ pwd
        /home/xiangkun/.ssh
        [xiangkun@hadoop-senior01 .ssh]$ ssh-keygen -t rsa
        ### send the key to the other machines (same username everywhere)
        [xiangkun@hadoop-senior01 .ssh]$ ssh-copy-id hostname
     2. YARN:
        start-yarn.sh
        stop-yarn.sh
3. Start everything at once (not recommended: starting HDFS and YARN this way assumes a single control node, but in a real distributed cluster the NameNode and ResourceManager live on different nodes):
     1. start-all.sh
     2. stop-all.sh
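A quick way to confirm passwordless SSH works before relying on start-dfs.sh (hostname as listed in slaves):

  $ ssh hadoop-senior01.xiangkun date
  # should print the date without prompting for a password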

Which machine does the NameNode run on:

etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- this hostname determines which machine the NameNode runs on -->
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Which machine does the DataNode run on:

slaves (determines which machines run the DataNode and NodeManager):
  hadoop-senior01.xiangkun

Which machine does the SecondaryNameNode run on:

etc/hadoop/hdfs-site.xml:

<configuration>
    <!-- specifies which machine the SecondaryNameNode runs on -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-senior01.xiangkun:50090</value>
    </property>
</configuration>

Which machine do the ResourceManager and NodeManager run on:

yarn-site.xml:
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-senior01.xiangkun</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

Which machine does the MapReduce HistoryServer run on:

mapred-site.xml:
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-senior01.xiangkun:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-senior01.xiangkun:19888</value>
</property>

HDFS: sbin/start-all.sh startup order: namenode -> datanode -> secondarynamenode -> resourcemanager -> nodemanager
Then start the MapReduce history server: sbin/mr-jobhistory-daemon.sh start historyserver
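A final sanity check, assuming every daemon came up cleanly (PIDs will differ):

  $ jps
  # should list NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and JobHistoryServer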
