Hadoop Notes (Part 1)
Setting up the Java environment:

```shell
# Check the Java version installed on this machine
[xiangkun@hadoop-senior01 ~]$ rpm -qa | grep java
# Uninstall the bundled Java
[xiangkun@hadoop-senior01 ~]$ rpm -e --nodeps java-1.6.0-
# Make the installer files executable
[xiangkun@hadoop-senior01 softwares]$ chmod u+x ./*
# Extract the JDK into the modules directory
[xiangkun@hadoop-senior01 softwares]$ tar -zxf jdk-1.8.0 -C /opt/modules/
```

Configure the environment variables:

```shell
###JAVA_HOME
export JAVA_HOME=/opt/modules/jdk1.8.0_131
export PATH=$PATH:$JAVA_HOME/bin
```
Hadoop installation:

Archive of all Apache releases (historical versions): http://archive.apache.org/dist/
Archive of all Hadoop versions: http://archive.apache.org/dist/hadoop/common

Extract Hadoop:

```shell
[xiangkun@hadoop-senior01 softwares]$ tar -zxf hadoop-2.5.0.tar.gz -C /opt/modules/
```

Then set JAVA_HOME in etc/hadoop/hadoop-env.sh:

```shell
export JAVA_HOME=/opt/modules/jdk1.8.0_131
```
The three MapReduce run modes:
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode
Mode 1: Local (Standalone) Mode
From the Hadoop installation directory:

```shell
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar grep input output 'dfs[a-z.]+'
$ cat output/*
```
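The grep job above scans the copied XML files for every match of the regex dfs[a-z.]+ and counts how often each distinct match occurs. The same idea can be sketched with a plain shell pipeline (a local approximation only, not the MapReduce implementation; the sample.xml content below is made up for illustration):

```shell
# Emulate the Hadoop grep example locally: extract every regex match,
# then count how many times each distinct match appears.
mkdir -p /tmp/grepdemo
printf '<name>dfs.replication</name>\n<name>dfs.permissions</name>\n' > /tmp/grepdemo/sample.xml
grep -Eoh 'dfs[a-z.]+' /tmp/grepdemo/*.xml | sort | uniq -c
```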
Classic example: counting word occurrences with MapReduce
```shell
xiangkun@xiangkun-X550LD:/opt/modules/hadoop-2.5.0$ sudo mkdir wcinput
xiangkun@xiangkun-X550LD:/opt/modules/hadoop-2.5.0$ cd wcinput
xiangkun@xiangkun-X550LD:/opt/modules/hadoop-2.5.0/wcinput$ sudo touch wc.input   # create the input file
xiangkun@xiangkun-X550LD:/opt/modules/hadoop-2.5.0/wcinput$ vim wc.input          # edit the file
```

Contents of wc.input:

```
hadoop hdfs
hadoop yarn
hadoop mapreduce
```

Run the job and inspect the output:

```shell
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount wcinput wcoutput
$ cat wcoutput/*
```
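The result wordcount produces for this input can be illustrated with an ordinary shell pipeline: split every line into words, group equal words together, and count each group. This is only a local sketch of the map/shuffle/reduce idea, not how the job actually runs on Hadoop:

```shell
# map: one word per line; shuffle: sort groups equal words together;
# reduce: uniq -c counts each group
printf 'hadoop hdfs\nhadoop yarn\nhadoop mapreduce\n' | tr ' ' '\n' | sort | uniq -c
# hadoop appears 3 times; hdfs, mapreduce, and yarn once each
```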
Mode 2: Pseudo-Distributed Mode
etc/hadoop/core-site.xml:

Setting 1:

```xml
<configuration>
    <!-- This host name determines which machine the NameNode runs on -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```

Setting 2:

```xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoop-2.5.0/data/tmp</value>
    </property>
</configuration>
```
etc/hadoop/hdfs-site.xml:

```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Specifies which machine the SecondaryNameNode runs on -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-senior01.xiangkun:50090</value>
    </property>
</configuration>
```
Format the NameNode:

```shell
$ bin/hdfs namenode -format
```
Start the daemons:

```shell
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sudo sbin/hadoop-daemon.sh start namenode
[sudo] password for xiangkun:
starting namenode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-root-namenode-hadoop-senior01.xiangkun.out
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ jps
4001 Jps
3878 NameNode
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/modules/hadoop-2.5.0/logs/hadoop-xiangkun-datanode-hadoop-senior01.xiangkun.out
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ jps
4032 DataNode
3878 NameNode
4103 Jps
```
Once HDFS is up, its web UI is available on port 50070.
Create a directory in the HDFS file system:

```shell
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir -p /usr/xiangkun/mapreduce/wordcount/input
```

Upload a file into that directory:

```shell
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -put wcinput/wc.input /usr/xiangkun/mapreduce/wordcount/input
```

Run the MapReduce job:

```shell
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /usr/xiangkun/mapreduce/wordcount/input /usr/xiangkun/mapreduce/wordcount/output
```

View the result:

```shell
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -cat /usr/xiangkun/mapreduce/wordcount/output/*
```
Configuring YARN, so that MapReduce jobs run on YARN
yarn-env.sh:

```shell
JAVA_HOME=/opt/modules/jdk1.8.0_131
```
slaves (determines which machines run the DataNode and NodeManager):

```
hadoop-senior01.xiangkun
```
yarn-site.xml:

```xml
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-senior01.xiangkun</value>
</property>
```
Start YARN:

```shell
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start resourcemanager
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sbin/yarn-daemon.sh start nodemanager
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ jps
4032 DataNode
5299 Jps
5268 NodeManager
3878 NameNode
5021 ResourceManager
```
mapred-env.sh:

```shell
export JAVA_HOME=/opt/modules/jdk1.8.0_131
```
mapred-site.xml:

```xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
```
Once YARN is up, its web UI is available on port 8088.
Run the wordcount job on YARN (delete the old output directory first):

```shell
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hdfs dfs -rm -R /usr/xiangkun/mapreduce/wordcount/output/
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /usr/xiangkun/mapreduce/wordcount/input /usr/xiangkun/mapreduce/wordcount/output
```
Tips:
- Data flows only through the DataNodes, never through the NameNode; the NameNode stores only metadata.
- The default configuration files (*-default.xml) live inside the corresponding jar packages.
Ways to inspect logs:
- more: page through a file
- tail: show the end of a file (see man tail for usage)
- tail -100f: show the last 100 lines of a file and keep following new output
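A quick demonstration of the tail variants on a throwaway 200-line file (the /tmp/demo.log path is just for illustration):

```shell
seq 1 200 > /tmp/demo.log               # a file whose lines are 1..200
tail -n 100 /tmp/demo.log | head -n 1   # prints 101: the first of the last 100 lines
# tail -100f /tmp/demo.log shows the same last 100 lines, then keeps
# following the file as it grows (handy for live Hadoop logs); Ctrl-C to stop.
```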
Starting history monitoring for YARN:
Configure the log-aggregation properties in yarn-site.xml:

```xml
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- This property sets how long the aggregated logs are retained -->
<property>
    <name>yarn.log-aggregation-retain-seconds</name>
    <value>640800</value>
</property>
```
```shell
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sbin/mr-jobhistory-daemon.sh start historyserver
```
Configure how long deleted files are kept in the trash

core-site.xml:

```xml
<property>
    <name>fs.trash.interval</name>
    <value>640800</value>
</property>
```
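One caveat worth noting: fs.trash.interval is interpreted in minutes, while yarn.log-aggregation-retain-seconds above is in seconds, so the same value 640800 means two very different retention periods:

```shell
# 640800 minutes for fs.trash.interval:
echo "trash: $((640800 / 60 / 24)) days"      # 445 days
# 640800 seconds for yarn.log-aggregation-retain-seconds:
echo "logs:  $((640800 / 3600 / 24)) days"    # about 7 days
```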
Summary of the ways to start Hadoop:

1. Start each daemon individually:
   1. HDFS: hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode
   2. YARN: yarn-daemon.sh start|stop resourcemanager|nodemanager
   3. MapReduce: mr-jobhistory-daemon.sh start|stop historyserver
2. Start each module as a whole:
   1. HDFS: start-dfs.sh / stop-dfs.sh. The NameNode host first connects to itself over SSH, then to the DataNode hosts, so configure passwordless SSH login:

      ```shell
      [xiangkun@hadoop-senior01 .ssh]$ pwd
      /home/xiangkun/.ssh
      [xiangkun@hadoop-senior01 .ssh]$ ssh-keygen -t rsa
      ### Copy the key to the other machines (same user name on each)
      [xiangkun@hadoop-senior01 .ssh]$ ssh-copy-id hostname
      ```
   2. YARN: start-yarn.sh / stop-yarn.sh
3. Start everything at once (not recommended: starting HDFS and YARN each has to happen on its own master node, and in a real distributed cluster the NameNode and ResourceManager run on different nodes):
   1. start-all.sh
   2. stop-all.sh
Which machine runs the NameNode:

etc/hadoop/core-site.xml:

```xml
<configuration>
    <!-- This host name determines which machine the NameNode runs on -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```
Which machine runs the DataNode:

slaves (determines which machines run the DataNode and NodeManager):

```
hadoop-senior01.xiangkun
```
Which machine runs the SecondaryNameNode:

etc/hadoop/hdfs-site.xml:

```xml
<configuration>
    <!-- Specifies which machine the SecondaryNameNode runs on -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop-senior01.xiangkun:50090</value>
    </property>
</configuration>
```
Which machines run the ResourceManager / NodeManager:

yarn-site.xml:

```xml
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-senior01.xiangkun</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
```
Which machine runs the MapReduce HistoryServer:

mapred-site.xml:

```xml
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-senior01.xiangkun:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-senior01.xiangkun:19888</value>
</property>
```
Startup order with sbin/start-all.sh: NameNode -> DataNode -> SecondaryNameNode -> ResourceManager -> NodeManager

Then start the MapReduce history server:

```shell
[xiangkun@hadoop-senior01 hadoop-2.5.0]$ sbin/mr-jobhistory-daemon.sh start historyserver
```