Hadoop Notes: The HDFS / MapReduce / YARN Processes
Source: Internet · Editor: 程序博客网 · Date: 2024/06/02 01:30
I. The Three Core Components of Hadoop
HDFS ----------> data storage
MapReduce ----> job computation framework
YARN ---------> resource scheduling
II. HDFS
1. Start HDFS and inspect the processes
[hadoop@hadoop001 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [hadoop001]
hadoop001: starting namenode, logging to /opt/sourcecode/hadoop-2.8.1/logs/hadoop-hadoop-namenode-hadoop001.out
hadoop001: starting datanode, logging to /opt/sourcecode/hadoop-2.8.1/logs/hadoop-hadoop-datanode-hadoop001.out
Starting secondary namenodes [hadoop001]
hadoop001: starting secondarynamenode, logging to /opt/sourcecode/hadoop-2.8.1/logs/hadoop-hadoop-secondarynamenode-hadoop001.out
2. As the output shows, HDFS runs three processes: NameNode (nn), DataNode (dn), and SecondaryNameNode (snn).
3. The NameNode starts on 192.168.187.111 (hadoop001), as configured in core-site.xml:
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.187.111:9000</value>
</property>
4. The DataNode hosts come from the etc/hadoop/slaves file, whose default entry is localhost; change localhost to hadoop001.
[hadoop@hadoop001 hadoop]$ cat slaves
hadoop001
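As a hedged sketch, the edit amounts to overwriting the file's single line (in a real Hadoop 2.x install the file is $HADOOP_HOME/etc/hadoop/slaves; here a local copy stands in for illustration):

```shell
# Sketch: replace the default "localhost" entry in the slaves file.
# A real install would target $HADOOP_HOME/etc/hadoop/slaves instead.
echo "hadoop001" > slaves
cat slaves
```

start-dfs.sh connects to every host listed in this file and starts one DataNode per host, which is why the default localhost entry must be changed on a named host.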
5. According to the official docs, the SecondaryNameNode binds to 0.0.0.0 by default; to change that, add the following to hdfs-site.xml:
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>192.168.187.111:50090</value>
</property>
<property>
    <name>dfs.namenode.secondary.https-address</name>
    <value>192.168.187.111:50091</value>
</property>
6. If the settings above are left at their defaults, starting HDFS looks like this: the DataNode and SecondaryNameNode start from localhost and 0.0.0.0 respectively, and you are prompted for a password.
[hadoop@hadoop001 hadoop]$ sbin/start-dfs.sh
Starting namenodes on [hadoop001]
hadoop001: starting namenode, logging to /opt/sourcecode/hadoop-2.8.1/logs/hadoop-hadoop-namenode-hadoop001.out
localhost: starting datanode, logging to /opt/sourcecode/hadoop-2.8.1/logs/hadoop-hadoop-datanode-hadoop001.out
Starting secondary namenodes [hadoop001]
0.0.0.0: starting secondarynamenode, logging to /opt/sourcecode/hadoop-2.8.1/logs/hadoop-hadoop-secondarynamenode-hadoop001.out
7. Per the official docs, starting the HDFS processes requires configuring two files: core-site.xml and hdfs-site.xml.
core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
hdfs-site.xml (a fully distributed cluster would use a replication factor of 3; the value 1 here is for pseudo-distributed mode):
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
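All of these *-site.xml files share the same property/name/value layout, so they can be read with any XML parser. A minimal Python sketch (the embedded string mirrors the core-site.xml above; in practice you would read the file from $HADOOP_HOME/etc/hadoop):

```python
import xml.etree.ElementTree as ET

# Mirrors the core-site.xml shown above.
CORE_SITE = """
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
"""

def load_props(xml_text):
    """Return {name: value} for every <property> in a Hadoop config file."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

props = load_props(CORE_SITE)
print(props["fs.defaultFS"])  # hdfs://localhost:9000
```

The same helper works for hdfs-site.xml, mapred-site.xml, and yarn-site.xml, since they all follow this structure.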
III. MapReduce
MapReduce itself has no long-running daemon; a MapReduce process exists only while a job is running.
You can run a MapReduce job on YARN in a pseudo-distributed mode by setting a few parameters and running ResourceManager daemon and NodeManager daemon in addition.
As this sentence from the official docs shows, MapReduce jobs run on YARN, so mapred-site.xml needs the following configuration:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
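To make the programming model concrete, here is a toy word count written as explicit map and reduce phases in plain Python. This only illustrates the model, not Hadoop's Java API: on a real cluster, the map and reduce tasks would run as short-lived YARN containers, which is why no MapReduce daemon appears in the process list.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word, like a Mapper's map() calls."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group the pairs by key and sum the counts."""
    groups = defaultdict(int)
    for word, count in pairs:
        groups[word] += count
    return dict(groups)

lines = ["hello hadoop", "hello yarn"]
counts = reduce_phase(map_phase(lines))
print(counts)  # {'hello': 2, 'hadoop': 1, 'yarn': 1}
```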
IV. YARN
1. Before the YARN processes can start, yarn-site.xml must be configured as follows:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
2. Start the YARN processes:
[hadoop@hadoop001 hadoop]$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/sourcecode/hadoop-2.8.1/logs/yarn-hadoop-resourcemanager-hadoop001.out
hadoop001: starting nodemanager, logging to /opt/sourcecode/hadoop-2.8.1/logs/yarn-hadoop-nodemanager-hadoop001.out
This shows that YARN runs two processes: the ResourceManager and the NodeManager. Per the official docs, once YARN is up you can browse to http://localhost:8088/ to check cluster health (memory, disk, jobs, I/O, and so on).
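The 8088 web UI is backed by the ResourceManager's REST API; for example, GET /ws/v1/cluster/metrics returns cluster metrics as JSON. A hedged Python sketch that parses a canned response of that shape (the field names follow the YARN REST API; the numbers are made up for illustration):

```python
import json

# Made-up sample in the shape returned by
# GET http://<rm-host>:8088/ws/v1/cluster/metrics on a live cluster.
SAMPLE = """
{"clusterMetrics": {
    "appsRunning": 1,
    "activeNodes": 1,
    "availableMB": 6144
}}
"""

metrics = json.loads(SAMPLE)["clusterMetrics"]
print(metrics["appsRunning"], metrics["activeNodes"])  # 1 1
```

On a real cluster you would fetch the same JSON over HTTP instead of using a canned string.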
While a MapReduce job is running, the web UI also shows the job's state (RUNNING, SUCCEEDED, or FAILED).