Lecture 120: Study Notes on Hands-on Configuration of Hadoop MapReduce and YARN
Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 00:28
This lecture explains how to configure MapReduce and YARN.
There are two core configuration files: mapred-site.xml and yarn-site.xml (both under etc/hadoop/).
1. MapReduce configuration (etc/hadoop/mapred-site.xml):
| Parameter | Value | Notes |
|---|---|---|
| mapreduce.framework.name | yarn | Execution framework set to Hadoop YARN. |
| mapreduce.map.memory.mb | 1536 | Larger resource limit for maps. |
| mapreduce.map.java.opts | -Xmx1024M | Larger heap size for child JVMs of maps. |
| mapreduce.reduce.memory.mb | 3072 | Larger resource limit for reduces. |
| mapreduce.reduce.java.opts | -Xmx2560M | Larger heap size for child JVMs of reduces. |
| mapreduce.task.io.sort.mb | 512 | Higher memory limit while sorting data, for efficiency. |
| mapreduce.task.io.sort.factor | 100 | More streams merged at once while sorting files. |
| mapreduce.reduce.shuffle.parallelcopies | 50 | Higher number of parallel copies run by reduces to fetch outputs from a very large number of maps. |
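The table values can be written into etc/hadoop/mapred-site.xml roughly as follows. This is a sketch that simply copies the example values above; they are not tuned for any particular cluster. Each `-Xmx` heap is deliberately smaller than its container (1024 MB of 1536 MB, 2560 MB of 3072 MB). This leaves headroom for JVM memory outside the heap, because a task whose physical memory exceeds its container limit can be killed by the NodeManager. The 512 MB sort buffer is allocated inside the map heap, so it also has to fit within the 1024 MB heap.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Submit MapReduce jobs to YARN (mapred-default.xml uses "local") -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Container size in MB requested for each map task -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <!-- Map JVM heap, kept below the 1536 MB container -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024M</value>
  </property>
  <!-- Container size in MB requested for each reduce task -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value>
  </property>
  <!-- Reduce JVM heap, kept below the 3072 MB container -->
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2560M</value>
  </property>
  <!-- Map-side sort buffer in MB; it lives inside the map heap -->
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
  <!-- Number of streams merged at once while sorting -->
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
  <!-- Parallel fetches each reducer runs against map outputs -->
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>50</value>
  </property>
</configuration>
```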
· Configurations for MapReduce JobHistory Server:
| Parameter | Value | Notes |
|---|---|---|
| mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port is 10020. |
| mapreduce.jobhistory.webapp.address | MapReduce JobHistory Server Web UI host:port | Default port is 19888. |
| mapreduce.jobhistory.intermediate-done-dir | /mr-history/tmp | Directory where history files are written by MapReduce jobs. |
| mapreduce.jobhistory.done-dir | /mr-history/done | Directory where history files are managed by the MR JobHistory Server. |
The JobHistory web UI is reachable on port 19888.
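The following is a sketch of the JobHistory Server settings in mapred-site.xml. The hostname Master is an assumption borrowed from the minimal yarn-site.xml later in these notes. The ports are the defaults listed in the table, and the two directories are the HDFS paths from the table.

```xml
<configuration>
  <!-- RPC address of the JobHistory Server (default port 10020) -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>Master:10020</value>
  </property>
  <!-- Web UI address; browse to http://Master:19888 -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>Master:19888</value>
  </property>
  <!-- Where running MapReduce jobs write their history files -->
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
  </property>
  <!-- Where the JobHistory Server keeps finished history files -->
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
  </property>
</configuration>
```

On Hadoop 2.x the server itself is started separately from YARN, for example with `sbin/mr-jobhistory-daemon.sh start historyserver`.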
Minimal mapred-site.xml configuration:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
2. YARN configuration:
etc/hadoop/yarn-site.xml
· Configurations for ResourceManager and NodeManager:

| Parameter | Value | Notes |
|---|---|---|
| yarn.acl.enable | true / false | Enable ACLs? Defaults to false. |
| yarn.admin.acl | Admin ACL | ACL to set admins on the cluster. ACLs are of the form comma-separated-users, a space, comma-separated-groups. Defaults to the special value *, which means anyone. The special value of just a space means no one has access. |
| yarn.log-aggregation-enable | false | Configuration to enable or disable log aggregation. |
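To illustrate the ACL format (users, one space, groups), the yarn-site.xml fragment below enables ACLs and makes the user `hadoop` and members of the group `hadoop-admins` cluster administrators. Both names are hypothetical placeholders. The fragment also switches log aggregation on; the table only lists the default, false. The remote-app-log-dir settings in the NodeManager table take effect only when aggregation is enabled.

```xml
<configuration>
  <!-- Turn on ACL checks (default: false) -->
  <property>
    <name>yarn.acl.enable</name>
    <value>true</value>
  </property>
  <!-- Format: comma-separated users, a single space, comma-separated groups.
       "hadoop" and "hadoop-admins" are hypothetical names. -->
  <property>
    <name>yarn.admin.acl</name>
    <value>hadoop hadoop-admins</value>
  </property>
  <!-- Collect container logs into HDFS when each application finishes -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
```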
· Configurations for ResourceManager:
| Parameter | Value | Notes |
|---|---|---|
| yarn.resourcemanager.address | ResourceManager host:port for clients to submit jobs. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.scheduler.address | ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.resource-tracker.address | ResourceManager host:port for NodeManagers. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.admin.address | ResourceManager host:port for administrative commands. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.webapp.address | ResourceManager web UI host:port. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.hostname | ResourceManager host. | host. A single hostname that can be set in place of setting all yarn.resourcemanager*address resources; results in default ports for the ResourceManager components. |
| yarn.resourcemanager.scheduler.class | ResourceManager scheduler class. | CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler. |
| yarn.scheduler.minimum-allocation-mb | Minimum limit of memory to allocate to each container request at the ResourceManager. | In MB. |
| yarn.scheduler.maximum-allocation-mb | Maximum limit of memory to allocate to each container request at the ResourceManager. | In MB. |
| yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path | List of permitted/excluded NodeManagers. | If necessary, use these files to control the list of allowable NodeManagers. |
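Setting only yarn.resourcemanager.hostname is usually enough, because the five address properties then fall back to that host with their default ports. The sketch below uses that pattern, spells out one address explicitly to show what an override looks like, and selects the CapacityScheduler by its full class name. Master is assumed to be the ResourceManager host. As far as I know, the allocation limits shown (1024 MB and 8192 MB) are the stock yarn-default.xml values in Hadoop 2.x. The 3072 MB reduce containers requested in mapred-site.xml must fit under the maximum, otherwise the request would be rejected.

```xml
<configuration>
  <!-- One hostname for all ResourceManager endpoints; default ports then apply:
       8032 (clients), 8030 (scheduler), 8031 (resource tracker),
       8033 (admin), 8088 (web UI) -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <!-- Explicit override; only needed for a non-default host or port -->
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Master:8088</value>
  </property>
  <!-- Scheduler implementation, given as a fully qualified class name -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
  <!-- Per-container memory bounds in MB -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
</configuration>
```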
· Configurations for NodeManager:
| Parameter | Value | Notes |
|---|---|---|
| yarn.nodemanager.resource.memory-mb | Resource, i.e. available physical memory, in MB, for the given NodeManager. | Defines total available resources on the NodeManager to be made available to running containers. |
| yarn.nodemanager.vmem-pmem-ratio | Maximum ratio by which virtual memory usage of tasks may exceed physical memory. | The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio. |
| yarn.nodemanager.local-dirs | Comma-separated list of paths on the local filesystem where intermediate data is written. | Multiple paths help spread disk I/O. |
| yarn.nodemanager.log-dirs | Comma-separated list of paths on the local filesystem where logs are written. | Multiple paths help spread disk I/O. |
| yarn.nodemanager.log.retain-seconds | 10800 | Default time (in seconds) to retain log files on the NodeManager. Only applicable if log aggregation is disabled. |
| yarn.nodemanager.remote-app-log-dir | /logs | HDFS directory where the application logs are moved on application completion. Appropriate permissions need to be set. Only applicable if log aggregation is enabled. |
| yarn.nodemanager.remote-app-log-dir-suffix | logs | Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log aggregation is enabled. |
| yarn.nodemanager.aux-services | mapreduce_shuffle | Shuffle service that needs to be set for MapReduce applications. |
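The following NodeManager fragment assumes a worker that offers 8192 MB to containers and has two data disks mounted at /data1 and /data2. The memory figure and both paths are hypothetical. vmem-pmem-ratio is shown at 2.1, which I believe is the shipped default. With that ratio, a 1536 MB map container may use up to about 3226 MB (1536 × 2.1 = 3225.6) of virtual memory before the NodeManager kills it, assuming the virtual-memory check is enabled. Given the values from the table, aggregated logs for a user end up under /logs/<user>/logs in HDFS.

```xml
<configuration>
  <!-- Physical memory (MB) this node offers to containers; hypothetical 8 GB -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <!-- Virtual memory may reach 2.1x a container's physical limit -->
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <!-- One directory per disk spreads I/O (hypothetical mount points) -->
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data1/yarn/local,/data2/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data1/yarn/logs,/data2/yarn/logs</value>
  </property>
  <!-- Used only with log aggregation: logs land in /logs/<user>/logs on HDFS -->
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
</configuration>
```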
· Configurations for History Server (Needs to be moved elsewhere):
| Parameter | Value | Notes |
|---|---|---|
| yarn.log-aggregation.retain-seconds | -1 | How long to keep aggregation logs before deleting them. -1 disables. Be careful: set this too small and you will spam the NameNode. |
| yarn.log-aggregation.retain-check-interval-seconds | -1 | Time between checks for aggregated log retention. If set to 0 or a negative value, the value is computed as one-tenth of the aggregated log retention time. Be careful: set this too small and you will spam the NameNode. |
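A value of -1 keeps aggregated logs forever, so clusters with aggregation enabled often set a finite retention. The sketch below assumes a seven-day window (7 × 24 × 3600 = 604800 seconds), which is an arbitrary illustrative choice. Leaving the check interval at -1 makes YARN derive it as one-tenth of the retention time, here 60480 seconds, as described in the table.

```xml
<configuration>
  <!-- Delete aggregated logs older than 7 days (604800 s); illustrative value -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
  <!-- Non-positive: interval = retain-seconds / 10 = 60480 s -->
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
  </property>
</configuration>
```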
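When MapReduce runs on YARN, the mapreduce_shuffle auxiliary service (the shuffle handler) must be configured on the NodeManagers.
YARN is a general-purpose resource-management framework that can host many frameworks besides MapReduce, so MapReduce's shuffle service has to be enabled explicitly:
yarn.nodemanager.aux-services = mapreduce_shuffle (Shuffle service that needs to be set for MapReduce applications.)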
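The service name mapreduce_shuffle is mapped to its implementing class through a property named after the service. Many guides set that class explicitly, as in the sketch below. As far as I know, yarn-default.xml already supplies org.apache.hadoop.mapred.ShuffleHandler, so the second property is normally optional. The property name is spelled aux-services with a hyphen. A misspelled name is silently ignored, and MapReduce jobs then fail because the shuffle service cannot be found.

```xml
<configuration>
  <!-- Register the MapReduce shuffle as a NodeManager auxiliary service -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Class implementing that service: yarn.nodemanager.aux-services.<service-name>.class -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```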
Minimal configuration (yarn-site.xml):
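<configuration>
  <!-- Master: the ResourceManager host in this cluster -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <!-- The property name uses a hyphen (aux-services), not an underscore -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>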
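The above are study notes on Lecture 120 of Wang Jialin's (王家林) DT Big Data course.
DT Big Data WeChat official account: DT_Spark
Wang Jialin's QQ: 1740415547
Wang Jialin's WeChat ID: 18610086859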
Video for Lecture 120 (51CTO): http://edu.51cto.com/lesson/id-77705.html