Lecture 120: Study Notes on Hadoop MapReduce and YARN Configuration in Practice

Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 00:28

This lecture covers how to configure MapReduce and YARN.

There are two core configuration files: mapred-site.xml and yarn-site.xml.

1. MapReduce configuration:

| Parameter | Value | Notes |
|---|---|---|
| mapreduce.framework.name | yarn | Execution framework set to Hadoop YARN. |
| mapreduce.map.memory.mb | 1536 | Larger resource limit for maps. |
| mapreduce.map.java.opts | -Xmx1024M | Larger heap-size for child jvms of maps. |
| mapreduce.reduce.memory.mb | 3072 | Larger resource limit for reduces. |
| mapreduce.reduce.java.opts | -Xmx2560M | Larger heap-size for child jvms of reduces. |
| mapreduce.task.io.sort.mb | 512 | Higher memory-limit while sorting data for efficiency. |
| mapreduce.task.io.sort.factor | 100 | More streams merged at once while sorting files. |
| mapreduce.reduce.shuffle.parallelcopies | 50 | Higher number of parallel copies run by reduces to fetch outputs from very large number of maps. |
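As a rough sketch, the memory-related rows above might be written in mapred-site.xml as follows. The values are the ones from the table, not tuned recommendations. Note that each -Xmx heap size is set below the matching memory.mb container limit, which leaves room for JVM memory outside the heap.

```xml
<configuration>
  <!-- Memory requested from YARN for each map task container, in MB -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <!-- JVM heap for map tasks; kept below the 1536 MB container limit -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024M</value>
  </property>
  <!-- Memory requested from YARN for each reduce task container, in MB -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value>
  </property>
  <!-- JVM heap for reduce tasks; kept below the 3072 MB container limit -->
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2560M</value>
  </property>
  <!-- Sort buffer size in MB; it has to fit inside the map task heap -->
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
</configuration>
```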

· Configurations for MapReduce JobHistory Server:

| Parameter | Value | Notes |
|---|---|---|
| mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port is 10020. |
| mapreduce.jobhistory.webapp.address | MapReduce JobHistory Server Web UI host:port | Default port is 19888. |
| mapreduce.jobhistory.intermediate-done-dir | /mr-history/tmp | Directory where history files are written by MapReduce jobs. |
| mapreduce.jobhistory.done-dir | /mr-history/done | Directory where history files are managed by the MR JobHistory Server. |

The JobHistory web UI can be reached on port 19888.
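A minimal sketch of the JobHistory Server settings in mapred-site.xml. The hostname Master is an assumption borrowed from the yarn-site.xml example in these notes; replace it with the host that actually runs the JobHistory Server. The ports and directories are the ones listed in the table.

```xml
<configuration>
  <!-- RPC address of the JobHistory Server; "Master" is an assumed hostname -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>Master:10020</value>
  </property>
  <!-- Web UI address of the JobHistory Server -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>Master:19888</value>
  </property>
  <!-- Where running jobs write their history files -->
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
  </property>
  <!-- Where the JobHistory Server keeps the files it manages -->
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
  </property>
</configuration>
```

With these values the web UI would be served at http://Master:19888.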

Minimal mapred-site.xml configuration:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
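The property above is a fragment; in the actual file it sits inside a `<configuration>` root element. A complete minimal file would look roughly like this:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Run MapReduce jobs on YARN rather than the default local runner -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```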

2. YARN configuration:

etc/hadoop/yarn-site.xml

Configurations for ResourceManager and NodeManager:

| Parameter | Value | Notes |
|---|---|---|
| yarn.acl.enable | true / false | Enable ACLs? Defaults to false. |
| yarn.admin.acl | Admin ACL | ACL to set admins on the cluster. ACLs are of the form comma-separated-users, a space, then comma-separated-groups. Defaults to the special value of * which means anyone. The special value of just a space means no one has access. |
| yarn.log-aggregation-enable | false | Configuration to enable or disable log aggregation. |
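To illustrate the ACL format: the value is a comma-separated list of users, one space, then a comma-separated list of groups. The user and group names below (alice, bob, hadoop-admins) are hypothetical.

```xml
<configuration>
  <!-- Turn ACL checking on; it is off by default -->
  <property>
    <name>yarn.acl.enable</name>
    <value>true</value>
  </property>
  <!-- Users alice and bob, plus group hadoop-admins (hypothetical names) -->
  <property>
    <name>yarn.admin.acl</name>
    <value>alice,bob hadoop-admins</value>
  </property>
</configuration>
```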

· Configurations for ResourceManager:

| Parameter | Value | Notes |
|---|---|---|
| yarn.resourcemanager.address | ResourceManager host:port for clients to submit jobs. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.scheduler.address | ResourceManager host:port for ApplicationMasters to talk to Scheduler to obtain resources. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.resource-tracker.address | ResourceManager host:port for NodeManagers. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.admin.address | ResourceManager host:port for administrative commands. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.webapp.address | ResourceManager web-ui host:port. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.hostname | ResourceManager host. | host Single hostname that can be set in place of setting all yarn.resourcemanager*address resources. Results in default ports for ResourceManager components. |
| yarn.resourcemanager.scheduler.class | ResourceManager Scheduler class. | CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler |
| yarn.scheduler.minimum-allocation-mb | Minimum limit of memory to allocate to each container request at the Resource Manager. | In MBs |
| yarn.scheduler.maximum-allocation-mb | Maximum limit of memory to allocate to each container request at the Resource Manager. | In MBs |
| yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path | List of permitted/excluded NodeManagers. | If necessary, use these files to control the list of allowable NodeManagers. |
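A sketch of ResourceManager settings in yarn-site.xml. The hostname Master matches the minimal example in these notes. The scheduler value is the fully qualified class name of the CapacityScheduler. The 1024 and 8192 MB allocation limits are illustrative values, not recommendations from the lecture.

```xml
<configuration>
  <!-- One hostname instead of setting every yarn.resourcemanager*address -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <!-- Fully qualified class name of the scheduler to use -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
  <!-- Smallest container the scheduler will hand out, in MB (illustrative) -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <!-- Largest container the scheduler will hand out, in MB (illustrative) -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
</configuration>
```

Container requests from MapReduce, such as the 3072 MB reduce container in the MapReduce table, must not exceed the maximum set here.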

· Configurations for NodeManager:

| Parameter | Value | Notes |
|---|---|---|
| yarn.nodemanager.resource.memory-mb | Resource i.e. available physical memory, in MB, for given NodeManager | Defines total available resources on the NodeManager to be made available to running containers |
| yarn.nodemanager.vmem-pmem-ratio | Maximum ratio by which virtual memory usage of tasks may exceed physical memory | The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio. |
| yarn.nodemanager.local-dirs | Comma-separated list of paths on the local filesystem where intermediate data is written. | Multiple paths help spread disk i/o. |
| yarn.nodemanager.log-dirs | Comma-separated list of paths on the local filesystem where logs are written. | Multiple paths help spread disk i/o. |
| yarn.nodemanager.log.retain-seconds | 10800 | Default time (in seconds) to retain log files on the NodeManager. Only applicable if log-aggregation is disabled. |
| yarn.nodemanager.remote-app-log-dir | /logs | HDFS directory where the application logs are moved on application completion. Need to set appropriate permissions. Only applicable if log-aggregation is enabled. |
| yarn.nodemanager.remote-app-log-dir-suffix | logs | Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log-aggregation is enabled. |
| yarn.nodemanager.aux-services | mapreduce_shuffle | Shuffle service that needs to be set for Map Reduce applications. |
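A sketch of NodeManager settings in yarn-site.xml. The memory size and directory paths are assumptions for illustration (a node that offers 8 GB to containers and has two data disks mounted at /data1 and /data2); only mapreduce_shuffle comes from the table.

```xml
<configuration>
  <!-- Physical memory this node offers to containers, in MB (assumed) -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <!-- Virtual memory allowed per unit of physical memory (illustrative) -->
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <!-- One directory per disk spreads intermediate data I/O (assumed paths) -->
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data1/yarn/local,/data2/yarn/local</value>
  </property>
  <!-- Container log directories, also one per disk (assumed paths) -->
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data1/yarn/logs,/data2/yarn/logs</value>
  </property>
  <!-- Shuffle service required by MapReduce applications -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```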

· Configurations for History Server (Needs to be moved elsewhere):

| Parameter | Value | Notes |
|---|---|---|
| yarn.log-aggregation.retain-seconds | -1 | How long to keep aggregation logs before deleting them. -1 disables. Be careful, set this too small and you will spam the name node. |
| yarn.log-aggregation.retain-check-interval-seconds | -1 | Time between checks for aggregated log retention. If set to 0 or a negative value then the value is computed as one-tenth of the aggregated log retention time. Be careful, set this too small and you will spam the name node. |
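Putting the log-aggregation settings together, a hedged sketch: aggregation is switched on, logs go to /logs on HDFS as in the NodeManager table, and the 7-day retention (604800 seconds) is an illustrative choice rather than a value from the lecture.

```xml
<configuration>
  <!-- Collect container logs into HDFS when an application finishes -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- HDFS directory that receives the aggregated logs -->
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
  </property>
  <!-- Delete aggregated logs after 7 days (illustrative value) -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
```

The check interval is left at its default here, so it is derived as one-tenth of the retention time.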

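When MapReduce runs on YARN, the mapreduce_shuffle handler must be specified.

YARN is a general-purpose resource management framework, and many other frameworks may be running on it, so MapReduce has to be named explicitly.

yarn.nodemanager.aux-services | mapreduce_shuffle | Shuffle service that needs to be set for Map Reduce applications.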
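In yarn-site.xml this can be written as two properties: the service name, and the class that implements it. The second property is not mentioned in the lecture notes. org.apache.hadoop.mapred.ShuffleHandler is the shuffle handler shipped with Hadoop, and the entry is often left unset, so treat it as optional.

```xml
<configuration>
  <!-- Name of the auxiliary service the NodeManager should load -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Class implementing that service (optional; not from the lecture) -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```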

Minimal configuration (yarn-site.xml):

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>Master</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

These are study notes from Lecture 120 of Wang Jialin's DT Big Data course.

Video for Lecture 120 (51CTO):

http://edu.51cto.com/lesson/id-77705.html

0 0

php模拟get提交数据

php模拟get提交数据

原创粉丝点击

热门问题 老师的惩罚人脸识别我在镇武司摸鱼那些年重生之率土为王我在大康的咸鱼生活盘龙之生命进化天生仙种凡人之先天五行春回大明朝姑娘不必设防，我是瞎子荨麻疹能吃山药吗山药炒黑木耳淮山是山药吗胡萝卜山药排骨汤乌鸡山药汤的做法山药和胡萝卜山药可以减肥吗山药可以炖排骨吗淮山是什么淮山的功效淮山淮山的功效与作用淮山怎么做好吃怀山淮山的做法大全家常淮山药淮山药图片淮山怎么炒淮山的做法淮山汤的做法淮山怎么炒好吃三药拔丝拔丝香蕉的做法拔丝香蕉拔丝芋头拔丝红薯拔丝芋头的做法拔丝红薯的做法拔丝鸡蛋淮山不能与什么同吃炒淮山的做法淮山不能和什么一起吃玉米胡萝卜猪骨汤孕妇能不能吃芋头妈妈我要吃烤山药什么人不能吃山药孕妇能不能吃山药淮山药和铁棍山药寻麻疹可以吃山药吗月子里可以吃山药吗