An introduction to YARN memory configuration, with practical and useful advice.
A good article from an external site, worth keeping.
HOW TO PLAN AND CONFIGURE YARN AND MAPREDUCE 2 IN HDP 2.0
by Rohit Bakhshi
As part of HDP 2.0 Beta, YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines. This also streamlines MapReduce to do what it does best: process data. With YARN, you can now run multiple applications in Hadoop, all sharing common resource management.
In this blog post we’ll walk through how to plan for and configure processing capacity in your enterprise HDP 2.0 cluster deployment. This will cover YARN and MapReduce 2. We’ll use an example physical cluster of slave nodes, each with 48 GB RAM, 12 disks and 2 hex-core CPUs (12 total cores).
YARN takes into account all the available compute resources on each machine in the cluster. Based on the available resources, YARN will negotiate resource requests from applications (such as MapReduce) running in the cluster. YARN then provides processing capacity to each application by allocating Containers. A Container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU, etc.).
CONFIGURING YARN
In a Hadoop cluster, it’s vital to balance the usage of RAM, CPU and disk so that processing is not constrained by any one of these cluster resources. As a general recommendation, we’ve found that allowing for 1-2 Containers per disk and per core gives the best balance for cluster utilization. So with our example cluster node with 12 disks and 12 cores, we will allow a maximum of 20 Containers to be allocated to each node.
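The 1-2 Containers per disk and per core heuristic can be sketched as a small calculation. This is illustrative Python, not part of any YARN API; the function name is ours:

```python
def container_count_range(disks: int, cores: int) -> tuple:
    """Rule-of-thumb range for concurrent Containers on one node:
    between 1 and 2 Containers per disk and per core, bounded by
    whichever resource is scarcer."""
    low = min(disks, cores)       # 1 Container per disk/core
    high = 2 * min(disks, cores)  # 2 Containers per disk/core
    return low, high

# For the example node (12 disks, 12 cores) the heuristic allows
# between 12 and 24 Containers; the 20 chosen above sits inside
# this range.
print(container_count_range(12, 12))  # (12, 24)
```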
Each machine in our cluster has 48 GB of RAM. Some of this RAM should be reserved for Operating System usage. On each node, we’ll assign 40 GB of RAM for YARN to use and keep 8 GB for the Operating System. The following property sets the maximum memory YARN can utilize on the node:
In yarn-site.xml:
<name>yarn.nodemanager.resource.memory-mb</name>
<value>40960</value>
The next step is to provide YARN guidance on how to break up the total resources available into Containers. You do this by specifying the minimum unit of RAM to allocate for a Container. We want to allow for a maximum of 20 Containers, and thus need (40 GB total RAM) / (20 Containers) = 2 GB minimum per Container:
In yarn-site.xml:
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
YARN will allocate Containers with RAM amounts equal to or greater than yarn.scheduler.minimum-allocation-mb.
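The sizing above can be checked with a quick back-of-the-envelope calculation (illustrative Python; the variable names are ours):

```python
total_yarn_mb = 40 * 1024   # yarn.nodemanager.resource.memory-mb
max_containers = 20         # target from the disk/core heuristic

# Minimum Container size that still lets 20 Containers fit in 40 GB.
min_allocation_mb = total_yarn_mb // max_containers
print(min_allocation_mb)  # 2048, i.e. yarn.scheduler.minimum-allocation-mb
```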
CONFIGURING MAPREDUCE 2
MapReduce 2 runs on top of YARN and utilizes YARN Containers to schedule and execute its Map and Reduce tasks.
When configuring MapReduce 2 resource utilization on YARN, there are three aspects to consider:
1. The physical RAM limit for each Map and Reduce task
2. The JVM heap size limit for each task
3. The amount of virtual memory each task will get
You can define the maximum amount of memory each Map and Reduce task will take. Since each Map and each Reduce task will run in a separate Container, these maximum memory settings should be equal to or greater than the YARN minimum Container allocation.
For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We’ll thus assign 4 GB for Map task Containers, and 8 GB for Reduce task Containers.
In mapred-site.xml:
<name>mapreduce.map.memory.mb</name>
<value>4096</value>
<name>mapreduce.reduce.memory.mb</name>
<value>8192</value>
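As a sanity check, the task sizes chosen above satisfy the constraint from the previous paragraph (illustrative Python; note also that schedulers typically round memory requests up to a multiple of the minimum allocation):

```python
min_allocation_mb = 2048          # yarn.scheduler.minimum-allocation-mb
map_mb, reduce_mb = 4096, 8192    # mapreduce.{map,reduce}.memory.mb

# Both task sizes must be at least the minimum Container allocation...
assert map_mb >= min_allocation_mb and reduce_mb >= min_allocation_mb
# ...and here they are exact multiples of it, so no memory is wasted
# when requests are rounded up to the nearest multiple.
assert map_mb % min_allocation_mb == 0
assert reduce_mb % min_allocation_mb == 0
print("task sizes are consistent with the minimum allocation")
```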
Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set lower than the Map and Reduce memory defined above, so that the tasks stay within the bounds of the Container memory allocated by YARN.
In mapred-site.xml:
<name>mapreduce.map.java.opts</name>
<value>-Xmx3072m</value>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx6144m</value>
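The heap sizes above leave headroom inside each Container for non-heap JVM memory. In this example the heap is 75% of the Container size; that fraction is our observation from the numbers, not a YARN setting (illustrative Python):

```python
def heap_for_container(container_mb: int, fraction: float = 0.75) -> int:
    """Suggested -Xmx (in MB) for a task in a Container of container_mb,
    leaving (1 - fraction) for stacks, native buffers, metaspace, etc."""
    return int(container_mb * fraction)

print(heap_for_container(4096))  # 3072 -> -Xmx3072m for Map tasks
print(heap_for_container(8192))  # 6144 -> -Xmx6144m for Reduce tasks
```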
The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use. The virtual memory (physical + paged memory) upper limit for each Map and Reduce task is determined by the virtual memory ratio each YARN Container is allowed. This is set by the following configuration, and the default value is 2.1:
In yarn-site.xml:
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
Thus, with the above settings on our example cluster, each Map task will get the following memory allocations:
· Total physical RAM allocated = 4 GB
· JVM heap space upper limit within the Map task Container = 3 GB
· Virtual memory upper limit = 4 GB * 2.1 = 8.4 GB
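The three limits for a Map task follow directly from the settings (illustrative Python):

```python
map_container_mb = 4096      # mapreduce.map.memory.mb
map_heap_mb = 3072           # from -Xmx3072m
vmem_pmem_ratio = 2.1        # yarn.nodemanager.vmem-pmem-ratio

physical_gb = map_container_mb / 1024       # 4.0 GB physical RAM
heap_gb = map_heap_mb / 1024                # 3.0 GB JVM heap
virtual_gb = physical_gb * vmem_pmem_ratio  # 8.4 GB virtual memory
print(physical_gb, heap_gb, round(virtual_gb, 1))
```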
With YARN and MapReduce 2, there are no longer pre-configured static slots for Map and Reduce tasks. The entire cluster is available for dynamic resource allocation of Maps and Reduces as needed by the job. In our example cluster, with the above configurations, YARN will be able to allocate on each node up to 10 Mappers (40/4) or 5 Reducers (40/8), or any combination of the two within those limits.
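The per-node task capacity can likewise be computed (illustrative Python):

```python
node_yarn_mb = 40 * 1024                  # memory YARN manages per node

mappers_per_node = node_yarn_mb // 4096   # 10 Map tasks at 4 GB each
reducers_per_node = node_yarn_mb // 8192  # 5 Reduce tasks at 8 GB each
print(mappers_per_node, reducers_per_node)

# Any mix works as long as the total memory fits, e.g. 6 Maps + 2 Reduces:
assert 6 * 4096 + 2 * 8192 <= node_yarn_mb
```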
NEXT STEPS
With HDP 2.0 Beta, you can use Apache Ambari to configure YARN and MapReduce 2. Download HDP 2.0 Beta and deploy today!