YARN Memory Management

Source: Internet; Posted by: 如何向妈妈介绍云计算; Editor: 程序博客网; Time: 2024/04/28 21:15

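Summary 1:
Memory configuration covers the following aspects.
The example values below are taken from the gdc configuration.
(1) Memory and virtual memory available to containers on each node
NM memory resources are configured mainly through the following two parameters (both are YARN platform settings and belong in yarn-site.xml):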
yarn.nodemanager.resource.memory-mb 94208
yarn.nodemanager.vmem-pmem-ratio    2.1
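Notes: the first parameter is the maximum memory available to containers on each node; neither of the two RM values (the minimum and maximum container allocation) should exceed it. It can also be used to compute the maximum number of containers on a node: divide it by the RM minimum container memory. The second parameter is the virtual memory ratio, that is, the ratio of virtual memory to the physical memory a task is allocated. The default is 2.1, so each task may use at most 2.1 times its physical memory as virtual memory. Note: the first parameter cannot be changed dynamically; once set, it stays fixed for the whole run. Its default is 8 GB, and YARN assumes 8 GB even if the machine has less than 8 GB of memory.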

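(2) Maximum and minimum physical memory limits for each Map and Reduce task
RM memory resources are configured mainly through the following two parameters (both are YARN platform settings and belong in yarn-site.xml):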
yarn.scheduler.minimum-allocation-mb   2048
yarn.scheduler.maximum-allocation-mb 8192
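Notes: these are the minimum and maximum memory a single container can request. A running application cannot request more than the maximum, and a request below the minimum is granted the minimum. Seen this way, the minimum is somewhat like a page in an operating system. The minimum has one more use: computing the maximum number of containers on a node. Note: once set, these two values cannot be changed dynamically (that is, while an application is running).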
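
The "page" analogy above can be sketched in a few lines. This is a minimal illustration, not YARN's actual scheduler code: the function name `normalize_request_mb` is hypothetical, and rounding up to a whole multiple of the minimum (rather than only raising small requests to the minimum) is an assumption about how the scheduler in use behaves.

```python
import math


def normalize_request_mb(requested_mb: int, min_alloc_mb: int, max_alloc_mb: int) -> int:
    """Round a container memory request up to a multiple of the minimum allocation.

    Assumption: requests above the maximum are rejected, and the rounded
    result is capped at the maximum.
    """
    if requested_mb > max_alloc_mb:
        raise ValueError(
            f"{requested_mb} MB exceeds the maximum allocation of {max_alloc_mb} MB"
        )
    # A request always gets at least one "page" of min_alloc_mb.
    pages = max(1, math.ceil(requested_mb / min_alloc_mb))
    return min(pages * min_alloc_mb, max_alloc_mb)


# Values from the example configuration: minimum 2048 MB, maximum 8192 MB.
for request in (1000, 3000, 8192):
    print(request, "->", normalize_request_mb(request, 2048, 8192))
```

With the example values, a 1000 MB request is granted 2048 MB and a 3000 MB request is granted 4096 MB under this assumption.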

(3) Memory actually used by each task
AM memory parameters, using MapReduce as the example (both are AM settings and belong in mapred-site.xml):
mapreduce.map.memory.mb   4096
mapreduce.reduce.memory.mb   8192
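Notes: these two parameters set the memory size of the two MapReduce task types (Map and Reduce tasks). Their values should lie between the RM minimum and maximum container sizes. If they are not configured, a value can be derived from the following simple formula:
max(MIN_CONTAINER_SIZE, (Total Available RAM) / containers)
In general, reduce should be twice map. Note: both values can be overridden with parameters when the application is launched.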

(4) Memory used by the JVM inside each task
The AM has further memory-related parameters for the JVM, which can be set through the following options:
mapreduce.map.java.opts    -Xmx3072m
mapreduce.reduce.java.opts   -Xmx6144m
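Notes: these two parameters are mainly intended for programs that run on the JVM (Java, Scala, etc.). They pass options to the JVM; the memory-related ones are -Xmx, -Xms and similar. The heap size set here should be smaller than the AM's map.mb and reduce.mb values respectively.
Besides the -Xmx heap, the permanent generation and the thread stacks also need memory. If deep call stacks push stack memory plus JVM heap past the defined maximum memory, the task is killed outright.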
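
The relationship between the container size and the JVM heap can be checked with a short script. This is a rough sketch under stated assumptions: the helper names `xmx_mb` and `heap_headroom_mb` are made up for illustration, and only the `-Xmx<number><k|m|g>` form is parsed.

```python
import re


def xmx_mb(java_opts: str) -> int:
    """Extract the -Xmx heap ceiling from a java.opts string, in MB."""
    match = re.search(r"-Xmx(\d+)([kKmMgG])", java_opts)
    if match is None:
        raise ValueError("no -Xmx option found")
    value = int(match.group(1))
    unit = match.group(2).lower()
    factor = {"k": 1 / 1024, "m": 1, "g": 1024}[unit]
    return int(value * factor)


def heap_headroom_mb(container_mb: int, java_opts: str) -> int:
    """Memory left in the container for non-heap use (stacks, permanent generation, ...)."""
    return container_mb - xmx_mb(java_opts)


# Example values from the configuration above.
print(heap_headroom_mb(4096, "-Xmx3072m"))  # map task
print(heap_headroom_mb(8192, "-Xmx6144m"))  # reduce task
```

In the example configuration the heap takes three quarters of each container, leaving 1024 MB and 2048 MB for everything outside the heap. A negative headroom would mean the heap alone could exceed the container limit.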

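Therefore:
(1) The theoretical maximum number of tasks a node can run is:
yarn.nodemanager.resource.memory-mb / yarn.scheduler.minimum-allocation-mb
(2) In practice, if only map tasks are running, the number of map tasks that can run is:
yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb
The parameter mapreduce.map.memory.mb can of course be specified when the job is run.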
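
Plugging the gdc example values into the two divisions above gives concrete numbers. The snippet below is only arithmetic; the function name `max_tasks_per_node` is hypothetical, and integer division is used on the assumption that a partial container cannot be allocated.

```python
def max_tasks_per_node(node_mb: int, per_task_mb: int) -> int:
    """Number of whole containers of per_task_mb that fit into node_mb."""
    return node_mb // per_task_mb


NODE_MB = 94208       # yarn.nodemanager.resource.memory-mb
MIN_ALLOC_MB = 2048   # yarn.scheduler.minimum-allocation-mb
MAP_MB = 4096         # mapreduce.map.memory.mb
REDUCE_MB = 8192      # mapreduce.reduce.memory.mb

print("theoretical maximum:", max_tasks_per_node(NODE_MB, MIN_ALLOC_MB))
print("map tasks only:", max_tasks_per_node(NODE_MB, MAP_MB))
print("reduce tasks only:", max_tasks_per_node(NODE_MB, REDUCE_MB))
```

With these values a node fits at most 46 minimum-size containers, 23 map tasks, or 11 reduce tasks.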

Summary 2:
As a general recommendation, we’ve found that allowing for 1-2 Containers per disk and per core gives the best balance for cluster utilization. So with our example cluster node with 12 disks and 12 cores, we will allow for 20 maximum Containers to be allocated to each node.
＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝

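Introduction: Hadoop YARN supports scheduling of two resource types, memory and CPU (only memory is scheduled by default; scheduling CPU as well requires some extra configuration). This article describes how YARN schedules and isolates these resources.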

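In YARN, resource management is carried out jointly by the ResourceManager and the NodeManager: the scheduler inside the ResourceManager is responsible for allocating resources, while the NodeManager is responsible for supplying and isolating them. After the ResourceManager assigns resources on some NodeManager to a task (this is "resource scheduling"), the NodeManager must provide the task with those resources as requested, and even guarantee that the task has exclusive use of them, giving the task a basic guarantee for running. This is resource isolation.

Before describing resource scheduling and isolation in detail, consider how memory and CPU differ, since they are resources of a different nature. The amount of memory decides whether a task lives or dies: with too little memory the task may fail. CPU is different: it only decides how fast a task runs and does not affect whether it survives.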

Appendix: http://blog.chinaunix.net/uid-28311809-id-4383551.html

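This post mainly covers YARN's improvements over MRv1, basic YARN memory configuration, and the container, YARN's resource abstraction.
The main problem with MRv1 is well known: at run time the JobTracker is responsible for both resource management and task scheduling, which leads to poor scalability, low resource utilization and related problems. These problems come from its original design, shown in the figure below: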

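As the figure above shows, MRv1 is built around MapReduce and gave little thought to other data processing models that might appear later. Following that design, every new data processing model (Spark, for example) would have to reimplement its own cluster resource management and data processing. YARN was the natural response.
YARN's biggest improvement over MRv1 is separating resource management from task scheduling, so that different data processing models can share resource management, as shown in the figure below: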

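As the figure above shows, YARN is a unified resource management layer split out of the MRv1 JobTracker. The benefits are clear: shared resources, good scalability, and so on.
The main difference between MRv1 and YARN: in MRv1 the JobTracker handles both resource management and job control, whereas in YARN the JobTracker is split into two parts, the ResourceManager (RM) and the ApplicationMaster (AM), as shown in the figure below: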

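The figure above shows clearly that in MRv1 both resource management and task scheduling are done by the JobTracker, which overloads it and makes it hard to manage and scale. In YARN, resource management and task scheduling are split into two parts: the RM and the AM.
Effect of the differences between YARN and MRv1 on programming: MRv1 consists of three main parts: the programming model (API), the data processing engine (MapTask and ReduceTask) and the runtime environment (JobTracker and TaskTracker). YARN inherits MRv1's programming model and data processing engine and changes only the runtime environment, so programming is essentially unaffected.
To explain YARN resource management more clearly, first look at the YARN architecture, shown in the figure below: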

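As the figure above shows, when a client submits a job to the RM, the AM is responsible for requesting resources from the RM and for asking the NodeManager (NM) to execute tasks. In other words, the RM handles resource scheduling and the AM handles task scheduling. A few important points: the RM manages and schedules resources for the whole cluster; the NodeManager (NM) manages and schedules resources on a single node; the NM communicates with the RM periodically through heartbeats, reporting the node's health and memory usage; the AM obtains resources by talking to the RM and then starts compute tasks by talking to the NM.
The following explains the above in detail through the memory resource configuration: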

RM memory resources are configured mainly through the following two parameters (both are YARN platform settings and belong in yarn-site.xml):
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
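Notes: these are the minimum and maximum memory a single container can request. A running application cannot request more than the maximum, and a request below the minimum is granted the minimum. Seen this way, the minimum is somewhat like a page in an operating system. The minimum has one more use: computing the maximum number of containers on a node. Note: once set, these two values cannot be changed dynamically (that is, while an application is running).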

NM memory resources are configured mainly through the following two parameters (both are YARN platform settings and belong in yarn-site.xml):
yarn.nodemanager.resource.memory-mb
yarn.nodemanager.vmem-pmem-ratio
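Notes: the first parameter is the maximum memory available on each node; neither of the two RM values should exceed it. It can also be used to compute the maximum number of containers: divide it by the RM minimum container memory. The second parameter is the virtual memory ratio, that is, the ratio of virtual memory to the physical memory a task is allocated; the default is 2.1. Note: the first parameter cannot be changed dynamically; once set, it stays fixed for the whole run. Its default is 8 GB, and YARN assumes 8 GB even if the machine has less than 8 GB of memory.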

AM memory parameters, using MapReduce as the example (both are AM settings and belong in mapred-site.xml):
mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
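Notes: these two parameters set the memory size of the two MapReduce task types (Map and Reduce tasks). Their values should lie between the RM minimum and maximum container sizes. If they are not configured, a value can be derived from the following simple formula:
max(MIN_CONTAINER_SIZE, (Total Available RAM) / containers)
In general, reduce should be twice map. Note: both values can be overridden with parameters when the application is launched.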

The AM has further memory-related parameters for the JVM, which can be set through the following options:
mapreduce.map.java.opts
mapreduce.reduce.java.opts
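Notes: these two parameters are mainly intended for programs that run on the JVM (Java, Scala, etc.). They pass options to the JVM; the memory-related ones are -Xmx, -Xms and similar. The heap size set here should be smaller than the AM's map.mb and reduce.mb values respectively.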

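To summarize the above, configuring YARN memory mainly means configuring three things: the physical memory limit for each Map and Reduce task; the JVM heap size limit for each task; and the virtual memory limit.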

The following concrete error illustrates how these memory settings interact. The error is:
Container[pid=41884,containerID=container_1405950053048_0016_01_000284] is running beyond virtual memory limits. Current usage: 314.6 MB of 2.9 GB physical memory used; 8.7 GB of 6.2 GB virtual memory used. Killing container.
The configuration is:

<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>100000</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>10000</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>3000</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2000</value>
</property>
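From this configuration, the minimum and maximum container memory are 3000 MB and 10000 MB. The reduce memory is set to 2000 MB, which is below the minimum, and the map memory is not set, so both end up at 3000 MB; this is the "2.9 GB physical memory" in the log. Because the default virtual memory ratio (2.1) is in use, the total virtual memory available to each Map task and each Reduce task is 3000 MB * 2.1 = 6300 MB, about 6.2 GB. The application's virtual memory usage exceeded this figure, hence the error. Fix: adjust the virtual memory ratio when starting YARN, or adjust the memory size when running the application.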
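
The numbers in the log message can be reproduced with a few lines of arithmetic. This is an illustrative sketch only; the function names are hypothetical and the 1024 MB-per-GB conversion is an assumption about how the log rounds its figures.

```python
MB_PER_GB = 1024


def vmem_limit_mb(container_mb: float, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual memory ceiling for a container, in MB."""
    return container_mb * vmem_pmem_ratio


def exceeds_vmem_limit(used_vmem_gb: float, container_mb: float,
                       vmem_pmem_ratio: float = 2.1) -> bool:
    """True if the observed virtual memory usage is above the container's ceiling."""
    return used_vmem_gb * MB_PER_GB > vmem_limit_mb(container_mb, vmem_pmem_ratio)


container_mb = 3000  # reduce asked for 2000 MB, raised to the 3000 MB minimum
limit_mb = vmem_limit_mb(container_mb)
print(f"physical limit: {container_mb / MB_PER_GB:.1f} GB")
print(f"virtual limit:  {limit_mb / MB_PER_GB:.1f} GB")
print("killed:", exceeds_vmem_limit(8.7, container_mb))
```

The 8.7 GB of virtual memory in use is well above the roughly 6.2 GB ceiling, which is why the container is killed even though physical usage is only 314.6 MB.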

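In the YARN architecture described above, both the AM requesting resources from the RM and the NM managing the resources of its own node work through containers. The container is YARN's resource abstraction; the resources here include memory, CPU and so on. The container is described in more detail below. To give a concrete picture of it, first look at the figure below: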

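As the figure above shows, the AM first requests resources from the RM with a ResourceRequest. Once it has the resources, the AM wraps them in a ContainerLaunchContext object and uses that object to communicate with the NM in order to start the task. The Protocol Buffers definitions of ResourceRequest, Container and ContainerLaunchContext are analyzed below.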

The ResourceRequest structure is:
message ResourceRequestProto {
  optional PriorityProto priority = 1;               // priority of the request
  optional string resource_name = 2;                 // host the resource should preferably come from
  optional ResourceProto capability = 3;             // amount of resource (memory, CPU)
  optional int32 num_containers = 4;                 // number of containers matching these conditions
  optional bool relax_locality = 5 [default = true];
}
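Brief notes on the structure above, by field number:
2: the host the application would like the resource to come from when it submits the request; the final placement is still negotiated between the AM and the RM.
3: covers only two resource types, memory and CPU, requested in the form <memory_num, cpu_num>.
Notes: 1. Because fields 2 and 4 place no limit on the amount requested, an application can in principle request an unlimited amount of resources. 2. YARN uses overwrite-style resource requests: each request an AM sends replaces any earlier request for the same node at the same priority, so only one request per node and priority can be outstanding at a time.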
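
The overwrite-style request semantics described in note 2 can be modelled with a table keyed by priority and node. This is a toy model of the behaviour as described in the text, not YARN code: `ResourceRequest` here is a plain Python tuple mirroring the protobuf fields, and `PendingRequests` is a hypothetical name.

```python
from typing import Dict, List, NamedTuple, Tuple


class ResourceRequest(NamedTuple):
    priority: int
    resource_name: str   # preferred host
    memory_mb: int
    vcores: int
    num_containers: int


class PendingRequests:
    """Keeps one outstanding request per (priority, node); newer requests overwrite older ones."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[int, str], ResourceRequest] = {}

    def submit(self, request: ResourceRequest) -> None:
        # Same priority and same node: the earlier request is replaced, not added to.
        self._table[(request.priority, request.resource_name)] = request

    def outstanding(self) -> List[ResourceRequest]:
        return list(self._table.values())


pending = PendingRequests()
pending.submit(ResourceRequest(1, "node1", 2048, 1, num_containers=5))
pending.submit(ResourceRequest(1, "node1", 2048, 1, num_containers=3))  # replaces the first
pending.submit(ResourceRequest(2, "node1", 4096, 1, num_containers=1))  # different priority, kept
for request in pending.outstanding():
    print(request)
```

After the three submissions only two requests remain, and the priority-1 request for node1 asks for 3 containers rather than 8: the second request replaced the first instead of being added to it.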

The Container structure is:
message ContainerProto {
  optional ContainerIdProto id = 1;                        // container id
  optional NodeIdProto nodeId = 2;                         // node where the container (resource) lives
  optional string node_http_address = 3;
  optional ResourceProto resource = 4;                     // amount of resource allocated to the container
  optional PriorityProto priority = 5;                     // priority of the container
  optional hadoop.common.TokenProto container_token = 6;   // container token, used for security authentication
}
Note: each container generally runs one task. When the AM receives multiple containers, it assigns each of them to a particular task, as MapReduce does.

The ContainerLaunchContext structure is:

message ContainerLaunchContextProto {
  repeated StringLocalResourceMapProto localResources = 1;  // resources the program in this container needs, e.g. jar files
  optional bytes tokens = 2;                                // security tokens, used in secure mode
  repeated StringBytesMapProto service_data = 3;
  repeated StringStringMapProto environment = 4;            // environment variables needed to start the container
  repeated string command = 5;                              // command that runs the program, e.g. for a Java program: $JAVA_HOME/bin/java org.ourclass
  repeated ApplicationACLMapProto application_ACLs = 6;     // access control list of the application this container belongs to
}
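The following code fragment illustrates this, using only ContainerLaunchContext as the example (a simple finite state machine would have made it easier to follow, but time did not allow):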

// Create a new ContainerLaunchContext:
ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
// Fill in the required information:
ctx.setEnvironment(...);
childRsrc.setResource(...);
ctx.setLocalResources(...);
ctx.setCommands(...);
// Launch the task:
startReq.setContainerLaunchContext(ctx);

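To summarize containers: the container is YARN's resource abstraction and encapsulates some of a node's resources, mainly CPU and memory. Containers are requested by the AM from the RM; running one is initiated by the AM towards the NM that holds the resources, and the container finally runs there. There are two kinds of container: the one the AM itself needs in order to run, and those the AM requests from the RM in order to execute tasks.
This article is from: http://blog.chinaunix.net/uid/28311809/abstract/1.html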

See also:

By Rohit Bakhshi on September 10th, 2013
As part of HDP 2.0 Beta, YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines. This also streamlines MapReduce to do what it does best, process data. With YARN, you can now run multiple applications in Hadoop, all sharing a common resource management.

In this blog post we’ll walk through how to plan for and configure processing capacity in your enterprise HDP 2.0 cluster deployment. This will cover YARN and MapReduce 2. We’ll use an example physical cluster of slave nodes each with 48 GB RAM, 12 disks and 2 hex-core CPUs (12 total cores).

YARN takes into account all the available compute resources on each machine in the cluster. Based on the available resources, YARN will negotiate resource requests from applications (such as MapReduce) running in the cluster. YARN then provides processing capacity to each application by allocating Containers. A Container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU etc.).

Configuring YARN

In a Hadoop cluster, it’s vital to balance the usage of RAM, CPU and disk so that processing is not constrained by any one of these cluster resources. As a general recommendation, we’ve found that allowing for 1-2 Containers per disk and per core gives the best balance for cluster utilization. So with our example cluster node with 12 disks and 12 cores, we will allow for 20 maximum Containers to be allocated to each node.

Each machine in our cluster has 48 GB of RAM. Some of this RAM should be reserved for Operating System usage. On each node, we’ll assign 40 GB RAM for YARN to use and keep 8 GB for the Operating System. The following property sets the maximum memory YARN can utilize on the node:

In yarn-site.xml

<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>40960</value>
</property>
The next step is to provide YARN guidance on how to break up the total resources available into Containers. You do this by specifying the minimum unit of RAM to allocate for a Container. We want to allow for a maximum of 20 Containers, and thus need (40 GB total RAM) / (20 # of Containers) = 2 GB minimum per container:

In yarn-site.xml

<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
</property>
YARN will allocate Containers with RAM amounts greater than or equal to yarn.scheduler.minimum-allocation-mb.
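
The sizing steps for the example node can be written out as plain arithmetic. This is a small sketch of the calculation described above; the function names are made up for illustration.

```python
MB_PER_GB = 1024


def yarn_memory_mb(total_ram_gb: int, reserved_for_os_gb: int) -> int:
    """Memory handed to YARN on a node after reserving some for the operating system."""
    return (total_ram_gb - reserved_for_os_gb) * MB_PER_GB


def minimum_allocation_mb(yarn_mb: int, max_containers: int) -> int:
    """Smallest container size that yields at most max_containers per node."""
    return yarn_mb // max_containers


node_mb = yarn_memory_mb(total_ram_gb=48, reserved_for_os_gb=8)
print("yarn.nodemanager.resource.memory-mb  =", node_mb)
print("yarn.scheduler.minimum-allocation-mb =", minimum_allocation_mb(node_mb, 20))
```

For the 48 GB example node this yields the 40960 MB and 2048 MB values used in the two yarn-site.xml settings.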

Configuring MapReduce 2

MapReduce 2 runs on top of YARN and utilizes YARN Containers to schedule and execute its map and reduce tasks.

When configuring MapReduce 2 resource utilization on YARN, there are three aspects to consider:

Physical RAM limit for each Map And Reduce task
The JVM heap size limit for each task
The amount of virtual memory each task will get
You can define the maximum memory each Map and Reduce task will take. Since each Map and each Reduce will run in a separate Container, these maximum memory settings should be equal to or greater than the YARN minimum Container allocation.

For our example cluster, we have the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) = 2 GB. We’ll thus assign 4 GB for Map task Containers, and 8 GB for Reduce task Containers.

In mapred-site.xml:

<property>
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>8192</value>
</property>
Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set to lower than the Map and Reduce memory defined above, so that they are within the bounds of the Container memory allocated by YARN.

In mapred-site.xml:

<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx3072m</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx6144m</value>
</property>
The above settings configure the upper limit of the physical RAM that Map and Reduce tasks will use. The virtual memory (physical + paged memory) upper limit for each Map and Reduce task is determined by the virtual memory ratio each YARN Container is allowed. This is set by the following configuration, and the default value is 2.1:

In yarn-site.xml:

<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>
Thus, with the above settings on our example cluster, each Map task will get the following memory allocations:

Total physical RAM allocated = 4 GB
JVM heap space upper limit within the Map task Container = 3 GB
Virtual memory upper limit = 4*2.1 = 8.4 GB
With YARN and MapReduce 2, there are no longer pre-configured static slots for Map and Reduce tasks. The entire cluster is available for dynamic resource allocation of Maps and Reduces as needed by the job. In our example cluster, with the above configurations, YARN will be able to allocate on each node up to 10 mappers (40/4) or 5 reducers (40/8) or a permutation within that.
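
The mixes of mappers and reducers that fit on one node can be enumerated directly. The sketch below is illustrative only: `feasible_mixes` is a hypothetical helper, and it considers memory alone, ignoring CPU and any other scheduler constraints.

```python
from typing import List, Tuple


def feasible_mixes(node_mb: int, map_mb: int, reduce_mb: int) -> List[Tuple[int, int]]:
    """For each possible reducer count, the largest number of mappers that still fits.

    Returns (mappers, reducers) pairs.
    """
    mixes = []
    for reducers in range(node_mb // reduce_mb + 1):
        remaining_mb = node_mb - reducers * reduce_mb
        mixes.append((remaining_mb // map_mb, reducers))
    return mixes


# Example node: 40 GB for YARN, 4 GB map containers, 8 GB reduce containers.
for mappers, reducers in feasible_mixes(40960, 4096, 8192):
    print(f"{mappers} mappers + {reducers} reducers")
```

The extremes are the 10 mappers or 5 reducers mentioned above; in between, every reducer added costs two mapper slots.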

Next Steps

With HDP 2.0 Beta, you can use Apache Ambari to configure YARN and MapReduce 2. Download HDP 2.0 Beta and deploy today!
