Tuning the Cluster for MapReduce v2 (YARN)

This topic applies to YARN clusters only, and describes how to tune and optimize YARN for your cluster. It introduces the following terms:

  • ResourceManager: A master daemon that authorizes submitted jobs to run, assigns an ApplicationMaster to them, and enforces resource limits.
  • NodeManager: A worker daemon that launches ApplicationMaster and task containers.
  • ApplicationMaster: A supervisory task that requests the resources needed for executor tasks. An ApplicationMaster runs on a different NodeManager for each application. The ApplicationMaster requests containers, which are sized by the resources a task requires to run.
  • vcore: Virtual CPU core; a logical unit of processing power. In a basic case, it is equivalent to a physical CPU core or hyperthreaded virtual CPU core.
  • Container: A resource bucket and process space for a task. A container’s resources consist of vcores and memory.

Identifying Hardware Resources and Service Demand

Begin YARN tuning by comparing hardware resources on the worker node to the sum demand of the worker services you intend to run. First, determine how many vcores, how much memory, and how many spindles are available for Hadoop operations on each worker node. Then, estimate service demand, or the resources needed to run a YARN NodeManager and HDFS DataNode process. There may be other Hadoop services that do not subscribe to YARN, including:
  • Impalad
  • HBase RegionServer
  • Solr

Worker nodes also run the Linux operating system, system support services, and possibly third-party monitoring or asset management services.
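
For a quick hardware inventory, you can run a few standard Linux commands on a worker node. The following is a minimal sketch; adjust the lsblk filter to match the disks actually dedicated to DataNode storage:

#!/bin/sh
# Logical CPU cores available on the node (the basic definition of a vcore).
nproc

# Total physical memory, in MB.
free -m | awk '/^Mem:/ {print $2}'

# Physical disks on the node; count the ones dedicated to DataNode data directories.
lsblk -d -o NAME,SIZE,TYPE | grep disk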

Estimating and Configuring Resource Requirements

After identifying hardware and software services, you can estimate the CPU cores and memory each service requires. The difference between the hardware complement and this sum is the amount of resources you can assign to YARN without creating contention. Cloudera recommends starting with these estimates:
  • 10-20% of RAM for Linux and its daemon services
  • At least 16 GB RAM for an Impalad process
  • No more than 12-16 GB RAM for an HBase RegionServer process

In addition, you must allow resources for task buffers, such as the HDFS Sort I/O buffer. For vcore demand, consider the number of concurrent processes or tasks each service runs as an initial guide. For the operating system, start with a count of two.

The following table shows example demand estimates for a worker node with 24 vcores and 256 GB of memory. Services that are not expected to run are allocated zero resources.
Table 1. Resource Demand Estimates: 24 vcores, 256 GB RAM

Service                | vcores | Memory (MB)
Operating system       | 2      | 52,429
YARN NodeManager       | 1      | 1,024
HDFS DataNode          | 1      | 1,024
Impala Daemon          | 1      | 16,384
HBase RegionServer     | 0      | 0
Solr Server            | 0      | 0
Cloudera Manager agent | 1      | 1,024
Task overhead          | 0      | 52,429
YARN containers        | 18     | 137,830
Total                  | 24     | 262,144

You can now configure YARN to use the remaining resources for its supervisory processes and task containers. Start with the NodeManager, which has the following settings:

Table 2. NodeManager Properties

Property                             | Description                                                              | Default
yarn.nodemanager.resource.cpu-vcores | Number of virtual CPU cores that can be allocated for containers.       | 8
yarn.nodemanager.resource.memory-mb  | Amount of physical memory, in MB, that can be allocated for containers. | 8 GB

Hadoop is a disk I/O-centric platform by design. The number of independent physical drives (“spindles”) dedicated to DataNode use limits how much concurrent processing a node can sustain. As a result, the number of vcores allocated to the NodeManager should be the lesser of either:
  • (total vcores) – (number of vcores reserved for non-YARN use), or
  • 2 x (number of physical disks used for DataNode storage)
The RAM allotted to a NodeManager for spawning containers should be the node's physical RAM minus all non-YARN memory demand:

yarn.nodemanager.resource.memory-mb = total memory on the node - (sum of memory allocated to other processes, such as the DataNode, NodeManager, RegionServer, and so on)

For the example node, assuming the DataNode uses 10 physical drives, the calculation is:
Table 3. NodeManager RAM Calculation

Property                             | Value
yarn.nodemanager.resource.cpu-vcores | min(24 - 6, 2 x 10) = 18
yarn.nodemanager.resource.memory-mb  | 137,830 MB
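
The same arithmetic can be scripted. The following sketch plugs in the example node's numbers; the reserved totals are simply the non-YARN vcore and memory demands summed from Table 1:

#!/bin/sh
# Example-node hardware (Table 1).
TOTAL_VCORES=24
TOTAL_MEM_MB=262144
DATANODE_DISKS=10

# Non-YARN demand summed from Table 1 (OS, NodeManager, DataNode, Impala,
# Cloudera Manager agent, and task overhead).
RESERVED_VCORES=6
RESERVED_MEM_MB=124314

# yarn.nodemanager.resource.cpu-vcores: the lesser of the free vcores and 2 x spindles.
FREE_VCORES=$((TOTAL_VCORES - RESERVED_VCORES))
SPINDLE_LIMIT=$((2 * DATANODE_DISKS))
NM_VCORES=$FREE_VCORES
[ $SPINDLE_LIMIT -lt $NM_VCORES ] && NM_VCORES=$SPINDLE_LIMIT

# yarn.nodemanager.resource.memory-mb: total RAM minus all non-YARN memory demand.
NM_MEM_MB=$((TOTAL_MEM_MB - RESERVED_MEM_MB))

echo "yarn.nodemanager.resource.cpu-vcores = $NM_VCORES"   # 18
echo "yarn.nodemanager.resource.memory-mb  = $NM_MEM_MB"   # 137830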

Sizing the ResourceManager

The ResourceManager enforces limits on YARN container resources and can reject container requests when required. The ResourceManager has six properties that specify the minimum, maximum, and incremental allotments of vcores and memory available for a request.
Table 4. ResourceManager Properties

Property                                   | Description                                                                                                        | Default
yarn.scheduler.minimum-allocation-vcores   | The smallest number of virtual CPU cores that can be requested for a container.                                   | 1
yarn.scheduler.maximum-allocation-vcores   | The largest number of virtual CPU cores that can be requested for a container.                                    | 32
yarn.scheduler.increment-allocation-vcores | If you are using the Fair Scheduler, virtual core requests are rounded up to the nearest multiple of this number. | 1
yarn.scheduler.minimum-allocation-mb       | The smallest amount of physical memory, in MB, that can be requested for a container.                             | 1 GB
yarn.scheduler.maximum-allocation-mb       | The largest amount of physical memory, in MB, that can be requested for a container.                              | 64 GB
yarn.scheduler.increment-allocation-mb     | If you are using the Fair Scheduler, memory requests are rounded up to the nearest multiple of this number.       | 512 MB

If a NodeManager has 50 GB or more of RAM available for containers, consider increasing the minimum memory allocation to 2 GB. The default memory increment is 512 MB; with a 1 GB minimum, a container that requests 1.2 GB is therefore rounded up to 1.5 GB. You can set the maximum memory allocation equal to yarn.nodemanager.resource.memory-mb.

The default minimum and increment value for vcores is 1. Because application tasks are not commonly multithreaded, you generally do not need to change this value. The maximum value is usually equal to yarn.nodemanager.resource.cpu-vcores. Reduce this value to limit the number of containers running concurrently on one node.
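
To make the rounding rule concrete, the following sketch applies it to the 1.2 GB example above; round_allocation is an illustrative helper, not a YARN API:

#!/bin/sh
# Round a memory request up to the nearest multiple of the increment,
# but never below the minimum allocation.
round_allocation() {
  request_mb=$1; min_mb=$2; increment_mb=$3
  rounded=$(( (request_mb + increment_mb - 1) / increment_mb * increment_mb ))
  [ $rounded -lt $min_mb ] && rounded=$min_mb
  echo $rounded
}

round_allocation 1229 1024 512   # a 1.2 GB request becomes 1536 MB (1.5 GB)
round_allocation 300 1024 512    # a request below the minimum becomes 1024 MB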

The example leaves more than 50 GB RAM available for containers, which accommodates the following settings:

Table 5. ResourceManager Calculations

Property                                 | Value
yarn.scheduler.minimum-allocation-mb     | 2,048 MB
yarn.scheduler.maximum-allocation-mb     | 137,830 MB
yarn.scheduler.maximum-allocation-vcores | 18
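
If you maintain configuration files by hand rather than through Cloudera Manager, these values correspond to yarn-site.xml entries like the ones printed by this sketch; the property names are real, while the workflow of editing the file directly is an assumption:

#!/bin/sh
# Print the <property> elements that would go inside the <configuration>
# block of yarn-site.xml for the example node (values from Table 5).
cat <<'EOF'
<property><name>yarn.scheduler.minimum-allocation-mb</name><value>2048</value></property>
<property><name>yarn.scheduler.maximum-allocation-mb</name><value>137830</value></property>
<property><name>yarn.scheduler.maximum-allocation-vcores</name><value>18</value></property>
EOF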

Configuring YARN Settings

You can change the YARN settings that control MapReduce applications. A client can override these values if required, up to the constraints enforced by the ResourceManager or NodeManager. There are nine task settings, three each for mappers, reducers, and the ApplicationMaster itself:
Table 6. Gateway/Client Properties

Property                                  | Description                                                                                                                                                                                                                      | Default
mapreduce.map.memory.mb                   | The amount of physical memory, in MB, allocated for each map task of a job.                                                                                                                                                     | 1 GB
mapreduce.map.java.opts.max.heap          | The maximum Java heap size, in bytes, of the map processes.                                                                                                                                                                     | 800 MB
mapreduce.map.cpu.vcores                  | The number of virtual CPU cores allocated for each map task of a job.                                                                                                                                                           | 1
mapreduce.reduce.memory.mb                | The amount of physical memory, in MB, allocated for each reduce task of a job.                                                                                                                                                  | 1 GB
mapreduce.reduce.java.opts.max.heap       | The maximum Java heap size, in bytes, of the reduce processes.                                                                                                                                                                  | 800 MB
mapreduce.reduce.cpu.vcores               | The number of virtual CPU cores for each reduce task of a job.                                                                                                                                                                  | 1
yarn.app.mapreduce.am.resource.mb         | The physical memory requirement, in MB, for the ApplicationMaster.                                                                                                                                                              | 1 GB
ApplicationMaster Java maximum heap size  | The maximum heap size, in bytes, of the Java MapReduce ApplicationMaster. Exposed in Cloudera Manager as part of the YARN service configuration; the value is folded into the property yarn.app.mapreduce.am.command-opts.      | 800 MB
yarn.app.mapreduce.am.resource.cpu-vcores | The virtual CPU cores requirement for the ApplicationMaster.                                                                                                                                                                    | 1

The mapreduce.[map | reduce].java.opts.max.heap settings specify the default heap size for mapper and reducer tasks, respectively. The mapreduce.[map | reduce].memory.mb settings specify the memory allotted to their containers, and the value assigned should allow for overhead beyond the task heap size; Cloudera recommends sizing the container at roughly 1.2 times the mapreduce.[map | reduce].java.opts.max.heap setting (equivalently, a heap of about 0.8 times the container memory). The optimal value depends on the actual tasks. Cloudera also recommends setting mapreduce.map.memory.mb to 1-2 GB and setting mapreduce.reduce.memory.mb to twice the mapper value. The ApplicationMaster heap size is 1 GB by default, and can be increased if your jobs contain many concurrent tasks. Using these guidelines, size the example worker node as follows:

Table 7. Gateway/Client Calculations

Property                            | Value
mapreduce.map.memory.mb             | 2,048 MB
mapreduce.reduce.memory.mb          | 4,096 MB
mapreduce.map.java.opts.max.heap    | 0.8 x 2,048 = 1,638 MB
mapreduce.reduce.java.opts.max.heap | 0.8 x 4,096 = 3,277 MB
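
Clients can pass these values per job. As an example, the following sketch runs the stock Pi example with the Table 7 settings; the parcel path is an assumption for a CDH install, and it uses the standard mapreduce.[map | reduce].java.opts property to set the task heap directly rather than the Cloudera Manager max.heap setting:

#!/bin/sh
# Submit the stock Pi example with per-job overrides for container and heap sizes.
# Adjust the examples jar path for your installation.
EXAMPLES_JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

hadoop jar $EXAMPLES_JAR pi \
  -Dmapreduce.map.memory.mb=2048 \
  -Dmapreduce.reduce.memory.mb=4096 \
  -Dmapreduce.map.java.opts=-Xmx1638m \
  -Dmapreduce.reduce.java.opts=-Xmx3277m \
  10 1000   # 10 mappers, 1,000 samples each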

Defining Containers

With YARN worker resources configured, you can determine how many containers best support a MapReduce application, based on the job type and system resources. For example, a CPU-bound workload such as a Monte Carlo simulation requires very little data but complex, iterative processing, so its ratio of concurrent containers per spindle is likely higher than for an ETL workload, which tends to be I/O-bound. For applications that use a lot of memory in the map or reduce phase, the number of containers that can be scheduled is limited by the RAM available to each container and the RAM required by the task. Other applications may be limited by the vcores not in use by other YARN applications, or by the rules of dynamic resource pools (if used).

To calculate the number of containers for mappers and reducers based on actual system constraints, start with the following formulas:

Table 8. Container Formulas

Property              | Value
mapreduce.job.maps    | MIN(yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb, yarn.nodemanager.resource.cpu-vcores / mapreduce.map.cpu.vcores, number of physical drives x workload factor) x number of worker nodes
mapreduce.job.reduces | MIN(yarn.nodemanager.resource.memory-mb / mapreduce.reduce.memory.mb, yarn.nodemanager.resource.cpu-vcores / mapreduce.reduce.cpu.vcores, number of physical drives x workload factor) x number of worker nodes

The workload factor can be set to 2.0 for most workloads. Consider a higher setting for CPU-bound workloads.
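
Applied to the example worker node (137,830 MB and 18 vcores per NodeManager, 10 drives, a workload factor of 2), the mapper formula works out as in the following sketch; the one-node cluster size is purely illustrative:

#!/bin/sh
# Worked example of the mapreduce.job.maps formula for the example node.
NM_MEM_MB=137830; NM_VCORES=18; DRIVES=10; WORKLOAD_FACTOR=2
NODES=1                                   # illustrative cluster size
MAP_MEM_MB=2048; MAP_VCORES=1             # mapper settings from Table 7

MEM_LIMIT=$((NM_MEM_MB / MAP_MEM_MB))     # 67 containers fit by memory
CPU_LIMIT=$((NM_VCORES / MAP_VCORES))     # 18 containers fit by vcores
DISK_LIMIT=$((DRIVES * WORKLOAD_FACTOR))  # 20 containers by the spindle rule

MAPS=$MEM_LIMIT
[ $CPU_LIMIT -lt $MAPS ] && MAPS=$CPU_LIMIT
[ $DISK_LIMIT -lt $MAPS ] && MAPS=$DISK_LIMIT
echo "mapreduce.job.maps = $((MAPS * NODES))"   # 18 for the example node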

Many other factors can influence the performance of a MapReduce application, including:
  • Configured rack awareness
  • Skewed or imbalanced data
  • Network throughput
  • Co-tenancy demand (other services or applications using the cluster)
  • Dynamic resource pooling

You may also have to maximize or minimize cluster utilization for your workload or to meet Service Level Agreements (SLAs). To find the best resource configuration for an application, try various container and gateway/client settings and record the results.

For example, the following TeraGen/TeraSort script supports throughput testing with a 10-GB data load and a loop of varying YARN container and gateway/client settings. You can observe which configuration yields the best results.

#!/bin/sh
HADOOP_PATH=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce
for i in 2 4 8 16 32 64    # Number of mapper containers to test
do
  for j in 2 4 8 16 32 64    # Number of reducer containers to test
  do
    for k in 1024 2048    # Container memory for mappers/reducers to test
    do
      MAP_MB=`echo "($k*0.8)/1" | bc`    # JVM heap size for mappers
      RED_MB=`echo "($k*0.8)/1" | bc`    # JVM heap size for reducers
      hadoop jar $HADOOP_PATH/hadoop-examples.jar teragen \
        -Dmapreduce.job.maps=$i -Dmapreduce.map.memory.mb=$k \
        -Dmapreduce.map.java.opts.max.heap=$MAP_MB 100000000 \
        /results/tg-10GB-${i}-${j}-${k} 1>tera_${i}_${j}_${k}.out 2>tera_${i}_${j}_${k}.err
      hadoop jar $HADOOP_PATH/hadoop-examples.jar terasort \
        -Dmapreduce.job.maps=$i -Dmapreduce.job.reduces=$j -Dmapreduce.map.memory.mb=$k \
        -Dmapreduce.map.java.opts.max.heap=$MAP_MB -Dmapreduce.reduce.memory.mb=$k \
        -Dmapreduce.reduce.java.opts.max.heap=$RED_MB \
        /results/tg-10GB-${i}-${j}-${k} /results/ts-10GB-${i}-${j}-${k} \
        1>>tera_${i}_${j}_${k}.out 2>>tera_${i}_${j}_${k}.err
      hadoop fs -rmr -skipTrash /results/tg-10GB-${i}-${j}-${k}
      hadoop fs -rmr -skipTrash /results/ts-10GB-${i}-${j}-${k}
    done
  done
done
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_yarn_tuning.html