Hadoop - A Summary of Map/Reduce Execution Parameters


In fact, the Hadoop documentation already describes all of these parameters in detail; this part is reposted here simply to make it easier to look up later.

-------------------------

Configuring Memory Requirements For A Job

MapReduce tasks are launched with some default memory limits that are provided by the system or by the cluster's administrators. Memory intensive jobs might need to use more than these default values. Hadoop has some configuration options that allow these to be changed. Without such modifications, memory intensive jobs could fail due to OutOfMemory errors in tasks or could get killed when the limits are enforced by the system. This section describes the various options that can be used to configure specific memory requirements.

  • mapreduce.{map|reduce}.java.opts: If the task requires more Java heap space, this option must be used. The value of this option should pass the desired heap using the JVM option -Xmx. For example, to use 1G of heap space, the option should be passed in as -Xmx1024m. Note that other JVM options are also passed using the same option. Hence, append the heap space option along with other options already configured.
  • mapreduce.{map|reduce}.ulimit: The slaves where tasks are run could be configured with a ulimit value that applies a limit to every process that is launched on the slave. If the task, or any child that the task launches (like in streaming), requires more than the configured limit, this option must be used. The value is given in kilobytes. For example, to increase the ulimit to 1G, the option should be set to 1048576. Note that this value is a per process limit. Since it applies to the JVM as well, the heap space given to the JVM through the mapreduce.{map|reduce}.java.opts should be less than the value configured for the ulimit. Otherwise the JVM will not start.
  • mapreduce.{map|reduce}.memory.mb: In some environments, administrators might have configured a total limit on the virtual memory used by the entire process tree for a task, including all processes launched recursively by the task or its children, like in streaming. More details about this can be found in the section on Monitoring Task Memory Usage in the Cluster SetUp guide. If a task requires more virtual memory for its entire tree, this option must be used. The value is given in MB. For example, to set the limit to 1G, the option should be set to 1024. Note that this value does not automatically influence the per process ulimit or heap space. Hence, you may need to set those parameters as well (as described above) in order to give your tasks the right amount of memory.
  • mapreduce.{map|reduce}.memory.physical.mb: This parameter is similar to mapreduce.{map|reduce}.memory.mb, except it specifies how much physical memory is required by a task for its entire tree of processes. The parameter is applicable if administrators have configured a total limit on the physical memory used by all MapReduce tasks. A configuration sketch covering these options follows this list.
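
As a rough illustration (not part of the original documentation), the sketch below sets these options programmatically through the Java Configuration API when building a job. The class name, job name and the particular values are arbitrary assumptions; the same settings could equally be placed in mapred-site.xml or passed on the command line with -D.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class MemoryTunedJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Heap for the map and reduce task JVMs; -Xmx is appended to any
            // other JVM options already configured for the tasks.
            conf.set("mapreduce.map.java.opts", "-Xmx1024m");
            conf.set("mapreduce.reduce.java.opts", "-Xmx2048m");

            // Per-process ulimit in kilobytes; must be larger than the JVM
            // heap above, otherwise the JVM will not start.
            conf.setInt("mapreduce.map.ulimit", 2097152);     // 2 GB
            conf.setInt("mapreduce.reduce.ulimit", 3145728);  // 3 GB

            // Virtual-memory limit for a task's whole process tree, in MB.
            conf.setInt("mapreduce.map.memory.mb", 2048);
            conf.setInt("mapreduce.reduce.memory.mb", 3072);

            Job job = Job.getInstance(conf, "memory-tuned-job");
            // ... set mapper, reducer, input/output paths, then submit.
            System.out.println("Configured job: " + job.getJobName());
        }
    }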

As seen above, each of the options can be specified separately for map and reduce tasks. It is typically the case that the different types of tasks have different memory requirements. Hence different values can be set for the corresponding options.

The memory available to some parts of the framework is also configurable. In map and reduce tasks, performance may be influenced by adjusting parameters influencing the concurrency of operations and the frequency with which data will hit disk. Monitoring the filesystem counters for a job- particularly relative to byte counts from the map and into the reduce- is invaluable to the tuning of these parameters.

Note: The memory related configuration options described above are used only for configuring the launched child tasks from the tasktracker. Configuring the memory options for daemons is documented under Configuring the Environment of the Hadoop Daemons (Cluster Setup).

Map Parameters

A record emitted from a map and its metadata will be serialized into a buffer. As described in the following options, when the record data exceed a threshold, the contents of this buffer will be sorted and written to disk in the background (a "spill") while the map continues to output records. If the remainder of the buffer fills during the spill, the map thread will block. When the map is finished, any buffered records are written to disk and all on-disk segments are merged into a single file. Minimizing the number of spills to disk can decrease map time, but a larger buffer also decreases the memory available to the mapper.

  • mapreduce.task.io.sort.mb (int): The cumulative size of the serialization and accounting buffers storing records emitted from the map, in megabytes.
  • mapreduce.map.sort.spill.percent (float): This is the threshold for the accounting and serialization buffer. When this percentage of the io.sort.mb has filled, its contents will be spilled to disk in the background. Note that a higher value may decrease the number of- or even eliminate- merges, but will also increase the probability of the map task getting blocked. The lowest average map times are usually obtained by accurately estimating the size of the map output and preventing multiple spills.
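
As a hedged illustration of how these two values interact (the concrete numbers are arbitrary, not recommendations from the original documentation), the sketch below reserves a 200 MB buffer and triggers a background spill once roughly 200 MB * 0.80 = 160 MB of records have been collected.

    import org.apache.hadoop.conf.Configuration;

    public class MapSortTuning {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // 200 MB of serialization/accounting buffer for map output records.
            conf.setInt("mapreduce.task.io.sort.mb", 200);
            // Spill to disk in the background once 80% of the buffer is full,
            // i.e. after about 200 MB * 0.80 = 160 MB of collected records.
            conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f);
        }
    }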

Other notes

  • If the spill threshold is exceeded while a spill is in progress, collection will continue until the spill is finished. For example, if mapreduce.map.sort.spill.percent is set to 0.33, and the remainder of the buffer is filled while the spill runs, the next spill will include all the collected records, or 0.66 of the buffer, and will not generate additional spills. In other words, the thresholds are defining triggers, not blocking.
  • A record larger than the serialization buffer will first trigger a spill, then be spilled to a separate file. It is undefined whether or not this record will first pass through the combiner.

Shuffle/Reduce Parameters

As described previously, each reduce fetches the output assigned to it by the Partitioner via HTTP into memory and periodically merges these outputs to disk. If intermediate compression of map outputs is turned on, each output is decompressed into memory. The following options affect the frequency of these merges to disk prior to the reduce and the memory allocated to map output during the reduce.

  • mapreduce.task.io.sort.factor (int): Specifies the number of segments on disk to be merged at the same time. It limits the number of open files and compression codecs during the merge. If the number of files exceeds this limit, the merge will proceed in several passes. Though this limit also applies to the map, most jobs should be configured so that hitting this limit is unlikely there.
  • mapreduce.reduce.merge.inmem.threshold (int): The number of sorted map outputs fetched into memory before being merged to disk. Like the spill thresholds in the preceding note, this is not defining a unit of partition, but a trigger. In practice, this is usually set very high (1000) or disabled (0), since merging in-memory segments is often less expensive than merging from disk (see notes following this table). This threshold influences only the frequency of in-memory merges during the shuffle.
  • mapreduce.reduce.shuffle.merge.percent (float): The memory threshold for fetched map outputs before an in-memory merge is started, expressed as a percentage of memory allocated to storing map outputs in memory. Since map outputs that can't fit in memory can be stalled, setting this high may decrease parallelism between the fetch and merge. Conversely, values as high as 1.0 have been effective for reduces whose input can fit entirely in memory. This parameter influences only the frequency of in-memory merges during the shuffle.
  • mapreduce.reduce.shuffle.input.buffer.percent (float): The percentage of memory- relative to the maximum heap size as typically specified in mapreduce.reduce.java.opts- that can be allocated to storing map outputs during the shuffle. Though some memory should be set aside for the framework, in general it is advantageous to set this high enough to store large and numerous map outputs.
  • mapreduce.reduce.input.buffer.percent (float): The percentage of memory relative to the maximum heap size in which map outputs may be retained during the reduce. When the reduce begins, map outputs will be merged to disk until those that remain are under the resource limit this defines. By default, all map outputs are merged to disk before the reduce begins to maximize the memory available to the reduce. For less memory-intensive reduces, this should be increased to avoid trips to disk.
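
For a reduce that is not memory-intensive, the parameters above might be combined as in the following sketch; the class name and the exact values are illustrative assumptions, not recommendations from the original documentation.

    import org.apache.hadoop.conf.Configuration;

    public class ShuffleTuning {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Merge at most 50 on-disk segments in a single pass.
            conf.setInt("mapreduce.task.io.sort.factor", 50);
            // Disable the segment-count trigger and rely on the memory
            // threshold to start in-memory merges during the shuffle.
            conf.setInt("mapreduce.reduce.merge.inmem.threshold", 0);
            // Let fetched map outputs occupy up to 70% of the reducer heap ...
            conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.70f);
            // ... and start an in-memory merge once 90% of that space is used.
            conf.setFloat("mapreduce.reduce.shuffle.merge.percent", 0.90f);
            // Retain map outputs in memory during the reduce itself instead
            // of merging them all to disk before the reduce begins.
            conf.setFloat("mapreduce.reduce.input.buffer.percent", 0.70f);
        }
    }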

Other notes

  • If a map output is larger than 25 percent of the memory allocated to copying map outputs, it will be written directly to disk without first staging through memory.
  • When running with a combiner, the reasoning about high merge thresholds and large buffers may not hold. For merges started before all map outputs have been fetched, the combiner is run while spilling to disk. In some cases, one can obtain better reduce times by spending resources combining map outputs- making disk spills small and parallelizing spilling and fetching- rather than aggressively increasing buffer sizes.
  • When merging in-memory map outputs to disk to begin the reduce, if an intermediate merge is necessary because there are segments to spill and at least mapreduce.task.io.sort.factor segments already on disk, the in-memory map outputs will be part of the intermediate merge.

Directory Structure

The task tracker has a local directory, ${mapreduce.cluster.local.dir}/taskTracker/, in which it creates the localized cache and localized jobs. It can define multiple local directories (spanning multiple disks), and then each filename is assigned to a semi-random local directory. When the job starts, the task tracker creates a localized job directory relative to the local directory specified in the configuration. Thus the task tracker directory structure looks as follows:

  • ${mapreduce.cluster.local.dir}/taskTracker/distcache/ : The public distributed cache for the jobs of all users. This directory holds the localized public distributed cache. Thus localized public distributed cache is shared among all the tasks and jobs of all users.
  • ${mapreduce.cluster.local.dir}/taskTracker/$user/distcache/: The private distributed cache for the jobs of the specific user. This directory holds the localized private distributed cache. Thus localized private distributed cache is shared among all the tasks and jobs of the specific user only. It is not accessible to jobs of other users.
  • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/: The localized job directory
    • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/work/: The job-specific shared directory. The tasks can use this space as scratch space and share files among them. This directory is exposed to the users through the configuration property mapreduce.job.local.dir. It is also available as a system property, so users (streaming etc.) can call System.getProperty("mapreduce.job.local.dir") to access the directory (see the sketch after this list).
    • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/jars/: The jars directory, which has the job jar file and expanded jar. The job.jar is the application's jar file that is automatically distributed to each machine. Any library jars that are dependencies of the application code may be packaged inside this jar in a lib/ directory. This directory is extracted from job.jar and its contents are automatically added to the classpath for each task. The job.jar location is accessible to the application through the API Job.getJar(). To access the unjarred directory, Job.getJar().getParent() can be called.
    • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/job.xml: The job.xml file, the generic job configuration, localized for the job.
    • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid: The task directory for each task attempt. Each task directory again has the following structure:
      • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/job.xml: A job.xml file with the task-localized job configuration. Task localization means that properties have been set that are specific to this particular task within the job. The properties localized for each task are described below.
      • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/output: A directory for intermediate output files. This contains the temporary map reduce data generated by the framework such as map output files etc.
      • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/work: The current working directory of the task. With JVM reuse enabled for tasks, this directory will be the directory in which the JVM was started.
      • ${mapreduce.cluster.local.dir}/taskTracker/$user/jobcache/$jobid/$taskid/work/tmp: The temporary directory for the task. (The user can set the property mapreduce.task.tmp.dir to change the temporary directory for map and reduce tasks; it defaults to ./tmp. If the value is not an absolute path, it is prepended with the task's working directory; otherwise, it is used directly. The directory will be created if it doesn't exist. The child Java tasks are then executed with the option -Djava.io.tmpdir='the absolute path of the tmp dir'. Pipes and streaming are set with the environment variable TMPDIR='the absolute path of the tmp dir'.) This directory is created only if mapreduce.task.tmp.dir has the value ./tmp.
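
A minimal sketch of how a running task might reach these directories, assuming the new-API Mapper class; the mapper class name and the file name used are hypothetical.

    import java.io.File;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ScratchAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void setup(Context context) {
            // Job-specific shared scratch space (the .../jobcache/$jobid/work
            // directory described above), exposed to tasks as a system property.
            String jobLocalDir = System.getProperty("mapreduce.job.local.dir");

            // Per-task temporary directory: the framework starts the child JVM
            // with -Djava.io.tmpdir pointing at the absolute path of .../work/tmp.
            String taskTmpDir = System.getProperty("java.io.tmpdir");

            // Hypothetical shared file that tasks of this job could read or write.
            File shared = new File(jobLocalDir, "shared-lookup.dat");
            System.err.println("scratch=" + shared + ", tmp=" + taskTmpDir);
        }
    }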

Task JVM Reuse

Jobs can enable task JVMs to be reused by specifying the job configuration property mapreduce.job.jvm.numtasks. If the value is 1 (the default), then JVMs are not reused (i.e. 1 task per JVM). If it is -1, there is no limit to the number of tasks a JVM can run (of the same job). One can also specify a value greater than 1 using the API Job.getConfiguration().setInt(Job.JVM_NUM_TASKS_TO_RUN, int).
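
A brief sketch, using the property name given above and an arbitrary job name, of enabling reuse for up to 10 tasks per JVM:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JvmReuseExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "jvm-reuse-job");
            // Allow each task JVM to run up to 10 tasks of this job in
            // sequence; -1 would remove the limit, 1 (the default) disables reuse.
            job.getConfiguration().setInt("mapreduce.job.jvm.numtasks", 10);
        }
    }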

Configured Parameters

The following properties are localized in the job configuration for each task's execution:

  • mapreduce.job.id (String): The job id
  • mapreduce.job.jar (String): job.jar location in job directory
  • mapreduce.job.local.dir (String): The job specific shared scratch space
  • mapreduce.task.id (String): The task id
  • mapreduce.task.attempt.id (String): The task attempt id
  • mapreduce.task.ismap (boolean): Is this a map task
  • mapreduce.task.partition (int): The id of the task within the job
  • mapreduce.map.input.file (String): The filename that the map is reading from
  • mapreduce.map.input.start (long): The offset of the start of the map input split
  • mapreduce.map.input.length (long): The number of bytes in the map input split
  • mapreduce.task.output.dir (String): The task's temporary output directory
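
As an illustration (assuming the new-API Mapper and a hypothetical class name), a Java task can read these localized values from the configuration that the framework hands to it:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LocalizedParamsMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void setup(Context context) {
            Configuration conf = context.getConfiguration();
            // Task-localized values from the table above.
            String inputFile = conf.get("mapreduce.map.input.file");
            int partition = conf.getInt("mapreduce.task.partition", -1);
            boolean isMap = conf.getBoolean("mapreduce.task.ismap", true);
            System.err.println("reading " + inputFile + " as partition "
                    + partition + (isMap ? " (map task)" : ""));
        }
    }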

Note: During the execution of a streaming job, the names of the "mapred" parameters are transformed. The dots ( . ) become underscores ( _ ). For example, mapreduce.job.id becomes mapreduce_job_id and mapreduce.job.jar becomes mapreduce_job_jar. To get the values in a streaming job's mapper/reducer, use the parameter names with the underscores.