Hadoop distributed platform optimization

Source: Internet | Editor: 程序博客网 (a programming blog site) | Time: 2024/05/22 20:59

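Hadoop performance tuning is not limited to Hadoop itself. It also covers the underlying hardware, the operating system, and so on. Each area is covered in turn below: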

1. Underlying hardware

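Hadoop uses a master/slave architecture. The master (ResourceManager or NameNode) maintains metadata, handles scheduling, and so on. Its workload and importance far exceed those of the slaves, so give the master nodes the best hardware you can.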

2. Operating system

1) Raise the maximum number of file descriptors and the network connection limit (a noticeable improvement)

When many tasks are running, the kernel's limits on these two resources become the bottleneck.

ulimit -n 2000 raises the limit to at most 2000 open file descriptors. The default on my system was 1024.

sysctl -a    # shows all kernel parameters and their values
sysctl -w net.core.somaxconn=500    # default is 128; should match the cluster's ipc.server.listen.queue.size
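The following Python sketch checks both limits before you start the Hadoop daemons. It assumes a Linux host, because it reads /proc/sys/net/core/somaxconn and uses the Unix-only resource module. The targets 2000 and 500 are just the example values from the text. IPC_LISTEN_QUEUE_SIZE is a hypothetical placeholder: replace it with whatever ipc.server.listen.queue.size your cluster actually uses.

```python
import resource
from pathlib import Path

# Example targets from the text; IPC_LISTEN_QUEUE_SIZE is a hypothetical value
# standing in for the cluster's real ipc.server.listen.queue.size.
WANTED_NOFILE = 2000
IPC_LISTEN_QUEUE_SIZE = 500


def read_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Return net.core.somaxconn as an int, or None if the file is missing (non-Linux)."""
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else None


def limit_warnings(nofile_soft, somaxconn, wanted_nofile, ipc_queue_size):
    """Compare the two kernel limits against the targets and return warning strings."""
    warnings = []
    # RLIM_INFINITY means "unlimited", which always satisfies the target.
    if nofile_soft != resource.RLIM_INFINITY and nofile_soft < wanted_nofile:
        warnings.append(
            f"open-file limit is {nofile_soft}, below {wanted_nofile}; raise it with ulimit -n"
        )
    # somaxconn is None when it could not be read; skip that check then.
    if somaxconn is not None and somaxconn != ipc_queue_size:
        warnings.append(
            f"net.core.somaxconn={somaxconn} does not match "
            f"ipc.server.listen.queue.size={ipc_queue_size}"
        )
    return warnings


if __name__ == "__main__":
    # Soft open-file limit of this process, inherited from the shell that started it.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    problems = limit_warnings(soft, read_somaxconn(), WANTED_NOFILE, IPC_LISTEN_QUEUE_SIZE)
    for message in problems:
        print("WARNING:", message)
    if not problems:
        print("both limits look fine")
```

The comparison logic is kept apart from the I/O, so you can reuse it with values gathered from remote nodes.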

3. Hadoop (version 2.5.1)

Note: the values listed for each property below are the defaults.

core-default.xml

1) Trash mechanism

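With this option, files deleted from HDFS are moved to a trash directory instead of being removed immediately. Its value is how long trashed files are kept before they are purged, in minutes. Accidental deletions are a familiar headache on Linux too, so enabling it is recommended. I set it to 1440, i.e. one day.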

fs.trash.interval (default 0): Number of minutes after which the checkpoint gets deleted. If zero, the trash feature is disabled. This option may be configured both on the server and the client. If trash is disabled server side then the client side configuration is checked. If trash is enabled on the server side then the value configured on the server is used and the client configuration value is ignored.
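Overrides go in core-site.xml, not in core-default.xml itself. As a sketch, the helper below uses Python's standard xml.etree.ElementTree to build the property entry for a one-day trash interval. hadoop_configuration is a hypothetical name, not a Hadoop API.

```python
import xml.etree.ElementTree as ET

MINUTES_PER_DAY = 24 * 60  # 1440, the retention chosen in the text


def hadoop_configuration(props):
    """Build a Hadoop-style <configuration> element from a dict of name -> value."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return conf


def to_xml(conf):
    """Serialize the element to a string (no XML declaration, no indentation)."""
    return ET.tostring(conf, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(hadoop_configuration({"fs.trash.interval": MINUTES_PER_DAY})))
```

Running it prints `<configuration><property><name>fs.trash.interval</name><value>1440</value></property></configuration>`. The inner `<property>` element is what you merge into an existing core-site.xml.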

mapred-default.xml

1) Number of concurrent tasks per TaskTracker

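Recommendation: map slots + reduce slots + 1 = number of CPU cores. Note that the mapreduce.tasktracker.* and mapreduce.jobtracker.* properties in this section apply only to the classic MRv1 (JobTracker/TaskTracker) runtime, not to jobs running on YARN.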

mapreduce.tasktracker.map.tasks.maximum (default 2): The maximum number of map tasks that will be run simultaneously by a task tracker.
mapreduce.tasktracker.reduce.tasks.maximum (default 2): The maximum number of reduce tasks that will be run simultaneously by a task tracker.
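The rule of thumb can be turned into a quick calculation. This sketch makes two assumptions that are not part of the original advice. First, it reads the spare "+1" as one core reserved for the node's daemons. Second, it gives map tasks the larger half of the remaining cores. split_task_slots is a hypothetical helper.

```python
def split_task_slots(num_cpu_cores):
    """Return (map_slots, reduce_slots) such that map + reduce + 1 == num_cpu_cores."""
    if num_cpu_cores < 3:
        raise ValueError("need at least 3 cores: one map slot, one reduce slot, one spare")
    available = num_cpu_cores - 1          # assumption: the "+1" core is left for daemons
    map_slots = (available + 1) // 2       # assumption: maps get the larger half
    reduce_slots = available - map_slots   # always >= 1 because available >= 2
    return map_slots, reduce_slots


if __name__ == "__main__":
    for cores in (4, 8, 16):
        m, r = split_task_slots(cores)
        print(f"{cores} cores: "
              f"mapreduce.tasktracker.map.tasks.maximum={m}, "
              f"mapreduce.tasktracker.reduce.tasks.maximum={r}")
```

With 8 cores, for example, it suggests 4 map slots and 3 reduce slots. Shift the split toward reduces if your jobs are reduce-heavy.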

2) Adjust the heartbeat interval; the value can be lowered to 300.

yarn.app.mapreduce.am.scheduler.heartbeat.interval-ms (default 1000): The interval in ms at which the MR AppMaster should send heartbeats to the ResourceManager

3) Enable out-of-band heartbeats by setting the value to true

mapreduce.tasktracker.outofband.heartbeat (default false): Expert: Set this to true to let the tasktracker send an out-of-band heartbeat on task-completion for better latency.

4) Disk configuration: spread data over several disks to reduce I/O pressure

mapreduce.cluster.local.dir (default ${hadoop.tmp.dir}/mapred/local): The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.

5) Number of RPC handlers

mapreduce.jobtracker.handler.count (default 10): The number of server threads for the JobTracker. This should be roughly 4% of the number of tasktracker nodes.

6) Number of HTTP threads

During the shuffle phase, each reduce task fetches the intermediate map output from the TaskTrackers over HTTP.

mapreduce.tasktracker.http.threads (default 40): The number of worker threads that for the http server. This is used for map output fetching

7) Adjust the readahead buffer size

mapreduce.ifile.readahead (default true): Configuration key to enable/disable IFile readahead.
mapreduce.ifile.readahead.bytes (default 4194304, i.e. 4 MB): Configuration key to set the IFile readahead length in bytes.

8) Reduce task start time

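When cluster resources are tight, raise this value. Reduce tasks are then not scheduled so early that they sit holding slots while waiting for map output.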

mapreduce.job.reduce.slowstart.completedmaps (default 0.05): Fraction of the number of maps in the job which should be complete before reduces are scheduled for the job.
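The sketch below converts the fraction into a count of completed maps, to show what it means in practice. It assumes the threshold is rounded up to a whole number of maps. maps_needed_before_reduces is a hypothetical helper, not a Hadoop API.

```python
import math


def maps_needed_before_reduces(total_maps, slowstart=0.05):
    """Completed maps required before reduces are scheduled (threshold rounded up)."""
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be a fraction between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


if __name__ == "__main__":
    total = 200
    for fraction in (0.05, 0.8):
        print(f"slowstart={fraction}: reduces start after "
              f"{maps_needed_before_reduces(total, fraction)} of {total} maps complete")
```

Take a job with 200 maps. The default 0.05 lets reduces start after 10 maps finish, while 0.8 holds them back until 160 have finished.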

0 0