Hadoop Distributed Platform Optimization

Source: Internet. Editor: 程序博客网. Time: 2024/05/22 11:43

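Tuning Hadoop performance means tuning more than Hadoop itself; the underlying hardware and the operating system matter too. Each is covered in turn below: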

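1. Underlying hardware

Hadoop uses a master/slave architecture. The master (ResourceManager or NameNode) maintains metadata and handles scheduling, so its workload and importance are far greater than those of a slave. Give the master the highest hardware configuration you can.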

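2. Operating system

1) Raise the maximum number of file descriptors and the network connection limit (the effect is noticeable)

When many tasks are running, the OS kernel limits on these two resources become the constraint.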

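ulimit -n 2000    # allow at most 2000 open file descriptors; the default on my system is 1024
sysctl -a    # shows all kernel parameters and their values
sysctl -w net.core.somaxconn=500    # the default is 128 on many Linux kernels; should match the cluster's ipc.server.listen.queue.size
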
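Before changing either limit it can help to see where a node currently stands. The following is a minimal sketch, not a definitive check: the helper name needs_raise and the two target values (taken from the commands above) are assumptions for illustration, and the /proc path is specific to Linux.

```shell
#!/bin/sh
# needs_raise CURRENT TARGET
# Succeeds only when CURRENT is a plain number smaller than TARGET.
# "unlimited" or any non-numeric value is treated as "nothing to raise".
needs_raise() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -lt "$2" ]
}

TARGET_NOFILE=2000      # value used in the ulimit example above
TARGET_SOMAXCONN=500    # value used in the sysctl example above

# Soft limit on open file descriptors for the current shell.
nofile=$(ulimit -n)
if needs_raise "$nofile" "$TARGET_NOFILE"; then
    echo "open files: $nofile is below $TARGET_NOFILE"
else
    echo "open files: $nofile is sufficient"
fi

# Listen backlog limit; the file only exists on Linux.
if [ -r /proc/sys/net/core/somaxconn ]; then
    somaxconn=$(cat /proc/sys/net/core/somaxconn)
    if needs_raise "$somaxconn" "$TARGET_SOMAXCONN"; then
        echo "somaxconn: $somaxconn is below $TARGET_SOMAXCONN"
    else
        echo "somaxconn: $somaxconn is sufficient"
    fi
fi
```

Both ulimit -n and sysctl -w only affect the running session or the running kernel. To keep the values across reboots they are usually also recorded in /etc/security/limits.conf and /etc/sysctl.conf respectively.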
3. Hadoop (version 2.5.1)

mapred-default.xml:

1) Number of concurrent tasks per tasktracker

Recommendation: map + reduce + 1 == num_cpu_cores

Property: mapreduce.tasktracker.map.tasks.maximum
Default: 2
Description: The maximum number of map tasks that will be run simultaneously by a task tracker.

Property: mapreduce.tasktracker.reduce.tasks.maximum
Default: 2
Description: The maximum number of reduce tasks that will be run simultaneously by a task tracker.
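
The rule map + reduce + 1 == num_cpu_cores fixes only the total number of slots, not how they are divided. As a rough sketch, the helper below (slot_split is a hypothetical name) reserves one core and splits the rest evenly, giving any odd slot to map tasks. The even split is an assumption of this example, not something the rule prescribes.

```shell
#!/bin/sh
# slot_split CORES
# Prints "MAP REDUCE" such that MAP + REDUCE + 1 == CORES.
# Assumption: slots are split evenly, with the odd one going to map tasks.
slot_split() {
    cores=$1
    avail=$((cores - 1))        # leave one core for the daemons and the OS
    reduce=$((avail / 2))
    map=$((avail - reduce))
    echo "$map $reduce"
}

# Number of online cores on this node; fall back to 2 if getconf is unavailable.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)
set -- $(slot_split "$cores")
echo "mapreduce.tasktracker.map.tasks.maximum=$1"
echo "mapreduce.tasktracker.reduce.tasks.maximum=$2"
```

On a machine with very few cores this split leaves no reduce slot at all, so the values should be chosen by hand there.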

2) Adjust the heartbeat interval; the value can be changed to 300.

Property: yarn.app.mapreduce.am.scheduler.heartbeat.interval-ms
Default: 1000
Description: The interval in ms at which the MR AppMaster should send heartbeats to the ResourceManager.

3) Enable out-of-band heartbeats; change the value to true.

Property: mapreduce.tasktracker.outofband.heartbeat
Default: false
Description: Expert: Set this to true to let the tasktracker send an out-of-band heartbeat on task-completion for better latency.

4) Disk configuration: configure several disks to reduce I/O pressure.

Property: mapreduce.cluster.local.dir
Default: ${hadoop.tmp.dir}/mapred/local
Description: The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.

5) Number of RPC handlers

Property: mapreduce.jobtracker.handler.count
Default: 10
Description: The number of server threads for the JobTracker. This should be roughly 4% of the number of tasktracker nodes.

6) Number of HTTP threads

During the shuffle phase, each reduce task fetches the intermediate map output from the tasktrackers over HTTP.

Property: mapreduce.tasktracker.http.threads
Default: 40
Description: The number of worker threads for the http server. This is used for map output fetching.

7) Adjust the read-ahead buffer size

Property: mapreduce.ifile.readahead
Default: true
Description: Configuration key to enable/disable IFile readahead.

Property: mapreduce.ifile.readahead.bytes
Default: 4194304
Description: Configuration key to set the IFile readahead length in bytes.

8) Reduce task start time

When cluster resources are tight, raise this value so that reduce tasks start later.

Property: mapreduce.job.reduce.slowstart.completedmaps
Default: 0.05
Description: Fraction of the number of maps in the job which should be complete before reduces are scheduled for the job.
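
The overrides from items 2) to 5) normally go into mapred-site.xml. The sketch below prints such a fragment for review. It is illustrative only: the helper names, the node count of 500, and the /data1 and /data2 paths are hypothetical, and keeping the handler count at or above the default of 10 is an assumption of this example.

```shell
#!/bin/sh
# property NAME VALUE
# Prints one <property> element in Hadoop's configuration file format.
property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# handler_count NODES
# Roughly 4% of the tasktracker nodes, rounded to the nearest integer.
# Assumption: never go below the shipped default of 10.
handler_count() {
    n=$(( ($1 * 4 + 50) / 100 ))
    if [ "$n" -lt 10 ]; then
        n=10
    fi
    echo "$n"
}

NODES=500                                                # hypothetical cluster size
LOCAL_DIRS=/data1/mapred/local,/data2/mapred/local       # hypothetical mount points

mapred_overrides() {
    echo '<configuration>'
    property yarn.app.mapreduce.am.scheduler.heartbeat.interval-ms 300
    property mapreduce.tasktracker.outofband.heartbeat true
    property mapreduce.cluster.local.dir "$LOCAL_DIRS"
    property mapreduce.jobtracker.handler.count "$(handler_count "$NODES")"
    echo '</configuration>'
}

mapred_overrides
```

The slowstart fraction from item 8) is left out on purpose, since the right value depends on how contended the cluster is.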