Configuring Hadoop M/R to Use the Fair Scheduler Instead of FIFO


http://www.blogjava.net/paulwong/archive/2013/01/31/394997.html

Using the Cloudera distribution of hadoop/hbase:

hadoop-0.20.2-cdh3u0

hbase-0.90.1-cdh3u0

zookeeper-3.3.3-cdh3u0

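These versions support the FairScheduler scheduling algorithm out of the box.

You only need to change the configuration so that FairScheduler is used instead of the default JobQueueTaskScheduler.

Configure fair-scheduler.xml (in $HADOOP_HOME/conf/):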

<?xml version="1.0"?>
<!-- The mapred.fairscheduler.allocation.file property that points to this file
     belongs in mapred-site.xml, not here: <allocations> must be the root element. -->
<allocations>
    <pool name="qiji-task-pool">
        <minMaps>5</minMaps>
        <minReduces>5</minReduces>
        <maxRunningJobs>5</maxRunningJobs>
        <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
        <weight>1.0</weight>
    </pool>
    <user name="ecap">
        <maxRunningJobs>6</maxRunningJobs>
    </user>
    <poolMaxJobsDefault>10</poolMaxJobsDefault>
    <userMaxJobsDefault>8</userMaxJobsDefault>
    <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
    <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
</allocations>
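
The allocation file only defines pools; a job is placed in qiji-task-pool only if its pool name resolves to that value. By default the Fair Scheduler derives the pool name from the job's user.name property. One possible approach, sketched here as an assumption rather than something the original setup is known to use, is to tell the scheduler to read the pool from a custom jobconf property. The property name pool.name is illustrative; mapred.fairscheduler.poolnameproperty is the scheduler setting that selects it.

```xml
<!-- Sketch for mapred-site.xml; pool.name is an illustrative property name. -->
<property>
    <!-- Which jobconf property the Fair Scheduler reads to pick a job's pool. -->
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>pool.name</value>
</property>
<property>
    <!-- Fall back to the submitting user's name when a job does not set pool.name. -->
    <name>pool.name</name>
    <value>${user.name}</value>
</property>
```

With such a setting, a job that sets pool.name to qiji-task-pool in its own configuration would be scheduled in that pool, while other jobs would fall into a pool named after the submitting user.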



Edit $HADOOP_HOME/conf/mapred-site.xml and append the following at the end:

<property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
    <name>mapred.fairscheduler.allocation.file</name>
    <value>/opt/hadoop/conf/fair-scheduler.xml</value>
</property>
<property>
    <name>mapred.fairscheduler.assignmultiple</name>
    <value>true</value>
</property>
<property>
    <name>mapred.fairscheduler.sizebasedweight</name>
    <value>true</value>
</property>
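
The allocation file sets minSharePreemptionTimeout and fairSharePreemptionTimeout, but in Fair Scheduler builds that support preemption these timeouts only take effect when preemption is switched on, and it is off by default. A hedged sketch of the extra mapred-site.xml entry follows, assuming the installed build recognizes the property; a build without preemption support would presumably just ignore it.

```xml
<!-- Sketch for mapred-site.xml, assuming this Hadoop build supports preemption. -->
<property>
    <!-- Let the scheduler kill tasks of over-share pools once a timeout expires. -->
    <name>mapred.fairscheduler.preemption</name>
    <value>true</value>
</property>
```

Without this entry, the timeouts are parsed, but pools that stay below their minimum or fair share simply wait for slots to free up.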



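Then restart the cluster. With this configuration, when several jobs run concurrently (the pool above allows up to 5 running jobs), a single job can no longer occupy all Map/Reduce slots and leave the other jobs in the Pending state.

The scheduler status can be viewed at http://<masterip>:50030/scheduler.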

posted on 2013-01-31 17:30 paulwong  Categories: HADOOP, Cloud Computing

Feedback

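# re: Configuring Hadoop M/R to Use the Fair Scheduler Instead of FIFO  2013-05-17 10:07  Christopher

Hello, I have recently been configuring Hadoop's fair scheduler as well, but I ran into some problems.
First, I am using cloudera-cdh-demo-vm-4.2.0-kvm. When configuring $HADOOP_HOME/conf/mapred-site.xml, I specified the pool allocation file: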
<property>
<name>mapred.fairscheduler.allocation.file</name>
<value>/usr/lib/hadoop-0.20-mapreduce/conf/fair-scheduler.xml</value>
</property>
After restarting the cluster, I could not open http://<masterip>:50030/scheduler to view the status of the running jobs.