Configuring the Hadoop Capacity Scheduler
Source: Internet  Editor: 程序博客网  Time: 2024/04/28 05:28
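Introduction to the Capacity Scheduler
The Capacity Scheduler supports the following features:
(1) Capacity guarantees. Multiple queues are supported, and each job is submitted to one of them. Every queue is configured with a percentage of the cluster's computing resources, and all jobs submitted to a queue share that queue's resources.
(2) Flexibility. Idle resources are handed to queues that have not yet reached their resource limit. When a queue that is running below its configured capacity needs resources, it receives them as soon as any become free.
(3) Priorities. Queues support priority-based job scheduling (the default is FIFO).
(4) Multi-tenancy. Several limits are applied together to prevent a single job, user, or queue from monopolizing the resources of a queue or of the cluster.
(5) Resource-based scheduling. Resource-intensive jobs are supported: a job may request more resources than the default, so jobs with different resource requirements can be accommodated. At present, however, only memory is supported as a scheduled resource.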
Configuration
1. Copy $HADOOP_HOME/contrib/capacity-scheduler/hadoop-capacity-scheduler.jar into the $HADOOP_HOME/lib directory.
2. Edit conf/mapred-site.xml on the JobTracker node (in small clusters this is often the same machine as the NameNode):
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>default,hadoop,hive</value>
</property>
3. Edit the conf/capacity-scheduler.xml configuration file:
<?xml version="1.0"?>
<!-- This is the configuration file for the resource manager in Hadoop. -->
<!-- You can configure various scheduling parameters related to queues. -->
<!-- The properties for a queue follow a naming convention, such as, -->
<!-- mapred.capacity-scheduler.queue.<queue-name>.property-name. -->
<configuration>
  <!-- Capacity scheduler Job Initialization configuration parameters -->
  <property>
    <name>mapred.capacity-scheduler.init-poll-interval</name>
    <value>5000</value>
    <description>The amount of time in milliseconds which is used to poll
    the job queues for jobs to initialize.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.init-worker-threads</name>
    <value>5</value>
    <description>Number of worker threads which would be used by
    Initialization poller to initialize jobs in a set of queues.
    If number mentioned in property is equal to number of job queues
    then a single thread would initialize jobs in a queue. If lesser
    then a thread would get a set of queues assigned. If the number
    is greater then number of threads would be equal to number of
    job queues.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.maximum-system-jobs</name>
    <value>30</value>
    <description>Maximum number of jobs in the system which can be
    initialized, concurrently, by the Capacity Scheduler.
    </description>
  </property>

  <!-- hadoop queue -->
  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.capacity</name>
    <value>30</value>
    <description>Percentage of the number of slots in the cluster that are
    to be available for jobs in this queue.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.maximum-capacity</name>
    <value>-1</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.supports-priority</name>
    <value>true</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.minimum-user-limit-percent</name>
    <value>100</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.user-limit-factor</name>
    <value>3</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.maximum-initialized-active-tasks</name>
    <value>200000</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.maximum-initialized-active-tasks-per-user</name>
    <value>100000</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.init-accept-jobs-factor</name>
    <value>10</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.default-maximum-initialized-jobs-per-user</name>
    <value>5</value>
    <description>The maximum number of jobs to be pre-initialized for a user
    of the job queue.
    </description>
  </property>

  <!-- hive -->
  <property>
    <name>mapred.capacity-scheduler.queue.hive.capacity</name>
    <value>30</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hive.maximum-capacity</name>
    <value>-1</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hive.supports-priority</name>
    <value>true</value>
    <description>If true, priorities of jobs will be taken into account in
    scheduling decisions.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hive.minimum-user-limit-percent</name>
    <value>100</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hive.user-limit-factor</name>
    <value>4</value>
    <description>The multiple of the queue capacity which can be configured
    to allow a single user to acquire more slots.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hive.maximum-initialized-active-tasks</name>
    <value>200000</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hive.maximum-initialized-active-tasks-per-user</name>
    <value>100000</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hive.init-accept-jobs-factor</name>
    <value>10</value>
    <description></description>
  </property>

  <!-- default -->
  <property>
    <name>mapred.capacity-scheduler.queue.default.capacity</name>
    <value>40</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.maximum-capacity</name>
    <value>-1</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.supports-priority</name>
    <value>true</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name>
    <value>100</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.user-limit-factor</name>
    <value>4</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks</name>
    <value>200000</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks-per-user</name>
    <value>100000</value>
    <description></description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.init-accept-jobs-factor</name>
    <value>10</value>
    <description></description>
  </property>
</configuration>
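To make the percentages concrete, here is a rough Python sketch of the arithmetic behind the capacity and user-limit-factor values in this file. The cluster size of 100 slots is a made-up number and the name slot_shares is hypothetical. The sketch assumes that a single user's ceiling is the queue capacity multiplied by user-limit-factor (which is how the property description reads), that a maximum-capacity of -1 leaves the queue capped only by the whole cluster, and it ignores whatever rounding the scheduler applies internally.

```python
# Capacity (percent of cluster slots) and user-limit-factor per queue,
# copied from the capacity-scheduler.xml shown in this post.
QUEUES = {
    "default": {"capacity": 40, "user_limit_factor": 4},
    "hadoop": {"capacity": 30, "user_limit_factor": 3},
    "hive": {"capacity": 30, "user_limit_factor": 4},
}


def slot_shares(total_slots, queues):
    """Return {queue: (guaranteed_slots, single_user_ceiling)}.

    Illustrative arithmetic only, not the scheduler's actual code.
    """
    shares = {}
    for name, settings in queues.items():
        guaranteed = total_slots * settings["capacity"] / 100
        # Assumption: one user may grow to capacity * user-limit-factor,
        # but never beyond the cluster itself (maximum-capacity is -1 here).
        ceiling = min(total_slots, guaranteed * settings["user_limit_factor"])
        shares[name] = (guaranteed, ceiling)
    return shares


# Hypothetical cluster with 100 map slots.
for queue, (guaranteed, ceiling) in slot_shares(100, QUEUES).items():
    print(f"{queue}: guaranteed {guaranteed} slots, single-user ceiling {ceiling}")
```

Under these assumptions the hadoop queue, with a factor of 3, is the only one whose single-user ceiling stays below the full cluster.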
After saving the file, restart the JobTracker.
For later changes to capacity-scheduler.xml, running hadoop mradmin -refreshQueues is enough to reload the configuration.
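Because -refreshQueues reloads whatever is in the file, a quick sanity check beforehand can catch arithmetic slips. The sketch below uses only the Python standard library to pull the per-queue capacity values out of a capacity-scheduler.xml-style document and check that they do not add up to more than 100. That limit is my understanding of the Hadoop 1.x rule rather than something stated in this post, and the function names and the SAMPLE string are illustrative.

```python
import xml.etree.ElementTree as ET

PREFIX = "mapred.capacity-scheduler.queue."
SUFFIX = ".capacity"

# Trimmed-down stand-in for capacity-scheduler.xml (illustrative only).
SAMPLE = """<configuration>
  <property>
    <name>mapred.capacity-scheduler.queue.default.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hadoop.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hive.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.hive.maximum-capacity</name>
    <value>-1</value>
  </property>
</configuration>"""


def queue_capacities(xml_text):
    """Return {queue_name: capacity_percent} found in the XML text."""
    root = ET.fromstring(xml_text)
    capacities = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        # ".capacity" (with the dot) skips the "maximum-capacity" entries.
        if name.startswith(PREFIX) and name.endswith(SUFFIX):
            queue = name[len(PREFIX):-len(SUFFIX)]
            capacities[queue] = float(value)
    return capacities


def capacity_problems(capacities, limit=100.0):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    for queue, percent in capacities.items():
        if percent < 0:
            problems.append(f"queue {queue!r} has a negative capacity")
    total = sum(capacities.values())
    if total > limit:  # assumed rule: the capacities should not exceed 100
        problems.append(f"capacities add up to {total}, above {limit}")
    return problems


caps = queue_capacities(SAMPLE)
print(caps, capacity_problems(caps))
```

To check a real file, read its contents and pass the text to queue_capacities in place of SAMPLE.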
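4. Finally, how to use the queues:
MapReduce: set the queue the job belongs to in the job's code (on its JobConf), for example hive:
conf.setQueueName("hive");
Hive: set the queue before running a Hive job, for example hive:
set mapred.job.queue.name=hive;
Set the job name:
set mapred.job.name=hadooptest;
Set the job priority:
set mapred.job.priority=HIGH;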
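If the same settings are needed in many Hive scripts, the three set statements can be generated rather than retyped. The following Python sketch is only an illustration: hive_preamble is a hypothetical helper, not part of Hive or Hadoop, and the list of accepted priorities is what I believe Hadoop's JobPriority enum defines.

```python
# Priority names assumed to match Hadoop's JobPriority enum.
VALID_PRIORITIES = ("VERY_HIGH", "HIGH", "NORMAL", "LOW", "VERY_LOW")


def hive_preamble(queue, job_name=None, priority=None):
    """Build the `set ...;` lines to put at the top of a Hive script."""
    settings = {"mapred.job.queue.name": queue}
    if job_name is not None:
        settings["mapred.job.name"] = job_name
    if priority is not None:
        if priority not in VALID_PRIORITIES:
            raise ValueError(f"unknown priority: {priority!r}")
        settings["mapred.job.priority"] = priority
    # One statement per line, in the order the settings were added.
    return "\n".join(f"set {key}={value};" for key, value in settings.items())


print(hive_preamble("hive", job_name="hadooptest", priority="HIGH"))
```

The generated text can be placed in front of the queries in a script, which gives the same effect as typing the set commands by hand.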