Speculative Execution in Hadoop
http://blog.csdn.net/macyang/article/details/7880671
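Speculative execution works as follows: once all of a job's tasks have started running, the JobTracker tracks the average progress of those tasks. If a task is progressing more slowly than the average, for example because its node has weaker hardware or a high CPU load (there are many possible causes), the JobTracker launches a duplicate of that task, so a single task can have several attempts running at the same time. Whichever attempt finishes first wins, and the other one is killed. This is why the JobTracker web page often shows a job that succeeded but still lists some killed task attempts. Because a MapReduce task is expected to produce the same result no matter how many times it runs, the job succeeds as long as one attempt of each task succeeds, and the killed attempts have no effect on the job's output.
Configuration parameters: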
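The "first attempt to finish wins" rule can be illustrated with a small sketch. This is a toy model, not Hadoop code: the `Attempt` class, the attempt IDs, and the finish times are all hypothetical and exist only to show the outcome.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    attempt_id: str     # hypothetical identifier, not a real Hadoop attempt ID
    finish_time: float  # seconds after job start at which this attempt would finish


def resolve_attempts(attempts):
    """Return (winner, killed): the earliest finisher wins, all others are killed."""
    winner = min(attempts, key=lambda a: a.finish_time)
    killed = [a for a in attempts if a is not winner]
    return winner, killed


# The original attempt is stuck on a slow node; the duplicate starts later
# on a faster node but still finishes first.
original = Attempt("attempt_0", finish_time=300.0)
speculative = Attempt("attempt_1", finish_time=180.0)

winner, killed = resolve_attempts([original, speculative])
print(winner.attempt_id, [a.attempt_id for a in killed])
```

Either attempt may win. The killed one shows up as "killed" rather than "failed", which is why it does not affect the job's status.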
mapred.map.tasks.speculative.execution=true
mapred.reduce.tasks.speculative.execution=true
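These two properties control speculative execution for map tasks and reduce tasks respectively. If you have never set them, that is fine: both default to true.
Hadoop uses each task's progress score to decide which tasks to speculate on, and therefore which attempts end up being killed: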
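If you do want to change these values, for instance because a task has side effects and is not safe to run twice, they can be written in Hadoop's usual `<property>` XML form. The sketch below only generates that XML fragment with Python's standard library. The helper name `speculative_properties` is made up for this example, and where the fragment goes (for example `mapred-site.xml` or a per-job configuration) depends on your setup.

```python
import xml.etree.ElementTree as ET


def speculative_properties(map_enabled, reduce_enabled):
    """Build a <configuration> element holding the two speculative-execution properties."""
    settings = {
        "mapred.map.tasks.speculative.execution": map_enabled,
        "mapred.reduce.tasks.speculative.execution": reduce_enabled,
    }
    configuration = ET.Element("configuration")
    for name, enabled in settings.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        # Hadoop reads boolean properties as the strings "true" / "false".
        ET.SubElement(prop, "value").text = "true" if enabled else "false"
    return configuration


# Example: keep speculation for maps, turn it off for reduces.
config = speculative_properties(map_enabled=True, reduce_enabled=False)
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```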
Hadoop monitors task progress using a progress score between 0 and 1.
For a map, the progress score is the fraction of input data read.
For a reduce task, the execution is divided into three phases, each of which accounts for 1/3 of the score:
• The copy phase, when the task fetches map outputs.
• The sort phase, when map outputs are sorted by key.
• The reduce phase, when a user-defined function is applied to the list of map outputs with each key.
In each phase, the score is the fraction of data processed.
For example,
• a task halfway through the copy phase has a progress score of 1 / 2 * 1 / 3 = 1 / 6
• a task halfway through the reduce phase has a progress score of 1 / 3 + 1 / 3 + 1 / 2 * 1 / 3 = 5 / 6
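The scoring rule above is simple enough to write down directly. The following is a minimal sketch of the arithmetic only, not Hadoop's implementation. The function names are invented for illustration, and `Fraction` is used so the results match the worked examples exactly.

```python
from fractions import Fraction

# The three reduce phases, in order; each contributes 1/3 of the score.
REDUCE_PHASES = ("copy", "sort", "reduce")


def map_progress(fraction_read):
    """Progress score of a map task: the fraction of its input that has been read."""
    return Fraction(fraction_read)


def reduce_progress(phase, fraction_done):
    """Progress score of a reduce task that is `fraction_done` through `phase`."""
    completed_phases = REDUCE_PHASES.index(phase)  # phases already finished
    return (completed_phases + Fraction(fraction_done)) / len(REDUCE_PHASES)


print(reduce_progress("copy", Fraction(1, 2)))    # 1/6
print(reduce_progress("reduce", Fraction(1, 2)))  # 5/6
```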
Hadoop looks at the average progress score of each category of tasks (maps and reduces) to define a threshold for speculative execution. When a task’s progress score is less than the average for its category by a threshold, and the task has run for a certain amount of time, it is considered slow. The scheduler also ensures that at most one speculative copy of each task is running at a time. When running multiple jobs, Hadoop uses a FIFO discipline where the earliest submitted job is asked for a task to run, then the second, etc. There is also a priority system for putting jobs into higher-priority queues.
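The slow-task test described above can be sketched as follows. This is a simplified model, not Hadoop's scheduler code. The dictionary fields and the function name are hypothetical, and the default values for `threshold` (0.2) and `min_runtime_s` (60 seconds) are assumptions chosen for illustration, since the text only says "a threshold" and "a certain amount of time".

```python
def find_slow_tasks(tasks, threshold=0.2, min_runtime_s=60.0):
    """Return IDs of tasks that qualify for a speculative copy.

    `tasks` holds dicts for ONE category (all maps or all reduces) with keys:
    'id', 'progress' (0..1), 'runtime_s', 'has_speculative_copy'.
    The threshold and minimum runtime defaults are illustrative assumptions.
    """
    if not tasks:
        return []
    average = sum(t["progress"] for t in tasks) / len(tasks)
    return [
        t["id"]
        for t in tasks
        if t["progress"] < average - threshold   # well behind its category
        and t["runtime_s"] >= min_runtime_s      # has run long enough to judge
        and not t["has_speculative_copy"]        # at most one speculative copy
    ]


maps = [
    {"id": "m1", "progress": 0.90, "runtime_s": 120, "has_speculative_copy": False},
    {"id": "m2", "progress": 0.80, "runtime_s": 120, "has_speculative_copy": False},
    {"id": "m3", "progress": 0.85, "runtime_s": 120, "has_speculative_copy": False},
    {"id": "m4", "progress": 0.25, "runtime_s": 120, "has_speculative_copy": False},
    {"id": "m5", "progress": 0.20, "runtime_s": 30, "has_speculative_copy": False},
    {"id": "m6", "progress": 0.10, "runtime_s": 200, "has_speculative_copy": True},
]
slow = find_slow_tasks(maps)
print(slow)  # ['m4']
```

Here `m5` is behind but has not run long enough to be judged, and `m6` already has a speculative copy, so only `m4` is selected.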
(Source: http://adhoop.wordpress.com/2012/02/24/speculative-execution-in-hadoop/)
Further reading: Hadoop: The Definitive Guide, 3rd Edition