Why a small CDH 5.3.0 job ran for 12 hours.

A small job that normally finishes quickly took 12 hours in its weekend scheduled run. Checking the logs, the screen was full of:
2015-09-13 00:02:51,433 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
2015-09-13 00:02:51,433 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 1 due to lack of space for maps
2015-09-13 00:02:51,434 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:180224, vCores:0>
2015-09-13 00:02:51,434 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 1
2015-09-13 00:02:52,439 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
2015-09-13 00:02:52,439 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 1 due to lack of space for maps
2015-09-13 00:02:52,439 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=<memory:180224, vCores:0>
2015-09-13 00:02:52,439 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 1
2015-09-13 00:02:53,441 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:0
Note the headroom in those messages: <memory:180224, vCores:0>. The likely cause was that the job could not get containers allocated, so I checked the cluster's vcore and memory usage for that time window: memory usage was normal, but the vcores were completely saturated.
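One quick way to see this kind of saturation is the ResourceManager's cluster-metrics REST endpoint, which reports allocated versus available memory and vcores cluster-wide. A minimal sketch (the hostname `rm-host` is a placeholder for your ResourceManager, and 8088 is the default web port):

```shell
# Query overall cluster resource usage from the YARN ResourceManager.
# When vcores are the bottleneck you will see availableVirtualCores
# near 0 while availableMB is still large.
curl -s "http://rm-host:8088/ws/v1/cluster/metrics" \
  | python -c 'import sys, json; m = json.load(sys.stdin)["clusterMetrics"]; \
print("allocated vcores:", m["allocatedVirtualCores"]); \
print("available vcores:", m["availableVirtualCores"]); \
print("allocated MB:", m["allocatedMB"]); \
print("available MB:", m["availableMB"])'
```

The same numbers are visible on the ResourceManager web UI; the REST call is just easier to script into monitoring.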
Looking at what was running, none of the jobs actually needed that many resources, but some spark-shell sessions had been launched with oversized --num-executors and --executor-cores values, so the vcores stayed occupied the whole time.
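To make the arithmetic concrete, here is the kind of invocation that causes this; the numbers are illustrative, not the actual values from the incident. A static allocation like this holds its vcores for as long as the shell session stays open, even while the user is idle at the prompt:

```shell
# Illustrative only: this session pins 50 executors x 4 cores = 200 vcores
# (plus 1 vcore for the driver-side ApplicationMaster) until the shell exits,
# regardless of whether any job is actually running in it.
spark-shell --master yarn-client \
  --num-executors 50 \
  --executor-cores 4 \
  --executor-memory 2g
```

A handful of such sessions can exhaust a cluster's vcores while barely touching its memory, which matches the <memory:180224, vCores:0> headroom in the log above.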
The --num-executors command-line flag (or the spark.executor.instances configuration property) controls the number of executors requested. Starting with CDH 5.4 / Spark 1.3, you can avoid setting this parameter altogether by turning on dynamic allocation via spark.dynamicAllocation.enabled. Dynamic allocation lets a Spark application request executors when it has a backlog of pending tasks, and release them when it is idle.
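A sketch of launching a shell with dynamic allocation instead of a fixed executor count (the min/max values are illustrative; note that dynamic allocation also requires the external shuffle service to be enabled, including the YARN auxiliary-service configuration on the NodeManagers):

```shell
# Instead of pinning a fixed number of executors, let Spark scale them
# between a floor and a ceiling based on the task backlog. Idle executors
# are released back to YARN, freeing their vcores for other jobs.
spark-shell --master yarn-client \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --conf spark.shuffle.service.enabled=true \
  --executor-cores 2 \
  --executor-memory 2g
```

With this, an idle spark-shell session shrinks toward minExecutors and stops starving MapReduce jobs of vcores.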