MapPartitionsWithIndexOperator
Source: Internet  Editor: 程序博客网  Date: 2024/04/20 07:36
Output:
17/08/17 14:56:36 INFO TaskSchedulerImpl: Adding task set 0.0 with 3 tasks
17/08/17 14:56:36 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 4825 bytes)
17/08/17 14:56:36 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
partitionId:0value:1
partitionId:0value:2
partitionId:0value:3
17/08/17 14:56:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 703 bytes result sent to driver
17/08/17 14:56:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 4825 bytes)
17/08/17 14:56:37 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
partitionId:1value:4
partitionId:1value:5
partitionId:1value:6
17/08/17 14:56:37 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 660 bytes result sent to driver
17/08/17 14:56:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 183 ms on localhost (executor driver) (1/3)
17/08/17 14:56:37 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 4882 bytes)
17/08/17 14:56:37 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
partitionId:2value:7
partitionId:2value:8
partitionId:2value:9
partitionId:2value:10
17/08/17 14:56:37 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 621 bytes result sent to driver
17/08/17 14:56:37 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 59 ms on localhost (executor driver) (2/3)
17/08/17 14:56:37 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 39 ms on localhost (executor driver) (3/3)
17/08/17 14:56:37 INFO DAGScheduler: ResultStage 0 (collect at MapPartitionsWithIndexOperator.scala:44) finished in 0.256 s
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

import scala.collection.mutable.ListBuffer

object MapPartitionsWithIndexOperator {
  def main(args: Array[String]): Unit = {
    /**
     * Create a SparkConf object holding the Spark runtime parameters.
     * SparkConf lets you set the run mode, the Application name,
     * and the resources the Application needs to execute.
     */
    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("Map_Operator")
    /**
     * Create a SparkContext.
     * SparkContext is the single entry point to the cluster;
     * it is responsible for dispatching tasks and for retrying
     * tasks after failures.
     */
    val sc = new SparkContext(conf)
    /**
     * The first argument to makeRDD is the collection of elements for the RDD;
     * the second argument is the number of partitions.
     * Result type: RDD[Int]
     */
    val rdd = sc.makeRDD(1 to 10, 3)
    /**
     * mapPartitionsWithIndex iterates partition by partition:
     * here each partition's data is loaded in full into a collection.
     */
    val mapPartitionsWithIndexRDD = rdd.mapPartitionsWithIndex((index, iterator) => {
      val list = new ListBuffer[Int]()
      while (iterator.hasNext) {
        val num = iterator.next()
        println("partitionId:" + index + "value:" + num)
        list += num
      }
      list.iterator
    }, false).collect()
    /**
     * Release resources.
     */
    sc.stop()
  }
}
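The closure above copies each partition into a `ListBuffer` before handing back an iterator. Since `mapPartitionsWithIndex` only requires a function of type `(Int, Iterator[T]) => Iterator[U]`, the same per-partition tagging can also be done lazily, without materializing the partition in memory. A minimal sketch — the helper name `tagPartition` and the `sc` usage in the comment are illustrative, not part of the original code:

```scala
// Tags each element with its partition id lazily: Iterator.map builds no
// intermediate collection, so the partition is never held in memory at once.
def tagPartition(index: Int, it: Iterator[Int]): Iterator[String] =
  it.map(num => s"partitionId:$index value:$num")

// Hypothetical usage with the article's SparkContext `sc`:
//   val tagged = sc.makeRDD(1 to 10, 3)
//     .mapPartitionsWithIndex(tagPartition)
//     .collect()
```

For partitions that fit in memory the difference is negligible, but for large partitions the lazy form avoids the risk of an OutOfMemoryError that buffering into a `ListBuffer` carries.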