Testing the element ordering of Spark RDDs
Source: Internet | Editor: 程序博客网 | Date: 2024/06/10 14:44
The experiments below show:
foreach() visits the elements in a scrambled, non-deterministic order,
but:
collect() returns the elements in their original order,
and take() also returns them in their original order.
Why? foreach() runs as tasks on the executors, so the observed order depends on task scheduling across partitions; collect() and take() ship each partition's result back to the driver, which concatenates the results by partition index, restoring the original order.
In addition, one can observe:
take() stops as soon as it has fetched the requested number of elements; it does not evaluate more of the RDD than necessary.
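The order-preserving behavior of collect() can be illustrated without Spark at all. The following is a minimal plain-Scala sketch (names are my own, not Spark internals): the "partition" tasks may complete in any order, but the driver side reassembles the results by partition index, which is essentially why collect() returns elements in the original order.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object OrderDemo extends App {
  // Simulate 4 "partitions" of the range 0 to 9, similar to how makeRDD splits it.
  val partitions = Vector(0 to 1, 2 to 4, 5 to 6, 7 to 9)

  // Tasks may finish in any order (like foreach running on executors),
  // but each result is tagged with its partition index.
  val futures = partitions.zipWithIndex.map { case (part, idx) =>
    Future { (idx, part.toArray) }
  }

  // Reassemble by partition index, as collect() does on the driver:
  // the output is always 0..9 in order, regardless of completion order.
  val results = Await.result(Future.sequence(futures), 10.seconds)
    .sortBy(_._1)
    .flatMap(_._2)
  println(results.mkString(","))
}
```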
scala> val rdd = sc.makeRDD((0 to 9), 4)

scala> rdd.collect
res27: Array[Int] = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> rdd.partitions
res13: Array[org.apache.spark.Partition] = Array(org.apache.spark.rdd.ParallelCollectionPartition@691, org.apache.spark.rdd.ParallelCollectionPartition@692, org.apache.spark.rdd.ParallelCollectionPartition@693, org.apache.spark.rdd.ParallelCollectionPartition@694)

scala> rdd.foreach(print(_))
0178923456
scala> rdd.foreach(print(_))
5623401789

scala> rdd.coalesce(1, false).foreach(print _)
0123456789

scala> rdd.coalesce(1, false).partitions
res28: Array[org.apache.spark.Partition] = Array(CoalescedRDDPartition(0,ParallelCollectionRDD[0] at makeRDD at <console>:21,[I@63a3554,None))

scala> rdd.foreachPartition((x:Iterator[Int])=>println(x.next))
2
0
5
7

scala> rdd.mapPartitions((x:Iterator[Int])=>Array(x.next()).iterator).collect
res4: Array[Int] = Array(0, 2, 5, 7)

scala> rdd.keyBy((x:Int)=>x/4).collect
res27: Array[(Int, Int)] = Array((0,0), (0,1), (0,2), (0,3), (1,4), (1,5), (1,6), (1,7), (2,8), (2,9))

scala> rdd.groupBy(_/4).collect
res7: Array[(Int, Iterable[Int])] = Array((0,CompactBuffer(0, 1, 2, 3)), (1,CompactBuffer(4, 5, 6, 7)), (2,CompactBuffer(8, 9)))

scala> val jr = rdd.toJavaRDD
jr: org.apache.spark.api.java.JavaRDD[Int] = ParallelCollectionRDD[0] at makeRDD at <console>:21

scala> jr.collectPartitions(Array(0,1))
res20: Array[java.util.List[Int]] = Array([0, 1], [2, 3, 4])
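If ordered side effects are actually needed, the coalesce(1) trick above works but forces everything into one task. A sketch of two alternatives, assuming the same rdd as in the transcript (both run the iteration on the driver rather than on the executors):

```scala
// Materialize everything on the driver, then iterate in order:
rdd.collect().foreach(print)

// Or stream partition by partition, preserving partition order
// without holding the whole RDD in driver memory at once:
rdd.toLocalIterator.foreach(print)
```

Both print 0123456789 deterministically, because the iteration happens in a single JVM in partition-index order.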
implicit object StringAccumulator extends org.apache.spark.AccumulatorParam[String] {
  def addInPlace(r1: String, r2: String) = r1 + "," + r2
  def zero(initialValue: String) = ""
}

scala> val a = sc.accumulator("")
a: org.apache.spark.Accumulator[String] =

scala> sc.parallelize(0 to 1000, 99).flatMap((i:Int)=>{a+="f1-"+i; (i*2 to i*2 + 1)}).flatMap((i:Int)=>{a+="f2-"+i; (i*2 to i*2 + 1)}).take(10)
res2: Array[Int] = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> a
res3: org.apache.spark.Accumulator[String] = ,,f1-0,f2-0,f2-1,f1-1,f2-2,f2-3,f1-2,f2-4
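The accumulator trace confirms the laziness of take(): only f1-0 through f1-2 were ever evaluated, even though the source range has 1001 elements. Note that AccumulatorParam was deprecated in Spark 2.0 and later removed; a sketch of the equivalent using the modern AccumulatorV2 API (the class name here is my own):

```scala
import org.apache.spark.util.AccumulatorV2

// A string-concatenating accumulator, analogous to the
// AccumulatorParam[String] used above.
class StringAccumulator extends AccumulatorV2[String, String] {
  private var buf = ""
  def isZero: Boolean = buf.isEmpty
  def copy(): StringAccumulator = { val c = new StringAccumulator; c.buf = buf; c }
  def reset(): Unit = buf = ""
  def add(v: String): Unit = buf = if (buf.isEmpty) v else buf + "," + v
  def merge(other: AccumulatorV2[String, String]): Unit =
    if (!other.isZero) add(other.value)
  def value: String = buf
}

// Usage in spark-shell:
//   val a = new StringAccumulator
//   sc.register(a, "trace")
//   ... then a.add(...) inside transformations, a.value on the driver
```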