Spark Functions Explained: coalesce
Merges the partitions of an RDD into a new set of partitions.
Function Signature
def coalesce(numPartitions: Int, shuffle: Boolean = false)(implicit ord: Ordering[T] = null): RDD[T]
Returns a new RDD whose number of partitions equals numPartitions. If shuffle is set to true, a shuffle is performed. Note that with shuffle = false, coalesce can only reduce the partition count; to increase it, shuffle must be true.
Example
scala> var data = sc.parallelize(List(1, 2, 3, 4))
data: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[45] at parallelize at <console>:12

scala> data.partitions.length
res68: Int = 30

scala> val result = data.coalesce(2, false)
result: org.apache.spark.rdd.RDD[Int] = CoalescedRDD[57] at coalesce at <console>:14

scala> result.partitions.length
res77: Int = 2

scala> result.toDebugString
res75: String =
(2) CoalescedRDD[57] at coalesce at <console>:14 []
 |  ParallelCollectionRDD[45] at parallelize at <console>:12 []

scala> val result1 = data.coalesce(2, true)
result1: org.apache.spark.rdd.RDD[Int] = MappedRDD[61] at coalesce at <console>:14

scala> result1.toDebugString
res76: String =
(2) MappedRDD[61] at coalesce at <console>:14 []
 |  CoalescedRDD[60] at coalesce at <console>:14 []
 |  ShuffledRDD[59] at coalesce at <console>:14 []
 +-(30) MapPartitionsRDD[58] at coalesce at <console>:14 []
    |   ParallelCollectionRDD[45] at parallelize at <console>:12 []

As the lineage above shows, no shuffle takes place when shuffle is false, while setting it to true inserts a shuffle stage. RDD.partitions.length returns the number of partitions of an RDD.
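The no-shuffle behavior can be pictured with plain Scala collections. The sketch below is only an illustration of the merge semantics, not Spark's actual CoalescedRDD implementation (which also balances partitions by data locality); coalesceLocal is a hypothetical helper name.

```scala
// A minimal sketch of what coalesce(n, shuffle = false) does: runs of
// adjacent partitions are merged locally, so the partition count can
// shrink but never grow. Not Spark's actual implementation.
def coalesceLocal[T](parts: Seq[Seq[T]], numPartitions: Int): Seq[Seq[T]] = {
  // Without a shuffle we cannot create more partitions than already exist.
  val target = math.min(numPartitions, parts.length)
  // Merge contiguous groups of partitions; Spark additionally considers
  // data locality when choosing the groups, which this sketch ignores.
  val chunk = math.ceil(parts.length.toDouble / target).toInt
  parts.grouped(chunk).map(_.flatten).toSeq
}

// Four one-element partitions collapse into two partitions.
println(coalesceLocal(Seq(Seq(1), Seq(2), Seq(3), Seq(4)), 2))
// Asking for more partitions than exist leaves the count unchanged.
println(coalesceLocal(Seq(Seq(1), Seq(2)), 4).length)
```

This also makes clear why coalesce with shuffle = false is cheap: each new partition is assembled from whole existing partitions, so no data crosses executor boundaries.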