Spark RDD Operators (Part 2): coalesce and repartition


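1. Introduction to coalesce and repartition

Both operators change the number of partitions of an RDD after it has been created. They differ slightly in usage: repartition always shuffles the data, while coalesce takes a shuffle flag that defaults to false.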

2. Examples

2.1 repartition


scala> val rdd1 = sc.parallelize(List(1,2,3,4,5,6,7,8,9), 2)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[32] at parallelize at <console>:21

scala> rdd1.partitions.length
res18: Int = 2

scala> val rdd2=rdd1.repartition(3)
rdd2: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[36] at repartition at <console>:23

scala> rdd2.partitions.length
res19: Int = 3
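To see how the elements are laid out after the call, one option is glom(), which turns each partition into an array. The following is a minimal sketch, assuming a local master (local[2]); the object name RepartitionLayout and the helper layoutAfterRepartition are hypothetical names introduced for illustration. Where each element lands after the shuffle is not guaranteed, so no output is shown.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object, not part of Spark.
object RepartitionLayout {
  // Returns the contents of each partition after repartitioning to `n` partitions.
  def layoutAfterRepartition(sc: SparkContext, n: Int): Array[Array[Int]] = {
    val rdd1 = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8, 9), 2)
    // glom() groups the elements of each partition into one array.
    rdd1.repartition(n).glom().collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two worker threads.
    val conf = new SparkConf().setAppName("repartition-layout").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val layout = layoutAfterRepartition(sc, 3)
      layout.zipWithIndex.foreach { case (part, i) =>
        println(s"partition $i: ${part.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Whatever the layout turns out to be, the nine elements are all still present; only their assignment to partitions changes.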

2.2 coalesce 

scala> val rdd1 = sc.parallelize(List(1,2,3,4,5,6,7,8,9), 2)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[47] at parallelize at <console>:21

scala> rdd1.partitions.length
res24: Int = 2

scala> val rdd2=rdd1.coalesce(3,true) // the second argument (true) enables the shuffle
rdd2: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[51] at coalesce at <console>:23

scala> rdd2.partitions.length
res25: Int = 3
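The practical difference shows up when the shuffle flag is left at its default of false. Below is a minimal sketch, assuming a local master (local[2]); the object name CoalesceVsRepartition and the helper partitionCounts are hypothetical names introduced for illustration. Without a shuffle, coalesce can only merge existing partitions, so asking for more partitions than the RDD already has leaves the count unchanged, whereas repartition(n) behaves like coalesce(n, shuffle = true).

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object, not part of Spark.
object CoalesceVsRepartition {
  // Builds a 2-partition RDD and returns the partition count after each operation.
  def partitionCounts(sc: SparkContext): Map[String, Int] = {
    val rdd = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8, 9), 2)
    Map(
      // No shuffle: cannot grow beyond the current 2 partitions.
      "coalesce(3)" -> rdd.coalesce(3).partitions.length,
      // No shuffle: merging down to fewer partitions works.
      "coalesce(1)" -> rdd.coalesce(1).partitions.length,
      // With shuffle: the partition count can grow.
      "coalesce(3, shuffle = true)" -> rdd.coalesce(3, shuffle = true).partitions.length,
      // repartition always shuffles.
      "repartition(3)" -> rdd.repartition(3).partitions.length
    )
  }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two worker threads.
    val conf = new SparkConf().setAppName("coalesce-vs-repartition").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      partitionCounts(sc).foreach { case (op, n) => println(s"$op -> $n partitions") }
    } finally {
      sc.stop()
    }
  }
}
```

As a rule of thumb, coalesce(n) without a shuffle is the cheaper choice for reducing the partition count, while repartition(n) is the one to reach for when the count has to grow or the data needs to be rebalanced.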


