The difference between groupByKey and reduceByKey
Source: Internet  Editor: 程序博客网  Date: 2024/05/16 06:07
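Both operators go through a shuffle. groupByKey does not merge anything before the shuffle, so every key-value pair is sent across as-is. reduceByKey first merges the values for each key inside each partition and only then shuffles, which reduces the I/O transferred during the shuffle, so it is usually somewhat more efficient.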
Example:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

object GroupyKeyAndReduceByKeyDemo {
  def main(args: Array[String]): Unit = {
    // Keep Spark's own log output down to warnings
    Logger.getLogger("org").setLevel(Level.WARN)
    val config = new SparkConf().setAppName("GroupyKeyAndReduceByKeyDemo").setMaster("local")
    val sc = new SparkContext(config)
    val arr = Array("val config", "val arr")
    // Split each string into words and pair every word with a count of 1
    val socketDS = sc.parallelize(arr).flatMap(_.split(" ")).map((_, 1))

    // groupByKey: all pairs are shuffled unmerged, then summed after grouping
    socketDS.groupByKey().map(tuple => (tuple._1, tuple._2.sum)).foreach(x => {
      println(x._1 + " " + x._2)
    })
    println("----------------------")
    // reduceByKey: values are merged per key before the shuffle, then merged again after it
    socketDS.reduceByKey(_ + _).foreach(x => {
      println(x._1 + " " + x._2)
    })
    sc.stop()
  }
}
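To make the "merge before shuffle" point concrete, here is a minimal sketch in plain Scala with no Spark dependency. It is only a simulation under stated assumptions: partitions are modelled as nested `Seq`s, and the names `ShuffleVolumeSketch`, `shuffledWithoutCombine`, `shuffledWithCombine`, and `finalCounts` are hypothetical helpers made up for this illustration, not Spark APIs. It counts how many records would leave the partitions in each style and checks that the final totals agree.

```scala
object ShuffleVolumeSketch {
  type Record = (String, Int)

  // groupByKey-style: every record leaves its partition unchanged.
  def shuffledWithoutCombine(partitions: Seq[Seq[Record]]): Seq[Record] =
    partitions.flatten

  // reduceByKey-style: each partition first merges its own records per key,
  // so at most one record per key leaves each partition.
  def shuffledWithCombine(partitions: Seq[Seq[Record]]): Seq[Record] =
    partitions.flatMap { part =>
      part.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }.toSeq
    }

  // Reduce side: merge whatever arrived into final per-key totals.
  def finalCounts(shuffled: Seq[Record]): Map[String, Int] =
    shuffled.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // Two simulated partitions (an assumption for illustration only)
    val partitions = Seq(
      Seq("val" -> 1, "config" -> 1, "val" -> 1),
      Seq("arr" -> 1, "val" -> 1, "arr" -> 1)
    )
    val plain = shuffledWithoutCombine(partitions)
    val combined = shuffledWithCombine(partitions)
    println(s"records shuffled without combine: ${plain.size}")
    println(s"records shuffled with combine: ${combined.size}")
    // Both routes must produce the same per-key totals
    println(finalCounts(plain) == finalCounts(combined))
  }
}
```

With this input, fewer records cross the simulated shuffle boundary in the combined case, while the per-key totals are identical. The saving grows with the number of repeated keys inside each partition; when keys rarely repeat within a partition, the two approaches move roughly the same amount of data.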