Spark Operators [07]: reduce, reduceByKey, count, countByKey
Source: Internet · Editor: 程序博客网 · Date: 2024/06/05 06:32
The operators reduce, reduceByKey, count and countByKey fall into two categories:
Actions: reduce, count, countByKey
Transformations: reduceByKey
1. reduce
reduce(func) is an operation on a JavaRDD.
It aggregates the elements of the RDD using the function func, which takes two arguments and returns one. The function should be commutative and associative so that the result can be computed correctly in parallel.
Scala version

scala> val rdd1 = sc.parallelize(List("a", "b", "b", "c"))
scala> val res = rdd1.reduce(_ + "-" + _)
res: String = b-c-a-b
Java version

JavaRDD<String> rdd1 = sc.parallelize(Arrays.asList("a", "b", "b", "c"));
String res = rdd1.reduce(new Function2<String, String, String>() {
    @Override
    public String call(String v1, String v2) throws Exception {
        return v1 + "-" + v2;
    }
});
System.out.println(res);  // b-c-a-b
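Note that the output above is "b-c-a-b" rather than "a-b-b-c": each partition is reduced locally and the partial results are combined in whatever order they arrive, which is exactly why func must be commutative and associative. Setting Spark aside, the same requirement applies to Java's parallel streams, which can serve as a small self-contained sketch of the idea (the class name `ReduceDemo` is just for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class ReduceDemo {
    // Addition is commutative and associative, so a parallel reduction
    // may split and reorder the input freely and still get the same answer.
    static int sumSequential(List<Integer> nums) {
        return nums.stream().reduce(0, Integer::sum);
    }

    static int sumParallel(List<Integer> nums) {
        return nums.parallelStream().reduce(0, Integer::sum);
    }

    // Subtraction is neither commutative nor associative:
    // ((1-2)-3)-4 = -8 but (1-2)-(3-4) = 0, so reducing with it
    // in parallel would give a nondeterministic result.
    static int subSequential(List<Integer> nums) {
        return nums.stream().reduce((a, b) -> a - b).orElse(0);
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumSequential(nums)); // 10
        System.out.println(sumParallel(nums));   // 10
        System.out.println(subSequential(nums)); // -8
    }
}
```

The same caution applies to the string-concatenation example above: `_+"-"+_` is associative but not commutative, so the joined order of elements is not guaranteed across runs.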
2. reduceByKey
reduceByKey(func, [numTasks]) is an operation on a JavaPairRDD.
For an RDD of (K, V) pairs, it merges the values of each key K using the given reduce function func and returns an RDD of (K, V) pairs; the optional second argument sets the number of reduce tasks.
Scala version

scala> val scoreList = Array(Tuple2("class1", 90), Tuple2("class1", 60), Tuple2("class2", 60), Tuple2("class2", 50))
scala> val scoreRdd = sc.parallelize(scoreList)
scala> val resRdd = scoreRdd.reduceByKey(_ + _)
scala> resRdd.foreach(res => println(res._1 + ":" + res._2))
class1:150
class2:110
Java version

List<Tuple2<String, Integer>> scoreList = Arrays.asList(
    new Tuple2<String, Integer>("class1", 90),
    new Tuple2<String, Integer>("class2", 60),
    new Tuple2<String, Integer>("class1", 60),
    new Tuple2<String, Integer>("class2", 50));
// Parallelize the collection into a JavaPairRDD; note the use of parallelizePairs
JavaPairRDD<String, Integer> scoreRdd = sc.parallelizePairs(scoreList);
JavaPairRDD<String, Integer> resRdd = scoreRdd.reduceByKey(new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer v1, Integer v2) throws Exception {
        return v1 + v2;
    }
});
// Print the results
resRdd.foreach(new VoidFunction<Tuple2<String, Integer>>() {
    public void call(Tuple2<String, Integer> tuple2) throws Exception {
        System.out.println(tuple2._1() + ":" + tuple2._2());
    }
});
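Conceptually, reduceByKey folds together all values that share a key, with the extra benefit that Spark combines values map-side before shuffling. Spark aside, the per-key merge can be sketched with plain Java collections (a local illustration only; `ReduceByKeyDemo` and its helper are hypothetical names, not Spark API):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReduceByKeyDemo {
    // Fold each (key, value) pair into a map, combining values that
    // share a key with the reduce function -- here addition, the local
    // analogue of reduceByKey(_ + _).
    static Map<String, Integer> reduceByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            result.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> scores = Arrays.asList(
            Map.entry("class1", 90),
            Map.entry("class2", 60),
            Map.entry("class1", 60),
            Map.entry("class2", 50));
        System.out.println(reduceByKey(scores)); // {class1=150, class2=110}
    }
}
```

As with reduce, the merge function should be commutative and associative, since Spark applies it both within and across partitions in no guaranteed order.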
3. count
count() returns the number of elements in the RDD.
Scala version

scala> val rdd1 = sc.parallelize(List("a", "b", "b", "c"))
scala> val res = rdd1.count
res: Long = 4
Java version

List<Tuple2<String, Integer>> scoreList = Arrays.asList(
    new Tuple2<String, Integer>("class1", 90),
    new Tuple2<String, Integer>("class2", 60),
    new Tuple2<String, Integer>("class1", 60),
    new Tuple2<String, Integer>("class2", 50));
// Parallelize the collection into a JavaPairRDD via parallelizePairs
JavaPairRDD<String, Integer> scoreRdd = sc.parallelizePairs(scoreList);
long count = scoreRdd.count();
System.out.println(count);  // 4
4. countByKey
countByKey() is only available on RDDs of (K, V) pairs. It returns a Map of (K, Long) with the count of elements for each key. Because the result is collected back to the driver, it is only suitable when the number of distinct keys is small.
Scala version

scala> val scoreList = Array(Tuple2("class1", 90), Tuple2("class1", 60), Tuple2("class2", 60), Tuple2("class2", 50))
scala> val scoreRdd = sc.parallelize(scoreList)
scala> val res = scoreRdd.countByKey
res: scala.collection.Map[String,Long] = Map(class2 -> 2, class1 -> 2)
Java version

List<Tuple2<String, Integer>> scoreList = Arrays.asList(
    new Tuple2<String, Integer>("class1", 90),
    new Tuple2<String, Integer>("class2", 60),
    new Tuple2<String, Integer>("class1", 60),
    new Tuple2<String, Integer>("class2", 50));
JavaPairRDD<String, Integer> scoreRdd = sc.parallelizePairs(scoreList);
Map<String, Long> res = scoreRdd.countByKey();
System.out.println(res);  // {class1=2, class2=2}
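In effect, countByKey is a reduceByKey where each value is first replaced by 1 and the per-key sums are then returned to the driver as a Map. That equivalence can be sketched locally in plain Java (`CountByKeyDemo` is an illustrative name, not Spark API):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CountByKeyDemo {
    // Replace each value with 1 and sum per key: the same shape of
    // computation that countByKey() performs on a (K, V) RDD.
    static Map<String, Long> countByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> scores = Arrays.asList(
            Map.entry("class1", 90),
            Map.entry("class2", 60),
            Map.entry("class1", 60),
            Map.entry("class2", 50));
        System.out.println(countByKey(scores)); // {class1=2, class2=2}
    }
}
```

Note how the original values (90, 60, ...) play no role in the result: only the number of pairs per key matters.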