A Simple Example of Spark's join and cogroup
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 04:29
1. join
join merges two pair collections by key: for each key present in both collections, it pairs up the corresponding values.
Tuple set A: (1,"Spark"), (2,"Tachyon"), (3,"Hadoop")
Tuple set B: (1,100), (2,95), (3,65)
Result of A join B: (1,("Spark",100)), (3,("Hadoop",65)), (2,("Tachyon",95))
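The join described above can be sketched with Spark's Java API. This is a minimal sketch, assuming Spark is on the classpath and a local master; the class name `JoinExample` and the helper method `run()` are illustrative, not from the original post:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class JoinExample {

    // Runs the join locally and returns key -> (name, score).
    public static Map<Integer, Tuple2<String, Integer>> run() {
        SparkConf conf = new SparkConf().setAppName("join example").setMaster("local");
        JavaSparkContext sContext = new JavaSparkContext(conf);
        try {
            // Set A: key -> framework name
            JavaPairRDD<Integer, String> names = sContext.parallelizePairs(Arrays.asList(
                    new Tuple2<Integer, String>(1, "Spark"),
                    new Tuple2<Integer, String>(2, "Tachyon"),
                    new Tuple2<Integer, String>(3, "Hadoop")));
            // Set B: key -> score
            JavaPairRDD<Integer, Integer> scores = sContext.parallelizePairs(Arrays.asList(
                    new Tuple2<Integer, Integer>(1, 100),
                    new Tuple2<Integer, Integer>(2, 95),
                    new Tuple2<Integer, Integer>(3, 65)));

            // join keeps only keys present on both sides and pairs their values
            JavaPairRDD<Integer, Tuple2<String, Integer>> joined = names.join(scores);

            // Collect into a map, since join output order is not deterministic
            Map<Integer, Tuple2<String, Integer>> result =
                    new HashMap<Integer, Tuple2<String, Integer>>();
            for (Tuple2<Integer, Tuple2<String, Integer>> t : joined.collect()) {
                result.put(t._1, t._2);
            }
            return result;
        } finally {
            sContext.close();
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<Integer, Tuple2<String, Integer>> e : run().entrySet()) {
            System.out.println("ID:" + e.getKey() + " , Name:" + e.getValue()._1
                    + " , Score:" + e.getValue()._2);
        }
    }
}
```

Because every key appears exactly once in each set, the result has one (name, score) pair per key; keys present on only one side would be dropped.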
2. cogroup
cogroup works as follows: given two tuple collections A and B, it first groups the values in A that share the same key, then groups the values in B that share the same key, and finally "joins" the two grouped results by key.
Example code:

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

import scala.Tuple2;

public class CoGroup {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("spark WordCount!").setMaster("local");
        JavaSparkContext sContext = new JavaSparkContext(conf);

        List<Tuple2<Integer, String>> namesList = Arrays.asList(
                new Tuple2<Integer, String>(1, "Spark"),
                new Tuple2<Integer, String>(3, "Tachyon"),
                new Tuple2<Integer, String>(4, "Sqoop"),
                new Tuple2<Integer, String>(2, "Hadoop"),
                new Tuple2<Integer, String>(2, "Hadoop2"));
        List<Tuple2<Integer, Integer>> scoresList = Arrays.asList(
                new Tuple2<Integer, Integer>(1, 100),
                new Tuple2<Integer, Integer>(3, 70),
                new Tuple2<Integer, Integer>(3, 77),
                new Tuple2<Integer, Integer>(2, 90),
                new Tuple2<Integer, Integer>(2, 80));

        JavaPairRDD<Integer, String> names = sContext.parallelizePairs(namesList);
        JavaPairRDD<Integer, Integer> scores = sContext.parallelizePairs(scoresList);

        /*
         * cogroup groups the values by key on each side, then pairs the two groups:
         * JavaPairRDD<Integer, Tuple2<Iterable<String>, Iterable<Integer>>>
         *     JavaPairRDD.cogroup(JavaPairRDD<Integer, Integer> other)
         */
        JavaPairRDD<Integer, Tuple2<Iterable<String>, Iterable<Integer>>> nameScores =
                names.cogroup(scores);

        nameScores.foreach(new VoidFunction<Tuple2<Integer, Tuple2<Iterable<String>, Iterable<Integer>>>>() {
            private static final long serialVersionUID = 1L;
            int i = 1;

            @Override
            public void call(Tuple2<Integer, Tuple2<Iterable<String>, Iterable<Integer>>> t)
                    throws Exception {
                String string = "ID:" + t._1 + " , " + "Name:" + t._2._1 + " , " + "Score:" + t._2._2;
                string += " count:" + i;
                System.out.println(string);
                i++;
            }
        });

        sContext.close();
    }
}

Example output:
ID:4 , Name:[Sqoop] , Score:[] count:1
ID:1 , Name:[Spark] , Score:[100] count:2
ID:3 , Name:[Tachyon] , Score:[70, 77] count:3
ID:2 , Name:[Hadoop, Hadoop2] , Score:[90, 80] count:4