Spark Function Explained: cogroup
cogroup: collects the values for each key across multiple RDDs into one record per key. At most four RDDs can be combined in a single call. A key that is missing from one of the RDDs still appears in the result, paired with an empty Iterable for that RDD.
Function signatures:
def cogroup[W1, W2, W3](other1: RDD[(K, W1)], other2: RDD[(K, W2)], other3: RDD[(K, W3)], partitioner: Partitioner): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2], Iterable[W3]))]
def cogroup[W1, W2, W3](other1: RDD[(K, W1)], other2: RDD[(K, W2)], other3: RDD[(K, W3)], numPartitions: Int): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2], Iterable[W3]))]
def cogroup[W1, W2, W3](other1: RDD[(K, W1)], other2: RDD[(K, W2)], other3: RDD[(K, W3)]): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2], Iterable[W3]))]
def cogroup[W1, W2](other1: RDD[(K, W1)], other2: RDD[(K, W2)], partitioner: Partitioner): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2]))]
def cogroup[W1, W2](other1: RDD[(K, W1)], other2: RDD[(K, W2)], numPartitions: Int): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2]))]
def cogroup[W1, W2](other1: RDD[(K, W1)], other2: RDD[(K, W2)]): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2]))]
def cogroup[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (Iterable[V], Iterable[W]))]
def cogroup[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (Iterable[V], Iterable[W]))]
def cogroup[W](other: RDD[(K, W)]): RDD[(K, (Iterable[V], Iterable[W]))]
Example:
/**
 * User: 过往记忆
 * Date: 15-03-10
 * Time: 6:30 PM
 * Blog: http://www.iteblog.com
 * Article URL: http://www.iteblog.com/archives/1280
 * 过往记忆's blog focuses on Hadoop, Hive, Spark, Shark, and Flume.
 * WeChat public account: iteblog_hadoop
 */
scala> val data1 = sc.parallelize(List((1, "www"), (2, "bbs")))
data1: org.apache.spark.rdd.RDD[(Int, String)] = ParallelCollectionRDD[32] at parallelize at <console>:12

scala> val data2 = sc.parallelize(List((1, "iteblog"), (2, "iteblog"), (3, "very")))
data2: org.apache.spark.rdd.RDD[(Int, String)] = ParallelCollectionRDD[33] at parallelize at <console>:12

scala> val data3 = sc.parallelize(List((1, "com"), (2, "com"), (3, "good")))
data3: org.apache.spark.rdd.RDD[(Int, String)] = ParallelCollectionRDD[34] at parallelize at <console>:12

scala> val result = data1.cogroup(data2, data3)
result: org.apache.spark.rdd.RDD[(Int, (Iterable[String], Iterable[String], Iterable[String]))] = MappedValuesRDD[38] at cogroup at <console>:18

scala> result.collect
res30: Array[(Int, (Iterable[String], Iterable[String], Iterable[String]))] =
Array((1,(CompactBuffer(www),CompactBuffer(iteblog),CompactBuffer(com))), (2,(CompactBuffer(bbs),CompactBuffer(iteblog),CompactBuffer(com))), (3,(CompactBuffer(),CompactBuffer(very),CompactBuffer(good))))
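To make the per-key grouping semantics concrete outside a Spark shell, here is a minimal sketch that mimics a three-way cogroup over plain Scala collections. `localCogroup` is a hypothetical helper written for illustration only, not part of the Spark API; real Spark shuffles by key across partitions and returns CompactBuffers, but the per-key result shape is the same, including the empty Iterable for key 3, which is absent from data1.

```scala
// Minimal, non-Spark sketch of three-way cogroup semantics.
// localCogroup is a hypothetical helper for illustration only.
object CogroupSketch {
  def localCogroup[K, V, W1, W2](
      a: Seq[(K, V)],
      b: Seq[(K, W1)],
      c: Seq[(K, W2)]
  ): Map[K, (Iterable[V], Iterable[W1], Iterable[W2])] = {
    // The union of all keys appearing in any of the inputs.
    val keys = (a.map(_._1) ++ b.map(_._1) ++ c.map(_._1)).distinct
    keys.map { k =>
      // A key missing from an input contributes an empty Iterable.
      (k, (a.filter(_._1 == k).map(_._2),
           b.filter(_._1 == k).map(_._2),
           c.filter(_._1 == k).map(_._2)))
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val data1 = Seq((1, "www"), (2, "bbs"))
    val data2 = Seq((1, "iteblog"), (2, "iteblog"), (3, "very"))
    val data3 = Seq((1, "com"), (2, "com"), (3, "good"))
    val result = localCogroup(data1, data2, data3)
    // Key 3 is absent from data1, so its first Iterable is empty.
    println(result(3))
  }
}
```

This is why cogroup can serve as the building block for joins: an inner join keeps only keys where all Iterables are non-empty, while an outer join keeps every key and substitutes defaults for the empty sides.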
This article is reproduced from: http://www.iteblog.com/archives/1280