Using join on RDDs
Source: Internet · Editor: 程序博客网 · Time: 2024/06/03 17:58
The code is as follows:
```scala
package rdd

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by 汪本成 on 2016/7/2.
 */
object rddJoin {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("rddJoin").setMaster("local")
    val sc = new SparkContext(conf)

    // Two pair RDDs, each in a single partition; only key 3 appears in both
    val rdd1 = sc.parallelize(Array((1, 21), (2, 42), (3, 41)), 1)
    val rdd2 = sc.parallelize(Array((3, 4), (4, 41)), 1)

    // Inner join on keys: yields (key, (value1, value2)) for each matching key
    val rdd3 = rdd1.join(rdd2)
    rdd3.foreach(println)

    // Pair each element with its position in the RDD
    rdd1.zipWithIndex.foreach(println)
  }
}
```
The output is as follows:
```
16/07/02 22:42:13 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 19 ms
16/07/02 22:42:13 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/07/02 22:42:13 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
(3,(41,4))
16/07/02 22:42:13 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 1165 bytes result sent to driver
16/07/02 22:42:13 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 113 ms on localhost (1/1)
16/07/02 22:42:13 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
16/07/02 22:42:13 INFO DAGScheduler: ResultStage 2 (foreach at rddJoin.scala:18) finished in 0.114 s
16/07/02 22:42:13 INFO DAGScheduler: Job 0 finished: foreach at rddJoin.scala:18, took 1.312562 s
16/07/02 22:42:13 INFO SparkContext: Starting job: foreach at rddJoin.scala:19
16/07/02 22:42:13 INFO DAGScheduler: Got job 1 (foreach at rddJoin.scala:19) with 1 output partitions
16/07/02 22:42:13 INFO DAGScheduler: Final stage: ResultStage 3 (foreach at rddJoin.scala:19)
16/07/02 22:42:13 INFO DAGScheduler: Parents of final stage: List()
16/07/02 22:42:13 INFO DAGScheduler: Missing parents: List()
16/07/02 22:42:13 INFO DAGScheduler: Submitting ResultStage 3 (ZippedWithIndexRDD[5] at zipWithIndex at rddJoin.scala:19), which has no missing parents
16/07/02 22:42:13 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 1608.0 B, free 12.3 KB)
16/07/02 22:42:13 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 1075.0 B, free 13.3 KB)
16/07/02 22:42:13 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:46987 (size: 1075.0 B, free: 5.1 GB)
16/07/02 22:42:13 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
16/07/02 22:42:13 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (ZippedWithIndexRDD[5] at zipWithIndex at rddJoin.scala:19)
16/07/02 22:42:13 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
16/07/02 22:42:13 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3, localhost, partition 0,PROCESS_LOCAL, 2336 bytes)
16/07/02 22:42:13 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
((1,21),0)
((2,42),1)
((3,41),2)
```
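Amid the INFO logging, the join prints only one row, `(3,(41,4))`, because `join` on pair RDDs is an inner join and 3 is the only key present in both RDDs, while `zipWithIndex` pairs each element with its position. The same pairing logic can be sketched in plain Python (no Spark involved; the helper names here are illustrative, not Spark APIs):

```python
# Sketch of the inner-join semantics of RDD.join on plain Python lists
# of (key, value) pairs. Mirrors the Spark program above without a cluster.

def inner_join(left, right):
    """Return (key, (left_value, right_value)) for keys present in both inputs."""
    right_by_key = {}
    for k, v in right:
        right_by_key.setdefault(k, []).append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in right_by_key.get(k, [])]

def zip_with_index(xs):
    """Pair each element with its position, like RDD.zipWithIndex."""
    return [(x, i) for i, x in enumerate(xs)]

rdd1 = [(1, 21), (2, 42), (3, 41)]
rdd2 = [(3, 4), (4, 41)]

print(inner_join(rdd1, rdd2))   # [(3, (41, 4))] -- only key 3 appears in both
print(zip_with_index(rdd1))     # [((1, 21), 0), ((2, 42), 1), ((3, 41), 2)]
```

Note that if a key occurred several times on either side, `join` would emit one output pair per matching combination, which is why skewed keys can blow up join output sizes.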
The use of join on Spark RDDs is already covered in detail in the books, so here I simply demonstrate it with a program.
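As an aside beyond the original program: Spark pair RDDs also provide `leftOuterJoin`, `rightOuterJoin`, and `fullOuterJoin`, which keep unmatched keys instead of dropping them (the missing side comes back as an empty `Option` in Scala). A minimal sketch of `leftOuterJoin`'s key matching, again in plain Python with illustrative names and `None` standing in for the empty `Option`:

```python
# Sketch of RDD.leftOuterJoin semantics: every key from the left side
# survives; keys with no match on the right get None as the right value.

def left_outer_join(left, right):
    right_by_key = {}
    for k, v in right:
        right_by_key.setdefault(k, []).append(v)
    out = []
    for k, lv in left:
        matches = right_by_key.get(k)
        if matches:
            out.extend((k, (lv, rv)) for rv in matches)  # matched: one row per combination
        else:
            out.append((k, (lv, None)))                  # unmatched: right side is None
    return out

rdd1 = [(1, 21), (2, 42), (3, 41)]
rdd2 = [(3, 4), (4, 41)]
print(left_outer_join(rdd1, rdd2))
# [(1, (21, None)), (2, (42, None)), (3, (41, 4))]
```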