RDD aggregate
Definition
The definition can be found in the RDD API:
aggregate[U](zeroValue: U)(seqOp: (U, T) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): U
Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into an U and one operation for merging two U's, as in scala.TraversableOnce. Both of these functions are allowed to modify and return their first argument instead of creating a new U to avoid memory allocation.
zeroValue
the initial value for the accumulated result of each partition for the seqOp operator, and also the initial value for the combine results from different partitions for the combOp operator - this will typically be the neutral element (e.g. Nil for list concatenation or 0 for summation)
seqOp
an operator used to accumulate results within a partition
combOp
an associative operator used to combine results from different partitions
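Before the experiments, it may help to see this contract in plain Scala. The following is a minimal sketch of the semantics only (not Spark's actual distributed implementation; `aggregateSketch` is a name invented here, and the data is assumed to be pre-split into partitions): each partition is folded with seqOp starting from zeroValue, and the partition results are then folded with combOp, again starting from zeroValue.

```scala
// A plain-Scala sketch of aggregate's semantics (not Spark's real implementation):
// fold each partition with seqOp starting from zeroValue,
// then fold the per-partition results with combOp, again starting from zeroValue.
def aggregateSketch[T, U](partitions: Seq[Seq[T]], zeroValue: U)
                         (seqOp: (U, T) => U, combOp: (U, U) => U): U =
  partitions
    .map(p => p.foldLeft(zeroValue)(seqOp)) // per-partition accumulation
    .foldLeft(zeroValue)(combOp)            // merge partition results

// Example: per-partition max, then sum of the maxima
val parts = Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9, 10))
aggregateSketch(parts, 0)((a, b) => if (a > b) a else b, _ + _)
```

Note that zeroValue participates once in every partition's seqOp fold and once more in the final combOp fold; experiment 2 below probes exactly this behavior.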
Experiment 1: Basic Usage
The API documentation explains it fairly clearly: this function aggregates the elements within each partition, then combines the partition results using a combine function and zeroValue. It hands us two functions to supply: seqOp and combOp.
Experiment Code
Open spark-shell and run experiment 1:
```scala
// Show which partition index each element lives in
def myfunc[T](index: Int, iter: Iterator[T]): Iterator[(Int, T)] = {
  var res = List[(Int, T)]()
  for (x <- iter)
    res = (index, x) :: res
  res.iterator
}

val data = sc.parallelize(1 to 10, 3)
data.mapPartitionsWithIndex(myfunc).collect
data.aggregate(0)((a, b) => if (a > b) a else b, _ + _)
```
Experiment Results
Analysis
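The original results and analysis were published as screenshots that have not survived, but the computation can be reconstructed by hand. Assuming Spark's default slicing of ranges, `sc.parallelize(1 to 10, 3)` yields the partitions (1,2,3), (4,5,6), (7,8,9,10); seqOp takes the max within each partition, and combOp sums the maxima. A plain-Scala reconstruction:

```scala
// Reconstructing experiment 1 by hand (partition layout assumed from
// sc.parallelize(1 to 10, 3): (1,2,3), (4,5,6), (7,8,9,10))
val partitions = Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9, 10))

// seqOp phase: max within each partition, starting from zeroValue = 0
val perPartitionMax = partitions.map(_.foldLeft(0)((a, b) => if (a > b) a else b))
// perPartitionMax == List(3, 6, 10)

// combOp phase: sum the partition results, again starting from zeroValue = 0
val result = perPartitionMax.foldLeft(0)(_ + _) // 0 + 3 + 6 + 10 == 19
```

So the aggregate call returns 19: the sum of the three partition maxima 3, 6, and 10, plus the zeroValue 0.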
Experiment 2: zeroValue
As the API explains, zeroValue is the initial value for the seqOp function within each partition, and also the initial value for the combOp function when merging partition results.
Experiment Code
Open spark-shell and run experiment 2:
```scala
// seqOp: max of the accumulator and the element, with tracing
def seqOp(arg1: Int, arg2: Int): Int = {
  var res: Int = arg2
  if (arg1 > arg2) res = arg1
  println("seqOp:" + arg1 + "," + arg2 + "=>" + res)
  res
}

// combOp: sum of two partition results, with tracing
def combOp(arg1: Int, arg2: Int): Int = {
  println("combOp:" + arg1 + "," + arg2 + "=>" + (arg1 + arg2))
  arg1 + arg2
}

// Show which partition index each element lives in
def myfunc[T](index: Int, iter: Iterator[T]): Iterator[(Int, T)] = {
  var res = List[(Int, T)]()
  for (x <- iter)
    res = (index, x) :: res
  res.iterator
}

val data = sc.parallelize(1 to 10, 3)
data.mapPartitionsWithIndex(myfunc).collect
data.aggregate(11)(seqOp, combOp)
```
Experiment Results
Analysis
Admittedly, the zeroValue used in this experiment (11, larger than every element) is rather extreme; try replacing it with 5 or 6 and compare the output.
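Working through the arithmetic under the same assumed partition layout (1,2,3), (4,5,6), (7,8,9,10): with zeroValue 11, the seqOp max in every partition is 11 itself, and combOp then adds its own copy of zeroValue, giving 11 + 11 + 11 + 11 = 44. With zeroValue 5, only the first partition is dominated, giving 5 + 5 + 6 + 10 = 26. A plain-Scala sketch:

```scala
// Reconstructing experiment 2 for different zeroValues (partition layout
// assumed from sc.parallelize(1 to 10, 3): (1,2,3), (4,5,6), (7,8,9,10))
def run(zero: Int): Int = {
  val partitions = Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9, 10))
  // seqOp phase: per-partition max, seeded with zeroValue
  val perPartition = partitions.map(_.foldLeft(zero)((a, b) => if (a > b) a else b))
  // combOp phase: sum, seeded with zeroValue again
  perPartition.foldLeft(zero)(_ + _)
}

run(11) // 11 + 11 + 11 + 11 == 44: zeroValue dominates every partition
run(5)  // 5 + 5 + 6 + 10  == 26: zeroValue beats only the first partition
```

This makes the double role of zeroValue visible: it is counted once per partition in seqOp and once more in the final combOp merge.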
Reference blogs:
[1]:http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html#aggregate
[2]:http://www.iteblog.com/archives/1268