Spark 2.1 reduce
/**
 * Merge the values for each key using an associative and commutative reduce function. This will
 * also perform the merging locally on each mapper before sending results to a reducer, similarly
 * to a "combiner" in MapReduce.
 */
def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)] = self.withScope {
  combineByKeyWithClassTag[V]((v: V) => v, func, func, partitioner)
}
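Both snippets here come from PairRDDFunctions in the Spark 2.1 source. Because func also runs map-side before the shuffle (like a MapReduce combiner), it must be associative and commutative. A minimal sketch of calling this overload; the app name, input data, and partition count are made up for illustration:

import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

// Hypothetical local driver exercising reduceByKey with an explicit partitioner.
val spark = SparkSession.builder().appName("reduceByKey-demo").master("local[*]").getOrCreate()
val sc = spark.sparkContext

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// Addition is associative and commutative, so it is safe to apply
// both map-side (pre-shuffle) and reduce-side.
val summed = pairs.reduceByKey(new HashPartitioner(4), _ + _)
summed.collect().foreach(println)  // (a,4), (b,2) in some order

spark.stop()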
/**
 * :: Experimental ::
 * Generic function to combine the elements for each key using a custom set of aggregation
 * functions. Turns an RDD[(K, V)] into a result of type RDD[(K, C)], for a "combined type" C
 *
 * Users provide three functions:
 *
 *  - `createCombiner`, which turns a V into a C (e.g., creates a one-element list)
 *  - `mergeValue`, to merge a V into a C (e.g., adds it to the end of a list)
 *  - `mergeCombiners`, to combine two C's into a single one.
 *
 * In addition, users can control the partitioning of the output RDD, and whether to perform
 * map-side aggregation (if a mapper can produce multiple items with the same key).
 *
 * @note V and C can be different -- for example, one might group an RDD of type
 * (Int, Int) into an RDD of type (Int, Seq[Int]).
 */
@Experimental
def combineByKeyWithClassTag[C](
    createCombiner: V => C,
    mergeValue: (C, V) => C,
    mergeCombiners: (C, C) => C,
    partitioner: Partitioner,
    mapSideCombine: Boolean = true,
    serializer: Serializer = null)(implicit ct: ClassTag[C]): RDD[(K, C)] = self.withScope {
  require(mergeCombiners != null, "mergeCombiners must be defined") // required as of Spark 0.9.0
  if (keyClass.isArray) {
    if (mapSideCombine) {
      throw new SparkException("Cannot use map-side combining with array keys.")
    }
    if (partitioner.isInstanceOf[HashPartitioner]) {
      throw new SparkException("HashPartitioner cannot partition array keys.")
    }
  }
  val aggregator = new Aggregator[K, V, C](
    self.context.clean(createCombiner),
    self.context.clean(mergeValue),
    self.context.clean(mergeCombiners))
  if (self.partitioner == Some(partitioner)) {
    self.mapPartitions(iter => {
      val context = TaskContext.get()
      new InterruptibleIterator(context, aggregator.combineValuesByKey(iter, context))
    }, preservesPartitioning = true)
  } else {
    new ShuffledRDD[K, V, C](self, partitioner)
      .setSerializer(serializer)
      .setAggregator(aggregator)
      .setMapSideCombine(mapSideCombine)
  }
}
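The three user functions map directly onto a per-key fold, and V and C may differ, as the @note says. A sketch that computes per-key averages through the public combineByKey wrapper, carrying (sum, count) as the combined type C; the dataset and names are hypothetical:

import org.apache.spark.sql.SparkSession

// Hypothetical example: V is a single Int score, C is a (sum, count) pair.
val spark = SparkSession.builder().appName("combineByKey-demo").master("local[*]").getOrCreate()
val scores = spark.sparkContext.parallelize(
  Seq(("math", 80), ("math", 90), ("english", 70)))

val sumCount = scores.combineByKey(
  (v: Int) => (v, 1),                                           // createCombiner: V => C
  (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),        // mergeValue: (C, V) => C
  (a: (Int, Int), b: (Int, Int)) => (a._1 + b._1, a._2 + b._2)) // mergeCombiners: (C, C) => C

val means = sumCount.mapValues { case (sum, count) => sum.toDouble / count }
means.collect().foreach(println)  // (math,85.0), (english,70.0) in some order

spark.stop()

Note the fast path in the source above: when self.partitioner == Some(partitioner), the data is already laid out by the right partitioner, so the method skips building a ShuffledRDD entirely and just runs combineValuesByKey inside each existing partition.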