Spark Streaming 2: ParallelCollectionRDD
import scala.collection.immutable.NumericRange
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

private object ParallelCollectionRDD {
  /**
   * Slice a collection into numSlices sub-collections. One extra thing we do here is to treat Range
   * collections specially, encoding the slices as other Ranges to minimize memory cost. This makes
   * it efficient to run Spark over RDDs representing large sets of numbers. And if the collection
   * is an inclusive Range, we use inclusive range for the last slice.
   */
  def slice[T: ClassTag](seq: Seq[T], numSlices: Int): Seq[Seq[T]] = {
    if (numSlices < 1) {
      throw new IllegalArgumentException("Positive number of slices required")
    }
    // Sequences need to be sliced at the same set of index positions for operations
    // like RDD.zip() to behave as expected
    def positions(length: Long, numSlices: Int): Iterator[(Int, Int)] = {
      (0 until numSlices).iterator.map { i =>
        val start = ((i * length) / numSlices).toInt
        val end = (((i + 1) * length) / numSlices).toInt
        (start, end)
      }
    }
    seq match {
      case r: Range =>
        positions(r.length, numSlices).zipWithIndex.map { case ((start, end), index) =>
          // If the range is inclusive, use inclusive range for the last slice
          if (r.isInclusive && index == numSlices - 1) {
            new Range.Inclusive(r.start + start * r.step, r.end, r.step)
          } else {
            new Range(r.start + start * r.step, r.start + end * r.step, r.step)
          }
        }.toSeq.asInstanceOf[Seq[Seq[T]]]
      case nr: NumericRange[_] =>
        // For ranges of Long, Double, BigInteger, etc
        val slices = new ArrayBuffer[Seq[T]](numSlices)
        var r = nr
        for ((start, end) <- positions(nr.length, numSlices)) {
          val sliceSize = end - start
          slices += r.take(sliceSize).asInstanceOf[Seq[T]]
          r = r.drop(sliceSize)
        }
        slices
      case _ =>
        val array = seq.toArray // To prevent O(n^2) operations for List etc
        positions(array.length, numSlices).map { case (start, end) =>
          array.slice(start, end).toSeq
        }.toSeq
    }
  }
}
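The key invariant in the code above is that every kind of sequence is cut at the same index positions: slice i always covers [i*length/numSlices, (i+1)*length/numSlices), so slice sizes differ by at most one element and RDD.zip() lines up across partitions. A minimal standalone sketch of that arithmetic (the positions helper is copied from the source; PositionsDemo and its demonstration values are ours):

```scala
object PositionsDemo {
  // Mirrors ParallelCollectionRDD.positions: slice i covers
  // [i*length/numSlices, (i+1)*length/numSlices).
  def positions(length: Long, numSlices: Int): Iterator[(Int, Int)] =
    (0 until numSlices).iterator.map { i =>
      val start = ((i * length) / numSlices).toInt
      val end = (((i + 1) * length) / numSlices).toInt
      (start, end)
    }

  def main(args: Array[String]): Unit = {
    // 10 elements over 3 slices: boundaries (0,3), (3,6), (6,10),
    // i.e. slice sizes 3, 3, 4.
    println(positions(10, 3).toList)

    // Slicing a Range at those boundaries yields Range slices again,
    // which is why the Range case above never materializes an array.
    val r = 1 to 10
    val slices = positions(r.length, 3).toList.map { case (s, e) => r.slice(s, e) }
    println(slices)
  }
}
```

Note that the multiplication happens before the division on a Long, so the boundaries are exact even for large collections, and the end of slice i always equals the start of slice i+1, leaving no gaps or overlaps.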