Spark-partitioner
Source: Internet | Editor: 程序博客网 | Time: 2024/05/22 13:41
@(spark)[partitioner]
Partitioner
/**
 * An object that defines how the elements in a key-value pair RDD are partitioned by key.
 * Maps each key to a partition ID, from 0 to `numPartitions - 1`.
 */
abstract class Partitioner extends Serializable {
  def numPartitions: Int
  def getPartition(key: Any): Int
}
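To make the contract concrete, here is a minimal sketch of a custom partitioner. `FirstCharPartitioner` and its routing rule are hypothetical, made up for illustration only. The abstract class is copied locally from the definition quoted above so the snippet runs without Spark on the classpath; in a real job you would extend `org.apache.spark.Partitioner` instead.

```scala
// Local copy of the contract quoted above, so the sketch runs without Spark.
abstract class Partitioner extends Serializable {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Hypothetical example: route String keys by their first character.
// Anything else (null, empty strings, non-String keys) goes to partition 0.
class FirstCharPartitioner(val numPartitions: Int) extends Partitioner {
  require(numPartitions > 0, "numPartitions must be positive")

  def getPartition(key: Any): Int = key match {
    // A Char converts to a non-negative Int, so the result is always in [0, numPartitions).
    case s: String if s.nonEmpty => s.charAt(0).toInt % numPartitions
    case _                       => 0
  }
}
```

Whatever rule is chosen, the result of `getPartition` has to stay within 0 to `numPartitions - 1`, as the doc comment requires.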
HashPartitioner
/**
 * A [[org.apache.spark.Partitioner]] that implements hash-based partitioning using
 * Java's `Object.hashCode`.
 *
 * Java arrays have hashCodes that are based on the arrays' identities rather than their contents,
 * so attempting to partition an RDD[Array[_]] or RDD[(Array[_], _)] using a HashPartitioner will
 * produce an unexpected or incorrect result.
 */
class HashPartitioner(partitions: Int) extends Partitioner {
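The class body is cut off above, so the following is only a sketch of the idea the doc comment describes, not the Spark source: take the key's `hashCode` and reduce it to a partition ID in the range 0 to `numPartitions - 1`. The names `HashPartitionSketch`, `nonNegativeMod`, and `partitionFor` are assumptions chosen for this example.

```scala
object HashPartitionSketch {
  // On the JVM, % can return a negative value for a negative hash code,
  // so shift the remainder into [0, mod).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Assumption for this sketch: null keys go to partition 0;
  // every other key is placed by its hashCode.
  def partitionFor(key: Any, numPartitions: Int): Int = key match {
    case null => 0
    case _    => nonNegativeMod(key.hashCode, numPartitions)
  }
}
```

This also shows why array keys are a problem: two `Array(1, 2, 3)` instances have identity-based hash codes and may land in different partitions, while two `Seq(1, 2, 3)` values hash by content and always land in the same one. Converting array keys with `.toSeq` before partitioning is one way around it.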
RangePartitioner
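In practice this is the partitioner used for sort-based partitioning:
1. Take a sample of the data to get a rough picture of the key distribution.
2. For each key, determine its partition from the sample above.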
/**
 * A [[org.apache.spark.Partitioner]] that partitions sortable records by range into roughly
 * equal ranges. The ranges are determined by sampling the content of the RDD passed in.
 *
 * Note that the actual number of partitions created by the RangePartitioner might not be the same
 * as the `partitions` parameter, in the case where the number of sampled records is less than
 * the value of `partitions`.
 */
class RangePartitioner[K : Ordering : ClassTag, V](
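The two steps (sample, then assign each key by range) can be sketched in plain Scala. This is a simplified illustration under stated assumptions, not the Spark implementation: the sample is passed in directly instead of being drawn from an RDD, keys are `Int`, and `RangePartitionSketch`, `rangeBounds`, and `partitionFor` are hypothetical names.

```scala
object RangePartitionSketch {
  // Step 1: sort the sampled keys and pick up to (partitions - 1)
  // evenly spaced upper bounds. Duplicate bounds are dropped.
  def rangeBounds(sample: Seq[Int], partitions: Int): Vector[Int] = {
    val sorted = sample.sorted.toVector
    if (sorted.isEmpty || partitions <= 1) Vector.empty
    else
      (1 until partitions)
        .map(i => sorted(math.min(sorted.length - 1, i * sorted.length / partitions)))
        .distinct
        .toVector
  }

  // Step 2: a key belongs to the first range whose upper bound is >= the key;
  // keys above every bound fall into the last partition.
  def partitionFor(key: Int, bounds: Vector[Int]): Int = {
    val idx = bounds.indexWhere(key <= _)
    if (idx >= 0) idx else bounds.length
  }
}
```

With the sample `1..10` and `partitions = 3`, the bounds are `Vector(4, 7)`, so keys up to 4 go to partition 0, keys 5 to 7 to partition 1, and the rest to partition 2. With a single sampled key and `partitions = 4`, only one distinct bound survives, giving two partitions instead of four, which is the situation the note in the doc comment describes.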