spark--transform operator--mapPartitionsWithIndex
Source: Internet · Editor: 程序博客网 · Time: 2024/06/05 02:04
import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.mutable.ArrayBuffer

/**
  * Created by liupeng on 2017/6/15.
  */
object T_mapPartitionsWithIndex {
  System.setProperty("hadoop.home.dir", "F:\\hadoop-2.6.5")

  // Prefix each element with the index of the partition it belongs to.
  def fun_index(index: Int, iter: Iterator[String]): Iterator[String] = {
    val list = ArrayBuffer[String]()
    while (iter.hasNext) {
      val name: String = iter.next()
      val fs = index + ":" + name
      list += fs
      println(fs)
    }
    list.iterator
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("mapPartitionsWithIndex_test").setMaster("local")
    val sc = new SparkContext(conf)
    // Prepare some sample data.
    val names: List[String] = List("liupeng", "xuliuxi", "xiaoma")
    val nameRDD = sc.parallelize(names, 2)
    // Iterate partition by partition, together with each partition's index.
    // If you want to know which elements were grouped together,
    // mapPartitionsWithIndex gives you the index of each partition.
    val nameWithPartitionIndex = nameRDD.mapPartitionsWithIndex(fun_index)
    println(nameWithPartitionIndex.count())
    sc.stop()
  }
}
Output:
0:liupeng
1:xuliuxi
1:xiaoma
3
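The output shows that partition 0 got only "liupeng" while partition 1 got "xuliuxi" and "xiaoma". That split comes from how `sc.parallelize` slices a local collection: partition `i` receives the index range `[i*n/numSlices, (i+1)*n/numSlices)`. The following is a minimal sketch (no Spark required; `SliceDemo` and `slice` are names invented here for illustration) that reproduces this slicing rule in plain Scala:

```scala
object SliceDemo {
  // Mimics the slicing rule parallelize uses for a local collection:
  // partition i gets indices [i*n/numSlices, (i+1)*n/numSlices).
  def slice[T](seq: Seq[T], numSlices: Int): Seq[Seq[T]] = {
    val n = seq.length
    (0 until numSlices).map { i =>
      seq.slice(i * n / numSlices, (i + 1) * n / numSlices)
    }
  }

  def main(args: Array[String]): Unit = {
    val names = List("liupeng", "xuliuxi", "xiaoma")
    // Print each element prefixed with its partition index,
    // just like fun_index does inside mapPartitionsWithIndex.
    slice(names, 2).zipWithIndex.foreach { case (part, idx) =>
      part.foreach(name => println(idx + ":" + name))
    }
    // prints: 0:liupeng, 1:xuliuxi, 1:xiaoma
  }
}
```

With 3 elements and 2 slices, partition 0 covers indices [0, 1) and partition 1 covers [1, 3), which matches the run output above.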