Using the Custom Accumulator AccumulatorV2 in Spark 2.x

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 11:49

Deprecation

As of Spark 2.x, the old Accumulator API is deprecated and AccumulatorV2 replaces it.
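
As a rough illustration of the change, the sketch below shows the deprecated call only as a comment and uses the AccumulatorV2-based `longAccumulator` for the actual work. The object name `AccumulatorMigration` and the helper `sumWithAccumulator` are made up for this example, and a local master is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object, not part of Spark.
object AccumulatorMigration {
  // Sums the inputs through a LongAccumulator and returns the total.
  def sumWithAccumulator(sc: SparkContext, xs: Seq[Long]): Long = {
    // Spark 1.x style, deprecated since 2.0 (kept as a comment only):
    //   val oldAcc = sc.accumulator(0L, "counter")
    //   rdd.foreach(x => oldAcc += x)

    // Spark 2.x style, built on AccumulatorV2:
    val acc = sc.longAccumulator("counter")
    sc.parallelize(xs).foreach(x => acc.add(x))
    acc.sum
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AccumulatorMigration").setMaster("local")
    val sc = new SparkContext(conf)
    try println(sumWithAccumulator(sc, Seq(1L, 2L, 3L))) // prints 6
    finally sc.stop()
  }
}
```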


New additions

Creates and registers a long accumulator, which starts at 0 and accumulates inputs via add:

  def longAccumulator(name: String): LongAccumulator = {
    val acc = new LongAccumulator
    register(acc, name)
    acc
  }

Creates and registers a double accumulator, which starts at 0 and accumulates inputs via add:

  def doubleAccumulator(name: String): DoubleAccumulator = {
    val acc = new DoubleAccumulator
    register(acc, name)
    acc
  }

Creates and registers a CollectionAccumulator, which starts with an empty list and accumulates inputs by adding them to the list:

  def collectionAccumulator[T](name: String): CollectionAccumulator[T] = {
    val acc = new CollectionAccumulator[T]
    register(acc, name)
    acc
  }
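
To see the three factory methods side by side, here is a minimal sketch that updates all of them inside one `foreach` action. The object name `BuiltInAccumulators`, the helper `summarize`, and the accumulator names are assumptions made for this example; `sum` is used to read the long and double accumulators as Scala primitives.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object, not part of Spark.
object BuiltInAccumulators {
  // Returns (number of inputs, their numeric sum, number of collected strings).
  def summarize(sc: SparkContext, inputs: Seq[String]): (Long, Double, Int) = {
    val count = sc.longAccumulator("count")
    val total = sc.doubleAccumulator("total")
    val seen = sc.collectionAccumulator[String]("seen")

    // foreach is an action, so every update below is applied.
    sc.parallelize(inputs).foreach { s =>
      count.add(1L)
      total.add(s.toDouble)
      seen.add(s)
    }
    // seen.value is a java.util.List[String]
    (count.sum, total.sum, seen.value.size())
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("BuiltInAccumulators").setMaster("local")
    val sc = new SparkContext(conf)
    try println(summarize(sc, Seq("1.5", "2.5", "4.0"))) // prints (3,8.0,3)
    finally sc.stop()
  }
}
```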

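Custom accumulators

1. Define a class that extends AccumulatorV2[String, String]; the first type parameter is the input type and the second is the output type.
2. Override the abstract methods:

isZero: returns whether the accumulator currently holds its zero value, for example 0 for a counter or an empty list for a collection accumulator.
copy: creates a new copy of this AccumulatorV2.
reset: resets the AccumulatorV2 to its zero value, so that isZero returns true afterwards.
add: takes an input value and accumulates it.
merge: merges another accumulator of the same type into this one, updating this one in place.
value: the current result of the AccumulatorV2 as seen from outside.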
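
How these six methods fit together can be sketched without a cluster. The example below assumes a hypothetical `WordCountAccumulator` that counts how often each string is added, then imitates two tasks by hand: each one works on a zeroed copy obtained from `copyAndReset()` (whose default implementation is `copy()` followed by `reset()`), and the partial results are merged back into the original. This is only an imitation of what Spark does internally, not the actual scheduling code.

```scala
import org.apache.spark.util.AccumulatorV2
import scala.collection.mutable

// Hypothetical example class: counts occurrences of each input string.
class WordCountAccumulator extends AccumulatorV2[String, Map[String, Long]] {
  private val counts = mutable.Map.empty[String, Long]

  override def isZero: Boolean = counts.isEmpty

  override def copy(): WordCountAccumulator = {
    val acc = new WordCountAccumulator
    acc.counts ++= counts
    acc
  }

  override def reset(): Unit = counts.clear()

  override def add(v: String): Unit =
    counts(v) = counts.getOrElse(v, 0L) + 1L

  override def merge(other: AccumulatorV2[String, Map[String, Long]]): Unit = other match {
    case o: WordCountAccumulator =>
      o.counts.foreach { case (k, n) => counts(k) = counts.getOrElse(k, 0L) + n }
    case _ => throw new UnsupportedOperationException(
      s"Cannot merge ${this.getClass.getName} with ${other.getClass.getName}")
  }

  // Expose an immutable snapshot so callers cannot mutate internal state.
  override def value: Map[String, Long] = counts.toMap
}

object WordCountAccumulatorDemo {
  def main(args: Array[String]): Unit = {
    val driver = new WordCountAccumulator

    // Imitate two tasks, each starting from a zeroed copy.
    val task1 = driver.copyAndReset()
    val task2 = driver.copyAndReset()
    Seq("a", "b", "a").foreach(task1.add)
    Seq("b", "c").foreach(task2.add)

    // Imitate the driver collecting the partial results.
    driver.merge(task1)
    driver.merge(task2)
    assert(driver.value == Map("a" -> 2L, "b" -> 2L, "c" -> 1L))
  }
}
```

Because the merge here is plain addition per key, the final value does not depend on the order in which the partial results arrive.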

For reference, see the LongAccumulator source: link [lines 291-361]

Below is a simple example that implements string concatenation.

1. Definition: MyAccumulator

import org.apache.spark.util.AccumulatorV2

// Concatenates every input string, appending "-" after each one.
class MyAccumulator extends AccumulatorV2[String, String] {
  private var res = ""

  // The zero value is the empty string.
  override def isZero: Boolean = res.isEmpty

  // Appends the partial result of another MyAccumulator to this one.
  override def merge(other: AccumulatorV2[String, String]): Unit = other match {
    case o: MyAccumulator => res += o.res
    case _ => throw new UnsupportedOperationException(
      s"Cannot merge ${this.getClass.getName} with ${other.getClass.getName}")
  }

  override def copy(): MyAccumulator = {
    val newMyAcc = new MyAccumulator
    newMyAcc.res = this.res
    newMyAcc
  }

  override def value: String = res

  override def add(v: String): Unit = res += v + "-"

  override def reset(): Unit = res = ""
}

2. Usage:

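import org.apache.spark.{SparkConf, SparkContext}

object Accumulator1 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Accumulator1").setMaster("local")
    val sc = new SparkContext(conf)

    // Register the custom accumulator with the SparkContext
    val myAcc = new MyAccumulator
    sc.register(myAcc, "myAcc")
    // val acc = sc.longAccumulator("avg")

    val nums = Array("1", "2", "3", "4", "5", "6", "7", "8")
    val numsRdd = sc.parallelize(nums)
    // foreach is an action, so the updates are applied
    numsRdd.foreach(num => myAcc.add(num))

    // Print the accumulated value rather than the accumulator object itself
    println(myAcc.value)
    sc.stop()
  }
}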

3. Result: 1-2-3-4-5-6-7-8-


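Usage notes

Accumulators do not change Spark's lazy evaluation model. Updates made inside a lazy transformation such as map() are only applied once an action forces that RDD to be computed, so they are not guaranteed to be executed.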

val accum = sc.longAccumulator
data.map { x => accum.add(x); x }
// Here, accum is still 0 because no actions have caused the map operation to be computed.
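
A runnable version of the same idea is sketched below: the accumulator is read once before any action and once after `count()` has forced the map to run. The object name `LazyAccumulatorDemo`, the helper `sumBeforeAndAfterAction`, and the sample data are assumptions made for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object, not part of Spark.
object LazyAccumulatorDemo {
  // Returns the accumulator's sum before and after an action runs.
  def sumBeforeAndAfterAction(sc: SparkContext): (Long, Long) = {
    val accum = sc.longAccumulator("sum")
    val data = sc.parallelize(Seq(1L, 2L, 3L, 4L))

    // map is lazy: nothing is computed here, so accum is untouched.
    val mapped = data.map { x => accum.add(x); x }
    val before = accum.sum

    // count is an action: it runs the map and applies the updates.
    mapped.count()
    val after = accum.sum

    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LazyAccumulatorDemo").setMaster("local")
    val sc = new SparkContext(conf)
    try println(sumBeforeAndAfterAction(sc)) // prints (0,10)
    finally sc.stop()
  }
}
```

Note that running a second action on `mapped` without caching it would recompute the map and apply the updates again, so accumulators updated inside transformations are best treated as debugging aids rather than exact counters.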

Further reading:
http://blog.csdn.net/asas1314/article/details/54571815
http://www.ccblog.cn/103.htm