28. Implementing a Custom Accumulator in Scala

Source: Internet | Published by: 卫宁软件股票代码 | Editor: 程序博客网 | Time: 2024/06/05 20:35

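This article is part of the series "Spark Large-Scale E-commerce Project in Practice" (《Spark大型电商项目实战》). It shows how to implement a custom Accumulator in Scala.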

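Overview

This part of the series reimplements the key pieces of the project in Scala. This section covers the custom Accumulator.

After setting up a Scala development environment in an IDE (for example IntelliJ IDEA or Scala IDE for Eclipse), create the package com.erik.sparkproject (adjust the name to suit your own project),
then bring in the existing utility classes Constants.java and StringUtils.java,
and finally create SessionAggrStatAccumulatorTest.scala.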

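Implementation

The code below implements the custom Accumulator and runs a simple test that adds 1 to the 1s_3s and 4s_6s counters. It uses the AccumulatorParam API of Spark 1.x, which has been deprecated since Spark 2.0 in favor of AccumulatorV2.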

package com.erik.sparkproject

import org.apache.spark.AccumulatorParam
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/**
 * @author erik
 */
object SessionAggrStatAccumulatorTest {

  def main(args: Array[String]): Unit = {

    /**
     * A custom Accumulator in Scala.
     * Defining a singleton object is enough.
     * It must implement the AccumulatorParam trait; the type parameter in []
     * sets the type of the values added and of the accumulated result.
     */
    object SessionAggrStatAccumulator extends AccumulatorParam[String] {

      /**
       * First, implement zero, which returns the initial value.
       */
      def zero(initialValue: String): String = {
        // Each + ends its line so that Scala 2 parses the whole
        // concatenation as a single expression.
        Constants.SESSION_COUNT + "=0|" +
          Constants.TIME_PERIOD_1s_3s + "=0|" +
          Constants.TIME_PERIOD_4s_6s + "=0|" +
          Constants.TIME_PERIOD_7s_9s + "=0|" +
          Constants.TIME_PERIOD_10s_30s + "=0|" +
          Constants.TIME_PERIOD_30s_60s + "=0|" +
          Constants.TIME_PERIOD_1m_3m + "=0|" +
          Constants.TIME_PERIOD_3m_10m + "=0|" +
          Constants.TIME_PERIOD_10m_30m + "=0|" +
          Constants.TIME_PERIOD_30m + "=0|" +
          Constants.STEP_PERIOD_1_3 + "=0|" +
          Constants.STEP_PERIOD_4_6 + "=0|" +
          Constants.STEP_PERIOD_7_9 + "=0|" +
          Constants.STEP_PERIOD_10_30 + "=0|" +
          Constants.STEP_PERIOD_30_60 + "=0|" +
          Constants.STEP_PERIOD_60 + "=0"
      }

      /**
       * Next, implement the method that accumulates a value.
       */
      def addInPlace(v1: String, v2: String): String = {
        // If the current value is empty, return v2
        if (v1 == "") {
          v2
        } else {
          // Extract the current value of field v2 from the concatenated string
          val oldValue = StringUtils.getFieldFromConcatString(v1, "\\|", v2)
          // Add 1
          val newValue = Integer.valueOf(oldValue) + 1
          // Write the incremented value back to field v2
          StringUtils.setFieldInConcatString(v1, "\\|", v2, String.valueOf(newValue))
        }
      }
    }

    // Create the Spark context
    val conf = new SparkConf()
      .setAppName("SessionAggrStatAccumulatorTest")
      .setMaster("local")
    val sc = new SparkContext(conf)

    // Create the custom Accumulator with the curried accumulator()() method
    val sessionAggrStatAccumulator = sc.accumulator("")(SessionAggrStatAccumulator)

    // Exercise the custom Accumulator
    val arr = Array(Constants.TIME_PERIOD_1s_3s, Constants.TIME_PERIOD_4s_6s)
    val rdd = sc.parallelize(arr, 1)
    rdd.foreach { sessionAggrStatAccumulator.add(_) }

    println(sessionAggrStatAccumulator.value)
  }
}
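The addInPlace method above relies on StringUtils.getFieldFromConcatString and StringUtils.setFieldInConcatString, whose source is not shown in this article. As a rough idea of what such helpers do, here is a minimal Scala sketch. The object name ConcatStringSketch, the method names getField and setField, and the Option return type are assumptions made for illustration; the project's Java implementations may behave differently (for example, by returning null for a missing field).

```scala
// Hypothetical stand-ins for the project's StringUtils helpers.
// The real Java implementations are not shown in the article and may differ.
object ConcatStringSketch {

  /** Returns the value stored under `field` in a string such as "a=0|b=1", if present. */
  def getField(str: String, delimiterRegex: String, field: String): Option[String] =
    str.split(delimiterRegex)
      .map(_.split("=", 2))
      .collectFirst { case Array(name, value) if name == field => value }

  /**
   * Returns a copy of `str` in which `field` is set to `newValue`.
   * Other fields are kept unchanged. The output delimiter is hard-coded
   * as "|" (an assumption), while the input delimiter is a regex.
   */
  def setField(str: String, delimiterRegex: String, field: String, newValue: String): String =
    str.split(delimiterRegex).map { kv =>
      kv.split("=", 2) match {
        case Array(name, _) if name == field => s"$name=$newValue"
        case _                               => kv
      }
    }.mkString("|")

  def main(args: Array[String]): Unit = {
    val s = "session_count=0|1s_3s=0|4s_6s=0"
    // Read the old counter, add 1, and write it back.
    val old = getField(s, "\\|", "1s_3s").map(_.toInt).getOrElse(0)
    println(setField(s, "\\|", "1s_3s", (old + 1).toString))
    // prints session_count=0|1s_3s=1|4s_6s=0
  }
}
```

The delimiter is passed as "\\|" because String.split takes a regular expression, and an unescaped | would be read as alternation.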

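Testing

Run the program. The test passes if the output shows 1s_3s=1 and 4s_6s=1 with every other counter at 0; the exact field list follows the zero string and the values defined in Constants. For example: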
session_count=0|1s_3s=1|4s_6s=1|7s_9s=0|10s_30s=0|1m_3m=0|3m_10m=0|10m_30m=0|30m=0|1_3=0|4_6=0|7_9=0|10_30=0|30_60=0|60=0|
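To see why only the two counters change, it can help to trace the calls without Spark. The sketch below assumes the Spark 1.x behaviour in which each task starts from zero(...) and calls addInPlace once per element, after which the driver merges the task's partial result into the initial value "" with another addInPlace call. The object name AccumulatorLifecycleSketch and the shortened field list are hypothetical stand-ins for the values in Constants.

```scala
// Spark-free sketch of the zero / addInPlace call sequence.
// The field names are shortened, hypothetical stand-ins for Constants.
object AccumulatorLifecycleSketch {

  val fields = Seq("session_count", "1s_3s", "4s_6s", "7s_9s")

  // Counterpart of zero(): every counter starts at 0.
  def zero: String = fields.map(_ + "=0").mkString("|")

  // Counterpart of addInPlace(): v2 is either a field name to increment
  // (task side) or a partial result that replaces an empty v1 (driver side).
  def addInPlace(v1: String, v2: String): String =
    if (v1.isEmpty) v2
    else v1.split("\\|").map { kv =>
      kv.split("=", 2) match {
        case Array(name, count) if name == v2 => s"$name=${count.toInt + 1}"
        case _                                => kv
      }
    }.mkString("|")

  def run(): String = {
    // Task side: start from zero and add each element of the single partition.
    val partial = Seq("1s_3s", "4s_6s").foldLeft(zero)(addInPlace)
    // Driver side: merge the partial result into the initial value "".
    addInPlace("", partial)
  }

  def main(args: Array[String]): Unit =
    println(run()) // prints session_count=0|1s_3s=1|4s_6s=1|7s_9s=0
}
```

Under the same assumption, the empty-string check in addInPlace is what lets the driver accept the task's result. With more than one partition the driver would receive a second full string as v2, which this design does not merge; the test avoids that case by calling parallelize(arr, 1).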

Source code for the series: https://github.com/Erik-ly/SprakProject

More articles in the series: http://blog.csdn.net/u012318074/article/category/6744423

0 0