An unexpected result from Spark's aggregateByKey function

Code that produces the unexpected result:
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("SparkWordCount").setMaster("local[1]")
    val sc = new SparkContext(conf)
    val data = sc.parallelize(List((1, 3), (1, 200), (1, 100), (2, 3), (2, 4), (2, 5)))

    // Within a partition: keep the larger of the accumulator and the next value.
    def seqOp(a: Int, b: Int): Int = {
      println("seq: " + a + "\t " + b)
      math.max(a, b)
    }

    // Across partitions: add the partial results for the same key.
    def combineOp(a: Int, b: Int): Int = {
      println("comb: " + a + "\t " + b)
      a + b
    }

    // val localIterator = data.aggregateByKey(0)((_, _)._2, _ + _).collect()
    val localIterator = data.aggregateByKey(4)(seqOp, combineOp).collect()
    for (i <- localIterator) println(i)
    sc.stop()
Running this code does not give the expected result. The cause is that the data is split into too few partitions by default: aggregateByKey applies the zero value (4 here) and seqOp to each key within every partition, and only calls combineOp to merge partial results coming from different partitions. With local[1] the default parallelism is one partition, so combineOp never runs. To change this, replace the line that creates `data` with:
val data = sc.parallelize(List((1, 3), (1, 200), (1, 100), (2, 3), (2, 4), (2, 5)), 6)
and the result comes out as expected.
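The sketch below runs both variants side by side to make the effect of the partition count visible. It is a minimal, self-contained example rather than the original post's code: the object name AggregateByKeyDemo, the master setting, and the expected outputs in the comments are illustrative assumptions that follow from aggregateByKey's semantics (zero value and seqOp per partition, combineOp across partitions).

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical demo object, not from the original post.
    object AggregateByKeyDemo {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("AggregateByKeyDemo").setMaster("local[1]")
        val sc = new SparkContext(conf)
        val pairs = List((1, 3), (1, 200), (1, 100), (2, 3), (2, 4), (2, 5))

        // One partition: every value of a key is folded against the zero value 4 by
        // seqOp, and combineOp never runs, so each key ends up with its plain maximum.
        val onePartition = sc.parallelize(pairs, 1)
          .aggregateByKey(4)((acc, v) => math.max(acc, v), (a, b) => a + b)
          .collect()
        println(onePartition.mkString(", "))   // expected: (1,200), (2,5)

        // Six partitions: each record sits in its own partition, so seqOp yields
        // max(4, v) per record and combineOp then sums those partial maxima per key.
        val sixPartitions = sc.parallelize(pairs, 6)
          .aggregateByKey(4)((acc, v) => math.max(acc, v), (a, b) => a + b)
          .collect()
        println(sixPartitions.mkString(", "))  // expected: (1,304), (2,13)

        sc.stop()
      }
    }

With a single partition each key simply returns its maximum (200 and 5). With six partitions every element is first compared against the zero value 4 in its own partition, and those per-partition maxima are then summed per key, giving 4 + 200 + 100 = 304 for key 1 and 4 + 4 + 5 = 13 for key 2.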

 