SparkTC


1. Compute the transitive closure of a graph (the number of reachable paths, i.e. (from, to) pairs).

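2. Generate a random graph, using a mutable Set to store each edge as a (from, to) pair.

    import scala.collection.mutable
    import scala.util.Random

    // Not shown in the original snippet; the values below are assumptions.
    val numEdges = 200
    val numVertices = 100
    val rand = new Random(42)

    def generateGraph: Seq[(Int, Int)] = {
      val edges: mutable.Set[(Int, Int)] = mutable.Set.empty
      while (edges.size < numEdges) {
        val from = rand.nextInt(numVertices)
        val to = rand.nextInt(numVertices)
        if (from != to) edges += ((from, to)) // skip self-loops; the Set drops duplicates
      }
      edges.toSeq
    }
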
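3. Initialize the SparkConf and the initial data.

    import org.apache.spark.{SparkConf, SparkContext}

    val sparkConf = new SparkConf().setAppName("SparkTC")
    val spark = new SparkContext(sparkConf)
    // Number of partitions: first command-line argument, defaulting to 2.
    val slices = if (args.length > 0) args(0).toInt else 2
    // Distribute the generated edges and cache them; tc holds the paths found so far.
    var tc = spark.parallelize(generateGraph, slices).cache()
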
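4. Swap the start and end vertex of each edge so that the join lines up. Deriving (x, z) from (x, y) and (y, z) requires both sides to be keyed by y, so (x, y) must be flipped to (y, x) for the join to produce the correct result.

    val edges = tc.map(x => (x._2, x._1))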
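
The effect of flipping the edges can be checked without a cluster. The following is a minimal sketch using plain Scala collections, not Spark: `JoinSketch`, `join` and `extend` are hypothetical names introduced here, and `join` is only a local stand-in that mimics the key-matching behavior of a pair-RDD join.

```scala
object JoinSketch {
  // Local stand-in for a pair-RDD join: pair up values that share the same key.
  def join[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] =
    for {
      (k, a) <- left
      (k2, b) <- right
      if k == k2
    } yield (k, (a, b))

  // One extension step: paths (y, z) joined with reversed edges (y, x) give (x, z).
  def extend(paths: Seq[(Int, Int)], graph: Seq[(Int, Int)]): Seq[(Int, Int)] = {
    val reversed = graph.map { case (from, to) => (to, from) }
    join(paths, reversed).map { case (_, (z, x)) => (x, z) }
  }

  def main(args: Array[String]): Unit = {
    val graph = Seq((1, 2), (2, 3))
    println(extend(graph, graph)) // prints List((1,3))
  }
}
```

With the edges 1 -> 2 and 2 -> 3, the only key shared by both sides is 2, so the join yields (2, (3, 1)), which is projected to the new path (1, 3).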

5. Repeatedly join and union, counting the paths after each round, until the count stops changing.

    var oldCount = 0L
    var nextCount = tc.count()
    do {
      oldCount = nextCount
      // Perform the join, obtaining an RDD of (y, (z, x)) pairs,
      // then project the result to obtain the new (x, z) paths.
      tc = tc.union(tc.join(edges).map(x => (x._2._2, x._2._1))).distinct().cache()
      nextCount = tc.count()
    } while (nextCount != oldCount)
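
To see why the loop terminates once the count stops changing, the same fixed-point iteration can be run on a small in-memory graph. This is a sketch under stated assumptions rather than the Spark program itself: `LocalTC` and `transitiveClosure` are hypothetical names, an immutable `Set` replaces the RDD (so `++` plays the role of `union` followed by `distinct`), and a `while` loop is used in place of `do ... while`, which Scala 3 no longer supports.

```scala
object LocalTC {
  // Repeatedly extend the known paths by one edge until no new pair appears.
  def transitiveClosure(graph: Set[(Int, Int)]): Set[(Int, Int)] = {
    // Reversed edges grouped by key: for each y, all x with an edge x -> y.
    val predecessors: Map[Int, Set[Int]] =
      graph.groupBy(_._2).map { case (to, es) => to -> es.map(_._1) }

    var tc = graph
    var oldCount = 0
    var nextCount = tc.size
    while (nextCount != oldCount) {
      oldCount = nextCount
      // For every path (y, z) and every edge (x, y), derive the path (x, z).
      val extended = for {
        (y, z) <- tc
        x <- predecessors.getOrElse(y, Set.empty[Int])
      } yield (x, z)
      tc = tc ++ extended // Set union also removes duplicates
      nextCount = tc.size
    }
    tc
  }

  def main(args: Array[String]): Unit = {
    val closure = transitiveClosure(Set((1, 2), (2, 3), (3, 4)))
    println(closure.size) // prints 6
  }
}
```

For the chain 1 -> 2 -> 3 -> 4, the first round adds (1, 3) and (2, 4), the second adds (1, 4), and the third adds nothing, so the count stays at 6 and the loop exits.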


