Spark GraphX Learning 4 -- Structural Operators: mask
More code: https://github.com/xubo245/SparkLearning
1. Explanation:
connectedComponents source: returns a graph whose vertex value is the connected-component membership (the lowest vertex id in that component); the original vertex attributes are discarded.
/**
 * Compute the connected component membership of each vertex and return a graph with the vertex
 * value containing the lowest vertex id in the connected component containing that vertex.
 *
 * @see [[org.apache.spark.graphx.lib.ConnectedComponents$#run]]
 */
def connectedComponents(): Graph[VertexId, ED] = {
  ConnectedComponents.run(graph)
}
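As a plain-Scala illustration of what connectedComponents computes (this is only a sketch; the real GraphX implementation runs as a Pregel job, and `CCSketch`/`components` are hypothetical names, not GraphX APIs), each vertex ends up labeled with the lowest vertex id reachable from it:

```scala
object CCSketch {
  // Label every vertex with the lowest id in its (undirected) component,
  // by repeatedly propagating the smaller label across each edge until
  // nothing changes.
  def components(vertices: Set[Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
    var labels = vertices.map(v => v -> v).toMap
    var changed = true
    while (changed) {
      changed = false
      for ((a, b) <- edges) {
        val m = math.min(labels(a), labels(b))
        if (labels(a) != m || labels(b) != m) {
          labels = labels.updated(a, m).updated(b, m)
          changed = true
        }
      }
    }
    labels
  }

  def main(args: Array[String]): Unit = {
    // Same edge set as the example below: everything is connected through
    // vertices 5 and 0, so every vertex collapses to component 0.
    val edges = Seq((3L, 7L), (5L, 3L), (2L, 5L), (5L, 7L), (4L, 0L), (5L, 0L))
    println(components(Set(0L, 2L, 3L, 4L, 5L, 7L), edges))
  }
}
```

This matches the ccGraph output in the results section, where every vertex is labeled 0.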
mask source: returns the subgraph common to the current graph and `other`, keeping the attributes from the current graph.
/**
 * Restricts the graph to only the vertices and edges that are also in `other`, but keeps the
 * attributes from this graph.
 * @param other the graph to project this graph onto
 * @return a graph with vertices and edges that exist in both the current graph and `other`,
 *         with vertex and edge data from the current graph
 */
def mask[VD2: ClassTag, ED2: ClassTag](other: Graph[VD2, ED2]): Graph[VD, ED]
2. Code:
/**
 * @author xubo
 * ref http://spark.apache.org/docs/1.5.2/graphx-programming-guide.html
 * time 20160503
 */
package org.apache.spark.graphx.learning

import org.apache.spark._
import org.apache.spark.graphx._
// To make some of the examples work we will also need RDD
import org.apache.spark.rdd.RDD

object GraphOperatorsStructuralMask {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GraphOperatorsStructuralMask").setMaster("local[4]")
    val sc = new SparkContext(conf)

    // Create an RDD for the vertices
    val users: RDD[(VertexId, (String, String))] =
      sc.parallelize(Array(
        (3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
        (5L, ("franklin", "prof")), (2L, ("istoica", "prof")),
        (4L, ("peter", "student"))))

    // Create an RDD for edges
    val relationships: RDD[Edge[String]] =
      sc.parallelize(Array(
        Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
        Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi"),
        Edge(4L, 0L, "student"), Edge(5L, 0L, "colleague")))

    // Define a default user in case there are relationships with a missing user
    val defaultUser = ("John Doe", "Missing")

    // Build the initial graph. Notice that there is a user 0 (for which we have
    // no information) connected to users 4 (peter) and 5 (franklin).
    val graph = Graph(users, relationships, defaultUser)

    println("vertices:")
    graph.subgraph(each => each.srcId != 100L).vertices.collect.foreach(println)

    println("\ntriplets:")
    graph.triplets.map(triplet =>
      triplet.srcAttr._1 + " is the " + triplet.attr + " of " + triplet.dstAttr._1)
      .collect.foreach(println(_))
    graph.edges.collect.foreach(println)

    // Run Connected Components
    val ccGraph = graph.connectedComponents() // No longer contains the missing field

    // Remove missing vertices as well as the edges connected to them
    val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing")

    // Restrict the answer to the valid subgraph
    val validCCGraph = ccGraph.mask(validGraph)

    println("\nccGraph:")
    println("vertices:")
    ccGraph.vertices.collect.foreach(println)
    println("edges:")
    ccGraph.edges.collect.foreach(println)

    println("\nvalidGraph:")
    validGraph.vertices.collect.foreach(println)

    println("\nvalidCCGraph:")
    validCCGraph.vertices.collect.foreach(println)
  }
}

Analysis:
First run connectedComponents on the graph to obtain a new graph, ccGraph; then apply subgraph to the original graph to drop the "Missing" vertices; finally use mask to take the intersection of the two.
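The intersection taken by mask can be illustrated without Spark. Below is a minimal sketch of the vertex-side contract (`MaskSketch` and `maskVertices` are hypothetical names for illustration, not GraphX APIs): mask keeps exactly the ids present in both graphs, while the attribute values come from the graph mask is called on.

```scala
object MaskSketch {
  // Keep only ids that also appear in `other`, but take the attribute
  // values from `current` -- the same contract Graph.mask has for vertices.
  def maskVertices[A, B](current: Map[Long, A], other: Map[Long, B]): Map[Long, A] =
    current.filter { case (id, _) => other.contains(id) }

  def main(args: Array[String]): Unit = {
    // ccGraph labeled every vertex with component 0; validGraph dropped vertex 0.
    val ccVertices    = Map(0L -> 0L, 3L -> 0L, 4L -> 0L)
    val validVertices = Map(3L -> "rxin", 4L -> "peter")
    // Vertex 0 is filtered out; vertices 3 and 4 keep their component labels.
    println(maskVertices(ccVertices, validVertices))
  }
}
```

This is why validCCGraph in the results below contains the component labels from ccGraph but no longer contains vertex 0.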
3. Results:
vertices:
(4,(peter,student))
(0,(John Doe,Missing))
(5,(franklin,prof))
(2,(istoica,prof))
(3,(rxin,student))
(7,(jgonzal,postdoc))

triplets:
rxin is the collab of jgonzal
istoica is the colleague of franklin
franklin is the advisor of rxin
franklin is the pi of jgonzal
peter is the student of John Doe
franklin is the colleague of John Doe
Edge(3,7,collab)
Edge(2,5,colleague)
Edge(5,3,advisor)
Edge(5,7,pi)
Edge(4,0,student)
Edge(5,0,colleague)

ccGraph:
vertices:
(4,0)
(0,0)
(5,0)
(2,0)
(3,0)
(7,0)
edges:
Edge(3,7,collab)
Edge(2,5,colleague)
Edge(5,3,advisor)
Edge(5,7,pi)
Edge(4,0,student)
Edge(5,0,colleague)

validGraph:
(4,(peter,student))
(5,(franklin,prof))
(2,(istoica,prof))
(3,(rxin,student))
(7,(jgonzal,postdoc))

validCCGraph:
(4,0)
(5,0)
(2,0)
(3,0)
(7,0)
References
[1] http://spark.apache.org/docs/1.5.2/graphx-programming-guide.html
[2] https://github.com/xubo245/SparkLearning