Spark GraphX Learning 4 -- Structural Operators: mask
More code: https://github.com/xubo245/SparkLearning
1. Explanation:
connectedComponents source: returns a graph whose vertex value is the connected-component membership (the lowest vertex id in that component); the original vertex attributes are discarded.
/**
 * Compute the connected component membership of each vertex and return a graph with the vertex
 * value containing the lowest vertex id in the connected component containing that vertex.
 *
 * @see [[org.apache.spark.graphx.lib.ConnectedComponents$#run]]
 */
def connectedComponents(): Graph[VertexId, ED] = {
  ConnectedComponents.run(graph)
}
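As a plain-Scala illustration of what connectedComponents computes (this is only a sketch; the real GraphX implementation runs as a Pregel job, and `CCSketch`/`components` are hypothetical names, not GraphX APIs), each vertex ends up labeled with the lowest vertex id reachable from it:

```scala
object CCSketch {
  // Label every vertex with the lowest id in its (undirected) component,
  // by repeatedly propagating the smaller label across each edge until
  // nothing changes.
  def components(vertices: Set[Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
    var labels = vertices.map(v => v -> v).toMap
    var changed = true
    while (changed) {
      changed = false
      for ((a, b) <- edges) {
        val m = math.min(labels(a), labels(b))
        if (labels(a) != m || labels(b) != m) {
          labels = labels.updated(a, m).updated(b, m)
          changed = true
        }
      }
    }
    labels
  }

  def main(args: Array[String]): Unit = {
    // Same edge set as the example below: everything is connected through
    // vertices 5 and 0, so every vertex collapses to component 0.
    val edges = Seq((3L, 7L), (5L, 3L), (2L, 5L), (5L, 7L), (4L, 0L), (5L, 0L))
    println(components(Set(0L, 2L, 3L, 4L, 5L, 7L), edges))
  }
}
```

This matches the ccGraph output in the results section, where every vertex is labeled 0.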
mask source: returns the subgraph common to the current graph and `other`, keeping the attributes from the current graph.
/**
 * Restricts the graph to only the vertices and edges that are also in `other`, but keeps the
 * attributes from this graph.
 * @param other the graph to project this graph onto
 * @return a graph with vertices and edges that exist in both the current graph and `other`,
 *         with vertex and edge data from the current graph
 */
def mask[VD2: ClassTag, ED2: ClassTag](other: Graph[VD2, ED2]): Graph[VD, ED]
2. Code:
/**
 * @author xubo
 * ref http://spark.apache.org/docs/1.5.2/graphx-programming-guide.html
 * time 20160503
 */
package org.apache.spark.graphx.learning

import org.apache.spark._
import org.apache.spark.graphx._
// To make some of the examples work we will also need RDD
import org.apache.spark.rdd.RDD

object GraphOperatorsStructuralMask {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GraphOperatorsStructuralMask").setMaster("local[4]")
    val sc = new SparkContext(conf)

    // Create an RDD for the vertices
    val users: RDD[(VertexId, (String, String))] =
      sc.parallelize(Array(
        (3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
        (5L, ("franklin", "prof")), (2L, ("istoica", "prof")),
        (4L, ("peter", "student"))))

    // Create an RDD for edges
    val relationships: RDD[Edge[String]] =
      sc.parallelize(Array(
        Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
        Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi"),
        Edge(4L, 0L, "student"), Edge(5L, 0L, "colleague")))

    // Define a default user in case there are relationships with a missing user
    val defaultUser = ("John Doe", "Missing")

    // Build the initial graph. Notice that there is a user 0 (for which we have
    // no information) connected to users 4 (peter) and 5 (franklin).
    val graph = Graph(users, relationships, defaultUser)

    println("vertices:")
    graph.subgraph(each => each.srcId != 100L).vertices.collect.foreach(println)

    println("\ntriplets:")
    graph.triplets.map(triplet =>
      triplet.srcAttr._1 + " is the " + triplet.attr + " of " + triplet.dstAttr._1)
      .collect.foreach(println(_))
    graph.edges.collect.foreach(println)

    // Run Connected Components
    val ccGraph = graph.connectedComponents() // No longer contains the missing field

    // Remove missing vertices as well as the edges connected to them
    val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing")

    // Restrict the answer to the valid subgraph
    val validCCGraph = ccGraph.mask(validGraph)

    println("\nccGraph:")
    println("vertices:")
    ccGraph.vertices.collect.foreach(println)
    println("edges:")
    ccGraph.edges.collect.foreach(println)

    println("\nvalidGraph:")
    validGraph.vertices.collect.foreach(println)

    println("\nvalidCCGraph:")
    validCCGraph.vertices.collect.foreach(println)
  }
}

Analysis:
First run connectedComponents on the graph to obtain a new graph, ccGraph; then apply subgraph to the original graph to drop the "Missing" vertices; finally use mask to take the intersection of the two.
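The intersection taken by mask can be illustrated without Spark. Below is a minimal sketch of the vertex-side contract (`MaskSketch` and `maskVertices` are hypothetical names for illustration, not GraphX APIs): mask keeps exactly the ids present in both graphs, while the attribute values come from the graph mask is called on.

```scala
object MaskSketch {
  // Keep only ids that also appear in `other`, but take the attribute
  // values from `current` -- the same contract Graph.mask has for vertices.
  def maskVertices[A, B](current: Map[Long, A], other: Map[Long, B]): Map[Long, A] =
    current.filter { case (id, _) => other.contains(id) }

  def main(args: Array[String]): Unit = {
    // ccGraph labeled every vertex with component 0; validGraph dropped vertex 0.
    val ccVertices    = Map(0L -> 0L, 3L -> 0L, 4L -> 0L)
    val validVertices = Map(3L -> "rxin", 4L -> "peter")
    // Vertex 0 is filtered out; vertices 3 and 4 keep their component labels.
    println(maskVertices(ccVertices, validVertices))
  }
}
```

This is why validCCGraph in the results below contains the component labels from ccGraph but no longer contains vertex 0.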
3. Results:
vertices:
(4,(peter,student))
(0,(John Doe,Missing))
(5,(franklin,prof))
(2,(istoica,prof))
(3,(rxin,student))
(7,(jgonzal,postdoc))

triplets:
rxin is the collab of jgonzal
istoica is the colleague of franklin
franklin is the advisor of rxin
franklin is the pi of jgonzal
peter is the student of John Doe
franklin is the colleague of John Doe
Edge(3,7,collab)
Edge(2,5,colleague)
Edge(5,3,advisor)
Edge(5,7,pi)
Edge(4,0,student)
Edge(5,0,colleague)

ccGraph:
vertices:
(4,0)
(0,0)
(5,0)
(2,0)
(3,0)
(7,0)
edges:
Edge(3,7,collab)
Edge(2,5,colleague)
Edge(5,3,advisor)
Edge(5,7,pi)
Edge(4,0,student)
Edge(5,0,colleague)

validGraph:
(4,(peter,student))
(5,(franklin,prof))
(2,(istoica,prof))
(3,(rxin,student))
(7,(jgonzal,postdoc))

validCCGraph:
(4,0)
(5,0)
(2,0)
(3,0)
(7,0)
References
[1] http://spark.apache.org/docs/1.5.2/graphx-programming-guide.html
[2] https://github.com/xubo245/SparkLearning