Improving the subgraph Method in Spark GraphX
Source: Internet   Editor: 程序博客网   Date: 2024/06/03 20:54
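The subgraph method in Spark GraphX can return a subgraph that contains isolated vertices, that is, vertices with no edges at all. The reason is that vertices are filtered only by vpred, which defaults to always true, while edges are filtered by epred; a vertex whose edges have all been removed is still kept.
The source of the method is shown below: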
override def subgraph(
    epred: EdgeTriplet[VD, ED] => Boolean = x => true,
    vpred: (VertexId, VD) => Boolean = (a, b) => true): Graph[VD, ED] = {
  vertices.cache()
  // Filter the vertices, reusing the partitioner and the index from this graph
  val newVerts = vertices.mapVertexPartitions(_.filter(vpred))
  // Filter the triplets. We must always upgrade the triplet view fully because vpred always runs
  // on both src and dst vertices
  replicatedVertexView.upgrade(vertices, true, true)
  val newEdges = replicatedVertexView.edges.filter(epred, vpred)
  new GraphImpl(newVerts, replicatedVertexView.withEdges(newEdges))
}
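To make the behaviour easier to see without a Spark cluster, here is a minimal sketch that models the same filtering logic on plain Scala collections. It is not the GraphX implementation: SubgraphModel, SimpleEdge and isolated are hypothetical names introduced for illustration, and epred here sees only the edge rather than a full EdgeTriplet.

```scala
// A plain-collections model of the subgraph semantics (an illustration only,
// not GraphX code). All names in this object are hypothetical.
object SubgraphModel {
  final case class SimpleEdge(src: Long, dst: Long, attr: String)

  // Vertices are kept if vpred holds; an edge is kept if both of its
  // endpoints were kept and epred holds for the edge.
  def subgraph(
      vertices: Map[Long, String],
      edges: Seq[SimpleEdge],
      epred: SimpleEdge => Boolean = _ => true,
      vpred: (Long, String) => Boolean = (_, _) => true
  ): (Map[Long, String], Seq[SimpleEdge]) = {
    val keptVertices = vertices.filter { case (id, attr) => vpred(id, attr) }
    val keptEdges = edges.filter { e =>
      keptVertices.contains(e.src) && keptVertices.contains(e.dst) && epred(e)
    }
    (keptVertices, keptEdges)
  }

  // Ids of vertices that do not appear as an endpoint of any edge.
  def isolated(vertices: Map[Long, String], edges: Seq[SimpleEdge]): Set[Long] =
    vertices.keySet -- edges.flatMap(e => Seq(e.src, e.dst))
}
```

Because the vertex filter never looks at the surviving edges, any vertex that loses all of its edges to epred shows up in isolated afterwards.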
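The following function filters out the isolated vertices. It rebuilds the vertex set from the source and destination endpoints of the remaining triplets, so only vertices that still take part in at least one edge survive:
import scala.reflect.ClassTag
import org.apache.spark.graphx._

def removeSingletons[VD: ClassTag, ED: ClassTag](g: Graph[VD, ED]): Graph[VD, ED] =
  Graph(
    // collect every vertex that is the source or destination of some edge
    g.triplets.map(et => (et.srcId, et.srcAttr))
      .union(g.triplets.map(et => (et.dstId, et.dstAttr)))
      .distinct,
    g.edges)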
The following example can be used to test it (run in spark-shell, where sc is already defined):
val vertices = sc.makeRDD(Seq(
  (1L, "Ann"), (2L, "Bill"), (3L, "Charles"), (4L, "Dianne")))
val edges = sc.makeRDD(Seq(
  Edge(1L, 2L, "is-friends-with"), Edge(1L, 3L, "is-friends-with"),
  Edge(4L, 1L, "has-blocked"), Edge(2L, 3L, "has-blocked"),
  Edge(3L, 4L, "has-blocked")))
val originalGraph = Graph(vertices, edges)
val subgraph = originalGraph.subgraph(et => et.attr == "is-friends-with")
sc.setLogLevel("WARN")
// show vertices of subgraph: includes Dianne
subgraph.vertices.collect
// now call removeSingletons and show the resulting vertices
removeSingletons(subgraph).vertices.collect
The outputs are, respectively:
scala> subgraph.vertices.collect
res7: Array[(org.apache.spark.graphx.VertexId, String)] = Array((1,Ann), (2,Bill), (3,Charles), (4,Dianne))
scala> removeSingletons(subgraph).vertices.collect
res8: Array[(org.apache.spark.graphx.VertexId, String)] = Array((1,Ann), (2,Bill), (3,Charles))
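As the output shows, this function removes the isolated vertex 4 (Dianne), which lost all of its edges when only the "is-friends-with" edges were kept.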
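As a possible alternative, the same effect can be sketched with vertex degrees instead of triplets. The function name removeSingletonsByDegree is hypothetical and not part of the original article; the sketch assumes that g.degrees only contains vertices with at least one incident edge, so an inner join against it drops the isolated ones.

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx._

// Hypothetical alternative to removeSingletons: keep only the vertices that
// appear in g.degrees, i.e. those with at least one incident edge.
def removeSingletonsByDegree[VD: ClassTag, ED: ClassTag](g: Graph[VD, ED]): Graph[VD, ED] = {
  // innerJoin keeps a vertex only if it has an entry in g.degrees;
  // the degree value itself is discarded and the original attribute is kept
  val connected = g.vertices.innerJoin(g.degrees)((id, attr, degree) => attr)
  Graph(connected, g.edges)
}
```

Calling removeSingletonsByDegree(subgraph).vertices.collect on the example graph should likewise leave out vertex 4. Whether this is cheaper than the triplet-based version depends on the data and partitioning, so it is worth measuring rather than assuming.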