An Improvement to the subgraph Algorithm in Spark GraphX

Source: Internet  Editor: 程序博客网  Time: 2024/06/03 20:54

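As is well known, the subgraph returned by the subgraph method in Spark GraphX may contain isolated vertices, that is, vertices with no edges. This happens because vertices are filtered by vpred alone, so a vertex is kept even when every edge touching it has been filtered out.

The source code of the method is as follows: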

override def subgraph(
    epred: EdgeTriplet[VD, ED] => Boolean = x => true,
    vpred: (VertexId, VD) => Boolean = (a, b) => true): Graph[VD, ED] = {
  vertices.cache()
  // Filter the vertices, reusing the partitioner and the index from this graph
  val newVerts = vertices.mapVertexPartitions(_.filter(vpred))
  // Filter the triplets. We must always upgrade the triplet view fully because vpred always runs
  // on both src and dst vertices
  replicatedVertexView.upgrade(vertices, true, true)
  val newEdges = replicatedVertexView.edges.filter(epred, vpred)
  new GraphImpl(newVerts, replicatedVertexView.withEdges(newEdges))
}

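To make the cause of the isolated vertices concrete, here is a minimal plain-Scala sketch that models the two filtering steps with ordinary collections. It is not the GraphX implementation: SubgraphModel, its Edge case class, and the isolated helper are hypothetical names introduced only for illustration, and epred takes a plain edge here rather than an EdgeTriplet.

```scala
// Plain-Scala model of the filtering done by subgraph; an illustration only,
// not the GraphX implementation. All names here are hypothetical.
object SubgraphModel {
  final case class Edge(src: Long, dst: Long, attr: String)

  def subgraph(
      vertices: Map[Long, String],
      edges: Seq[Edge],
      epred: Edge => Boolean = _ => true,
      vpred: (Long, String) => Boolean = (_, _) => true
  ): (Map[Long, String], Seq[Edge]) = {
    // Vertices are kept based on vpred alone; the edges are never consulted.
    val keptVertices = vertices.filter { case (id, attr) => vpred(id, attr) }
    // An edge survives only if epred holds and both endpoints passed vpred.
    val keptEdges = edges.filter { e =>
      epred(e) && keptVertices.contains(e.src) && keptVertices.contains(e.dst)
    }
    (keptVertices, keptEdges)
  }

  // Ids of vertices that no surviving edge touches.
  def isolated(vertices: Map[Long, String], edges: Seq[Edge]): Set[Long] = {
    val touched = edges.flatMap(e => Seq(e.src, e.dst)).toSet
    vertices.keySet -- touched
  }
}
```

Because the vertex filter never looks at the edge filter's result, any vertex whose edges are all rejected by epred remains in the output as an isolated vertex.
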
The following function can be used to filter out the isolated vertices:

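import scala.reflect.ClassTag
import org.apache.spark.graphx._

// Rebuild the graph from the endpoints of the remaining triplets, so that
// vertices without any edge are dropped.
def removeSingletons[VD: ClassTag, ED: ClassTag](g: Graph[VD, ED]): Graph[VD, ED] =
  Graph(
    g.triplets.map(et => (et.srcId, et.srcAttr))
      .union(g.triplets.map(et => (et.dstId, et.dstAttr)))
      .distinct,
    g.edges)
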
It can be tested with the following example:

val vertices = sc.makeRDD(Seq(
  (1L, "Ann"), (2L, "Bill"), (3L, "Charles"), (4L, "Dianne")))
val edges = sc.makeRDD(Seq(
  Edge(1L, 2L, "is-friends-with"), Edge(1L, 3L, "is-friends-with"),
  Edge(4L, 1L, "has-blocked"), Edge(2L, 3L, "has-blocked"),
  Edge(3L, 4L, "has-blocked")))
val originalGraph = Graph(vertices, edges)
val subgraph = originalGraph.subgraph(et => et.attr == "is-friends-with")
sc.setLogLevel("WARN")
// show vertices of subgraph - includes Dianne
subgraph.vertices.collect
// now call removeSingletons and show the resulting vertices
removeSingletons(subgraph).vertices.collect

The two outputs are, respectively:

scala> subgraph.vertices.collect
res7: Array[(org.apache.spark.graphx.VertexId, String)] = Array((1,Ann), (2,Bill), (3,Charles), (4,Dianne))

scala> removeSingletons(subgraph).vertices.collect
res8: Array[(org.apache.spark.graphx.VertexId, String)] = Array((1,Ann), (2,Bill), (3,Charles))

As the output shows, this method filters out the isolated vertex (4).
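
As a possible variation on removeSingletons, the sketch below keeps the existing vertex set and joins it against the graph's degrees instead of rebuilding the vertices from the triplets. It relies on the documented behavior that degrees has no entry for a vertex without edges. The name removeSingletonsByDegree is hypothetical, and the code assumes a Spark environment such as spark-shell with GraphX on the classpath.

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx._

// Hypothetical helper (not part of GraphX): keeps only the vertices that
// have an entry in g.degrees.
def removeSingletonsByDegree[VD: ClassTag, ED: ClassTag](g: Graph[VD, ED]): Graph[VD, ED] = {
  // g.degrees contains no entry for a vertex without edges, so the inner
  // join drops exactly the isolated vertices and keeps each attribute as is.
  val connected = g.vertices.innerJoin(g.degrees)((_, attr, _) => attr)
  Graph(connected, g.edges)
}
```

Called as removeSingletonsByDegree(subgraph).vertices.collect on the example above, it should return the same three vertices as removeSingletons, although the order of the collected array is not guaranteed.
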
