Modifying GraphX's PartitionStrategy


I added two partitioning algorithms. The original approach was to declare a loadMetisFile method in the trait and then, inside GraphImpl's

  override def partitionBy(partitionStrategy: PartitionStrategy): Graph[VD, ED] = {
    partitionBy(partitionStrategy, edges.partitions.size)
  }

force-cast partitionStrategy with asInstanceOf[PartitionStrategy] and call loadMetisFile to load the data. This turned out to work only the first time; later, when the following ran:

  val newEdges = edges.withPartitionsRDD(edges.map { e =>
    val part: PartitionID = partitionStrategy.getPartition(e.srcId, e.dstId, numPartitions)
    (part, (e.srcId, e.dstId, e.attr))
  }

the getPartition method inside it no longer took effect. The reason: the earlier initialization happens in Analytics, in val graph = partitionStrategy.foldLeft(unpartitionedGraph)(_.partitionBy(_)), which runs on the master; when the subtasks later execute that line in parallel, they have to deserialize the partitionStrategy object that was initialized on the master node. But MetisPartition's map does not survive serialization, so it arrives with empty contents, and the getPartition lookups in the subtasks fail because the map is empty.
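To see the failure mode in isolation, here is a minimal, self-contained sketch (DemoStrategy and its members are hypothetical names, not the GraphX code): state that is dropped during Java serialization simply does not exist on the deserializing side, which is exactly what the executors observed.

  import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
  import scala.collection.mutable

  // Toy stand-in for a strategy whose lookup table is filled on the driver.
  // @transient drops the field during serialization, mimicking state that
  // never reaches the executors.
  class DemoStrategy extends Serializable {
    @transient var table: mutable.HashMap[Int, Int] = new mutable.HashMap[Int, Int]
    def load(): Unit = table.put(1, 0)                // stand-in for loadMetisFile
    def lookup(id: Int): Option[Int] =
      if (table == null) None else table.get(id)
  }

  object SerializationPitfall {
    // Round-trip an object through Java serialization, roughly what Spark
    // does when shipping a task closure from the driver to an executor.
    def roundTrip[T](obj: T): T = {
      val buf = new ByteArrayOutputStream()
      new ObjectOutputStream(buf).writeObject(obj)
      new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
        .readObject().asInstanceOf[T]
    }

    def main(args: Array[String]): Unit = {
      val driverSide = new DemoStrategy
      driverSide.load()
      println(driverSide.lookup(1))    // Some(0): the table exists on the "driver"
      val executorSide = roundTrip(driverSide)
      println(executorSide.lookup(1))  // None: the table did not travel
    }
  }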


I then tried changing it to

  getPartition() {
    if (metisMap.size == 0) loadMetisFile() // load lazily on first call
  }

but since partitionStrategy is a static (singleton) object, multiple task threads executed if (metisMap.size == 0) loadMetisFile() at the same time, which caused concurrency problems.
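For reference, the textbook fix for that race (not the route I took below) is a thread-safe lazy initializer: Scala's lazy val initialization is synchronized by the runtime, so a sketch along these lines would run the loader exactly once per JVM even with many concurrent task threads (LazyMetisLoader is a hypothetical name; the file-reading body is elided):

  import scala.collection.mutable

  object LazyMetisLoader {
    // lazy val initialization is guarded by a monitor in the Scala runtime:
    // however many task threads touch metisMap simultaneously, the loader
    // body runs exactly once.
    lazy val metisMap: mutable.HashMap[Int, Int] = {
      val m = new mutable.HashMap[Int, Int]
      // ... read the METIS partition file into m, as in the trait code below ...
      m
    }

    def getPartition(id: Int): Int =
      metisMap.getOrElse(id,
        throw new IllegalArgumentException("Metis can't find partition!"))
  }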
In the end I wrote the body of loadMetisFile directly into the PartitionStrategy trait's initialization code, so the file is read as soon as the object is initialized, regardless of whether that happens on a subtask node or on the master. The code follows:
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.graphx

import java.io.{BufferedReader, InputStreamReader}
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataInputStream, FileSystem, Path}

import scala.collection.mutable.HashMap

/**
 * Represents the way edges are assigned to edge partitions based on their source and destination
 * vertex IDs.
 */
trait PartitionStrategy extends Serializable {
  /** Returns the partition number for a given edge. */
  def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID

  def loadMetisFile()

  // The METIS lookup table is built in the trait initializer, so it is loaded
  // whenever a strategy object is instantiated -- on the master or on a subtask node.
  val metisMap = new HashMap[Int, Int]
  try {
    val hdfs = FileSystem.get(
      URI.create("hdfs://192.168.0.100:9000/test/Web_metis_Final_Input.txt.part.6"),
      new Configuration)
    val fp: FSDataInputStream =
      hdfs.open(new Path("hdfs://192.168.0.100:9000/test/Web_metis_Final_Input.txt.part.6"))
    val isr: InputStreamReader = new InputStreamReader(fp)
    val bReader: BufferedReader = new BufferedReader(isr)
    // Line i of the METIS output holds the partition assigned to vertex i.
    var id: Int = 1
    var line: String = bReader.readLine()
    while (line != null) {
      if (!"".equals(line)) {
        metisMap.put(id, line.toInt)
        id = id + 1
      }
      line = bReader.readLine()
    }
    isr.close()
    bReader.close()
    println("metisMap size: " + metisMap.size)
  } catch {
    case ex: Exception => // Handle missing file
      ex.printStackTrace()
  }
}

/**
 * Collection of built-in [[PartitionStrategy]] implementations.
 */
object PartitionStrategy {
  /**
   * Assigns edges to partitions using a 2D partitioning of the sparse edge adjacency matrix,
   * guaranteeing a `2 * sqrt(numParts)` bound on vertex replication.
   *
   * Suppose we have a graph with 12 vertices that we want to partition over 9 machines. We can
   * use the following sparse matrix representation:
   *
   * <pre>
   *       __________________________________
   *  v0   | P0 *     | P1       | P2    *  |
   *  v1   |  ****    |  *       |          |
   *  v2   |  ******* |      **  |  ****    |
   *  v3   |  *****   |  *  *    |       *  |
   *       ----------------------------------
   *  v4   | P3 *     | P4 ***   | P5 **  * |
   *  v5   |  *  *    |  *       |          |
   *  v6   |       *  |      **  |  ****    |
   *  v7   |  * * *   |  *  *    |       *  |
   *       ----------------------------------
   *  v8   | P6   *   | P7    *  | P8  *   *|
   *  v9   |     *    |  *    *  |          |
   *  v10  |       *  |      **  |  *  *    |
   *  v11  | * <-E    |  ***     |       ** |
   *       ----------------------------------
   * </pre>
   *
   * The edge denoted by `E` connects `v11` with `v1` and is assigned to processor `P6`. To get the
   * processor number we divide the matrix into `sqrt(numParts)` by `sqrt(numParts)` blocks. Notice
   * that edges adjacent to `v11` can only be in the first column of blocks `(P0, P3, P6)` or the
   * last row of blocks `(P6, P7, P8)`.  As a consequence we can guarantee that `v11` will need to
   * be replicated to at most `2 * sqrt(numParts)` machines.
   *
   * Notice that `P0` has many edges and as a consequence this partitioning would lead to poor work
   * balance.  To improve balance we first multiply each vertex id by a large prime to shuffle the
   * vertex locations.
   *
   * When the number of partitions requested is not a perfect square we use a slightly different
   * method where the last column can have a different number of rows than the others while still
   * maintaining the same size per block.
   */
  case object EdgePartition2D extends PartitionStrategy {
    def loadMetisFile() {}
    override def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID = {
      val ceilSqrtNumParts: PartitionID = math.ceil(math.sqrt(numParts)).toInt
      val mixingPrime: VertexId = 1125899906842597L
      if (numParts == ceilSqrtNumParts * ceilSqrtNumParts) {
        // Use old method for perfect squared to ensure we get same results
        val col: PartitionID = (math.abs(src * mixingPrime) % ceilSqrtNumParts).toInt
        val row: PartitionID = (math.abs(dst * mixingPrime) % ceilSqrtNumParts).toInt
        (col * ceilSqrtNumParts + row) % numParts
      } else {
        // Otherwise use new method
        val cols = ceilSqrtNumParts
        val rows = (numParts + cols - 1) / cols
        val lastColRows = numParts - rows * (cols - 1)
        val col = (math.abs(src * mixingPrime) % numParts / rows).toInt
        val row = (math.abs(dst * mixingPrime) % (if (col < cols - 1) rows else lastColRows)).toInt
        col * rows + row
      }
    }
  }

  /**
   * Assigns edges to partitions using only the source vertex ID, colocating edges with the same
   * source.
   */
  case object EdgePartition1D extends PartitionStrategy {
    def loadMetisFile() {}
    override def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID = {
      val mixingPrime: VertexId = 1125899906842597L
      (math.abs(src * mixingPrime) % numParts).toInt
    }
  }

  /**
   * Assigns edges to partitions by hashing the source and destination vertex IDs, resulting in a
   * random vertex cut that colocates all same-direction edges between two vertices.
   */
  case object RandomVertexCut extends PartitionStrategy {
    def loadMetisFile() {}
    override def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID = {
      math.abs((src, dst).hashCode()) % numParts
    }
  }

  /**
   * Assigns edges to partitions by hashing the source and destination vertex IDs in a canonical
   * direction, resulting in a random vertex cut that colocates all edges between two vertices,
   * regardless of direction.
   */
  case object CanonicalRandomVertexCut extends PartitionStrategy {
    def loadMetisFile() {}
    override def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID = {
      if (src < dst) {
        math.abs((src, dst).hashCode()) % numParts
      } else {
        math.abs((dst, src).hashCode()) % numParts
      }
    }
  }

  /**
   * A RangePartition experiment.
   */
  case object RangePartition extends PartitionStrategy {
    def loadMetisFile() {}
    override def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID = {
      val max = 6
      val keyRange = max / numParts
      val part = (src.hashCode() % max) / keyRange
      Math.max(0, Math.min(numParts - 1, part.toInt))
    }
  }

  case object MetisPartition extends PartitionStrategy {
    /**
     * Left empty: I don't know how to obtain a reference to the SparkContext here, and Spark
     * allows only one SparkContext to exist (you cannot create your own), so
     * sc.textFile("hdfs://XXX") cannot be used. The actual loading happens in the trait
     * initializer above.
     */
    def loadMetisFile() {}

    override def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID = {
      val s = metisMap.get(src.hashCode().toInt)
      if (!s.isEmpty) {
        s.getOrElse(0)
      } else {
        println("src: " + src + ", hashcode: " + src.hashCode + ", partition: " +
          metisMap.get(src.hashCode()) + ", size: " + metisMap.size)
        throw new IllegalArgumentException("Metis can't find partition!")
      }
    }
  }

  /** Returns the PartitionStrategy with the specified name. */
  def fromString(s: String): PartitionStrategy = s match {
    case "RandomVertexCut" => RandomVertexCut
    case "EdgePartition1D" => EdgePartition1D
    case "EdgePartition2D" => EdgePartition2D
    case "CanonicalRandomVertexCut" => CanonicalRandomVertexCut
    case "RangePartition" => RangePartition
    case "MetisPartition" => MetisPartition
    case _ => throw new IllegalArgumentException("Invalid PartitionStrategy: " + s)
  }
}
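To tie things together, here is a driver-side usage sketch matching the Analytics line quoted earlier; the app name and edge-list path are illustrative assumptions:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

  object MetisPartitionDemo {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("MetisPartitionDemo"))
      // Illustrative edge list on the same HDFS that holds the METIS output.
      val unpartitionedGraph =
        GraphLoader.edgeListFile(sc, "hdfs://192.168.0.100:9000/test/edges.txt")
      // Resolve the strategy by name and repartition, mirroring
      // partitionStrategy.foldLeft(unpartitionedGraph)(_.partitionBy(_)) in Analytics.
      val strategy = Some(PartitionStrategy.fromString("MetisPartition"))
      val graph = strategy.foldLeft(unpartitionedGraph)(_.partitionBy(_))
      println("edge partitions: " + graph.edges.partitions.length)
      sc.stop()
    }
  }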


Reference blog: http://www.cnblogs.com/HeQiangJava/p/6711527.html