Improved Single-Source Shortest Path (GraphLab)

Source: Internet. Published by: 禁止安装某些软件. Editor: 程序博客网. Time: 2024/05/23 22:33

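The single-source shortest path (SSSP) implementation that ships with GraphLab is signal (message) based. This post changes it to a plain GAS model that does not use the message mechanism. First, how the message-based version behaves: when the source vertex is passed in, the engine activates it, and the accompanying message sets its shortest distance to 0. The engine then calls the init(), apply(), and scatter() interfaces, which update the vertex data and activate its neighbors.

Changing the single-source shortest path algorithm

The SSSP algorithm implemented in GraphLab is based on message passing. Here it is changed to a GAS version that does not rely on messages.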

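How the message-based version works:

Interface description:

class vertex_data: the vertex class. Its data is a distance; when the run finishes, it holds the shortest distance from the source to this vertex.

class edge_data: the edge class. For single-source shortest path the edge data could be left empty, but the program still initializes it to 1, meaning that adjacent vertices are at distance 1.

typedef graphlab::distributed_graph<vertex_data, edge_data> graph_type: defines the logical structure of the graph.

get_other_vertex(edge, vertex): returns the endpoint of edge that is not vertex.

class min_distance_type: the message class. Its main feature is an overloaded operator+=, so that "adding" two messages yields the smaller of the two distances.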

class sssp : public graphlab::ivertex_program<graph_type,
                                              graphlab::empty,
                                              min_distance_type>,
             public graphlab::IS_POD_TYPE

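This defines the vertex program. Every activated vertex runs it. Its member variables are min_dist, the minimum distance, and changed, a boolean recording whether this run of the vertex program modified the vertex data (the distance).

init(): takes the minimum distance from the incoming message. Messages already combine themselves (the overloaded operator+= keeps the minimum), so a simple assignment is enough to obtain the shortest distance known to this run of the vertex program.

gather_edges(): returns NO_EDGES, which means this vertex program does not run the gather phase.

gather(): does nothing; this step is skipped.

apply(): if min_dist is smaller than the distance stored in the vertex, a shorter path from the source to this vertex exists, so the vertex data is set to min_dist and changed is set to true; otherwise changed is false. The message already contains the edge weight, which scatter() added before sending it.

scatter_edges(): if changed is true, the vertex was updated and its neighbors must be visited (ALL_EDGES for an undirected graph; OUT_EDGES, the edges leaving this vertex, for a directed graph); otherwise it returns NO_EDGES.

scatter(): calls get_other_vertex() to obtain the neighbor other, then computes distance_type newd = vertex.data().dist + edge.data().dist, that is, the distance of vertex plus the edge weight (1 by default). If newd is smaller than the distance stored in other, it activates other with context.signal(other, newd).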

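How the message mechanism works:

Note that the message type itself finds the minimum among all messages addressed to a vertex, through the overloaded operator+=. So when several vertices send messages to activate the same vertex, the combined message carries the smallest distance from the source currently known for that vertex.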
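The combining rule above can be sketched without GraphLab. This is a minimal stand-alone illustration: MinDistance and combine() are hypothetical names that only mimic what the article's min_distance_type does, and the engine's real combining logic is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical stand-in for the article's min_distance_type.
struct MinDistance {
  int dist;
  explicit MinDistance(int d = std::numeric_limits<int>::max()) : dist(d) {}

  // "Adding" two messages keeps the smaller distance.
  MinDistance& operator+=(const MinDistance& other) {
    dist = std::min(dist, other.dist);
    return *this;
  }
};

// Fold every pending message for one vertex into a single message.
inline MinDistance combine(const std::vector<MinDistance>& pending) {
  MinDistance total;  // identity element: "infinity"
  for (const MinDistance& m : pending) total += m;
  return total;
}
```

Because the default-constructed value acts as infinity, a vertex with no pending messages combines to "unreachable", and the order in which messages arrive does not matter.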

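1 The engine first activates the source vertex with engine.signal(sources[i], min_distance_type(0)), which initializes its distance to zero through the message mechanism.

2 init phase: the minimum distance collected from the messages is stored in min_dist. It is the smallest distance from the source to this vertex known so far (a shorter path may still be found later).

3 gather_edges phase: returns NO_EDGES, so the gather phase is skipped in the message-based version.

4 apply phase: if the source can reach this vertex with a smaller distance, update the vertex data and set changed to true.

5 scatter_edges phase: select the edges to the vertex's neighbors (NO_EDGES if changed is false).

6 scatter phase: skipped if changed is false. Otherwise, call get_other_vertex() to obtain the neighbor other and compute distance_type newd = vertex.data().dist + edge.data().dist, the distance of vertex plus the edge weight (1 by default). If newd is smaller than the distance stored in other, activate other with context.signal(other, newd).

Note: every activated vertex runs the vertex program. Much like flooding in network routing, the distance from the source spreads by messages to every vertex connected to the source and keeps being updated, which yields the single-source shortest paths.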
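The six steps above can be simulated on a toy graph without GraphLab. The sketch below is only an illustration under stated assumptions: Graph, sssp_messages, the FIFO queue, and the inbox array are hypothetical stand-ins for the engine's scheduler and message combiner, and every edge is assumed to have weight 1.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <limits>
#include <vector>

const int kInf = std::numeric_limits<int>::max();

// Undirected, unit-weight graph as adjacency lists (hypothetical toy type).
using Graph = std::vector<std::vector<int>>;

// Message-driven SSSP: a vertex runs only while a message is pending for it.
std::vector<int> sssp_messages(const Graph& g, int source) {
  std::vector<int> dist(g.size(), kInf);
  std::vector<int> inbox(g.size(), kInf);     // combined (minimum) pending message
  std::vector<bool> queued(g.size(), false);
  std::deque<int> active;

  // Mimics context.signal(v, msg): combine with min, then schedule v once.
  auto signal = [&](int v, int d) {
    inbox[v] = std::min(inbox[v], d);
    if (!queued[v]) { queued[v] = true; active.push_back(v); }
  };

  signal(source, 0);                          // step 1: activate the source
  while (!active.empty()) {
    const int v = active.front();
    active.pop_front();
    queued[v] = false;

    const int min_dist = inbox[v];            // step 2: init
    inbox[v] = kInf;

    bool changed = false;                     // step 4: apply
    if (min_dist < dist[v]) { dist[v] = min_dist; changed = true; }
    if (!changed) continue;                   // step 5: NO_EDGES

    for (int other : g[v]) {                  // step 6: scatter
      const int newd = dist[v] + 1;           // dist[v] is finite here
      if (newd < dist[other]) signal(other, newd);
    }
  }
  return dist;
}
```

Vertices that are never reached keep the value kInf, because they never receive a message and so never run.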

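Judging by how it is implemented, the message-based SSSP algorithm resembles an IAS model (init, apply, scatter) rather than a typical GAS model. It only needs to be changed into a GAS model.

Changing the message mechanism to the GAS model:

The main changes:

The code that activates the source and initializes its distance to zero becomes engine.signal(sources[i]); setting the distance to zero is handed over to the apply phase.

The code that obtains the currently reachable minimum distance from messages sent by neighbors (the init phase) is removed; the gather phase takes over this job.

The vertex_program is modified so that the message type is empty and min_distance_type becomes the gather type: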

class sssp : public graphlab::ivertex_program<graph_type,
                                              min_distance_type>,
             public graphlab::IS_POD_TYPE

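How it runs:

1 The engine first activates the source vertex with engine.signal(sources[i]).

2 gather_edges phase: select the edges to the neighbors of the activated vertex (ALL_EDGES for an undirected graph, IN_EDGES for a directed graph).

If the vertex has no neighbors, gather has nothing to collect.

3 gather phase: for each neighbor, take the neighbor's distance + 1 and keep the minimum; calling std::min() is enough. Note that gather runs many times per vertex, once for each neighboring edge.

4 apply phase (run once per vertex program): if the distance collected in the gather phase still exceeds the default maximum assigned to vertices, the vertex running this program must be the source, so its distance is initialized to 0 and changed is set to true.

Otherwise, if the source can reach this vertex with a smaller distance, update the vertex data and set changed to true.

Otherwise, changed is false.

5 scatter_edges phase: select the edges to the vertex's neighbors (NO_EDGES if changed is false).

6 scatter phase: skipped if changed is false.

Otherwise, call get_other_vertex() to obtain the neighbor other and compute the distance of vertex plus 1. If that value is smaller than the distance stored in other, activate the neighbor with context.signal(other), which carries no payload.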
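The GAS steps above can be simulated the same way. This is a minimal sketch under stated assumptions: Graph and sssp_gas are hypothetical names, the FIFO queue stands in for the engine's scheduler, edges have weight 1, and the values 1024 and 1000 are the sentinel and threshold the article chose for its own data set, so the graph's distances must stay below 1000.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <limits>
#include <vector>

// Undirected, unit-weight graph as adjacency lists (hypothetical toy type).
using Graph = std::vector<std::vector<int>>;

const int kUnreached = 1024;   // sentinel distance used in the article
const int kSourceTest = 1000;  // a gathered value above this marks the source

std::vector<int> sssp_gas(const Graph& g, int source) {
  std::vector<int> dist(g.size(), kUnreached);
  std::vector<bool> queued(g.size(), false);
  std::deque<int> active;

  // Mimics context.signal(v): no payload, just schedule the vertex.
  auto signal = [&](int v) {
    if (!queued[v]) { queued[v] = true; active.push_back(v); }
  };

  signal(source);                                   // step 1
  while (!active.empty()) {
    const int v = active.front();
    active.pop_front();
    queued[v] = false;

    // Steps 2-3: gather once per edge and combine with min.
    int total = std::numeric_limits<int>::max();    // default gather value
    for (int u : g[v]) total = std::min(total, std::min(dist[v], dist[u] + 1));

    // Step 4: apply.
    bool changed = false;
    if (total > kSourceTest) {                      // nothing reachable yet: source
      dist[v] = 0;
      changed = true;
    } else if (dist[v] > total) {
      dist[v] = total;
      changed = true;
    }
    if (!changed) continue;                         // step 5: NO_EDGES

    // Step 6: scatter, signalling neighbors that can be improved.
    for (int other : g[v]) {
      if (dist[v] + 1 < dist[other]) signal(other);
    }
  }
  return dist;
}
```

The source test works only because a non-source vertex is signalled by a neighbor that already has a small distance, so its gathered minimum is always below the threshold. Unreached vertices keep the sentinel 1024.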

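Where the initial design failed:

When apply tested for the source vertex, an integer overflow occurred. The program initialized vertex data to the maximum value of the type by default, and during gather this maximum plus one overflowed, which produced wrong results. The fix is to change how vertices are created so that the default vertex value is a maximum the current graph can never reach. For my data set 1024 is enough, so the apply phase can treat a distance greater than 1000 as the sign that the vertex is the source.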
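An alternative to picking a data-set-specific sentinel is to keep the type's maximum as "infinity" and make the addition saturate. This is not what the article's code does; it is a minimal sketch of another way to avoid the overflow, and saturating_add and kInfinity are hypothetical names introduced here.

```cpp
#include <cassert>
#include <limits>

using distance_type = int;
const distance_type kInfinity = std::numeric_limits<distance_type>::max();

// Adds a non-negative edge weight to a distance without wrapping past
// "infinity": any sum that would exceed kInfinity is clamped to kInfinity.
inline distance_type saturating_add(distance_type d, distance_type w) {
  assert(d >= 0 && w >= 0);
  return (d > kInfinity - w) ? kInfinity : d + w;
}
```

With this helper, gather could return saturating_add(neighbor_dist, 1), and apply could recognize the source by comparing the gathered value with kInfinity instead of a threshold such as 1000.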

The code follows, with my respect to the creators of GraphLab.

/**
 * Copyright (c) 2009 Carnegie Mellon University.
 *     All rights reserved.
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing,
 *  software distributed under the License is distributed on an "AS
 *  IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
 *  express or implied.  See the License for the specific language
 *  governing permissions and limitations under the License.
 *
 * For more about this software visit:
 *
 *      http://www.graphlab.ml.cmu.edu
 *
 */

#include <algorithm>
#include <limits>
#include <sstream>
#include <vector>
#include <string>
#include <fstream>

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/unordered_set.hpp>

#include <graphlab.hpp>
#include <graphlab/util/stl_util.hpp>
#include <graphlab/macros_def.hpp>

/**
 * \brief The type used to measure distances in the graph.
 */
//typedef float distance_type;
typedef int distance_type;

/**
 * \brief The current distance of the vertex.
 */
struct vertex_data : graphlab::IS_POD_TYPE {
  distance_type dist;
  //vertex_data(distance_type dist = std::numeric_limits<distance_type>::max()) :
  // Give each vertex a maximum the graph can never reach. Do not use
  // std::numeric_limits<distance_type>::max(): adding 1 to it overflows.
  vertex_data(distance_type dist = 1024) :
    dist(dist) { }
}; // end of vertex data

/**
 * \brief The distance associated with the edge.
 */
struct edge_data : graphlab::IS_POD_TYPE {
  distance_type dist;
  edge_data(distance_type dist = 1) : dist(dist) { }
}; // end of edge data

/**
 * \brief The graph type encodes the distances between vertices and
 * edges
 */
typedef graphlab::distributed_graph<vertex_data, edge_data> graph_type;

/**
 * \brief The graph loader is used by graph.load to parse lines of the
 * text data file.
 *
 * We use the relatively fast boost::spirit parser to parse each line.
 */
bool graph_loader(graph_type& graph, const std::string& fname,
                  const std::string& line) {
  ASSERT_FALSE(line.empty());
  namespace qi = boost::spirit::qi;
  namespace ascii = boost::spirit::ascii;
  namespace phoenix = boost::phoenix;
  graphlab::vertex_id_type source_id(-1), target_id(-1);
  float weight = 1;
  const bool success = qi::phrase_parse
    (line.begin(), line.end(),
     //  Begin grammar
     (
      qi::ulong_[phoenix::ref(source_id) = qi::_1] >> -qi::char_(',') >>
      qi::ulong_[phoenix::ref(target_id) = qi::_1] >>
      -(-qi::char_(',') >> qi::float_[phoenix::ref(weight) = qi::_1])
      )
     ,
     //  End grammar
     ascii::space);
  if(!success) return false;
  if(source_id == target_id) {
    logstream(LOG_ERROR)
      << "Self edge to vertex " << source_id << " is not permitted." << std::endl;
  }
  // Create an edge and add it to the graph
  graph.add_edge(source_id, target_id, weight);
  return true; // successful load
}; // end of graph loader

/**
 * \brief Get the other vertex in the edge.
 */
inline graph_type::vertex_type
get_other_vertex(const graph_type::edge_type& edge,
                 const graph_type::vertex_type& vertex) {
  return vertex.id() == edge.source().id()? edge.target() : edge.source();
}

/**
 * \brief Use directed or undirected edges.
 */
bool DIRECTED_SSSP = false;

/**
 * \brief This class is used as the gather type.
 */
struct min_distance_type : graphlab::IS_POD_TYPE {
  distance_type dist;
  min_distance_type(distance_type dist =
                    std::numeric_limits<distance_type>::max()) : dist(dist) { }
  min_distance_type& operator+=(const min_distance_type& other) {
    dist = std::min(dist, other.dist);
    return *this;
  }
};

/*
 * \brief The single source shortest path vertex program.
 */
/*
class sssp :
  public graphlab::ivertex_program<graph_type,
                                   graphlab::empty,
                                   min_distance_type>, //notes
  public graphlab::IS_POD_TYPE {
*/
class sssp :
  public graphlab::ivertex_program<graph_type, min_distance_type>,
  public graphlab::IS_POD_TYPE {
  distance_type min_dist;
  bool changed;
public:

/*
  void init(icontext_type& context, const vertex_type& vertex,
            const min_distance_type& msg) {
    min_dist = msg.dist;
  }
*/

  /**
   * \brief We use the messaging model to compute the SSSP update
   */
/*
  edge_dir_type gather_edges(icontext_type& context,
                             const vertex_type& vertex) const {
    return graphlab::NO_EDGES;
    //return graphlab::ALL_EDGES;
  }; // end of gather_edges
*/

  //  * \brief Collect the distance to the neighbor
  //  */
  // min_distance_type gather(icontext_type& context, const vertex_type& vertex,
  //                          edge_type& edge) const {
  //   return min_distance_type(edge.data() +
  //                            get_other_vertex(edge, vertex).data());
  // } // end of gather function

  // GAS version: gather over all edges (undirected) or in-edges (directed).
  edge_dir_type gather_edges(icontext_type& context,
                             const vertex_type& vertex) const {
    if(!DIRECTED_SSSP)
      return graphlab::ALL_EDGES;
    else
      return graphlab::IN_EDGES;
  }

  // GAS version: candidate distance through one neighbor (unit edge weight).
  min_distance_type gather(icontext_type& context, const vertex_type& vertex,
                           edge_type& edge) const {
    //return std::min(vertex.data().dist,get_other_vertex(edge,vertex).data().dist+edge.data().dist);
    return std::min(vertex.data().dist,
                    get_other_vertex(edge, vertex).data().dist + 1);
  }

  /**
   * \brief If the distance is smaller than the current one, update
   */
/*
  void apply(icontext_type& context, vertex_type& vertex,
             const graphlab::empty& empty) {
    changed = false;
    if(vertex.data().dist > min_dist) {
      changed = true;
      vertex.data().dist = min_dist;
    }
  }
*/

  // GAS version: apply receives the gathered minimum.
  void apply(icontext_type& context, vertex_type& vertex,
             const min_distance_type& total) {
    changed = false;
    if(total.dist > 1000) { // any threshold above the largest reachable distance works
      // Nothing reachable was gathered, so this vertex is the source.
      changed = true;
      vertex.data().dist = 0;
    } else if(vertex.data().dist > total.dist) {
      changed = true;
      vertex.data().dist = total.dist;
    }
  }

  /**
   * \brief Determine if SSSP should run on all edges or just in edges
   */
/*
  edge_dir_type scatter_edges(icontext_type& context,
                              const vertex_type& vertex) const {
    if(changed)
      return DIRECTED_SSSP? graphlab::OUT_EDGES : graphlab::ALL_EDGES;
    else return graphlab::NO_EDGES;
  }; // end of scatter_edges
*/

  /**
   * \brief The scatter function just signal adjacent pages
   */
/*
  void scatter(icontext_type& context, const vertex_type& vertex,
               edge_type& edge) const {
    const vertex_type other = get_other_vertex(edge, vertex);
    distance_type newd = vertex.data().dist + edge.data().dist;
    if (other.data().dist > newd) {
      const min_distance_type msg(newd);
      context.signal(other, newd);
    }
  } // end of scatter
}; // end of shortest path vertex program
*/

  edge_dir_type scatter_edges(icontext_type& context,
                              const vertex_type& vertex) const {
    if(changed) {
      if(DIRECTED_SSSP)
        return graphlab::OUT_EDGES;
      else
        return graphlab::ALL_EDGES;
    }
    else
      return graphlab::NO_EDGES;
  }

  // GAS version: signal the neighbor without a message payload.
  void scatter(icontext_type& context, const vertex_type& vertex,
               edge_type& edge) const {
    vertex_type other = get_other_vertex(edge, vertex);
    if(vertex.data().dist + 1 < other.data().dist)
      context.signal(other);
  }
};

/**
 * \brief We want to save the final graph so we define a writer which will be
 * used in graph.save("path/prefix", shortest_path_writer()) to save the graph.
 */
struct shortest_path_writer {
  std::string save_vertex(const graph_type::vertex_type& vtx) {
    std::stringstream strm;
    strm << vtx.id() << "\t" << vtx.data().dist << "\n";
    return strm.str();
  }
  std::string save_edge(graph_type::edge_type e) { return ""; }
}; // end of shortest_path_writer

struct max_deg_vertex_reducer: public graphlab::IS_POD_TYPE {
  size_t degree;
  graphlab::vertex_id_type vid;
  max_deg_vertex_reducer& operator+=(const max_deg_vertex_reducer& other) {
    if (degree < other.degree) {
      (*this) = other;
    }
    return (*this);
  }
};

max_deg_vertex_reducer find_max_deg_vertex(const graph_type::vertex_type vtx) {
  max_deg_vertex_reducer red;
  red.degree = vtx.num_in_edges() + vtx.num_out_edges();
  red.vid = vtx.id();
  return red;
}

int main(int argc, char** argv) {
  // Initialize control plane using mpi
  graphlab::mpi_tools::init(argc, argv);
  graphlab::distributed_control dc;
  global_logger().set_log_level(LOG_INFO);

  // Parse command line options -----------------------------------------------
  graphlab::command_line_options
    clopts("Single Source Shortest Path Algorithm.");
  std::string graph_dir;
  std::string format = "tsv";
  std::string exec_type = "synchronous";
  size_t powerlaw = 0;
  std::vector<graphlab::vertex_id_type> sources;
  bool max_degree_source = false;
  clopts.attach_option("graph", graph_dir,
                       "The graph file.  If none is provided "
                       "then a toy graph will be created");
  clopts.add_positional("graph");
  clopts.attach_option("source", sources,
                       "The source vertices");
  clopts.attach_option("max_degree_source", max_degree_source,
                       "Add the vertex with maximum degree as a source");
  clopts.add_positional("source");
  clopts.attach_option("directed", DIRECTED_SSSP,
                       "Treat edges as directed.");
  clopts.attach_option("engine", exec_type,
                       "The engine type synchronous or asynchronous");
  clopts.attach_option("powerlaw", powerlaw,
                       "Generate a synthetic powerlaw out-degree graph. ");
  std::string saveprefix;
  clopts.attach_option("saveprefix", saveprefix,
                       "If set, will save the resultant pagerank to a "
                       "sequence of files with prefix saveprefix");
  if(!clopts.parse(argc, argv)) {
    dc.cout() << "Error in parsing command line arguments." << std::endl;
    return EXIT_FAILURE;
  }

  // Build the graph ----------------------------------------------------------
  graph_type graph(dc, clopts);
  if(powerlaw > 0) { // make a synthetic graph
    dc.cout() << "Loading synthetic Powerlaw graph." << std::endl;
    graph.load_synthetic_powerlaw(powerlaw, false, 2, 100000000);
  } else if (graph_dir.length() > 0) { // Load the graph from a file
    dc.cout() << "Loading graph in format: "<< format << std::endl;
    graph.load(graph_dir, graph_loader);
  } else {
    dc.cout() << "graph or powerlaw option must be specified" << std::endl;
    clopts.print_description();
    return EXIT_FAILURE;
  }
  // must call finalize before querying the graph
  graph.finalize();
  dc.cout() << "#vertices:  " << graph.num_vertices() << std::endl
            << "#edges:     " << graph.num_edges() << std::endl;

  if(sources.empty()) {
    if (max_degree_source == false) {
      dc.cout()
        << "No source vertex provided. Adding vertex 0 as source"
        << std::endl;
      sources.push_back(0);
    }
  }
  if (max_degree_source) {
    max_deg_vertex_reducer v =
      graph.map_reduce_vertices<max_deg_vertex_reducer>(find_max_deg_vertex);
    dc.cout()
      << "No source vertex provided.  Using highest degree vertex " << v.vid << " as source."
      << std::endl;
    sources.push_back(v.vid);
  }

  // Running The Engine -------------------------------------------------------
  graphlab::omni_engine<sssp> engine(dc, graph, exec_type, clopts);

  // Signal all the vertices in the source set
  for(size_t i = 0; i < sources.size(); ++i) {
    engine.signal(sources[i]); //, min_distance_type(0));
  }
  engine.start();
  const float runtime = engine.elapsed_seconds();
  dc.cout() << "Finished Running engine in " << runtime
            << " seconds." << std::endl;

  // Save the final graph -----------------------------------------------------
  if (saveprefix != "") {
    graph.save(saveprefix, shortest_path_writer(),
               false,    // do not gzip
               true,     // save vertices
               false,    // do not save edges
               1);
  }

  // Tear-down communication layer and quit -----------------------------------
  graphlab::mpi_tools::finalize();
  return EXIT_SUCCESS;
} // End of main

// We render this entire program in the documentation