一些算法的MapReduce实现——图的BFS遍历
来源:互联网 发布:不是c语言关键字 编辑:程序博客网 时间:2024/06/11 10:51
Breadth-first search 简介
BFS算法伪代码如下[1]
BFS(G, s) for each vertex u ∈ V [G] - {s} do color[u] ← WHITE d[u] ← ∞ π[u] ← NIL //除了源顶点s之外,第1-4行置每个顶点为白色,置每个顶点u的d[u]为无穷大, //置每个顶点的父母为NIL。 color[s] ← GRAY //第8行,将源顶点s置为灰色,这是因为在过程开始时,源顶点已被发现。 d[s] ← 0 //将d[s]初始化为0。 π[s] ← NIL //将源顶点的父顶点置为NIL。 Q ← Ø ENQUEUE(Q, s) //入队 //第12、13行,初始化队列Q,使其仅含源顶点s。 while Q ≠ Ø do u ← DEQUEUE(Q) //出队 //第16行,确定队列头部Q头部的灰色顶点u,并将其从Q中去掉。 for each v ∈ Adj[u] //for循环考察u的邻接表中的每个顶点v do if color[v] = WHITE then color[v] ← GRAY //置为灰色 d[v] ← d[u] + 1 //距离被置为d[u]+1 π[v] ← u //u记为该顶点的父母 ENQUEUE(Q, v) //插入队列中 color[u] ← BLACK //u 置为黑色
具体流程图如下:
MapReduce实现BFS
假设有一个无向图,如下:
每一条边的权重设为1
InputFormat
source<tab>adjacency_list|distance_from_the_source|color|parentNode
根据上图,输入数据为:
1<tab>2,3|0|GRAY|source2<tab>1,3,4,5|Integer.MAX_VALUE|WHITE|null3<tab>1,4,2|Integer.MAX_VALUE|WHITE|null4<tab>2,3|Integer.MAX_VALUE|WHITE|null5<tab>2|Integer.MAX_VALUE|WHITE|null
利用上面的数据,指定3个reducer对其进行处理,从开始的算法简介可以看出,不是一步MapReduce过程就可以把整个graph遍历一遍的,这需要迭代,也就需要多次运行MapReduce过程,直到所有的节点都被访问过,也就是节点的颜色都被标记为黑色,就退出。
MapReduce给我们提供了一些方法可以查看每一步MRJob运行过后的中间结果。
Intermediate output 1:
Reducer 1: (part-r-00000)
2<tab>1,3,4,5,|1|GRAY|1
5<tab>2,|Integer.MAX_VALUE|WHITE|null
Reducer 2: (part-r-00001)
3<tab>1,4,2,|1|GRAY|1
Reducer 3: (part-r-00002)
1<tab>2,3,|0|BLACK|source
4<tab>2,3,|Integer.MAX_VALUE|WHITE|null
2<tab>1,3,4,5,|1|GRAY|1
5<tab>2,|Integer.MAX_VALUE|WHITE|null
Reducer 2: (part-r-00001)
3<tab>1,4,2,|1|GRAY|1
Reducer 3: (part-r-00002)
1<tab>2,3,|0|BLACK|source
4<tab>2,3,|Integer.MAX_VALUE|WHITE|null
Intermediate output 2:
Reducer 1: (part-r-00000)
2<tab>1,3,4,5,|1|BLACK|1
5<tab>2,|2|GRAY|2
Reducer 2: (part-r-00001)
3<tab>1,4,2,|1|BLACK|1
Reducer 3: (part-r-00002)
1<tab>2,3,|0|BLACK|source
4<tab>2,3,|2|GRAY|2
2<tab>1,3,4,5,|1|BLACK|1
5<tab>2,|2|GRAY|2
Reducer 2: (part-r-00001)
3<tab>1,4,2,|1|BLACK|1
Reducer 3: (part-r-00002)
1<tab>2,3,|0|BLACK|source
4<tab>2,3,|2|GRAY|2
Final output:
Reducer 1: (part-r-00000)
2<tab>1,3,4,5,|1|BLACK|1
5<tab>2,|2|BLACK|2
Reducer 2: (part-r-00001)
3<tab>1,4,2,|1|BLACK|1
Reducer 3: (part-r-00002)
1<tab>2,3,|0|BLACK|source
4<tab>2,3,|2|BLACK|2
2<tab>1,3,4,5,|1|BLACK|1
5<tab>2,|2|BLACK|2
Reducer 2: (part-r-00001)
3<tab>1,4,2,|1|BLACK|1
Reducer 3: (part-r-00002)
1<tab>2,3,|0|BLACK|source
4<tab>2,3,|2|BLACK|2
Counters:
MapReduce提供了Counters来收集统计job的信息,想quality-control, 应用层的信息收集,问题诊断。Hadoop自己保留了一些Counters,用于监控收集每个job的信息。具体Counter介绍参见 [6]
我们也可以自己定义Counter,来收集我们自己的信息。
static enum MoreIterations { numberOfIterations}
上面我们定义一个枚举变量,numberOfIterations作为Counter的name, Counter在MRJobs中作为全局变量,只要还有节点没有被访问,那么Counter的值就加1,。
Node.java
该class 存储了图节点的信息,node id,相邻节点,以及当前被访问的状态,距离等等
package com.joey.mapred.graph.utils;import java.util.ArrayList;import java.util.List;import org.apache.hadoop.io.Text;public class Node { /** * three possible colors a node can have to keep track * of the visiting status of the nodes during graph search */ public static enum Color { WHITE, //unvisited GRAY, // visited, unprocess BLACK // processed }; private String id; // id of the node private int distance; // distance of the node from source node // list of the edges private List<String> edges = new ArrayList<String>(); private Color color = Color.WHITE; // parent/ predecessor of the node // The parent of the source is marked "source" to leave it unchanged private String parent; public Node() { distance = Integer.MAX_VALUE; color = Color.WHITE; parent = null; } public Node(String nodeInfo) { String[] inputVal = nodeInfo.split("\t"); String key = ""; String val = ""; try { key = inputVal[0]; // node id // the list of adjacent nodes, distance, color, parent val = inputVal[1]; } catch (Exception e) { e.printStackTrace(); System.exit(1); } String[] tokens = val.split("\\|"); this.id = key; for (String s : tokens[0].split(",")) { if (s.length() > 0) edges.add(s); } if (tokens[1].equalsIgnoreCase("Integer.MAX_VALUE")) { this.distance = Integer.MAX_VALUE; } else { this.distance = Integer.parseInt(tokens[1]); } this.color = Color.valueOf(tokens[2]); this.parent = tokens[3]; } public Text getNodeInfo() { StringBuilder sb = new StringBuilder(); for (String v : edges) { sb.append(v).append(","); } sb.append("|"); if (this.distance < Integer.MAX_VALUE) { sb.append(this.distance).append("|"); } else { sb.append("Integer.MAX_VALUE").append("|"); } sb.append(color.toString()).append("|"); sb.append(getParent()); return new Text(sb.toString()); } public String getId() { return id; } public void setId(String id) { this.id = id; } public int getDistance() { return distance; } public void setDistance(int distance) { this.distance = distance; } public List<String> getEdges() { return edges; } public void setEdges(List<String> edges) { this.edges = edges; } public Color getColor() { return color; } public void setColor(Color color) { this.color = color; } public String getParent() { return parent; } public void setParent(String parent) { this.parent = parent; } }
TraverseGraph.java
该class提供了Mapper和Reducer的基类,Mapper主要设置开始时的sourceNode的所有邻节点的访问状态,迭代中当前节点及邻节点的状态EMIT给reducer
Reducer则根据Mapper提交的信息,选择其中的最终状态(因为每个节点的邻接节点有可能重复,那么由Mapper提交的状态可能多样),最后把最终状态的节点返回
package com.joey.mapred.graph;import java.io.IOException;import org.apache.hadoop.io.Text;import org.apache.hadoop.mapreduce.Mapper;import org.apache.hadoop.mapreduce.Reducer;import com.joey.mapred.graph.utils.Node;import com.joey.mapred.graph.utils.Node.Color;public class TraverseGraph {/** * * Description : Mapper class that implements the map part of Breadth-first * search algorithm. The nodes colored WHITE or BLACK are emitted as such. For * each node that is colored GRAY, a new node is emitted with the distance * incremented by one and the color set to GRAY. The original GRAY colored * node is set to BLACK color and it is also emitted. * * Input format <key, value> : <line offset in the input file (automatically * assigned), * nodeID<tab>list_of_adjacent_nodes|distance_from_the_source|color|parent> * * Output format <key, value> : <nodeId, (updated) * list_of_adjacent_nodes|distance_from_the_source|color|parent node> * * Reference : * http://www.johnandcailin.com/blog/cailin/breadth-first-graph-search * -using-iterative-map-reduce-algorithm * */// the type parameters are the input keys type, the input values type, the// output keys type, the output values typepublic static class TraverseMapper extends Mapper<Object, Text, Text, Text> {protected void map(Object key, Text value, Context context, Node inNode)throws IOException, InterruptedException {if (inNode.getColor() == Color.GRAY) {for (String neighbor : inNode.getEdges()) {Node adjacentNode = new Node();adjacentNode.setId(neighbor);adjacentNode.setDistance(inNode.getDistance() + 1);adjacentNode.setColor(Node.Color.GRAY);adjacentNode.setParent(inNode.getId());context.write(new Text(adjacentNode.getId()),adjacentNode.getNodeInfo());}// this node is done, color it blackinNode.setColor(Node.Color.BLACK);}context.write(new Text(inNode.getId()), inNode.getNodeInfo());}}/** * * Description : Reducer class that implements the reduce part of parallel * Breadth-first search algorithm. Make a new node which combines all * information for this single node id that is for each key. The new node * should have the full list of edges, the minimum distance, the darkest * Color, and the parent/predecessor node * * Input format <key,value> : <nodeId, * list_of_adjacent_nodes|distance_from_the_source|color|parent_node> * * Output format <key,value> : <nodeId, * (updated)list_of_adjacent_nodes|distance_from_the_source|color|parent_node> * */public static class TraverseReducer extends Reducer<Text, Text, Text, Text> {protected Node reduce(Text key, Iterable<Text> values, Context context,Node outNode) throws IOException, InterruptedException {// set the node id as the keyoutNode.setId(key.toString());for (Text value : values) {Node inNode = new Node(key.toString() + "\t" + value.toString());if (inNode.getEdges().size() > 0) {outNode.setEdges(inNode.getEdges());}if (inNode.getDistance() < outNode.getDistance()) {outNode.setDistance(inNode.getDistance());outNode.setParent(inNode.getParent());}if (inNode.getColor().ordinal() > outNode.getColor().ordinal()) {outNode.setColor(inNode.getColor());}}context.write(key, new Text(outNode.getNodeInfo()));return outNode;}}}
Driver.java
该class相对来说就简单点,不用过多的解释了
package com.joey.mapred.graph;import java.io.IOException;import org.apache.hadoop.conf.Configuration;import org.apache.hadoop.fs.Path;import org.apache.hadoop.io.Text;import org.apache.hadoop.mapreduce.Counters;import org.apache.hadoop.mapreduce.InputFormat;import org.apache.hadoop.mapreduce.Job;import org.apache.hadoop.mapreduce.Mapper;import org.apache.hadoop.mapreduce.OutputFormat;import org.apache.hadoop.mapreduce.Reducer;import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;import org.apache.hadoop.util.ToolRunner;import com.joey.mapred.BaseDriver;import com.joey.mapred.graph.TraverseGraph.TraverseMapper;import com.joey.mapred.graph.TraverseGraph.TraverseReducer;import com.joey.mapred.graph.utils.Node;/** * Description : MapReduce program to solve the single-source shortest path * problem using parallel breadth-first search. This program also illustrates * how to perform iterative map-reduce. * * The single source shortest path is implemented by using Breadth-first search * concept. * * Reference : * http://www.johnandcailin.com/blog/cailin/breadth-first-graph-search * -using-iterative-map-reduce-algorithm * */public class BFSearchDriver extends BaseDriver { static class SearchMapperSSSP extends TraverseMapper { public void map(Object key, Text value, Context context) throws IOException, InterruptedException { Node inNode = new Node(value.toString()); // calls the map method of the super class SearchMapper super.map(key, value, context, inNode); } } static class SearchReducerSSSP extends TraverseReducer { // the parameters are the types of the input key, the values associated with // the key and the Context object through which the Reducer communicates // with the Hadoop framework public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException { // create a new out node and set its values Node outNode = new Node(); // call the reduce method of SearchReducer class outNode = super.reduce(key, values, context, outNode); // if the color of the node is gray, the execution has to continue, this // is done by incrementing the counter if (outNode.getColor() == Node.Color.GRAY) context.getCounter(MoreIterations.numberOfIterations).increment(1L); } } public int run(String[] args) throws Exception { int iterationCount = 0; // counter to set the ordinal number of the // intermediate outputs Job job; long terminationValue = 1; // while there are more gray nodes to process while (terminationValue > 0) { job = getJobConf(args); // get the job configuration String input, output; // setting the input file and output file for each iteration // during the first time the user-specified file will be the input whereas // for the subsequent iterations // the output of the previous iteration will be the input if (iterationCount == 0) { // for the first iteration the input will be the first input argument input = args[0]; } else { // for the remaining iterations, the input will be the output of the // previous iteration input = args[1] + iterationCount; } output = args[1] + (iterationCount + 1); // setting the output file FileInputFormat.setInputPaths(job, new Path(input)); FileOutputFormat.setOutputPath(job, new Path(output)); job.waitForCompletion(true); Counters jobCntrs = job.getCounters(); terminationValue = jobCntrs .findCounter(MoreIterations.numberOfIterations).getValue(); iterationCount++; } return 0; } static enum MoreIterations { numberOfIterations } public static void main(String[] args) throws Exception { int res = ToolRunner.run(new Configuration(), new BFSearchDriver(), args); if(args.length != 2){ System.err.println("Usage: <in> <output name> "); } System.exit(res); } @Override protected Job getJobConf(String[] args) throws Exception { JobInfo jobInfo = new JobInfo() { @Override public Class<?> getJarByClass() { return BFSearchDriver.class; } @Override public Class<? extends Mapper> getMapperClass() { return SearchMapperSSSP.class; } @Override public Class<? extends Reducer> getCombinerClass() { return null; } @Override public Class<? extends Reducer> getReducerClass() { return SearchReducerSSSP.class; } @Override public Class<?> getOutputKeyClass() { return Text.class; } @Override public Class<?> getOutputValueClass() { return Text.class; } @Override public Class<? extends InputFormat> getInputFormatClass() { return TextInputFormat.class; } @Override public Class<? extends OutputFormat> getOutputFormatClass() { return TextOutputFormat.class; } }; return setupJob("BFSSearch", jobInfo); }}
BaseDriver.java
package com.joey.mapred;import org.apache.hadoop.conf.Configuration;import org.apache.hadoop.conf.Configured;import org.apache.hadoop.mapreduce.InputFormat;import org.apache.hadoop.mapreduce.Job;import org.apache.hadoop.mapreduce.Mapper;import org.apache.hadoop.mapreduce.OutputFormat;import org.apache.hadoop.mapreduce.Reducer;import org.apache.hadoop.util.Tool;public abstract class BaseDriver extends Configured implements Tool { // method to set the configuration for the job and // the mapper and the reducer classes protected Job setupJob(String jobName, JobInfo jobInfo) throws Exception { Configuration conf = getConf(); if (conf == null) { throw new RuntimeException("Configuration should not be null"); } Job job = new Job(conf, jobName); // set the several classes job.setJarByClass(jobInfo.getJarByClass()); // set the mapper class job.setMapperClass(jobInfo.getMapperClass()); // the combiner class is optional, so set it only if it // is required by the program if (jobInfo.getCombinerClass() != null) job.setCombinerClass(jobInfo.getCombinerClass()); // set the reducer class job.setReducerClass(jobInfo.getReducerClass()); // the number of reducers is set to 3, this can be // altered according to the program's requirements job.setNumReduceTasks(3); // set the type of the output key and value for the // Map & Reduce functions job.setOutputKeyClass(jobInfo.getOutputKeyClass()); job.setOutputValueClass(jobInfo.getOutputValueClass()); if (jobInfo.getInputFormatClass() != null) job.setInputFormatClass(jobInfo.getInputFormatClass()); if (jobInfo.getOutputFormatClass() != null) job.setOutputFormatClass(jobInfo.getOutputFormatClass()); return job; } protected abstract Job getJobConf(String[] args) throws Exception; protected abstract class JobInfo { public abstract Class<?> getJarByClass(); public abstract Class<? extends Mapper> getMapperClass(); public abstract Class<? extends Reducer> getCombinerClass(); public abstract Class<? extends Reducer> getReducerClass(); public abstract Class<?> getOutputKeyClass(); public abstract Class<?> getOutputValueClass(); public abstract Class<? extends InputFormat> getInputFormatClass(); public abstract Class<? extends OutputFormat> getOutputFormatClass(); }}
有了上面的遍历,我们还可以利用MapReduce 做图的拓扑排序,Single-pair shortest-path,Single-source shortest-paths(上面的实现过程其实就无向图的Single-source shortest paths,只不过简单化了), all-pairs shortest-paths
Reference
1、http://blog.csdn.net/v_JULY_v/article/details/6111353
2、http://en.wikipedia.org/wiki/Breadth-first_search
3、http://hadooptutorial.wikispaces.com/Iterative+MapReduce+and+Counters
4、http://www.ics.uci.edu/~eppstein/161/960215.html
5、http://oucsace.cs.ohiou.edu/~razvan/courses/cs4040/lecture20.pdf
6、http://langyu.iteye.com/blog/1171091
3 0
- 一些算法的MapReduce实现——图的BFS遍历
- 图的遍历算法实现DFS,BFS
- 一些算法的MapReduce实现——好友推荐
- 一些算法的MapReduce实现——最小生成树
- 一些算法的MapReduce实现——MapReduce Job的单元测试实例
- 第十二周项目3-图遍历算法的实现BFS
- 数据结构——图的BFS的邻接矩阵遍历(BFS是由队列实现的)
- 一些算法的MapReduce实现——矩阵相乘一步实现
- 一些算法的MapReduce实现——矩阵-向量乘法实现
- 一些算法的MapReduce实现——倒排索引实现
- 广度优先遍历BFS的图算法
- 图的遍历:DFS和BFS算法
- 图的遍历算法(DFS和BFS)
- 一些算法的MapReduce实现——有向图求边的交集
- 第十二周项目3--图遍历算法实现--实现广度优先遍历——BFS
- 一些算法的MapReduce实现——矩阵分块乘法计算(1)
- 一些算法的MapReduce实现——矩阵分块乘法计算(2)
- 一些算法的MapReduce实现——1 TB数据排序源码分析
- Android中的JSON详细总结
- JBoss 系列八十四: 使用 JMS session 时的一个误区
- 如何面试有多年工作经验的高级程序员
- 学习笔记----图论学习中
- R Data Management(数据管理)
- 一些算法的MapReduce实现——图的BFS遍历
- 读博客”求一个整数中二进制数1的个数“两种方法分析
- lnmp一键安装包(最后开启80防火墙)
- 在OpenErp的配置文件中, 为数据库密码加密
- VSFTP在LINUX中的安装、配置
- 简单的DOS命令
- ptsname() -- 获得从伪终端名(slave pseudo-terminal)
- VC-RAS拨号源码详解(2009-11-17 08:25:24)转载▼标签: ifidc错误代码it
- 防止因意外掉线断网引起的ssh终端程序异常20140110