Spark PageRank

Source: Internet. Editor: 程序博客网. Time: 2024/05/29 17:22
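  • Algorithm flow:
    • Initialization: we use pages (a pair RDD) to record each page together with the pages it is linked to (its neighbors), and ranks (a pair RDD) to record each page's rank, which is initialized to 1.0
    • In each iteration, every page p sends a contribution with value rank(p)/numNeighbors(p) to each of its neighboring pages
    • The contributions each page receives are summed to give contributionsReceived
    • Each page's rank is then set to 0.15 + 0.85 * contributionsReceived
  • Implementation (this code only builds the initial pages and ranks RDDs):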
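    import java.util.ArrayList;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.function.PairFunction;
    import scala.Tuple2;

    // sc is assumed to be a JavaSparkContext created elsewhere in the enclosing class.
    // Each input line is expected to look like "1 2,3": a page id, a space,
    // then a comma-separated list of the page ids it links to.

    // Build the pages RDD: page id -> list of neighboring page ids.
    public static JavaPairRDD<Integer, ArrayList<Integer>> run_page() {
        JavaPairRDD<Integer, ArrayList<Integer>> res = sc.textFile(
                "/home/liang/workspace/learnSpark/pagerank.txt"
        ).mapToPair(new PairFunction<String, Integer, ArrayList<Integer>>() {
            @Override
            public Tuple2<Integer, ArrayList<Integer>> call(String s) throws Exception {
                String key = s.split(" ")[0];
                String values = s.split(" ")[1];
                ArrayList<Integer> values_integer = new ArrayList<Integer>();
                for (String str : values.split(",")) {
                    values_integer.add(Integer.parseInt(str));
                }
                return new Tuple2<Integer, ArrayList<Integer>>(Integer.parseInt(key), values_integer);
            }
        });
        return res;
    }

    // Build the ranks RDD: page id -> initial rank of 1.0.
    public static JavaPairRDD<Integer, Double> run_rank() {
        JavaPairRDD<Integer, Double> res = sc.textFile(
                "/home/liang/workspace/learnSpark/pagerank.txt"
        ).mapToPair(
                // Split on a space, not on "": splitting on the empty string
                // would keep only the first digit of a multi-digit page id.
                line -> new Tuple2<Integer, Double>(Integer.parseInt(line.split(" ")[0]), 1.0)
        );
        return res;
    }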
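The code above only builds the initial pages and ranks. The iteration steps from the algorithm flow can be checked on a small graph without Spark, using plain Java maps. The following is a minimal sketch under stated assumptions: the class name PageRankSketch, the methods iterate and run, and the three-page graph are all hypothetical and do not come from the original post. On the two pair RDDs, the same steps would typically correspond to join, flatMapToPair, reduceByKey, and mapValues.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class (not from the original post): a Spark-free
// version of the update rule so the numbers can be checked by hand.
class PageRankSketch {

    // One iteration. Assumes every key of pages also has an entry in ranks.
    static Map<Integer, Double> iterate(Map<Integer, List<Integer>> pages,
                                        Map<Integer, Double> ranks) {
        // Each page p sends rank(p) / numNeighbors(p) to each neighbor;
        // contributions arriving at the same page are summed.
        Map<Integer, Double> received = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> e : pages.entrySet()) {
            List<Integer> neighbors = e.getValue();
            double share = ranks.get(e.getKey()) / neighbors.size();
            for (Integer n : neighbors) {
                received.merge(n, share, Double::sum);
            }
        }
        // New rank = 0.15 + 0.85 * contributionsReceived.
        Map<Integer, Double> next = new HashMap<>();
        for (Map.Entry<Integer, Double> e : received.entrySet()) {
            next.put(e.getKey(), 0.15 + 0.85 * e.getValue());
        }
        // Pages that received nothing keep the base value 0.15.
        for (Integer p : pages.keySet()) {
            next.putIfAbsent(p, 0.15);
        }
        return next;
    }

    // Initialize every rank to 1.0, then apply the update a fixed number of times.
    static Map<Integer, Double> run(Map<Integer, List<Integer>> pages, int iterations) {
        Map<Integer, Double> ranks = new HashMap<>();
        for (Integer p : pages.keySet()) {
            ranks.put(p, 1.0);
        }
        for (int i = 0; i < iterations; i++) {
            ranks = iterate(pages, ranks);
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Illustrative graph: 1 -> 2,3   2 -> 3   3 -> 1
        Map<Integer, List<Integer>> pages = new HashMap<>();
        pages.put(1, Arrays.asList(2, 3));
        pages.put(2, Arrays.asList(3));
        pages.put(3, Arrays.asList(1));
        System.out.println(run(pages, 10));
    }
}
```

On this graph, one iteration gives page 1 a contribution of 1.0 (from page 3), page 2 a contribution of 0.5 (from page 1), and page 3 a contribution of 1.5 (0.5 from page 1 plus 1.0 from page 2), so the ranks become approximately 1.0, 0.575, and 1.425.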