PageRank: The Core of the Google Search Engine

Tags (space-separated): Spark


[Seven figures, 001.PNG through 007.PNG, originally presented the algorithm walkthrough here; the images did not survive extraction.]
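
In their place, the update rule the lost figures illustrated can be stated directly. Informally: a page is important if important pages link to it, and each page splits its own importance evenly among the pages it links to. Writing B(u) for the set of pages linking to u and L(v) for the number of out-links of v, the simplified PageRank update that the code below performs is

$$\mathrm{PR}(u) = (1 - d) + d \sum_{v \in B(u)} \frac{\mathrm{PR}(v)}{L(v)}, \qquad d = 0.85$$

The damping factor d models a surfer who follows a link with probability 0.85 and jumps to a random page otherwise. (The classic formulation divides the (1 - d) term by the total page count N; the Spark example uses this un-normalized variant, which is exactly its mapValues(0.15 + 0.85 * _) step.)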

Implementation source code (Spark's bundled SparkPageRank example):

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package org.apache.spark.examples

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Computes the PageRank of URLs from an input file. Input file should
 * be in format of:
 * URL         neighbor URL
 * URL         neighbor URL
 * URL         neighbor URL
 * ...
 * where URL and their neighbors are separated by space(s).
 *
 * This is an example implementation for learning how to use Spark. For more conventional use,
 * please refer to org.apache.spark.graphx.lib.PageRank
 */
object SparkPageRank {

  def showWarning() {
    System.err.println(
      """WARN: This is a naive implementation of PageRank and is given as an example!
        |Please use the PageRank implementation found in org.apache.spark.graphx.lib.PageRank
        |for more conventional use.
      """.stripMargin)
  }

  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: SparkPageRank <file> <iter>")
      System.exit(1)
    }

    showWarning()

    val sparkConf = new SparkConf().setAppName("PageRank")
    val iters = if (args.length > 1) args(1).toInt else 10
    val ctx = new SparkContext(sparkConf)

    // Parse "URL neighbor" pairs, group each URL with all of its out-links,
    // and cache the link structure since it is reused in every iteration.
    val lines = ctx.textFile(args(0), 1)
    val links = lines.map { s =>
      val parts = s.split("\\s+")
      (parts(0), parts(1))
    }.distinct().groupByKey().cache()

    // Start every URL with rank 1.0.
    var ranks = links.mapValues(v => 1.0)

    for (i <- 1 to iters) {
      // Each URL sends rank / out-degree to every page it links to...
      val contribs = links.join(ranks).values.flatMap { case (urls, rank) =>
        val size = urls.size
        urls.map(url => (url, rank / size))
      }
      // ...and new ranks are the damped sum of incoming contributions.
      ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
    }

    val output = ranks.collect()
    output.foreach(tup => println(tup._1 + " has rank: " + tup._2 + "."))

    ctx.stop()
  }
}
// scalastyle:on println
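
To see the iteration work without preparing an input file, here is a minimal self-contained sketch that applies the same update to a hard-coded four-page graph on a local master. The object name TinyPageRank and the graph itself are illustrative assumptions, not part of the Spark example.

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative sketch: the same naive PageRank iteration as SparkPageRank
// above, applied to a hypothetical four-page graph and run locally.
object TinyPageRank {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("TinyPageRank").setMaster("local[*]"))

    // Edge list: (page, page it links to).
    val edges = sc.parallelize(Seq(
      ("A", "B"), ("A", "C"),
      ("B", "C"),
      ("C", "A"),
      ("D", "C")))

    val links = edges.distinct().groupByKey().cache()
    var ranks = links.mapValues(_ => 1.0)

    for (_ <- 1 to 10) {
      // Each page sends rank / out-degree along each out-link...
      val contribs = links.join(ranks).values.flatMap { case (urls, rank) =>
        urls.map(url => (url, rank / urls.size))
      }
      // ...and ranks become the damped sum of incoming contributions.
      ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
    }

    ranks.collect().sortBy(-_._2).foreach { case (page, rank) =>
      println(f"$page has rank: $rank%.4f")
    }
    sc.stop()
  }
}

One quirk worth noticing: page D receives no inbound links, so it vanishes from ranks after the first reduceByKey, and from the second iteration on its contribution to C is lost as well. Handling of such pages is one reason the example's own warning points production users to org.apache.spark.graphx.lib.PageRank. The bundled example itself can be run from a Spark source checkout with something like ./bin/run-example SparkPageRank data/mllib/pagerank_data.txt 10 (that sample file ships with recent Spark source distributions).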