PageRank: the core of the Google search engine
Source: Internet | Published by: 陈田村拆车件淘宝店 | Edited by: 程序博客网 | Date: 2024/04/30 13:09
Tags: Spark
Algorithm implementation source code (the `SparkPageRank` example shipped with Apache Spark):
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package org.apache.spark.examples

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Computes the PageRank of URLs from an input file. Input file should
 * be in format of:
 * URL         neighbor URL
 * URL         neighbor URL
 * URL         neighbor URL
 * ...
 * where URL and their neighbors are separated by space(s).
 *
 * This is an example implementation for learning how to use Spark. For more conventional use,
 * please refer to org.apache.spark.graphx.lib.PageRank
 */
object SparkPageRank {

  def showWarning(): Unit = {
    System.err.println(
      """WARN: This is a naive implementation of PageRank and is given as an example!
        |Please use the PageRank implementation found in org.apache.spark.graphx.lib.PageRank
        |for more conventional use.
      """.stripMargin)
  }

  def main(args: Array[String]): Unit = {
    if (args.length < 1) {
      System.err.println("Usage: SparkPageRank <file> <iter>")
      System.exit(1)
    }

    showWarning()

    val sparkConf = new SparkConf().setAppName("PageRank")
    // Number of iterations: optional second argument, default 10
    val iters = if (args.length > 1) args(1).toInt else 10
    val ctx = new SparkContext(sparkConf)
    val lines = ctx.textFile(args(0), 1)
    // Parse each "URL neighborURL" line into an edge, drop duplicate edges,
    // and group into (URL, all neighbors) adjacency lists; cached because it is reused every iteration
    val links = lines.map { s =>
      val parts = s.split("\\s+")
      (parts(0), parts(1))
    }.distinct().groupByKey().cache()
    // Every page that has outgoing links starts with rank 1.0
    var ranks = links.mapValues(v => 1.0)

    for (i <- 1 to iters) {
      // Each page splits its current rank evenly among the pages it links to
      val contribs = links.join(ranks).values.flatMap { case (urls, rank) =>
        val size = urls.size
        urls.map(url => (url, rank / size))
      }
      // Sum the contributions each page receives and apply the 0.85 damping factor
      ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
    }

    val output = ranks.collect()
    output.foreach(tup => println(tup._1 + " has rank: " + tup._2 + "."))

    ctx.stop()
  }
}
// scalastyle:on println
```