Nutch 1.9 and Solr 4.5 integration: output

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 11:33

1. Querying Solr for the results crawled by Nutch
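
A query like the one whose response appears in this section (`q=title:幻想`, `wt=json`, `indent=true`) could be sent from a script roughly as follows. This is a minimal sketch, not the method used in the post: the `/select` handler path, the `localhost` base URL, and the helper names `build_select_url`, `summarize` and `search` are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the Solr instance. The post only shows
# http://xx.xx.xx.xx:8983/solr, so substitute your own host.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(base, query):
    """Build a select URL with the same parameters as the response header above."""
    params = {"q": query, "wt": "json", "indent": "true"}
    # Assumption: the default core answers on <base>/select.
    return base.rstrip("/") + "/select?" + urlencode(params)


def summarize(response):
    """Return (numFound, [(title, url), ...]) from a parsed Solr JSON response."""
    body = response["response"]
    docs = [(doc.get("title"), doc.get("url")) for doc in body["docs"]]
    return body["numFound"], docs


def search(base, query):
    """Send the query to Solr and summarize the JSON it returns."""
    with urlopen(build_select_url(base, query)) as resp:
        return summarize(json.load(resp))
```

Calling `search(SOLR_BASE, "title:幻想")` against a running instance would return the hit count together with each document's `title` and `url`, the same fields that Nutch populated in the response shown in this section.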

{  "responseHeader": {    "status": 0,    "QTime": 2,    "params": {      "indent": "true",      "q": "title:幻想",      "_": "1418266706916",      "wt": "json"    }  },  "response": {    "numFound": 7,    "start": 0,    "docs": [      {        "content": "幻想江湖-2.2资料片,巅峰对决,震撼来袭! 跳转官网 装备凝练 巅峰擂台 万圣之夜 新版时装",        "id": "http://hxjh.zqgame.com/",        "title": "幻想江湖-2.2资料片,巅峰对决,震撼来袭!",        "segment": "20141211104005",        "boost": 0,        "digest": "c61521c1861b1a7574c8920fd27d0155",        "tstamp": "2014-12-11T02:40:14.477Z",        "url": "http://hxjh.zqgame.com/",        "anchor": [          "幻想江湖",          "幻想江湖"        ],        "_version_": 1487159323035435000      },      {        "content": "幻想江湖-鬼灵精怪万圣节 开启时间 : 10 月 30 日 万圣节礼包领取> 万圣节前夕,为了避免恶灵干扰,大侠们纷纷挂起了南瓜灯,驱逐鬼怪。江湖有一传闻,一群糖果商人行经龙脉岭时,因为身上的糖果、饼干、宝石而找来鬼魂附身,如果帮助他们驱逐了附身邪灵,将会获得他们道谢的礼物哦~! 1 万圣节天天有礼 2 练级打宝两不误 3 节日消费奖励翻倍 4 奖励兑换惊喜不断 5 洗炼折扣大放送 温馨提示: 活动期间,大侠们请每天提着南瓜灯,穿上蝙蝠衫,去龙脉去收集糖果饼干,要不停地说:“trick or treat.”(意思是给不给,不给就捣蛋)。要是不肯给的话,就用各种方法去惩罚他,例如:一招一个怪,“唰唰唰————”把龙脉挂个三小时! 关闭 恭喜你获得幻想江湖万圣节礼包! IOS用户领取: 安卓和越狱用户领取: 有效时间: 即日-2014.11.30 兑换次数: 只限兑换一次 兑换范围: 全服 礼包使用方法: 登录游戏后,点击游戏右上方【领奖】-【福利】-【礼包】后输入正确礼品卡号领取礼包奖励!",        "id": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_03.html",        "title": "幻想江湖-鬼灵精怪万圣节",        "segment": "20141211104057",        "boost": 0,        "digest": "5ae39251ad06017e4e1854aae9129126",        "tstamp": "2014-12-11T02:41:37.669Z",        "url": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_03.html",        "anchor": [          "万圣之夜"        ],        "_version_": 1487159633802952700      },      {        "content": "幻想江湖-优雅转身华丽时装首曝 夜魔游龙 西式时装 全新时装新品上架啦,这批时装看上去是不是和以前大有不同呢,此次大胆革新,看到下面的时装,不禁令人想到后面可能真的会有结婚系统咯,新版本新换装,不走平凡路~我们就是这样的与众不同! 
进入官网 返回活动首页",        "id": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_04.html",        "title": "幻想江湖-优雅转身华丽时装首曝",        "segment": "20141211104057",        "boost": 0,        "digest": "e086540bf0f721f39560440c85d2161f",        "tstamp": "2014-12-11T02:41:47.879Z",        "url": "http://hd.zqgame.com/zqgame/hxjh/dfdj/active_04.html",        "anchor": [          "新版时装"        ],        "_version_": 1487159633805049900      },      {        "content": "《幻想江湖》官网-首部超萌动作武侠片!今天开始,做武侠片主人公 首页 新闻中心 游戏资料 游戏论坛 分享到: 安卓下载 ios越狱下载 ios正版下载 礼包领取 1 2 3 4 幻想江湖绝尚发布会精彩视频 最新 新闻 公告 活动 《幻想江湖》IOS18区“美人天下”12月10日火爆开启 2014史上最萌武侠手游来袭!不用吃药,放弃治疗,12月10日上午11:00新区“美人天下”火爆开启!快来没日没夜一起萌萌哒!... 查看详情 > 2014-12-10 • [新闻] 菜鸟进阶强力党 《幻想江湖》装备属性轻松堆 2014-12-10 • [新闻] 全新资料片即将来袭《幻想江湖》四大活动任你玩 2014-12-09 • [活动] 双12 玩幻想送福利 2014-12-09 • [新闻] 细节决定成败 《幻想江湖》人物属性全掌握 2014-12-09 • [活动] 《幻想江湖》IOS18区”美人天下”十六大活动 2014-12-08 • [新闻] 刀尖上的武侠 挑战《幻想江湖》秦陵副本 2014-12-10 • [新闻] 菜鸟进阶强力党 《幻想江湖》装备属性轻松堆 2014-12-10 • [新闻] 全新资料片即将来袭《幻想江湖》四大活动任你玩 2014-12-09 • [新闻] 细节决定成败 《幻想江湖》人物属性全掌握 2014-12-08 • [新闻] 刀尖上的武侠 挑战《幻想江湖》秦陵副本 2014-12-08 • [新闻] 新版“姑姑”遭吐槽 《幻想江湖》还你女神梦 2014-12-05 • [新闻] 《幻想江湖》我们结婚吧!——订婚篇 2014-12-03 • [公告] 幻想江湖-公测9~14区 数据互通公告 2014-12-01 • [公告] 《幻想江湖》12月2日临时维护公告 2014-11-26 • [公告] 《幻想江湖》2.4版本更新 2014-11-25 • [公告] 幻想江湖临时维护公告 2014-11-25 • [公告] 《幻想江湖》appstore1~8服数据互通完毕 2014-11-25 • [公告] 幻想江湖-appstore数据互通延长公告 2014-12-09 • [活动] 双12 玩幻想送福利 2014-12-09 • [活动] 《幻想江湖》IOS18区”美人天下”十六大活动 2014-12-08 • [活动] 周末齐消费 欢乐享不停 2014-12-08 • [活动] 《幻想江湖》美女主播齐聚乐———回顾 2014-12-08 • [活动] 《幻想江湖》25区”独步江湖”十六大活动 2014-12-05 • [活动] 《幻想江湖》玩家体验指南——做好产品,专注体验 联系人:施若熙 联系QQ:744415486 手机:13510624817 邮箱:ruoxi.shi@zqgame.com 联系人:方彦琼 联系QQ:611535985 手机:13603061895 邮箱:yanqiong.fang@zqgame.com 玩家群② 264103428 企业客服QQ:800056019 客服热线:0755-86160520 特色玩法 玩家攻略 职业介绍 明教 唐门 天山 逍遥 18183 766 91手游网 合作媒体 ———————————————————— 微信公众号 新浪微博 腾讯微博 扫描二维码下载 快速注册 通行证: 密 码: 确认密码: 验证码: 立即注册 用户名 恭喜你已经注册成功! 关闭 恭喜您获得幻想江湖公测新手礼包! 
你的礼包卡号是: 礼包使用方法: 登陆游戏后,点击游戏右上方【领奖】-【福利】-【礼包】后输入8位的礼包卡号领取礼包奖励!内容包含:止血丹*2、白色强化石*20、成长丹*5、易功丹*10、进阶丹*5。 关闭 微信公众号",        "id": "http://hxjh.zqgame.com/index.html",        "title": "《幻想江湖》官网-首部超萌动作武侠片!今天开始,做武侠片主人公",        "segment": "20141211104057",        "boost": 0,        "digest": "3f9a2060e12f95316ee0201ce8a21da0",        "tstamp": "2014-12-11T02:41:01.462Z",        "url": "http://hxjh.zqgame.com/index.html",        "anchor": [          "进入官网"        ],        "_version_": 1487159633828118500      },      {        "content": "【仙幻奇缘】官网 12.6首次开放公测!无商城,真正免费! 进入官网 论坛中心 游戏下载 购卡充值 1 2 3 4 5 媒体友链 通行证账号: 通行证密码: 确认密码: 验证码: 同意 《中青宝》协议 恭喜你!注册成功! 用户名是: 客户端 立即下载 获取特权礼包 版权所有:深圳中青宝互动网络股份有限公司 客服传真:0755-86368269 中华人民共和国增值电信业务经营许可证:粤B2-20030216 粤ICP备:09057836 网络文化经营许可证:文网文[2008]088号 中华人民共和国互联网出版许可证:新出网证(粤)字017号 每个IP只能参加一次抽奖, 谢谢您的参与!  ",        "id": "http://xh.zqgame.com/",        "title": "【仙幻奇缘】官网 12.6首次开放公测!无商城,真正免费!",        "segment": "20141211104057",        "boost": 0,        "digest": "471def081683b7c5f94a39382e4c00a1",        "tstamp": "2014-12-11T02:41:02.165Z",        "url": "http://xh.zqgame.com/",        "anchor": [          "仙幻奇缘",          "仙幻奇缘"        ],        "_version_": 1487159634570510300      },      {        "content": "《诸神世界》官方网站—3D魔幻战争网游 诸神世界 首页 新闻动态 游戏资料 下载微端 快速充值 官方论坛 下载微端 快速充值 VIP介绍 领取新手卡 选择大区 请选择服务器 风暴荒漠 战争血径 无尽沙海 燃烧平原 双线1-16服 领取中,请稍候…… 您的礼包号为: 更多服务器 《诸神世界》是一款MMORPG的3D国战网页游戏,采用魔幻风格,3D旋转俯瞰视角,以国家战争、团队冒险等玩法为特色,以大范围多维度强PVP玩法为核心的超激情游戏,体验游戏国战pk激情就来诸神世界。 0755-26635899 客服邮箱:kefu@zqgame.com 客服传真:0755-86368269 游戏QQ群:219759659 259942575 用户名: *   以字母开头由大小写字母、数字、下划线组成,长度为4-32位 密码: *   6-20字母、数字、符号组成,不含空格键、「\"」及「'」 确认密码: *   请再一次输入密码 1 2 3 4 最新 新闻 活动 公告 攻略 诸神世界混服部分区服数据互通公告 公告 06-04 诸神世界混服部分区服数据互通公告 公告 05-23 5月29日12点诸神新区-风暴荒漠火爆开启 新闻 05-14 5月15日12点诸神新区-亡魂峡谷火爆开启 公告 04-18 诸神世界混服合服活动精彩上线 公告 04-18 诸神世界混服部分区服数据互通公告 【新闻】 05-14 5月15日12点诸神新区-亡魂峡谷火爆开启 【新闻】 04-16 4月17日12点诸神新区-呼啸沙漠火爆开启 【新闻】 03-24 3月27日12点诸神新区-巨龙之吼火爆开启 【新闻】 03-17 3月20日12点诸神新区-尘风峡谷火爆开启 【新闻】 03-11 
3月13日12点诸神新区-耳语海岸火爆开启 【活动】 04-02 《诸神世界》十大开服活动 【活动】 02-13 《诸神世界》元宵&情人节活动 【活动】 01-26 《诸神世界》春节活动 【活动】 11-21 诸神世界周末限时活动火爆上线 【活动】 11-08 双十一《诸神世界》劲爆大酬宾 【公告】 06-04 诸神世界混服部分区服数据互通公告 【公告】 05-23 5月29日12点诸神新区-风暴荒漠火爆开启 【公告】 04-18 诸神世界混服合服活动精彩上线 【公告】 04-18 诸神世界混服部分区服数据互通公告 【公告】 03-25 3月28日平台网络升级公告 魔 牧 枪 炮 术 战 魔 刃 狩猎灵魂 攻击方式:近程魔法攻击 核心属性:智力 敏捷 职业特质:隐匿暗杀能力 职业说明:刀锋舞者,狩猎着生者的灵魂。隐没于黑暗,游走于光明。不被历史描述,却是历史的主宰! 点击查看详情 牧 师 神的宠儿 攻击方式:中程魔法攻击 核心属性:精神 智力 职业特质:恢复治愈能力 职业说明:神之使徒,捍卫生者,拯救死者。信者永生,不信者也救赎。虔诚的信徒,是神的宠儿! 点击查看详情 枪 手 一击必杀 攻击方式:远程物理攻击 核心属性:力量 精神 职业特质:伤害输出 职业说明:猎命王者,半边恶魔半边天使。沉着冷静,是他们的特质;一击必杀,是他们的实力! 点击查看详情 魔 炮 焚天怒焰 攻击方式:远程魔法攻击 核心属性:智力 精神 职业特质:群体伤害 职业说明:焚天烈焰,吞噬罪孽与苍生。沉稳步伐,吼出战歌嘹亮;怒放炮火,点亮生命奇迹! 点击查看详情 术 士 破碎虚空 攻击方式:中程魔法攻击 核心属性:智力 精神 职业特质:战斗节奏控制能力 职业说明:掌握法则,智慧象征。探索真理,识古通今,洞悉未来。以世间威能,抑恶扬善,改天逆命,破碎虚空! 点击查看详情 战 士 金刚不坏 攻击方式:近程物理攻击 核心属性:体质 力量 职业特质:生存能力 职业说明:移动城墙,金刚不坏。战,则掠地千里;守,则万夫莫开。英勇的灵魂铸造不灭传奇! 点击查看详情 系统介绍 进阶指导 特色系统 活动玩法 结婚系统 | 职业介绍 | FAQ | VIP如何获得 | 坐骑强化 | 转职重修 | 战友系统 | 升级送祝福 | 日常任务 | 拍卖寄售 | 技能遗忘重生 | 道具商城 | 财产保护 炼金系统 | 星耀石 | 装备镶嵌 | 装备升阶 | 装备打孔 | 要塞守卫站 | 神器合成 | 宠物潜力修改 | 宝石摘除 斗气系统 | 羽翼系统 | 1V1模拟战 | 移民系统 | 击鼓传花 | 情缘任务 | 神圣血脉 | 军衔系统 | 钓鱼系统 | 称号系统 | 封印进度 | 离线经验 巴比伦塔 | 跨区国战 | 跨区巡游 | 跨区极速狂飙 | 跨区组队争夺战 | 超级血战到底 | 血战到底 | 小丑的梦境 | 王者试炼 | 探险者地宫 | 前线速递 | 骑魂谷 | 冒险岛 | 极速狂飙 | 毁灭神迹 | 国家正式战争 | 国家远征 | 国家情报 | 国家BOSS | 藏宝峡谷 游戏壁纸 游戏截图 玩家相册 MORE                         265G百科 073专区 新浪爱问 抵制不良游戏 拒绝盗版游戏 注意自我保护 谨防上当受骗 适度游戏益脑 沉迷游戏伤身 合理安排时间 享受健康生活 增值电信许可证:粤B2-20120680 网络文化经营许可证: 粤网文[2014]0615-215号 粤ICP备09057836号 深圳市卓页互动网络科技有限公司 Copyright © 2012-2014 All Rights Reserved 本游戏适合18岁以上用户,不含暴力、恐怖、残酷、色情等妨害未成年人身心健康的内容,属于绿色健康产品 yy",        "id": "http://zs.ucjoy.com/",        "title": "《诸神世界》官方网站—3D魔幻战争网游",        "cache": "content",        "segment": "20141211104057",        "boost": 0,        "digest": "8d00af8aaa03c2cf68a69dc68892b764",        "tstamp": "2014-12-11T02:41:18.686Z",        "url": "http://zs.ucjoy.com/",        "anchor": [          "官网",          "诸神世界"        ],        "_version_": 1487159634641813500      },      {        
"content": "《诸神世界》官方网站—3D魔幻战争网游 诸神世界 首页 新闻动态 游戏资料 下载微端 快速充值 官方论坛 下载微端 快速充值 VIP介绍 领取新手卡 选择大区 请选择服务器 风暴荒漠 战争血径 无尽沙海 燃烧平原 双线1-16服 领取中,请稍候…… 您的礼包号为: 更多服务器 《诸神世界》是一款MMORPG的3D国战网页游戏,采用魔幻风格,3D旋转俯瞰视角,以国家战争、团队冒险等玩法为特色,以大范围多维度强PVP玩法为核心的超激情游戏,体验游戏国战pk激情就来诸神世界。 0755-26635899 客服邮箱:kefu@zqgame.com 客服传真:0755-86368269 游戏QQ群:219759659 259942575 用户名: *   以字母开头由大小写字母、数字、下划线组成,长度为4-32位 密码: *   6-20字母、数字、符号组成,不含空格键、「\"」及「'」 确认密码: *   请再一次输入密码 您所在的位置: 首页 > 服务器列表 推荐服务器列表 风暴荒漠 火爆 战争血径 火爆 我的服务器列表 你还未进入过游戏,请先登录游戏! 所有服务器 1-10 11-20 诸神混服 双线1-16服 火爆 风暴荒漠 火爆 战争血径 火爆 无尽沙海 火爆 燃烧平原 火爆 抵制不良游戏 拒绝盗版游戏 注意自我保护 谨防上当受骗 适度游戏益脑 沉迷游戏伤身 合理安排时间 享受健康生活 增值电信许可证:粤B2-20120680 网络文化经营许可证: 粤网文[2014]0615-215号 粤ICP备09057836号 深圳市卓页互动网络科技有限公司 Copyright © 2012-2014 All Rights Reserved 本游戏适合18岁以上用户,不含暴力、恐怖、残酷、色情等妨害未成年人身心健康的内容,属于绿色健康产品 yy",        "id": "http://zs.ucjoy.com/serverlist.app",        "title": "《诸神世界》官方网站—3D魔幻战争网游",        "cache": "content",        "segment": "20141211104057",        "boost": 0,        "digest": "30a836aae5886924d1a87d3ab1ad42c8",        "tstamp": "2014-12-11T02:41:13.476Z",        "url": "http://zs.ucjoy.com/serverlist.app",        "anchor": [          "进入新服",          "开始游戏"        ],        "_version_": 1487159634643910700      }    ]  }}


2. Screenshot of the results as displayed in Solr




bin/crawl urls  crawl  http://xx.xx.xx.xx:8983/solr  5
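
As far as I know, the Nutch 1.9 `bin/crawl` script takes its arguments in the order seed directory, crawl directory, Solr URL, number of rounds, which matches the `urlDir: urls` and `crawlDb: crawl/crawldb` entries in the log. The sketch below only assembles that same command line; the helper names `build_crawl_command` and `run_crawl` and the `nutch_home` parameter are hypothetical.

```python
import subprocess


def build_crawl_command(seed_dir, crawl_dir, solr_url, rounds):
    """Assemble the argument list for bin/crawl (assumed order: seeds, crawl dir, Solr URL, rounds)."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    return ["bin/crawl", seed_dir, crawl_dir, solr_url, str(rounds)]


def run_crawl(nutch_home, seed_dir, crawl_dir, solr_url, rounds):
    """Run the crawl from the Nutch runtime directory (nutch_home is an assumed path)."""
    cmd = build_crawl_command(seed_dir, crawl_dir, solr_url, rounds)
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    return subprocess.run(cmd, cwd=nutch_home, check=True)
```

With the values from the post, `build_crawl_command("urls", "crawl", "http://xx.xx.xx.xx:8983/solr", 5)` reproduces the command above: seeds are read from `urls`, crawl data is written under `crawl`, and each of the 5 rounds is indexed into the given Solr instance.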


3. Nutch log output during the crawl:
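
A log like the one in this section is easier to read once it is summarized per job. The sketch below is an illustration only: it assumes the log file holds one entry per line in the `timestamp LEVEL logger - message` layout seen here, and the names `ENTRY`, `FINISHED` and `summarize_log` are hypothetical.

```python
import re
from collections import Counter

# One log entry: "2014-12-11 10:23:05,612 INFO  crawl.Injector - message"
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) - (?P<msg>.*)$"
)
# Completion messages such as "Injector: finished at ..., elapsed: 00:00:02"
FINISHED = re.compile(
    r"^(?P<job>.+?):? finished at .*, elapsed: (?P<h>\d+):(?P<m>\d+):(?P<s>\d+)$"
)


def summarize_log(lines):
    """Count entries per level and collect (job, elapsed seconds) for finished jobs."""
    levels = Counter()
    elapsed = []
    for line in lines:
        entry = ENTRY.match(line.strip())
        if entry is None:
            continue  # continuation lines, e.g. the IndexWriter description
        levels[entry["level"]] += 1
        done = FINISHED.match(entry["msg"])
        if done:
            seconds = int(done["h"]) * 3600 + int(done["m"]) * 60 + int(done["s"])
            elapsed.append((done["job"], seconds))
    return levels, elapsed
```

Run over a crawl log, this gives the number of `INFO` and `WARN` entries and the elapsed time reported by each step (Injector, Generator, Fetcher, ParseSegment, CrawlDb update, LinkDb, Deduplication, Indexer, CleaningJob).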

<pre name="code" class="plain">2014-12-11 10:23:02,927 INFO  crawl.Injector - Injector: starting at 2014-12-11 10:23:02
2014-12-11 10:23:02,928 INFO  crawl.Injector - Injector: crawlDb: crawl/crawldb
2014-12-11 10:23:02,928 INFO  crawl.Injector - Injector: urlDir: urls
2014-12-11 10:23:02,929 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2014-12-11 10:23:03,210 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:03,266 WARN  snappy.LoadSnappy - Snappy native library not loaded
2014-12-11 10:23:03,748 INFO  regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2014-12-11 10:23:04,496 INFO  crawl.Injector - Injector: Total number of urls rejected by filters: 0
2014-12-11 10:23:04,496 INFO  crawl.Injector - Injector: Total number of urls after normalization: 1
2014-12-11 10:23:04,496 INFO  crawl.Injector - Injector: Merging injected urls into crawl db.
2014-12-11 10:23:04,779 INFO  crawl.Injector - Injector: overwrite: false
2014-12-11 10:23:04,779 INFO  crawl.Injector - Injector: update: false
2014-12-11 10:23:05,606 INFO  crawl.Injector - Injector: URLs merged: 1
2014-12-11 10:23:05,611 INFO  crawl.Injector - Injector: Total new urls injected: 0
2014-12-11 10:23:05,612 INFO  crawl.Injector - Injector: finished at 2014-12-11 10:23:05, elapsed: 00:00:02
2014-12-11 10:23:06,551 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:06,552 INFO  crawl.Generator - Generator: starting at 2014-12-11 10:23:06
2014-12-11 10:23:06,552 INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:23:06,552 INFO  crawl.Generator - Generator: filtering: false
2014-12-11 10:23:06,552 INFO  crawl.Generator - Generator: normalizing: true
2014-12-11 10:23:06,552 INFO  crawl.Generator - Generator: topN: 50000
2014-12-11 10:23:07,201 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:23:07,202 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:23:07,202 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:23:07,211 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:23:07,267 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:23:07,267 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:23:07,267 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:23:07,272 INFO  regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:23:07,875 INFO  crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:23:08,875 INFO  crawl.Generator - Generator: segment: crawl/segments/20141211102308
2014-12-11 10:23:09,051 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:23:09,993 INFO  crawl.Generator - Generator: finished at 2014-12-11 10:23:09, elapsed: 00:00:03
2014-12-11 10:23:10,681 INFO  fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:23:10
2014-12-11 10:23:10,681 INFO  fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211102308
2014-12-11 10:23:10,681 INFO  fetcher.Fetcher - Fetcher Timelimit set for : 1418275390681
2014-12-11 10:23:10,956 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:23:11,415 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,415 INFO  fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:23:11,415 INFO  fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:23:11,435 INFO  fetcher.Fetcher - QueueFeeder finished: total 18 records + hit by time limit :0
2014-12-11 10:23:11,585 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,586 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:23:11,586 INFO  fetcher.Fetcher - fetching http://v.zqgame.com/moviePlay/goMoviePlay/5/001 (queue crawl delay=5000ms)
2014-12-11 10:23:11,587 INFO  fetcher.Fetcher - Using queue mode : byHost

2014-12-11 10:23:11,597 INFO  http.Http - http.proxy.host = null
2014-12-11 10:23:11,597 INFO  http.Http - http.proxy.port = 8080
2014-12-11 10:23:11,597 INFO  http.Http - http.timeout = 10000
2014-12-11 10:23:11,597 INFO  http.Http - http.content.limit = 65536
2014-12-11 10:23:11,597 INFO  http.Http - http.agent = My Nutch Spider/Nutch-1.9
2014-12-11 10:23:11,597 INFO  http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2014-12-11 10:23:11,597 INFO  http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2014-12-11 10:23:11,597 INFO  fetcher.Fetcher - Using queue mode : byHost

2014-12-11 10:23:11,620 INFO  fetcher.Fetcher - Fetcher: throughput threshold: -1
2014-12-11 10:23:11,620 INFO  fetcher.Fetcher - Fetcher: throughput threshold retries: 5
2014-12-11 10:23:11,620 INFO  fetcher.Fetcher - fetcher.maxNum.threads can't be < than 50 : using 50 instead
2014-12-11 10:23:12,622 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:13,622 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:14,623 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:15,623 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:16,624 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=17, fetchQueues.getQueueCount=1
2014-12-11 10:23:16,891 INFO  fetcher.Fetcher - fetching http://v.zqgame.com/moviePlay/goMoviePlay/3/3 (queue crawl delay=5000ms)
2014-12-11 10:23:17,624 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:18,625 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:19,625 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:20,626 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:21,626 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=16, fetchQueues.getQueueCount=1
2014-12-11 10:23:21,935 INFO  fetcher.Fetcher - fetching http://v.zqgame.com/view/index (queue crawl delay=5000ms)
2014-12-11 10:23:22,627 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:23,627 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:24,627 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1
2014-12-11 10:23:25,628 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=15, fetchQueues.getQueueCount=1

2014-12-11 10:27:15,997 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:27:15,997 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
2014-12-11 10:27:16,004 INFO  fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
2014-12-11 10:27:16,005 INFO  fetcher.Fetcher - -activeThreads=0
2014-12-11 10:27:16,629 INFO  fetcher.Fetcher - Fetcher: finished at 2014-12-11 10:27:16, elapsed: 00:00:07
2014-12-11 10:27:17,320 INFO  parse.ParseSegment - ParseSegment: starting at 2014-12-11 10:27:17
2014-12-11 10:27:17,320 INFO  parse.ParseSegment - ParseSegment: segment: crawl/segments/20141211102707
2014-12-11 10:27:17,591 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:18,518 INFO  crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
2014-12-11 10:27:18,528 INFO  parse.ParseSegment - Parsed (12ms):http://v.zqgame.com/indexmain
2014-12-11 10:27:18,571 INFO  parse.ParseSegment - Parsed (1ms):http://v.zqgame.com/moviePlay/goMoviePlay/4/4
2014-12-11 10:27:18,659 INFO  regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
2014-12-11 10:27:18,871 INFO  parse.ParseSegment - ParseSegment: finished at 2014-12-11 10:27:18, elapsed: 00:00:01
2014-12-11 10:27:19,794 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:19,810 INFO  crawl.CrawlDb - CrawlDb update: starting at 2014-12-11 10:27:19
2014-12-11 10:27:19,810 INFO  crawl.CrawlDb - CrawlDb update: db: crawl/crawldb
2014-12-11 10:27:19,811 INFO  crawl.CrawlDb - CrawlDb update: segments: [crawl/segments/20141211102707]
2014-12-11 10:27:19,811 INFO  crawl.CrawlDb - CrawlDb update: additions allowed: true
2014-12-11 10:27:19,811 INFO  crawl.CrawlDb - CrawlDb update: URL normalizing: false
2014-12-11 10:27:19,811 INFO  crawl.CrawlDb - CrawlDb update: URL filtering: false
2014-12-11 10:27:19,811 INFO  crawl.CrawlDb - CrawlDb update: 404 purging: false
2014-12-11 10:27:19,812 INFO  crawl.CrawlDb - CrawlDb update: Merging segment data into db.
2014-12-11 10:27:20,639 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:27:20,639 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:27:20,639 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:27:21,120 INFO  crawl.CrawlDb - CrawlDb update: finished at 2014-12-11 10:27:21, elapsed: 00:00:01
2014-12-11 10:27:22,066 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:22,067 INFO  crawl.LinkDb - LinkDb: starting at 2014-12-11 10:27:22
2014-12-11 10:27:22,067 INFO  crawl.LinkDb - LinkDb: linkdb: crawl/linkdb
2014-12-11 10:27:22,067 INFO  crawl.LinkDb - LinkDb: URL normalize: true
2014-12-11 10:27:22,067 INFO  crawl.LinkDb - LinkDb: URL filter: true
2014-12-11 10:27:22,068 INFO  crawl.LinkDb - LinkDb: internal links will be ignored.
2014-12-11 10:27:22,068 INFO  crawl.LinkDb - LinkDb: adding segment: crawl/segments/20141211102707
2014-12-11 10:27:23,376 INFO  crawl.LinkDb - LinkDb: merging with existing linkdb: crawl/linkdb
2014-12-11 10:27:23,688 INFO  regex.RegexURLNormalizer - can't find rules for scope 'linkdb', using default
2014-12-11 10:27:24,510 INFO  crawl.LinkDb - LinkDb: finished at 2014-12-11 10:27:24, elapsed: 00:00:02
2014-12-11 10:27:25,209 INFO  crawl.DeduplicationJob - DeduplicationJob: starting at 2014-12-11 10:27:25
2014-12-11 10:27:25,483 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:26,760 INFO  crawl.DeduplicationJob - Deduplication: 2 documents marked as duplicates
2014-12-11 10:27:26,760 INFO  crawl.DeduplicationJob - Deduplication: Updating status of duplicate urls into crawl db.
2014-12-11 10:27:27,931 INFO  crawl.DeduplicationJob - Deduplication finished at 2014-12-11 10:27:27, elapsed: 00:00:02
2014-12-11 10:27:28,623 INFO  indexer.IndexingJob - Indexer: starting at 2014-12-11 10:27:28
2014-12-11 10:27:28,711 INFO  indexer.IndexingJob - Indexer: deleting gone documents: false
2014-12-11 10:27:28,711 INFO  indexer.IndexingJob - Indexer: URL filtering: false
2014-12-11 10:27:28,718 INFO  indexer.IndexingJob - Indexer: URL normalizing: false
2014-12-11 10:27:28,933 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:27:28,933 INFO  indexer.IndexingJob - Active IndexWriters :
SOLRIndexWriter
    solr.server.url : URL of the SOLR instance (mandatory)
    solr.commit.size : buffer size when sending to SOLR (default 1000)
    solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
    solr.auth : use authentication (default false)
    solr.auth.username : use authentication (default false)
    solr.auth : username for authentication
    solr.auth.password : password for authentication


2014-12-11 10:27:28,937 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2014-12-11 10:27:28,937 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2014-12-11 10:27:28,937 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20141211102707
2014-12-11 10:27:29,087 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:27:29,585 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-12-11 10:27:29,995 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:27:30,022 INFO  solr.SolrMappingReader - source: content dest: content
2014-12-11 10:27:30,023 INFO  solr.SolrMappingReader - source: title dest: title
2014-12-11 10:27:30,023 INFO  solr.SolrMappingReader - source: host dest: host
2014-12-11 10:27:30,023 INFO  solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:27:30,023 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:27:30,023 INFO  solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:27:30,023 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:27:30,054 INFO  solr.SolrIndexWriter - Indexing 2 documents
2014-12-11 10:27:30,175 INFO  solr.SolrIndexWriter - Indexing 2 documents

2014-12-11 10:39:34,707 INFO  crawl.Injector - Injector: starting at 2014-12-11 10:39:34
2014-12-11 10:39:34,707 INFO  crawl.Injector - Injector: crawlDb: crawl/crawldb
2014-12-11 10:39:34,707 INFO  crawl.Injector - Injector: urlDir: urls
2014-12-11 10:39:34,708 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2014-12-11 10:39:34,989 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:35,046 WARN  snappy.LoadSnappy - Snappy native library not loaded
2014-12-11 10:39:35,528 INFO  regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2014-12-11 10:39:36,273 INFO  crawl.Injector - Injector: Total number of urls rejected by filters: 0
2014-12-11 10:39:36,273 INFO  crawl.Injector - Injector: Total number of urls after normalization: 1
2014-12-11 10:39:36,273 INFO  crawl.Injector - Injector: Merging injected urls into crawl db.
2014-12-11 10:39:36,577 INFO  crawl.Injector - Injector: overwrite: false
2014-12-11 10:39:36,577 INFO  crawl.Injector - Injector: update: false
2014-12-11 10:39:37,387 INFO  crawl.Injector - Injector: URLs merged: 1
2014-12-11 10:39:37,392 INFO  crawl.Injector - Injector: Total new urls injected: 0
2014-12-11 10:39:37,392 INFO  crawl.Injector - Injector: finished at 2014-12-11 10:39:37, elapsed: 00:00:02
2014-12-11 10:39:38,327 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:38,328 INFO  crawl.Generator - Generator: starting at 2014-12-11 10:39:38
2014-12-11 10:39:38,328 INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:39:38,328 INFO  crawl.Generator - Generator: filtering: false
2014-12-11 10:39:38,328 INFO  crawl.Generator - Generator: normalizing: true
2014-12-11 10:39:38,328 INFO  crawl.Generator - Generator: topN: 50000
2014-12-11 10:39:38,978 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:39:38,978 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:39:38,978 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:39:38,987 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:39:39,040 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:39:39,040 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:39:39,040 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:39:39,045 INFO  regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:39:39,649 INFO  crawl.Generator - Generator: Partitioning selected urls for politeness.
2014-12-11 10:39:40,649 INFO  crawl.Generator - Generator: segment: crawl/segments/20141211103940
2014-12-11 10:39:40,814 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:39:41,755 INFO  crawl.Generator - Generator: finished at 2014-12-11 10:39:41, elapsed: 00:00:03
2014-12-11 10:39:42,447 INFO  fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:39:42
2014-12-11 10:39:42,447 INFO  fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211103940
2014-12-11 10:39:42,447 INFO  fetcher.Fetcher - Fetcher Timelimit set for : 1418276382447
2014-12-11 10:39:42,720 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:43,171 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,171 INFO  fetcher.Fetcher - Fetcher: threads: 50
2014-12-11 10:39:43,171 INFO  fetcher.Fetcher - Fetcher: time-out divisor: 2
2014-12-11 10:39:43,182 INFO  fetcher.Fetcher - QueueFeeder finished: total 1 records + hit by time limit :0
2014-12-11 10:39:43,336 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,337 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,337 INFO  fetcher.Fetcher - fetching http://passport.zqgame.com/common/agreement.jsp (queue crawl delay=5000ms)
2014-12-11 10:39:43,337 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,337 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,337 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,338 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,338 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,338 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,338 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,338 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,338 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,339 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,339 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,339 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,340 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,340 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,340 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,340 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available
2014-12-11 10:39:43,340 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2014-12-11 10:39:43,340 INFO  fetcher.Fetcher - Using queue mode : byHost
2014-12-11 10:39:43,341 INFO  fetcher.Fetcher - Thread FetcherThread has no more work available

2014-12-11 10:39:57,352 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2014-12-11 10:39:57,352 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2014-12-11 10:39:57,353 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20141211103940
2014-12-11 10:39:57,501 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:39:57,970 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-12-11 10:39:58,376 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: content dest: content
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: title dest: title
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: host dest: host
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:39:58,403 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:39:58,434 INFO  solr.SolrIndexWriter - Indexing 1 documents
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: content dest: content
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: title dest: title
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: host dest: host
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:40:00,130 INFO  indexer.IndexingJob - Indexer: finished at 2014-12-11 10:40:00, elapsed: 00:00:03
2014-12-11 10:40:00,830 INFO  indexer.CleaningJob - CleaningJob: starting at 2014-12-11 10:40:00
2014-12-11 10:40:01,101 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:01,748 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: content dest: content
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: title dest: title
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: host dest: host
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: segment dest: segment
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: boost dest: boost
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: digest dest: digest
2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2014-12-11 10:40:01,963 INFO  indexer.CleaningJob - CleaningJob: deleted a total of 10 documents
2014-12-11 10:40:01,967 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2014-12-11 10:40:02,382 INFO  indexer.CleaningJob - CleaningJob: finished at 2014-12-11 10:40:02, elapsed: 00:00:01
2014-12-11 10:40:03,313 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-12-11 10:40:03,314 INFO  crawl.Generator - Generator: starting at 2014-12-11 10:40:03
2014-12-11 10:40:03,314 INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2014-12-11 10:40:03,314 INFO  crawl.Generator - Generator: filtering: false
2014-12-11 10:40:03,314 INFO  crawl.Generator - Generator: normalizing: true
2014-12-11 10:40:03,315 INFO  crawl.Generator - Generator: topN: 50000
2014-12-11 10:40:03,963 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:40:03,964 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:40:03,964 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:40:03,972 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
2014-12-11 10:40:04,062 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-12-11 10:40:04,062 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-12-11 10:40:04,062 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
2014-12-11 10:40:04,067 INFO  regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
2014-12-11 10:40:04,635 INFO  crawl.Generator - Generator: Partitioning selected urls for politeness.
3560 2014-12-11 10:40:05,636 INFO  crawl.Generator - Generator: segment: crawl/segments/20141211104005 3561 2014-12-11 10:40:05,803 INFO  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default 3562 2014-12-11 10:40:06,747 INFO  crawl.Generator - Generator: finished at 2014-12-11 10:40:06, elapsed: 00:00:03 3563 2014-12-11 10:40:07,435 INFO  fetcher.Fetcher - Fetcher: starting at 2014-12-11 10:40:07 3564 2014-12-11 10:40:07,435 INFO  fetcher.Fetcher - Fetcher: segment: crawl/segments/20141211104005 3565 2014-12-11 10:40:07,435 INFO  fetcher.Fetcher - Fetcher Timelimit set for : 1418276407435 3566 2014-12-11 10:40:07,707 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 3567 2014-12-11 10:40:08,157 INFO  fetcher.Fetcher - Using queue mode : byHost 3568 2014-12-11 10:40:08,158 INFO  fetcher.Fetcher - Fetcher: threads: 50 3569 2014-12-11 10:40:08,158 INFO  fetcher.Fetcher - Fetcher: time-out divisor: 2 3570 2014-12-11 10:40:08,187 INFO  fetcher.Fetcher - QueueFeeder finished: total 40 records + hit by time limit :0 3571 2014-12-11 10:40:08,326 INFO  fetcher.Fetcher - Using queue mode : byHost 3572 2014-12-11 10:40:08,327 INFO  fetcher.Fetcher - Using queue mode : byHost 3573 2014-12-11 10:40:08,327 INFO  fetcher.Fetcher - fetching http://hxjh.zqgame.com/ (queue crawl delay=5000ms) 3574 2014-12-11 10:40:08,328 INFO  fetcher.Fetcher - fetching http://lt.zqgame.com/ (queue crawl delay=5000ms) 3575 2014-12-11 10:40:08,328 INFO  fetcher.Fetcher - Using queue mode : byHost 3576 2014-12-11 10:40:08,328 INFO  fetcher.Fetcher - fetching http://zscq.zqgame.com/ (queue crawl delay=5000ms) 3577 2014-12-11 10:40:08,328 INFO  fetcher.Fetcher - Using queue mode : byHost 3578 2014-12-11 10:40:08,329 INFO  fetcher.Fetcher - fetching http://lj2.zqgame.com/ (queue crawl delay=5000ms) 3523 2014-12-11 10:39:58,434 INFO  solr.SolrIndexWriter - Indexing 1 documents 3524 2014-12-11 
10:39:59,776 INFO  solr.SolrMappingReader - source: content dest: content 3525 2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: title dest: title 3526 2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: host dest: host 3527 2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: segment dest: segment 3528 2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: boost dest: boost 3529 2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: digest dest: digest 3530 2014-12-11 10:39:59,776 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp 3531 2014-12-11 10:40:00,130 INFO  indexer.IndexingJob - Indexer: finished at 2014-12-11 10:40:00, elapsed: 00:00:03 3532 2014-12-11 10:40:00,830 INFO  indexer.CleaningJob - CleaningJob: starting at 2014-12-11 10:40:00 3533 2014-12-11 10:40:01,101 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 3534 2014-12-11 10:40:01,748 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter 3535 2014-12-11 10:40:01,775 INFO  solr.SolrMappingReader - source: content dest: content14550 2014-12-11 10:59:29,551 INFO  fetcher.Fetcher - fetching http://pay.zqgame.com/pay/toPayPage/dxpc/107 (queue crawl delay=5000ms)14551 2014-12-11 10:59:29,703 INFO  fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=49, fetchQueues.getQueueCount=1
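A log this long is easier to review once it is summarized. The following is only a minimal Python sketch, not part of Nutch: it assumes each entry has the shape "timestamp LEVEL component - message", which is inferred from the output above rather than from the log4j configuration. The names `parse_entries`, `summarize`, and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Assumed entry layout, inferred from the log above:
#   "YYYY-MM-DD HH:MM:SS,mmm LEVEL  component - message"
ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>INFO|WARN|ERROR|DEBUG)\s+"
    r"(?P<component>[\w.]+) - (?P<message>.*)"
)


def parse_entries(text):
    """Yield (timestamp, level, component, message) for each matching line."""
    for line in text.splitlines():
        match = ENTRY.search(line)
        if match:
            yield (
                match.group("ts"),
                match.group("level"),
                match.group("component"),
                match.group("message").strip(),
            )


def summarize(text):
    """Count entries per log level and per component."""
    levels = Counter()
    components = Counter()
    for _, level, component, _ in parse_entries(text):
        levels[level] += 1
        components[component] += 1
    return levels, components


# Three entries copied from the log above.
SAMPLE = """\
2014-12-11 10:40:01,963 INFO  indexer.CleaningJob - CleaningJob: deleted a total of 10 documents
2014-12-11 10:40:01,967 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2014-12-11 10:40:08,327 INFO  fetcher.Fetcher - fetching http://hxjh.zqgame.com/ (queue crawl delay=5000ms)
"""

if __name__ == "__main__":
    levels, components = summarize(SAMPLE)
    print(dict(levels))
    print(dict(components))
```

Filtering the parsed entries on `level == "WARN"` would surface the repeated NativeCodeLoader and FileOutputCommitter warnings without scrolling through the fetcher output.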

