Hadoop, Pig, Hive, Storm, and NoSQL learning resources [Updating]

Source: Internet  Editor: 程序博客网  Time: 2024/06/14 02:11

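Contents

  • (1) Hadoop installation and deployment
  • (2) Hive
  • (3) Pig
  • (4) Hadoop internals and programming
  • (5) Data warehousing and data mining
  • (6) Oozie workflows
  • (7) HBase
  • (8) Flume
  • (9) Sqoop
  • (10) ZooKeeper
  • (11) NoSQL
  • (12) Hadoop monitoring and management
  • (13) Storm
  • (14) YARN & Hadoop 2.0
  • (15) Hadoop data platform architecture
  • Appendix: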
    (1) Hadoop installation and deployment

    1、Deploying Hadoop on Windows with Cygwin:

     http://lib.open-open.com/view/1333428291655

    http://blog.csdn.net/ruby97/article/details/7423088

    http://blog.csdn.net/savechina/article/details/5656937

    2、Hadoop pseudo-distributed installation:

    http://www.thegeekstuff.com/2012/02/hadoop-pseudo-distributed-installation/

    3、Hadoop fully distributed installation tutorial:

    http://hi.baidu.com/leejun_2005/item/367da95bd69f4e0ce6c4a581

    4、Hands-on: remotely debugging Hadoop on Linux from Eclipse on Windows 7

    http://my.oschina.net/leejun2005/blog/122775

    5、A fifteen-minute tutorial: installing Hadoop and Hive on a single server

    http://rdc.taobao.com/team/top/tag/hadoop-hive-%E5%8D%81%E5%88%86%E9%92%9F%E6%95%99%E7%A8%8B/

    ssh-keygen -t dsa -f ~/.ssh/id_dsa

    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

    http://blogread.cn/it/article/6103?f=wb

    Note:

    On CentOS the steps above are not enough; the following steps are also required:

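    sudo vi /etc/ssh/sshd_config

    RSAAuthentication yes
    PubkeyAuthentication yes
    AuthorizedKeysFile     .ssh/authorized_keys

    service sshd restart

    Note: ssh can support both publickey and password authentication at the same time; publickey authentication must be enabled, so set the options above to yes explicitly.
    If the client has no .ssh/id_rsa, password authentication is used; if the file exists, publickey authentication is used.
    If publickey authentication fails, ssh still falls back to password authentication. Do not set PasswordAuthentication no: it disables password login, so only clients whose public key is already authorized (here, just the local machine) could log in!

    At this point the login still fails with:
    Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

    Then:
    vi /etc/selinux/config
    SELINUX=disabled

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys

    Finally, reboot Linux and run ssh localhost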
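The permission and login-check steps above can be collected into a small shell sketch. The function names fix_ssh_perms and check_key_login are made up for illustration, and the BatchMode and ConnectTimeout options are an added assumption so that the check fails quickly instead of prompting for a password. The sketch does not touch sshd_config or the SELinux setting.

```shell
#!/bin/sh
# Sketch only: the helper names are hypothetical, not from the original post.

# Create ~/.ssh (or the directory given as $1) with the permissions used
# in the steps above: 700 on the directory, 600 on authorized_keys.
fix_ssh_perms() {
    dir="${1:-$HOME/.ssh}"
    mkdir -p "$dir" || return 1
    touch "$dir/authorized_keys" || return 1
    chmod 700 "$dir" && chmod 600 "$dir/authorized_keys"
}

# Return 0 only if key-based login to the given host (default: localhost)
# works. BatchMode=yes disables password prompts, so a broken key setup
# fails immediately instead of falling back to a password.
check_key_login() {
    host="${1:-localhost}"
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}
```

A possible way to use it after sourcing the file: fix_ssh_perms && check_key_login && echo "passwordless ssh works".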
    References:

    http://www.linuxidc.com/Linux/2012-11/74603.htm

    http://www.360doc.com/content/12/0324/12/9324714_197225609.shtml

    https://www.centos.org/modules/newbb/viewtopic.php?topic_id=33048

    http://flysnowxf.iteye.com/blog/1567570

    8、Summary of setting up a Hadoop cluster

    http://www.cnblogs.com/beanmoon/archive/2012/11/12/2767010.html

    9、Hadoop For Windows

    http://dongxicheng.org/mapreduce/hadoop-for-windows/

    10、Build and Install Hadoop 2.2 or newer on Windows

    http://wiki.apache.org/hadoop/Hadoop2OnWindows

    11、Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS

    http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os

    12、Upgrading from CDH4 to CDH5

    http://segmentfault.com/blog/javachen/1190000002532302



    (2) Hive

    1、Log statistics with Hive in practice:

    http://www.csdn.net/article/2010-11-28/282620

    2、Hive example: the ten most common CSDN passwords

    http://blog.sina.com.cn/s/blog_62186b4601013u5z.html

    http://superlxw1234.iteye.com/blog/1528688 (installation steps)

    (Configuring the Hive Metastore)

    http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html  

    3、Official Hive tutorial:

    https://cwiki.apache.org/confluence/display/Hive/GettingStarted

    4、Notes on Hive (4): Hive QL

    http://www.alidata.org/archives/581   # JOIN

    http://wenku.baidu.com/view/242260c489eb172ded63b709.html

    5、Five tips for writing good Hive programs

    http://www.alidata.org/archives/622  # sorting

    6、Introduction to Hive, the Hadoop data warehouse tool (Baidu)

    http://wenku.baidu.com/view/90dad7659b6648d7c1c7460e.html

    7、Hive talk (Taobao)

    http://wenku.baidu.com/view/4e4a801ca76e58fafab003b1.html

    8、Introduction to Hive (Meilishuo)

    http://wenku.baidu.com/view/0f252121a5e9856a56126025.html

    9、Hive study notes (Alibaba)

    http://wenku.baidu.com/view/233308340b4c2e3f5727632a.html

    10、Hive: a petabyte-scale data warehouse using Hadoop (paper)

    http://wenku.baidu.com/view/b5aebfe9998fcc22bcd10d8a.html

    11、Hive: SQL for Hadoop(An Essential Tool for Hadoop-based Data Warehouses)

    http://polyglotprogramming.com/papers/Hive-SQLforHadoop.pdf

    12、Programming Hive

    http://www.itpub.net/thread-1724707-1-1.html

    13、Notes on Hive (6): Hive's extension features:

    File Format, SerDe, Map/Reduce scripts (Transform), UDF, UDAF

    http://www.alidata.org/archives/604

    14、Summary of data skew in Hive

    http://www.alidata.org/archives/2109

    15、Querying complex JSON data with Hive

    http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/

    https://github.com/rcongiu/Hive-JSON-Serde

    16、Hive SQL optimization notes compiled by colleagues

    http://hbase.iteye.com/blog/1488745

    http://superlxw1234.iteye.com/blog/1564456

    http://slaytanic.blog.51cto.com/2057708/1295222 

    17、Querying the Hive data warehouse from Python through the Thrift interface

    http://slaytanic.blog.51cto.com/2057708/734106

    18、Querying the Hive data warehouse from PHP through the Thrift interface (with an introduction to phpHiveAdmin)

    http://slaytanic.blog.51cto.com/2057708/766230

    http://slaytanic.blog.51cto.com/2057708/818721

    http://slaytanic.blog.51cto.com/2057708/1071263

    https://cwiki.apache.org/Hive/hiveclient.html

    http://csgrad.blogspot.com/2010/04/to-use-language-other-than-java-say.html

    19、Notes on Hive SQL usage and data loading

    http://slaytanic.blog.51cto.com/2057708/782175

    20、Hive optimization: controlling the number of map and reduce tasks in Hive jobs

    http://superlxw1234.iteye.com/blog/1582880

    21、Some practical Hive tips

    http://superlxw1234.iteye.com/blog/1565774

    22、Data warehouse data models: extreme storage with history zipper tables

    http://superlxw1234.iteye.com/blog/1567320

    23、Reading notes on Programming Hive

    http://www.gemini5201314.net/hadoop/programing-hive%E8%AF%BB%E4%B9%A6%E7%AC%94%E8%AE%B0.html

    24、Overview of data development technology (Etao data department)

    http://blog.linezing.com/wp-content/uploads/2012/12/%E6%95%B0%E6%8D%AE%E5%BC%80%E5%8F%91%E6%8A%80%E6%9C%AF-%E5%86%B7%E5%B7%9D.pdf

    25、Hive r0.9.0 documentation in Chinese (2): joins

    http://myeyeofjava.iteye.com/blog/1703815

    26、An internal massive-data service platform built on Hadoop (Taobao)

    http://www.infoq.com/cn/presentations/hadoop-internal-data-service-platform

    27、Hive configuration parameters explained

    http://blog.csdn.net/chaoping315/article/details/8500407

    http://www.blogjava.net/changedi/archive/2013/08/13/402741.html

    http://www.blogjava.net/changedi/archive/2013/08/15/402857.html

    28、Hive tuning (Hortonworks)

    http://www.slideshare.net/adammuise/2013-jul-23thughivetuningdeepdive

    29、Hive basics: partitions, buckets, and Sort Merge Bucket Join

    http://my.oschina.net/leejun2005/blog/178631

    30、A closer study of Programming Hive: Tuning

    http://flyingdutchman.iteye.com/blog/1871983

    31、Using SemanticAnalyzerHook to filter out Hive queries that lack a partition condition

    http://blog.csdn.net/lalaguozhe/article/details/11988047


    (3) Pig

    1、Pig in practice

    http://www.cnblogs.com/xuqiang/archive/2011/06/06/2073601.html

    2、Official Pig tutorial

    http://pig.apache.org/

    3、Collection of Apache Pig tutorials in Chinese

    http://www.codelast.com/?p=4550

    4、Programming Pig

    http://ofps.oreilly.com/titles/9781449302641/index.html

    http://www.google.com.hk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCcQFjAA&url=http%3A%2F%2Fbigdata.googlecode.com%2Ffiles%2FOreilly.Programming.Pig.Sep.2011.pdf&ei=DLGDUNbcI4aTiQfus4HADQ&usg=AFQjCNGzTHIYcc2GuU6ko0TgIKm3UN9T5Q&sig2=2DZtn3yP4KVqro7xt_qAOA

    5、PigFly: design of a unified data analysis platform on Hadoop (Taobao)

    http://www.docin.com/p-344188827.html

    http://coderplay.iteye.com/blog/1233865

    6、Processing a million songs with Apache Pig (Cloudera)

    http://blog.cloudera.com/blog/2012/08/process-a-million-songs-with-apache-pig/

    7、Pig Latin: A Not-So-Foreign Language for Data Processing (Stanford University paper)

    http://infolab.stanford.edu/~usriv/papers/pig-latin.pdf

    8、Lecture 09: Parallel Databases, Big Data, Map/Reduce, Pig-Latin

    http://www.cs.washington.edu/education/courses/csep544/11au/lectures/lecture09-parallel-db.pdf

    9、Pig Queries Parsing JSON on Amazons Elastic Map Reduce Using S3 Data

    http://eric.lubow.org/2011/hadoop/pig-queries-parsing-json-on-amazons-elastic-map-reduce-using-s3-data/

    https://github.com/a-b/elephant-bird/tree/master/javadoc

    10、Pig cookbook: performance tuning

    http://pig.apache.org/docs/r0.7.0/cookbook.html

    http://pig.apache.org/docs/r0.10.0/perf.html#Replicated-Joins

    11、Using Pig STREAM:

    http://wiki.apache.org/pig/PigStreamingFunctionalSpec

    http://www.slideshare.net/charmalloc/hadoop-streaming-tutorial-with-python

    12、Analyzing Big Data with Twitter

    UC Berkeley Course Lectures: Analyzing Big Data With Twitter

    http://blogs.ischool.berkeley.edu/i290-abdt-s12/   watch online (bring your own VPN)

    http://www.kuaipan.cn/file/id_102542674904481817.htm  download from Kingsoft Kuaipan

    13、Apache Pig performance optimization

    http://hbtc2012.hadooper.cn/subject/track1daijianyong3.pdf

    http://www.cnblogs.com/kemaswill/p/3226754.html

    14、Advanced Pig syntax on Hadoop

    http://www.cnblogs.com/siwei1988/archive/2012/08/06/2624912.html

    15、Embedding Pig In Java Programs

    http://wiki.apache.org/pig/EmbeddedPig

    16、The best questions from the Pig user mailing list

    http://hakunamapdata.com/football-zero-apache-pig-hero-the-essence-from-hundreds-of-posts-from-apache-pig-user-mailing-list/



    (4) Hadoop internals and programming

    1、A few small details when using Hadoop

    http://blog.csdn.net/needle2/article/details/6182515

    2、Understanding the MapReduce process and concepts in Hadoop (browse the blog's index for more)

    http://hi.baidu.com/shirdrn/item/085a5518be8bfa797b5f25aa

    4、IBM developerWorks: Distributed parallel programming with Hadoop, parts 1 to 3

    http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop1/

    http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop2/index.html

    https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop3/

    5、Introduction to Hadoop, an open-source framework for distributed computing

    http://www.infoq.com/cn/articles/hadoop-intro

    6、Hadoop basic workflow and application development (Java)

    http://www.infoq.com/cn/articles/hadoop-process-develop 

    7、Hadoop source code analysis

    http://caibinbupt.iteye.com/?page=2

    8、Analysis of Hadoop data flow and job submission

    http://www.cnblogs.com/spork/archive/2010/01/11/1644346.html

    9、Ten best practices for Hadoop administrators

    http://www.infoq.com/cn/articles/hadoop-ten-best-practice

    10、Hadoop and Hive source code analysis and usage notes

    http://www.oratea.net/?cat=7#

    11、Using and configuring the Hadoop Capacity Scheduler (as opposed to the default FIFO queue scheduler)

    http://www.cnblogs.com/ggjucheng/archive/2012/07/25/2608817.html

    12、A brief look at scheduling policies in Hadoop

    http://www.ibm.com/developerworks/cn/opensource/os-hadoop-scheduling/index.html

    http://dongxicheng.org/mapreduce/hadoop-schedulers/

    Hadoop 0.20.2 Fair Scheduler algorithm explained

    http://dongxicheng.org/mapreduce/hadoop-fair-scheduler/

    Hadoop Capacity Scheduler algorithm explained

    http://dongxicheng.org/mapreduce/hadoop-capacity-scheduler/

    Notes on configuring and using the Hadoop Capacity Scheduler

    http://www.cnblogs.com/panfeng412/archive/2013/03/22/hadoop-capacity-scheduler-configuration.html

    Hadoop mapred-queue-acls: multi-queue scheduling configuration

    http://yaoyinjie.blog.51cto.com/3189782/872294

    Introduction to resource-aware schedulers in Hadoop

    http://my.oschina.net/leejun2005/blog/96113

    13、Hadoop job tuning parameters and the principles behind them

    http://blog.sina.com.cn/s/blog_ae33b83901015cm9.html

    14、A fairly complete Hadoop source code analysis

    http://hbase.iteye.com/blog/1024737

    15、How to write MapReduce programs on Hadoop

    http://dongxicheng.org/mapreduce/writing-hadoop-programes/

    16、Hadoop study notes (2): data flow from map to reduce

    http://www.cnblogs.com/beanmoon/archive/2012/12/08/2805636.html

    17、Managing jobs through the Hadoop API

    http://blog.csdn.net/dajuezhao/article/details/6591058

    18、InputFormat demystified: a powerful tool for controlling MapReduce task execution

    http://www.infoq.com/cn/articles/HadoopInputFormat-map-reduce

    19、Best practices for Hadoop MapReduce development (part 1)

    http://www.infoq.com/cn/articles/MapReduce-Best-Practice-1

    20、Hadoop example: second-degree connections and friend recommendation

    http://my.oschina.net/u/176897/blog/99761

    21、Exploring big data analytics and Hadoop

    http://www.ibm.com/developerworks/cn/training/kp/os-kp-hadoop/index.html

    22、The problem of handling large numbers of small files in Hadoop, and solutions

    http://www.csdn.net/article/2010-11-22/282301?1290758216

    23、Introduction to next-generation Hadoop YARN: YARN's advantages over MRv1

    http://my.oschina.net/leejun2005/blog/97802

    24、HDFS fundamentals

    http://www.cnblogs.com/beanmoon/archive/2012/11/23/2783966.html

    http://www.cnblogs.com/beanmoon/archive/2012/12/11/2809315.html

    25、Storing and retrieving massive numbers of small files: Facebook's photo storage architecture

    http://www.importnew.com/3292.html

    26、Hadoop: the MapReduce process

    http://blog.sina.com.cn/s/blog_61ef49250100uul8.html

    27、MapReduce: the shuffle process in detail

    http://my.oschina.net/leejun2005/blog/73708     Shuffle process analysis and performance optimization

    http://474731198.iteye.com/blog/1635043

    http://my.oschina.net/leejun2005/blog/85974

    http://samuschen.iteye.com/blog/859975 shuffle and sort

    http://www.blogjava.net/shenh062326/archive/2011/01/14/342959.html   part of the execution flow

    http://wikidoop.com/wiki/Hadoop/MapReduce/Reducer     Hadoop/MapReduce/Reducer wiki

    28、Hadoop MapReduce job performance tuning: changing the number of map and reduce tasks

    http://irwenqiang.iteye.com/blog/1535809

    http://samuschen.iteye.com/blog/859971

    How many reduce tasks are appropriate when Hive runs a job

    http://jiedushi.blog.51cto.com/673653/602458

    29、Research on and optimization of the reliability of the Hadoop Distributed File System (HDFS) (master's thesis)

    http://www.docin.com/p-523453291.html

    30、Apache Avro compared with Thrift

    http://www.tbdata.org/archives/1307

    31、Hadoop Job Tuning

    http://www.searchtb.com/2010/12/hadoop-job-tuning.html

    32、Secondary sort in MapReduce (SecondarySort)

    http://www.cnblogs.com/xuxm2007/archive/2011/09/03/2165805.html

    33、Hadoop study summary: the MapReduce process explained

    http://blog.csdn.net/keda8997110/article/details/8474326

    34、Overview of Hadoop platform optimization (1)

    http://dongxicheng.org/mapreduce/hadoop-optimization-0/

          Overview of Hadoop platform optimization (2)

    http://dongxicheng.org/mapreduce/hadoop-optimization-1/

    35、Notes on upgrading Hadoop from 0.20.2 to 1.0.3

    http://blog.pureisle.net/archives/1845.html

    36、MapReduce: introduction to the user programming interface

    http://www.importnew.com/4259.html

    Hadoop beginner tutorial (4): submitting and monitoring MR jobs, controlling input and output, and using its features

    http://www.importnew.com/4736.html

    37、Quick Introduction To Apache Hadoop MapReduce Java API

    http://www.slideshare.net/AdamKawa/apache-hadoop-java-api

    38、Optimizing small and medium-sized Hadoop clusters

    http://blog.csdn.net/azhao_dn/article/details/6955671

    http://blog.csdn.net/cloudeagle_bupt/article/details/8983435

    39、Introduction to the key internal data structures of the NameNode

    http://blogread.cn/it/article/2746?f=wb

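    40、MapReduce/Hadoop in testing at Taobao

          Building a load-testing tool with MapReduce
    http://www.taobaotest.com/blogs/2515
          A brief analysis of HDFS performance load-testing tools
    http://www.taobaotest.com/blogs/2517
          Monitoring cloud computing with cloud storage
    http://www.taobaotest.com/blogs/2519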

    41、Enable Multiple threads in a mapper aka MultithreadedMapper

    http://kickstarthadoop.blogspot.in/2012/02/enable-multiple-threads-in-mapper-aka.html

    42、Hadoop study notes: the MapReduce framework in detail

    http://blog.jobbole.com/84089/



    (5) Data warehousing and data mining

    1、Data warehouse fundamentals training

    http://wenku.baidu.com/view/c788400cba1aa8114431d95b.html

    http://wenku.baidu.com/view/412b09e96294dd88d0d26bff.html

    Data warehouse layering conventions

    http://wenku.baidu.com/view/5809061da300a6c30c229f67.html

    2、Data warehouse ODS basics

    http://wenku.baidu.com/view/bb3e6263caaedd3383c4d3bf.html

    3、HBDW-PM: data warehouse fundamentals

    http://wenku.baidu.com/view/e25bd14769eae009581bec5d.html

    4、Mahout in Action

    http://net.pku.edu.cn/~course/cs402/2012/book/%5BMahout.in.Action(2011)%5D.Sean.Owen.pdf

    5、Data warehousing: a discussion of ETL

    http://superlxw1234.iteye.com/blog/1666960

    6、The difference between data analysis and data mining

    http://superlxw1234.iteye.com/blog/1708718


    (6) Oozie workflows

    1、Introduction to Oozie

    http://www.infoq.com/cn/articles/introductionOozie 

    2、Learning Oozie by example

    http://www.infoq.com/cn/articles/oozieexample

    3、Extending Oozie

    http://www.infoq.com/cn/articles/ExtendingOozie

    4、Oozie installation, configuration, and troubleshooting examples

    http://guoyunsky.iteye.com/category/187923

    5、Oozie summary

    http://dirlt.com/oozie.html

    6、A back-end data analysis workhorse for Singles' Day (11.11): introduction to and tips for the Apache Oozie workflow scheduling system

    http://www.abcn.net/2013/12/apache-oozie-tips.html

    7、Workflow scheduling systems for big data processing: introduction to Oozie and related products

    http://www.chinahadoop.cn/course/19/learn#lesson/47


    (7) HBase

    1、Official HBase guide and performance tuning

    http://hbase.apache.org/book/performance.html

    http://blog.linezing.com/2012/03/hbase-performance-optimization  Summary of HBase performance optimization methods

    http://database.51cto.com/art/201301/376723.htm   Four key points of HBase performance optimization

    http://kenwublog.com/hbase-performance-tuning    Tuning HBase performance parameters

    2、Introduction to HBase technology

    http://www.searchtb.com/2011/01/understanding-hbase.html

    3、HBase basics, part 2: examples of working with HBase from Java

    http://www.javabloger.com/article/apache-hbase-shell-and-java-api-html.html

    4、HBase basic concepts and common HBase shell commands

    http://www.cnblogs.com/flying5/archive/2011/09/15/2178064.html

    5、Introduction to HBase

    http://blog.csdn.net/leeqing2011/article/details/7608261

    6、Official HBase documentation (Chinese edition)

    http://www.yankay.com/wp-content/hbase/book.html  (0.90)

    http://abloz.com/hbase/book.html                            (0.95)

    8、HBase system architecture and data structures

    http://blog.csdn.net/a221133/article/details/6894717

    9、[Translation] HBase storage architecture

    http://www.spnguru.com/2010/07/%E7%BF%BB%E8%AF%91-hbase%E5%AD%98%E5%82%A8%E6%9E%B6%E6%9E%84/

    10、Overview of HBase storage file formats

    http://forchenyun.iteye.com/blog/828549

    11、Introduction to HBase, Hive and Pig (Kent State University)

    http://www.cs.kent.edu/~jin/Cloud12Spring/HbaseHivePig.pptx

    12、Example of calling HBase from Python

    http://hbase.iteye.com/blog/1178063

    13、Summary of HBase usage and optimization at Taobao

    http://walkoven.com/hbase%20optimization%20and%20apply%20summary%20in%20taobao.pdf

    14、HBase pseudo-distributed installation guide:

    http://my.oschina.net/leejun2005/blog/91952

    15、Bucket Cache: a solution for CMS, GC fragmentation, and large caches on HBase

    http://zjushch.iteye.com/blog/1751387   

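    Note: the author is from Alibaba; read performance reportedly improves by an order of magnitude, and the patch has been accepted by the HBase community.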

    16、Some HBase tips

    http://www.blogjava.net/changedi/archive/2012/12/28/393577.html

    http://www.blogjava.net/changedi/archive/2013/01/02/393697.html  application design tips

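    17、HBase problems summarized by the Alibaba test team:

    (1) Notes on analyzing HBase production problems http://www.taobaotest.com/blogs/2158

    (2) How many HBase bugs do you know? http://www.taobaotest.com/blogs/2156

    (3) A few easy mistakes to make when using HBase http://www.taobaotest.com/blogs/2312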

    18、Setting up multiple masters for HBase high availability

    http://www.importnew.com/3020.html

    19、HBase secondary indexes and joins

    http://rdc.taobao.com/team/jm/archives/951

    20、Summary of HBase secondary index approaches

    http://blog.sina.com.cn/s/blog_4a1f59bf01018apd.html

    21、HBase storage architecture (compiled)

    http://asyty.iteye.com/blog/1250301

    22、Introduction to the HBase framework (compiled)

    http://asyty.iteye.com/blog/1250273

    23、Advanced HBase column family configuration

    http://blog.sina.com.cn/s/blog_ae33b83901018euz.html

    24、HBase Administration, Performance Tuning

    http://www.packtpub.com/article/hbase-basic-performance-tuning

    25、HBase application design in practice at Alibaba

    http://club.alibabatech.org/resource_detail.htm?topicId=89

    26、HBase in business practice (Taobao)

    http://rdc.taobao.org/?p=457

    27、HBase Architecture (translation)

    http://duanple.blog.163.com/blog/static/70971767201191661620641/    part 1

    http://duanple.blog.163.com/blog/static/709717672011923111743139/  part 2

    http://duanple.blog.163.com/blog/static/709717672011925102028874/  part 3

    28、In-depth analysis of HBase performance

    http://www.programmer.com.cn/7246/

    29、HBase in 2013: introduction to new HBase features

    http://yanbohappy.sinaapp.com/?p=434

    30、How HBase writes data

    http://www.csdn.net/article/2014-01-27/2818283

    31、Using HBase coprocessors for aggregation on the Region Server side

    (1) Computing with an HBase EndPoint (coprocessor)  http://www.searchtb.com/2014/03/using-hbase-endpoint.html

    (2) Implementing aggregate functions in HBase with a Coprocessor  http://www.coderli.com/hbase-coprocessor-aggragateimplementation

    (3) Using HBase coprocessors  http://weikey.me/articles/155.html

    (4) Fun with HBase: Coprocessor Endpoint (2): making good use of coprocessorProxy and coprocessorExec

            http://blog.csdn.net/tntzbzc/article/details/8918463



    (8) Flume

    1、Log collection with Flume: principles and practice

    http://my.oschina.net/longniao/blog/93662

    Configuring Flume in truly distributed mode

    http://hi.baidu.com/izouying/item/6e7f87248df30a0b76272c24

    Flume: installation and configuration

    http://blog.chinaunix.net/uid-26711636-id-3155236.html

    http://log.medcl.net/item/2012/03/flume-build-process/

    http://f.dataguru.cn/thread-48324-1-1.html

    Overall plan for building a Flume cluster

    http://wenku.baidu.com/view/5f457188a0116c175f0e48a0.html

    2、Official documentation:

    http://flume.apache.org/FlumeUserGuide.html

    3、Flume NG configuration

    http://marsorp.iteye.com/blog/1561286

    http://blog.csdn.net/hijk139/article/details/8308224

    http://heipark.iteye.com/blog/1617995

    4、Flume concepts

    http://www.verydemo.com/demo_c89_i41415.html

    5、How to make flume-ng name its HDFS output files after the source file names

    http://abloz.com/2013/02/19/flume-ng-output-according-to-the-source-file-name-to-the-hdfs-file-name.html

    6、ETL tasks on Hadoop: using and optimizing Flume (iPinYou)

    http://wenku.baidu.com/view/ab3dfe26dd36a32d7375818c.html

    7、Meituan's Flume-based log collection system (1): architecture and design

    http://tech.meituan.com/mt-log-system-arch.html

    8、Meituan's Flume-based log collection system (2): improvements and optimization

    http://tech.meituan.com/mt-log-system-optimization.html



    (9) Sqoop

    1、Introduction to installing, configuring, and using Sqoop

    http://blog.csdn.net/leeqing2011/article/details/7630690?utm_source=weibolife

    2、Sqoop examples

    http://baiyunl.iteye.com/blog/964254

    3、Moving data between HDFS and an RDBMS with Sqoop

    http://www.linuxidc.com/Linux/2011-10/45080.htm

    4、Sqoop User Guide (v1.4.2)

    http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html?utm_source=weibolife#_introduction

    5、Transferring data in both directions between MySQL and HDFS with Sqoop

    http://abloz.com/2012/07/19/data-between-the-mysql-and-hdfs-system-of-mutual-conductance-using-sqoop.html

    6、MySQL <-> Sqoop <-> HDFS data exchange experiment

    http://leonarding.blog.51cto.com/6045525/1092764

    7、Reading data in MapReduce by connecting directly to MySQL

    http://superlxw1234.iteye.com/blog/1880712


    (10) ZooKeeper

    1、ZooKeeper Administrator's Guide

    http://zookeeper.apache.org/doc/r3.4.3/zookeeperAdmin.html

    2、Quick ZooKeeper setup

    http://nileader.blog.51cto.com/1381108/795230

    3、ZooKeeper administrator's guide: deploying and managing ZooKeeper

    http://blogread.cn/it/article/5917?f=sinat

    4、How ZooKeeper works

    http://blogread.cn/it/article/4603?f=sa

    5、ZooKeeper, a distributed service framework: managing data in distributed environments

    http://www.ibm.com/developerworks/cn/opensource/os-cn-zookeeper/

    6、Distributed service framework: ZooKeeper

    http://www.biaodianfu.com/zookeeper.html


    (11) NoSQL

    1、Redis resource roundup

    http://blog.nosqlfan.com/html/3537.html

    2、MongoDB resource roundup

    http://blog.nosqlfan.com/html/3548.html

    3、Notes on NoSQL databases

    http://sebug.net/paper/databases/nosql/Nosql.html

    4、Redis beginner series

    http://www.cnblogs.com/xhan/archive/2011/02/08/1949867.html

    5、Lessons from using Redis

    http://www.programmer.com.cn/14577/

    6、Three heroes battle SQL: analyzing the reliability and scaling operations of NoSQL

    http://www.csdn.net/article/2013-01-07/2813498-availability-and-operational

    7、Distributed caching: Memcached

    http://blog.sina.com.cn/s/blog_493a845501013ei0.html

    8、Redis Design and Implementation

    http://www.redisbook.com/en/latest/

    9、SQL to MongoDB Mapping Chart

    http://docs.mongodb.org/manual/reference/sql-comparison/

    10、Redis essentials

    https://github.com/springside/springside4/wiki/redis

    11、NoSQL anti-patterns: document databases

    http://www.yankay.com/nosql-anti-pattern-document/

    12、Shifting your thinking from SQL to NoSQL

    http://blogread.cn/it/article/3130?f=wb

    13、Consistent hashing

    http://blog.csdn.net/sparkliang/article/details/5279393

    http://www.cnblogs.com/xudong-bupt/p/3185194.html




    (12) Hadoop monitoring and management

    1、Three powerful tools for managing cloud computing platforms: Nagios, Ganglia, and Splunk

    http://www.programmer.com.cn/11477/

    2、A different kind of HBase monitoring system

    http://walkoven.com/?p=140

    3、JMX monitoring for Hadoop and HBase clusters

    http://slaytanic.blog.51cto.com/2057708/1179108

    4、Patching and upgrading Hadoop

    http://blog.csdn.net/cloudeagle_bupt/article/details/8621078   applying a patch to Hadoop 0.20.2

    http://hi.baidu.com/hovlj_1130/item/c1ed42cc0dbbeb0dac092f5b   upgrading Hadoop

    5、Analyzing Data with Hue and Hive

    http://blog.cloudera.com/blog/2013/04/demo-analyzing-data-with-hue-and-hive/

    6、Using Hue to Access Hive Data Through Pig

    http://blog.cloudera.com/blog/2013/08/demo-using-hue-to-access-hive-data-through-pig/



    (13) Storm

    1、Introduction to Storm and a single-node installation guide

    http://my.oschina.net/leejun2005/blog/147607

    2、Storm beginner tutorial

    http://blog.linezing.com/category/storm-quick-start

    3、Notes on using Storm

    http://www.cnblogs.com/panfeng412/tag/Storm/

    4、Distributed stream processing framework: Storm

    http://www.biaodianfu.com/storm.html


    (14) YARN & Hadoop 2.0

    1、Comparing resource management in Hadoop 1.0 and Hadoop 2.0

    http://dongxicheng.org/mapreduce-nextgen/hadoop-1-and-2-resource-manage/

    2、Faster and stronger: a look at YARN, Hadoop's next-generation MapReduce framework

    http://www.csdn.net/article/2014-02-10/2818355


    (15) Hadoop data platform architecture

    1、Big data in practice at Dianping

    http://www.csdn.net/article/2013-12-18/2817838-big-data-practice-in-dianping

    2、From data collection to massive-scale and real-time processing (Vipshop)

    http://www.infoq.com/cn/presentations/from-data-collection-to-massive-data-processing-and-real-time-processing


    Appendix:

    1、My Baidu Space blog (many posts were lost because of Baidu's botched upgrade):

    http://hi.baidu.com/leejun_2005/archive/tag/hadoop%26%2347%3Bpig%26%2347%3Bhive

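    2、If you want to read legitimate copies but would like to sample a book first, or want to read books in English, search this site; it has the latest and most popular IT e-books: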

    http://it-ebooks.info/ 
