eBay and Hadoop
http://www.hadoop-blog.com/2010/12/hadoop-cluster-at-ebay.html
Hadoop cluster at eBay
I am always curious about how other companies install Hadoop clusters and how they use its ecosystem. Since Hadoop is still relatively new, there are no established best practices; every company implements what it thinks is the best infrastructure for its Hadoop cluster. At the Hadoop NYC 2010 conference, eBay showcased its implementation of a production Hadoop cluster. Following are some tidbits on eBay's implementation of Hadoop.
- JobTracker, NameNode, ZooKeeper, and HBase Master all run on enterprise nodes with Sun 64-bit architecture. These nodes run Red Hat Linux with 72 GB of RAM and 4 TB of disk.
- There are 4000 DataNodes, each running CentOS with 48 GB of RAM and 10 TB of disk space.
- Ganglia and Nagios are used for monitoring and alerting. eBay is also building a custom solution to augment them.
- ETL is done mostly with Java MapReduce programs.
- Pig is used to build data pipelines
- Hive is used for ad hoc queries
- Mahout is used for data mining
They are toying with the idea of using Oozie to manage workflows but haven't decided to adopt it yet.
It looks like they are doing all the right things.
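To put the DataNode figures above in perspective, here is a rough back-of-the-envelope capacity estimate in Python. The node count and per-node disk size come from the list above; the replication factor of 3 is an assumption (it is the HDFS default for `dfs.replication`, and the talk summary does not say what eBay actually uses), so treat the usable figure as a sketch rather than a reported number.

```python
# Rough HDFS capacity estimate from the figures quoted in the post.
DATANODES = 4000         # from the post
DISK_PER_NODE_TB = 10    # from the post
REPLICATION = 3          # ASSUMPTION: HDFS default, not stated in the post


def raw_capacity_tb(nodes: int, disk_tb: float) -> float:
    """Total disk across all DataNodes, before replication."""
    return nodes * disk_tb


def usable_capacity_tb(nodes: int, disk_tb: float, replication: int) -> float:
    """Approximate unique data that fits when every block is stored `replication` times."""
    return raw_capacity_tb(nodes, disk_tb) / replication


raw = raw_capacity_tb(DATANODES, DISK_PER_NODE_TB)
usable = usable_capacity_tb(DATANODES, DISK_PER_NODE_TB, REPLICATION)

# Using decimal units: 1 PB = 1000 TB.
print(f"raw capacity: {raw / 1000:.1f} PB")
print(f"usable at {REPLICATION}x replication: {usable / 1000:.1f} PB")
```

The estimate ignores space reserved for the operating system, intermediate MapReduce output, and compression, all of which would shift the real number.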