Comparison of Hadoop Versions
Source: Internet. Editor: 程序博客网. Time: 2024/05/29 03:59
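Hadoop currently has two open-source distributions: the Apache release, and a release that Cloudera has optimized on top of the Apache code base, also known as CDH3.
The two compare as follows: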
| Component | CDH3 version | Apache version | Description |
|---|---|---|---|
| Hadoop Common | ● | ● | The common utilities that support the other Hadoop subprojects. |
| Hadoop Distributed File System (HDFS) | ● | ● | A distributed file system that provides high-throughput access to application data. |
| Hadoop MapReduce | ● | ● | A software framework for distributed processing of large data sets on compute clusters. |
| Flume | ● | | A distributed, reliable, and available service for efficiently moving large amounts of data as the data is produced. |
| Sqoop | ● | | A tool that imports data from relational databases into Hadoop clusters. |
| Hue | ● | | A graphical user interface to work with CDH. |
| Pig | ● | ● | A high-level data-flow language and execution framework for parallel computation. Enables you to analyze large amounts of data using Pig's query language, called Pig Latin. |
| Hive | ● | ● | A data warehouse infrastructure that provides data summarization and ad hoc querying. A powerful data warehousing application built on top of Hadoop that lets you access your data using Hive QL, a language similar to SQL. |
| HBase | ● | ● | A scalable, distributed database that supports structured data storage for large tables. Provides large-scale tabular storage for Hadoop using the Hadoop Distributed File System (HDFS). |
| ZooKeeper | ● | ● | A high-performance coordination service for distributed applications. A highly reliable and available service that provides coordination between distributed processes. |
| Oozie | ● | | A server-based workflow engine specialized in running workflow jobs with actions that execute Hadoop jobs. |
| Whirr | ● | | Provides a fast way to run cloud services. |
| Snappy | ● | | A compression/decompression library. |
| Avro | | ● | A data serialization system. |
| Cassandra | | ● | A scalable multi-master database with no single points of failure. |
| Chukwa | | ● | A data collection system for managing large distributed systems. |
| Mahout | | ● | A scalable machine learning and data mining library. |
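In theory, CDH3 should support all components of the Apache release and their subprojects.
The similarities and differences between the two Hadoop distributions are as follows: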
System
Starting with CDH3b3, the hadoop.job.ugi parameter is no longer supported; use the UserGroupInformation.doAs() method instead.
For other changes, see: https://ccp.cloudera.com/display/CDHDOC/Incompatible+Changes
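Installation
Cloudera CDH3 is based on the stable Hadoop 0.20.2 release and integrates many patches.
CDH is available both as RPM packages and as a tarball (Cloudera recommends RPM), whereas Hadoop 0.20.2 is distributed only as a tarball.
Cloudera CDH3 sets the JAVA_HOME environment variable automatically; Apache Hadoop requires you to configure it by hand.
Apache Hadoop uses the start/stop-dfs.sh and start/stop-all.sh scripts to manage the cluster. CDH starts and stops services by running the /etc/init.d/hadoop-0.20-* scripts as root. This approach manages only the current server, so to get the equivalent of start/stop-all.sh you have to write your own script.
After a successful install, Cloudera CDH adds two users: hdfs (for the HDFS file system) and mapred (for MapReduce). With Apache Hadoop, the usual practice is to add a single hadoop user that does everything.
Cloudera CDH switches between multiple configuration sets through alternatives, whereas Apache Hadoop keeps its configuration files only under $HADOOP_HOME/conf.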
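Since the /etc/init.d/hadoop-0.20-* scripts only manage the local server, a cluster-wide start or stop has to be scripted by hand. The following is a minimal sketch of one way to do that, not an official Cloudera tool. It assumes passwordless SSH as root to every node; the host names (master1, worker1, worker2) are hypothetical placeholders, and the daemon suffixes (namenode, datanode, jobtracker, tasktracker) are assumed to match the init scripts installed on your nodes.

```python
import shlex
import subprocess

# Assumed daemon-to-host layout; the host names are hypothetical placeholders.
# Service names follow the /etc/init.d/hadoop-0.20-* pattern; the daemon
# suffixes are an assumption about which init scripts are installed.
CLUSTER = {
    "namenode": ["master1"],
    "jobtracker": ["master1"],
    "datanode": ["worker1", "worker2"],
    "tasktracker": ["worker1", "worker2"],
}

# Start HDFS daemons before MapReduce daemons; stop in the reverse order.
START_ORDER = ["namenode", "datanode", "jobtracker", "tasktracker"]


def build_commands(action, cluster=CLUSTER):
    """Return one ssh argv list per (daemon, host) pair for 'start' or 'stop'."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    order = START_ORDER if action == "start" else list(reversed(START_ORDER))
    commands = []
    for daemon in order:
        for host in cluster.get(daemon, []):
            remote = "/etc/init.d/hadoop-0.20-%s %s" % (daemon, action)
            # The init scripts are run as root on each node.
            commands.append(["ssh", "root@" + host, remote])
    return commands


def run_all(action, cluster=CLUSTER, dry_run=True):
    """Print each command; execute it over ssh only when dry_run is False."""
    for argv in build_commands(action, cluster):
        print(" ".join(shlex.quote(part) for part in argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all("start")  # dry run: prints the commands without connecting
```

Calling run_all("stop", dry_run=False) would walk the same hosts in reverse daemon order. Keeping the command construction separate from execution makes it easy to review what will run before touching a live cluster.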
Eclipse plugin
Cloudera CDH does not ship an Eclipse plugin by default; you have to build it yourself, and its plugin is not compatible with the Apache Hadoop plugin.
Security
CDH3 supports Kerberos authentication, whereas Apache Hadoop 0.20.2 relies on rudimentary user-name matching.