Excerpts from the official website's introduction
Source: Internet | Posted: Zhihu | Editor: 程序博客网 | Date: 2024/06/05 20:17
What Is Apache Hadoop?
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
The project includes these modules:
- Hadoop Common: The common utilities that support the other Hadoop modules.
- Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
- Hadoop YARN: A framework for job scheduling and cluster resource management.
- Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
Overview
Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.
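The split, map, sort, reduce flow described above can be sketched in plain, single-process Python. This is not Hadoop's API: word count is an assumed example job, the function names (`split_input`, `word_count_map`, `word_count_reduce`, `run_with_retries`, `run_job`) are hypothetical, and the "cluster" is just a loop, so the sketch only illustrates the data flow and the re-execution of failed tasks.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator


def split_input(lines: list[str], chunk_size: int) -> list[list[str]]:
    """Split the input data set into independent chunks, one per map task."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def word_count_map(chunk: list[str]) -> Iterator[tuple[str, int]]:
    """Map function: emit (word, 1) for every word in the chunk."""
    for line in chunk:
        for word in line.split():
            yield word.lower(), 1


def word_count_reduce(key: str, values: Iterable[int]) -> tuple[str, int]:
    """Reduce function: sum all counts emitted for one key."""
    return key, sum(values)


def run_with_retries(task: Callable[[], list], max_attempts: int = 3) -> list:
    """Re-execute a failed task up to max_attempts times before giving up."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task()
        except Exception as err:  # e.g. a simulated node failure
            last_error = err
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def run_job(lines: list[str], chunk_size: int = 2) -> dict[str, int]:
    # Map phase: chunks are independent (a real cluster runs them in parallel).
    map_output: list[tuple[str, int]] = []
    for chunk in split_input(lines, chunk_size):
        map_output.extend(run_with_retries(lambda: list(word_count_map(chunk))))
    # Sort phase: order map output by key so equal keys become adjacent.
    map_output.sort(key=itemgetter(0))
    # Reduce phase: hand each key and its grouped values to the reduce function.
    result: dict[str, int] = {}
    for key, group in groupby(map_output, key=itemgetter(0)):
        word, total = word_count_reduce(key, (count for _, count in group))
        result[word] = total
    return result


if __name__ == "__main__":
    print(run_job(["the quick brown fox", "the lazy dog", "The end"]))
    # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Sorting before reducing is what lets each reduce call see all values for one key as a single contiguous group.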
Typically the compute nodes and the storage nodes are the same, that is, the MapReduce framework and the Hadoop Distributed File System (see HDFS Architecture Guide) are running on the same set of nodes. This configuration allows the framework to effectively schedule tasks on the nodes where data is already present, resulting in very high aggregate bandwidth across the cluster.
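Scheduling tasks on the nodes where the data already lives can be illustrated with a greedy sketch. Everything here is an assumption for illustration only: the block IDs, node names, one map task per block, and a per-node count of "free slots"; the function `schedule_with_locality` is hypothetical and is not how any Hadoop scheduler is implemented.

```python
def schedule_with_locality(
    block_locations: dict[str, set[str]],  # hypothetical: block ID -> nodes holding a replica
    free_slots: dict[str, int],            # hypothetical: node -> tasks it can still accept
) -> dict[str, tuple[str, bool]]:
    """Assign one task per block, preferring a node that already stores the block.

    Returns {block_id: (node, is_local)}; is_local is False when the task must
    read its block over the network.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment: dict[str, tuple[str, bool]] = {}
    for block_id in sorted(block_locations):
        local = sorted(n for n in block_locations[block_id] if slots.get(n, 0) > 0)
        if local:
            node, is_local = local[0], True
        else:
            # No replica holder has capacity: fall back to any node with a free slot.
            remote = sorted(n for n, s in slots.items() if s > 0)
            if not remote:
                raise RuntimeError("no free slots left in the cluster")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignment[block_id] = (node, is_local)
    return assignment
```

For example, with blocks `b1` on `n1`/`n2`, `b2` on `n2`, `b3` on `n1`, and one slot on each of `n1`, `n2`, `n3`, blocks `b1` and `b2` run locally while `b3` has to run remotely on `n3`. Real schedulers are more elaborate; YARN resource requests, for instance, can name a specific host, a rack, or any node.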
The MapReduce framework consists of a single master ResourceManager, one slave NodeManager per cluster node, and one MRAppMaster per application (see YARN Architecture Guide).
Minimally, applications specify the input/output locations and supply map and reduce functions via implementations of appropriate interfaces and/or abstract-classes. These, and other job parameters, comprise the job configuration.
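In Hadoop's Java API this role is played by classes such as `org.apache.hadoop.mapreduce.Job` (with `setMapperClass`, `setReducerClass`, `FileInputFormat.addInputPath` and `FileOutputFormat.setOutputPath`). The Python sketch below only mirrors the idea of "a job configuration": the `JobConfig` class, the paths, and the `job.name` parameter key are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator

# Hypothetical signatures for the user-supplied map and reduce functions.
MapFn = Callable[[str], Iterator[tuple[str, int]]]
ReduceFn = Callable[[str, Iterable[int]], tuple[str, int]]


@dataclass
class JobConfig:
    """Hypothetical job configuration: I/O locations, map/reduce functions, other parameters."""
    input_path: str
    output_path: str
    mapper: MapFn
    reducer: ReduceFn
    params: dict[str, str] = field(default_factory=dict)  # any other job parameters

    def validate(self) -> None:
        """Check that the minimal pieces an application must specify are present."""
        if not self.input_path or not self.output_path:
            raise ValueError("input and output locations are required")
        if not callable(self.mapper) or not callable(self.reducer):
            raise ValueError("map and reduce functions must be callable")


def tokenize(line: str) -> Iterator[tuple[str, int]]:
    """Example map function: emit (word, 1) per word."""
    for word in line.split():
        yield word, 1


def add_counts(key: str, values: Iterable[int]) -> tuple[str, int]:
    """Example reduce function: total the counts for one word."""
    return key, sum(values)


job = JobConfig(
    input_path="/user/example/input",   # hypothetical paths
    output_path="/user/example/output",
    mapper=tokenize,
    reducer=add_counts,
    params={"job.name": "word count"},  # hypothetical parameter key
)
job.validate()
```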
The Hadoop job client then submits the job (jar/executable etc.) and configuration to the ResourceManager, which then assumes the responsibility of distributing the software/configuration to the slaves, scheduling tasks and monitoring them, and providing status and diagnostic information to the job client.
Comparison
MapReduce NextGen aka YARN aka MRv2
The new architecture, introduced in hadoop-0.23, divides the two major functions of the JobTracker (resource management and job life-cycle management) into separate components.
The new ResourceManager manages the global assignment of compute resources to applications and the per-application ApplicationMaster manages the application’s scheduling and coordination.
An application is either a single job in the sense of classic MapReduce jobs or a DAG of such jobs.
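When an application is a DAG of jobs, each job can only start after the jobs it depends on have finished. A minimal sketch using Python's standard `graphlib` module, assuming hypothetical job names and that jobs with no remaining dependencies may run side by side; it illustrates the ordering constraint only and says nothing about how YARN represents such applications.

```python
from graphlib import TopologicalSorter


def job_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves: every job in a wave depends only on jobs in earlier waves.

    `dependencies` maps each job to the set of jobs that must finish first.
    Raises graphlib.CycleError (a ValueError subclass) if the graph is not a DAG.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)                 # pretend the whole wave finished
    return waves


if __name__ == "__main__":
    dag = {
        "clean": set(),
        "lookup": set(),
        "count": {"clean"},
        "join": {"clean", "lookup"},
        "report": {"count", "join"},
    }
    print(job_waves(dag))
    # [['clean', 'lookup'], ['count', 'join'], ['report']]
```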
The ResourceManager and the per-machine NodeManager daemon, which manages the user processes on that machine, form the computation fabric.
The per-application ApplicationMaster is, in effect, a framework-specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
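The division of labor among the three components can be sketched as a toy, single-process model. The class names mirror the YARN roles, but everything else is an assumption: resources are reduced to memory in MB, allocation is a simple first-fit loop, and tasks run synchronously. In real YARN the ApplicationMaster itself runs inside a container and talks to the ResourceManager over RPC, so treat this only as an illustration of who asks whom for what.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Container:
    """A slice of one node's resources granted to an application."""
    node: str
    memory_mb: int


class ResourceManager:
    """Global view of cluster capacity; hands out containers to applications."""

    def __init__(self, node_memory_mb: dict[str, int]):
        self.free = dict(node_memory_mb)

    def allocate(self, count: int, memory_mb: int) -> list[Container]:
        """Grant up to `count` containers of `memory_mb` each (first fit; may grant fewer)."""
        granted: list[Container] = []
        for node in sorted(self.free):
            while len(granted) < count and self.free[node] >= memory_mb:
                self.free[node] -= memory_mb
                granted.append(Container(node, memory_mb))
        return granted

    def release(self, container: Container) -> None:
        self.free[container.node] += container.memory_mb


class NodeManager:
    """Per-machine daemon that runs the process placed in a container on its node."""

    def __init__(self, name: str):
        self.name = name
        self.launched = 0

    def launch(self, container: Container, task: Callable[[], int]) -> int:
        if container.node != self.name:
            raise ValueError("container belongs to another node")
        self.launched += 1
        return task()


class ApplicationMaster:
    """Per-application coordinator: negotiates containers, then runs and tracks its tasks."""

    def __init__(self, rm: ResourceManager, nms: dict[str, NodeManager]):
        self.rm = rm
        self.nms = nms

    def run(self, tasks: list[Callable[[], int]], memory_mb: int) -> list[int]:
        results: list[int] = []
        pending = list(tasks)
        while pending:
            containers = self.rm.allocate(len(pending), memory_mb)
            if not containers:
                raise RuntimeError("cluster cannot fit a single container")
            for container in containers:
                task = pending.pop(0)
                results.append(self.nms[container.node].launch(container, task))
                self.rm.release(container)  # task finished: return the resources
        return results
```

Usage: with `ResourceManager({"node1": 2048, "node2": 1024})` and five 1024 MB tasks, the first negotiation round yields three containers and the ApplicationMaster asks again for the remaining two, which is the "negotiating resources" loop the paragraph describes.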