Hadoop: The Definitive Guide, Chapter 2: MapReduce
Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 04:10
MapReduce
MapReduce is a programming model for data processing. The model is simple, yet not too simple to express useful programs in. Hadoop can run MapReduce programs written in various languages; in this chapter, we shall look at the same program expressed in Java, Ruby, Python, and C++. Most important, MapReduce programs are inherently parallel, thus putting very large-scale data analysis into the hands of anyone with enough machines at their disposal. MapReduce comes into its own for large datasets, so let’s start by looking at one.
2.1 Analyzing the Data with Hadoop
To take advantage of the parallel processing that Hadoop provides, we need to express our query as a MapReduce job. After some local, small-scale testing, we will be able to run it on a cluster of machines.
Note: express the query as a MapReduce job, test it locally on a small sample, then run it on the cluster.
2.2 Map and Reduce
MapReduce works by breaking the processing into two phases: the map phase and the reduce phase.
Each phase has key-value pairs as input and output, the types of which may be chosen by the programmer. The programmer also specifies two functions: the map function and the reduce function.
Note: the map function and the reduce function each take key-value pairs as input and produce key-value pairs as output.
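The two-phase model can be sketched in plain Python without Hadoop. This is only an illustration of the idea, not Hadoop's API: the word-count task, the sample records, and the names `map_fn`, `reduce_fn`, and `run_mapreduce` are all assumptions made for the example. The grouping step between the two phases stands in for what a real framework does when it collects all values for a key before calling reduce.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: (line offset, line text) -> a stream of (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce: (word, list of counts) -> (word, total count)."""
    yield key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical single-process driver, for illustration only."""
    # Map phase: apply map_fn to every input pair and group the
    # emitted values by their output key.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)

    # Reduce phase: call reduce_fn once per key, in sorted key order.
    results = []
    for key in sorted(groups):
        results.extend(reduce_fn(key, groups[key]))
    return results


# Input pairs: key = byte offset of the line, value = the line itself.
records = [
    (0, "the quick brown fox"),
    (20, "the lazy dog"),
    (33, "The fox"),
]

word_counts = run_mapreduce(records, map_fn, reduce_fn)
print(word_counts)
```

The key and value types here (integer offsets, strings, integer counts) are the programmer's choice, which is the point made above: each phase consumes and emits key-value pairs, and only the two functions are specific to the problem.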
2.3 Scaling Out
Data Flow
A MapReduce job is a unit of work that the client wants to be performed: it consists of the input data, the MapReduce program, and configuration information. Hadoop runs the job by dividing it into tasks, of which there are two types: map tasks and reduce tasks.
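Note: a job is the unit of work that the client wants performed. It consists of the input data, the program, and configuration information.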
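As a rough sketch of how a job breaks down into map tasks and reduce tasks, the following Python simulation runs everything in one process. It is not how Hadoop is implemented: the split size, the number of reducers, the CRC32-based partitioning, and every function name below are assumptions chosen for illustration. The job's three ingredients appear as the input (`lines`), the program (`map_task` and `reduce_task`), and the configuration (`split_size`, `num_reducers`).

```python
import zlib
from collections import defaultdict


def split_input(lines, split_size):
    """Cut the input into fixed-size pieces, one per map task."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def partition(key, num_reducers):
    """Pick a reduce task for a key (stable across runs, unlike hash())."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task(split, num_reducers):
    """One map task: emit (word, 1) pairs, bucketed by target reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for line in split:
        for word in line.split():
            buckets[partition(word, num_reducers)].append((word, 1))
    return buckets


def reduce_task(pairs):
    """One reduce task: sum the counts for every key it receives."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(sorted(totals.items()))


def run_job(lines, split_size=2, num_reducers=2):
    """Hypothetical job runner: input data + program + configuration."""
    splits = split_input(lines, split_size)
    map_outputs = [map_task(split, num_reducers) for split in splits]
    reducer_outputs = []
    for r in range(num_reducers):
        # Each reduce task gathers its bucket from every map task.
        pairs = [pair for output in map_outputs for pair in output[r]]
        reducer_outputs.append(reduce_task(pairs))
    return splits, reducer_outputs


lines = ["a b a", "b c", "a c c", "d"]
splits, outputs = run_job(lines)

# Merge the per-reducer results to see the overall answer.
merged = {}
for output in outputs:
    merged.update(output)
print(len(splits), "map tasks;", len(outputs), "reduce tasks")
print(merged)
```

Because every occurrence of a key is routed to the same reduce task, each key shows up in exactly one reducer's output, so merging the outputs never overwrites a count.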