MapReduce Concepts (repost)

Repost: the original author explains the basic concepts of MapReduce very clearly.

Original reference:

http://code.google.com/intl/zh-CN/edu/parallel/mapreduce-tutorial.html

MapReduce is really two separate concepts: map and reduce.

  • First, a simple example.

For example, suppose we need to count the occurrences of the letter 'w' in 10,000 articles. The articles are stored as key/value pairs:

DocumentID(key), DocumentContent(Value)

1, "This is an article" // suppose this article contains 5 occurrences of 'w'

2, "This is another article" // contains 8 occurrences of 'w'

...

10000, "This is the last article" // contains 9 occurrences of 'w'

Here are two pieces of pseudocode:

map(String key, String value):
// key: document ID
// value: document contents
int count = 0;
for each character c in value:
  if c == 'w':
    count += 1;
EmitIntermediate("w", AsString(count));

The map function is applied to every key/value pair, so the first call is map(1, "This is an article") and the last is map(10000, "This is the last article"). Once all the map calls have finished, they produce an intermediate result set:

w, "5" // result of the first call

w, "8"

...

w, "9" // result of the 10,000th call


The work is then handed to the reduce function:

reduce(String key, Iterator values):
// key: the letter being counted ("w")
// values: a list of per-document counts
int result = 0;
for each v in values:
  result += ParseInt(v);
Emit(AsString(result));

The reduce function is applied once per letter being queried. In this example there is only one letter, "w", so there is a single call: reduce("w", ["5", "8", ..., "9"]). All reduce does is sum the numbers in the list, which yields the number of 'w' characters across all 10,000 articles.
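The whole flow above can be sketched as runnable Python. This is a simplified single-machine simulation on toy data; the function and variable names mirror the pseudocode and are not a real MapReduce API:

```python
def map_fn(doc_id, contents):
    # Emit a single intermediate pair per document: ("w", per-document count).
    count = sum(1 for c in contents if c == 'w')
    return [("w", str(count))]

def reduce_fn(key, values):
    # Sum the per-document counts for one key.
    return str(sum(int(v) for v in values))

# Simulate the framework on toy data: map every document, group by key, reduce.
docs = {1: "we watch two swallows", 2: "no match here", 3: "wow"}
intermediate = {}
for doc_id, contents in docs.items():
    for k, v in map_fn(doc_id, contents):
        intermediate.setdefault(k, []).append(v)

totals = {k: reduce_fn(k, vs) for k, vs in intermediate.items()}
print(totals)  # {'w': '7'}
```

The grouping loop in the middle plays the role of the framework: user code only supplies `map_fn` and `reduce_fn`.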


  • The process is simple, but what it implies is worth a closer look:

1. First, map operates on each key/value pair independently (here, it counts the 'w's in one article at a time), with no dependencies between pairs. Map can therefore achieve very high parallelism: its calls can be spread flexibly across many servers.

Map's input is generally (<k1, v1>); in the example above, (int DocumentID, string ArticleContent).

Its output is (<k2, v2>); in the example, (string Letter, int count).

So the output key k2 is usually no longer k1. In most cases the information in k1 is no longer needed and gets discarded; for instance, we usually have no further use for the DocumentID.


2. Second, reduce is really an aggregation step. For the letter 'w' its work cannot be subdivided any further (there is only one call), so reduce's parallelism here is low. But imagine counting the occurrences of 10,000 distinct words ("word", "hello", "good", ..., "no") across the 10,000 articles: that requires 10,000 reduce calls. So on large, complex jobs reduce can still achieve high parallelism.

Reduce's input is generally (<k2, list(v2)>); in the example above, (string Letter, list<int> count).

Its output is (<k3, v3>). In the example, reduce just sums the list<int>, so k2 = k3.

Not every application works this way, though.
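To make the k1 → k2 change and reduce's per-key parallelism concrete, here is a minimal word-count sketch in plain Python (toy data; in a real framework each reduce call could run on a different worker):

```python
from collections import defaultdict

def map_fn(doc_id, contents):
    # k1 (the document ID) is dropped; k2 is the word itself.
    return [(word, 1) for word in contents.split()]

def reduce_fn(word, counts):
    # One call per distinct word -- parallelism grows with the key space.
    return sum(counts)

docs = {1: "good word hello", 2: "hello hello no"}
groups = defaultdict(list)
for doc_id, text in docs.items():
    for k2, v2 in map_fn(doc_id, text):
        groups[k2].append(v2)

counts = {word: reduce_fn(word, values) for word, values in groups.items()}
print(counts)  # {'good': 1, 'word': 1, 'hello': 3, 'no': 1}
```

With four distinct words there are four independent reduce calls; with 10,000 distinct words there would be 10,000.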

  • With that, we can state the definitions of map and reduce:

The Map function, written by the user, processes input key/value pairs and outputs a set of intermediate key/value pairs. The MapReduce framework groups the intermediate results by key (i.e., it merges w,"5" w,"8" ... w,"9" into w, ["5", "8", ..., "9"]) and passes each group to the reduce function.

The Reduce function, also written by the user, takes an intermediate key and its list of values as input. It merges the values for that key (by summing, averaging, or many other operations) into a smaller result set.

Note that in practice there are usually many parallel instances of the map and reduce functions, called mappers and reducers.
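As a reminder that the merge need not be a sum, here is a sketch of an averaging reducer over hypothetical sensor readings (the grouping step is assumed to have already happened):

```python
def average_reduce(key, values):
    # Merge the values for one key by averaging instead of summing.
    return sum(values) / len(values)

# Already-grouped intermediate data: key -> list of values.
grouped = {"sensor_a": [3.0, 5.0], "sensor_b": [10.0]}
averages = {k: average_reduce(k, vs) for k, vs in grouped.items()}
print(averages)  # {'sensor_a': 4.0, 'sensor_b': 10.0}
```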

  • Finally, the MapReduce execution flow (very important!)
  1. The MapReduce library in the user program first shards the input files into M pieces of typically 16 megabytes to 64 megabytes (MB) per piece. It then starts up many copies of the program on a cluster of machines.

  2. One of the copies of the program is special: the master. The rest are workers that are assigned work by the master. There are M map tasks and R reduce tasks to assign. The master picks idle workers and assigns each one a map task or a reduce task.

  3. A worker who is assigned a map task reads the contents of the corresponding input shard. It parses key/value pairs out of the input data and passes each pair to the user-defined Map function. The intermediate key/value pairs produced by the Map function are buffered in memory.

  4. Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function. The locations of these buffered pairs on the local disk are passed back to the master, who is responsible for forwarding these locations to the reduce workers. A partitioning function (by default, hash(key) mod R) is typically used here.

  5. When a reduce worker is notified by the master about these locations, it uses remote procedure calls to read the buffered data from the local disks of the map workers. When a reduce worker has read all intermediate data, it sorts it by the intermediate keys so that all occurrences of the same key are grouped together. If the amount of intermediate data is too large to fit in memory, an external sort is used.

  6. The reduce worker iterates over the sorted intermediate data and for each unique intermediate key encountered, it passes the key and the corresponding set of intermediate values to the user's Reduce function. The output of the Reduce function is appended to a final output file for this reduce partition.

  7. When all map tasks and reduce tasks have been completed, the master wakes up the user program. At this point, the MapReduce call in the user program returns back to the user code.

After successful completion, the output of the MapReduce execution is available in the R output files. Note that there are R output files here: after the mappers produce their results, the partitioning function splits them into R regions, one per reduce task.
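Step 4's partitioning can be sketched as below. Hashing the key modulo R keeps every occurrence of a key in the same region, so exactly one reduce task (and one output file) sees all of that key's values; the data and the use of Python's built-in hash() are illustrative:

```python
R = 4  # number of reduce tasks, and therefore of output regions/files

def partition(key, r):
    # Route each key to a region by hash; the same key always lands in the same region.
    return hash(key) % r

intermediate_pairs = [("word", 1), ("hello", 1), ("good", 1), ("word", 1)]
regions = [[] for _ in range(R)]
for key, value in intermediate_pairs:
    regions[partition(key, R)].append((key, value))

# Both ("word", 1) pairs are now in the same region, ready for a single reducer.
```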

To detect failure, the master pings every worker periodically. If no response is received from a worker in a certain amount of time, the master marks the worker as failed. Any map tasks completed by the worker are reset back to their initial idle state, and therefore become eligible for scheduling on other workers. Similarly, any map task or reduce task in progress on a failed worker is also reset to idle and becomes eligible for rescheduling.

Completed map tasks are re-executed when failure occurs because their output is stored on the local disk(s) of the failed machine and is therefore inaccessible. Completed reduce tasks do not need to be re-executed since their output is stored in a global file system.

Hadoop is currently the most popular open-source MapReduce framework.
