An Important Concept in Cloud Computing: MapReduce
Source: Internet. Date: 2024/04/30 16:08
Reposted from Wikipedia: http://en.wikipedia.org/wiki/Map_reduce
MapReduce is a framework for processing huge datasets on certain kinds of distributable problems using a large number of computers (nodes), collectively referred to as a cluster. Computational processing can occur on data stored either in a filesystem (unstructured) or within a database (structured).
"Map" step: The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.
"Reduce" step: The master node then takes the answers to all the sub-problems and combines them to produce the output, which is the answer to the problem it was originally trying to solve.
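The two steps above can be sketched in a few lines of Python. This is only an illustrative, single-process sketch, not how any real MapReduce framework is implemented: the word-counting task and the names `split_input`, `map_worker`, `reduce_master`, and `word_count` are assumptions made for the example, and the "workers" here are ordinary function calls rather than separate nodes.

```python
from collections import Counter
from functools import reduce


def split_input(lines, num_workers):
    """Master: chop the input into roughly equal sub-problems."""
    return [lines[i::num_workers] for i in range(num_workers)]


def map_worker(chunk):
    """Worker: solve one sub-problem (count the words in its chunk)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def reduce_master(partial_counts):
    """Master: combine the workers' answers into the final output."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())


def word_count(lines, num_workers=3):
    chunks = split_input(lines, num_workers)
    # Sequential here for simplicity; on a cluster each chunk would go
    # to a different worker node and be processed in parallel.
    partials = [map_worker(chunk) for chunk in chunks]
    return reduce_master(partials)
```

For example, `word_count(["a b a", "b c", "a"])` hands one line to each of the three hypothetical workers and then merges their three partial counts into a single result.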
The advantage of MapReduce is that it allows the map and reduce operations to be distributed. Provided that each mapping operation is independent of the others, all maps can be performed in parallel, though in practice this is limited by the data source and/or the number of CPUs near that data. Similarly, a set of "reducers" can perform the reduction phase; all that is required is that all outputs of the map operation that share the same key are presented to the same reducer at the same time. While this process can often appear inefficient compared to more sequential algorithms, MapReduce can be applied to significantly larger datasets than a single "commodity" server can handle: a large server farm can use MapReduce to sort a petabyte of data in only a few hours. The parallelism also offers some possibility of recovering from partial failure of servers or storage during the operation: if one mapper or reducer fails, its work can be rescheduled, assuming the input data is still available.
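The requirement that all map outputs sharing a key reach the same reducer can be illustrated with a small sketch. Everything in it is an assumption chosen for the example: the "city,temperature" record format, the function names, the CRC32-based partitioning rule, and the use of a local thread pool to stand in for worker nodes. It is not taken from any particular MapReduce implementation.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_record(record):
    """Map: emit (key, value) pairs from one record, independently of all others."""
    city, temp = record.split(",")
    return [(city, int(temp))]


def partition(key, num_reducers):
    """Pick a reducer for a key; equal keys always land on the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Group values by key inside each reducer's bucket."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_bucket(bucket):
    """Reduce: collapse each key's values to one result (here, the maximum)."""
    return {key: max(values) for key, values in bucket.items()}


def run_job(records, num_reducers=2):
    with ThreadPoolExecutor() as pool:
        # Each record is mapped independently, so the maps can run in parallel.
        mapped = pool.map(map_record, records)
        pairs = [pair for output in mapped for pair in output]
        buckets = shuffle(pairs, num_reducers)
        # One reduce task per bucket; every key lives in exactly one bucket.
        result = {}
        for part in pool.map(reduce_bucket, buckets):
            result.update(part)
    return result
```

Because `partition` depends only on the key, every value for a given city ends up in one bucket, so each reducer sees the complete list of values for its keys before it produces a result.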