MapReduce: Simplified Data Processing on Large Clusters (Paper Notes)
Source: Internet | Editor: 程序博客网 | Time: 2024/05/22 13:53
Why do it
The issues of how to parallelize the computation, distribute the data, and handle failures conspire to obscure the original simple computation with large amounts of complex code to deal with these issues.
Programming Model
Map
Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.
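As a rough illustration of the Map side, here is a minimal single-process sketch using word counting. The names word_count_map and group_by_key are hypothetical and chosen for this note only; the real library performs the grouping across many machines, which this sketch does not attempt to model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def word_count_map(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    """User-written Map: emit an intermediate (word, 1) pair per word."""
    for word in contents.split():
        yield word, 1


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Stand-in for the library step that groups all intermediate
    values sharing the same intermediate key (in-memory, one process)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# Map one input pair (document name, document contents), then group.
pairs = list(word_count_map("doc1", "the quick the lazy the"))
grouped = group_by_key(pairs)
```

Each key in grouped, together with its list of values, is what would be handed to the Reduce function.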
Reduce
The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values. The intermediate values are supplied to the user’s reduce function via an iterator. This allows us to handle lists of values that are too large to fit in memory.
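The iterator point can be sketched as follows. This is an assumption-laden toy, not the library's interface: word_count_reduce and run_reduce are hypothetical names, and the intermediate pairs are sorted by key in memory so that itertools.groupby can stream each key's values to Reduce one at a time without building a per-key list.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Tuple


def word_count_reduce(key: str, values: Iterator[int]) -> Iterator[int]:
    """User-written Reduce: sum the counts for one word.
    The values are consumed one by one and never materialized as a list."""
    total = 0
    for value in values:
        total += value
    yield total


def run_reduce(sorted_pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Hypothetical driver: feed each key and a lazy iterator over its
    values to the reduce function. The input must be sorted by key."""
    results: Dict[str, List[int]] = {}
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        values = (value for _, value in group)  # lazy generator, not a list
        results[key] = list(word_count_reduce(key, values))
    return results


intermediate = sorted(
    [("the", 1), ("quick", 1), ("the", 1), ("lazy", 1), ("the", 1)]
)
counts = run_reduce(intermediate)
```

Here the output for each key is a list with a single value, matching the "possibly smaller set of values" described above.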
Execution overview
Conclusions
Why this model is successful
- the model is easy to use, even for programmers without experience with parallel and distributed systems, since it hides the details of parallelization, fault-tolerance, locality optimization, and load balancing.
- a large variety of problems are easily expressible as MapReduce computations.
- we have developed an implementation of MapReduce that scales to large clusters comprising thousands of machines.
Experiences
- restricting the programming model makes it easy to parallelize and distribute computations and to make such computations fault-tolerant.
- network bandwidth is a scarce resource: the locality optimization allows us to read data from local disks, and writing a single copy of the intermediate data to local disk saves network bandwidth.
- redundant execution can be used to reduce the impact of slow machines, and to handle machine failures and data loss.
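The idea of redundant execution can be sketched with two threads standing in for two machines. This is only a toy under stated assumptions: count_words and run_with_backup are hypothetical names, the delay parameter simulates a slow machine, and the backup copy is launched immediately, whereas the paper describes scheduling backup executions only for the remaining in-progress tasks when a job is close to completion.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def count_words(text: str, delay: float) -> int:
    """A deterministic task; `delay` simulates how slow the machine is."""
    time.sleep(delay)
    return len(text.split())


def run_with_backup(text: str, primary_delay: float, backup_delay: float) -> int:
    """Run the same task twice and return whichever copy finishes first.
    Because the task is deterministic, either result is acceptable."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(count_words, text, primary_delay)
        backup = pool.submit(count_words, text, backup_delay)
        done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
        return next(iter(done)).result()


# The primary copy is the slow one; the backup finishes first.
result = run_with_backup("a b c d", primary_delay=0.2, backup_delay=0.01)
```

The approach relies on the task being deterministic, so that it does not matter which copy's output is kept.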