MapReduce pattern (chapter 3)
Source: Internet  Editor: 程序博客网  Time: 2024/05/03 22:26
A single reducer receiving a large amount of data is a problem for several reasons:
1 The sort can become an expensive operation when there are too many records to hold in memory and most of the sorting has to be done on local disk.
2 The host where the reducer is running will receive a lot of data over the network, which may create a network hot spot on that single host.
3 Naturally, scanning through the data in the reducer will take a long time if there are many records to look through.
4 Any sort of memory growth in the reducer can blow through the Java Virtual Machine's memory. For example, if you are collecting all of the values into an ArrayList to compute the median, every value has to be loaded into memory and that ArrayList can grow very large. This is not a particular problem if you are looking for the top ten items, but if you want to extract a very large number of items, you may run into memory limits.
5 Writes to the output file are not parallelized. Writing to the locally attached disk can be one of the more expensive operations in the reduce phase when we are dealing with a lot of data. Since there is only one reducer, we are not taking advantage of the parallelism involved in writing data to several hosts, or even to several disks on the same host. Again, this is not an issue for the top ten, but it becomes a factor when the data extracts are very large.
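The memory point (reason 4) can be illustrated with a small sketch in plain Java. It does not use the Hadoop API; the class name ReducerMemorySketch and the methods median and topN are hypothetical names chosen for this illustration, and the Iterable argument only stands in for the values a reducer would iterate over. The median version has to buffer every value, while the top-N version keeps at most N values at any time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration class, not part of Hadoop.
class ReducerMemorySketch {

    // Median: every value is buffered, so memory grows with the input size.
    static double median(Iterable<Integer> values) {
        List<Integer> all = new ArrayList<>();
        for (int v : values) {
            all.add(v); // one entry per input record, unbounded growth
        }
        if (all.isEmpty()) {
            throw new IllegalArgumentException("no values");
        }
        Collections.sort(all);
        int n = all.size();
        return n % 2 == 1
                ? all.get(n / 2)
                : (all.get(n / 2 - 1) + all.get(n / 2)) / 2.0;
    }

    // Top N: a min-heap capped at n entries, so memory stays bounded by n.
    static List<Integer> topN(Iterable<Integer> values, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) {
            heap.add(v);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest value seen so far
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder()); // largest first
        return result;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(5, 1, 9, 3, 7, 8, 2);
        System.out.println(median(values));  // prints 5.0
        System.out.println(topN(values, 3)); // prints [9, 8, 7]
    }
}
```

With ten items the heap never holds more than eleven entries regardless of input size, which is why the top ten case is harmless; the ArrayList in median holds one entry per record, which is where a single reducer can hit the JVM's memory limit.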