024_The Mapper and Reducer Base Classes in MapReduce
1) The Mapper base class in MapReduce: the parent class of any user-defined Mapper.
2) The Reducer base class in MapReduce: the parent class of any user-defined Reducer.
1. The Mapper class
Key points from the API documentation:
1) InputSplit: the input split; InputFormat: splits and formats the job input.
2) The Mapper output is sorted (Sort) and grouped (Group).
3) The Mapper output is partitioned (Partition) according to the number of Reducers.
4) The Mapper output may be locally aggregated by a Combiner.
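Step 3 above, partitioning, is worth making concrete. Hadoop's default `HashPartitioner` routes a key to a reducer with `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below re-implements that rule in plain Java (the class name `PartitionSketch` is illustrative, not part of the Hadoop API):

```java
// A minimal re-implementation of the rule used by Hadoop's default
// HashPartitioner: (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
// The class name is illustrative; this does not depend on Hadoop itself.
public class PartitionSketch {
    // Returns the reducer index (0 .. numReduceTasks-1) for a map-output key.
    static int partition(String key, int numReduceTasks) {
        // The & Integer.MAX_VALUE clears the sign bit, so a negative
        // hashCode still yields a non-negative partition number.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"hadoop", "mapreduce", "hadoop", "hdfs"};
        for (String k : keys) {
            System.out.println(k + " -> reducer " + partition(k, 3));
        }
    }
}
```

Because the function is deterministic, identical keys always land on the same reducer, which is what guarantees that all values for one key meet in a single reduce call.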
- The Mapper class description from the official Hadoop documentation:
Maps input key/value pairs to a set of intermediate key/value pairs.
Maps are the individual tasks which transform input records into intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.
The Hadoop Map-Reduce framework spawns one map task for each InputSplit generated by the InputFormat for the job. Mapper implementations can access the Configuration for the job via the JobContext.getConfiguration().
The framework first calls setup(org.apache.hadoop.mapreduce.Mapper.Context), followed by map(Object, Object, Context) for each key/value pair in the InputSplit. Finally cleanup(Context) is called.
All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to a Reducer to determine the final output. Users can control the sorting and grouping by specifying two key RawComparator classes.
The Mapper outputs are partitioned per Reducer. Users can control which keys (and hence records) go to which Reducer by implementing a custom Partitioner.
Users can optionally specify a combiner, via Job.setCombinerClass(Class), to perform local aggregation of the intermediate outputs, which helps to cut down the amount of data transferred from the Mapper to the Reducer.
Applications can specify if and how the intermediate outputs are to be compressed and which CompressionCodecs are to be used via the Configuration.
If the job has zero reduces then the output of the Mapper is directly written to the OutputFormat without sorting by keys.
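The combiner mentioned in the documentation above performs local aggregation on one map task's output before anything crosses the network. The following is a Hadoop-free sketch of that effect for a word-count-style job (class and method names are illustrative): six emitted `(word, 1)` pairs collapse into three `(word, localSum)` pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Pure-Java sketch (no Hadoop dependency) of what a word-count combiner does:
// collapse the repeated (word, 1) pairs emitted by one map task into
// (word, localSum) pairs, shrinking the data shipped to the reducers.
public class CombinerSketch {
    static Map<String, Integer> combine(String[] mapOutputKeys) {
        Map<String, Integer> local = new LinkedHashMap<>();
        for (String key : mapOutputKeys) {
            local.merge(key, 1, Integer::sum); // local aggregation per key
        }
        return local;
    }

    public static void main(String[] args) {
        // One map task emitted 6 pairs; after combining, only 3 remain.
        String[] emitted = {"hadoop", "hdfs", "hadoop", "yarn", "hadoop", "hdfs"};
        System.out.println(combine(emitted)); // {hadoop=3, hdfs=2, yarn=1}
    }
}
```

This is only correct for operations that are associative and commutative (like summing counts), which is why Hadoop lets the combiner be optional and run zero or more times.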
- Structure of the Mapper class:
- Its methods fall into two groups:
Group 1: protected methods, which users override as needed.
1) setup: called once before the task processes any records.
2) map: called once for each key/value pair in the input split.
3) cleanup: called once after the task finishes processing.
Group 2: the driver method.
run() is the entry point of the Mapper class; internally it calls the three methods setup(), map(), and cleanup().
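The lifecycle that run() drives can be sketched without Hadoop at all. The mini framework below mirrors the template-method structure of `org.apache.hadoop.mapreduce.Mapper.run()`: setup() once, map() once per record, and cleanup() once in a finally block. All class names here (`MiniMapper`, `UpperMapper`) are illustrative, not Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;

// Hadoop-free sketch of the Mapper lifecycle: run() calls setup() once,
// map() once per input record, and cleanup() once at the end -- the same
// template-method shape as org.apache.hadoop.mapreduce.Mapper.run().
public class LifecycleSketch {
    static abstract class MiniMapper<IN, OUT> {
        final List<OUT> output = new ArrayList<>();

        protected void setup() {}               // called once, before any map()
        protected abstract void map(IN record); // called once per record
        protected void cleanup() {}             // called once, after the last map()

        public final void run(Iterable<IN> input) {
            setup();
            try {
                for (IN record : input) {
                    map(record);
                }
            } finally {
                cleanup(); // runs even if map() throws, as in Hadoop
            }
        }
    }

    // A toy mapper: uppercases each record and logs the lifecycle calls.
    static class UpperMapper extends MiniMapper<String, String> {
        @Override protected void setup()       { output.add("SETUP"); }
        @Override protected void map(String s) { output.add(s.toUpperCase()); }
        @Override protected void cleanup()     { output.add("CLEANUP"); }
    }

    public static void main(String[] args) {
        UpperMapper m = new UpperMapper();
        m.run(List.of("hadoop", "hdfs"));
        System.out.println(m.output); // [SETUP, HADOOP, HDFS, CLEANUP]
    }
}
```

Overriding run() itself (for example, to process records in batches) is possible in Hadoop but rarely needed; overriding the three protected hooks is the normal extension point, exactly as the section above describes.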