MapReduce Series (8) -- Custom GroupingComparator
Source: Internet  Editor: 程序博客网  Time: 2024/06/04 19:25
1. Overview
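A GroupingComparator is used in the reduce phase to decide which keys belong to the same group. For each group of keys that compare as equal, reduce() is called once, with the first key of the group as its key and an iterator over all of the group's values. When the reduce key is a custom bean and we want keys to count as equal whenever one particular field matches, we need a custom GroupingComparator to "trick" the reducer. It also helps to be clear about the other customization points, which take effect on the map side: getPartition() in a Partitioner decides which partition each map output record goes to, and the bean's compareTo() is used to sort keys during spill and merge.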
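The sort-then-group idea can be sketched in plain Java without any Hadoop classes. This is only an illustration of the concept, not the framework's implementation: the class GroupingSketch, its nested Key class, and the method firstKeyPerGroup are hypothetical names, and the sample values are taken from the input data used in this post.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java illustration (no Hadoop): sort composite keys, then group adjacent keys by order id. */
final class GroupingSketch {

    /** Hypothetical stand-in for the composite key: order id plus amount. */
    static final class Key {
        final String orderId;
        final double amount;

        Key(String orderId, double amount) {
            this.orderId = orderId;
            this.amount = amount;
        }
    }

    /** Sort order: order id ascending, then amount descending (same idea as the bean's compareTo). */
    static final Comparator<Key> SORT = (a, b) -> {
        int cmp = a.orderId.compareTo(b.orderId);
        if (cmp == 0) {
            cmp = Double.compare(b.amount, a.amount);
        }
        return cmp;
    };

    /** Grouping rule: only the order id matters. */
    static final Comparator<Key> GROUP = (a, b) -> a.orderId.compareTo(b.orderId);

    /** Returns the first key of each group, i.e. the key a reduce() call would start with. */
    static List<Key> firstKeyPerGroup(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<Key> firsts = new ArrayList<>();
        Key previous = null;
        for (Key k : sorted) {
            // A new group starts whenever the grouping rule reports a difference
            if (previous == null || GROUP.compare(previous, k) != 0) {
                firsts.add(k);
            }
            previous = k;
        }
        return firsts;
    }

    public static void main(String[] args) {
        List<Key> keys = List.of(
                new Key("Order_0000001", 25.8),
                new Key("Order_0000002", 122.4),
                new Key("Order_0000001", 222.8),
                new Key("Order_0000002", 522.8));
        for (Key k : firstKeyPerGroup(keys)) {
            System.out.println(k.orderId + "\t" + k.amount);
        }
    }
}
```

Because the largest amount sorts first within each order id, the first key of each group is the record with the maximum amount for that order.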
2. Code
Custom bean: OrderBean.java
package groupingcomparator;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

/**
 * Custom bean used as the map output key.
 * @author tianjun
 */
public class OrderBean implements WritableComparable<OrderBean> {

    private Text itemid;
    private DoubleWritable amount;

    // No-arg constructor, needed so the framework can create instances by reflection
    public OrderBean() {
    }

    public OrderBean(Text itemid, DoubleWritable amount) {
        set(itemid, amount);
    }

    public void set(Text itemid, DoubleWritable amount) {
        this.itemid = itemid;
        this.amount = amount;
    }

    public Text getItemid() {
        return itemid;
    }

    public DoubleWritable getAmount() {
        return amount;
    }

    // Sort by itemid ascending, then by amount descending
    @Override
    public int compareTo(OrderBean o) {
        int cmp = this.itemid.compareTo(o.getItemid());
        if (cmp == 0) {
            cmp = -this.amount.compareTo(o.getAmount());
        }
        return cmp;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(itemid.toString());
        out.writeDouble(amount.get());
    }

    // Fields must be read in the same order in which write() wrote them
    @Override
    public void readFields(DataInput in) throws IOException {
        String readUTF = in.readUTF();
        double readDouble = in.readDouble();
        this.itemid = new Text(readUTF);
        this.amount = new DoubleWritable(readDouble);
    }

    @Override
    public String toString() {
        return itemid.toString() + "\t" + amount.get();
    }
}
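The write()/readFields() pair only works if both sides agree on field order and encoding. The following is a minimal round-trip sketch using only java.io, with no Hadoop classes; the class RoundTripSketch and its two methods are hypothetical names introduced for illustration, and they mirror the writeUTF/writeDouble calls of the bean above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Plain-Java illustration of a symmetric write/read pair (no Hadoop classes). */
final class RoundTripSketch {

    /** Serializes the two fields in the same order as the bean's write(). */
    static byte[] write(String itemid, double amount) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(itemid);
            out.writeDouble(amount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the fields back in the same order and formats them like the bean's toString(). */
    static String read(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String itemid = in.readUTF();
            double amount = in.readDouble();
            return itemid + "\t" + amount;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = write("Order_0000001", 222.8);
        System.out.println(bytes.length + " bytes: " + read(bytes));
    }
}
```

If readFields() read the double before the string, the bytes would be misinterpreted, which is why the order in both methods has to match.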
Custom partitioner: ItemIdPartitioner.java
package groupingcomparator;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class ItemIdPartitioner extends Partitioner<OrderBean, NullWritable> {

    @Override
    public int getPartition(OrderBean bean, NullWritable value, int numReduceTasks) {
        // Order beans with the same id are sent to the same partition.
        // The number of partitions matches the number of reduce tasks set by the user.
        return (bean.getItemid().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
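The `& Integer.MAX_VALUE` mask in getPartition() matters because hashCode() may be negative, and a negative partition number is invalid. The sketch below shows the arithmetic in isolation; PartitionSketch, partitionFor, and naivePartitionFor are hypothetical names, and the hash values are arbitrary examples rather than real Text hash codes.

```java
/** Plain-Java illustration of the partition formula (no Hadoop classes). */
final class PartitionSketch {

    /** Clears the sign bit before taking the remainder, so the result is in [0, numReduceTasks). */
    static int partitionFor(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Naive version without the mask: the result can be negative for negative hashes. */
    static int naivePartitionFor(int hash, int numReduceTasks) {
        return hash % numReduceTasks;
    }

    public static void main(String[] args) {
        int[] hashes = {7, -7, -1, Integer.MIN_VALUE};
        for (int hash : hashes) {
            System.out.println(hash + " -> masked " + partitionFor(hash, 2)
                    + ", naive " + naivePartitionFor(hash, 2));
        }
    }
}
```

Hadoop's default HashPartitioner uses the same masking; the custom partitioner here differs only in hashing the order id field instead of the whole key.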
Grouping comparator that "tricks" reduce: ItemidGroupingComparator.java
package groupingcomparator;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

/**
 * Uses the reduce-side GroupingComparator to treat a set of beans as having the same key.
 * @author tianjun
 */
public class ItemidGroupingComparator extends WritableComparator {

    // Pass in the class of the key bean, and tell the framework to create instances by reflection
    protected ItemidGroupingComparator() {
        super(OrderBean.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        OrderBean abean = (OrderBean) a;
        OrderBean bbean = (OrderBean) b;
        // When comparing two beans, compare only the order id (stored in the itemid field)
        return abean.getItemid().compareTo(bbean.getItemid());
    }
}
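One point worth keeping in mind: grouping compares each key only with the next key in sorted order, so the sort order has to place keys with the same id next to each other (which the bean's compareTo() does by comparing the id first). A small plain-Java sketch of that adjacency rule follows; AdjacentGroupsSketch and countGroups are hypothetical names and the ids are made-up examples.

```java
import java.util.List;

/** Plain-Java illustration: groups are formed only from adjacent equal ids. */
final class AdjacentGroupsSketch {

    /** Counts groups the way a reducer would see them: a new group starts when the id changes. */
    static int countGroups(List<String> idsInSortedOrder) {
        int groups = 0;
        String previous = null;
        for (String id : idsInSortedOrder) {
            if (previous == null || !previous.equals(id)) {
                groups++;
            }
            previous = id;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Ids sorted first: equal ids are adjacent and form a single group each
        System.out.println(countGroups(List.of("A", "A", "B")));
        // If the sort order interleaves ids, the same id is split across several groups
        System.out.println(countGroups(List.of("A", "B", "A")));
    }
}
```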
MR program: SecondarySort.java
package groupingcomparator;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @author tianjun
 */
public class SecondarySort {

    static class SecondarySortMapper extends Mapper<LongWritable, Text, OrderBean, NullWritable> {

        OrderBean bean = new OrderBean();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line has the form: orderId,productId,amount
            String line = value.toString();
            String[] fields = StringUtils.split(line, ",");
            bean.set(new Text(fields[0]), new DoubleWritable(Double.parseDouble(fields[2])));
            context.write(bean, NullWritable.get());
        }
    }

    static class SecondarySortReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {

        // By the time records reach reduce, all beans with the same id are treated as one group,
        // and the bean with the largest amount comes first
        @Override
        protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(SecondarySort.class);

        job.setMapperClass(SecondarySortMapper.class);
        job.setReducerClass(SecondarySortReducer.class);

        job.setOutputKeyClass(OrderBean.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.setInputPaths(job, new Path("F:\\myWorkPlace\\java\\dubbo\\demo\\dubbo-demo\\mr-demo1\\src\\main\\java\\groupingcomparator\\orders.txt"));
        FileOutputFormat.setOutputPath(job, new Path("c:/wordcount/gpoutput"));

        // Set the custom GroupingComparator class here
        job.setGroupingComparatorClass(ItemidGroupingComparator.class);
        // Set the custom Partitioner class here
        job.setPartitionerClass(ItemIdPartitioner.class);

        job.setNumReduceTasks(2);

        job.waitForCompletion(true);
    }
}
The input data is as follows:
Order_0000001,Pdt_01,222.8
Order_0000001,Pdt_05,25.8
Order_0000002,Pdt_05,325.8
Order_0000002,Pdt_03,522.8
Order_0000002,Pdt_04,122.4
Order_0000003,Pdt_05,222.8
Order_0000003,Pdt_07,932.8
Order_0000003,Pdt_06,132.8
Running the job in local mode produces the following output:
part-r-00000:
Order_0000002 522.8
part-r-00001:
Order_0000001 222.8
Order_0000003 932.8