Secondary Sort with MapReduce
Source: Internet | Editor: 程序博客网 | Date: 2024/06/07 03:05
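1. Requirements
Given a set of order records, use MapReduce's own sorting mechanism so that, for each order, the record with the largest transaction amount is shown first.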
Data source (tab-separated):
Order ID	Product ID	Amount
Order_0000001	Pdt_01	222.8
Order_0000001	Pdt_05	25.8
Order_0000002	Pdt_03	522.8
Order_0000002	Pdt_04	122.4
Order_0000002	Pdt_05	722.4
Order_0000003	Pdt_01	222.8
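2. Approach
1) Use the order ID and the transaction amount together as a composite key, wrapped in a bean. The map output can then be partitioned by order ID (with a custom Partitioner) and sorted by order ID and then by amount in descending order (with the compareTo method of WritableComparable) before it is sent to the reducers.
2) On the reduce side, use a GroupingComparator to gather the <k, v> pairs with the same order ID into one group, then write the result directly. Because each group arrives already sorted, its first key is the record with the largest amount.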
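To make the two steps concrete without a cluster, here is a minimal plain-Java simulation (no Hadoop dependency) of partition, sort and group, assuming the grouping compares the order ID only. The class names `SecondarySortSim` and `Order` are hypothetical, and `String.hashCode()` stands in for Hadoop's `Text.hashCode()`, so which partition an order lands in may differ from a real job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of secondary sort (hypothetical names; no Hadoop needed).
class SecondarySortSim {
    static final class Order {
        final String orderId;
        final double amount;
        Order(String orderId, double amount) { this.orderId = orderId; this.amount = amount; }
        @Override public String toString() { return orderId + "\t" + amount; }
    }

    // Composite-key order: orderId ascending, then amount descending.
    static final Comparator<Order> SORT =
        Comparator.comparing((Order o) -> o.orderId)
                  .thenComparing(Comparator.comparingDouble((Order o) -> o.amount).reversed());

    // Same formula as the article's partitioner, but over String.hashCode().
    static int partition(Order o, int numPartitions) {
        return (o.orderId.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // For each partition (reduce task), returns the first record of every orderId group.
    static List<List<Order>> run(List<Order> input, int numPartitions) {
        List<List<Order>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
        for (Order o : input) partitions.get(partition(o, numPartitions)).add(o);

        List<List<Order>> output = new ArrayList<>();
        for (List<Order> p : partitions) {
            p.sort(SORT);                              // order a reduce task sees after the shuffle
            List<Order> out = new ArrayList<>();
            String currentGroup = null;
            for (Order o : p) {
                if (!o.orderId.equals(currentGroup)) { // grouping looks at orderId only
                    out.add(o);                        // reduce() writes the group's first key
                    currentGroup = o.orderId;
                }
            }
            output.add(out);
        }
        return output;
    }

    static List<Order> sampleInput() {
        return Arrays.asList(
            new Order("Order_0000001", 222.8), new Order("Order_0000001", 25.8),
            new Order("Order_0000002", 522.8), new Order("Order_0000002", 122.4),
            new Order("Order_0000002", 722.4), new Order("Order_0000003", 222.8));
    }

    public static void main(String[] args) {
        List<List<Order>> result = run(sampleInput(), 3);
        for (int i = 0; i < result.size(); i++) {
            System.out.println("partition " + i + ": " + result.get(i));
        }
    }
}
```

Each order ID ends up in exactly one partition, so every order contributes exactly one output line, its largest amount.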
3. Code
1) OrderBean class (implements the WritableComparable interface)
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class OrderBean implements WritableComparable<OrderBean> {
    private Text orderId;
    private DoubleWritable price;

    // Hadoop needs a no-arg constructor to create keys via reflection
    public OrderBean() {}

    public OrderBean(Text orderId, DoubleWritable price) {
        set(orderId, price);
    }

    public void set(Text orderId, DoubleWritable price) {
        this.orderId = orderId;
        this.price = price;
    }

    public Text getOrderId() { return orderId; }
    public void setOrderId(Text orderId) { this.orderId = orderId; }
    public DoubleWritable getPrice() { return price; }
    public void setPrice(DoubleWritable price) { this.price = price; }

    // Serialize: order ID as a UTF string, then the amount as a double
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(orderId.toString());
        out.writeDouble(price.get());
    }

    // Deserialize: read the fields back in the order they were written
    @Override
    public void readFields(DataInput in) throws IOException {
        this.orderId = new Text(in.readUTF());
        this.price = new DoubleWritable(in.readDouble());
    }

    // Sort by orderId ascending; for the same orderId, by amount descending
    @Override
    public int compareTo(OrderBean o) {
        int cmp = this.orderId.compareTo(o.getOrderId());
        if (cmp == 0) {
            cmp = -this.price.compareTo(o.getPrice()); // reversed: largest amount first
        }
        return cmp;
    }

    @Override
    public String toString() {
        return this.orderId.toString() + "\t" + this.price.get();
    }
}
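To check the bean's two contracts outside Hadoop, the sketch below mirrors OrderBean in plain Java: the same write/readFields wire format (writeUTF then writeDouble) and the same ordering rule, with `Double.compare` standing in for `DoubleWritable.compareTo`. The class name `OrderKey` and the `roundTrip` helper are hypothetical; the round trip only imitates what the shuffle does between map and reduce.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hadoop-free mirror of OrderBean (hypothetical name): same fields, wire format and ordering.
class OrderKey implements Comparable<OrderKey> {
    String orderId;
    double price;

    OrderKey() {}
    OrderKey(String orderId, double price) { this.orderId = orderId; this.price = price; }

    // Same wire format as OrderBean.write: UTF string, then an 8-byte double.
    void write(DataOutput out) throws IOException {
        out.writeUTF(orderId);
        out.writeDouble(price);
    }

    // Fields must be read back in exactly the order they were written.
    void readFields(DataInput in) throws IOException {
        orderId = in.readUTF();
        price = in.readDouble();
    }

    @Override
    public int compareTo(OrderKey o) {
        int cmp = orderId.compareTo(o.orderId);
        if (cmp == 0) {
            cmp = -Double.compare(price, o.price); // larger amount sorts first
        }
        return cmp;
    }

    @Override public String toString() { return orderId + "\t" + price; }

    // Serialize to bytes and back (hypothetical helper imitating the shuffle).
    static OrderKey roundTrip(OrderKey k) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        k.write(new DataOutputStream(bytes));
        OrderKey copy = new OrderKey();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        List<OrderKey> keys = new ArrayList<>(Arrays.asList(
            new OrderKey("Order_0000002", 122.4), new OrderKey("Order_0000001", 25.8),
            new OrderKey("Order_0000002", 722.4), new OrderKey("Order_0000001", 222.8)));
        Collections.sort(keys);
        System.out.println(keys);
        System.out.println(roundTrip(keys.get(0)));
    }
}
```

If write and readFields disagree on field order, the deserialized keys are corrupted, which is why both methods handle the fields in the same sequence.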
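2) Mapper class
// Nested inside the driver class SecondarySort. The file also imports java.io.IOException,
// org.apache.hadoop.io.{DoubleWritable, LongWritable, NullWritable, Text}
// and org.apache.hadoop.mapreduce.Mapper.
// Reads the orderId and amount from each line, stores them in the bean and emits the bean as the key.
static class SecondarySortMapper extends Mapper<LongWritable, Text, OrderBean, NullWritable> {
    // Reused across map() calls; the framework serializes it on each context.write()
    OrderBean ob = new OrderBean();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] fields = line.split("\t");   // orderId \t productId \t amount
        String orderId = fields[0];
        double price = Double.parseDouble(fields[2]);
        ob.set(new Text(orderId), new DoubleWritable(price));
        context.write(ob, NullWritable.get());
    }
}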
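The field extraction inside map() can be tried on its own. This is a small Hadoop-free sketch assuming, as `line.split("\t")` in the mapper does, that the three fields are separated by tabs; the class `OrderLineParser` and its length check are hypothetical additions, not part of the article's mapper.

```java
// Hadoop-free sketch of the parsing in SecondarySortMapper.map() (hypothetical class).
// Field 0 = order ID, field 1 = product ID (ignored), field 2 = amount; tabs assumed.
class OrderLineParser {
    static final class Parsed {
        final String orderId;
        final double price;
        Parsed(String orderId, double price) { this.orderId = orderId; this.price = price; }
    }

    static Parsed parse(String line) {
        String[] fields = line.split("\t");
        // Hypothetical guard: fail with a clear message instead of ArrayIndexOutOfBoundsException
        if (fields.length < 3) {
            throw new IllegalArgumentException("expected 3 tab-separated fields: " + line);
        }
        return new Parsed(fields[0], Double.parseDouble(fields[2].trim()));
    }

    public static void main(String[] args) {
        Parsed p = parse("Order_0000002\tPdt_05\t722.4");
        System.out.println(p.orderId + " -> " + p.price);   // prints "Order_0000002 -> 722.4"
    }
}
```

A line separated by spaces instead of tabs yields a single field and is rejected by the guard.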
3) Partitioner class
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;

// Distribute beans across reduce tasks by orderId
public class SecondarySortPartitioner extends Partitioner<OrderBean, NullWritable> {
    @Override
    public int getPartition(OrderBean key, NullWritable value, int numPartitions) {
        // Only orderId is hashed, so beans with the same orderId go to the same partition.
        // numPartitions equals the number of reduce tasks set by the user.
        return (key.getOrderId().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
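The `& Integer.MAX_VALUE` mask is the same trick Hadoop's default HashPartitioner applies to the whole key; here it is applied to orderId alone, since the OrderBean shown does not override hashCode() and a whole-key hash would not be derived from orderId. The sketch below (hypothetical class `PartitionDemo`) shows why the mask is used: Java's `%` keeps the sign of the dividend, and `Math.abs(Integer.MIN_VALUE)` is still negative.

```java
// Why hashCode is masked before taking the modulus (hypothetical demo class).
class PartitionDemo {
    static int maskedPartition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;   // always in [0, numPartitions)
    }

    static int naivePartition(int hash, int numPartitions) {
        return hash % numPartitions;                         // negative for negative hashes
    }

    static int absPartition(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;               // still negative for MIN_VALUE
    }

    public static void main(String[] args) {
        int[] hashes = {7, -7, Integer.MIN_VALUE};
        for (int h : hashes) {
            System.out.println(h + ": masked=" + maskedPartition(h, 3)
                + " naive=" + naivePartition(h, 3) + " abs=" + absPartition(h, 3));
        }
    }
}
```

A negative partition number would make the job fail, so masking the sign bit is the safe choice.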
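4) GroupingComparator class
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

/*
 * The GroupingComparator decides which keys are grouped into one reduce() call.
 *
 * How reduce works: a reduce task receives the keys emitted by the map phase together with
 * the values the shuffle has merged for them. After reduce() finishes the current
 * <key, values> group, the framework checks whether the next key belongs to the same group
 * as the current key. If it does, that record is handed to the same reduce() call;
 * if not, the current call ends and a new one starts with the next key.
 *
 * Without a GroupingComparator, grouping uses OrderBean.compareTo, so records of the same
 * order with different amounts would land in separate reduce() calls and any state would
 * have to be passed between calls. Grouping by orderId alone avoids that complexity.
 */
public class SecondarySortGC extends WritableComparator {
    // Pass the key class and createInstances = true so that the framework creates
    // OrderBean instances (via reflection) to deserialize keys before comparing them
    public SecondarySortGC() {
        super(OrderBean.class, true);
    }

    // Override the WritableComparable overload: WritableComparator's byte-level compare(),
    // which the framework calls while grouping, dispatches to this method.
    // Overriding only compare(Object, Object) would leave grouping on OrderBean.compareTo.
    @Override
    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable a, WritableComparable b) {
        OrderBean abean = (OrderBean) a;
        OrderBean bbean = (OrderBean) b;
        // Compare only the orderId of the two beans
        return abean.getOrderId().compareTo(bbean.getOrderId());
    }
}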
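The grouping step described in the comment above can be sketched without Hadoop: walk the already-sorted keys and start a new group whenever the grouping comparator reports a difference. The class `GroupingDemo` and the "orderId:amount" string encoding of keys are hypothetical simplifications; the second comparator shows what happens when the whole key is compared instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of reduce-side grouping (hypothetical names; no Hadoop).
// Keys are encoded as "orderId:amount" strings and are assumed to be sorted already.
class GroupingDemo {
    // Starts a new group whenever the grouping comparator says the next key differs.
    static <K> List<List<K>> group(List<K> sortedKeys, Comparator<? super K> groupingComparator) {
        List<List<K>> groups = new ArrayList<>();
        List<K> current = null;
        K previous = null;
        for (K k : sortedKeys) {
            if (current == null || groupingComparator.compare(previous, k) != 0) {
                current = new ArrayList<>();   // a new reduce() call would start here
                groups.add(current);
            }
            current.add(k);
            previous = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList(
            "Order_0000002:722.4", "Order_0000002:522.8", "Order_0000002:122.4",
            "Order_0000003:222.8");
        // Like SecondarySortGC: compare only the orderId part of the key
        Comparator<String> byOrderId = Comparator.comparing((String s) -> s.split(":")[0]);
        // Comparing the whole key instead: every record becomes its own group
        Comparator<String> wholeKey = Comparator.naturalOrder();
        System.out.println("by orderId: " + group(sorted, byOrderId));
        System.out.println("whole key:  " + group(sorted, wholeKey));
    }
}
```

Grouping only merges adjacent keys, which is why it relies on the sort order having already placed equal order IDs next to each other.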
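5) Reducer class
// Nested inside SecondarySort; the file also imports org.apache.hadoop.mapreduce.Reducer.
// Each reduce() call receives one orderId group. The key passed in is the group's first key,
// i.e. the record with the largest amount, so writing it once outputs that order's maximum.
static class SecondarySortReducer extends Reducer<OrderBean, NullWritable, OrderBean, NullWritable> {
    @Override
    protected void reduce(OrderBean key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        context.write(key, NullWritable.get());
    }
}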
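6) main method
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SecondarySort {
    // SecondarySortMapper and SecondarySortReducer are nested in this class

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(SecondarySort.class);

        job.setMapperClass(SecondarySortMapper.class);
        job.setReducerClass(SecondarySortReducer.class);
        job.setPartitionerClass(SecondarySortPartitioner.class);
        job.setGroupingComparatorClass(SecondarySortGC.class);

        // Map output types equal the final output types, so these two calls cover both
        job.setOutputKeyClass(OrderBean.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.setInputPaths(job, new Path("H:/大数据/mapreduce/secondarysort/input"));
        FileOutputFormat.setOutputPath(job, new Path("H:/大数据/mapreduce/secondarysort/output"));

        // Three reduce tasks produce part-r-00000 through part-r-00002
        job.setNumReduceTasks(3);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}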
4. Output
1) part-r-00000
Order_0000003	222.8
2) part-r-00001
Order_0000001	222.8
Order_0000001	25.8
Order_0000002	722.4
Order_0000002	522.8
Order_0000002	122.4