MapReduce Example: Flow Summary Sorted by Total Traffic in Descending Order

Source: Internet  Published by: 等到烟火清 知乎  Editor: 程序博客网  Date: 2024/06/05 12:13

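1: Problem Description

Given a data file containing various network usage records for mobile phone users, compute each user's total upstream traffic, total downstream traffic, and overall total traffic, then output the results sorted by overall total traffic in descending order.

Step 1: Flow summary (covered in "MapReduce Example: Flow Summary (flowcount)").

Step 2: Sorting (the subject of this article).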
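To make the two steps concrete before moving to Hadoop, here is a minimal plain-Java sketch of the same logic on in-memory data. It does not use MapReduce; the class name `FlowSortSketch`, its method names, and the sample records are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FlowSortSketch {
    /** Step 1: sum upstream and downstream traffic per phone number. */
    static Map<String, long[]> summarize(List<String[]> records) {
        Map<String, long[]> totals = new LinkedHashMap<>();
        for (String[] r : records) {
            long[] t = totals.computeIfAbsent(r[0], k -> new long[2]);
            t[0] += Long.parseLong(r[1]); // upstream
            t[1] += Long.parseLong(r[2]); // downstream
        }
        return totals;
    }

    /** Step 2: order phones by total traffic (up + down), largest first. */
    static List<String> sortByTotalDesc(Map<String, long[]> totals) {
        List<String> phones = new ArrayList<>(totals.keySet());
        phones.sort((a, b) -> Long.compare(
                totals.get(b)[0] + totals.get(b)[1],
                totals.get(a)[0] + totals.get(a)[1]));
        return phones;
    }

    public static void main(String[] args) {
        // Hypothetical records: phone, upstream, downstream.
        List<String[]> records = Arrays.asList(
                new String[] {"13800000001", "100", "200"},
                new String[] {"13800000002", "500", "900"},
                new String[] {"13800000001", "50", "25"});
        Map<String, long[]> totals = summarize(records);
        for (String phone : sortByTotalDesc(totals)) {
            long[] t = totals.get(phone);
            System.out.println(phone + "\t" + t[0] + "\t" + t[1] + "\t" + (t[0] + t[1]));
        }
    }
}
```

In the actual job the two steps run as separate MapReduce jobs, and the second job reads the first job's output.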



2: Code

Custom flow class, implementing the WritableComparable interface:

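import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class FlowBean implements WritableComparable<FlowBean> {

    private long upFlow;
    private long downFlow;
    private long sumFlow;

    // Hadoop creates the bean through reflection during deserialization,
    // so a no-argument constructor must be defined.
    public FlowBean() {
    }

    public FlowBean(long upFlow, long downFlow) {
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.sumFlow = upFlow + downFlow;
    }

    public void set(long upFlow, long downFlow) {
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.sumFlow = upFlow + downFlow;
    }

    public long getUpFlow() {
        return upFlow;
    }

    public void setUpFlow(long upFlow) {
        this.upFlow = upFlow;
    }

    public long getDownFlow() {
        return downFlow;
    }

    public void setDownFlow(long downFlow) {
        this.downFlow = downFlow;
    }

    public long getSumFlow() {
        return sumFlow;
    }

    public void setSumFlow(long sumFlow) {
        this.sumFlow = sumFlow;
    }

    /*
     * Deserialization: restore each field from the byte stream,
     * in the same order in which write() emitted them.
     */
    @Override
    public void readFields(DataInput in) throws IOException {
        upFlow = in.readLong();
        downFlow = in.readLong();
        sumFlow = in.readLong();
    }

    /*
     * Serialization: turn the data to be transferred into a byte stream.
     */
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    @Override
    public String toString() {
        return upFlow + "\t" + downFlow + "\t" + sumFlow;
    }

    @Override
    public int compareTo(FlowBean o) {
        // Descending order by total traffic. This never returns 0, which is
        // not a strict Comparable implementation, but it means two beans with
        // equal totals are still treated as distinct keys, so each phone
        // reaches the reducer in its own group.
        return this.sumFlow > o.getSumFlow() ? -1 : 1;
    }
}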
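The write and readFields methods only work if both sides agree on the field order. The following sketch shows that round trip with the standard java.io streams, which implement the same DataOutput and DataInput interfaces that the Hadoop methods receive. The class `FlowRoundTrip` is a hypothetical stand-in written for this illustration, not part of the job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class FlowRoundTrip {
    /** Serialize three longs in a fixed order: up, down, sum. */
    static byte[] write(long up, long down, long sum) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(up);
            out.writeLong(down);
            out.writeLong(sum);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Read the fields back in exactly the order they were written. */
    static long[] read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long up = in.readLong();
            long down = in.readLong();
            long sum = in.readLong();
            return new long[] {up, down, sum};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(100L, 200L, 300L);
        long[] fields = read(data);
        // Each long occupies 8 bytes, so three fields take 24 bytes.
        System.out.println(data.length + " bytes: "
                + fields[0] + "\t" + fields[1] + "\t" + fields[2]);
    }
}
```

If the read order differed from the write order, the values would silently end up in the wrong fields, since the stream carries no field names.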

The mapper, the reducer, and the job driver, combined in a single class:

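import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FlowCountSort {

    /*
     * This is the second step of the flow summary and sort task.
     * Its input is the output of the summary step.
     */
    public static class FlowCountSortMapper extends Mapper<LongWritable, Text, FlowBean, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line has the form: phone \t upFlow \t downFlow \t sumFlow.
            // String.split is used here in place of the original StringUtils.split call.
            String[] fields = value.toString().split("\t");
            String phone = fields[0];
            long upFlow = Long.parseLong(fields[1]);
            long downFlow = Long.parseLong(fields[2]);

            // Use the flow information as the key so that the framework's
            // sort on keys produces the order we need.
            context.write(new FlowBean(upFlow, downFlow), new Text(phone));
        }
    }

    public static class FlowCountSortReducer extends Reducer<FlowBean, Text, Text, FlowBean> {

        /*
         * In the sort step the reducer does no aggregation. Each call to
         * reduce normally receives a single key/value pair; looping over the
         * values keeps every phone even if several share one group.
         */
        @Override
        protected void reduce(FlowBean key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text phone : values) {
                context.write(phone, key);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());

        job.setJarByClass(FlowCountSort.class);

        job.setMapperClass(FlowCountSortMapper.class);
        job.setReducerClass(FlowCountSortReducer.class);

        job.setMapOutputKeyClass(FlowBean.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);

        /*
         * Note: the input data here is the output of the previous
         * flow summary step.
         */
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}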
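The reducer comment relies on every reduce call seeing exactly one key/value pair. That holds as long as no two keys compare as equal. The sketch below is not Hadoop code: it uses a TreeMap as a rough stand-in for the sort-and-group phase, with a hypothetical class name and sample rows, to show what grouping looks like when equal totals do compare as equal and why emitting every value in a group is the safer choice.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupingSketch {
    /**
     * Mimics sort-and-group on the key: totals are ordered largest first,
     * and phones whose totals compare as equal share one group.
     */
    static List<String> sortAndEmit(String[][] rows) {
        TreeMap<Long, List<String>> groups = new TreeMap<>(Comparator.reverseOrder());
        for (String[] row : rows) {
            long total = Long.parseLong(row[1]) + Long.parseLong(row[2]);
            groups.computeIfAbsent(total, k -> new ArrayList<>()).add(row[0]);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<Long, List<String>> group : groups.entrySet()) {
            // Emit every phone in the group, not just the first one.
            for (String phone : group.getValue()) {
                output.add(phone + "\t" + group.getKey());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical rows: phone, upstream, downstream.
        String[][] rows = {
            {"13800000001", "100", "200"},
            {"13800000002", "500", "900"},
            {"13800000003", "200", "100"}, // same total as the first phone
        };
        sortAndEmit(rows).forEach(System.out::println);
    }
}
```

Taking only the first value of each group would drop the third phone in this sample, because it shares a total of 300 with the first.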


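3: Procedure

The procedure is similar to the one in "MapReduce Example: Partitioned Flow Summary".

Note: when running the jar file, the data file to process must be the output of the previous flow summary step.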


Result:

