Hadoop 1.x MapReduce: Implementing a Secondary Sort with the WritableComparable Interface
Source: Internet | Editor: 程序博客网 | Date: 2024/05/16 19:28
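1. Introduction
The goal is to sort two-column data with MapReduce: ascending by the first column and, when the first column is equal, ascending by the second column.

Input:
3 3
3 2
3 1
2 2
2 1
1 1
-------------------------------------
Expected result:
1 1
2 1
2 2
3 1
3 2
3 3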
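Main idea:
The <key,value> pairs emitted by map are sorted by key only; the value takes no part in the sort. We therefore define a custom key that holds both columns and implements the WritableComparable interface. See the NewK2 class in the code for the details.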
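The ordering rule can be tried outside Hadoop first. The following is only a minimal sketch in plain Java: `PairSortSketch`, `Pair`, and `sortRows` are hypothetical names introduced for illustration, and `Collections.sort` stands in for the sort that the framework performs on map output keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PairSortSketch {
    /** Hypothetical stand-in for the custom key: it carries both columns. */
    static final class Pair implements Comparable<Pair> {
        final long first;
        final long second;

        Pair(long first, long second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public int compareTo(Pair o) {
            // Order by the first column; fall back to the second on a tie.
            if (first != o.first) {
                return first < o.first ? -1 : 1;
            }
            if (second != o.second) {
                return second < o.second ? -1 : 1;
            }
            return 0;
        }

        @Override
        public String toString() {
            return first + " " + second;
        }
    }

    /** Sorts the rows by (first, second) and returns them as text lines. */
    public static List<String> sortRows(long[][] rows) {
        List<Pair> pairs = new ArrayList<Pair>();
        for (long[] row : rows) {
            pairs.add(new Pair(row[0], row[1]));
        }
        Collections.sort(pairs);
        List<String> out = new ArrayList<String>();
        for (Pair p : pairs) {
            out.add(p.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Same rows as the sample input above.
        long[][] input = {{3, 3}, {3, 2}, {3, 1}, {2, 2}, {2, 1}, {1, 1}};
        for (String line : sortRows(input)) {
            System.out.println(line);
        }
    }
}
```

Because both columns live in the key, a single `compareTo` decides the full order, which is the same reason the MapReduce job moves the second column out of the value and into the key.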
2. Code
package sort;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class sort {
    static final String INPUT_PATH = "hdfs://hadoop1:9000/input";
    static final String OUT_PATH = "hdfs://hadoop1:9000/out";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        // Delete a leftover output directory so the job can be rerun
        if (fileSystem.exists(new Path(OUT_PATH))) {
            fileSystem.delete(new Path(OUT_PATH), true);
        }
        final Job job = new Job(conf, sort.class.getSimpleName());
        // Input directory
        FileInputFormat.setInputPaths(job, new Path(INPUT_PATH));
        // Class that parses the input data
        job.setInputFormatClass(TextInputFormat.class);
        // Custom Mapper class
        job.setMapperClass(MyMapper.class);
        // Key and value types emitted by the Mapper
        job.setMapOutputKeyClass(NewK2.class);
        job.setMapOutputValueClass(LongWritable.class);
        // Partitioning: a single reducer yields one globally sorted output
        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1);
        // Custom Reducer class and its output types
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);
        // Output directory
        FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
        // Class that formats the output
        job.setOutputFormatClass(TextOutputFormat.class);
        // Submit the whole job to the JobTracker and wait for it to finish
        job.waitForCompletion(true);
    }

    static class MyMapper extends Mapper<LongWritable, Text, NewK2, LongWritable> {
        @Override
        protected void map(LongWritable key, Text v1, Context context)
                throws IOException, InterruptedException {
            // Each input line holds two tab-separated numbers
            String[] splited = v1.toString().split("\t");
            final long k2Long = Long.parseLong(splited[0]);
            final long v2Long = Long.parseLong(splited[1]);
            // Put both columns into the key so that both take part in sorting
            NewK2 k2 = new NewK2(k2Long, v2Long);
            context.write(k2, new LongWritable(v2Long));
        }
    }

    static class MyReducer extends Reducer<NewK2, LongWritable, LongWritable, LongWritable> {
        @Override
        protected void reduce(NewK2 k2, Iterable<LongWritable> v2s, Context context)
                throws IOException, InterruptedException {
            // Keys arrive already sorted; identical (first, second) rows share
            // one reduce call, so duplicate rows are written only once
            context.write(new LongWritable(k2.first), new LongWritable(k2.second));
        }
    }

    static class NewK2 implements WritableComparable<NewK2> {
        Long first;  // first column
        Long second; // second column

        // No-argument constructor required by Hadoop for deserialization
        public NewK2() {}

        public NewK2(Long first, Long second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            this.first = in.readLong();
            this.second = in.readLong();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(first);
            out.writeLong(second);
        }

        /**
         * Called when keys are sorted.
         * Ascending by the first column; when the first columns are equal,
         * ascending by the second column.
         */
        @Override
        public int compareTo(NewK2 o) {
            // Long.compareTo avoids the overflow of casting a long difference to int
            final int cmp = this.first.compareTo(o.first);
            if (cmp != 0) {
                return cmp;
            }
            return this.second.compareTo(o.second);
        }

        @Override
        public int hashCode() {
            return this.first.hashCode() + this.second.hashCode();
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof NewK2)) {
                return false;
            }
            NewK2 ok2 = (NewK2) obj;
            // Compare values with equals(); == on Long compares object references
            return this.first.equals(ok2.first) && this.second.equals(ok2.second);
        }
    }
}
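One thing to watch in a `compareTo` over `long` fields: computing a difference and casting it to `int` can lose the sign or collapse to zero once the values are far enough apart. The sketch below illustrates this in plain Java; the class and method names are hypothetical and not part of the job above.

```java
public class CompareOverflowSketch {
    /** Subtraction-and-cast comparison; the cast keeps only the low 32 bits. */
    public static int compareBySubtraction(long a, long b) {
        return (int) (a - b);
    }

    /** Explicit comparison; correct for any pair of long values. */
    public static int compareSafely(long a, long b) {
        return a < b ? -1 : (a == b ? 0 : 1);
    }

    public static void main(String[] args) {
        long small = 0L;
        long large = 1L << 32; // 4294967296, whose low 32 bits are all zero

        // The difference is -4294967296; truncated to int it becomes 0,
        // so two different values look equal.
        System.out.println(compareBySubtraction(small, large)); // prints 0

        // The explicit comparison reports small < large.
        System.out.println(compareSafely(small, large)); // prints -1
    }
}
```

For small inputs such as the sample data the two approaches agree, so the problem only shows up with values whose difference does not fit in an `int`.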