Secondary Sort in Hadoop with a Custom Writable
Source: Internet  Editor: 程序博客网  Time: 2024/04/28 18:12
Input data set
20,75,cqu
25,90,cqnu
20,70,cqupt
24,80,cquk
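What the secondary sort does
Records are sorted by the number in the first column; records with the same first column are then sorted by the number in the second column.
Output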
20,70 cqupt
20,75 cqu
24,80 cquk
25,90 cqnu
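How it works
MapReduce sorts the map output by key before it reaches the reducer, so we can define a custom key that holds both the first and the second column. Its compareTo method compares the first column first and falls back to the second column only when the first columns are equal.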
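The ordering rule described above can be tried out without a cluster. The following is a minimal sketch in plain Java, not the Hadoop job itself: the class `PairOrderSketch`, its nested `Pair` type, and the helper `sortLines` are hypothetical names introduced only for illustration, and `Collections.sort` stands in for the sort that MapReduce performs on keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PairOrderSketch {

    // Hypothetical stand-in for the composite key: the two numeric columns
    // plus the third column carried along as the value.
    static final class Pair implements Comparable<Pair> {
        final int first;
        final int second;
        final String name;

        Pair(int first, int second, String name) {
            this.first = first;
            this.second = second;
            this.name = name;
        }

        @Override
        public int compareTo(Pair o) {
            // Compare the first column; use the second column only on a tie.
            int result = Integer.compare(first, o.first);
            if (result != 0) {
                return result;
            }
            return Integer.compare(second, o.second);
        }

        @Override
        public String toString() {
            return first + "," + second + " " + name;
        }
    }

    // Parses "first,second,name" lines, sorts them, and formats the result.
    static List<String> sortLines(List<String> lines) {
        List<Pair> pairs = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            pairs.add(new Pair(Integer.parseInt(fields[0].trim()),
                               Integer.parseInt(fields[1].trim()),
                               fields[2].trim()));
        }
        Collections.sort(pairs); // uses Pair.compareTo
        List<String> sorted = new ArrayList<>();
        for (Pair p : pairs) {
            sorted.add(p.toString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "20,75,cqu", "25,90,cqnu", "20,70,cqupt", "24,80,cquk");
        for (String line : sortLines(input)) {
            System.out.println(line);
        }
    }
}
```

Run on the sample input, the two records with 20 in the first column come first, ordered 70 and then 75 by the second column, followed by the 24 and 25 records.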
Implementation
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TestJoin {

    // Composite key holding the first and second column.
    public static class IntPairWritable implements WritableComparable<IntPairWritable> {
        private IntWritable first;
        private IntWritable second;

        // Hadoop creates keys through the no-arg constructor and then calls
        // readFields, so both fields must be non-null here.
        public IntPairWritable() {
            set(new IntWritable(), new IntWritable());
        }

        public void set(IntWritable first, IntWritable second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            this.first.write(out);
            this.second.write(out);
        }

        // Must read the fields in the same order that write emits them.
        @Override
        public void readFields(DataInput in) throws IOException {
            this.first.readFields(in);
            this.second.readFields(in);
        }

        // Sort by the first column, then by the second column.
        @Override
        public int compareTo(IntPairWritable o) {
            int result = this.first.compareTo(o.first);
            if (result != 0) {
                return result;
            }
            return this.second.compareTo(o.second);
        }

        @Override
        public boolean equals(Object o) {
            if (o instanceof IntPairWritable) {
                IntPairWritable obj = (IntPairWritable) o;
                return this.first.equals(obj.first) && this.second.equals(obj.second);
            }
            return false;
        }

        // Added to stay consistent with equals; the default HashPartitioner
        // relies on hashCode when the job runs with more than one reducer.
        @Override
        public int hashCode() {
            return this.first.hashCode() * 163 + this.second.hashCode();
        }

        @Override
        public String toString() {
            return this.first.toString() + "," + this.second.toString();
        }
    }

    // Emits (first column, second column) as the key and the third column as the value.
    public static class SecondSortMapper extends Mapper<LongWritable, Text, IntPairWritable, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] array = value.toString().split(",");
            IntPairWritable keyPair = new IntPairWritable();
            keyPair.set(new IntWritable(Integer.parseInt(array[0].trim())),
                        new IntWritable(Integer.parseInt(array[1].trim())));
            context.write(keyPair, new Text(array[2]));
        }
    }

    // Keys arrive already sorted, so the reducer only writes them back out.
    public static class TestJoinReducer extends Reducer<IntPairWritable, Text, IntPairWritable, Text> {
        @Override
        public void reduce(IntPairWritable key, Iterable<Text> value, Context context)
                throws IOException, InterruptedException {
            for (Text v : value) {
                context.write(key, v);
            }
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        // The original checked args.length < 2 but read the output path from
        // args[2]; here the job takes exactly two arguments: input and output.
        if (args.length < 2) {
            System.err.println("Usage: TestJoin <input path> <output path>");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "TestJoin");
        job.setJarByClass(TestJoin.class);
        job.setMapperClass(SecondSortMapper.class);
        job.setReducerClass(TestJoinReducer.class);
        job.setOutputKeyClass(IntPairWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
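Notes
A custom Writable that is used as a key must implement three methods: readFields, write, and compareTo.
In addition, override toString so that the key is written to the output as readable text.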
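The relationship between write and readFields can be checked with a small round trip. This is a minimal sketch in plain Java that mirrors the two methods on `java.io.DataOutput` and `java.io.DataInput` without depending on Hadoop; `RoundTripSketch`, its nested `IntPair` class, and the helper `roundTrip` are hypothetical names used only here. It assumes each column is stored as a 4-byte int, which is how `IntWritable` serializes its value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RoundTripSketch {

    // Hypothetical plain-Java analogue of the custom key.
    static final class IntPair {
        int first;
        int second;

        // No-arg constructor: the framework creates an empty key first
        // and fills it in through readFields.
        IntPair() {
        }

        IntPair(int first, int second) {
            this.first = first;
            this.second = second;
        }

        void write(DataOutput out) throws IOException {
            out.writeInt(first);
            out.writeInt(second);
        }

        // Reads the fields in exactly the order write emitted them.
        void readFields(DataInput in) throws IOException {
            first = in.readInt();
            second = in.readInt();
        }

        @Override
        public String toString() {
            return first + "," + second;
        }
    }

    // Serializes a pair to bytes, reads it back into a fresh object,
    // and returns the text form of the copy.
    static String roundTrip(int first, int second) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            new IntPair(first, second).write(new DataOutputStream(buffer));

            IntPair copy = new IntPair();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(20, 75));
    }
}
```

If readFields read the two fields in the opposite order from write, the copy would come back with its columns swapped, which is why the two methods have to be kept in step.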