Hadoop: Maximum Temperature
Source: Internet  Editor: 程序博客网  Date: 2024/04/28 21:40
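Chapter 8, Section 2 of Hadoop: The Definitive Guide presents an example that finds the maximum temperature with MapReduce.
Below is the code after a few simple modifications of my own: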
Input (one record per line: year, then temperature, tab-separated):
1995 10
1996 10
1995 5
1999 20
1999 10
1996 3
Expected output:
1995 10
1996 10
1999 20
// InitPair.java: custom composite key holding (year, temperature)
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class InitPair implements WritableComparable<InitPair> {
    private int year; // year
    private int tmp;  // temperature

    // Hadoop needs a no-argument constructor to deserialize keys.
    public InitPair() {
    }

    public InitPair(int year, int tmp) {
        this.year = year;
        this.tmp = tmp;
    }

    public int getYear() {
        return year;
    }

    public void setYear(int year) {
        this.year = year;
    }

    public int getTmp() {
        return tmp;
    }

    public void setTmp(int tmp) {
        this.tmp = tmp;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeInt(tmp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.year = in.readInt();
        this.tmp = in.readInt();
    }

    // The natural ordering compares the year only.
    @Override
    public int compareTo(InitPair o) {
        return Integer.compare(this.year, o.year);
    }

    // Two keys are equal only when both year and temperature match.
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof InitPair)) {
            return false;
        }
        InitPair o = (InitPair) obj;
        return this.year == o.year && this.tmp == o.tmp;
    }

    // Kept consistent with equals().
    @Override
    public int hashCode() {
        return 31 * year + tmp;
    }

    @Override
    public String toString() {
        return this.year + " " + this.tmp;
    }
}

// MaxTempByTwoSort.java: secondary sort
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTempByTwoSort {

    static class maxTempMap extends Mapper<LongWritable, Text, InitPair, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Simple parsing: year <TAB> temperature
            String[] line = value.toString().split("\\t");
            if (line.length == 2) {
                // The key is the custom class; the value is null because this example does not use it.
                context.write(new InitPair(Integer.parseInt(line[0]), Integer.parseInt(line[1])),
                        NullWritable.get());
            }
        }
    }

    static class maxTempReduce extends Reducer<InitPair, NullWritable, InitPair, NullWritable> {
        @Override
        protected void reduce(InitPair key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            // Just emit the key: it is the first (highest-temperature) key of the group.
            context.write(key, NullWritable.get());
        }
    }

    // Partitioning
    static class FirstPartition extends Partitioner<InitPair, NullWritable> {
        @Override
        public int getPartition(InitPair key, NullWritable value, int numPartitions) {
            // Partition by year only.
            return Math.abs(key.getYear() * 127) % numPartitions;
        }
    }

    // Full sort order: year ascending first, then temperature descending.
    static class KeyComparator extends WritableComparator {
        protected KeyComparator() {
            super(InitPair.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            InitPair ip1 = (InitPair) a;
            InitPair ip2 = (InitPair) b;
            int cmp = ip1.compareTo(ip2);
            if (cmp != 0) {
                return cmp;
            }
            // Higher temperature sorts first; equal temperatures compare as 0.
            return Integer.compare(ip2.getTmp(), ip1.getTmp());
        }
    }

    // Grouping rule: group by year only, so records with the same year but
    // different temperatures are treated as the same key.
    static class GroupComparator extends WritableComparator {
        protected GroupComparator() {
            super(InitPair.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            InitPair ip1 = (InitPair) a;
            InitPair ip2 = (InitPair) b;
            return ip1.compareTo(ip2);
        }
    }

    public static void main(String[] arg)
            throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(MaxTempByTwoSort.class);
        // map
        job.setMapperClass(maxTempMap.class);
        job.setPartitionerClass(FirstPartition.class);
        job.setSortComparatorClass(KeyComparator.class);
        job.setGroupingComparatorClass(GroupComparator.class);
        // reduce
        job.setReducerClass(maxTempReduce.class);
        job.setOutputKeyClass(InitPair.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(arg[0]));
        FileOutputFormat.setOutputPath(job, new Path(arg[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
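To see why partitioning by year alone matters, here is a minimal sketch that applies the same arithmetic as FirstPartition.getPartition to the sample records, without any Hadoop types. The class name YearPartitionSketch, the helper partitionFor, and the reducer count of 3 are assumptions made for illustration; they are not part of the original job.

```java
// Hypothetical, Hadoop-free illustration of the partitioning rule.
class YearPartitionSketch {

    // Same arithmetic as FirstPartition.getPartition: only the year is used.
    static int partitionFor(int year, int numPartitions) {
        return Math.abs(year * 127) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3; // assumed reducer count, for illustration only
        int[][] records = {
            {1995, 10}, {1995, 5}, {1996, 10}, {1996, 3}, {1999, 20}, {1999, 10}
        };
        for (int[] r : records) {
            // The temperature (r[1]) never influences the partition number.
            System.out.println(r[0] + " " + r[1] + " -> partition "
                    + partitionFor(r[0], numPartitions));
        }
    }
}
```

Because the temperature is ignored, every record for a given year lands in the same partition, which is what lets a single reducer see all of that year's temperatures.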
Output:
1995 10
1996 10
1999 20
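Summary: what confused me when I first read this example was how grouping, followed by a reduce that simply writes its key, could produce each year's maximum temperature.
At the time I did not really understand GroupComparator. Take <<1995,20>,null> and <<1995,10>,null>: KeyComparator has already sorted them so that the higher temperature comes first,
and because the grouping rule looks only at the year, records with the same year are treated as one group and go to a single reduce() call. The key that call receives is the first one in the group, <<1995,20>,null>.
So the reducer naturally outputs <1995,20> from <<1995,20>,null>.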
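The interaction between the two comparators can be imitated in plain Java, with no Hadoop involved. The sketch below is only a simulation under stated assumptions: the class SecondarySortSketch and the method firstKeyPerGroup are hypothetical names, records are modelled as int[]{year, temperature}, and SORT_ORDER and GROUP_ORDER stand in for KeyComparator and GroupComparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation of sort + grouping; not the Hadoop shuffle itself.
class SecondarySortSketch {

    // Stands in for KeyComparator: year ascending, then temperature descending.
    static final Comparator<int[]> SORT_ORDER = (a, b) -> {
        int byYear = Integer.compare(a[0], b[0]);
        return byYear != 0 ? byYear : Integer.compare(b[1], a[1]);
    };

    // Stands in for GroupComparator: only the year decides group membership.
    static final Comparator<int[]> GROUP_ORDER = (a, b) -> Integer.compare(a[0], b[0]);

    // Sorts the records, then keeps the first key of every group.
    static List<String> firstKeyPerGroup(List<int[]> records) {
        List<int[]> sorted = new ArrayList<>(records);
        sorted.sort(SORT_ORDER);
        List<String> out = new ArrayList<>();
        int[] previous = null;
        for (int[] current : sorted) {
            // A new group starts whenever the grouping comparator sees a difference.
            if (previous == null || GROUP_ORDER.compare(previous, current) != 0) {
                out.add(current[0] + " " + current[1]);
            }
            previous = current;
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> input = Arrays.asList(
                new int[]{1995, 10}, new int[]{1996, 10}, new int[]{1995, 5},
                new int[]{1999, 20}, new int[]{1999, 10}, new int[]{1996, 3});
        firstKeyPerGroup(input).forEach(System.out::println);
    }
}
```

Run on the sample input, this prints the same three lines as the job's output. In the real job the reducer writes its key before iterating over the values, which is why it emits the first key of the group, the one with the highest temperature.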