MapReduce custom grouping
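This example sums the upstream and downstream traffic per phone number, and uses a custom Partitioner to route each phone-number prefix to its own reduce task, so that every area's totals land in a separate output file.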
//-------------------FlowSumArea.java-------------
package pack4;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import pack2.FlowBean;

public class FlowSumArea {

    public static class FlowSumAreaMapper extends Mapper<LongWritable, Text, Text, FlowBean> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line holds whitespace-separated fields: field 1 is the
            // phone number, fields 7 and 8 are the upstream and downstream flow.
            String line = value.toString();
            String[] fields = line.split("\\s+");
            String phoneNB = fields[1];
            long u_flow = Long.parseLong(fields[7]);
            long d_flow = Long.parseLong(fields[8]);
            context.write(new Text(phoneNB), new FlowBean(phoneNB, u_flow, d_flow));
        }
    }

    public static class FlowSumAreaReducer extends Reducer<Text, FlowBean, Text, FlowBean> {
        @Override
        protected void reduce(Text key, Iterable<FlowBean> values, Context context)
                throws IOException, InterruptedException {
            // Sum the upstream and downstream flow over all records of one phone number.
            long u_flow_count = 0;
            long d_flow_count = 0;
            for (FlowBean bean : values) {
                u_flow_count += bean.getU_flow();
                d_flow_count += bean.getD_flow();
            }
            context.write(key, new FlowBean(key.toString(), u_flow_count, d_flow_count));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Which jar contains the classes this job uses.
        job.setJarByClass(FlowSumArea.class);

        // The mapper and reducer classes for this job.
        job.setMapperClass(FlowSumAreaMapper.class);
        job.setReducerClass(FlowSumAreaReducer.class);

        // Route map output through the custom partitioner, with one reduce task
        // per partition (prefixes 135-139 plus one catch-all).
        job.setPartitionerClass(AreaPartitioner.class);
        job.setNumReduceTasks(6);

        // The mapper's output key/value types (must match FlowSumAreaMapper's generics).
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(FlowBean.class);

        // The reducer's output key/value types.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);

        // Path of the raw input data.
        FileInputFormat.setInputPaths(job, new Path("/flow/src"));
        // Path for the results.
        FileOutputFormat.setOutputPath(job, new Path("/flow/out15"));

        // Submit the job and exit with its status.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
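The job imports pack2.FlowBean, which this post never shows. Below is a minimal sketch of what it presumably looks like: a custom Writable carrying the phone number plus upstream and downstream flow. The field layout, the UTF serialization, and the toString() output format are assumptions inferred from the calls above, not the original class.

//-------------------FlowBean.java (sketch; not shown in the original post)-------------
package pack2;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class FlowBean implements Writable {

    private String phoneNB;
    private long u_flow;
    private long d_flow;

    // Hadoop requires a no-arg constructor for deserialization.
    public FlowBean() {}

    public FlowBean(String phoneNB, long u_flow, long d_flow) {
        this.phoneNB = phoneNB;
        this.u_flow = u_flow;
        this.d_flow = d_flow;
    }

    public long getU_flow() { return u_flow; }
    public long getD_flow() { return d_flow; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(phoneNB);
        out.writeLong(u_flow);
        out.writeLong(d_flow);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        phoneNB = in.readUTF();
        u_flow = in.readLong();
        d_flow = in.readLong();
    }

    // Assumed output format: upstream, downstream, and total flow.
    @Override
    public String toString() {
        return u_flow + "\t" + d_flow + "\t" + (u_flow + d_flow);
    }
}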
//-------------------AreaPartitioner.java-------------
package pack4;

import java.util.HashMap;

import org.apache.hadoop.mapreduce.Partitioner;

public class AreaPartitioner<KEY, VALUE> extends Partitioner<KEY, VALUE> {

    // Maps a phone-number prefix to a partition (reduce task) number.
    private static HashMap<String, Integer> areaMap = new HashMap<>();

    static {
        areaMap.put("135", 0);
        areaMap.put("136", 1);
        areaMap.put("137", 2);
        areaMap.put("138", 3);
        areaMap.put("139", 4);
    }

    @Override
    public int getPartition(KEY key, VALUE value, int numPartitions) {
        // Look up the first three digits of the phone number; any unknown
        // prefix falls into the catch-all partition 5.
        Integer areaCode = areaMap.get(key.toString().substring(0, 3));
        return areaCode == null ? 5 : areaCode;
    }
}
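With setNumReduceTasks(6) in the driver, prefixes 135 through 139 map to partitions 0 through 4 and every other prefix falls into partition 5, so each area's summed flow is written to its own part-r-0000N file. The snippet below is a hypothetical sanity check of that mapping; the class name and sample phone numbers are made up for illustration, not from the original post.

//-------------------PartitionCheck.java (hypothetical check, not part of the job)-------------
package pack4;

import org.apache.hadoop.io.Text;

import pack2.FlowBean;

public class PartitionCheck {
    public static void main(String[] args) {
        AreaPartitioner<Text, FlowBean> partitioner = new AreaPartitioner<>();
        // Known prefix 135 -> partition 0.
        System.out.println(partitioner.getPartition(new Text("13502887823"), null, 6));
        // Unknown prefix 150 -> catch-all partition 5.
        System.out.println(partitioner.getPartition(new Text("15013685858"), null, 6));
    }
}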