Hadoop MapReduce: Partitioner Programming
I. Problem Description
Building on the Hadoop serialization example (http://blog.csdn.net/gaijianwei/article/details/46004025), partition the output records by the mobile carrier that each phone number belongs to.
II. Implementation
DataCount code (only slightly modified from the DataCount class in the Hadoop serialization example):
package edu.jianwei.hadoop.mr;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DataCount {

    static class DCMapper extends Mapper<LongWritable, Text, Text, DataBean> {
        private Text k = new Text();
        private DataBean v = new DataBean();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line is tab-separated: field 1 is the phone number,
            // fields 8 and 9 are the upload and download byte counts.
            String line = value.toString();
            String[] words = line.split("\t");
            String telNum = words[1];
            double upLoad = Double.parseDouble(words[8]);
            double downLoad = Double.parseDouble(words[9]);
            k.set(telNum);
            v.Set(telNum, upLoad, downLoad);
            context.write(k, v);
        }
    }

    static class DCReduce extends Reducer<Text, DataBean, Text, DataBean> {
        private DataBean v = new DataBean();

        @Override
        protected void reduce(Text key, Iterable<DataBean> v2s, Context context)
                throws IOException, InterruptedException {
            // Sum the upload and download traffic for each phone number.
            double upTotal = 0;
            double downTotal = 0;
            for (DataBean d : v2s) {
                upTotal += d.getUpLoad();
                downTotal += d.getDownload();
            }
            v.Set("", upTotal, downTotal);
            context.write(key, v);
        }
    }

    public static class DCPartitioner extends Partitioner<Text, DataBean> {
        // Map phone-number prefixes to carrier partitions;
        // unknown prefixes fall through to partition 0.
        static Map<String, Integer> provider = new HashMap<String, Integer>();
        static {
            provider.put("139", 1);
            provider.put("138", 1);
            provider.put("152", 2);
            provider.put("153", 2);
            provider.put("182", 3);
            provider.put("183", 3);
        }

        @Override
        public int getPartition(Text k, DataBean value, int numPartitions) {
            String telSub = k.toString().substring(0, 3);
            Integer counter = provider.get(telSub);
            if (counter == null) {
                counter = 0;
            }
            return counter;
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(DataCount.class);

        job.setMapperClass(DCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DataBean.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));

        job.setReducerClass(DCReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DataBean.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Route map output through the custom partitioner; the number of
        // reduce tasks is taken from the third command-line argument.
        job.setPartitionerClass(DCPartitioner.class);
        job.setNumReduceTasks(Integer.parseInt(args[2]));

        job.waitForCompletion(true);
    }
}

The DataBean class is the same as in the Hadoop serialization example.
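Since that class is only linked, the sketch below reconstructs what DataBean presumably looks like, based on how DataCount uses it (the Set(telNum, upLoad, downLoad), getUpLoad() and getDownload() calls, and the three-column output shown in the test section). The field names, serialization order, and toString() layout are assumptions, not the original code.

package edu.jianwei.hadoop.mr;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Assumed reconstruction of DataBean from the serialization example.
public class DataBean implements Writable {
    private String telNum;
    private double upLoad;
    private double downLoad;
    private double total;

    // Matches the v.Set(telNum, upLoad, downLoad) calls in DataCount.
    public void Set(String telNum, double upLoad, double downLoad) {
        this.telNum = telNum;
        this.upLoad = upLoad;
        this.downLoad = downLoad;
        this.total = upLoad + downLoad;
    }

    public double getUpLoad() {
        return upLoad;
    }

    public double getDownload() {
        return downLoad;
    }

    // Serialization order must match deserialization order exactly.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(telNum);
        out.writeDouble(upLoad);
        out.writeDouble(downLoad);
        out.writeDouble(total);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.telNum = in.readUTF();
        this.upLoad = in.readDouble();
        this.downLoad = in.readDouble();
        this.total = in.readDouble();
    }

    // Produces the "upLoad <tab> downLoad <tab> total" columns
    // seen in the part-r-00001 sample below.
    @Override
    public String toString() {
        return upLoad + "\t" + downLoad + "\t" + total;
    }
}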
III. Testing
1. Running the job (with 4 reduce tasks)
hadoop jar /root/dc.jar edu.jianwei.hadoop.mr.DataCount /dc /dc/res 4
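The three arguments are the input path, the output path, and the reduce-task count. Because DCPartitioner returns partition numbers 0 through 3, running with 4 reduce tasks produces exactly one output file per partition, which can be listed with:

hadoop fs -ls /dc/res

part-r-00000 holds the numbers with unmapped prefixes, and part-r-00001 through part-r-00003 hold the three carriers.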
2. Results
The output files are not all listed here; as an example, the data in part-r-00001 (partition 1, prefixes 138 and 139):
13826544101 264.0 0.0 264.0
13922314466 3008.0 3720.0 6728.0
13925057413 11058.0 48243.0 59301.0
13926251106 240.0 0.0 240.0
13926435656 132.0 1512.0 1644.0
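Partition assignment can also be checked locally without running the whole job. The standalone sketch below feeds a few sample numbers through DCPartitioner; the sample numbers are illustrative, not taken from the dataset.

package edu.jianwei.hadoop.mr;

import org.apache.hadoop.io.Text;

// Standalone check of DCPartitioner: prints which partition each
// sample phone number is routed to. The DataBean argument is unused
// by getPartition, so null is passed.
public class PartitionCheck {
    public static void main(String[] args) {
        DataCount.DCPartitioner p = new DataCount.DCPartitioner();
        String[] samples = {"13926251106", "15212345678", "18312345678", "15912345678"};
        for (String tel : samples) {
            int part = p.getPartition(new Text(tel), null, 4);
            System.out.println(tel + " -> part-r-0000" + part);
        }
    }
}

Running this prints partition 1 for the 139 number, 2 for 152, 3 for 183, and 0 for the unmapped 159 prefix.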
Note:
1'. Running the job (with 3 reduce tasks)
hadoop jar /root/dc.jar edu.jianwei.hadoop.mr.DataCount /dc/HTTP_20130313143750.dat /dc/res_3 3
2'. Results
With only 3 reduce tasks there is no reducer for partition 3, so records with prefixes 182/183 cannot be delivered; the map tasks are expected to fail with an "Illegal partition" error and the job aborts.
1''. Running the job (with 5 reduce tasks)
hadoop jar /root/dc.jar edu.jianwei.hadoop.mr.DataCount /dc/HTTP_20130313143750.dat /dc/res_5 5
2''. Results
With 5 reduce tasks the job succeeds, but the partitioner never returns 4, so the fifth reducer receives no keys and part-r-00004 comes out empty.
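One way to make the partitioner robust against a mismatched reduce-task count is to fold the partition number into the available range. This is an optional hardening sketch, not part of the original code; it would replace getPartition in DCPartitioner above:

// Optional hardening (not in the original code): clamp the partition
// number into [0, numPartitions) so the job cannot fail with an
// "Illegal partition" error when fewer reduce tasks are configured.
@Override
public int getPartition(Text k, DataBean value, int numPartitions) {
    Integer counter = provider.get(k.toString().substring(0, 3));
    if (counter == null) {
        counter = 0;
    }
    return counter % numPartitions;
}

The trade-off is that with fewer reduce tasks than partitions, different carriers share an output file instead of the job failing outright.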