Hadoop Reading Notes (6): A MapReduce Custom Data Type Demo
Hadoop Reading Notes (1): Introduction to Hadoop: http://blog.csdn.net/caicongyang/article/details/39898629
Hadoop Reading Notes (2): HDFS Shell Operations: http://blog.csdn.net/caicongyang/article/details/41253927
Hadoop Reading Notes (3): Operating HDFS with the Java API: http://blog.csdn.net/caicongyang/article/details/41290955
Hadoop Reading Notes (4): HDFS Architecture: http://blog.csdn.net/caicongyang/article/details/41322649
Hadoop Reading Notes (5): A MapReduce Word Count Demo: http://blog.csdn.net/caicongyang/article/details/41453579
1. Demo Description
Compute per-phone-number mobile traffic statistics from a given log file.
2. Log File
1363157985066	13726230503	00-FD-07-A4-72-B8:CMCC	120.196.100.82	i02.c.aliimg.com		24	27	2481	24681	200
1363157995052	13826544101	5C-0E-8B-C7-F1-E0:CMCC	120.197.40.4			4	0	264	0	200
1363157991076	13926435656	20-10-7A-28-CC-0A:CMCC	120.196.100.99			2	4	132	1512	200
1363154400022	13926251106	5C-0E-8B-8B-B1-50:CMCC	120.197.40.4			4	0	240	0	200
1363157993044	18211575961	94-71-AC-CD-E6-18:CMCC-EASY	120.196.100.99	iface.qiyi.com	视频网站	15	12	1527	2106	200
1363157995074	84138413	5C-0E-8B-8C-E8-20:7DaysInn	120.197.40.4	122.72.52.12		20	16	4116	1432	200
1363157993055	13560439658	C4-17-FE-BA-DE-D9:CMCC	120.196.100.99			18	15	1116	954	200
1363157995033	15920133257	5C-0E-8B-C7-BA-20:CMCC	120.197.40.4	sug.so.360.cn	信息安全	20	20	3156	2936	200
1363157983019	13719199419	68-A1-B7-03-07-B1:CMCC-EASY	120.196.100.82			4	0	240	0	200
1363157984041	13660577991	5C-0E-8B-92-5C-20:CMCC-EASY	120.197.40.4	s19.cnzz.com	站点统计	24	9	6960	690	200
1363157973098	15013685858	5C-0E-8B-C7-F7-90:CMCC	120.197.40.4	rank.ie.sogou.com	搜索引擎	28	27	3659	3538	200
1363157986029	15989002119	E8-99-C4-4E-93-E0:CMCC-EASY	120.196.100.99	www.umeng.com	站点统计	3	3	1938	180	200
1363157992093	13560439658	C4-17-FE-BA-DE-D9:CMCC	120.196.100.99			15	9	918	4938	200
1363157986041	13480253104	5C-0E-8B-C7-FC-80:CMCC-EASY	120.197.40.4			3	3	180	180	200
1363157984040	13602846565	5C-0E-8B-8B-B6-00:CMCC	120.197.40.4	2052.flash2-http.qq.com	综合门户	15	12	1938	2910	200
1363157995093	13922314466	00-FD-07-A2-EC-BA:CMCC	120.196.100.82	img.qfc.cn		12	12	3008	3720	200
1363157982040	13502468823	5C-0A-5B-6A-0B-D4:CMCC-EASY	120.196.100.99	y0.ifengimg.com	综合门户	57	102	7335	110349	200
1363157986072	18320173382	84-25-DB-4F-10-1A:CMCC-EASY	120.196.100.99	input.shouji.sogou.com	搜索引擎	21	18	9531	2412	200
1363157990043	13925057413	00-1F-64-E1-E6-9A:CMCC	120.196.100.55	t3.baidu.com	搜索引擎	69	63	11058	48243	200
1363157988072	13760778710	00-FD-07-A4-7B-08:CMCC	120.196.100.82			2	2	120	120	200
1363157985079	13823070001	20-7C-8F-70-68-1F:CMCC	120.196.100.99			6	3	360	180	200
1363157985069	13600217502	00-1F-64-E2-E8-B1:CMCC	120.196.100.55			18	138	1080	186852	200
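Each record is eleven tab-separated fields: timestamp, phone number, MAC address, IP, visited host, site category, upstream packet count, downstream packet count, upstream bytes, downstream bytes, and HTTP status. The mapper below uses only index 1 (the phone number) and indices 6 through 9 (the four counters). A minimal standalone sketch of that split, using the second record above (the class name `KpiRecordDemo` is just for illustration):

```java
public class KpiRecordDemo {
    public static void main(String[] args) {
        // Second record from the log above; empty host/category fields keep their tabs
        String line = "1363157995052\t13826544101\t5C-0E-8B-C7-F1-E0:CMCC\t120.197.40.4\t\t\t4\t0\t264\t0\t200";
        String[] fields = line.split("\t");
        // Field 1: phone number; fields 6-9: upPackNum, downPackNum, upPayLoad, downPayLoad
        System.out.println("phone       = " + fields[1]); // 13826544101
        System.out.println("upPackNum   = " + fields[6]); // 4
        System.out.println("downPackNum = " + fields[7]); // 0
        System.out.println("upPayLoad   = " + fields[8]); // 264
        System.out.println("downPayLoad = " + fields[9]); // 0
    }
}
```

Note that `String.split("\t")` keeps the empty strings for the missing host and category fields, so the counter indices stay stable across records.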
3. Code
KpiApp.java
package mapReduce;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

/**
 * Title: KpiApp.java
 * Package: mapReduce
 * Description: per-phone-number traffic statistics
 *
 * @author Tom.Cai
 * @created 2014-11-25 22:23:33
 * @version V1.0
 */
public class KpiApp {
    private static final String INPUT_PATH = "hdfs://192.168.80.100:9000/wlan";
    private static final String OUT_PATH = "hdfs://192.168.80.100:9000/wlan_out";

    public static void main(String[] args) throws Exception {
        // Remove the output directory if it already exists, or the job will fail
        FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), new Configuration());
        Path outPath = new Path(OUT_PATH);
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }

        Job job = new Job(new Configuration(), KpiApp.class.getSimpleName());

        // Input: one log record per line
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        job.setInputFormatClass(TextInputFormat.class);

        // Map: phone number -> traffic record
        job.setMapperClass(KpiMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(KpiWite.class);

        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1);

        // Reduce: sum the traffic records per phone number
        job.setReducerClass(KpiReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(KpiWite.class);

        FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
        job.setOutputFormatClass(TextOutputFormat.class);

        job.waitForCompletion(true);
    }

    static class KpiMapper extends Mapper<LongWritable, Text, Text, KpiWite> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Fields are tab-separated; field 1 is the phone number,
            // fields 6-9 are upPackNum, downPackNum, upPayLoad, downPayLoad
            String[] splited = value.toString().split("\t");
            String num = splited[1];
            KpiWite kpi = new KpiWite(splited[6], splited[7], splited[8], splited[9]);
            context.write(new Text(num), kpi);
        }
    }

    static class KpiReducer extends Reducer<Text, KpiWite, Text, KpiWite> {
        @Override
        protected void reduce(Text key, Iterable<KpiWite> value, Context context) throws IOException, InterruptedException {
            long upPackNum = 0L;
            long downPackNum = 0L;
            long upPayLoad = 0L;
            long downPayLoad = 0L;
            for (KpiWite kpi : value) {
                upPackNum += kpi.upPackNum;
                downPackNum += kpi.downPackNum;
                upPayLoad += kpi.upPayLoad;
                downPayLoad += kpi.downPayLoad;
            }
            context.write(key, new KpiWite(String.valueOf(upPackNum), String.valueOf(downPackNum),
                    String.valueOf(upPayLoad), String.valueOf(downPayLoad)));
        }
    }
}

class KpiWite implements Writable {
    long upPackNum;
    long downPackNum;
    long upPayLoad;
    long downPayLoad;

    public KpiWite() {
    }

    public KpiWite(String upPackNum, String downPackNum, String upPayLoad, String downPayLoad) {
        this.upPackNum = Long.parseLong(upPackNum);
        this.downPackNum = Long.parseLong(downPackNum);
        this.upPayLoad = Long.parseLong(upPayLoad);
        this.downPayLoad = Long.parseLong(downPayLoad);
    }

    // Deserialization must read the fields in exactly the order write() emits them
    @Override
    public void readFields(DataInput in) throws IOException {
        this.upPackNum = in.readLong();
        this.downPackNum = in.readLong();
        this.upPayLoad = in.readLong();
        this.downPayLoad = in.readLong();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(upPackNum);
        out.writeLong(downPackNum);
        out.writeLong(upPayLoad);
        out.writeLong(downPayLoad);
    }

    // TextOutputFormat renders the reduce output value with toString(); without
    // this override the output file would contain default Object hash codes
    @Override
    public String toString() {
        return upPackNum + "\t" + downPackNum + "\t" + upPayLoad + "\t" + downPayLoad;
    }
}
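The heart of the demo is the `Writable` contract: `write()` serializes the fields between the map and reduce phases, and `readFields()` must read them back in exactly the same order. To make that concrete without a Hadoop cluster, here is a standalone round-trip sketch; `TrafficRecord` is a hypothetical class that mirrors the `write`/`readFields` pattern of `KpiWite` using only the JDK's `DataOutput`/`DataInput` streams:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableRoundTrip {
    // Mirrors KpiWite's serialization pattern, without the Hadoop dependency
    static class TrafficRecord {
        long upPackNum, downPackNum, upPayLoad, downPayLoad;

        void write(DataOutput out) throws IOException {
            out.writeLong(upPackNum);
            out.writeLong(downPackNum);
            out.writeLong(upPayLoad);
            out.writeLong(downPayLoad);
        }

        void readFields(DataInput in) throws IOException {
            // Same field order as write(), or the values would be scrambled
            upPackNum = in.readLong();
            downPackNum = in.readLong();
            upPayLoad = in.readLong();
            downPayLoad = in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        TrafficRecord a = new TrafficRecord();
        a.upPackNum = 24; a.downPackNum = 27; a.upPayLoad = 2481; a.downPayLoad = 24681;

        // Serialize to bytes, much as Hadoop does when shuffling map output
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        a.write(new DataOutputStream(buf));

        // Deserialize into a fresh object
        TrafficRecord b = new TrafficRecord();
        b.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));

        System.out.println(b.upPayLoad + " " + b.downPayLoad); // 2481 24681
    }
}
```

This is also why `KpiWite` needs the public no-argument constructor: the framework instantiates the class reflectively and then populates it through `readFields()`.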
Feel free to discuss and learn together!
Save this if you find it useful!
Record and share, and we grow together! You are welcome to check out my other posts at http://blog.csdn.net/caicongyang