Hadoop InputFormat classes --- an NLineInputFormat example
Source: Internet · Editor: 程序博客网 · Date: 2024/05/16 04:36
Introduction to NLineInputFormat
When a task reads text input, it needs an input format. NLineInputFormat is a concrete subclass of InputFormat, and the read format it defines is this:
- one line is one record;
- after reading, each record is represented as a (key, value) pair;
- as with the default TextInputFormat, the key is the byte offset of the line and the value is the entire line's contents;
- N is the number of records (lines) a single map task processes, i.e. each split handed to a map contains N lines;
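A consequence of the last point is that the number of map tasks follows directly from N: a file of L lines yields ceil(L / N) splits, hence that many mappers. A standalone sketch of that arithmetic (class and method names are illustrative, not Hadoop API):

```java
public class NLineSplitCount {
    // Number of splits NLineInputFormat creates for a file of totalLines lines
    // when each split holds up to linesPerMap lines: ceil(totalLines / linesPerMap).
    static int splitCount(int totalLines, int linesPerMap) {
        return (totalLines + linesPerMap - 1) / linesPerMap;
    }

    public static void main(String[] args) {
        // The sample file below has 6 records; with linespermap = 2
        // the job runs 3 map tasks.
        System.out.println(splitCount(6, 2)); // prints 3
    }
}
```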
Worked example
1. The data to process, file tradeinfoIn:
zhangsan@163.com 6000 0 2014-02-20
lisi@163.com 2000 0 2014-02-20
lisi@163.com 0 100 2014-02-20
zhangsan@163.com 3000 0 2014-02-20
wangwu@126.com 9000 0 2014-02-20
wangwu@126.com 0 200 2014-02-20
2. The format after the job reads the records in:
<0,zhangsan@163.com 6000 0 2014-02-20>
<35,lisi@163.com 2000 0 2014-02-20>
<67,lisi@163.com 0 100 2014-02-20>
<98,zhangsan@163.com 3000 0 2014-02-20>
<134,wangwu@126.com 9000 0 2014-02-20>
<168,wangwu@126.com 0 200 2014-02-20>
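Each key above is the byte offset of the first byte of its line, so consecutive keys differ by the previous line's byte length plus the line terminator. A standalone sketch of that assignment (helper names are illustrative; the exact offsets in a real file depend on its actual bytes, e.g. trailing spaces or \r\n line endings):

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetKeys {
    // Mimics how TextInputFormat/NLineInputFormat assign keys: the byte
    // offset of each line's first byte, assuming '\n' line terminators.
    static List<Long> offsetKeys(String[] lines) {
        List<Long> keys = new ArrayList<Long>();
        long offset = 0;
        for (String line : lines) {
            keys.add(offset);
            offset += line.getBytes().length + 1; // +1 for the '\n'
        }
        return keys;
    }

    public static void main(String[] args) {
        String[] lines = {
            "zhangsan@163.com 6000 0 2014-02-20",
            "lisi@163.com 2000 0 2014-02-20",
        };
        System.out.println(offsetKeys(lines)); // prints [0, 35]
    }
}
```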
3. Code
- Set the mapreduce.input.lineinputformat.linespermap property to tell the framework how many records (lines) each map task should process:
conf.setInt("mapreduce.input.lineinputformat.linespermap", 2);
- Set the job to read its input in NLineInputFormat format:
job.setInputFormatClass(NLineInputFormat.class);
package mapreduce.mr;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import mapreduce.bean.InfoBeanMy;

public class SumStepByTool extends Configured implements Tool {

    public static class SumStepByToolMapper extends Mapper<LongWritable, Text, Text, InfoBeanMy> {
        private InfoBeanMy outBean = new InfoBeanMy();
        private Text k = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] fields = line.split("\t");
            String account = fields[0];
            double income = Double.parseDouble(fields[1]);
            double expense = Double.parseDouble(fields[2]);
            outBean.setFields(account, income, expense);
            k.set(account);
            context.write(k, outBean);
        }
    }

    public static class SumStepByToolReducer extends Reducer<Text, InfoBeanMy, Text, InfoBeanMy> {
        private InfoBeanMy outBean = new InfoBeanMy();

        @Override
        protected void reduce(Text key, Iterable<InfoBeanMy> values, Context context)
                throws IOException, InterruptedException {
            double income_sum = 0;
            double expense_sum = 0;
            for (InfoBeanMy infoBeanMy : values) {
                income_sum += infoBeanMy.getIncome();
                expense_sum += infoBeanMy.getExpense();
            }
            outBean.setFields("", income_sum, expense_sum);
            context.write(key, outBean);
        }
    }

    // Defined here but never registered with job.setPartitionerClass(...),
    // so the job actually runs with the default HashPartitioner.
    public static class SumStepByToolPartitioner extends Partitioner<Text, InfoBeanMy> {
        private static Map<String, Integer> accountMap = new HashMap<String, Integer>();
        static {
            accountMap.put("zhangsan", 1);
            accountMap.put("lisi", 2);
            accountMap.put("wangwu", 3);
        }

        @Override
        public int getPartition(Text key, InfoBeanMy value, int numPartitions) {
            String keyString = key.toString();
            String name = keyString.substring(0, keyString.indexOf("@"));
            Integer part = accountMap.get(name);
            if (part == null) {
                part = 0;
            }
            return part;
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        conf.setInt("mapreduce.input.lineinputformat.linespermap", 2);
        Job job = Job.getInstance(conf);
        job.setJarByClass(this.getClass());
        job.setJobName("SumStepByTool");

        //job.setInputFormatClass(TextInputFormat.class);         // the default input format
        //job.setInputFormatClass(KeyValueTextInputFormat.class); // first field of each line as key, the rest as value
        job.setInputFormatClass(NLineInputFormat.class);

        job.setMapperClass(SumStepByToolMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(InfoBeanMy.class);

        job.setReducerClass(SumStepByToolReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(InfoBeanMy.class);
        job.setNumReduceTasks(3);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : -1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new SumStepByTool(), args);
        System.exit(exitCode);
    }
}
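The routing rule inside SumStepByToolPartitioner can be exercised on its own; this sketch (hypothetical class and method names) copies just the lookup logic, without the Hadoop types:

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionRule {
    private static final Map<String, Integer> ACCOUNT_MAP = new HashMap<String, Integer>();
    static {
        ACCOUNT_MAP.put("zhangsan", 1);
        ACCOUNT_MAP.put("lisi", 2);
        ACCOUNT_MAP.put("wangwu", 3);
    }

    // Same lookup as SumStepByToolPartitioner.getPartition: take the part of
    // the account before '@', map known names to a fixed partition, unknown to 0.
    static int partitionFor(String account) {
        String name = account.substring(0, account.indexOf('@'));
        Integer part = ACCOUNT_MAP.get(name);
        return part == null ? 0 : part;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("lisi@163.com"));  // prints 2
        System.out.println(partitionFor("nobody@x.com"));  // prints 0
    }
}
```

One caution if you do register such a partitioner: with job.setNumReduceTasks(3) the valid partitions are 0 to 2, so returning 3 for wangwu would make the job fail; the reduce task count must be at least one greater than the largest partition number returned.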
Note
- The number of records each map processes is controlled by the "mapreduce.input.lineinputformat.linespermap" property on the conf object:
conf.setInt("mapreduce.input.lineinputformat.linespermap", 2);