RCFile Read and Write Operations
Source: Internet  Editor: 程序博客网  Time: 2024/06/01 07:39
http://smallboby.iteye.com/blog/1592531
Reading an RCFile
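The mapper in this section receives each row as an array of byte slices, one per column, and joins them into a single tab-separated line. A minimal plain-Java sketch of that step, with no Hadoop dependency, may make the loop easier to follow. The `Slice` class here is a hypothetical stand-in for `BytesRefWritable` (a buffer plus start and length), and the sample data is invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ColumnJoinSketch {

    /** Hypothetical stand-in for BytesRefWritable: a view into a shared byte buffer. */
    static final class Slice {
        final byte[] data;
        final int start;
        final int length;

        Slice(byte[] data, int start, int length) {
            this.data = data;
            this.start = start;
            this.length = length;
        }
    }

    /** Decodes each column slice as UTF-8 and joins the columns with tabs. */
    static String joinColumns(List<Slice> row) {
        List<String> cols = new ArrayList<>(row.size());
        for (Slice s : row) {
            cols.add(new String(s.data, s.start, s.length, StandardCharsets.UTF_8));
        }
        return String.join("\t", cols);
    }

    public static void main(String[] args) {
        // Three columns packed back to back in one buffer (invented sample data).
        byte[] buf = "1001alice2012-07-12".getBytes(StandardCharsets.UTF_8);
        List<Slice> row = new ArrayList<>();
        row.add(new Slice(buf, 0, 4));
        row.add(new Slice(buf, 4, 5));
        row.add(new Slice(buf, 9, 10));
        System.out.println(joinColumns(row));
    }
}
```

Using `String.join` avoids the special case for the last column that a manual append loop needs.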
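// Driver
Job job = new Job();
job.setJarByClass(Driver.class); // placeholder: use your own driver class
// Read the input as RCFile. The class passed here must be a new-API
// (org.apache.hadoop.mapreduce) InputFormat.
job.setInputFormatClass(RCFileInputFormat.class);
// Plain text output
job.setOutputFormatClass(TextOutputFormat.class);
// Input path
RCFileInputFormat.addInputPath(job, new Path(srcpath));
// MultipleInputs.addInputPath(job, new Path(srcpath), RCFileInputFormat.class);
// Output path
TextOutputFormat.setOutputPath(job, new Path(respath));
// Output key type
job.setOutputKeyClass(Text.class);
// Output value type
job.setOutputValueClass(NullWritable.class);
// Mapper class
job.setMapperClass(ReadTestMapper.class);
// No reducer is set: the default reducer only passes the Text records
// through, because the mapper has already done the conversion.
int code = job.waitForCompletion(true) ? 0 : 1;

// Mapper class
import java.io.IOException;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReadTestMapper extends Mapper<LongWritable, BytesRefArrayWritable, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, BytesRefArrayWritable value, Context context)
            throws IOException, InterruptedException {
        Text txt = new Text();
        // RCFile combines row and column storage, so each incoming record is
        // one row whose value holds that row's columns. Iterate over the
        // columns and emit them as one tab-separated line.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.size(); i++) {
            BytesRefWritable v = value.get(i);
            txt.set(v.getData(), v.getStart(), v.getLength());
            sb.append(txt.toString());
            if (i < value.size() - 1) {
                sb.append("\t");
            }
        }
        context.write(new Text(sb.toString()), NullWritable.get());
    }
}

Writing an RCFile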
// Driver
Job job = new Job();
Configuration conf = job.getConfiguration();
// Number of columns in each row
RCFileOutputFormat.setColumnNumber(conf, 4);
job.setJarByClass(Driver.class); // placeholder: use your own driver class
FileInputFormat.setInputPaths(job, new Path(srcpath));
RCFileOutputFormat.setOutputPath(job, new Path(respath));
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(RCFileOutputFormat.class);
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(BytesRefArrayWritable.class);
job.setMapperClass(OutPutTestMapper.class);
// "line" is the parsed command line and DATE its option name; neither is shown in this post.
conf.set("date", line.getOptionValue(DATE));
// Compression settings
conf.setBoolean("mapred.output.compress", true);
conf.set("mapred.output.compression.codec", "org.apache.hadoop.io.compress.GzipCodec");
int code = job.waitForCompletion(true) ? 0 : 1;

// Mapper class
public class OutPutTestMapper extends Mapper<LongWritable, Text, LongWritable, BytesRefArrayWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String day = context.getConfiguration().get("date");
        if (!line.equals("")) {
            String[] lines = line.split(" ", -1);
            if (lines.length > 3) {
                String time_temp = lines[1];
                // timeStampDate is the author's own helper; its body is not shown in this post.
                String times = timeStampDate(time_temp);
                String d = times.substring(0, 10);
                // Only rows dated on the requested day are written. The record is
                // built and written inside this block so that it stays in scope.
                if (day.equals(d)) {
                    byte[][] record = {
                        lines[0].getBytes("UTF-8"), lines[1].getBytes("UTF-8"),
                        lines[2].getBytes("UTF-8"), lines[3].getBytes("UTF-8")};
                    BytesRefArrayWritable bytes = new BytesRefArrayWritable(record.length);
                    for (int i = 0; i < record.length; i++) {
                        BytesRefWritable cu = new BytesRefWritable(record[i], 0, record[i].length);
                        bytes.set(i, cu);
                    }
                    context.write(key, bytes);
                }
            }
        }
    }
}
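The per-line logic of the write mapper (split on spaces, keep rows dated on the requested day, turn the first four fields into byte columns) can be tried out without a cluster. The sketch below is plain Java under stated assumptions: `timeStampDate` is a hypothetical reimplementation that treats the second field as epoch seconds and formats it in UTC, which may differ from the post's own helper, and the sample log line is invented.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class RowFilterSketch {

    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    /** Assumed behavior of the post's helper: epoch seconds to "yyyy-MM-dd HH:mm:ss" in UTC. */
    static String timeStampDate(String epochSeconds) {
        return FMT.format(Instant.ofEpochSecond(Long.parseLong(epochSeconds)));
    }

    /** Returns the first four fields as UTF-8 byte columns, or null when the line is skipped. */
    static byte[][] toRecord(String line, String day) {
        if (line.isEmpty()) {
            return null;
        }
        String[] fields = line.split(" ", -1);
        if (fields.length <= 3) {
            return null; // fewer than four fields
        }
        String date = timeStampDate(fields[1]).substring(0, 10);
        if (!day.equals(date)) {
            return null; // row belongs to another day
        }
        byte[][] record = new byte[4][];
        for (int i = 0; i < record.length; i++) {
            record[i] = fields[i].getBytes(StandardCharsets.UTF_8);
        }
        return record;
    }

    public static void main(String[] args) {
        // Invented sample line: user id, epoch seconds, method, path.
        byte[][] record = toRecord("u1 1342051200 GET /index", "2012-07-12");
        System.out.println(record == null ? "skipped" : record.length + " columns");
    }
}
```

In the real mapper each `byte[]` column would be wrapped in a `BytesRefWritable` and placed into a `BytesRefArrayWritable` before `context.write`. A non-numeric second field makes `Long.parseLong` throw here, so a production version would need to decide how to handle such lines.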