[hadoop 2.7.1] I/O: SequenceFile latest-API programming examples (write and read)
Write operation
As introduced in the previous post, since hadoop 2.x the many createWriter() overloads of SequenceFile.Writer are being phased out in favor of a single, consolidated createWriter() method: apart from the Configuration parameter, everything else is passed as SequenceFile.Writer.Option arguments. Specifically, the new API provides these Option parameters:
FileOption
FileSystemOption
StreamOption
BufferSizeOption
BlockSizeOption
ReplicationOption
KeyClassOption
ValueClassOption
MetadataOption
ProgressableOption
CompressionOption
These options cover a wide range of needs, and because there is no ordering relationship between them, the calling code is shorter, more self-describing, and easier to understand. Let's look at the method first; a concrete example follows.
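The mechanics behind "options can be passed in any order" can be sketched in plain Java. Note that the Option classes and createWriter() below are simplified, hypothetical stand-ins for illustration, not Hadoop's actual implementation:

```java
// A simplified sketch of the varargs-option pattern the new createWriter()
// uses: each setting is a small object, so arguments are self-describing
// and order-independent.
public class OptionPatternSketch {

  interface Option {}

  static class FileOption implements Option {
    final String path;
    FileOption(String path) { this.path = path; }
  }

  static class BufferSizeOption implements Option {
    final int size;
    BufferSizeOption(int size) { this.size = size; }
  }

  // The factory scans the varargs array for the options it understands,
  // falling back to defaults for anything not supplied.
  static String createWriter(Option... opts) {
    String path = "(default)";
    int bufferSize = 4096;
    for (Option o : opts) {
      if (o instanceof FileOption) path = ((FileOption) o).path;
      if (o instanceof BufferSizeOption) bufferSize = ((BufferSizeOption) o).size;
    }
    return "Writer(path=" + path + ", bufferSize=" + bufferSize + ")";
  }

  public static void main(String[] args) {
    // Order does not matter: both calls configure the same writer.
    System.out.println(createWriter(new FileOption("/tmp/a.seq"), new BufferSizeOption(8192)));
    System.out.println(createWriter(new BufferSizeOption(8192), new FileOption("/tmp/a.seq")));
  }
}
```

This is also why adding a new option to such an API never breaks existing call sites: old callers simply keep getting the default.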
createWriter

public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf, org.apache.hadoop.io.SequenceFile.Writer.Option... opts) throws IOException

Create a new Writer with the given options.

Parameters:
conf - the configuration to use
opts - the options to create the file with
Returns:
a new Writer
Throws:
IOException
The fourth edition of Hadoop: The Definitive Guide provides a SequenceFileWriteDemo example:
// cc SequenceFileWriteDemo Writing a SequenceFile
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// vv SequenceFileWriteDemo
public class SequenceFileWriteDemo {

  private static final String[] DATA = {
    "One, two, buckle my shoe",
    "Three, four, shut the door",
    "Five, six, pick up sticks",
    "Seven, eight, lay them straight",
    "Nine, ten, a big fat hen"
  };

  public static void main(String[] args) throws IOException {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path path = new Path(uri);

    IntWritable key = new IntWritable();
    Text value = new Text();
    SequenceFile.Writer writer = null;
    try {
      writer = SequenceFile.createWriter(fs, conf, path,
          key.getClass(), value.getClass());
      for (int i = 0; i < 100; i++) {
        key.set(100 - i);
        value.set(DATA[i % DATA.length]);
        System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
        writer.append(key, value);
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}
// ^^ SequenceFileWriteDemo
Rewriting the createWriter() call from the example above with the new consolidated method gives the following code. Since SequenceFile.Writer.Option is a public interface, the options can be declared with that type directly; the casts to the package-private FileOption/KeyClassOption/ValueClassOption classes (and the org.apache.hadoop.io package declaration they required) are unnecessary:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;

public class THT_testSequenceFile2 {

  private static final String[] DATA = {
    "One, two, buckle my shoe",
    "Three, four, shut the door",
    "Five, six, pick up sticks",
    "Seven, eight, lay them straight",
    "Nine, ten, a big fat hen"
  };

  public static void main(String[] args) throws IOException {
    // String uri = args[0];
    String uri = "file:///D://B.txt";
    Configuration conf = new Configuration();
    Path path = new Path(uri);

    IntWritable key = new IntWritable();
    Text value = new Text();
    SequenceFile.Writer writer = null;

    // Every createWriter() argument besides conf is a Writer.Option;
    // they may be passed in any order.
    Writer.Option option1 = Writer.file(path);
    Writer.Option option2 = Writer.keyClass(key.getClass());
    Writer.Option option3 = Writer.valueClass(value.getClass());

    try {
      writer = SequenceFile.createWriter(conf, option1, option2, option3,
          Writer.compression(CompressionType.RECORD));
      for (int i = 0; i < 10; i++) {
        key.set(1 + i);
        value.set(DATA[i % DATA.length]);
        System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
        writer.append(key, value);
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}
The output is as follows:
2015-11-06 22:15:05,027 INFO compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.deflate]
[128]	1	One, two, buckle my shoe
[173]	2	Three, four, shut the door
[220]	3	Five, six, pick up sticks
[264]	4	Seven, eight, lay them straight
[314]	5	Nine, ten, a big fat hen
[359]	6	One, two, buckle my shoe
[404]	7	Three, four, shut the door
[451]	8	Five, six, pick up sticks
[495]	9	Seven, eight, lay them straight
[545]	10	Nine, ten, a big fat hen
The generated file: (screenshot not reproduced)
Read operation
Option parameters provided by the new API:
FileOption - which file to read
InputStreamOption
StartOption
LengthOption - read at most the configured number of bytes
BufferSizeOption
OnlyHeaderOption
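The idea behind LengthOption-style bounded reading can be illustrated with plain java.io (a simplified stand-in, not the actual SequenceFile format): fixed-format records are read back only while the current stream position is still below the byte limit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BoundedReadSketch {

  // Read (int key, UTF string value) records, stopping once the stream
  // position reaches the byte limit -- the idea behind Reader.length(...).
  static List<String> readBounded(byte[] buf, long limit) throws IOException {
    List<String> records = new ArrayList<>();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf));
    long position = 0;
    while (position < limit) {
      int key = in.readInt();
      String value = in.readUTF();
      records.add("[" + position + "]\t" + key + "\t" + value);
      position = buf.length - in.available(); // start of the next record
    }
    return records;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    String[] data = { "One, two, buckle my shoe",
                      "Three, four, shut the door",
                      "Five, six, pick up sticks" };
    long limit = 0;
    for (int i = 0; i < data.length; i++) {
      out.writeInt(i + 1);
      out.writeUTF(data[i]);
      if (i == 1) limit = bytes.size(); // byte offset just past record 2
    }
    // Only the first two records start below the limit.
    for (String r : readBounded(bytes.toByteArray(), limit)) {
      System.out.println(r);
    }
  }
}
```

As with SequenceFile.Reader below, a record whose start offset lies below the limit is still read in full even if it extends past the limit.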
Here is the source code using the latest API:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class THT_testSequenceFile3 {

  public static void main(String[] args) throws IOException {
    // String uri = args[0];
    String uri = "file:///D://B.txt";
    Configuration conf = new Configuration();
    Path path = new Path(uri);

    SequenceFile.Reader.Option option1 = Reader.file(path);
    SequenceFile.Reader.Option option2 = Reader.length(174); // read at most 174 bytes

    SequenceFile.Reader reader = null;
    try {
      reader = new SequenceFile.Reader(conf, option1, option2);
      Writable key = (Writable) ReflectionUtils.newInstance(
          reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(
          reader.getValueClass(), conf);
      long position = reader.getPosition();
      while (reader.next(key, value)) {
        String syncSeen = reader.syncSeen() ? "*" : "";
        System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key, value);
        position = reader.getPosition(); // beginning of next record
      }
    } finally {
      IOUtils.closeStream(reader);
    }
  }
}
I set a read-length option here, so reading stops at byte 174 and only the first two records are returned:
2015-11-06 22:53:00,602 INFO compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.deflate]
[128]	1	One, two, buckle my shoe
[173]	2	Three, four, shut the door