[hadoop2.7.1] I/O: SequenceFile programming examples with the latest API (writing and reading)

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 16:34


Write operations


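As described in the previous post, from hadoop 2.x onward SequenceFile is gradually deprecating its many overloaded createWriter() methods in favor of a single, more concise createWriter() method. Apart from the Configuration, every other parameter is now passed as a SequenceFile.Writer.Option. Specifically: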


Option parameters provided by the new API:


FileOption
FileSystemOption
StreamOption
BufferSizeOption
BlockSizeOption
ReplicationOption
KeyClassOption
ValueClassOption
MetadataOption
ProgressableOption
CompressionOption

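These options cover a wide range of needs and can be passed in any order, which reduces the amount of code to write and makes the call more intuitive and easier to understand. Let's first look at the method itself; a concrete example follows later.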

  • createWriter

    public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf,
                                                                        org.apache.hadoop.io.SequenceFile.Writer.Option... opts)
                                                                 throws IOException
    Create a new Writer with the given options.
    Parameters:
    conf - the configuration to use
    opts - the options to create the file with
    Returns:
    a new Writer
    Throws:
    IOException
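
The order-independence of the options follows from the varargs signature above: the method receives an array of Option objects and can pick out each setting by its type rather than by its position. The sketch below illustrates that idea in plain Java. It is a simplified stand-in, not Hadoop's actual implementation; the class OptionPatternSketch and the methods find() and describe() are hypothetical names introduced only for this illustration.

```java
/** Simplified stand-in for the varargs-option pattern; NOT Hadoop's real classes. */
final class OptionPatternSketch {

    /** Marker interface, playing the role of SequenceFile.Writer.Option. */
    interface Option {}

    static final class FileOption implements Option {
        final String path;
        FileOption(String path) { this.path = path; }
    }

    static final class KeyClassOption implements Option {
        final Class<?> keyClass;
        KeyClassOption(Class<?> keyClass) { this.keyClass = keyClass; }
    }

    // Factory methods in the style of Writer.file(...) and Writer.keyClass(...).
    static Option file(String path) { return new FileOption(path); }
    static Option keyClass(Class<?> c) { return new KeyClassOption(c); }

    /** Returns the first option of the requested type, wherever it sits in the array. */
    static <T extends Option> T find(Class<T> type, Option... opts) {
        for (Option o : opts) {
            if (type.isInstance(o)) {
                return type.cast(o);
            }
        }
        return null;
    }

    /** Hypothetical "createWriter": resolves each setting by type, so order is irrelevant. */
    static String describe(Option... opts) {
        FileOption f = find(FileOption.class, opts);
        KeyClassOption k = find(KeyClassOption.class, opts);
        String path = (f == null) ? "<none>" : f.path;
        String key = (k == null) ? "<default>" : k.keyClass.getSimpleName();
        return "file=" + path + ", keyClass=" + key;
    }

    public static void main(String[] args) {
        // Both calls print: file=B.txt, keyClass=Integer
        System.out.println(describe(file("B.txt"), keyClass(Integer.class)));
        System.out.println(describe(keyClass(Integer.class), file("B.txt")));
    }
}
```

Because each setting is looked up by type, omitting an option simply leaves that setting at its default, which is what lets one method replace a long list of overloads.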

Hadoop: The Definitive Guide (4th edition) provides a SequenceFileWriteDemo example:

    // cc SequenceFileWriteDemo Writing a SequenceFile
    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // vv SequenceFileWriteDemo
    public class SequenceFileWriteDemo {

      private static final String[] DATA = {
        "One, two, buckle my shoe",
        "Three, four, shut the door",
        "Five, six, pick up sticks",
        "Seven, eight, lay them straight",
        "Nine, ten, a big fat hen"
      };

      public static void main(String[] args) throws IOException {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);

        IntWritable key = new IntWritable();
        Text value = new Text();
        SequenceFile.Writer writer = null;
        try {
          writer = SequenceFile.createWriter(fs, conf, path,
              key.getClass(), value.getClass());

          for (int i = 0; i < 100; i++) {
            key.set(100 - i);
            value.set(DATA[i % DATA.length]);
            System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
            writer.append(key, value);
          }
        } finally {
          IOUtils.closeStream(writer);
        }
      }
    }
    // ^^ SequenceFileWriteDemo

Rewriting the createWriter() call from the example above with the new consolidated method gives the following code:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.SequenceFile.Writer;
    import org.apache.hadoop.io.Text;

    public class THT_testSequenceFile2 {

      private static final String[] DATA = {
        "One, two, buckle my shoe",
        "Three, four, shut the door",
        "Five, six, pick up sticks",
        "Seven, eight, lay them straight",
        "Nine, ten, a big fat hen"
      };

      public static void main(String[] args) throws IOException {
        // String uri = args[0];
        String uri = "file:///D://B.txt";
        Configuration conf = new Configuration();
        Path path = new Path(uri);

        IntWritable key = new IntWritable();
        Text value = new Text();
        SequenceFile.Writer writer = null;

        // Each factory method returns the public Writer.Option interface,
        // so no casts to the concrete option classes are needed.
        Writer.Option option1 = Writer.file(path);
        Writer.Option option2 = Writer.keyClass(key.getClass());
        Writer.Option option3 = Writer.valueClass(value.getClass());

        try {
          writer = SequenceFile.createWriter(conf, option1, option2, option3,
              Writer.compression(CompressionType.RECORD));

          for (int i = 0; i < 10; i++) {
            key.set(1 + i);
            value.set(DATA[i % DATA.length]);
            // Print the file offset at which this record will be written.
            System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
            writer.append(key, value);
          }
        } finally {
          IOUtils.closeStream(writer);
        }
      }
    }

The output is as follows:


    2015-11-06 22:15:05,027 INFO  compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.deflate]
    [128]   1   One, two, buckle my shoe
    [173]   2   Three, four, shut the door
    [220]   3   Five, six, pick up sticks
    [264]   4   Seven, eight, lay them straight
    [314]   5   Nine, ten, a big fat hen
    [359]   6   One, two, buckle my shoe
    [404]   7   Three, four, shut the door
    [451]   8   Five, six, pick up sticks
    [495]   9   Seven, eight, lay them straight
    [545]   10  Nine, ten, a big fat hen

[Screenshot: the generated file]




Read operations


Option parameters provided by the new API:


FileOption - specifies which file to read
InputStreamOption
StartOption
LengthOption - limits reading to the configured number of bytes
BufferSizeOption
OnlyHeaderOption


Here is the source code using the latest API:


    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.SequenceFile.Reader;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class THT_testSequenceFile3 {

      public static void main(String[] args) throws IOException {
        // String uri = args[0];
        String uri = "file:///D://B.txt";
        Configuration conf = new Configuration();
        Path path = new Path(uri);

        SequenceFile.Reader.Option option1 = Reader.file(path);
        // This option sets how many bytes of the file to read.
        SequenceFile.Reader.Option option2 = Reader.length(174);

        SequenceFile.Reader reader = null;
        try {
          reader = new SequenceFile.Reader(conf, option1, option2);
          // Instantiate key and value using the classes recorded in the file header.
          Writable key = (Writable) ReflectionUtils.newInstance(
              reader.getKeyClass(), conf);
          Writable value = (Writable) ReflectionUtils.newInstance(
              reader.getValueClass(), conf);

          long position = reader.getPosition();
          while (reader.next(key, value)) {
            // Mark records that were preceded by a sync point.
            String syncSeen = reader.syncSeen() ? "*" : "";
            System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key, value);
            position = reader.getPosition(); // beginning of next record
          }
        } finally {
          IOUtils.closeStream(reader);
        }
      }
    }

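Here I set a read-length option so that reading stops at byte 174. Only the records that start before that position are returned, so the output is: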


    2015-11-06 22:53:00,602 INFO  compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.deflate]
    [128]   1   One, two, buckle my shoe
    [173]   2   Three, four, shut the door
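
The record at offset 173 is printed in full even though it extends past byte 174, while the record at offset 220 is not printed at all. One way to read this is that the length limit is checked only at the start of each record. The small simulation below reproduces the result under that assumption, using the offsets printed by the write example. The rule is inferred from the observed output rather than taken from the Hadoop source, and ReadLimitSketch and recordsRead() are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulates which records are returned for a given read limit (assumed rule, see above). */
final class ReadLimitSketch {

    /**
     * Returns the start offsets of the records that would be read,
     * assuming reading stops as soon as a record starts at or beyond 'end'.
     */
    static List<Long> recordsRead(long[] recordStarts, long end) {
        List<Long> read = new ArrayList<>();
        for (long start : recordStarts) {
            if (start >= end) {
                break; // this record begins at or past the limit
            }
            read.add(start);
        }
        return read;
    }

    public static void main(String[] args) {
        // Record start offsets taken from the output of the write example.
        long[] starts = {128, 173, 220, 264, 314, 359, 404, 451, 495, 545};
        System.out.println(recordsRead(starts, 174)); // prints [128, 173]
    }
}
```

Under the same assumption, a limit of 173 would return only the first record, and any limit above 545 would return all ten.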






