hadoop2 (revised 2017-7-21): Reading and Writing SequenceFile
Source: Internet · Editor: 程序博客网 · Time: 2024/06/03 23:03
Write operations
As introduced in the previous post, starting with Hadoop 2.x, SequenceFile.Writer gradually retires its many createWriter() overloads in favor of a single, more concise createWriter() method: apart from the Configuration, every other parameter is passed as a SequenceFile.Writer.Option. The new API provides the following option parameters:
FileOption
FileSystemOption
StreamOption
BufferSizeOption
BlockSizeOption
ReplicationOption
KeyClassOption
ValueClassOption
MetadataOption
ProgressableOption
CompressionOption
These options cover a wide range of needs, and they may be passed in any order, which reduces boilerplate and makes the code more direct and easier to understand. Let's first look at the method itself; a concrete example follows below.
createWriter

```java
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf, org.apache.hadoop.io.SequenceFile.Writer.Option... opts)
    throws IOException
```

Create a new Writer with the given options.

Parameters:
conf - the configuration to use
opts - the options to create the file with

Returns:
a new Writer

Throws:
IOException
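Before the full examples below, here is a minimal sketch of how several of these options can be combined in one createWriter() call. The path and the numeric values are hypothetical, and the code assumes a Hadoop 2.x client on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;

public class WriterOptionsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The options may appear in any order; only the Configuration
        // argument is positional.
        Writer writer = SequenceFile.createWriter(conf,
                Writer.file(new Path("/tmp/options-demo.seq")), // FileOption
                Writer.keyClass(IntWritable.class),             // KeyClassOption
                Writer.valueClass(Text.class),                  // ValueClassOption
                Writer.bufferSize(64 * 1024),                   // BufferSizeOption
                Writer.replication((short) 1),                  // ReplicationOption
                Writer.compression(CompressionType.BLOCK));     // CompressionOption
        try {
            writer.append(new IntWritable(1), new Text("hello"));
        } finally {
            writer.close();
        }
    }
}
```

Mutually exclusive combinations (for example FileOption together with StreamOption) are rejected at runtime, so each call supplies at most one source of each kind.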
(The following example has been personally tested and revised.)
The fourth edition of Hadoop: The Definitive Guide provides a SequenceFileWriteDemo example:
```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteDemo {

    private static final String[] DATA = {
        "One, two, buckle my shoe",
        "Three, four, shut the door",
        "Five, six, pick up sticks",
        "Seven, eight, lay them straight",
        "Nine, ten, a big fat hen"
    };

    public static void main(String[] args) throws IOException {
        String uri = "file:///E://IDEA//aa.txt";
        Configuration conf = new Configuration();
        // The explicit file:// scheme in the URI overrides this default, so
        // the file is written to the local file system, not to HDFS.
        conf.set("fs.default.name", "hdfs://172.16.11.222:9000");
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);

        IntWritable key = new IntWritable();
        Text value = new Text();
        SequenceFile.Writer writer = null;
        try {
            // Old-style overload: file system, configuration, path, and the
            // key/value classes are all positional arguments.
            writer = SequenceFile.createWriter(fs, conf, path,
                    key.getClass(), value.getClass());

            for (int i = 0; i < 100; i++) {
                key.set(100 - i);
                value.set(DATA[i % DATA.length]);
                System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
                writer.append(key, value);
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
```
Rewriting the createWriter() call from the example above with the new, consolidated method gives the following code:
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;

public class THT_testSequenceFile2 {

    private static final String[] DATA = { "One, two, buckle my shoe",
            "Three, four, shut the door", "Five, six, pick up sticks",
            "Seven, eight, lay them straight", "Nine, ten, a big fat hen" };

    public static void main(String[] args) throws IOException {
        String uri = "file:///E://IDEA//bb.txt";
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://172.16.11.222:9000");
        Path path = new Path(uri);

        IntWritable key = new IntWritable();
        Text value = new Text();
        SequenceFile.Writer writer = null;
        // The concrete option classes (FileOption, KeyClassOption, ...) are
        // package-private, so declare the variables with the public
        // Writer.Option interface instead of down-casting.
        Writer.Option option1 = Writer.file(path);
        Writer.Option option2 = Writer.keyClass(key.getClass());
        Writer.Option option3 = Writer.valueClass(value.getClass());

        try {
            writer = SequenceFile.createWriter(conf, option1, option2, option3,
                    Writer.compression(CompressionType.RECORD));

            for (int i = 0; i < 10; i++) {
                key.set(1 + i);
                value.set(DATA[i % DATA.length]);
                System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
                writer.append(key, value);
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
```
The output is as follows:
```
2017-07-20 22:15:05,027 INFO compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.deflate]
[128] 1 One, two, buckle my shoe
[173] 2 Three, four, shut the door
[220] 3 Five, six, pick up sticks
[264] 4 Seven, eight, lay them straight
[314] 5 Nine, ten, a big fat hen
[359] 6 One, two, buckle my shoe
[404] 7 Three, four, shut the door
[451] 8 Five, six, pick up sticks
[495] 9 Seven, eight, lay them straight
[545] 10 Nine, ten, a big fat hen
```
The generated file (screenshot not reproduced):
Read operations
Option parameters provided by the new API:
FileOption - which file to read
InputStreamOption
StartOption
LengthOption - read only as many bytes as the configured length
BufferSizeOption
OnlyHeaderOption
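The file and length options are exercised in the full example below; the start and buffer-size options follow the same factory-method pattern. A minimal sketch, with a hypothetical path and illustrative values, assuming Hadoop 2.x:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;

public class ReaderOptionsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                Reader.file(new Path("/tmp/options-demo.seq")), // FileOption
                Reader.start(0L),             // StartOption: byte offset to begin at
                                              // (must be a record/sync boundary)
                Reader.length(1024L),         // LengthOption: stop after this many bytes
                Reader.bufferSize(4 * 1024)); // BufferSizeOption
        reader.close();
    }
}
```

Start/length pairs like this are what MapReduce input splits use to give each task its own byte range of a large SequenceFile.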
Using the latest API directly, the source code is:
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class THT_testSequenceFile3 {

    public static void main(String[] args) throws IOException {

        String uri = "file:///E://IDEA//bb.txt";
        Configuration conf = new Configuration();
        Path path = new Path(uri);
        SequenceFile.Reader.Option option1 = Reader.file(path);
        SequenceFile.Reader.Option option2 = Reader.length(174);
        SequenceFile.Reader reader = null;
        try {
            reader = new SequenceFile.Reader(conf, option1, option2);
            // Instantiate key and value objects of whatever types the file
            // declares in its header.
            Writable key = (Writable) ReflectionUtils.newInstance(
                    reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(
                    reader.getValueClass(), conf);
            long position = reader.getPosition();
            while (reader.next(key, value)) {
                String syncSeen = reader.syncSeen() ? "*" : "";
                System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key,
                        value);
                position = reader.getPosition();
            }
        } finally {
            IOUtils.closeStream(reader);
        }
    }
}
```
Here I set a read-length option, so reading stops at byte 174; the output is therefore:
```
2017-07-20 22:15:05,089 INFO compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.deflate]
[128] 1 One, two, buckle my shoe
[173] 2 Three, four, shut the door
```
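The syncSeen() flag printed by the reader relates to sync points, which are what make random access possible. A minimal sketch of seek() versus sync(), using the bb.txt file written earlier and the [173] boundary printed in the output above (assumes Hadoop 2.x):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.Text;

public class ReaderSeekSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                Reader.file(new Path("file:///E://IDEA//bb.txt")));
        IntWritable key = new IntWritable();
        Text value = new Text();
        try {
            // seek() requires an exact record boundary, such as the [173]
            // position printed by the reading example; the next call to
            // next() then returns the record stored at that offset.
            reader.seek(173);
            reader.next(key, value);

            // sync() accepts any byte position and advances the reader to
            // the first sync point after it, so it never fails on an
            // arbitrary offset.
            reader.sync(0);
        } finally {
            reader.close();
        }
    }
}
```

Calling next() after seeking to a non-boundary position fails, which is why split-based readers use sync() rather than seek() to align themselves.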