Hadoop: The Definitive Guide, 3rd Edition, Gleanings: Chapter 3, Viewing Files and Regular Expressions
Source: Internet  Editor: 程序博客网  Time: 2024/05/24 15:37
1. Implementing read for Example 3-3:
package com.tht.hdfs;

// cc FileSystemDoubleCat Displays files from a Hadoop filesystem on standard output twice, by using seek
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// vv FileSystemDoubleCat
public class FileSystemDoubleCat {
  public static void main(String[] args) throws Exception {
    // String uri = args[0];
    String uri = "hdfs://121.1.253.251:9000/in/core-site.xml";
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    FSDataInputStream in = null;
    byte[] b = new byte[500];
    try {
      in = fs.open(new Path(uri));
      IOUtils.copyBytes(in, System.out, 4096, false);
      // in.seek(0); // go back to the start of the file
      // IOUtils.copyBytes(in, System.out, 4096, false);

      // Positioned read: up to 300 bytes from file offset 83 into b, starting at b[10].
      // It does not depend on the stream's current position, so no seek is needed.
      int n = in.read(83, b, 10, 300);
      if (n > 0) {
        // Print only the bytes actually read, not the whole 500-byte buffer.
        System.out.println(new String(b, 10, n, "UTF-8"));
      }
    } finally {
      IOUtils.closeStream(in);
    }
  }
}
// ^^ FileSystemDoubleCat
The original English third edition explains this as follows:
FSDataInputStream also implements the PositionedReadable interface for reading parts
of a file at a given offset:

public interface PositionedReadable {
  public int read(long position, byte[] buffer, int offset, int length)
      throws IOException;
  public void readFully(long position, byte[] buffer, int offset, int length)
      throws IOException;
  public void readFully(long position, byte[] buffer) throws IOException;
}

The read() method reads up to length bytes from the given position in the file into the
buffer at the given offset in the buffer. The return value is the number of bytes actually
read; callers should check this value, as it may be less than length.
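
The short-read caveat in the passage above can be illustrated without a cluster. The following is only a sketch: PositionedSource is a local stand-in that copies the read() signature quoted above (it is not the Hadoop interface), chunked() is a hypothetical in-memory source that hands back at most 4 bytes per call, and readFully() shows one plausible way to loop until the requested length has arrived.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class PositionedReadSketch {

  /** Local stand-in with the same read() signature as quoted above; not the Hadoop type. */
  interface PositionedSource {
    int read(long position, byte[] buffer, int offset, int length) throws IOException;
  }

  /** Hypothetical in-memory source that returns at most 4 bytes per call, forcing short reads. */
  static PositionedSource chunked(byte[] data) {
    return (position, buffer, offset, length) -> {
      if (position >= data.length) {
        return -1; // end of data
      }
      int n = Math.min(Math.min(length, 4), data.length - (int) position);
      System.arraycopy(data, (int) position, buffer, offset, n);
      return n;
    };
  }

  /** Calls read() repeatedly until exactly length bytes have arrived, or fails at end of data. */
  static void readFully(PositionedSource in, long position, byte[] buffer, int offset, int length)
      throws IOException {
    int done = 0;
    while (done < length) {
      int n = in.read(position + done, buffer, offset + done, length - done);
      if (n < 0) {
        throw new EOFException("end of data after " + done + " bytes");
      }
      done += n;
    }
  }

  public static void main(String[] args) throws IOException {
    PositionedSource in = chunked("0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
    byte[] b = new byte[10];

    int first = in.read(2, b, 0, 10); // a single call may return fewer bytes than requested
    System.out.println("single read returned " + first); // single read returned 4

    readFully(in, 2, b, 0, 10);
    System.out.println(new String(b, StandardCharsets.US_ASCII)); // 23456789ab
  }
}
```

With a real FSDataInputStream the same pattern applies: either check the value returned by read(), or call readFully() when exactly length bytes are required.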
2. Implementing Example 3-7:
Use the class from the guide: RegexExcludePathFilter (a filter that excludes matching paths).
// cc RegexExcludePathFilter A PathFilter for excluding paths that match a regular expression
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// vv RegexExcludePathFilter
public class RegexExcludePathFilter implements PathFilter {

  private final String regex;

  public RegexExcludePathFilter(String regex) {
    this.regex = regex;
  }

  public boolean accept(Path path) {
    return !path.toString().matches(regex);
  }
}
// ^^ RegexExcludePathFilter
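
Because accept() relies on String.matches(), the regular expression has to match the entire path string, not just part of it. The sketch below applies the same test to plain strings so that it runs without Hadoop; the class name, the filter() helper, and the sample paths are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExcludeFilterSketch {

  /** Same test as accept() above, applied to a plain string instead of a Hadoop Path. */
  static boolean accept(String path, String regex) {
    return !path.matches(regex);
  }

  /** Keeps only the paths that the exclude filter accepts. */
  static List<String> filter(List<String> paths, String regex) {
    List<String> kept = new ArrayList<>();
    for (String p : paths) {
      if (accept(p, regex)) {
        kept.add(p);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Illustrative sample paths, not taken from a real cluster listing.
    List<String> paths = Arrays.asList("/in/core-site.xml", "/in/hdfs-site.xml", "/in/p");

    // The whole string must match, so only "/in/p" is excluded here.
    System.out.println(filter(paths, "^.*/p$")); // [/in/core-site.xml, /in/hdfs-site.xml]

    // "^.*/" only matches strings that end in "/", so nothing is excluded.
    System.out.println(filter(paths, "^.*/")); // [/in/core-site.xml, /in/hdfs-site.xml, /in/p]
  }
}
```

The second call suggests that a pattern such as "^.*/" excludes a path only when the path string itself ends with a slash.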
Write a test class:
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class GlobStatus {
  public static void main(String[] args) throws IOException {
    String uri = "hdfs://121.1.253.251:9000/in/*";
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    // The glob selects candidate paths; the filter then drops those whose full
    // path string matches the regex (String.matches() requires a whole-string match).
    FileStatus[] status = fs.globStatus(new Path(uri), new RegexExcludePathFilter("^.*/"));
    Path[] listedPaths = FileUtil.stat2Paths(status);
    for (Path p : listedPaths) {
      System.out.println(p);
    }
  }
}

Glob characters and their meanings

Glob     Name                      Matches
*        asterisk                  Matches zero or more characters
?        question mark             Matches a single character
[ab]     character class           Matches a single character in the set {a, b}
[^ab]    negated character class   Matches a single character that is not in the set {a, b}
[a-b]    character range           Matches a single character in the (closed) range [a, b], where a is lexicographically less than or equal to b
[^a-b]   negated character range   Matches a single character that is not in the (closed) range [a, b], where a is lexicographically less than or equal to b
{a,b}    alternation               Matches either expression a or b
\c       escaped character         Matches character c when it is a metacharacter
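
To get a feel for the glob characters listed above, each one can be translated into an equivalent java.util.regex construct. This is a rough sketch only: it is not how Hadoop implements globStatus(), it handles a single path component (no special treatment of "/"), and the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

public class GlobSketch {

  /** Escapes a character for use as a regex literal; letters and digits need no escaping. */
  static String literal(char c) {
    return Character.isLetterOrDigit(c) ? String.valueOf(c) : "\\" + c;
  }

  /** Translates a glob for one path component into a regex, following the table above. */
  static Pattern toRegex(String glob) {
    StringBuilder re = new StringBuilder();
    int braceDepth = 0;
    boolean inClass = false;
    for (int i = 0; i < glob.length(); i++) {
      char c = glob.charAt(i);
      if (c == '\\' && i + 1 < glob.length()) {
        re.append(literal(glob.charAt(++i))); // \c: the next character is literal
      } else if (inClass) {
        re.append(c);                         // [ab], [^ab], [a-b] carry over unchanged
        if (c == ']') {
          inClass = false;
        }
      } else if (c == '[') {
        re.append(c);
        inClass = true;
      } else if (c == '*') {
        re.append(".*");                      // zero or more characters
      } else if (c == '?') {
        re.append('.');                       // exactly one character
      } else if (c == '{') {
        re.append("(?:");                     // start of alternation
        braceDepth++;
      } else if (c == '}' && braceDepth > 0) {
        re.append(')');
        braceDepth--;
      } else if (c == ',' && braceDepth > 0) {
        re.append('|');                       // separator between alternatives
      } else {
        re.append(literal(c));
      }
    }
    return Pattern.compile(re.toString());
  }

  static boolean matches(String glob, String name) {
    return toRegex(glob).matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(matches("*.xml", "core-site.xml"));                // true
    System.out.println(matches("{core,hdfs}-site.xml", "hdfs-site.xml")); // true
    System.out.println(matches("[^ab]", "a"));                            // false
    System.out.println(matches("\\*", "abc"));                            // false: only a literal * matches
  }
}
```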
3. Coherency model
Consider the following example:
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CoherencyModel {
  public static void main(String[] args) throws Exception {
    String uri = "hdfs://121.1.253.251:9000/in/";
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    // If this is changed to Path p = new Path("p"); the output becomes
    // hdfs://121.1.253.251:9000/user/hadoop/p
    Path p = new Path(uri + "/p");
    OutputStream out = fs.create(p);
    out.write("content for tht test".getBytes("UTF-8"));
    out.flush();
    out.close(); // close() implicitly performs sync()
    System.out.println(fs.getFileStatus(p).getPath());
  }
}
The output is:
hdfs://121.1.253.251:9000/in/p
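
The flush, sync, and close sequence above has a rough counterpart on the local filesystem, which can be tried without a cluster. The sketch below is only an analogy using java.io, not HDFS behavior; the class name, the writeAndSync() helper, and the temporary file are assumptions made for the example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LocalSyncSketch {

  /** Writes content, pushes it to the OS and then to disk, and returns the visible file length. */
  static long writeAndSync(File file, String content) throws IOException {
    try (FileOutputStream out = new FileOutputStream(file)) {
      out.write(content.getBytes(StandardCharsets.UTF_8));
      out.flush();          // hand buffered bytes to the operating system
      out.getFD().sync();   // ask the OS to force the bytes to the storage device
      return file.length(); // length as seen before the stream is closed
    }
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("coherency", ".txt"); // hypothetical scratch file
    tmp.deleteOnExit();
    System.out.println(writeAndSync(tmp, "content for tht test")); // 20
  }
}
```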